
Unimpressed With Your Scatter and Bar Plots? Give These Four Classic Alternatives A Try. | by Avi Chawla | Jan, 2023



Photo by Jim Wilson on Unsplash

If you have ever visualized your data (which I am sure you have), the first plot type that possibly came to your mind was either a scatter, bar, or line plot.

To recall quickly, these are shown below:

Scatter, Bar, and Line plot illustration (Image by Author)

While these plots do cover a wide variety of visualization use cases, I have seen many data scientists using them excessively in every possible place.

Although they are simple and easy to interpret, they are not the right choice to cover every possible use case.

Therefore, in this post, I will demonstrate a few alternatives to these popular plots and explain when each one is the better choice.

Let’s begin 🚀!

Alternative to scatter plot.

Scatter plots are extremely useful for visualizing the relationship between two numerical variables.

But when you have, say, thousands of data points, scatter plots can get too dense to interpret. This is shown below:

Scatter plot on dummy data (Code and Image by Author)

Hexbins can be a good choice in such cases. As the name suggests, they bin the area of a chart into hexagonal regions.

Moreover, each region is assigned a color intensity based on the method of aggregation used (the number of points, for instance).

Hexbin plot on dummy data (Code and Image by Author)
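The original code for these figures is not reproduced here, so the following is only a minimal sketch of the idea using Matplotlib's `hexbin`. The dummy data, the `gridsize` of 30, and the color map are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy data (an assumption): 10,000 loosely correlated points.
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)
y = 0.6 * x + rng.normal(loc=0.0, scale=0.8, size=10_000)

fig, (ax_scatter, ax_hex) = plt.subplots(1, 2, figsize=(10, 4))

# Plain scatter plot: the dense centre collapses into a single blob.
ax_scatter.scatter(x, y, s=5)
ax_scatter.set_title("Scatter plot")

# Hexbin plot: each hexagon is coloured by the number of points inside it.
# mincnt=1 leaves hexagons that contain no points blank.
hb = ax_hex.hexbin(x, y, gridsize=30, cmap="Blues", mincnt=1)
ax_hex.set_title("Hexbin plot")
fig.colorbar(hb, ax=ax_hex, label="Points per hexagon")

fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` to view the result. To aggregate something other than the point count, `hexbin` also accepts a `C` array together with a `reduce_C_function` such as `np.mean`.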

When to use them?

Hexbins are especially useful for understanding the spread of data. They are often considered an elegant alternative to a scatter plot.

Moreover, binning makes it easier to identify data clusters and depict patterns.

Another alternative to scatter plot.

As we noticed above, when the number of data points is large, interpreting a scatter plot to determine its distribution is immensely difficult.

Similar to a hexbin plot, which depicts the density of points, a 2D density plot illustrates the distribution of a set of points in a two-dimensional space.

2D Density plot on dummy data (Code and Image by Author)

A contour is created by connecting points of equal density. In other words, a single contour line depicts an equal density of data points.
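One way to draw such contours, sketched here under assumptions, is to estimate the density with a Gaussian kernel density estimate (SciPy's `gaussian_kde`) and pass it to Matplotlib's `contour`. The two-cluster dummy data, the grid resolution, and the number of levels are illustrative choices, not the author's original setup.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

# Dummy data (an assumption): two clusters of 1,000 points each.
rng = np.random.default_rng(seed=0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.7, size=(1_000, 2))
cluster_b = rng.normal(loc=[3.0, 2.0], scale=0.5, size=(1_000, 2))
points = np.vstack([cluster_a, cluster_b])  # shape (2000, 2)

# gaussian_kde expects an array of shape (n_dimensions, n_points).
kde = gaussian_kde(points.T)

# Evaluate the estimated density on a regular 100 x 100 grid.
grid_x, grid_y = np.meshgrid(
    np.linspace(points[:, 0].min(), points[:, 0].max(), 100),
    np.linspace(points[:, 1].min(), points[:, 1].max(), 100),
)
grid_coords = np.vstack([grid_x.ravel(), grid_y.ravel()])
density = kde(grid_coords).reshape(grid_x.shape)

# Each contour line joins grid locations with the same estimated density.
fig, ax = plt.subplots(figsize=(6, 5))
contours = ax.contour(grid_x, grid_y, density, levels=8, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("2D density plot")
```

Swapping `ax.contour` for `ax.contourf` gives filled bands instead of lines. Note that the result depends on the KDE bandwidth, so the contours are a smoothed estimate rather than an exact count.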

When to use them?

As mentioned above, if a scatter plot is hard to interpret, a 2D density plot can be your way to proceed.

They can be especially useful when you want to identify patterns and outliers in the data. Scatter plots, on the other hand, are mainly used to depict the relationship between two numeric variables.

Alternative to bar plot.

Bar plots are extremely useful for visualizing categorical variables against a continuous value.

But when you have many categories to depict, they can get too dense to interpret.

Moreover, in a bar plot with many bars, we’re often not paying attention to the individual bar lengths. Instead, we mostly consider the individual endpoints of each bar that denote the total value.

Consider the following data:

Here, we have a dummy population for two countries (Country A and Country B) for the years 1995–2010.

Let’s create a bar plot:

Bar plot on dummy data (Code and Image by Author)

The individual bars take up plenty of space, which makes the graph cluttered.

A dot plot can be a better choice in such cases. It is like a scatter plot but with one categorical and one continuous axis.

Dot plot on dummy data (Code and Image by Author)
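A dot plot like this can be sketched with a plain Matplotlib `scatter` call by placing the years on a categorical axis. The population figures below are made up for illustration (the article's actual values are not shown), and the column names are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dummy data (an assumption): population in millions for 1995-2010.
rng = np.random.default_rng(seed=0)
years = np.arange(1995, 2011)
df = pd.DataFrame({
    "Year": years,
    "Country A": np.linspace(50, 80, years.size) + rng.normal(0, 1.5, years.size),
    "Country B": np.linspace(40, 65, years.size) + rng.normal(0, 1.5, years.size),
})

# Converting the years to strings makes Matplotlib treat them as categories.
year_labels = [str(year) for year in df["Year"]]

fig, ax = plt.subplots(figsize=(6, 6))
for country in ["Country A", "Country B"]:
    # One dot per (year, country): continuous value on x, category on y.
    ax.scatter(df[country], year_labels, s=40, label=country)

ax.set_xlabel("Population (millions)")
ax.set_ylabel("Year")
ax.grid(axis="y", linestyle=":", alpha=0.6)  # faint guides along each category
ax.legend()
ax.set_title("Dot plot")
```

Only the endpoints are drawn, so 32 values fit comfortably in the space where 32 bars would look crowded.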

When to use them?

Compared to bar plots, dot plots are less cluttered and easier to read.

This is especially true in cases where we have many categories and/or multiple categorical columns to depict in a plot.

Alternative to bar and line plot.

If you want to visualize the variation/progress/change in a value over some period, a line (or bar) plot may not always be an apt choice.

Both the line plot and the bar plot depict the actual values in the chart. Thus, sometimes, it can get difficult to visually estimate the scale of incremental changes.

Consider the following data:

Here, we have dummy month-wise data.

We can create a line plot as follows:

Line plot on dummy data (Code and Image by Author)

And a bar plot as follows:

Bar plot on dummy data (Code and Image by Author)

Although these do depict the data as needed, it is difficult to visually estimate the scale of rolling changes.

To address this, you can use a waterfall chart.

To create one, you can use the waterfallcharts library in Python.

Next, we should find the rolling difference and represent it in a new column. The final data should look as follows:

The Delta value for the first month is the same as the start value.
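The rolling difference can be computed with pandas. This is a minimal sketch: the month names, the values, and the `Month` and `Value` column names are assumptions, while `Delta` follows the column name used above.

```python
import pandas as pd

# Dummy month-wise values (an assumption; the article's data is not shown).
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "Value": [100, 120, 90, 110, 150, 140],
})

# diff() returns the change from the previous row, so the first row is NaN.
# Filling that NaN with the start value makes the deltas sum to the final value.
df["Delta"] = df["Value"].diff().fillna(df["Value"]).astype(int)

print(df)
```

With these numbers, the `Delta` column is 100, 20, -30, 20, 40, -10, which sums to the final value of 140.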

Waterfall chart on dummy data (Code and Image by Author)

Much better, isn’t it?

Here, the start and final values are represented by the first and last bars. Also, the marginal changes are automatically color-coded, making them easier to interpret.
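To make clear what the chart encodes, here is a rough sketch that draws a waterfall chart with plain Matplotlib instead of the waterfallcharts library. It is not that library's implementation, and the deltas and colors are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Dummy deltas (an assumption); the first delta is the start value.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
deltas = np.array([100, 20, -30, 20, 40, -10])

# Each bar starts where the running total stood before that month.
running_total = deltas.cumsum()
bottoms = running_total - deltas

# Append a final bar that shows the total, drawn from zero.
labels = months + ["Final"]
heights = np.append(deltas, running_total[-1])
bar_bottoms = np.append(bottoms, 0)

# Start and final bars in blue, increases in green, decreases in red.
colors = (
    ["tab:blue"]
    + ["tab:green" if delta >= 0 else "tab:red" for delta in deltas[1:]]
    + ["tab:blue"]
)

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(labels, heights, bottom=bar_bottoms, color=colors)
ax.set_ylabel("Value")
ax.set_title("Waterfall chart")
```

A bar with a negative height is drawn downward from its `bottom`, which is why a decrease such as March spans from 120 down to 90.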

When to use them?

A waterfall chart is extremely useful to depict the incremental contributions of individual steps to a total value, and how these contributions changed over time.


