
12 Essential Visualizations and How to Implement Them – Part 1 | by Alan Jones | Oct, 2022

We look at how to create the 12 most useful graphs and charts with Python, Matplotlib and Streamlit

Photo by Tima Miroshnichenko

“When I look back over the 150+ visuals that I created for workshops and consulting projects in the past year, there were only a dozen different types of visuals that I used”, Cole Nussbaumer Knaflic in Storytelling with Data

Many people will have read Storytelling with Data by Cole Nussbaumer Knaflic (see note 1). According to the book’s foreword, she has “worked at and with some of the most data-driven organizations on the planet”; she has also taught data visualization at Google for several years and has now created her own teaching company.

The book is dedicated to describing how to effectively communicate using charts and graphs, and provides a wealth of information about many aspects of communicating with graphics.

But one of the first things you learn in the book is that the author relies on only 12 different types of visualization. The book describes these visuals and their use but does not go into implementation, so that’s what we will do here.

The aim of this article is to begin to describe the 12 visuals and show how they can be implemented in Python. All the code and data used in this article are available to download from my GitHub page. (The downloadable code may also include additional examples not included in the article.)

This article will look at the first 6 visuals: Simple Text, Tables, Heatmaps, Scatter Plots, Line Plots and Slopegraphs.

The six visuals — Image by author

I will deal with the remaining charts in a subsequent article: Vertical and Horizontal Bar charts, Vertical and Horizontal Stacked Bar charts, Waterfall charts and Square Area charts.

Sometimes, as Cole Nussbaumer Knaflic (CNK, from now on) tells us, a graphic isn’t necessary, or even the best option to communicate data. When only a couple of values are to be presented, simple text is fine and may even be better than a graph. Let’s take an example.

The weather in London, UK, seems to be getting hotter in the Summer. The maximum temperature in July 2022 was 27.2 degrees Celsius, which is quite hot for the UK. In 2012 it was 24.2 degrees.

We are going to design a text-only visual that communicates this increase, and we’ll see how well a number of different designs work.

First, let’s set up some variables that represent the maximum temperatures in London for those two years and a couple of captions. Then we’ll display them in a number of different formats.

# Set up variables
years = ['2012','2022']
temps = [24.2,27.2]
caption = f"The maximum temperature in July 2022 was {temps[1]}°C"
caption2 = f"That's {temps[1]-temps[0]}° up from 2012"

Now, look at the bar graph, below. It shows a temperature change from 2012 to 2022 using the data we have just set up. But while it’s clear that the temperature went up a few degrees, you can’t quite see exactly how much or precisely what those temperatures are.

Image by author

A bar graph is not ideal for presenting this sort of data, so, let’s see how some text-only visuals can give us a better idea of what is going on.

Streamlit gives us a neat method of displaying a value together with the change in that value: st.metric(). It is an attractive and effective way of showing the same data and is coded very simply, like this:

st.metric("Temperature", temps[1], round(temps[1] - temps[0], 1))  # third argument is the delta

If we combine this with some explanatory text and use a column layout, we can achieve a visual that tells us exactly what is going on without needing any sort of chart.

col3, col4 = st.columns([1,4])
# The third argument to metric() is the delta, so pass the change since 2012
col3.metric("Temperature", temps[1], round(temps[1] - temps[0], 1))
col4.markdown(f"#### {caption}")
Image by author

This visual provides the same data as the bar chart but actually communicates it better than the chart.
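
For reference, st.metric() takes a label, a value and an optional delta, in that order, and both the value and the delta may be strings. The sketch below shows one way of turning two readings into those three arguments. The helper name metric_parts is hypothetical (it is not part of Streamlit), and this is an illustration rather than the code behind the figure above.

```python
def metric_parts(label, current, previous, unit="°C"):
    """Return (label, value, delta) strings suitable for st.metric().

    metric_parts is a hypothetical helper, not part of Streamlit.
    """
    # Round the change so float noise never reaches the display
    change = round(current - previous, 1)
    value = f"{current} {unit}"
    delta = f"{change} {unit}"  # a leading minus sign marks a decrease
    return label, value, delta


parts = metric_parts("Temperature", 27.2, 24.2)
print(parts)
```

In a Streamlit app the tuple can be unpacked straight into the call, for example st.metric(*parts).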

Using markdown you can achieve something quite similar, like this:

col1, col2 = st.columns([1,4])
col1.markdown(f"# {temps[1]}")
col2.markdown(f"#### {caption}")
Image by author

These two methods are specific to Streamlit. An alternative, more generic, Python method is positioning text in a Matplotlib chart. The code below does just this.

You can see that we create a Matplotlib chart but with nothing plotted in it. We simply position text in the right places and turn off the axes, ticks etc. with the statement ax2.axis('off').

import matplotlib.pyplot as plt

fig2, ax2 = plt.subplots(figsize=(5,1))
# Large red figure on the left
ax2.text(0, 0.9, temps[1],
    verticalalignment='top', horizontalalignment='left',
    color='red', fontsize=18, fontweight='bold')
# Main caption, then a smaller secondary caption beneath it
ax2.text(0.2, 0.9, caption,
    verticalalignment='top', horizontalalignment='left',
    color='black', fontsize=10)
ax2.text(0.2, 0.55, caption2,
    verticalalignment='top', horizontalalignment='left',
    color='darkgrey', fontsize=6)
ax2.axis('off')  # hide the axes, ticks and frame
st.pyplot(fig2)

This gives us the figure, below.

Image by author

This is a bit bolder and more eye-catching than the other two methods but, of course, we could make a subtler, smaller figure if we wished by changing the font size, colour and positioning of the text.

CNK tells us that a table is a suitable visual for showing data to a mixed audience, each member of which may be interested in a particular row. She also advises that we should let the data be the centre of attention, so we shouldn’t make the table borders too heavy but rather use light borders or white space to separate the data items.

Streamlit gives us two methods for displaying tables, st.table() and st.dataframe().

Using the same data as in the previous example, here is the code for displaying the data as a table.

import streamlit as st
import pandas as pd

temps = pd.DataFrame()
temps['Year'] = ('2012', '2022')
temps['Temperature'] = (24.2, 27.2)
st.table(temps)  # render as a static table

Which looks like this:

Image by author

If the table is too wide for your liking then it is a simple matter to enclose it in a column and adjust that column to the width that you want.

A dataframe, displayed with st.dataframe(temps), is very similar:

Image by author

Clicking on a dataframe column header will order the dataframe by that column.

Again, these are Streamlit-specific methods. It is possible to display a table with Matplotlib but this is really designed to be an addition to a chart. I’ve played around with various forms of table in Matplotlib but have not been very satisfied with any of the results. So, I’m not sure that this provides a suitable solution for standalone Python programs.
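
For readers who want to try the Matplotlib route anyway, here is a minimal sketch of a standalone table. It assumes that hiding the axes and drawing only horizontal rules is an acceptable stand-in for the light borders CNK recommends; the figure size and scaling are arbitrary choices and not taken from the downloadable code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

years = ["2012", "2022"]
max_temps = [24.2, 27.2]

fig, ax = plt.subplots(figsize=(3, 1.2))
ax.axis("off")  # hide the axes so only the table is visible

# One row per year; colLabels supplies the header row
table = ax.table(
    cellText=[[y, f"{t}"] for y, t in zip(years, max_temps)],
    colLabels=["Year", "Temperature"],
    loc="center",
    edges="horizontal",  # light horizontal rules only, no boxed cells
)
table.scale(1, 1.5)  # a little more vertical white space per row
fig.canvas.draw()    # lay the table out
```

The figure can then be written out with fig.savefig() or, in a Streamlit app, shown with st.pyplot(fig).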

However, if you are not using Streamlit then, as a data scientist, the chances are that you are using Jupyter Notebooks, and they provide a very neat rendering of a dataframe. You simply write the name of the dataframe in a Jupyter cell, for example:

import pandas as pd

temps = pd.DataFrame()
temps['Year'] = ('2012', '2022')
temps['Temperature'] = (24.2, 27.2)
temps  # the last expression in a cell is rendered automatically

and it is rendered like this:

Image by author

A heatmap is a figure that uses colour rather than numbers to highlight value differences in a table.

We are going to look at a very suitable use for a heatmap — one where we want to show increases in heat! Well, temperature, really.

The figure below shows the relative global temperatures over the last 150 years or so and is taken from my article Topical Plots: Global Warming Heatmaps. (The data I use here is included in the downloadable code and is freely usable — see note 2).

A heatmap is great for demonstrating the trend of global warming over the last decades. You can easily see the colours getting lighter, meaning rising temperatures, as the decades progress. (The figures are not absolute temperatures, of course, but relative to a period between 1961 and 1990.)

Image by author

One of the easiest ways of creating a heatmap is with the Seaborn library. You can see from the code below that Seaborn simply takes a Pandas dataframe as a parameter and displays the appropriate map as a matplotlib chart.

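import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# url must point to the CSV file in the downloadable code (not shown here)
gwdec = pd.read_csv(url)
fig, ax = plt.subplots()
# Assumed completion of the truncated listing: hand the dataframe to Seaborn
# (it must hold only numeric columns, so any label column needs to be the index)
sns.heatmap(gwdec, ax=ax)
st.pyplot(fig)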

You can achieve a similar chart using the imshow() function in matplotlib (see the downloadable code for an example).
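
As a rough idea of what the imshow() approach involves, here is a minimal sketch. The array is a placeholder ramp of made-up values, not the global temperature data, and the colour map and axis labels are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values only: a smooth ramp standing in for relative
# temperatures, arranged as 4 rows (decades) by 10 columns (years)
data = np.linspace(-0.5, 0.8, 40).reshape(4, 10)

fig, ax = plt.subplots()
# With the 'hot' colour map, lighter colours mean higher values
im = ax.imshow(data, cmap="hot", aspect="auto")
ax.set_xlabel("Year within decade")
ax.set_ylabel("Decade (row index)")
fig.colorbar(im, ax=ax, label="Relative temperature")
```

Unlike Seaborn, imshow() works on a plain 2D array, so row and column labels have to be set by hand.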

Scatterplots are used to show the relationship between two variables. The example below uses a data file that records weather data for each month in London in 2018 (see note 4). It plots the level of rainfall, in millimetres, against the number of hours of sunshine.

Of course, when it is raining the sun is not normally shining, so you would expect to see fewer hours of sunshine when it rains more. The scatter diagram clearly shows this relationship.

Image by author

The code below uses the Pandas scatter plot, which draws with Matplotlib, to create the chart.

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt

weather = pd.read_csv('data/london2018.csv')
fig, ax = plt.subplots()
# Rainfall (mm) on the x-axis against hours of sunshine on the y-axis
weather.plot.scatter(x='Rain', y='Sun', ax=ax)
st.pyplot(fig)
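
If you want a number to go with the picture, a correlation coefficient is a common companion to a scatter plot. The sketch below uses placeholder readings, not the London file, with the same Rain and Sun column names; it only illustrates the call.

```python
import pandas as pd

# Placeholder readings, NOT the London data: rainfall in mm and
# hours of sunshine for six imaginary months
sample = pd.DataFrame({
    "Rain": [20, 35, 50, 60, 75, 90],
    "Sun": [210, 180, 160, 120, 100, 70],
})

# Series.corr() computes the Pearson correlation by default
r = sample["Rain"].corr(sample["Sun"])
print(f"Pearson r = {r:.2f}")
```

A value close to -1 would back up what the eye sees: more rain goes with less sunshine.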

Line graphs are used for continuous data and, often, for time series data, too.

Using the same weather data as above, we are going to look at three different line graphs.

First, we’ll plot the mean temperature over the year, then we’ll see how we can place multiple plots in the same figure by plotting mean, maximum and minimum temperatures. Finally, in the third chart, we will see how we can show a range of values by plotting the mean temperature with shading around it to indicate the range of maximum and minimum around that mean — you can use the same technique to show confidence levels.

First, we read the data. It contains various monthly readings for temperature, hours of sunshine and rainfall.

weather =  pd.read_csv('data/london2018.csv')
weather['Tmean'] = (weather['Tmax'] + weather['Tmin'])/2

It looks like this:

Image by author

It didn’t have the mean column originally — we created it on the second line of code, above.

Now, the simple line plot.

fig, ax = plt.subplots()
ax = weather.plot.line(x='Month', y = 'Tmean', ax=ax)

A very straightforward line plot of the mean temperature using Pandas and Matplotlib.

Image by author

And creating multiple plots in the same figure is just a matter of drawing the additional lines on the same axes.

ax = weather.plot.line(x='Month', y = 'Tmax', color = 'lightgrey', ax=ax)
ax = weather.plot.line(x='Month', y = 'Tmin', color = 'lightgrey', ax=ax)

The code above adds two more lines, for the maximum and minimum temperatures, to the same axes (I’ve coloured them differently to the mean plot) and then re-plots the figure.

Image by author

You can see that this gives us a range, but we can make it more visually appealing, and better convey the idea of a range, by shading the area between the max and min lines.

We do this using the Matplotlib function fill_between() as follows:

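ax.fill_between(weather['Month'], weather['Tmax'],  # assumes Month is the x column
    weather['Tmin'], color='lightgrey', alpha=0.5)
ax.set_ylabel('Temperature Range °C')
ax.legend().set_visible(False)  # hide the legend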

The fill colour is set to lightgrey so it blends in with the upper and lower plots. I’ve also hidden the legend and given the y-axis a label to show what we are attempting to represent.

Image by author

As you can see this would also be a very suitable representation of confidence levels. You could, for example, create the upper and lower plots using a fixed percentage of the original one. So, for example, the upper plot line could take the value of the centre value plus 5%, and the lower one the centre value minus 5%.
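
The fixed-percentage band described above could be sketched as follows. The helper percent_band is hypothetical and the centre values are placeholders rather than real readings; note too that a plus-or-minus 5% band is only a stand-in for a properly calculated confidence interval.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def percent_band(centre, pct):
    """Return (lower, upper) arrays lying pct percent either side of centre."""
    centre = np.asarray(centre, dtype=float)
    margin = np.abs(centre) * pct / 100
    return centre - margin, centre + margin


# Placeholder centre line, not real readings
x = np.arange(1, 7)
centre = np.array([10.0, 12.0, 15.0, 18.0, 16.0, 13.0])
lower, upper = percent_band(centre, 5)

fig, ax = plt.subplots()
ax.plot(x, centre)
# Shade the region between the lower and upper bounds
ax.fill_between(x, lower, upper, color="lightgrey", alpha=0.5)
```

Swapping percent_band for a function that returns real interval bounds leaves the plotting code unchanged.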

A Slopegraph is simply a line graph that conforms to a particular style and that compares only two sets of values.

According to CNK, “slopegraphs can be useful when you have two time periods or points of comparison and want to quickly show relative increases and decreases”.

Unfortunately, the slopegraph is not often found in standard visualization libraries. You could simply use a line graph instead and it should convey the same meaning. We are going to do that but also create a more typical slopegraph by combining line graphs and scatter graphs and adding appropriately positioned text.

Carrying on with our weather theme, I’m going to create a couple of graphs that display the change in temperature that we saw in the text figure, above, but this time we compare London to Wick, in Scotland:

Image by author

This data represents the maximum temperature in two cities in two separate years. In the following code, we draw two plots in the same figure. The first is a simple line graph of the data, then we superimpose a scatter chart with only four points to give us the archetypal blobs at the end of the slopegraph lines.

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt

st.header("Slope graph")
st.subheader("Here is the data")
# The column values were not included in this listing; df needs the
# columns 'year', 'London' and 'Wick' shown in the table above
df = pd.DataFrame()
st.table(df)

st.subheader("A Slopegraph as a line graph")
fig, ax = plt.subplots()
# Draw the two lines, then superimpose a blob on each end point
ax = df.plot(x='year', color=('red', 'blue'), ax=ax)
ax = df.plot.scatter(x='year', y='London', color='red', ax=ax)
df.plot.scatter(x='year', y='Wick', color='blue', ax=ax)

What I’ve done here is draw the lines and then superimpose blobs on the ends of the lines using a scatter plot. I’ve also set the ticks to display only the two years we are interested in and adjusted the x-axis limit to give some spacing on either side of the plots. Each of these adjustments makes the line graph a little more like a slope graph.

It looks reasonably OK but is not the typical form of a slopegraph, and certainly not the way they are represented in CNK’s book.

Image by author

To make a more conventional looking slopegraph in the style of CNK, we need to do a bit of manipulation with Matplotlib.

Here’s the type of rendering that CNK has in her book:

Image by author

It’s different to a conventional line graph, in that the y values, and the legend text, are written at the ends of the lines and the conventional axes are removed.

Running this code will display the graph above.

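# City name and value at the left end of each line, value only at the right
ax.text(df.year[0] -5, df.London[0], df.columns[1])
ax.text(df.year[0] -2.5,df.London[0], f'{df.London[0]}°C')
ax.text(df.year[1] +1, df.London[1],f'{df.London[1]}°C')
ax.text(df.year[0] -5, df.Wick[0], df.columns[2])
ax.text(df.year[0] -2.5, df.Wick[0],f'{df.Wick[0]}°C')
ax.text(df.year[1] +1, df.Wick[1],f'{df.Wick[1]}°C')
# Remove the spines (the frame of the chart)
for spine in ax.spines.values():
    spine.set_visible(False)
# Vertical grid lines at the two years, then hide the legend
ax.xaxis.grid(visible=True, color = 'black')
ax.legend().set_visible(False)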

The first six lines add the text and values to the ends of the lines. Next we remove the spines (the frame of the chart), and then we add an x-axis grid, which gives us the vertical lines. Finally, we hide the legend.

Is that too much effort for a not much different result? I’ll let you decide.
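
If you find yourself drawing slopegraphs often, the steps can be wrapped in a function. The sketch below is one possible arrangement using plain Matplotlib, with markers on the line instead of a separate scatter plot; it is not the code from the download. The function name slopegraph, the text offsets and the axis limits are my assumptions, and "Other city" is a made-up placeholder series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt


def slopegraph(ax, x_labels, series):
    """Draw one line per entry in series, labelled at both ends.

    x_labels: the two period labels, e.g. ("2012", "2022")
    series:   dict mapping a name to its (start, end) values
    """
    for name, (start, end) in series.items():
        # marker="o" gives the blobs at the ends of each line
        ax.plot([0, 1], [start, end], marker="o")
        ax.text(-0.05, start, f"{name}  {start}°C", ha="right", va="center")
        ax.text(1.05, end, f"{end}°C", ha="left", va="center")
    ax.set_xticks([0, 1])
    ax.set_xticklabels(x_labels)
    ax.set_xlim(-0.6, 1.4)       # room for the text on either side
    ax.yaxis.set_visible(False)  # values are written next to the points
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.xaxis.grid(True, color="black")  # vertical lines at the two periods
    return ax


fig, ax = plt.subplots()
# London values are from the text; "Other city" is a placeholder
slopegraph(ax, ("2012", "2022"),
           {"London": (24.2, 27.2), "Other city": (15.0, 16.0)})
```

Placing the two periods at x positions 0 and 1 keeps the text offsets independent of the actual years.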


