Create presentation-ready line and bar charts in matplotlib

By Jessie Hobb On Jan 31, 2023

Simple formatting tricks to make matplotlib charts presentation-ready

Charts and graphs are far and away the best way to communicate a message to an audience. There’s really no two ways about it: pictures are better than words. Depending on who you ask, one is worth roughly a thousand of the other.

That doesn’t mean that creating a good visualisation is easy! Creating a powerful — and good-looking — exhibit that gets a message across is difficult. It’s even more challenging if the exhibit is being debuted during a live presentation: not only is the audience listening intently to what’s being said (hopefully), but they’re trying to understand the message behind the chart while simultaneously forming questions.

I could probably show you tons of my own charts that prove just how difficult it is to make a good one, but I’ll save you the trauma. Instead, in this article we’ll see how to:

Create “base” line and bar charts.
Change titles and signage to improve both the visual impact of the chart and its ability to deliver a message.
Remove clutter to improve chart readability.
Change the appearance of the chart to really drive home a message.

We’ll be borrowing a few tricks from our last outing where we looked at how formatting pandas DataFrames can improve message delivery and storytelling. I’d (obviously) recommend having a read of that if you’re interested in presenting some slick tables along with great charts:

Comprehensive guide to formatting pandas DataFrames | Towards Data Science

Let’s get cracking — first checking out how we can make beautiful line plots before moving onto taking a swing at a bar chart. In both cases, we’re going to embrace our inner New Year’s resolutioner and use (imaginary) data relating to exercise and training.

Aside: the tips here relate to matplotlib, my go-to package for plotting in Python. That’s not to say the same tips and tricks can’t be done or don’t hold in other packages like seaborn, but you might need some adjustments in your approach.

We’ll start with line plots — simple visualisations which are great when trying to communicate trends or patterns over time.

The data

I’ve conjured up some data from an imaginary survey that asked men aged between 18 and 60 which form of exercise they preferred. Respondents chose one of four options: running, cycling, swimming, or a mixed regime. The data captures the proportion of respondents choosing each option at each survey date.

Aside: by “conjured up some data”, I really mean “created sample data in Excel”.

Let’s get set up and take a look at the data:

# functionality
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as mtick
# data import
file_location = r'C:\Users\...\Charts'
file_name = r'data.csv'
df = pd.read_csv(os.path.join(file_location,file_name))
# convert date column to datetime
df['Date'] = pd.to_datetime(df['Date'],format = '%d/%m/%Y')
# set date column to index
df.set_index('Date',inplace = True)
df

Quite a simple data set. Let’s start somewhere and create our “base” chart.

The base line plot

Nothing too swanky here, just a bit of matplotlib:

# plot size and configuration
fig,ax = plt.subplots(figsize = (20,7.5))
# lines: one per column (activity) in the DataFrame
for activity in df.columns:
    plt.plot(df.index,df[activity],marker = '^',label = f'{activity}')
# format y-axis as whole-number percentages (data is in the range 0-1)
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1,decimals = 0))
# add title and axis labels
plt.title('Exercise types (survey respondents, men aged 18 - 60)')
plt.ylabel('Proportion')
plt.xlabel('Survey date')
# misc - grid and legend
plt.grid(axis = 'both',alpha = 0.45)
plt.legend(loc = 'best',ncol = 1)
# results
plt.show()

… which gives:

Now, my inner nerd might think that this visualisation is perfectly fine to present. He would probably say something like:

The title, axis labels and legend entries combine to tell the reader that this chart relates to the proportion of survey respondents (men, aged 18–60) who engage in a certain type of exercise.
The data is being plotted across time, so there is probably some trend or pattern that is present.
In fact, when we look through the chart, we see that running and cycling are becoming less favourable over time while a mixed regime is becoming more popular in this cohort.

Oddly enough, this is exactly the message that we’re trying to get across! We need to find a way to get this across without the audience having to do so much heavy lifting.

Usually the best way to do this is to just tell the audience what you’re trying to say. Let’s do that by making the title more useful, relying on a subtitle only if it adds value to the visualisation.

Using descriptive titles

We’ll add descriptive titles to the plot using the text command¹.

We’ll capture the message as crisply as we can:

The headline (i.e. title) is that mixed regime training is becoming more popular over time.
The by-line (i.e. subtitle) is that the gain in mixed regime popularity comes from decreasing popularity of running and cycling.

Code-wise, this looks something like (formatted for ease of reading):

# plot
fig,ax = plt.subplots(figsize = (20,7.5))
# informative title + subtitle
title = 'Mixed training is gaining in popularity over time'
subtitle = ('Men aged 18-60 are ditching running and '
            'cycling in favour of a mixed training regime')
# add title + subtitle to plot (positions are in figure coordinates, 0-1)
plt.text(
    x = 0.125,y = 0.90,s = title,fontname = 'Arial',
    fontsize = 20,ha='left',transform = fig.transFigure
)
plt.text(
    x = 0.125,y = 0.86,s = subtitle,fontname = 'Arial',
    fontsize = 16,ha = 'left',transform = fig.transFigure
)
# lines
for activity in df.columns:
    plt.plot(df.index,df[activity],marker = '^',label = f'{activity}')
# format y-axis
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1,decimals = 0))
# axis labels
plt.ylabel('Proportion')
plt.xlabel('Survey date')
# misc - grid and legend
plt.grid(axis = 'both',alpha = 0.45)
plt.legend(loc = 'best',ncol = 1)
# fiddle with space above chart
plt.subplots_adjust(top=0.8, wspace=0.3)
plt.show()

… which gives:

I’d argue that that is already a great change for the better, as the message is right up in the face of the reader. The chart itself still looks quite busy, so we’ll address this next.

Aside: if you’re extremely detail-oriented, you’ll probably feel like the gap between the titles and the chart itself might be a tad big. As it stands, you’re probably right, but keep reading — I’ve got something planned for that later on.

Removing the clutter

Cluttered charts are difficult to read. In fact, anything that takes attention away from the chart’s message makes the chart more difficult to read.

“Clutter” in this context could really mean anything — axis labels, markers, badly placed titles, or even grid lines. We’ll take a look at all of these elements, but we’ll start with the biggest offender — the legend.

A chart’s legend can be quite controversial. On one hand, traditional thinking is that a good chart contains a good legend: one that can easily allow the reader to distinguish between different quantities being charted.

On the other hand, if the reader is looking from the chart to the legend (and back again, probably many times), the legend is quite obviously a distraction.

We’re going to settle on a middle ground by changing the legend from being an almost stand-alone chart element, to being integrated into the visualisation itself. We’ll do this by annotating³ each individual line.

So we can get something like this:

Not too shabby — we’ve gotten rid of the legend without actually losing any of its usefulness.

Annotating the lines is quite straightforward, once you’ve got the hang of it:

# annotate
plt.annotate(
    text = 'Running',
    # anchor the label at the last data point; .iloc[-1] is positional
    xy = (pd.to_datetime('01-01-2022'),df['Run'].iloc[-1]),
    textcoords = 'offset points',
    xytext = (5,-4),fontname = 'Arial',fontsize = 13,color = 'tab:blue'
)
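To label every line rather than just one, the same call can go in a loop. The sketch below is one way to do it, not necessarily how the chart above was built. The DataFrame is a small stand-in with made-up values and column names (only 'Run' appears in the article’s code), and each label reuses its line’s colour via get_color().

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: values and column names are placeholders for illustration,
# not the survey data used in the article.
dates = pd.to_datetime(['01/01/2020', '01/01/2021', '01/01/2022'], format='%d/%m/%Y')
df = pd.DataFrame(
    {
        'Run': [0.35, 0.30, 0.25],
        'Cycle': [0.30, 0.27, 0.22],
        'Mixed': [0.20, 0.28, 0.38],
    },
    index=dates,
)

fig, ax = plt.subplots(figsize=(20, 7.5))

for activity in df.columns:
    # keep a handle on the line so the label can reuse its colour
    (line,) = ax.plot(df.index, df[activity], marker='^')
    ax.annotate(
        text=activity,
        xy=(df.index[-1], df[activity].iloc[-1]),  # last plotted point
        textcoords='offset points',
        xytext=(5, -4),  # nudge right and slightly down, in points
        fontsize=13,
        color=line.get_color(),
    )
```

Taking the anchor point from df.index[-1] keeps the label attached to the end of the line if more survey dates are added later. Call plt.show() to display the result.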

I still think that the grid lines and the plot borders (or “spines”) are a distraction. Getting rid of those is simple:

# grid lines
# keep only toned down horizontal lines (drawn from the y-axis ticks)
plt.grid(axis = 'y',alpha = 0.3)
# turn off spines
plt.gca().spines[['left','right', 'top']].set_visible(False)

We’re really starting to get there now! Looking at it, I think we need more separation between our titles and the chart itself. So we’ll add in a separating line.

We also need to add in some description for the y-axis. We’ve been perhaps too ruthless in cutting out “clutter”, so we’ll add something back in there. It’s also good practice to cite the source of information, so we’ll add in a footnote to do just that.

Code-wise, this was pretty simple:

# line between titles and chart
plt.gca().plot(
    [0.1, .9], # x co-ords
    [.87, .87], # y co-ords
    transform = fig.transFigure,
    clip_on = False,
    color = 'k',
    linewidth = 1.5
)
# axis description
description = 'Proportion of survey respondents (%)'
plt.text(
    x = 0.1,
    y = 0.8,
    s = description,
    fontname = 'Arial',
    fontsize = 14,
    ha='left',
    transform = fig.transFigure
)
# foot note
footnote = "Source: Brad's imagination, January 2023"
plt.text(
    x = 0.1,
    y = 0.05,
    s = footnote,
    fontname = 'Arial',
    fontstyle = 'italic',
    fontsize = 12,
    ha = 'left',
    transform = fig.transFigure
)

Now there’s one final touch we can make to really drive home the message — change the colours.

Even though the chart is really pretty, the coloured lines arguably detract from the overall message. So to hammer home the result that mixed training is gathering steam over time, we’re going to make the mixed regime line stand out by:

Making it really thick, bold, and red.
Making the other lines grey.
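As a rough sketch of that colour change, the plotting loop can pick the style per column. The data below is a stand-in with made-up values, the column name 'Mixed' is an assumption, and the exact widths and shades are my own guesses rather than the settings behind the finished chart.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data: values and column names are placeholders for illustration.
dates = pd.to_datetime(['01/01/2020', '01/01/2021', '01/01/2022'], format='%d/%m/%Y')
df = pd.DataFrame(
    {
        'Run': [0.35, 0.30, 0.25],
        'Cycle': [0.30, 0.27, 0.22],
        'Mixed': [0.20, 0.28, 0.38],
    },
    index=dates,
)

highlight = 'Mixed'  # assumed name of the column carrying the message

fig, ax = plt.subplots(figsize=(20, 7.5))

for activity in df.columns:
    is_focus = activity == highlight
    ax.plot(
        df.index,
        df[activity],
        color='tab:red' if is_focus else 'grey',
        linewidth=4 if is_focus else 1.5,
        zorder=3 if is_focus else 2,  # draw the highlighted line on top
    )
```

Keeping the highlighted column name in one variable means the same loop can emphasise a different series by changing a single line.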

Hey presto! A great-looking chart, full of information, and easy to digest (if I do say so myself).

Bar and column charts are often used to compare different quantitative or qualitative quantities. They’re really useful when you’re doing a small number of comparisons, but in my opinion don’t really work well if you have many comparisons to make, or you’re trying to visualise trends over time.

Now, if you Google “bar charts v column charts” you’ll find tonnes of articles explaining and outlining the exact differences between the two visualisations.

I won’t do that because I honestly don’t know the difference. To a “practitioner” like myself, the semantics are a distraction. What is important is knowing that in some circumstances it’s more useful to have a chart with horizontal bars than it is to have a chart with vertical bars. We’ll see a good example of this in a minute.

Data

Let’s create some data to visualise. Again, it’s going to be fitness-related, this time capturing the proportion of gym-goers who visit the weight room by time of day. To make things simple, our “time of day” variable is going to be roughly grouped into 5 categories.

# data
df = pd.DataFrame(
    {
        'Time':['Early morning','Morning','Midday','Afternoon','Evening'],
        'Athletes':[0.17,0.075,0.23,0.125,0.4]
    }
)

The base bar chart

Again, we’ll create a “base” chart that we can improve. A simple bit of code gives us a fairly standard plot — nothing special, but it does the job.

# plot
fig,ax = plt.subplots(figsize = (20,7.5))
# bars
plt.bar(df['Time'],df['Athletes'])
# format y-axis
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1,decimals = 0))
# grid line
plt.grid(axis = 'y',alpha = 0.45)
# labels and title
plt.title('Proportion of athletes training, by time of day')
plt.ylabel('Proportion')
plt.xlabel('Time of day')
plt.show()

Let’s go through the chart and change a few things.

It’s pretty clear at the moment that the evening session is the most popular time to head to the gym. However, it’s not immediately obvious when the next most popular (or least popular) times are. So we’ll change the ordering in which the bars are drawn.

While some people prefer to sort values in an increasing order (i.e. ascending = True), I think that we should change the chart orientation and plot the most popular times from top to bottom.

Appearances matter, so we’ll remove chart clutter and do some styling. We’ll also add a bit of design flair to spice up the aesthetic.

Finally, we’ll reiterate the message we’re trying to convey by changing some colours.

Let’s get cracking.

Changing the orientation

… and a little ordering.

Nothing too special to highlight here. The DataFrame has been reordered using sort_values, and we use barh rather than bar to get the horizontal orientation.

That gives us:

That’s already better — see how much easier it is to see the busiest times, and do comparisons between each time slot.

Reading exact values off the x-axis is a bit challenging, so we’ll bear that in mind when we improve the clutter and styling.
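The reordering and orientation change might look something like the sketch below. It uses the bar chart data from earlier. The detail to watch is that barh draws the first row at the bottom of the chart, so sorting in ascending order is what puts the busiest time at the top. That sort direction is my assumption about how to get the top-to-bottom ordering, not code taken from the original chart.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

# bar chart data from the article
df = pd.DataFrame(
    {
        'Time': ['Early morning', 'Morning', 'Midday', 'Afternoon', 'Evening'],
        'Athletes': [0.17, 0.075, 0.23, 0.125, 0.4],
    }
)

# barh places the first row at the bottom, so an ascending sort
# leaves the largest value on the top bar
df_sorted = df.sort_values('Athletes', ascending=True)

fig, ax = plt.subplots(figsize=(20, 7.5))
bars = ax.barh(df_sorted['Time'], df_sorted['Athletes'])

# the proportions now run along the x-axis, so format that axis instead
ax.xaxis.set_major_formatter(mtick.PercentFormatter(1, decimals=0))
ax.grid(axis='x', alpha=0.45)
```

An alternative with the same visual result is to sort in descending order and then call ax.invert_yaxis().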

Clutter and styling

Time for a bit of cosmetic surgery: axis labels, grid lines and spines are going to get binned, and we’re going to put our titles to work. We’ll also add bar labels so that we don’t lose any information when we ditch the x-axis.

Of course, we always cite and attribute our data so we’ll add in a footnote.

On top of all of that, we’ll make some aesthetic changes so the chart catches the eye. Since there’s no good reason for our bars to be so long, part of the styling will be some general resizing but the most powerful change will come from changes to emphasise the message.

We’ve seen how to do most of the changes before so the next bits will be light on code, showing only new concepts.

Ready for the new and improved chart? I am!

That’s a nice bit of progress. Note how using bar labels allows us to get rid of the x-axis entirely. That’s a small code change:

# plot
fig,ax = plt.subplots(figsize = (20,7.5))
# add bars
bars = plt.barh(df['Time'],df['Athletes'],color = 'k')
# add labels, formatted as whole-number percentages
plt.bar_label(
    bars,
    labels = [f'{x:.0%}' for x in bars.datavalues],
    padding = 10,
    fontsize = 14
)
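The snippet above adds the labels but doesn’t show the x-axis being removed. One possible way to do that, sketched here under the assumption that the bars are sorted ascending so the busiest slot sits on top, is to hide the axis object itself and switch off the spines that framed it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# bar chart data from the article, sorted so the largest bar is drawn on top
df = pd.DataFrame(
    {
        'Time': ['Early morning', 'Morning', 'Midday', 'Afternoon', 'Evening'],
        'Athletes': [0.17, 0.075, 0.23, 0.125, 0.4],
    }
).sort_values('Athletes', ascending=True)

fig, ax = plt.subplots(figsize=(20, 7.5))
bars = ax.barh(df['Time'], df['Athletes'], color='k')

# the labels carry the values, so the axis is no longer needed
labels = ax.bar_label(
    bars,
    labels=[f'{x:.0%}' for x in bars.datavalues],
    padding=10,
    fontsize=14,
)

# hide the x-axis (ticks and tick labels) and the surrounding spines
ax.xaxis.set_visible(False)
ax.spines[['top', 'right', 'bottom']].set_visible(False)
```

Both bar_label and list-style spine indexing need a reasonably recent matplotlib (3.4 or later).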

The red graphic is a combination of a rectangle and a line. It’s a bit fiddly to put together, but I think adds a bit of style to the chart — quite Economist-esque.

# add a little graphic flair
# rectangle first
plt.gca().add_patch(
    plt.Rectangle(
        (-0.05,.95), # location
        0.0125, # width
        -0.13, # height
        facecolor = 'tab:red',
        transform = fig.transFigure,
        clip_on = False,
        linewidth = 0
    )
)
# now the line
plt.gca().plot(
    [-0.049, .95], # length of line
    [.82, .82], # height
    transform = fig.transFigure,
    clip_on = False,
    color = 'tab:red',
    linewidth = 3
)

Last, but very definitely not least, we’ll change the colour of the bars to hammer home the message that most people hit the iron in the evening.

This is perhaps the biggest chart change and it’s created by the smallest code change: rather than using a single string, we feed a list of colours into the color argument in barh.
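A minimal sketch of that idea is below. The list is built with one colour per bar, in the same order as the rows being plotted. The specific colours ('tab:red' for the evening bar, grey for the rest) are my own choice for illustration and may not match the finished chart.

```python
import pandas as pd
import matplotlib.pyplot as plt

# bar chart data from the article, sorted so the largest bar is drawn on top
df = pd.DataFrame(
    {
        'Time': ['Early morning', 'Morning', 'Midday', 'Afternoon', 'Evening'],
        'Athletes': [0.17, 0.075, 0.23, 0.125, 0.4],
    }
).sort_values('Athletes', ascending=True)

# one colour per bar, in plotting order: highlight the evening slot only
colours = ['tab:red' if time == 'Evening' else 'grey' for time in df['Time']]

fig, ax = plt.subplots(figsize=(20, 7.5))
bars = ax.barh(df['Time'], df['Athletes'], color=colours)
```

Building the list from the sorted DataFrame, rather than typing it out by hand, keeps the highlight on the right bar if the ordering changes.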

Excellent!

In what’s turning into a bad habit of mine, I’ll simultaneously summarise and ramble on.

Using some example line and bar charts, we’ve seen how descriptive titles and visual tricks can improve the chart’s message delivery. We also saw how removing chart clutter can get the reader to focus in on the message we’re trying to convey.

Now, I love a colourful chart and I’m not ashamed to admit it (clearly). But I do have to admit that toning down the colour and using it tactically can really enhance your chart’s ability to get a message across to an audience.

Like a slide in a presentation, you need to really think about what a chart is trying to say, and then give it the best opportunity to do so. That might mean changing chart types, changing colour palettes, adding in descriptive titles or even removing some chart elements. As with great interior design, don’t be afraid to make bold decisions. If worse comes to worst, you can always rewrite the chart code!

If you’re short of design ideas, I recommend taking a look at publications like the Economist and Financial Times — they are usually excellent at getting a message across in a great-looking chart. I’ve picked up a lot of code tips and tricks reading through the matplotlib documentation and browsing through various StackOverflow threads.

My final, and possibly most important, tip is to practise and then review your charts. Even better, get someone else to review your charts and see if they “get” the message you’re trying to communicate. Sounds lame, I know, but it helps.

If you’ve made it this far, thank you. I hope you enjoyed reading this as much as I did writing it (charting has been oddly cathartic). I’m still learning, and practising, how to make better charts, so any tips or tricks would be greatly appreciated!

