Plotting Chord Diagrams in Python | by Wei-Meng Lee | Feb, 2023
How to use HoloViews to plot chord diagrams to show relationships between various data attributes
When we talk about data visualisation, a few familiar chart types come to mind: bar charts, pie charts, histograms, and so on. There is another type that is very interesting but seldom discussed: the chord diagram. So what is a chord diagram?
A chord diagram represents the flows or connections between several entities (known as nodes). Using a chord diagram, you can easily visualize the connections or relationships between various data points in your dataset. Consider the flight delays dataset, which contains detailed information about flights from one airport to another. If you want to visualize the relationships between the various airports, a chord diagram (see the figure at the start of this article) is an excellent way to present this information.
In this article, I will show you how you can plot a chord diagram using a third-party library known as HoloViews.
All images by author unless otherwise stated.
HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. In this article, HoloViews renders its plots through the Bokeh library, and both packages are available from the pyviz conda channel. So the best way to install HoloViews is to use the following command:
conda install -c pyviz holoviews bokeh
As usual, my favourite dataset to use for illustrating concepts is the 2015 Flights Delay dataset. You can download the dataset from: https://www.kaggle.com/datasets/usdot/flight-delays.
Licensing: CC0: Public Domain
In this article, all code samples will be run using Jupyter Notebook.
First, load the flights.csv file using Pandas:
import pandas as pd
df = pd.read_csv('flights.csv')
df
Here’s the dataframe:
For this article, we are going to need two specific columns:
- ORIGIN_AIRPORT
- DESTINATION_AIRPORT
We are interested in seeing the relationships between the origin and destination airports.
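Since only a couple of columns matter here, one option is to read just those columns and force the airport codes to be strings. The following is a minimal sketch, assuming a tiny made-up CSV held in memory in place of the real flights.csv (the rows and the DISTANCE values are purely illustrative):

```python
import io
import pandas as pd

# Tiny made-up stand-in for flights.csv (hypothetical rows, for illustration only).
csv_text = """YEAR,ORIGIN_AIRPORT,DESTINATION_AIRPORT,DISTANCE
2015,SFO,LAX,337
2015,LAX,SFO,337
2015,10135,10397,692
"""

# usecols limits what is loaded; dtype=str keeps numeric-looking
# airport IDs such as 10135 as text instead of integers.
flights = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["YEAR", "ORIGIN_AIRPORT", "DESTINATION_AIRPORT"],
    dtype={"ORIGIN_AIRPORT": str, "DESTINATION_AIRPORT": str},
)
print(flights)
```

With the real file, you would pass 'flights.csv' instead of the in-memory buffer. Reading the airport columns as strings keeps later string operations on them straightforward.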
The next thing to do would be to find the unique combinations of origin and destination airports. You could do this using the groupby() function, followed by the count() function:
df_between_airports = df.groupby(by=["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"]).count()
df_between_airports
The output is a multi-index dataframe:
You need only one of the non-index columns, because count() reports the number of non-null values in each column. So let’s extract the YEAR column and rename it as COUNT, and then reset the index:
df_between_airports = df_between_airports['YEAR'].rename('COUNT').reset_index()
df_between_airports
So the output now contains the count of flights from one airport to another:
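To see what this counting step does, here is a small sketch on a made-up table of five flights (the rows are hypothetical). It also shows size() as one possible shortcut that counts rows per group directly, so there is no need to pick a column afterwards:

```python
import pandas as pd

# Hypothetical flights, for illustration only.
flights = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "ORIGIN_AIRPORT": ["SFO", "SFO", "SFO", "LAX", "LAX"],
    "DESTINATION_AIRPORT": ["LAX", "LAX", "JFK", "SFO", "SFO"],
})

# Approach used in this article: count(), keep one column, rename it.
via_count = (
    flights.groupby(by=["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
    .count()["YEAR"]
    .rename("COUNT")
    .reset_index()
)

# Alternative: size() counts the rows in each group.
via_size = (
    flights.groupby(["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
    .size()
    .reset_index(name="COUNT")
)

print(via_size)
```

Both approaches give the same table here. They would differ only if the chosen column (YEAR in this article) contained missing values, because count() skips nulls while size() does not.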
Notice that some of the airport codes are 5-digit numbers (e.g. 10135, 10397, etc). These are actually FAA’s Airport IDs, and are used as replacement values for IATA codes (such as XNA, SFO, SLC, etc). Ideally, we should replace all these 5-digit Airport IDs with the actual IATA codes, but to keep things simple in this article, we shall remove them:
df_between_airports = df_between_airports.query(
    'ORIGIN_AIRPORT.str.len() <= 3 & DESTINATION_AIRPORT.str.len() <= 3')
df_between_airports
Now you have a much clearer idea of what we are trying to achieve. For example, there are 898 flights from ABE to ATL, 711 flights from ABE to DTW, and so on:
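The same filter can also be written with a plain boolean mask, which may be easier to debug than a query string. This is a sketch on made-up rows (the counts are illustrative, and is_short_code is a hypothetical helper name), assuming the airport columns may hold a mix of strings and integers:

```python
import pandas as pd

# Hypothetical airport pairs mixing IATA codes and 5-digit IDs.
pairs = pd.DataFrame({
    "ORIGIN_AIRPORT": ["ABE", 10135, "SFO", "LAX"],
    "DESTINATION_AIRPORT": ["ATL", 10397, 12478, "SFO"],
    "COUNT": [898, 12, 7, 1000],
})

def is_short_code(column):
    # Convert every value to text first so integers can be measured too.
    return column.astype(str).str.len() <= 3

mask = is_short_code(pairs["ORIGIN_AIRPORT"]) & is_short_code(pairs["DESTINATION_AIRPORT"])
iata_only = pairs[mask]
print(iata_only)
```

Only the rows where both the origin and the destination are three characters or shorter survive the filter.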
Observe that there are 4693 combinations in total; a chord diagram with so many combinations would be very messy. So, let’s sort them in descending order of flight count:
df_between_airports = df_between_airports.sort_values(by="COUNT", ascending=False)
df_between_airports
Next, let’s take the top 40 combinations and list their unique origin airports:
top = 40
df_between_airports.head(top)['ORIGIN_AIRPORT'].unique()
With the top 40 combinations, there are a total of 18 originating airports:
array(['SFO', 'LAX', 'JFK', 'LAS', 'LGA', 'ORD', 'OGG', 'HNL', 'ATL',
'MCO', 'DFW', 'SEA', 'BOS', 'DCA', 'FLL', 'PHX', 'DEN', 'TPA'],
dtype=object)
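As a quick illustration of the sort-and-take-the-top step, here is a sketch on four made-up airport pairs (the counts are hypothetical). It also shows nlargest() as a possible one-step alternative to sorting followed by head():

```python
import pandas as pd

# Hypothetical flight counts, for illustration only.
pairs = pd.DataFrame({
    "ORIGIN_AIRPORT": ["ABE", "SFO", "JFK", "LAX"],
    "DESTINATION_AIRPORT": ["ATL", "LAX", "LAX", "SFO"],
    "COUNT": [90, 1300, 1100, 1250],
})

top = 2

# Approach used in this article: sort in descending order, then take the head.
by_sort = pairs.sort_values(by="COUNT", ascending=False).head(top)

# Alternative: ask for the rows with the largest COUNT directly.
by_nlargest = pairs.nlargest(top, "COUNT")

print(by_nlargest)
print(by_nlargest["ORIGIN_AIRPORT"].unique())
```

When there are no ties in COUNT, as in this sample, both approaches return the same rows in the same order.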
We are now ready to display the chord diagram. First, import holoviews and specify bokeh as the extension:
import holoviews as hv
hv.extension('bokeh')
HoloViews uses the %%opts cell magic to modify how the cell is executed to display its output. You use the Chord class to display a chord diagram:
%%opts Chord [height=500 width=500 title="Flights between airports" ]
chord = hv.Chord(df_between_airports.head(top))
chord
The following output shows the relationships between the top 40 flight combinations:
Each circle (known as a node) on the chord diagram represents an airport. To see the relationships between airports, hover your mouse over a circle:
The above diagram shows the flights originating from DFW (Dallas/Fort Worth International Airport). However, it is not possible to see what the destination airports are. So, let’s get the list of origin and destination airports and then use it to create a hv.Dataset object:
# get the top count of flights between airports
df_between_airports = df_between_airports.head(top)
# find all the unique origin and destination airports
airports = list(set(df_between_airports["ORIGIN_AIRPORT"].unique().tolist() +
df_between_airports["DESTINATION_AIRPORT"].unique().tolist()))
airports_dataset = hv.Dataset(pd.DataFrame(airports, columns=["Airport"]))
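One detail worth knowing about the list(set(...)) step is that a Python set of strings has no guaranteed order, so the airport list can come out in a different order from one run to the next. The sketch below, on made-up pairs, shows one way to get a stable order by sorting the union of the two columns (the links and nodes names are placeholders, not from the article):

```python
import pandas as pd

# Hypothetical top airport pairs, for illustration only.
links = pd.DataFrame({
    "ORIGIN_AIRPORT": ["SFO", "LAX", "JFK"],
    "DESTINATION_AIRPORT": ["LAX", "SFO", "LAX"],
    "COUNT": [1300, 1250, 1100],
})

# Union of origins and destinations, sorted for a reproducible order.
airports = sorted(set(links["ORIGIN_AIRPORT"]) | set(links["DESTINATION_AIRPORT"]))

# One row per airport, ready to be wrapped in hv.Dataset.
nodes = pd.DataFrame(airports, columns=["Airport"])
print(nodes)
```

A stable order is not required for the chord diagram to work, but it can make the output easier to compare between runs.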
To display the names of the airports at each node, set the labels attribute in the %%opts cell magic and pass the airports_dataset variable into the Chord class initializer:
%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
chord = hv.Chord((df_between_airports, airports_dataset))
chord
Take note that the df_between_airports and airports_dataset variables are wrapped as a tuple.
The chord diagram now has the airport code displayed on each node:
When you now hover over DFW, you can see clearly that flights from DFW fly to ORD (O’Hare International Airport) and ATL (Hartsfield-Jackson Atlanta International Airport):
If you click on the DFW node, the rest of the flight paths are greyed out:
The chord diagram supports Bokeh palettes. You can see the list of palette colors at https://docs.bokeh.org/en/latest/docs/reference/palettes.html.
Bokeh provides a collection of palettes for color mapping.
Here are two commonly used Bokeh palettes:
Let’s now apply colors to our chord diagram by adding another %%opts cell magic statement:
%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="ORIGIN_AIRPORT" edge_cmap='Category20')
chord = hv.Chord((df_between_airports, airports_dataset))
chord
The node_color and edge_color options name the columns used to color the nodes and the edges, while node_cmap and edge_cmap indicate the palettes to apply to them:
Here is another example with different palettes applied:
%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="ORIGIN_AIRPORT" edge_cmap='Bokeh')
chord = hv.Chord((df_between_airports, airports_dataset))
chord
For the final section in this article, I just want to ensure that the chord diagram is displaying the correct information. Selecting JFK shows that flights are flying to LAX and SFO:
To confirm this, use the following statement:
df_between_airports.query('ORIGIN_AIRPORT == "JFK"')
The output confirms the answer in the chord diagram:
Selecting ORD in the chord diagram shows that flights are flying to:
The following statement confirms what we have seen:
df_between_airports.query('ORIGIN_AIRPORT == "ORD"')
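If you want to repeat this check for several airports, the lookup can be wrapped in a small helper. This is only a sketch: destinations_from is a hypothetical function name, and the sample rows and counts are made up to mirror the JFK case above:

```python
import pandas as pd

# Hypothetical top airport pairs, for illustration only.
links = pd.DataFrame({
    "ORIGIN_AIRPORT": ["JFK", "JFK", "SFO", "LAX"],
    "DESTINATION_AIRPORT": ["LAX", "SFO", "LAX", "SFO"],
    "COUNT": [1100, 800, 1300, 1250],
})

def destinations_from(table, origin):
    """Return the sorted destination codes for flights leaving `origin`."""
    rows = table[table["ORIGIN_AIRPORT"] == origin]
    return sorted(rows["DESTINATION_AIRPORT"])

print(destinations_from(links, "JFK"))
```

You can then compare the returned list against the chords highlighted when you select that airport in the diagram.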
Now that you have seen how to create chord diagrams using HoloViews, when should you use them? Consider a chord diagram in the following scenarios:
- When you want to use a simple representation to show the interconnections within large datasets.
- When you want to create an eye-catching visual representation that is aesthetically pleasing.
- When you need to find and compare interrelationships between groups of data.
Have fun with chord diagrams, and remember to let me know how you are using them in your real-life projects!