Plotting Chord Diagrams in Python | by Wei-Meng Lee | Feb, 2023


Image by author

When we talk about data visualisation, a few usual chart types come to mind: bar charts, pie charts, histograms, and so on. However, there is another type that is very interesting but seldom discussed: the chord diagram. So what is a chord diagram?

A chord diagram represents the flows or connections between several entities (known as nodes). Using a chord diagram, you can easily visualize the connections or relationships between various data points in your dataset. Consider a flight delays dataset that contains detailed information about flights from one airport to another. If you want to visualize the relationships between the various airports, a chord diagram (see the figure at the start of this article) is an excellent way to present this information.

In this article, I will show you how you can plot a chord diagram using a third-party library known as HoloViews.

All images by author unless otherwise stated.

HoloViews is an open-source Python library designed to make data analysis and visualization seamless and simple. It relies on a plotting backend to render its output; this article uses Bokeh. If you use conda, you can install both from the pyviz channel with the following command:

conda install -c pyviz holoviews bokeh

As usual, my favourite dataset to use for illustrating concepts is the 2015 Flights Delay dataset. You can download the dataset from: https://www.kaggle.com/datasets/usdot/flight-delays.

Licensing: CC0: Public Domain

In this article, all code samples will be run using Jupyter Notebook.

First, load the flights.csv file using Pandas:

import pandas as pd

df = pd.read_csv('flights.csv')
df

Here’s the dataframe:

For this article, we are going to need two specific columns:

  • ORIGIN_AIRPORT
  • DESTINATION_AIRPORT

We are interested in seeing the relationships between the origin and destination airports.

The next step is to count the flights for each unique combination of origin and destination airports. You can do this with the groupby() method, followed by the count() method:

df_between_airports = df.groupby(by=["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"]).count()
df_between_airports

The output is a multi-index dataframe:

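You only need one of the non-index columns: count() reports the number of non-null values in each column, so any column with no missing values, such as YEAR, gives the number of flights per route. So let’s extract the YEAR column, rename it as COUNT, and then reset the index: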

df_between_airports = df_between_airports['YEAR'].rename('COUNT').reset_index() 
df_between_airports
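
To make the counting step concrete, here is a minimal sketch on a tiny made-up table rather than the real flights.csv. The names flights, via_count and via_size are hypothetical. It repeats the steps above and also shows size(), an alternative that counts rows per group directly, so no column needs to be picked:

```python
import pandas as pd

# Toy stand-in for flights.csv: one row per flight (values are made up).
flights = pd.DataFrame({
    "YEAR": [2015] * 6,
    "ORIGIN_AIRPORT": ["SFO", "SFO", "SFO", "LAX", "LAX", "JFK"],
    "DESTINATION_AIRPORT": ["LAX", "LAX", "JFK", "SFO", "SFO", "LAX"],
})

# Same steps as in the article: group, count, keep one column, flatten the index.
via_count = (
    flights.groupby(["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
    .count()["YEAR"]
    .rename("COUNT")
    .reset_index()
)

# Alternative: size() counts the rows in each group, whatever the columns hold.
via_size = (
    flights.groupby(["ORIGIN_AIRPORT", "DESTINATION_AIRPORT"])
    .size()
    .reset_index(name="COUNT")
)

print(via_size)
```

On this toy data both approaches give the same table. They would differ only if the column chosen for count() contained missing values.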

So the output now contains the count of flights from one airport to another:

Notice that some of the airport codes are five-digit numbers (e.g. 10135, 10397). These are numeric airport IDs assigned by the US Department of Transportation, and they appear in place of the IATA codes (such as XNA, SFO, and SLC) in some rows. Ideally, we would replace all these five-digit airport IDs with the actual IATA codes, but to keep things simple in this article, we shall remove them:

df_between_airports = df_between_airports.query(
    'ORIGIN_AIRPORT.str.len() <= 3 & DESTINATION_AIRPORT.str.len() <= 3')
df_between_airports
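
If you prefer an explicit boolean mask over a query string, the same idea can be sketched as below. This is a stricter variant and not the article's exact filter: it keeps only codes made of exactly three letters. The table and the helper is_iata are hypothetical and the values are made up:

```python
import pandas as pd

# Toy route counts; the integers mimic numeric airport IDs (values are made up).
routes = pd.DataFrame({
    "ORIGIN_AIRPORT": ["ABE", 10135, "ABE", "SFO"],
    "DESTINATION_AIRPORT": ["ATL", "ATL", 10397, "LAX"],
    "COUNT": [898, 50, 40, 1200],
})

def is_iata(codes: pd.Series) -> pd.Series:
    """Return True where the value, read as text, is exactly three letters."""
    return codes.astype(str).str.fullmatch(r"[A-Za-z]{3}")

# Keep a route only when both of its endpoints look like IATA codes.
mask = is_iata(routes["ORIGIN_AIRPORT"]) & is_iata(routes["DESTINATION_AIRPORT"])
iata_only = routes[mask].reset_index(drop=True)

print(iata_only)
```

Converting to text first means the check behaves the same whether a numeric ID was read from the CSV as an integer or as a string.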

And now you have a much clearer idea of what we are trying to achieve. For example, there are a total of 898 flights from ABE to ATL, and there are a total of 711 flights from ABE to DTW, and so on:

Observe that there are altogether 4693 combinations, and a chord diagram with that many would be very messy. So let’s sort them by flight count in descending order:

df_between_airports = df_between_airports.sort_values(by="COUNT",
                                                      ascending=False)
df_between_airports

Next, let’s take the top 40 combinations and list the origin airports they involve:

top = 40
df_between_airports.head(top)['ORIGIN_AIRPORT'].unique()

With the top 40 combinations, there are a total of 18 originating airports:

array(['SFO', 'LAX', 'JFK', 'LAS', 'LGA', 'ORD', 'OGG', 'HNL', 'ATL',
'MCO', 'DFW', 'SEA', 'BOS', 'DCA', 'FLL', 'PHX', 'DEN', 'TPA'],
dtype=object)
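
As an aside, the sort-then-head steps above can also be written with the pandas nlargest() method. The sketch below uses a tiny made-up table (the name routes and all values are hypothetical) to show that both routes to the top N agree when there are no tied counts:

```python
import pandas as pd

# Toy route counts (values are made up).
routes = pd.DataFrame({
    "ORIGIN_AIRPORT": ["ABE", "SFO", "LAX", "JFK"],
    "DESTINATION_AIRPORT": ["ATL", "LAX", "SFO", "LAX"],
    "COUNT": [898, 13744, 14000, 12016],
})

top = 2

# The article's approach: sort everything, then keep the first rows.
by_sort = routes.sort_values(by="COUNT", ascending=False).head(top)

# Alternative: ask directly for the rows with the largest counts.
by_nlargest = routes.nlargest(top, "COUNT")

# Origin airports that appear in the top rows.
top_origins = by_nlargest["ORIGIN_AIRPORT"].unique().tolist()
print(top_origins)
```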

We are now ready to display the chord diagram. First, import holoviews and specify bokeh as the extension:

import holoviews as hv
hv.extension('bokeh')

In a Jupyter notebook, HoloViews provides the %%opts cell magic, which sets display options for the elements rendered by that cell. Use the Chord class to create a chord diagram:

%%opts Chord [height=500 width=500 title="Flights between airports" ]
chord = hv.Chord(df_between_airports.head(top))
chord

The following output shows the relationships between the top 40 flight combinations:

Each circle (known as a node) on the chord diagram represents an airport. To see the relationships between airports, hover your mouse over a circle:

The above diagram shows the flights originating from DFW (Dallas/Fort Worth International Airport). However, it does not show which airports are the destinations. So, let’s get the list of origin and destination airports and then use it to create a hv.Dataset object:

# get the top count of flights between airports
df_between_airports = df_between_airports.head(top)

# find all the unique origin and destination airports
airports = list(set(df_between_airports["ORIGIN_AIRPORT"].unique().tolist() +
                    df_between_airports["DESTINATION_AIRPORT"].unique().tolist()))

airports_dataset = hv.Dataset(pd.DataFrame(airports, columns=["Airport"]))
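
One thing to be aware of: a Python set has no guaranteed order, so the order of the airports list can vary between sessions. If you want a reproducible node order, one possible variant (a sketch on made-up data, with the hypothetical names top_routes and nodes) is to stack both columns, drop duplicates, and sort:

```python
import pandas as pd

# Toy version of the top routes table (values are made up).
top_routes = pd.DataFrame({
    "ORIGIN_AIRPORT": ["SFO", "LAX", "JFK"],
    "DESTINATION_AIRPORT": ["LAX", "SFO", "LAX"],
    "COUNT": [3, 2, 1],
})

# Stack origins and destinations, keep each code once, then sort alphabetically.
airports = sorted(
    pd.concat(
        [top_routes["ORIGIN_AIRPORT"], top_routes["DESTINATION_AIRPORT"]]
    ).unique()
)

# One row per airport; this is the table that would be wrapped in hv.Dataset.
nodes = pd.DataFrame(airports, columns=["Airport"])
print(nodes)
```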

To display the names of the airports at each node, set the labels attribute in the %%opts cell magic and pass the airports_dataset variable into the Chord class initializer:

%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
chord = hv.Chord((df_between_airports, airports_dataset))
chord

Take note that the df_between_airports and airports_dataset variables are wrapped as a tuple.

The chord diagram now has the airport code displayed on each node:

When you hover over DFW, you can now see clearly that flights from DFW fly to ORD (O’Hare International Airport) and ATL (Hartsfield-Jackson Atlanta International Airport):

If you click on the DFW node, the rest of the flight paths are greyed out:

The chord diagram supports Bokeh Palettes. You can see the list of palette colors at https://docs.bokeh.org/en/latest/docs/reference/palettes.html.

Bokeh’s palettes module provides a collection of palettes for color mapping.

Here are two commonly used Bokeh palettes:

Source: https://docs.bokeh.org/en/latest/docs/reference/palettes.html

Let’s now apply colors to our chord diagram by adding another %%opts cell magic statement:

%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="ORIGIN_AIRPORT" edge_cmap='Category20')
chord = hv.Chord((df_between_airports, airports_dataset))
chord

The node_color and edge_color options name the columns that determine the color of each node and edge, while node_cmap and edge_cmap indicate the palettes used to map those values to colors:

Here is another example with different palettes applied:

%%opts Chord [height=500 width=500 title="Flights between airports" labels="Airport"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="ORIGIN_AIRPORT" edge_cmap='Bokeh')
chord = hv.Chord((df_between_airports, airports_dataset))
chord

For the final section in this article, I just want to ensure that the chord diagram is displaying the correct information. Selecting JFK shows that flights are flying to LAX and SFO:

To confirm this, use the following statement:

df_between_airports.query('ORIGIN_AIRPORT == "JFK"')

The output confirms the answer in the chord diagram:

Selecting ORD in the chord diagram shows that flights are flying to:

The following statement confirms what we have seen:

df_between_airports.query('ORIGIN_AIRPORT == "ORD"')
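
Rather than querying one airport at a time, you could list the destinations for every origin in a single step. The following is a minimal sketch on made-up data (top_routes and destinations are hypothetical names), not output from the real dataset:

```python
import pandas as pd

# Toy version of the top routes table (values are made up).
top_routes = pd.DataFrame({
    "ORIGIN_AIRPORT": ["JFK", "JFK", "ORD", "SFO"],
    "DESTINATION_AIRPORT": ["LAX", "SFO", "LGA", "LAX"],
    "COUNT": [5, 4, 3, 2],
})

# Map each origin airport to the list of destinations it connects to.
destinations = (
    top_routes.groupby("ORIGIN_AIRPORT")["DESTINATION_AIRPORT"]
    .apply(list)
    .to_dict()
)

print(destinations["JFK"])
```

You can then compare each list against the chords you see when selecting the corresponding node in the diagram.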

Now that you have seen how to create chord diagrams using HoloViews, when should you use one? A chord diagram is a good choice in the following scenarios:

  • When you want a simple representation of the interconnections within a large dataset.
  • When you want to create an eye-catching visual representation that is aesthetically pleasing.
  • When you need to find and compare interrelationships between groups of data.

Have fun with chord diagrams, and remember to let me know how you are using them in your real-life projects!

