Using PyGWalker to Enhance Your Jupyter Notebook EDA Experience | by Andy McDonald | Mar, 2023


PyGWalker showing multiple plots within the same view. Image by the author.

Creating effective and compelling data visualisations quickly and efficiently is a key part of the data science workflow. There are several options available for this, ranging from commercial software like Tableau to free alternatives such as dedicated Python libraries. The amount of skill and time needed to generate plots varies between these options.

Over the years, several Python libraries have been developed to simplify the process of exploring your data. They have become so simple, in fact, that all you need to get started are 3–5 lines of code.

One such library that has recently appeared on the EDA scene is PyGWalker.

PyGWalker (Python binding of Graphic Walker) is a Python library that can help speed up the data analysis and visualisation workflow directly within a Jupyter notebook. It leverages the power of interactivity by providing an interface similar to the popular data analytics software Tableau.

Creating a scatter plot in PyGWalker using well log data. Image by the author.

With this type of interface, we can drag and drop our variables into specific sections and quickly create a plot, filter it, and understand our data.

You can visit the GitHub repository for PyGWalker using the link below.

This article will explore some of the features of PyGWalker using one of my favourite well log data sets (details at end of the article).

At the time of writing this article, the version of PyGWalker is 0.1.4.6, and some of the features illustrated may have been updated since this version.

To get started with PyGWalker, we need to install it. We can do this by using pip install pygwalker, or conda install pygwalker if you are using Anaconda.
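Since features may differ between releases, it can be useful to confirm which version ended up in your environment. The snippet below is a minimal sketch using only the Python standard library; the helper name installed_version is my own and is not part of PyGWalker.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Look up the distribution name used in the pip command above
version = installed_version("pygwalker")
if version is None:
    print("pygwalker is not installed; run pip install pygwalker first")
else:
    print(f"pygwalker {version} is installed")
```

If the reported version is newer than 0.1.4.6, some of the buttons and menus described in this article may look different.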

After the PyGWalker library has been installed, we can open our Jupyter Notebook and then import PyGWalker alongside the pandas library, which will be used to load our data from a CSV file.

import pandas as pd
import pygwalker as pyg

After these have been imported, the next step is to load the data we are going to be using for this tutorial. We can load this data by calling upon the familiar pd.read_csv() function from pandas, and then pass in our CSV file.

df = pd.read_csv('Data/Xeek_Well_15-9-15.csv')

Now it is time to run PyGWalker, and we can do it with the following straightforward call.

pyg.walk(df)

Once the cell has been run, we will get a very nice interface showing the available variables within the dataset. The variables will be split based on their type.

PyGWalker User Interface directly within a Jupyter Notebook. Image by the author.

The first plot we will create is a simple scatter plot of RHOB and NPHI — a commonly used plot within petrophysics.

Before we do this, we need to turn off the aggregation on the toolbar. This will allow us to plot the actual data values rather than any form of aggregation.

Toggling the aggregation option off will allow the plotting of actual data values. Image by the author.
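To see why this matters, here is a rough pandas sketch of the difference between raw and aggregated values. The numbers are made up for illustration, and the mean is used only as an example; I am not claiming this is the exact aggregation PyGWalker applies.

```python
import pandas as pd

# Made-up sample values, not taken from the real well
df = pd.DataFrame(
    {
        "LITH": ["Shale", "Shale", "Sandstone", "Sandstone"],
        "RHOB": [2.45, 2.55, 2.30, 2.40],
    }
)

# Aggregated view: every lithology collapses to a single summary value
aggregated = df.groupby("LITH", as_index=False)["RHOB"].mean()

# Raw view: every measurement is kept as its own point
print(f"{len(df)} raw rows versus {len(aggregated)} aggregated rows")
print(aggregated)
```

With aggregation switched on, a scatter plot would show the summary values rather than the individual measurements, which hides the spread we are interested in.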

Now we can select the variables we want to plot from the field list on the left. These can be clicked on and dragged into the x or y-axis boxes, depending on what axis you want the variables on.

You will also notice that the items within the field list have different icons. The blue document-like icons represent categorical data, and the purple hashtags represent numeric data.
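If you want a quick idea of how your columns are likely to be split before opening the interface, pandas can give a similar breakdown. This is only an approximation under the assumption that the split follows the dataframe dtypes; PyGWalker's own type detection may differ. The sample values are invented for illustration.

```python
import pandas as pd

# Tiny made-up sample using the column names from this article
df = pd.DataFrame(
    {
        "RHOB": [2.45, 2.50, 2.38],
        "NPHI": [0.25, 0.22, 0.30],
        "LITH": ["Shale", "Shale", "Sandstone"],
    }
)

# Numeric columns (likely to appear with the hashtag icon)
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else (likely to appear as categorical fields)
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```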

In this example, I have placed the NPHI on the x-axis and RHOB on the y-axis.

Creating a scatter plot in PyGWalker using well log data. Image by the author.

Changing the PyGWalker Plot Size

When the plot appears, it may look very small. However, we can increase the size of our plot very easily. This is done by going to the menu and changing the layout mode from auto to fixed.

Once that option has been changed, we can either change the size by clicking on the blue border that appears around our plot or by clicking on the cog icon next to the Layout Mode button and adjusting the sliders.

Changing the figure size within PyGWalker. Image by the author.

Adding Additional Variables to the PyGWalker Scatter Plot

We can also apply more variables to the plot to help us understand our data better. These additional variables can be categorical or numeric, and we can use them to add colour, opacity, size and shape.

In the example below, I have added the LITH variable, which will colour the data points by different lithologies. We can then hover over any point in the scatter plot and view its values.

Applying a categorical variable to the scatter plot created by PyGWalker. Image by the author.

If we use a numeric variable instead, we will get a colour bar along the side of the plot. The range of values for this axis can be changed by applying a filter — we will see how to do this shortly.

Applying a numeric variable to the scatter plot created by PyGWalker. Image by the author.

Zooming and Moving Around the PyGWalker Scatter Plot

If we want to change the scales or zoom in on a section of data, we first have to click on the Auto Resizing button on the toolbar and then we can zoom in or out using the mouse scroll wheel.

We can then move around the plot by holding down the left mouse button and dragging the cursor around the plot.

Zooming and moving around the scatter plot in PyGWalker. Image by the author.

It would be nice to be able to change the scales on the plot manually by clicking on the axis or the corners of the axis, similar to how we can do it in a Plotly chart.

Filtering Data By Categories

We can also filter the data using our variables.

When we filter using categorical data, we click and drag the variable we want into the filters section and then deselect the categories we do not want to see.

Applying categorical filters in PyGWalker. Image by the author.

When filtering using a numeric variable, we get a slider where we can control the min and max range. It does not appear as if we can edit the values manually, which would be a nice feature to have.

A nice feature is that we can apply multiple filters by adding another variable into the filter box and setting the range or selecting the categories we want.

Applying numerical filters to data using PyGWalker. Image by the author.
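For comparison, the same combination of a categorical filter and a numeric range filter can be sketched in pandas. This is not what PyGWalker runs internally; it is just an equivalent operation on a small made-up dataframe, and it does let us type exact min and max values.

```python
import pandas as pd

# Made-up sample values for illustration only
df = pd.DataFrame(
    {
        "RHOB": [2.45, 2.50, 2.38, 2.65, 2.10],
        "NPHI": [0.25, 0.22, 0.30, 0.05, 0.45],
        "LITH": ["Shale", "Shale", "Sandstone", "Limestone", "Coal"],
    }
)

# Categorical filter: keep only the lithologies we left selected
keep_lith = df["LITH"].isin(["Shale", "Sandstone"])

# Numeric filter: the slider's min and max, inclusive at both ends
in_range = df["RHOB"].between(2.4, 2.6)

# Both filters are applied together, as with stacked filters in the UI
filtered = df[keep_lith & in_range]
print(filtered)
```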

There are several different plot types available within PyGWalker.

When adding data and the Mark Type is set to Auto, the library will try to find the best plot for your data. However, this may not always be the most appropriate chart type.

You can change the chart type by clicking the Mark Type button on the toolbar and then selecting the type you want.

The example below shows how you can create a line plot with two variables.

Creating a line plot in PyGWalker. Image by the author.

PyGWalker provides a nice way to view the raw data within your dataframe and change the data type if required. This is handy if a column has been accidentally identified as the wrong data type and you need to change it quickly.

It would be nice to be able to do more on the data view, such as filtering the data or applying colour scales to the columns, as sometimes this can help reveal any issues within the data.

The raw data view within PyGWalker. Image by the author.
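If you would rather fix a wrongly typed column before passing the dataframe to PyGWalker, the same correction can be made in pandas. The sketch below uses made-up data, and the LITH_CODE column is a hypothetical example that is not part of the dataset used in this article.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "RHOB": ["2.45", "2.50", "2.38"],  # numbers accidentally stored as text
        "LITH_CODE": [1, 2, 1],  # integer codes that are really categories
    }
)

# Convert text to numbers; values that cannot be parsed become NaN
df["RHOB"] = pd.to_numeric(df["RHOB"], errors="coerce")

# Mark the integer codes as categorical rather than numeric
df["LITH_CODE"] = df["LITH_CODE"].astype("category")

print(df.dtypes)
```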

PyGWalker has provided one of the most interactive experiences and nicest-looking setups I have come across with EDA libraries in a Jupyter notebook. The interface provides an easy way for non-coders or beginner coders to start creating charts immediately.

You should give it a try for your next project. Check out my article below if you want to see other powerful Python EDA libraries.


