How to make 40+ interactive plots to analyze your machine learning pipeline | by Marco vd Boom

A quick guide on how to make clean-looking, interactive Python plots to validate your data and model

Introduction

Plots have become the de facto tool for helping data scientists and stakeholders understand the process and results of machine learning projects. In this story, we’ll show you how to use the ATOM library to easily make clean-looking, interactive plots to quickly analyze the dataset, inspect the pipeline, assess the model’s performance and interpret the model’s results. ATOM is an open-source Python package designed to help data scientists speed up the exploration of machine learning pipelines. Read this story if you want a gentle introduction to the library.

Data plots

Let’s start with plots that help you understand the dataset you are working with. How to transform pipelines and train models with ATOM lies outside the scope of this story; read this story or this story to learn more about that. Here, we’ll dive directly into plot making.

After initializing atom, creating a plot is as easy as calling the appropriate method. With a few exceptions, plots are made with the plotly library and rendered in HTML. Click here for a list of all available data plot methods.

from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y)
atom.plot_correlation()

atom.plot_distribution(columns=0)

atom.plot_relationships(columns=(0, 1, 2))

Feature selection plots

The feature selection plots can help you analyze the features selected by the PCA or RFECV strategies. Click here for a list of all feature selection plots.

from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y)
atom.feature_selection("pca", n_features=5)  # project the data onto 5 principal components
atom.plot_pca()

atom.plot_components(show=10)
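
Both plots revolve around the explained variance ratio: the variance captured by each principal component divided by the total variance. As a minimal sketch, assuming the component variances (the eigenvalues of the covariance matrix) are already known, the snippet below computes these ratios and the number of leading components needed to reach a cumulative threshold. The variances and the 0.85 threshold are hypothetical.

```python
from itertools import accumulate


def explained_variance_ratio(variances: list[float]) -> list[float]:
    """Fraction of the total variance captured by each component."""
    total = sum(variances)
    return [v / total for v in variances]


def n_components_for(ratios: list[float], threshold: float) -> int:
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    for n, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= threshold:
            return n
    return len(ratios)


# Hypothetical eigenvalues, sorted from largest to smallest
variances = [5.0, 3.0, 1.0, 0.5, 0.5]
ratios = explained_variance_ratio(variances)
print(ratios)                          # [0.5, 0.3, 0.1, 0.05, 0.05]
print(n_components_for(ratios, 0.85))  # 3
```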

Hyperparameter tuning plots

ATOM uses the optuna library for hyperparameter tuning. The plots available in optuna can also be made directly through ATOM’s API, along with a few additional ones. Click here for a list of all hyperparameter tuning plots.

from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y)
atom.run(models="RF", metric="f1", n_trials=15)  # tune a Random Forest over 15 trials
atom.plot_hyperparameters(params=(0, 1, 2))

atom.plot_parallel_coordinate(params=slice(1, 5))

atom.plot_slice(params=(0, 1, 2))
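
These plots are all views on the trial history the tuner records, i.e. each trial’s hyperparameters and score. To illustrate that data structure, the sketch below runs a plain random search (not the TPE sampler optuna uses by default) on a made-up objective and derives the best-score-so-far curve that optimization-history plots typically show. The objective function and the max_depth/min_samples parameter names are hypothetical.

```python
import random


def objective(params: dict[str, int]) -> float:
    """Hypothetical score to maximize; it peaks at max_depth=6, min_samples=2."""
    return -((params["max_depth"] - 6) ** 2) - (params["min_samples"] - 2) ** 2


def random_search(n_trials: int, seed: int = 0) -> list[dict]:
    """Sample parameters uniformly and record every trial."""
    rng = random.Random(seed)  # seeded so the trials are reproducible
    history = []
    for number in range(n_trials):
        params = {"max_depth": rng.randint(1, 10), "min_samples": rng.randint(1, 5)}
        history.append({"number": number, "params": params, "value": objective(params)})
    return history


def best_so_far(history: list[dict]) -> list[float]:
    """Running maximum of the trial values."""
    best, curve = float("-inf"), []
    for trial in history:
        best = max(best, trial["value"])
        curve.append(best)
    return curve


history = random_search(n_trials=15)
curve = best_so_far(history)
print(curve[-1] == max(t["value"] for t in history))  # True
```

Slice and parallel-coordinate plots draw from the same records, plotting each trial’s value against one or several of its parameters.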

Prediction plots

After training the models, use their predictions on the train and test sets to assess their performance or inspect the feature importance. One of ATOM’s most powerful features is the possibility of plotting results for multiple models or datasets (e.g. train vs. test) in the same figure.

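import pandas as pd
from atom import ATOMClassifier

X = pd.read_csv("https://raw.githubusercontent.com/tvdboom/ATOM/master/examples/datasets/weatherAUS.csv")
atom = ATOMClassifier(X, y="RainTomorrow")
atom.impute()  # handle missing values
atom.encode()  # encode categorical columns numerically
atom.run(models=["LR", "RF"])  # train a Logistic Regression and a Random Forest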
atom.plot_roc()

atom.plot_prc(models="RF", dataset="train+test")

atom.plot_feature_importance(show=10)
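
plot_roc draws the true positive rate against the false positive rate as the decision threshold sweeps over the predicted probabilities. A minimal standard-library sketch of that computation follows; it is not ATOM’s implementation, the labels and scores are hypothetical, and for simplicity it assumes no two samples share the same score.

```python
def roc_points(y_true: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the threshold one sample at a time."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Visit samples from the most to the least confident positive prediction
    for _, label in sorted(zip(scores, y_true), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: list[tuple[float, float]]) -> float:
    """Area under the curve with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_points(y_true, scores)))  # 0.75
```

Computing these points once per model and per dataset is what allows several curves, such as train vs. test, to share one figure.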

Interpretability plots

The SHAP (SHapley Additive exPlanations) Python package uses a game-theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions. ATOM provides methods for 7 of SHAP’s plotting functions directly from its API. Since these plots are not made by ATOM, they are neither interactive nor able to display multiple models.

from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y)
atom.run(models="RF")
atom.plot_shap_bar()

atom.plot_shap_beeswarm(show=10)

atom.plot_shap_decision(index=slice(5), show=10)
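
To make the Shapley idea concrete: a feature’s Shapley value is its marginal contribution to the prediction, averaged over all coalitions S of the other features with the classic weights |S|!(n - |S| - 1)!/n!. The brute-force sketch below computes exact values for a tiny hypothetical model, treating a feature as absent by setting it to a baseline value (an assumption; SHAP offers several ways to handle absent features). SHAP itself relies on much faster approximations and model-specific algorithms, so this only illustrates the definition.

```python
from itertools import combinations
from math import factorial, isclose


def shapley_values(model, x: list[float], baseline: list[float]) -> list[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Evaluate the model with only the coalition's features set to x
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def model(z: list[float]) -> float:
    """Hypothetical model with an interaction term between features 0 and 1."""
    return 2 * z[0] + z[1] + z[0] * z[1] + 0.5 * z[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print([round(v, 6) for v in phi])                   # [3.0, 3.0, 1.5]
print(isclose(sum(phi), model(x) - model(baseline)))  # True
```

The interaction term (worth 2 here) is split equally between features 0 and 1, and the values add up to the prediction minus the baseline prediction: the additivity that gives SHAP its name.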

Parameters

Apart from the plot-specific parameters, all plots have five parameters in common:

The title parameter adds a title to the plot. By default, no title is shown. Provide a configuration (as dictionary) to customize its appearance, e.g. title=dict(text="Awesome plot", font_color="red"). Read more in plotly’s documentation.
The legend parameter shows/hides, positions or customizes the plot’s legend. Provide a configuration (as dictionary) to customize its appearance (e.g. legend=dict(title="Title for legend", title_font_color="red")) or choose one of the following locations: upper left, upper right, lower left, lower right, upper center, lower center, center left, center right, center, out.
The figsize parameter adjusts the plot’s size.
The filename parameter is used to save the plot.
The display parameter determines whether to show or return the plot.

from atom import ATOMClassifier
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
atom = ATOMClassifier(X, y)
atom.plot_distribution(
    columns=0,
    title=dict(
        text="Custom left side title",
        font_color="teal",
        x=0,
        xanchor="left",
    ),
    legend="upper left",
)
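
The named legend locations are shortcuts for plotly’s legend configuration. The helper below is a hypothetical sketch, not ATOM’s code, of how such names could map onto plotly’s x, y, xanchor and yanchor legend properties in paper coordinates; the offsets are assumptions and ATOM’s actual values may differ.

```python
# Hypothetical lookup tables: name -> (position in paper coordinates, anchor)
_VERTICAL = {"upper": (0.99, "top"), "center": (0.5, "middle"), "lower": (0.01, "bottom")}
_HORIZONTAL = {"left": (0.01, "left"), "center": (0.5, "center"), "right": (0.99, "right")}


def legend_config(location: str) -> dict:
    """Hypothetical mapping from a location name to plotly legend properties."""
    if location == "out":
        # Just outside the plotting area, next to its top-right corner
        return {"x": 1.02, "y": 1.0, "xanchor": "left", "yanchor": "top"}
    # "center" alone means centered both ways; otherwise expect "<vertical> <horizontal>"
    parts = ["center", "center"] if location == "center" else location.split()
    if len(parts) != 2 or parts[0] not in _VERTICAL or parts[1] not in _HORIZONTAL:
        raise ValueError(f"Unknown legend location: {location!r}")
    y, yanchor = _VERTICAL[parts[0]]
    x, xanchor = _HORIZONTAL[parts[1]]
    return {"x": x, "y": y, "xanchor": xanchor, "yanchor": yanchor}


print(legend_config("upper left"))
# {'x': 0.01, 'y': 0.99, 'xanchor': 'left', 'yanchor': 'top'}
```

Because the result uses plotly’s own legend property names, a dictionary like this could also be passed to fig.update_layout(legend=...) on any plotly figure.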

Conclusion

We have shown how to use the ATOM package to make interactive plots to quickly analyze the results of a machine learning pipeline. For a list of all available plots, click here.

For further information about ATOM, have a look at the package’s documentation. For bugs or feature requests, don’t hesitate to open an issue on GitHub or send me an email.


How to make 40+ interactive plots to analyze your machine learning pipeline | by Marco vd Boom | Mar, 2023
