
Make your sklearn models up to 100 times faster | by Marco vd Boom | Mar, 2023



Photo by Markus Winkler on Unsplash

Introduction

With the Intel® Extension for Scikit-learn package (or sklearnex, for brevity) you can accelerate sklearn models and transformers, keeping full conformance with sklearn’s API. Sklearnex is a free software AI accelerator that offers you a way to make sklearn code 10–100 times faster.
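Outside ATOM, the usual way to enable sklearnex is to patch sklearn before importing any estimators. A minimal sketch, guarded so it degrades gracefully to stock sklearn when sklearnex is not installed:

```python
# Patch sklearn with Intel-accelerated implementations, if available.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()  # must run BEFORE importing sklearn estimators
    accelerated = True
except ImportError:
    accelerated = False  # sklearnex not installed; fall back to stock sklearn

# From here on, sklearn imports resolve to the (possibly accelerated) versions.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10)
print(f"accelerated: {accelerated}")
```

Because patching swaps implementations behind the same classes, the rest of the code stays unchanged, which is exactly the property ATOM relies on below.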

The software acceleration is achieved through the use of vector instructions, IA hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel platforms at launch time.

In this story, we’ll explain how to use the ATOM library to leverage the speed of sklearnex. ATOM is an open-source Python package designed to help data scientists with the exploration of machine learning pipelines. Read this other story if you want a gentle introduction to the library.

Hardware requirements

sklearnex comes with some additional hardware requirements to take into account:

  • The processor must have x86 architecture.
  • The processor must support at least one of SSE2, AVX, AVX2, AVX512 instruction sets.
  • ARM* architecture is not supported.
  • Intel® processors provide better performance than other CPUs.
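You can do a rough, best-effort check of these requirements from Python with the standard library alone. This is a sketch, not an official sklearnex check: the /proc/cpuinfo path is Linux-specific, and on other platforms the flag lookup simply comes back empty.

```python
import platform

machine = platform.machine().lower()
is_x86 = machine in {"x86_64", "amd64", "i386", "i686", "x86"}

# On Linux, CPU feature flags are exposed in /proc/cpuinfo.
flags = set()
try:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
except OSError:
    pass  # not Linux; flags stay unknown

has_simd = bool(flags & {"sse2", "avx", "avx2", "avx512f"})
print(f"x86: {is_x86}, SIMD instruction set detected: {has_simd}")
```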

Note: sklearnex and ATOM are also capable of acceleration through GPU, but we won’t discuss that option in this story. For now, let’s focus on CPU acceleration.

Example

Let’s walk through an example to see how to get started. We initialize atom the usual way and specify the engine parameter, which stipulates which library to use for the models. The options are:

  • sklearn (default)
  • sklearnex (our choice for this story)
  • cuml (for GPU acceleration)

from atom import ATOMClassifier
from sklearn.datasets import make_classification

# Create a dummy dataset
X, y = make_classification(n_samples=100000, n_features=40)

atom = ATOMClassifier(X, y, engine="sklearnex", n_jobs=1, verbose=2)

Next, call the run method to train a model. ATOM’s documentation lists the models that support sklearnex acceleration.

atom.run(models="RF")

print(f"\nThe estimator used is {atom.rf.estimator}")
print(f"The module of the estimator is {atom.rf.estimator.__module__}")

It took 1.7 seconds to train and validate the model. Note that the estimator comes from the daal4py module: that library is the backend engine of sklearnex.

For comparison, let’s train another Random Forest model, this time with sklearn instead of sklearnex. The engine parameter can also be specified directly in the run method.

atom.run(models="RF_sklearn", engine="sklearn")

print(f"\nThe estimator used is {atom.rf_sklearn.estimator}")
print(f"The module of the estimator is {atom.rf_sklearn.estimator.__module__}")

This time, training took 1.5 minutes instead of mere seconds! The sklearnex model is almost 60 times faster, and it even performs slightly better on the test set.
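You can reproduce this kind of comparison outside ATOM by timing fit yourself. The sketch below times a stock-sklearn Random Forest on a small dummy dataset; with sklearnex installed, calling patch_sklearn() before the imports would time the accelerated version instead.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small dummy problem so the benchmark stays quick.
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
start = time.perf_counter()
clf.fit(X, y)
elapsed = time.perf_counter() - start
print(f"fit took {elapsed:.2f}s")
```

Speed-ups depend heavily on dataset size and CPU, so treat the 10–100x range as an upper envelope rather than a guarantee.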

Let’s visualize the results.

atom.plot_results(metric="time")

It’s important to note that there are no major differences between the two models, neither in performance nor in the logic they use to make their predictions. The latter can be visualized with a feature importance plot (the features have similar importance in both models) and a comparison of SHAP decision plots (the decision patterns match).

atom.plot_feature_importance(show=10)
atom.rf.plot_shap_decision(show=10, title="sklearnex")
atom.rf_sklearn.plot_shap_decision(show=10, title="sklearn")
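Plots aside, a quick numeric sanity check of the same idea is to measure how often two independently trained forests agree on held-out data. This sketch uses plain sklearn; two stock forests with different seeds stand in for the sklearnex/sklearn pair from the example above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two forests that differ only in their random seed.
rf_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
rf_b = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_tr, y_tr)

# Fraction of test samples on which both models predict the same class.
agreement = float(np.mean(rf_a.predict(X_te) == rf_b.predict(X_te)))
print(f"prediction agreement: {agreement:.1%}")
```

A high agreement rate supports the claim that switching engines changes the speed, not the model’s decisions.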


