
Make your sklearn models up to 100 times faster | by Marco vd Boom | Mar, 2023



Photo by Markus Winkler on Unsplash

Introduction

With the Intel® Extension for Scikit-learn package (or sklearnex, for brevity) you can accelerate sklearn models and transformers, keeping full conformance with sklearn’s API. Sklearnex is a free software AI accelerator that offers you a way to make sklearn code 10–100 times faster.
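Outside ATOM, the usual way to enable sklearnex is to patch sklearn before importing any estimators. A minimal sketch, guarded so it degrades gracefully to stock sklearn when sklearnex is not installed:

```python
# Patch sklearn with Intel-accelerated implementations, if available.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()  # must run BEFORE importing sklearn estimators
    accelerated = True
except ImportError:
    accelerated = False  # sklearnex not installed; fall back to stock sklearn

# From here on, sklearn imports resolve to the (possibly accelerated) versions.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10)
print(f"accelerated: {accelerated}")
```

Because patching swaps implementations behind the same classes, the rest of the code stays unchanged, which is exactly the property ATOM relies on below.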

The software acceleration is achieved through the use of vector instructions, IA hardware-specific memory optimizations, threading, and optimizations for all upcoming Intel platforms at launch time.

In this story, we’ll explain how to use the ATOM library to leverage the speed of sklearnex. ATOM is an open-source Python package designed to help data scientists with the exploration of machine learning pipelines. Read this other story if you want a gentle introduction to the library.

Hardware requirements

sklearnex comes with some additional hardware requirements to take into account:

  • The processor must have x86 architecture.
  • The processor must support at least one of SSE2, AVX, AVX2, AVX512 instruction sets.
  • ARM* architecture is not supported.
  • Intel® processors provide better performance than other CPUs.
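You can do a rough, best-effort check of these requirements from Python with the standard library alone. This is a sketch, not an official sklearnex check: the /proc/cpuinfo path is Linux-specific, and on other platforms the flag lookup simply comes back empty.

```python
import platform

machine = platform.machine().lower()
is_x86 = machine in {"x86_64", "amd64", "i386", "i686", "x86"}

# On Linux, CPU feature flags are exposed in /proc/cpuinfo.
flags = set()
try:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                break
except OSError:
    pass  # not Linux; flags stay unknown

has_simd = bool(flags & {"sse2", "avx", "avx2", "avx512f"})
print(f"x86: {is_x86}, SIMD instruction set detected: {has_simd}")
```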

Note: sklearnex and ATOM are also capable of acceleration through GPU, but we won’t discuss that option in this story. For now, let’s focus on CPU acceleration.

Example

Let’s walk through an example to see how to get started. We initialize atom the usual way and specify the engine parameter, which stipulates which library to use for the models. The options are:

  • sklearn (default)
  • sklearnex (our choice for this story)
  • cuml (for GPU acceleration)

from atom import ATOMClassifier
from sklearn.datasets import make_classification

# Create a dummy dataset
X, y = make_classification(n_samples=100000, n_features=40)

atom = ATOMClassifier(X, y, engine="sklearnex", n_jobs=1, verbose=2)

Next, call the run method to train a model. ATOM’s documentation lists the models that support sklearnex acceleration.

atom.run(models="RF")

print(f"\nThe estimator used is {atom.rf.estimator}")
print(f"The module of the estimator is {atom.rf.estimator.__module__}")

It took 1.7 seconds to train and validate the model. Note that the estimator comes from the daal4py module: that library is the backend engine of sklearnex.

For comparison, let’s train another Random Forest model, this time with sklearn instead of sklearnex. The engine parameter can also be specified directly in the run method.

atom.run(models="RF_sklearn", engine="sklearn")

print(f"\nThe estimator used is {atom.rf_sklearn.estimator}")
print(f"The module of the estimator is {atom.rf_sklearn.estimator.__module__}")

This time, training took 1.5 minutes instead of mere seconds! The sklearnex model is almost 60 times faster, and it even performs slightly better on the test set.
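You can reproduce this kind of comparison outside ATOM by timing fit yourself. The sketch below times a stock-sklearn Random Forest on a small dummy dataset; with sklearnex installed, calling patch_sklearn() before the imports would time the accelerated version instead.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small dummy problem so the benchmark stays quick.
X, y = make_classification(n_samples=2000, n_features=40, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
start = time.perf_counter()
clf.fit(X, y)
elapsed = time.perf_counter() - start
print(f"fit took {elapsed:.2f}s")
```

Speed-ups depend heavily on dataset size and CPU, so treat the 10–100x range as an upper envelope rather than a guarantee.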

Let’s visualize the results.

atom.plot_results(metric="time")

It’s important to note that there are no major differences between the two models, neither in performance nor in the logic they use to make their predictions. The latter can be visualized with a feature importance plot (the features have similar importance in both models) and a comparison of SHAP decision plots (the decision patterns match).

atom.plot_feature_importance(show=10)
atom.rf.plot_shap_decision(show=10, title="sklearnex")
atom.rf_sklearn.plot_shap_decision(show=10, title="sklearn")
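Plots aside, a quick numeric sanity check of the same idea is to measure how often two independently trained forests agree on held-out data. This sketch uses plain sklearn; two stock forests with different seeds stand in for the sklearnex/sklearn pair from the example above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two forests that differ only in their random seed.
rf_a = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
rf_b = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_tr, y_tr)

# Fraction of test samples on which both models predict the same class.
agreement = float(np.mean(rf_a.predict(X_te) == rf_b.predict(X_te)))
print(f"prediction agreement: {agreement:.1%}")
```

A high agreement rate supports the claim that switching engines changes the speed, not the model’s decisions.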


