Create a Python Package with Super-Fast Rust Code in 3 Steps | by Mike Huls | Feb, 2023


This Python is getting a bit Rusty! (image by Dall-e 2!)

Python is a pretty easy language to pick up and it’s super quick to write some code in, compared to some other languages. All this ease-of-use comes with a downside: speed is sacrificed. Sometimes Python is just too slow!

To solve this problem we’ll re-write a bit of our Python-code in Rust and import this code, as a Python package, into our original project. We end up with a super-fast Python package that we can import and use like any other package. As a bonus we’ll multi-process our Rusty Python package and end up with a function that is roughly 150x faster. Let’s code!

Here is a quick summary of what we’re going to do in this article. We’ll tackle the problem in 6 steps (of which steps 2, 3 and 4 are devoted to actually writing the package):

  1. Examining our slow function; why is it slow?
  2. Preparing our project
  3. We re-write this function in Rust
  4. Compile the Rust code and put it in a Python package
  5. Import the Python package into our project
  6. Benchmarking the Python function vs the Rust one

We’ll use a Python package called maturin. This package will compile our Rust code and convert it into a package. The result will be like any other Python package that we can import and use (like pandas).

It’s first important that we understand why our function is slow. Let’s imagine that our project requires a function that counts the number of primes between two numbers:

def primecounter_py(range_from: int, range_til: int) -> tuple[int, int]:
    """ Returns the number of found prime numbers using range"""
    check_count = 0
    prime_count = 0
    range_from = range_from if range_from >= 2 else 2
    for num in range(range_from, range_til + 1):
        for divnum in range(2, num):
            check_count += 1
            if (num % divnum) == 0:
                # found a divisor, so num is not prime
                break
        else:
            # the inner loop finished without a break: num is prime
            prime_count += 1
    return prime_count, check_count

Please note that:

The number of prime-checks isn’t really necessary for this function but it allows us to compare Python vs Rust in a later part of this article.

The Python code and Rust code in this article are far from optimized for finding primes. The important thing is to demonstrate that we can optimize small chunks of Python with Rust and that we can compare the performance of these functions.

If you call primecounter_py(10, 20) it returns a prime count of 4 (11, 13, 17, and 19 are primes) together with the number of prime-checks the function has performed. These small ranges are executed very quickly, but when we use larger ranges you’ll see that performance starts to suffer:

range     milliseconds
1-1K      4
1-10K     310
1-25K     1754
1-50K     6456
1-75K     14019
1-100K    24194

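You see that the duration grows much faster than the input size: going from 1-10K to 1-100K makes the range ten times larger, but the duration increases roughly 78-fold (from 310 ms to 24194 ms). In other words: the larger a range becomes, the slower it gets (relatively).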
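Timings like the ones above can be reproduced with a small harness. The article does not show its benchmark code, so the sketch below is only an assumption about how such a measurement might look; time_primecounter is a hypothetical helper and the absolute numbers will differ per machine.

```python
import time


def primecounter_py(range_from: int, range_til: int) -> tuple[int, int]:
    """Same algorithm as above: count primes by trial division."""
    check_count = 0
    prime_count = 0
    range_from = range_from if range_from >= 2 else 2
    for num in range(range_from, range_til + 1):
        for divnum in range(2, num):
            check_count += 1
            if num % divnum == 0:
                break
        else:
            prime_count += 1
    return prime_count, check_count


def time_primecounter(range_til: int) -> tuple[int, int, float]:
    """Hypothetical helper: run the counter on 1..range_til and time it.

    Returns (prime_count, check_count, elapsed_milliseconds).
    """
    start = time.perf_counter()
    primes, checks = primecounter_py(1, range_til)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return primes, checks, elapsed_ms


if __name__ == "__main__":
    for upper in (1_000, 10_000):
        primes, checks, ms = time_primecounter(upper)
        print(f"1-{upper}: {primes} primes, {checks} checks, {ms:.1f} ms")
```

Printing the check count next to the duration makes it visible that the number of checks, and not just the size of the range, is what drives the runtime.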

Why is the primecounter_py function slow?

Code can be slow for many reasons. It can be I/O-based (like waiting for an API), hardware-related, or caused by the design of Python as a language. In this article it’s the last case. The way Python is designed, for example how it handles variables, makes it very easy to use, but you pay a small speed penalty that becomes apparent when you have to perform a lot of calculations. On the bright side, this function is very suitable for optimization with Rust.

If you are interested in what Python’s limitations are, I recommend reading the article below. It explains the cause and potential solutions for slowness due to how Python is designed.

Is concurrency the problem?

Doing multiple things simultaneously can solve a lot of speed problems. In our case we could opt for using multiple processes to divide all tasks over multiple cores instead of the default 1. Still, we go for the optimization in Rust, since we can also multi-process the faster function, as you’ll see at the end of this article.

Many cases that involve a lot of I/O (like waiting for an API) can be optimized by using threads. Check out this article or the one below on how to put multiple CPUs to work to increase execution speed.
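To illustrate the I/O-bound case, the sketch below uses time.sleep as a stand-in for a slow API call and runs several of those calls on a thread pool. Both fake_api_call and fetch_all are hypothetical names; the snippet only shows the pattern and is not a benchmark.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(request_id: int) -> int:
    """Hypothetical stand-in for an I/O-bound call: it waits, then answers."""
    time.sleep(0.1)  # pretend we're waiting on the network
    return request_id * 2


def fetch_all(request_ids: list[int]) -> list[int]:
    """Run the waiting calls on threads; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fake_api_call, request_ids))


if __name__ == "__main__":
    print(fetch_all([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

Threads help here because the work is mostly waiting. Our prime counter is pure calculation, which is why this article reaches for Rust and multiple processes instead.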

This is the part where we install dependencies and create all files and folders we need to write Rust and compile it into a package.

a. Create a venv

Create a virtual environment and activate it. Then install maturin; this package will help us convert our Rust code to a Python package:

python -m venv venv
source venv/bin/activate
pip install maturin
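If you’re not sure whether the virtual environment is actually active, one quick check from Python is to compare sys.prefix with sys.base_prefix; inside a venv they differ. A small sketch (in_virtualenv is a hypothetical helper name):

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv())
```

This matters because pip installs maturin (and later our wheel) into whichever environment is currently active.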

b. Rust files and folders

We’ll create a directory called my_rust_module that will contain our Rust code and cd into that directory.

mkdir my_rust_module
cd my_rust_module

c. Initializing maturin

Then we call maturin init. It shows you some options. Choose pyo3. Maturin now creates some folders and files. Your project should look like this now:

my_folder
|- venv
|- my_rust_module
   |- .github
   |- src
      |- lib.rs
   |- .gitignore
   |- Cargo.toml
   |- pyproject.toml

The most important one is /my_rust_module/src/lib.rs. This file will contain the Rust code that we’re about to turn into a Python package.

Notice that maturin also created a Cargo.toml. This is the configuration of our project. It also contains all of our dependencies (like requirements.txt). In my case I’ve edited it to look like this:

[package]
name = "my_rust_module"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
name = "my_rust_module"
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.17.3", features = ["extension-module"] }

We are now ready to recreate our Python function in Rust. We won’t dive too deep in the Rust syntax but focus more on the way we can make the Rust code work with Python. We’ll first create a pure-Rust function and then put it in a package that we can import and use in Python.

If you’ve never seen Rust code then the code below may be a little confusing. The most important thing is that the primecounter function below is pure Rust; it has nothing to do with Python. Open /my_rust_module/src/lib.rs and give it the following content:

use pyo3::prelude::*;

#[pyfunction]
fn primecounter(range_from: u64, range_til: u64) -> (u32, u32) {
    /* Returns the number of found prime numbers between [range_from] and [range_til] */
    let mut prime_count: u32 = 0;
    let mut check_count: u32 = 0;
    let _from: u64 = if range_from < 2 { 2 } else { range_from };
    // set to true as soon as a divisor is found, i.e. num is NOT prime
    let mut divisor_found: bool;

    for num in _from..=range_til {
        divisor_found = false;
        for divnum in 2..num {
            check_count += 1;
            if num % divnum == 0 {
                divisor_found = true;
                break;
            }
        }
        if !divisor_found {
            prime_count += 1;
        }
    }
    return (prime_count, check_count)
}

/// Put the function in a Python module
#[pymodule]
fn my_rust_module(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(primecounter, m)?)?;
    Ok(())
}

Let’s run through the most important things:

  1. The primecounter function is pure Rust
  2. The primecounter function is decorated with #[pyfunction]. This indicates that we want to transform it into a Python function
  3. In the last few lines we build a pymodule. The my_rust_module function packages the Rust code into a Python module.

This part may seem the hardest but with the help of the maturin package it becomes very easy for us. Just call:

maturin build --release

This compiles all Rust code and wraps it into a Python package that ends up in this directory: your_project_dir/my_rust_module/target/wheels. We install the wheel in the next part.

For Windows users:
In the examples below I work in a Debian environment (via Windows WSL). This makes compiling code with Rust a little easier since the compilers we need are already installed. Building on Windows is possible as well but you’ll likely receive a message like Microsoft Visual C++ 14.0 or greater is required. This means you don’t have a compiler. You can solve this by installing C++ build tools that you can download here.

We can directly pip install the wheel we’ve created in the previous part:

pip install target/wheels/my_rust_module-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl
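The exact wheel filename depends on your Python version and platform, so yours will probably differ from the one above. Wheel names follow the pattern name-version-pythontag-abitag-platformtag.whl. The sketch below splits a filename into those parts; describe_wheel is a hypothetical helper and it assumes the optional build tag is absent.

```python
def describe_wheel(filename: str) -> dict[str, str]:
    """Split a wheel filename into its parts (assumes no optional build tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Strip the ".whl" suffix, then split on the dashes between the five parts
    name, version, python_tag, abi_tag, platform_tag = filename[:-4].split("-")
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


if __name__ == "__main__":
    wheel = "my_rust_module-0.1.0-cp39-cp39-manylinux_2_28_x86_64.whl"
    for key, value in describe_wheel(wheel).items():
        print(f"{key}: {value}")
```

For the wheel above this shows a CPython 3.9 build (cp39) for 64-bit Linux; a wheel built with another interpreter or on another OS will carry different tags.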

Then it’s just a matter of importing our module and using the function:

import my_rust_module

primecount, eval_count = my_rust_module.primecounter(range_from=0, range_til=500)
# returns 95 22279
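Before benchmarking it can be reassuring to confirm that the Rust function returns exactly what the Python one does. The sketch below is one way to do that; compare is a hypothetical helper, and the try/except only exists so the snippet still runs when the wheel has not been installed yet.

```python
def primecounter_py(range_from: int, range_til: int) -> tuple[int, int]:
    """The pure-Python reference implementation from earlier in the article."""
    check_count = 0
    prime_count = 0
    range_from = range_from if range_from >= 2 else 2
    for num in range(range_from, range_til + 1):
        for divnum in range(2, num):
            check_count += 1
            if num % divnum == 0:
                break
        else:
            prime_count += 1
    return prime_count, check_count


try:
    import my_rust_module  # only importable after installing the wheel
    rust_counter = my_rust_module.primecounter
except ImportError:
    rust_counter = None  # wheel not installed; nothing to compare against


def compare(range_from: int, range_til: int):
    """Return True/False when both versions can be compared, else None."""
    if rust_counter is None:
        return None
    return tuple(rust_counter(range_from, range_til)) == primecounter_py(range_from, range_til)


if __name__ == "__main__":
    print(primecounter_py(0, 500))  # (95, 22279)
    print("Rust agrees:", compare(0, 500))
```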

Let’s check out how our functions compare. We’ll call both the Python and Rust versions of our primecounter function and time them. We also call the functions with multiple arguments. These are the results:

range     py ms     py e/sec    rs ms    rs e/sec
1-1K      4         17.6M       0.19     417M
1-10K     310       18.6M       12       481M
1-25K     1754      18.5M       66       489M
1-50K     6456      18.8M       248      488M
1-75K     14019     18.7M       519      505M
1-100K    24194     18.8M       937      485M

Both our Python and Rust functions return the result and the number of checks they have performed. In the overview above you see that Rust outperforms Python by up to 27x (roughly 24x to 27x across the ranges) when it comes to these evaluations per second.
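The speedup per range follows directly from the evaluations-per-second columns. The snippet below simply redoes that arithmetic with the figures copied from the table (in millions per second); speedups is a hypothetical helper name.

```python
# (range, Python evaluations/sec, Rust evaluations/sec), both in millions,
# copied from the table above
ROWS = [
    ("1-1K", 17.6, 417),
    ("1-10K", 18.6, 481),
    ("1-25K", 18.5, 489),
    ("1-50K", 18.8, 488),
    ("1-75K", 18.7, 505),
    ("1-100K", 18.8, 485),
]


def speedups(rows):
    """Return (range, rust_rate / python_rate) for every row."""
    return [(label, rust / python) for label, python, rust in rows]


if __name__ == "__main__":
    for label, factor in speedups(ROWS):
        print(f"{label:<8}{factor:5.1f}x")
```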

Counting primes in Rust is a lot faster than in Python (image by author)

The graph above provides a very clear difference in execution time.

Of course you can multi-process this new Python package! With the code below we divide all numbers that we need to evaluate over all of our cores:

import math
import multiprocessing as mp

import my_rust_module

if __name__ == "__main__":
    # Range to search, matching the 0 - 100K run discussed below
    range_from, range_til = 0, 100_000

    # batch size is determined by the range divided over the number of available CPUs
    batch_size = math.ceil((range_til - range_from) / mp.cpu_count())

    # The lines below divide the range over all available CPUs.
    # A range of 0 - 10 will be divided over 4 CPUs like:
    # [(0, 2), (3, 5), (6, 8), (9, 9)]
    # Note: range() excludes range_til itself, so that last number is not checked.
    number_list = list(range(range_from, range_til))
    number_list = [number_list[i * batch_size:(i + 1) * batch_size] for i in range((len(number_list) + batch_size - 1) // batch_size)]
    number_list_from_til = [(min(chunk), max(chunk)) for chunk in number_list]

    primecount = 0
    eval_count = 0
    # Each worker process counts the primes in its own (from, til) batch
    with mp.Pool() as mp_pool:
        results = mp_pool.starmap(my_rust_module.primecounter, number_list_from_til)
    for _count, _evals in results:
        primecount += _count
        eval_count += _evals
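The batching logic can also be written as a small function that computes the (from, til) pairs directly, without first building a list of every number. This is a sketch of an alternative, not the code used for the measurements; split_range is a hypothetical name and, like the code above, it treats range_til as exclusive.

```python
import math


def split_range(range_from: int, range_til: int, n_batches: int) -> list[tuple[int, int]]:
    """Split range_from..range_til (til excluded) into inclusive (from, til) pairs."""
    if range_til <= range_from:
        return []
    batch_size = math.ceil((range_til - range_from) / n_batches)
    return [
        # each batch ends one before the next batch starts, capped at range_til - 1
        (start, min(start + batch_size, range_til) - 1)
        for start in range(range_from, range_til, batch_size)
    ]


if __name__ == "__main__":
    print(split_range(0, 10, 4))  # [(0, 2), (3, 5), (6, 8), (9, 9)]
```

The resulting list can be passed to starmap in the same way as number_list_from_til. Keep in mind that the batches are equal in size but not in work: higher numbers need more checks, so the last batch takes the longest.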

Let’s try to find all primes between 0 and 100K again. With our current algorithm this means that we have to perform almost half a billion checks. As you can see in the overview below, Rust finishes these in 0.88 seconds. With multiprocessing the process finishes in 0.16 seconds: 5.5 times faster, clocking in at 2.8 billion calculations per second.

            calculations    duration     calculations/sec
rust:       455.19M         882.03 ms    516.1M/sec
rust MP:    455.19M         160.62 ms    2.8B/sec

Compared to our original (single process) Python function we’ve increased the number of calculations per second from 18.8M to 2.8 billion. This means that our function is now roughly 150x faster.
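The figures in this section can be checked with a few lines of arithmetic. The snippet below only uses numbers quoted above (455.19M calculations, 882.03 ms, 160.62 ms and Python’s 18.8M calculations per second); the variable names are my own.

```python
calculations = 455.19e6          # checks needed for 0 - 100K
rust_seconds = 882.03 / 1000     # single-process Rust
rust_mp_seconds = 160.62 / 1000  # multi-process Rust
python_rate = 18.8e6             # Python calculations per second

rust_rate = calculations / rust_seconds        # about 516M per second
rust_mp_rate = calculations / rust_mp_seconds  # about 2.8B per second

mp_gain = rust_seconds / rust_mp_seconds       # about 5.5x from multiprocessing
total_gain = rust_mp_rate / python_rate        # roughly 150x versus Python

if __name__ == "__main__":
    print(f"rust:    {rust_rate / 1e6:.1f}M/sec")
    print(f"rust MP: {rust_mp_rate / 1e9:.1f}B/sec")
    print(f"multiprocessing gain: {mp_gain:.1f}x")
    print(f"gain over Python:     {total_gain:.0f}x")
```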

As we’ve seen in this article, it’s not all that difficult to extend Python with Rust. If you know when and how to apply this technique you can really improve the execution speed of your program.

I hope this article was as clear as I intended it to be; if not, please let me know what I can do to clarify further. In the meantime, check out my other articles on all kinds of programming-related topics.

Happy coding!

— Mike

P.S: like what I’m doing? Follow me!

