Machine Learning and Rust (Part 4): Neural Networks in Torch | by Stefano Bosisio | Aug, 2022
Can we use PyTorch in Rust? What are Rust bindings? What’s tch-rs? A look at neural networks in Rust
It’s been a while since we last had a look at Rust and its application to Machine Learning (please scroll down to the bottom for the previous tutorials on ML and Rust). Today I would like to take a step forward and introduce neural networks in Rust. There is a Rust version of Torch, which allows us to create any kind of neural network we want. Bindings are the key to bringing Torch to Rust. Bindings create foreign function interfaces, or FFIs, which build a bridge between Rust and functions or code written in another language. Good examples can be found in the Rustonomicon.
To create bindings to C and C++ we can use bindgen, a library that automatically generates Rust FFI bindings. Building on bindings to the C++ API of PyTorch, Laurent Mazare has given the Rust community a Rustacean version of PyTorch. As its GitHub page says, tch provides thin wrappers around the C++ libtorch. The big advantage is that the library closely mirrors the original API, so there are hardly any learning barriers to overcome, and the core code is quite easy to read.
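To make the idea of a binding concrete, here is a minimal sketch of hand-written Rust FFI declarations for two functions from the C standard library, strlen and abs, assuming a platform where Rust's standard library links against libc (as on Linux and macOS). The wrapper names c_strlen and c_abs are illustrative and not part of tch; the pattern of keeping the unsafe calls inside safe Rust functions is broadly what a "thin wrapper" library does.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Hand-written declarations for two C standard library functions.
// A tool such as bindgen generates extern blocks like this from C headers.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
    fn abs(x: c_int) -> c_int;
}

/// Safe wrapper around `strlen`: callers never touch raw pointers.
fn c_strlen(s: &str) -> usize {
    // CString appends the NUL terminator C expects; it fails on interior NULs.
    let c = CString::new(s).expect("no interior NUL bytes");
    // SAFETY: `c` is a valid NUL-terminated string that lives past the call.
    unsafe { strlen(c.as_ptr()) }
}

/// Safe wrapper around `abs`; i32::MIN is rejected because C's abs is undefined there.
fn c_abs(x: i32) -> Option<i32> {
    if x == i32::MIN {
        return None;
    }
    // SAFETY: abs has no other preconditions.
    Some(unsafe { abs(x as c_int) })
}
```

The Rust side only ever sees c_strlen and c_abs, which cannot be misused from safe code.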
First of all, let’s have a look at the code. This is the best starting point to deepen our understanding of the Rust infrastructure.
First, to get an idea of Rust FFI, we can peek at these files. Most of them are automatically generated, while Laurent and his co-workers have put together magnificent pieces of code to connect the C++ Torch APIs with Rust.
Next, we can start reading the core code in src, in particular init.rs. After the definition of an enum Init there is a public function, pub fn f_init, which matches on the requested initialisation method and returns a tensor initialised with it (the same function serves both weights and biases). Here we can learn the use of match, which mirrors switch in C and match in Python 3.10. Weight and bias tensors can be initialised through random, uniform, Kaiming or orthogonal methods (fig.1).
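As a small, self-contained illustration of choosing an initialisation with match, here is a hypothetical two-variant enum. It is not tch's real Init (which has more variants and produces Tensors), and it uses a hand-rolled linear congruential generator instead of a random-number crate so that it runs with the standard library alone.

```rust
/// A cut-down, hypothetical initialisation enum, for illustration only.
#[derive(Debug, Clone, Copy)]
enum Init {
    Const(f64),
    Uniform { lo: f64, up: f64 },
}

/// Tiny linear congruential generator so the example needs no external crates.
fn next_unit(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    // Keep the top 53 bits and scale them into [0, 1).
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

/// Returns `n` values initialised according to `i`, picking the method with `match`.
fn init_values(i: Init, n: usize, seed: u64) -> Vec<f64> {
    let mut state = seed;
    match i {
        Init::Const(c) => vec![c; n],
        Init::Uniform { lo, up } => (0..n)
            .map(|_| lo + (up - lo) * next_unit(&mut state))
            .collect(),
    }
}
```

Because match must be exhaustive, adding a new variant to the enum makes the compiler point at every place that still needs to handle it.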
Then, for the enum Init we have the method implementations, impl Init. The implemented method is a setter, pub fn set(self, tensor: &mut Tensor), which is a great example to further appreciate the concepts of ownership and borrowing in Rust:
We talked about borrowing in our very first tutorial, and now is the right time to understand this concept better. Suppose we had a similar set function that takes the tensor by value:

```rust
pub fn set(self, tensor: Tensor) {}
```

In the main code we could call this function, passing a Tensor, and the tensor would be set. However, what if we use that Tensor again, for example by calling set on it a second time? We would run into the error value used here after move. What does this mean? The error is telling us that we moved the Tensor into set. A move transfers ownership of the value to the tensor parameter of set; when set returns, that parameter goes out of scope and the tensor is dropped, so the caller no longer owns anything it could pass again. C++ would let us keep using a moved-from object (typically left in a valid but unspecified state), whereas Rust luckily rejects this at compile time. What we want instead is to lend the Tensor to set, so that we keep ownership. To do that we pass it by reference, tensor: &Tensor, and since we expect set to mutate the tensor, the reference must be mutable: tensor: &mut Tensor.
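The difference between moving and borrowing can be reproduced without tch. The sketch below uses a hypothetical FakeTensor type (a Vec<f32> wrapper that, like tch::Tensor, is not Copy) and a one-variant Init enum; both names are made up for illustration.

```rust
/// Stand-in for a tensor: it owns heap data and is not `Copy`,
/// so passing it by value moves it (hypothetical type, not tch::Tensor).
#[derive(Debug)]
struct FakeTensor {
    data: Vec<f32>,
}

#[derive(Debug, Clone, Copy)]
enum Init {
    Const(f32),
}

impl Init {
    /// Takes ownership: `tensor` is dropped when this function returns.
    fn set_by_value(self, tensor: FakeTensor) -> usize {
        tensor.data.len()
    }

    /// Borrows mutably: the caller keeps ownership and can call it again.
    fn set(self, tensor: &mut FakeTensor) {
        match self {
            Init::Const(c) => {
                for x in tensor.data.iter_mut() {
                    *x = c;
                }
            }
        }
    }
}

fn demo() -> Vec<f32> {
    let mut t = FakeTensor { data: vec![0.0; 3] };
    Init::Const(1.0).set(&mut t);
    Init::Const(2.0).set(&mut t); // fine: `t` was only borrowed
    let owned = FakeTensor { data: vec![0.0; 3] };
    let _len = Init::Const(1.0).set_by_value(owned);
    // Init::Const(2.0).set_by_value(owned); // error[E0382]: use of moved value: `owned`
    t.data
}
```

Uncommenting the last call reproduces the compile error described above, while the two calls to set through &mut compile without complaint.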
Moving forward, we find another important element, which is simple and makes use of the Init enum: Linear, namely a fully connected neural network layer. Fig. 3 shows how easy it is to set up a fully connected layer, which is made of a weight matrix and a bias vector whose initialisations are given by ws_init and bs_init. By default, weights are initialised with super::Init::KaimingUniform, one of the Init variants we saw above.
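Because a layer config like this implements Default, callers can pass Default::default() or override a single field with struct update syntax. Here is a minimal sketch with a hypothetical LinearConfig carrying ws_init, bs_init and an assumed bias flag; it mirrors the idea but is not tch's actual definition, and Init is reduced to two variants.

```rust
/// Hypothetical, reduced initialisation enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Init {
    Const(f64),
    KaimingUniform,
}

/// Hypothetical layer configuration, loosely mirroring the fields discussed above.
#[derive(Debug, Clone, Copy)]
struct LinearConfig {
    ws_init: Init,
    bs_init: Option<Init>, // None means "no explicit bias initialisation given"
    bias: bool,
}

impl Default for LinearConfig {
    fn default() -> Self {
        LinearConfig {
            ws_init: Init::KaimingUniform, // Kaiming uniform for weights by default
            bs_init: None,
            bias: true,
        }
    }
}
```

For example, LinearConfig { bias: false, ..Default::default() } keeps the Kaiming default for the weights and only switches off the bias.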
The fully connected layer itself is then created with the function linear. As you can see in the function signature, namely in what’s between the <...>, there are a few interesting things (fig.4). Firstly, the lifetime annotation 'a. As we said above, Rust automatically recognises when a variable has gone out of scope and can be freed. Lifetime annotations do not change how long a value lives: they name the lifetimes of references and describe how those lifetimes relate, so that the compiler can check that no reference outlives the data it points to. By convention the first one is called 'a, where the apostrophe ' denotes a lifetime parameter. One important thing to remember is that this signature doesn’t modify anything within the function; it tells the borrow checker to accept only those arguments whose lifetimes satisfy the constraints we are imposing.
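To see what 'a buys us, here is a hypothetical pair of types loosely modelled on a variable store and a path into it (they are not tch's definitions). Path<'a> holds a reference into a VarStore, and sub returns a child Path<'a> with the same lifetime, so the compiler guarantees that no path outlives its store.

```rust
/// Hypothetical store that would own a model's parameters.
struct VarStore {
    device: String,
}

/// `'a` says: a Path may not outlive the VarStore it points to.
struct Path<'a> {
    store: &'a VarStore,
    segments: Vec<String>,
}

impl<'a> Path<'a> {
    fn root(store: &'a VarStore) -> Path<'a> {
        Path { store, segments: Vec::new() }
    }

    /// The child borrows the same store, so it carries the same lifetime `'a`.
    fn sub(&self, name: &str) -> Path<'a> {
        let mut segments = self.segments.clone();
        segments.push(name.to_string());
        Path { store: self.store, segments }
    }

    fn describe(&self) -> String {
        format!("{} on {}", self.segments.join("."), self.store.device)
    }
}
```

If the store were dropped while a path into it was still in use, for example by returning a Path from a function that created its VarStore locally, the code would not compile.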
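The second generic parameter is T: Borrow<super::Path<'a>>. This trait bound means that T can be any type that can be borrowed as an nn::Path, the type specified in var_store.rs. A type in Rust can implement Borrow for several different types; thanks to the standard library's blanket implementations, T here can be a Path<'a> itself or a reference to one, so callers may pass either. The path, received as vs: T, tells the layer where to register its variables in the variable store and, through the store, which device (e.g. CPU or GPU) they live on. Finally, the input and output dimensions of the layer are specified as integers, in_dim: i64, out_dim: i64, along with the LinearConfig for the initialisation of weights and biases, c: LinearConfig.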
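The same shape of bound can be tried with the standard library alone. In this sketch Path is a hypothetical stand-in reduced to a name and a device string (the lifetime parameter is dropped for brevity), and linear_desc only formats a description instead of building a layer.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for nn::Path, reduced to a name and a device string.
struct Path {
    name: String,
    device: String,
}

/// Same kind of bound as `linear`: `T` is anything that borrows as a `Path`,
/// which, via the blanket impls in std, includes `Path`, `&Path` and `&mut Path`.
fn linear_desc<T: Borrow<Path>>(vs: T, in_dim: i64, out_dim: i64) -> String {
    let p: &Path = vs.borrow();
    format!("{}: {} -> {} on {}", p.name, in_dim, out_dim, p.device)
}
```

The caller can write linear_desc(&p, 784, 10) and keep p, or linear_desc(p, 784, 10) and give it away; the function body is identical in both cases.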
It’s time to get our hands dirty and play with tch. Let’s set up a simple linear neural network, then a sequential network, and finally a convolutional neural network, all on the MNIST dataset. As always, you can find all the materials on my ML ❤ Rust repo. Yann LeCun and Corinna Cortes hold the copyright of the MNIST dataset, which has been made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.
A simple neural network in Rust
As always, the first step for a new Rust project is cargo new NAME_OF_THE_PROJECT, in this case simple_neural_networks. Then we can set up Cargo.toml with all the packages we need: mnist, ndarray and, obviously, tch (fig.5). I decided to use mnist to download and extract the original MNIST data, and ndarray to perform some transforms on the image vectors and convert them into tch::Tensor, so we can see how to deal with arrays and tensors. Feel free to use the vision resource already present in tch instead.
Let’s jump to the main.rs code. In a nutshell, we need to:
- download and extract the MNIST images and return a vector each for training, validation and test data;
- convert these vectors to Tensor, so that we can use tch;
- finally, run a series of epochs: in each epoch we multiply the input data by the neural network weight matrix and perform backpropagation to update the weight values.
mnist automatically downloads the input files from here. To activate the download functionality we need to add features = ['download'] to the mnist dependency in Cargo.toml. After the files have been downloaded, the raw data is extracted with download_and_extract() and subdivided into training, validation and test sets. Note that main returns a Result, so that errors can be propagated with the ? operator: its signature is fn main() -> Result<(), Box<dyn Error>>, and the function must end with Ok(()) (fig.6).
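The same return type works for any function, not just main. Here is a minimal sketch with a hypothetical parse_split helper and illustrative sizes: each ? either unwraps an Ok value or converts the error into a Box<dyn Error> and returns early, and a plain string can be boxed into an error with into().

```rust
use std::error::Error;

/// Same return type as `main` above; any error type can be boxed,
/// so `?` works on different error kinds in one function.
fn parse_split(train: &str, val: &str, test: &str) -> Result<(usize, usize, usize), Box<dyn Error>> {
    let train: usize = train.trim().parse()?; // ParseIntError is boxed automatically
    let val: usize = val.trim().parse()?;
    let test: usize = test.trim().parse()?;
    if train == 0 {
        return Err("training set must not be empty".into());
    }
    Ok((train, val, test))
}
```

When main itself returns an Err, Rust prints the error and exits with a non-zero status code.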
Now for the very first Torch step in the code: converting an array to a Tensor. The output data from mnist is a Vec<u8>. The training vector holds TRAIN_SIZE images, each of HEIGHT times WIDTH pixels. These three parameters can be specified as usize and, together with the input data vector, passed to the image_to_tensor function shown in fig.7, which returns a Tensor. The input Vec<u8> is reshaped into an Array3 with from_shape_vec, and the values are normalised and converted to f32 with .map(|x| *x as f32/256.0). From the array it is easy to build a torch Tensor, as shown on line 14: Tensor::of_slice(inp_data.as_slice().unwrap());. The output tensor size will be dim1 x (dim2*dim3): for our training data, with TRAIN_SIZE=50,000, HEIGHT=28 and WIDTH=28, the output training tensor is 50,000 x 784.
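The shape bookkeeping and normalisation can be checked without tch or ndarray. The hypothetical helper below (images_to_matrix is not from the article's repo) validates that the byte buffer holds n_images x height x width pixels, applies the same division by 256, and returns the 2-D shape the tensor should end up with.

```rust
/// Hypothetical helper: check the buffer size, normalise u8 pixels to f32,
/// and report the (images, pixels_per_image) shape of the resulting matrix.
fn images_to_matrix(
    data: &[u8],
    n_images: usize,
    height: usize,
    width: usize,
) -> Result<(Vec<f32>, (usize, usize)), String> {
    let expected = n_images * height * width;
    if data.len() != expected {
        return Err(format!("expected {} bytes, got {}", expected, data.len()));
    }
    // Same normalisation as in the text: divide by 256, so values fall in [0, 1).
    let values: Vec<f32> = data.iter().map(|x| *x as f32 / 256.0).collect();
    Ok((values, (n_images, height * width)))
}
```

For MNIST training data this would report a (50000, 784) shape; the flat f32 values are what ends up in the tensor.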
Similarly, we convert the labels to a tensor of size dim1, so for the training labels we get a tensor of length 50,000: https://github.com/Steboss/ML_and_Rust/blob/aa7d495c4a2c7a416d0b03fe62e522b6225180ab/tutorial_3/simple_neural_networks/src/main.rs#L42
We’re now ready to tackle the linear neural network. We start with a zero-initialisation of the weight and bias matrices, which resembles the PyTorch implementation:

```rust
let mut ws = Tensor::zeros(&[(HEIGHT * WIDTH) as i64, LABELS], kind::FLOAT_CPU).set_requires_grad(true);
let mut bs = Tensor::zeros(&[LABELS], kind::FLOAT_CPU).set_requires_grad(true);
```

Then we can start computing the neural network weights.
Fig.8 shows the main routine that trains the linear neural network. First, we give a name to the outermost for loop with 'train. The apostrophe, in this case, does not mark a lifetime but a loop label, which lets us break out of that specific loop. We monitor the loss at each epoch: if the difference between two consecutive losses is less than THRES, we stop the outermost loop, as we have reached convergence (you may disagree, but for the moment let’s keep it 🙂). The entire implementation is very simple to read; there is just a little caveat in extracting the accuracy from the computed logits, and then the job is done 🙂
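Both ideas can be sketched in plain Rust. The toy function below stands in for the real network with gradient descent on a made-up quadratic loss (w - 3)^2; it stops with a labelled break 'train as soon as the loss changes by less than a threshold. accuracy then takes the argmax of each row of logits as the predicted class. All names and values are illustrative.

```rust
/// Toy training loop: gradient descent on loss(w) = (w - 3)^2, organised in
/// epochs of `batches` steps, stopping once the loss changes by less than `thres`.
fn train_until_converged(lr: f64, thres: f64, max_epochs: usize, batches: usize) -> (f64, usize) {
    let mut w = 0.0_f64;
    let mut prev_loss = (w - 3.0).powi(2);
    let mut epochs_run = 0;
    'train: for epoch in 1..=max_epochs {
        epochs_run = epoch;
        for _batch in 0..batches {
            w -= lr * 2.0 * (w - 3.0); // the gradient of (w - 3)^2 is 2(w - 3)
            let loss = (w - 3.0).powi(2);
            if (prev_loss - loss).abs() < thres {
                // A plain `break` would only leave the batch loop;
                // the label exits the epoch loop directly.
                break 'train;
            }
            prev_loss = loss;
        }
    }
    (w, epochs_run)
}

/// Index of the largest value in a row of logits (the first one on ties).
fn argmax(row: &[f32]) -> usize {
    let mut best = 0;
    for (i, &v) in row.iter().enumerate() {
        if v > row[best] {
            best = i;
        }
    }
    best
}

/// Fraction of rows whose argmax matches the true label.
fn accuracy(logits: &[Vec<f32>], labels: &[usize]) -> f32 {
    let correct = logits
        .iter()
        .zip(labels)
        .filter(|&(row, label)| argmax(row) == *label)
        .count();
    correct as f32 / labels.len() as f32
}
```

With tensors, the same argmax step is applied along the class dimension of the logits before comparing with the labels.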
When you are ready, you can run the entire main.rs code directly with cargo run. On my 2019 MacBook Pro (2.6 GHz 6-core Intel Core i7, 16 GB RAM) the computation takes less than a minute, achieving a test accuracy of 90.45% after 65 epochs.
Sequential neural network
Let’s now see the sequential neural network implementation https://github.com/Steboss/ML_and_Rust/tree/master/tutorial_3/custom_nnet
Fig.9 explains how the sequential network is created. First, we need to import tch::nn::Module. Then we can create a function for the neural network, fn net(vs: &nn::Path) -> impl Module. This function returns a value of some type that implements the Module trait, and receives as input an nn::Path, which tells the network where to register its variables in the variable store and, through it, which hardware to run on (e.g. CPU or GPU). The sequential network is then a combination of a linear layer with IMAGE_DIM inputs and HIDDEN_NODES nodes, a relu, and a final linear layer with HIDDEN_NODES inputs and LABELS outputs.
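To see what returning impl Module means without pulling in tch, here is a dependency-free sketch. The Module trait, Linear, Relu and Sequential types below are hypothetical stand-ins working on Vec<f32> instead of tensors, and the weights are fixed rather than learned; only the overall structure (linear, relu, linear chained in a sequential container) follows the text.

```rust
/// Minimal stand-in for a Module trait (hypothetical, for illustration).
trait Module {
    fn forward(&self, xs: &[f32]) -> Vec<f32>;
}

/// Fully connected layer: out[j] = bs[j] + sum_i xs[i] * ws[i][j].
struct Linear {
    ws: Vec<Vec<f32>>, // in_dim rows, out_dim columns
    bs: Vec<f32>,      // out_dim entries
}

impl Module for Linear {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        self.bs
            .iter()
            .enumerate()
            .map(|(j, b)| b + xs.iter().zip(&self.ws).map(|(x, row)| x * row[j]).sum::<f32>())
            .collect()
    }
}

/// Element-wise ReLU.
struct Relu;

impl Module for Relu {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x.max(0.0)).collect()
    }
}

/// Applies its layers in order, like a sequential container.
struct Sequential {
    layers: Vec<Box<dyn Module>>,
}

impl Sequential {
    fn new() -> Self {
        Sequential { layers: Vec::new() }
    }

    fn add<M: Module + 'static>(mut self, layer: M) -> Self {
        self.layers.push(Box::new(layer));
        self
    }
}

impl Module for Sequential {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        self.layers.iter().fold(xs.to_vec(), |acc, layer| layer.forward(&acc))
    }
}

/// Like `fn net(vs: &nn::Path) -> impl Module`: callers only know that the
/// result implements Module, not its concrete type.
fn net(in_dim: usize, hidden: usize, out_dim: usize) -> impl Module {
    Sequential::new()
        .add(Linear { ws: vec![vec![1.0; hidden]; in_dim], bs: vec![0.0; hidden] })
        .add(Relu)
        .add(Linear { ws: vec![vec![1.0; out_dim]; hidden], bs: vec![-1.0; out_dim] })
}
```

Returning impl Module hides the concrete Sequential type, so the body of net can change without affecting the code that calls it.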
Thus, in the main code we’ll call the neural network creation as:
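```rust
// at the top of main.rs: OptimizerConfig must be in scope to call build()
use tch::{nn, nn::Module, nn::OptimizerConfig, Device};

// set up the variable store, using CUDA if it is available
let vs = nn::VarStore::new(Device::cuda_if_available());
// set up the sequential net
let net = net(&vs.root());
// set up the optimizer
let mut opt = nn::Adam::default().build(&vs, 1e-4)?;
```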
along with an Adam optimizer. Remember the ? at the end of the optimizer line: build returns a Result, and ? extracts the optimizer from it (or propagates the error out of main); without it, opt would be a Result<> type, which doesn't have the methods we need. At this point we can simply follow the same procedure as in PyTorch: we set up a number of epochs and, for a given loss, perform backpropagation with the optimizer’s backward_step method.
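As far as I know, backward_step in tch is a shortcut for zero_grad, backward and step. The step is where Adam does its work. Here is a minimal sketch of the standard Adam update rule on a plain Vec<f64>, assuming the usual defaults beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 (libtorch's implementation may differ in details such as weight decay). The struct is hypothetical and is not tch's optimizer.

```rust
/// Hypothetical Adam optimizer state for a flat vector of parameters.
struct Adam {
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    m: Vec<f64>, // running mean of the gradients
    v: Vec<f64>, // running mean of the squared gradients
    t: i32,      // number of steps taken so far
}

impl Adam {
    fn new(n_params: usize, lr: f64) -> Self {
        Adam {
            lr,
            beta1: 0.9,
            beta2: 0.999,
            eps: 1e-8,
            m: vec![0.0; n_params],
            v: vec![0.0; n_params],
            t: 0,
        }
    }

    /// Applies one update given gradients computed by backpropagation.
    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        self.t += 1;
        for i in 0..params.len() {
            let g = grads[i];
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            // Bias correction compensates for m and v starting at zero.
            let m_hat = self.m[i] / (1.0 - self.beta1.powi(self.t));
            let v_hat = self.v[i] / (1.0 - self.beta2.powi(self.t));
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}
```

On the very first step the bias correction makes m_hat / sqrt(v_hat) equal to the sign of the gradient, so each parameter moves by roughly the learning rate (1e-4 here), whatever the gradient's magnitude.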
Convolutional neural network
Our final step for today is a convolutional neural network: https://github.com/Steboss/ML_and_Rust/tree/master/tutorial_3/conv_nnet/src
First, notice that we are now using nn::ModuleT. This trait adds a train parameter to the forward pass, which is commonly used to make the network behave differently during training and evaluation. Then we can define the structure of the network, Net, which is made of two conv2d layers and two linear ones. The implementation of Net states how the network is made: the first convolutional layer maps 1 input channel to 32 output channels and the second maps 32 channels to 64, both with a kernel size of 5. The first linear layer receives an input of 1024 features, and the final layer returns an output of 10 elements. Finally, we need to define the ModuleT implementation for Net. Here the forward step, forward_t, receives an additional boolean argument, train, and returns a Tensor. The forward step applies the convolutional layers, along with max pooling and dropout. Dropout is only meant for training, so it is tied to the boolean train.
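Two details of this network can be checked with plain arithmetic and a few lines of Rust. Assuming 5x5 kernels with stride 1 and no padding, and 2x2 max pooling with stride 2, a 28x28 MNIST image shrinks to 4x4 after the two conv/pool stages, so 64 channels give 64 x 4 x 4 = 1024 features, matching the input size of the first linear layer. The second function sketches how a train flag switches dropout on and off, using inverted dropout (survivors rescaled by 1/(1-p), as PyTorch does). The function names and the small LCG are illustrative.

```rust
/// Output size along one axis of a convolution or pooling window:
/// floor((size + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1.
fn conv_out(size: usize, kernel: usize, stride: usize, padding: usize, dilation: usize) -> usize {
    (size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1
}

/// Flattened features after conv(5) -> max-pool(2) -> conv(5) -> max-pool(2).
/// Comments show the sizes for a 28x28 input.
fn flattened_features(side: usize, channels: usize) -> usize {
    let s = conv_out(side, 5, 1, 0, 1); // 28 -> 24
    let s = conv_out(s, 2, 2, 0, 1); // 24 -> 12
    let s = conv_out(s, 5, 1, 0, 1); // 12 -> 8
    let s = conv_out(s, 2, 2, 0, 1); // 8 -> 4
    channels * s * s
}

/// Inverted dropout that is active only when `train` is true, mirroring the
/// role of the train flag in forward_t. A tiny LCG replaces a real RNG.
fn dropout_t(xs: &[f32], p: f32, train: bool, seed: u64) -> Vec<f32> {
    if !train || p == 0.0 {
        return xs.to_vec(); // evaluation: identity
    }
    let mut state = seed;
    xs.iter()
        .map(|&x| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            let u = (state >> 40) as f32 / (1u64 << 24) as f32; // uniform in [0, 1)
            if u < p { 0.0 } else { x / (1.0 - p) } // drop or rescale
        })
        .collect()
}
```

Rescaling during training keeps the expected activation unchanged, which is why evaluation can simply skip dropout.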
To make training more efficient, we train the convolutional network on batches drawn from the input tensor. For this reason we need a function that splits the input tensors into random batches: generate_random_index takes the input image array and the batch size we want, and creates an output tensor of random integers with randint, to be used as indices into the input.
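Here is a dependency-free sketch of the idea. This generate_random_index is hypothetical: it returns a Vec<i64> instead of a Tensor and uses a small LCG (with a slight modulo bias) instead of a real random-number generator.

```rust
/// Hypothetical stand-in for generate_random_index: draw `batch_size` indices
/// in 0..array_size with replacement, the way randint does.
fn generate_random_index(array_size: i64, batch_size: usize, seed: u64) -> Vec<i64> {
    let mut state = seed;
    (0..batch_size)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Use the high bits, which are the better-mixed ones in an LCG.
            ((state >> 33) % array_size as u64) as i64
        })
        .collect()
}
```

Like randint, this samples with replacement, so within one epoch some images may be drawn twice and others not at all. Shuffling a permutation of 0..TRAIN_SIZE and slicing it into batches would instead visit every image exactly once per epoch.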
Fig.13 shows the training step. The input dataset is split into n_it batches, where let n_it = (TRAIN_SIZE as i64)/BATCH_SIZE;. For each batch we compute the loss from the network and backpropagate the error with backward_step.
Running the convolutional network on my laptop took a few minutes, achieving a validation accuracy of 97.60%.
You made it! I am proud of you! Today we had a quick look at tch and at how to set up a few computer vision experiments. We saw the inner structure of the code for the initialisation and the linear layer, reviewed some important concepts about borrowing in Rust, and learned what a lifetime annotation is. Then we jumped into the implementation of a simple linear neural network, a sequential neural network and a convolutional one. Along the way we learned how to process input images and convert them to tch::Tensor. We saw how to use the nn::Module trait to implement a forward step for a simple neural network, as well as its extension nn::ModuleT. Across these experiments we saw two ways to perform backpropagation: either with zero_grad and backward, or with backward_step applied directly to the optimizer.
I hope you enjoyed my tutorial 🙂 Stay tuned for the next episode.