Machine Learning and Rust (Part 4): Neural Networks in Torch | by Stefano Bosisio | Aug, 2022

By Jessie Hobb On Aug 17, 2022

Can we use PyTorch in Rust? What are Rust bindings? What’s tch-rs? A look at neural networks in Rust

It’s been a while since we last looked at Rust and its application to Machine Learning. Please scroll down to the bottom for the previous tutorials on ML and Rust. Today I would like to take a step forward and introduce neural networks in Rust. There is a Rust version of Torch, which allows us to create any kind of neural network we want. Bindings are the key to getting Torch into Rust. Bindings rely on foreign function interfaces, or FFIs, which create a bridge between Rust and functions written in another language. Good examples can be found in the Rustonomicon.
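To make the idea of an FFI concrete, here is a minimal sketch that calls `abs` from the C standard library. It has nothing to do with Torch; the wrapper name `c_abs` is made up for illustration, and the `unsafe extern` syntax assumes Rust 1.82 or later.

```rust
use std::ffi::c_int;

// Declare a function that lives in the C standard library.
// `unsafe extern` is the form accepted by Rust 1.82 and later.
unsafe extern "C" {
    fn abs(input: c_int) -> c_int;
}

/// Safe Rust wrapper around the foreign function (hypothetical name).
fn c_abs(x: i32) -> i32 {
    // Calling across the FFI boundary is unsafe: the compiler cannot
    // check the C side, so we vouch for the signature ourselves.
    unsafe { abs(x) }
}
```

Calling `c_abs(-3)` from `main` goes through the C implementation. Tools such as bindgen generate the `extern` declarations for you, and libraries like tch put safe wrappers on top of them.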

To create bindings for C and C++ we can use bindgen, a library that automatically generates Rust FFI bindings. Starting from bindings to the C++ API of PyTorch, Laurent Mazare has given the Rust community a Rustacean version of PyTorch. As the GitHub page says, tch provides thin wrappers around the C++ libtorch. The big advantage is that the library stays very close to the original, so there are no learning barriers to overcome. The core code is quite easy to read.

First of all, let’s have a look at the code. This is the best starting point to get an additional understanding of the Rust infrastructure.

Firstly, to get an idea of Rust FFI we can peek at these files. Most of them are automatically generated, while Laurent and coworkers have put together magnificent pieces of code to connect the C++ Torch APIs with Rust.

Next, we can start reading the core code in src. In particular, let’s have a look at init.rs. After the definition of the enum Init there is a public function, pub fn f_init, which matches on the requested initialisation method and returns the initialised tensor, used for both weights and biases. Here we can learn the use of match, which resembles switch in C and match in Python 3.10. Weight and bias tensors are initialised through random, uniform, Kaiming, or orthogonal methods (fig.1).

Fig.1: match case in Rust, which reflects switch in C and match in Python 3.10
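As a rough illustration of matching on an enum of initialisation methods, here is a small sketch in plain Rust. The `Init` enum, the `init` function and the toy random generator `next_unit` are simplified stand-ins written for this example; they are not the tch code.

```rust
/// Hypothetical, simplified stand-in for an initialisation enum.
#[derive(Debug, Clone, Copy)]
enum Init {
    /// Every element gets the same value.
    Const(f32),
    /// Elements are spread between `lo` and `up`.
    Uniform { lo: f32, up: f32 },
}

/// Tiny linear congruential generator so the sketch needs no external crate.
fn next_unit(state: &mut u64) -> f32 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    // Keep the top 24 bits and scale them into [0, 1).
    ((*state >> 40) as f32) / (1u64 << 24) as f32
}

/// Build `n` values according to the chosen initialisation method.
fn init(method: Init, n: usize, seed: u64) -> Vec<f32> {
    let mut state = seed;
    match method {
        Init::Const(value) => vec![value; n],
        Init::Uniform { lo, up } => (0..n)
            .map(|_| lo + (up - lo) * next_unit(&mut state))
            .collect(),
    }
}
```

Each arm of the `match` handles one variant, and the compiler refuses to build the code if a variant is left uncovered.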

Then, for the enum Init we have the method implementations in impl Init. The implemented method is a setter, pub fn set(self, tensor: &mut Tensor), which is a great example to further appreciate the concepts of ownership and borrowing in Rust:

Fig.2: Implementation of init. Note the &mut Tensor, which is a great example for explaining borrowing in Rust.

We talked about borrowing in our very first tutorial. Now is the right time to understand this concept better. Suppose we had a similar set function:

pub fn set(self, tensor: Tensor){}

In the main code, we could call this function, passing it a Tensor. The Tensor will be set and we will be happy. However, what if we call set on the same Tensor again? Well, we would run into the error value used here after move. What does this mean? The error is telling you that you moved the Tensor into set. A move means that you have transferred ownership of the value to the tensor parameter of set, so the variable in the calling code is no longer valid. When you call set(self, tensor: Tensor) again, you would need to have ownership of the Tensor back in order to set it again. Luckily, Rust rejects this at compile time, whereas C++ would let you use a moved-from object. In Rust, once a value has been moved, it is dropped and its memory freed when its new owner goes out of scope, here at the end of set. Thus, what we want to do is lend the Tensor to set, so that we keep ownership. To do that we need to pass the Tensor by reference, so tensor: &Tensor. Since we expect the Tensor to be mutated we have to add mut, so: tensor: &mut Tensor.
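The difference between moving and borrowing can be sketched with a toy type. The `Tensor` struct and the functions `set_by_value`, `set_by_ref` and `demo` below are hypothetical stand-ins for illustration, not the tch API.

```rust
/// Hypothetical stand-in for a tensor: just a vector of values.
#[derive(Debug, PartialEq)]
struct Tensor {
    data: Vec<f32>,
}

/// Takes ownership: the caller cannot use `tensor` after this call.
fn set_by_value(mut tensor: Tensor, value: f32) -> Tensor {
    tensor.data.iter_mut().for_each(|x| *x = value);
    tensor // hand ownership back, otherwise the tensor is dropped here
}

/// Borrows mutably: the caller keeps ownership and sees the changes.
fn set_by_ref(tensor: &mut Tensor, value: f32) {
    tensor.data.iter_mut().for_each(|x| *x = value);
}

fn demo() -> Tensor {
    let mut t = Tensor { data: vec![0.0; 3] };
    set_by_ref(&mut t, 1.0);
    set_by_ref(&mut t, 2.0); // fine: `t` was only borrowed

    let moved = set_by_value(t, 3.0);
    // set_by_ref(&mut t, 4.0); // error[E0382]: borrow of moved value: `t`
    moved
}
```

Uncommenting the last call to `set_by_ref` makes the compiler reject the program, which is exactly the protection described above.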

Moving forward, we can see another important element, which is simple and makes use of the Init enum: Linear, namely a fully connected neural network layer:

Fig.3: Define the linear structure and implement the Default configuration for it

Fig. 3 shows how easy it is to set up a fully connected layer, whose weights and bias are initialised according to ws_init and bs_init. The default initialisation for the weights is super::Init::KaimingUniform, a method we saw above.
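The pattern of a configuration struct with a `Default` implementation can be sketched as follows. The field names mirror the ones mentioned above, but the types, the default bias initialiser and the helper `zero_weights` are assumptions made for this example.

```rust
/// Hypothetical, reduced version of an initialiser enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Init {
    Const(f64),
    KaimingUniform,
}

/// Hypothetical, reduced version of a linear-layer configuration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LinearConfig {
    ws_init: Init,
    bs_init: Init,
    bias: bool,
}

impl Default for LinearConfig {
    fn default() -> Self {
        LinearConfig {
            ws_init: Init::KaimingUniform,
            bs_init: Init::Const(0.0), // assumed default, for illustration
            bias: true,
        }
    }
}

/// Override a single field and take the rest from `Default`.
fn zero_weights() -> LinearConfig {
    LinearConfig { ws_init: Init::Const(0.0), ..Default::default() }
}
```

The `..Default::default()` syntax is what makes this pattern convenient: callers only spell out the fields they want to change.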

The main fully connected layer can then be created with the function linear. As you can see in the function signature, namely what’s between the <...>, there are a few interesting things (fig.4). Firstly, the lifetime annotation 'a. As we said above, Rust automatically recognises when a variable has gone out of scope and can be freed. We can annotate references with a named lifetime to describe how long they must stay valid. The standard annotation is 'a, where ' denotes a lifetime parameter. One important thing to remember is that this annotation doesn’t change how long anything lives; it tells the borrow checker which constraints the lifetimes of the borrowed values have to satisfy.

Fig.4: function to implement a fully connected neural network layer. In the function signature you can notice a lifetime annotation and a generic variable T which borrows a value from nn::Path

The second parameter is T: Borrow<super::Path<'a>>. This bound means: T can be any type that can be borrowed as the nn::Path defined in var_store.rs. Any type in Rust is free to be borrowed as several different types. This value will be used to define the hardware to run on (e.g. GPU), as you can see with vs: T. Finally, the input and output dimensions of the layer are specified as integers, in_dim: i64 and out_dim: i64, along with the LinearConfig for the initialisation of weights and bias, c: LinearConfig.
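Here is a small sketch of a function that combines a lifetime parameter with a `Borrow` bound. The `Path` struct, `layer_name` and `demo` are invented for illustration and only imitate the shape of the signature discussed above.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for a path: a name that borrows from its store.
struct Path<'a> {
    name: &'a str,
}

/// Accepts anything that can be borrowed as a `Path<'a>`:
/// an owned `Path`, a `&Path`, a `Box<Path>`, and so on.
fn layer_name<'a, T: Borrow<Path<'a>>>(vs: T, in_dim: i64, out_dim: i64) -> String {
    let path: &Path<'a> = vs.borrow();
    format!("{}.linear({}x{})", path.name, in_dim, out_dim)
}

fn demo() -> (String, String) {
    let store_name = String::from("root");
    let path = Path { name: &store_name };
    let by_ref = layer_name(&path, 784, 10); // T = &Path
    let by_value = layer_name(path, 784, 10); // T = Path, moved here
    (by_ref, by_value)
}
```

The same function works whether the caller lends the path or gives it away, which is why this bound shows up in builder-style APIs.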

It’s time to get our hands dirty and play with Torch Rust. Let’s set up a simple linear neural network, then a sequential network, and finally a convolutional neural network using the MNIST dataset. As always you can find all the materials on my ML ❤ Rust repo. Yann LeCun and Corinna Cortes hold the copyright of the MNIST dataset, which has been made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

A simple neural network in Rust

As always, the first step for a new Rust project is cargo new NAME_OF_THE_PROJECT, in this case simple_neural_networks. Then, we can start setting up the Cargo.toml with all the packages we need: we’ll be using mnist, ndarray and obviously tch (fig.5). I decided to use mnist to extract the original MNIST data, so we can see how to transform and deal with arrays and tensors. Feel free to use the vision resource already present in tch.

Fig.5: Cargo.toml for setting up a simple linear neural network.

We’ll be using mnist to download the MNIST dataset, and ndarray to perform some transforms on the image vectors, and convert them into tch::Tensor .

Let’s jump to the main.rs code. In a nutshell, we need:

To download and extract the MNIST images and return a vector for training, validation, and test data.
To convert these vectors to Tensor, so we’ll be able to use tch.
To implement a series of epochs: in each epoch we’ll multiply the input data with the neural network weight matrix and perform backpropagation to update the weight values.

mnist automatically downloads the input files from here. We need to add features = ['download'] in Cargo.toml to activate the download functionality. After the files have been downloaded, the raw data is extracted with download_and_extract() and subdivided into training, validation and test sets. Note that the main function returns no value on success, but it can fail, so you need to specify -> Result<(), Box<dyn Error>> as its return type and end the code with Ok(()) (fig.6).

Fig.6: Download, extract and create training, validation and test sets from mnist::MnistBuilder.
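For readers new to this return type, here is a minimal sketch of a function with the same signature as a fallible main. The helpers `parse_shape` and `run` are made up for this example and are not part of the tutorial code.

```rust
use std::error::Error;

/// Parse "rows,cols" into two numbers; any failure bubbles up through `?`.
fn parse_shape(text: &str) -> Result<(usize, usize), Box<dyn Error>> {
    let (rows, cols) = text.split_once(',').ok_or("expected `rows,cols`")?;
    Ok((rows.trim().parse::<usize>()?, cols.trim().parse::<usize>()?))
}

/// Same shape as a fallible `main`: `()` on success, a boxed error otherwise.
fn run() -> Result<(), Box<dyn Error>> {
    let (rows, cols) = parse_shape("50000, 784")?;
    println!("training tensor: {rows} x {cols}");
    Ok(())
}
```

`Box<dyn Error>` accepts any error type, so different failures (a missing comma, a bad number) can all be returned from the same function with `?`.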

Now, the very first Torch thing of the code: convert an array to Tensor. The output data from mnist is Vec<u8>. The training vector structure has a TRAIN_SIZE number of images, whose dimensions are HEIGHT times WIDTH. These three parameters can be specified as usize type and, together with the input data vector, they can be passed to the image_to_tensor function, as shown in fig.7, returning a Tensor.

Fig.7: image_to_tensor function, given the input data vector, the number of image, height and width, we’ll return a tch::Tensor

The input Vec<u8> data can be reshaped to Array3 with from_shape_vec, and the values are normalised and converted to f32, namely .map(|x| *x as f32/256.0). From an array it is easy to build a torch Tensor, as shown on line 14: Tensor::of_slice(inp_data.as_slice().unwrap());. The output tensor size will be dim1 x (dim2*dim3). For our training data, setting TRAIN_SIZE=50'000, HEIGHT=28 and WIDTH=28, the output training tensor size will be 50'000 x 784.
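The reshaping and scaling step can be imitated in plain Rust, without ndarray or tch. The function `images_to_rows` below is a hypothetical helper written for this sketch; it keeps the 1/256 scaling used in the text.

```rust
/// Turn `n_images * height * width` raw bytes into `n_images` rows of
/// `height * width` floats, scaled by 1/256 as in the text above.
/// Assumes `height * width` is not zero.
fn images_to_rows(raw: &[u8], n_images: usize, height: usize, width: usize) -> Vec<Vec<f32>> {
    let pixels = height * width;
    assert_eq!(raw.len(), n_images * pixels, "unexpected buffer length");
    raw.chunks(pixels)
        .map(|img| img.iter().map(|&p| p as f32 / 256.0).collect())
        .collect()
}
```

For the real training set this would give 50'000 rows of 784 values each, the same shape as the tensor described above.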

Similarly, we’ll convert the labels to a tensor, whose size will be dim1, so for the training labels we’ll have a 50'000 long tensor: https://github.com/Steboss/ML_and_Rust/blob/aa7d495c4a2c7a416d0b03fe62e522b6225180ab/tutorial_3/simple_neural_networks/src/main.rs#L42

We’re now ready to tackle the linear neural network. After a zero-initialisation of the weight and bias matrices:

let mut ws = Tensor::zeros(&[(HEIGHT*WIDTH) as i64, LABELS], kind::FLOAT_CPU).set_requires_grad(true);
let mut bs = Tensor::zeros(&[LABELS], kind::FLOAT_CPU).set_requires_grad(true);

which resembles the PyTorch implementation, we can start computing the neural network weights.

Fig.8: main training functions. For N_EPOCHS we are performing a matmul between input data and weights and biases. Accuracy and loss are computed for each epoch. If the difference between two consecutive losses is less than THRES we stop the learning iterations.

Fig.8 shows the main routine to run the training of a linear neural network. Firstly, we can give a name to the outermost for loop with 'train. The apostrophe, in this case, is not an indicator of a lifetime, but of a loop label. We are monitoring the loss for each epoch. If the difference between two consecutive losses is less than THRES, we stop the outermost loop as we have reached convergence. You can disagree with this criterion, but for the moment let’s keep it 🙂 The entire implementation is super simple to read, with just a little caveat in extracting the accuracy from the computed logits, and the job is done 🙂
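To see a loop label and the early-stopping check in isolation, here is a toy sketch that minimises a one-dimensional function instead of training a network. The objective, the learning rate and the values of N_EPOCHS and THRES are assumptions chosen for this example.

```rust
const N_EPOCHS: usize = 200;
const THRES: f64 = 1e-3;

/// Minimise f(w) = (w - 3)^2 by gradient descent and stop early once two
/// consecutive losses differ by less than `THRES`.
fn train() -> (usize, f64) {
    let mut w = 0.0_f64;
    let mut last_loss = f64::INFINITY;
    let mut epochs_run = 0;

    'train: for epoch in 1..=N_EPOCHS {
        epochs_run = epoch;
        // Several small updates per epoch, like batches in a real training loop.
        for _step in 0..4 {
            let grad = 2.0 * (w - 3.0);
            w -= 0.05 * grad;
            let loss = (w - 3.0).powi(2);
            if (last_loss - loss).abs() < THRES {
                break 'train; // leaves the outer loop, not just this one
            }
            last_loss = loss;
        }
    }
    (epochs_run, w)
}
```

Without the label, `break` would only leave the inner loop and the next epoch would start anyway.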

When you are ready you can directly run the entire main.rs code with cargo run. On my 2019 MacBook Pro (2.6 GHz 6-core Intel Core i7, 16 GB RAM), the computation takes less than a minute, achieving a test accuracy of 90.45% after 65 epochs.

Sequential neural network

Let’s now see the sequential neural network implementation https://github.com/Steboss/ML_and_Rust/tree/master/tutorial_3/custom_nnet

Fig.9 explains how the sequential network is created. Firstly, we need to import tch::nn::Module. Then we can create a function for the neural network, fn net(vs: &nn::Path) -> impl Module. This function returns an implementation of Module and receives as input an nn::Path, which carries structural info about the hardware to use for running the network (e.g. CPU or GPU). Then, the sequential network is implemented as a combination of a linear layer with IMAGE_DIM inputs and HIDDEN_NODES nodes, a relu, and a final linear layer with HIDDEN_NODES inputs and LABELS outputs.

Fig.9: Implementation of Sequential neural network
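To make the data flow of such a network explicit, here is a plain-Rust sketch of the forward pass, linear then relu then linear. It uses vectors instead of tensors, and the `Linear` struct, `relu` and `net_forward` are illustrative names, not the tch types.

```rust
/// A dense layer: `weights[o][i]` connects input `i` to output `o`.
struct Linear {
    weights: Vec<Vec<f32>>,
    bias: Vec<f32>,
}

impl Linear {
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        self.weights
            .iter()
            .zip(&self.bias)
            .map(|(row, b)| row.iter().zip(input).map(|(w, x)| w * x).sum::<f32>() + b)
            .collect()
    }
}

/// Element-wise max(x, 0).
fn relu(xs: Vec<f32>) -> Vec<f32> {
    xs.into_iter().map(|x| x.max(0.0)).collect()
}

/// linear -> relu -> linear, the same ordering as the sequential network.
fn net_forward(hidden: &Linear, output: &Linear, input: &[f32]) -> Vec<f32> {
    output.forward(&relu(hidden.forward(input)))
}
```

In the real network the first layer would have IMAGE_DIM inputs and HIDDEN_NODES outputs, and the second HIDDEN_NODES inputs and LABELS outputs.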

Thus, in the main code we’ll call the neural network creation as:

// set up variable store to check if cuda is available
let vs = nn::VarStore::new(Device::cuda_if_available());
// set up the seq net
let net = net(&vs.root());
// set up optimizer
let mut opt = nn::Adam::default().build(&vs, 1e-4)?;

along with an Adam optimizer. Remember the ? at the end of the opt line, otherwise you’ll be left holding a Result<> type, which doesn’t have the functionality we need. At this point we can simply follow the same procedure as in PyTorch: we set up a number of epochs and perform the backpropagation with the optimizer’s backward_step method for a given loss.

Fig.10: training the sequential neural network for a given number of epochs, N_EPOCHS, and set up the backprop with opt.backward_step(&loss);

Convolutional neural network

Our final step for today is dealing with convolutional neural network: https://github.com/Steboss/ML_and_Rust/tree/master/tutorial_3/conv_nnet/src

Fig.11: Convolutional neural network structure

At first, you can notice we are now using nn::ModuleT. This trait works like Module, but its forward step takes an additional train parameter. This is commonly used to differentiate the behaviour of the network between training and evaluation. Then, we can start defining the structure of the network Net, which is made of two conv2d layers and two linear ones. The implementation of Net states how the network is built: the two convolutional layers have 1 and 32 input channels, 32 and 64 output channels, and a kernel size of 5 and 5 respectively. The linear layers receive an input of 1024 and the final layer returns an output of 10 elements. Finally, we need to define the ModuleT implementation for Net. Here, the forward step forward_t receives an additional boolean argument, train, and returns a Tensor. The forward step applies the convolutional layers, along with max_pool_2d and dropout. The dropout step is only active during training, so it’s bound to the boolean train.
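Where does the 1024 at the input of the linear layers come from? Here is a small worked calculation. It assumes 28x28 input images, 5x5 kernels with stride 1 and no padding, 64 channels out of the second convolution, and a 2x2 max pooling after each convolution; these values are my assumptions for the sketch, so check them against the code in the repository.

```rust
/// Output side length of a convolution or pooling window on a square input.
fn out_size(input: usize, kernel: usize, stride: usize, padding: usize) -> usize {
    (input + 2 * padding - kernel) / stride + 1
}

/// Number of features reaching the first linear layer.
fn flattened_features() -> usize {
    let side = out_size(28, 5, 1, 0); // conv 5x5 -> 24
    let side = out_size(side, 2, 2, 0); // 2x2 max pool (assumed) -> 12
    let side = out_size(side, 5, 1, 0); // conv 5x5 -> 8
    let side = out_size(side, 2, 2, 0); // 2x2 max pool (assumed) -> 4
    64 * side * side // 64 output channels
}
```

Under these assumptions the feature map shrinks to 4x4 with 64 channels, and 64 * 4 * 4 gives the 1024 values fed to the linear layers.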

To increase the training performance, we’ll train the conv-layer with batches from the input tensor. For this reason you need to implement a function to split the input tensors into random batches:

Fig.12: generate random indexes for creating batches from the input pool of images

generate_random_index takes the input image array and the batch size we want to split it into. It creates an output tensor of random integers with Tensor::randint.
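The idea of drawing a batch of random indices can be sketched without tch. The version below returns a `Vec<usize>` instead of a tensor, takes an explicit seed, and uses a toy xorshift generator; all of these are choices made for this example rather than the tutorial’s implementation.

```rust
/// Small xorshift generator so the sketch needs no external crate.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Draw `batch_size` indices in `0..n_images`, with replacement.
/// Assumes `n_images` is greater than zero.
fn generate_random_index(n_images: usize, batch_size: usize, seed: u64) -> Vec<usize> {
    let mut rng = XorShift(seed.max(1)); // xorshift must not start at zero
    (0..batch_size)
        .map(|_| (rng.next_u64() % n_images as u64) as usize)
        .collect()
}
```

The indices are then used to pick the matching images and labels, so that both stay aligned within a batch.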

Fig.13: Training epochs for convolutional neural network. For each epoch we batch through the input dataset and we train the model computing the cross entropy.

Fig.13 shows the training step. The input dataset is split into n_it batches where let n_it = (TRAIN_SIZE as i64)/BATCH_SIZE;. For each batch we compute the loss from the network and back propagate the error with backward_step.
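As a quick check of the batch arithmetic, here is a tiny sketch. TRAIN_SIZE follows the value used earlier, while the BATCH_SIZE of 256 is an assumed value for illustration only.

```rust
const TRAIN_SIZE: usize = 50_000;
const BATCH_SIZE: i64 = 256; // assumed value, for illustration only

/// Number of full batches per epoch and the number of images left over.
fn batches_per_epoch() -> (i64, i64) {
    let n_it = (TRAIN_SIZE as i64) / BATCH_SIZE; // integer division rounds down
    let leftover = (TRAIN_SIZE as i64) % BATCH_SIZE;
    (n_it, leftover)
}
```

With these numbers an epoch consists of 195 batches, and the integer division leaves 80 images unaccounted for. Since the batches are drawn at random, this only fixes how many batches are sampled per epoch.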

Running the convolutional network on my local laptop required a few minutes, achieving a validation accuracy of 97.60%.

You made it! I am proud of you! Today we had a little peek at tch and at how to set up a few computer vision experiments. We saw the inner structure of the code for the initialization and the linear layer. We reviewed some important concepts about borrowing in Rust and we learned what a lifetime annotation is. Then, we jumped into the implementation of a simple linear neural network, a sequential neural network, and a convolutional one. Here we learned how to process input images and convert them to tch::Tensor. We saw how to use nn::Module for a simple neural network to implement a forward step, and we also saw its extension nn::ModuleT. Across these experiments we saw two ways to perform backpropagation: either with zero_grad and backward, or with backward_step applied directly to the optimizer.

I hope you enjoyed my tutorial 🙂 Stay tuned for the next episode.