How to use M1 Mac GPU on PyTorch

By Jessie Hobb On Jun 1, 2022

How do the new M1 chips perform with the new PyTorch update?

The release of M1 Macs in November 2020 marked a significant step up in the processing power of Apple machines [1]. Unfortunately, these new features were not integrated into PyTorch until now.

Today’s deep learning models owe a great deal of their exponential performance gains to ever-increasing model sizes. Those larger models require more computations to train and run.


These models are simply too big to be run on CPU hardware, which performs large step-by-step computations. Instead, they need massively parallel computations, like those performed by GPUs.

GPUs use a highly parallel structure, originally designed to process images for visually heavy workloads. They became essential components in gaming for rendering real-time 3D images.

That ability to render 3D images works well with the multi-dimensional computations required in deep learning models. Naturally, GPUs became the go-to architecture for model training and inference.

GPUs are essential for the scale of today’s models. Using CPUs makes many of these models too slow to be useful, which can make deep learning on M1 machines rather disappointing.

TensorFlow has supported GPU acceleration from the outset [2], but TensorFlow is just one of the two dominant libraries for deep learning. PyTorch fell behind in its M1 support. Fortunately, it just caught up.

PyTorch v1.12 introduces GPU-accelerated training on Apple silicon. It comes as a collaborative effort between PyTorch and the Metal engineering team at Apple.

It uses Apple’s Metal Performance Shaders (MPS) as the backend for PyTorch operations. MPS is fine-tuned for each family of M1 chips. In short, this means that the integration is fast.

Training and inference/evaluation using the new MPS backend. Source.

Taking a look at the baselines (using the M1 Ultra chip) demonstrates a ~7x speedup on training and ~14x speedup on inference for the popular BERT model.

Using my own first generation M1 MacBook Pro I unfortunately don’t see the same speedup, particularly when using a batch size of 64 shown above.

BERT inference time across various batch sizes using the base spec M1 MacBook Pro.

Maybe that is down to inefficient code or my comparatively puny base spec MacBook Pro, but I’ll take a 200% speedup any day. Now, rather than looking at charts and numbers let’s see how to use this new MPS-enabled PyTorch.

OS and Python Prerequisites

There are a few things that might trip you up before even getting started. The first is the prerequisites: MPS-enabled PyTorch requires macOS 12.3+ and an ARM Python installation. We can check both of these with:

import platform
platform.platform()

[GOOD] >> macOS-12.4-arm64-arm-64bit
[BAD]  >> macOS-11.8-x86_64-i386-64bit

This shows us two things. The [ 12.4 | 11.8 ] refers to the macOS version, which must be 12.3 or higher. If it isn’t, update your macOS! The other is [ arm64 | x86 ]. We want arm64; if you see x86, we need to create a new ARM environment for Python.
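Reading the platform string by eye works, but the same two checks can be scripted. The following is a minimal sketch, not part of the original walkthrough: `check_prereqs` is a hypothetical helper name, and it assumes the version string has the dotted form returned by `platform.mac_ver()`.

```python
import platform


def check_prereqs(macos_version: str, machine: str) -> list:
    """Return a list of problems; an empty list means both checks pass.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Compare numerically so that "12.10" counts as newer than "12.3".
    parts = [int(p) for p in macos_version.split(".")[:2] if p.isdigit()]
    parts += [0] * (2 - len(parts))  # pad missing components with zeros
    if tuple(parts) < (12, 3):
        problems.append(f"macOS {macos_version or 'unknown'} is older than 12.3")
    if machine != "arm64":
        problems.append(f"Python reports {machine!r}, expected 'arm64'")
    return problems


# platform.mac_ver()[0] is an empty string on non-macOS systems.
print(check_prereqs(platform.mac_ver()[0], platform.machine()))
print(check_prereqs("12.4", "arm64"))   # values from the [GOOD] line
print(check_prereqs("11.8", "x86_64"))  # values from the [BAD] line
```

An empty list means the machine and Python build meet both requirements; anything else names what to fix first.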

If using Anaconda we switch to a terminal window and create a new ARM environment like so:

CONDA_SUBDIR=osx-arm64 conda create -n ml python=3.9 -c conda-forge

Here we set the CONDA_SUBDIR variable to osx-arm64 so that conda uses ARM packages. We then create a new conda environment named (-n) ml, set it to use Python 3.9, and ensure the conda-forge package repository is included in our channels (-c).

(If using another version of Python, check where you installed it from for an ARM version).

With our environment initialized, we activate it with conda activate ml and modify the CONDA_SUBDIR variable to permanently use osx-arm64. Otherwise, we may default back to an incorrect x86 environment for future installs.

conda env config vars set CONDA_SUBDIR=osx-arm64

You may see a message asking you to reactivate the environment for these changes to take effect. If so, switch out of and back into the ml environment with:

conda activate
conda activate ml
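After reactivating, it can be reassuring to confirm from inside Python that the variable actually reached the environment. This is a small sketch under the assumption that conda exports CONDA_SUBDIR and CONDA_DEFAULT_ENV into the activated shell; `conda_env_report` is a hypothetical helper name.

```python
import os


def conda_env_report(environ=None) -> dict:
    """Summarize the conda-related variables visible to this Python process.

    Hypothetical helper for illustration; pass a dict to test it without conda.
    """
    environ = os.environ if environ is None else environ
    subdir = environ.get("CONDA_SUBDIR")
    return {
        "env_name": environ.get("CONDA_DEFAULT_ENV"),  # name of the active env, if any
        "subdir": subdir,
        "pinned_to_arm": subdir == "osx-arm64",
    }


# Report for the current process.
print(conda_env_report())
# Report for the values we expect inside the reactivated ml environment.
print(conda_env_report({"CONDA_DEFAULT_ENV": "ml", "CONDA_SUBDIR": "osx-arm64"}))
```

If `pinned_to_arm` is False inside the ml environment, the config variable has not taken effect yet and the environment likely needs reactivating.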

PyTorch Installation

To get started we need to install PyTorch v1.12. For now, this is only available as a nightly release.

pip3 install -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu

During downloads you should be able to see something like **Downloading torch-1.1x.x. — arm64.whl**. That final **arm64.whl** part is important and tells us we are downloading the correct version.

We will be using the transformers and datasets libraries, which are installed with `pip install transformers datasets`.

Side note: The transformers library uses tokenizers built in Rust (it makes them faster). Because we are using this new ARM64 environment we may get ERROR: Failed building wheel for tokenizers. If so, we install Rust (in the same environment) with:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

And then `pip install transformers datasets` again.

In Python we can confirm MPS is working using torch.has_mps.
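As a slightly fuller check, the sketch below reports whether PyTorch is installed, whether it was built with MPS support, and whether MPS is usable on this machine, then runs a tiny matrix multiply on the selected device. It relies on `torch.backends.mps.is_built()` and `torch.backends.mps.is_available()`, which ship with v1.12 and later; `mps_status` is a hypothetical helper name, and the fall-back to CPU is my assumption about what you would want when MPS is missing.

```python
import importlib.util


def mps_status() -> dict:
    """Report whether PyTorch is installed and whether the MPS backend is usable.

    Hypothetical diagnostic helper for illustration only.
    """
    status = {
        "torch_installed": False,
        "torch_version": None,
        "mps_built": False,
        "mps_available": False,
        "device": "cpu",  # assumed fallback when MPS cannot be used
    }
    if importlib.util.find_spec("torch") is None:
        return status

    import torch

    status["torch_installed"] = True
    status["torch_version"] = torch.__version__
    mps = getattr(torch.backends, "mps", None)  # missing on builds before v1.12
    if mps is not None:
        status["mps_built"] = bool(mps.is_built())
        status["mps_available"] = bool(mps.is_available())
    if status["mps_available"]:
        status["device"] = "mps"
    return status


status = mps_status()
print(status)

if status["torch_installed"]:
    import torch

    device = torch.device(status["device"])
    x = torch.ones(3, 3, device=device)  # tensor allocated on the chosen device
    total = (x @ x).sum().item()         # small matmul to exercise the backend
    print(device, total)
```

If `mps_built` is True but `mps_available` is False, the install is likely fine and the macOS version is the thing to check.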

That’s it for this introduction to the new MPS-enabled PyTorch and how to use it when performing inference and even training with popular models like BERT.

If you’d like to keep up to date with what I’m doing, I post weekly on YouTube, and you can get in touch directly via Discord. I hope to see you around!

*All images are by the author except where stated otherwise*
