Transfer Learning: The Secret Weapon for Boosting Your ML Models | by Dimitris Poulopoulos | Dec, 2022

By Jessie Hobb On Dec 21, 2022

Unleashing the Power of Pretrained Models for Improved Performance and Faster Training Times

The development of Machine Learning models involves training algorithms on large sets of (labeled) data, which can be time-consuming and resource-intensive. Thus, we need techniques, such as distributed training or transfer learning, that allow us to iterate faster and can reduce the time from research to market.

Transfer learning is a powerful technique in machine learning that allows you to leverage the knowledge gained from solving one problem and apply it to a different yet related problem. In other words, transfer learning enables you to “transfer” the knowledge from a previously trained model to a new model, saving you the time and resources required to train a model from scratch.

In recent years, transfer learning has become increasingly popular in the machine learning community, and for good reason. It has been shown to significantly improve the performance of models, particularly when dealing with small datasets or when you want to fine-tune a model for a specific task. It has also been shown to significantly reduce training times, making it an attractive option for organizations looking to deploy machine learning models in a timely manner.

In this article, we will explore the concept of transfer learning in more detail. We’ll cover the basics of how it works in the world of computer vision and NLP and see how you can implement it in your own projects using PyTorch or Keras.

By the end of this article, you’ll have a good understanding of how transfer learning can benefit your ML projects and how you can use it to achieve better results.

Learning Rate is a newsletter for those who are curious about the world of AI and MLOps. You’ll hear from me on the first Saturday of every month with updates and thoughts on the latest AI news and articles. Subscribe here!

Transfer learning is a powerful technique that can be used in computer vision to improve the performance of machine learning models and reduce training times. As we saw before, it involves using the knowledge gained from a previously trained model and applying it to a new, related problem.

In the context of computer vision, transfer learning can be used to fine-tune a pre-trained model on a new dataset or to train a new model using a smaller dataset. This can be particularly useful when dealing with small or specialized datasets, where it may be difficult to train a model from scratch due to a lack of data.

For example, suppose you want to train a model to classify images of animals into specific categories. You might start by using a pre-trained image classification model, such as a convolutional neural network (CNN) trained on ImageNet, as your base. Then, you’d have to change the output layer of the model to comply with the categories or labels in your dataset. This allows you to leverage the knowledge gained from the pre-trained model and apply it to your specific problem.

Alternatively, you could use a smaller dataset to train a new model using transfer learning. In this case, you would start with a pre-trained model and use it as a starting point for your new model. This allows you to train a model with fewer data points, potentially reducing training times and allowing you to deploy your model more quickly.

Transfer learning is also widely used in natural language processing (NLP), a field of machine learning that focuses on the analysis and interpretation of human language. In NLP, transfer learning can be used to improve the performance of models and reduce training times, similar to its use in computer vision.

One common application of transfer learning in NLP is language modeling. Language models are used to predict the next word in a sequence of words, given the context of the previous words. These models are typically trained on large datasets of text, such as books or articles. When you train such a model, you get a system that understands human language pretty well.

The next step is to make the task of such a model more specific; for example, fine-tune it for tasks such as language translation, text generation, and text summarization. Jeremy Howard et al. popularized this technique in a seminal paper for NLP: Universal Language Model Fine-tuning for Text Classification.

Let’s see an example of using transfer learning with PyTorch. We will train a simple image classification model on the CIFAR-10 dataset:

First, we will start by installing PyTorch:

pip install torch
pip install torchvision

Then, download the CIFAR-10 dataset:

import torch
import torchvision
import torchvision.transforms as transformstrain_dataset = torchvision.datasets.CIFAR10(
root='.', train=True, download=True,
transform=transforms.Compose([transforms.ToTensor()]))

Next, we will define a simple convolutional neural network (CNN) to use as our base model. We will use a pre-trained VGG-16 model from the torchvision library as the base model and add a few additional layers on top for classification:

import torch.nn as nn
import torch.optim as optimclass TransferLearningModel(nn.Module):
def __init__(self, num_classes=10):
super(TransferLearningModel, self).__init__()
# Use a pre-trained VGG-16 model as the base
self.base_model = torchvision.models.vgg16(
weights=torchvision.models.VGG16_Weights.DEFAULT)
# Replace the classifier layer with a new one
self.base_model.classifier = nn.Sequential(
nn.Linear(in_features=25088, out_features=4096, bias=True),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(in_features=4096, out_features=4096, bias=True),
nn.ReLU(inplace=True),
nn.Dropout(p=0.5, inplace=False),
nn.Linear(in_features=4096, out_features=num_classes, bias=True))
def forward(self, x):
return self.base_model(x)
model = TransferLearningModel()

We can then use the standard PyTorch API to define a loss function, optimizer, and data loaders for training and evaluation:

# Define a loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# Define a data loader for the training set
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=32, shuffle=True)

Finally, we can train the model using standard PyTorch training loops:

# Train the model
for epoch in range(10):
for inputs, labels in train_loader:
# Clear the gradients
optimizer.zero_grad()# Forward pass
outputs = model(inputs)
loss = criterion(outputs, labels)
# Backward pass
loss.backward()
optimizer.step()

This training loop trains the model for 10 epochs, where an epoch is a complete pass through the training dataset. In each epoch, the model processes the training data in batches, using the optimizer to update the model’s weights based on the computed gradients.

Here is an example of using transfer learning with Keras and TensorFlow to train a simple image classification model on the CIFAR-10 dataset.

First, we will start by installing TensorFlow:

pip install tensorflow

Then, we need to download the CIFAR-10 dataset:

import tensorflow as tf(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

Next, we will define a simple convolutional neural network (CNN) to use as our base model. We will use a pre-trained VGG-16 model from the Keras Applications library as the base model and add a few additional layers on top for classification:

# Load the VGG-16 model
base_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_shape=(32, 32, 3))
# Add a few layers on top of the base model
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(1024, activation='relu'),
tf.keras.layers.Dropout(0.5),
tf.keras.layers.Dense(10, activation='softmax')])
# Freeze the base model's layers
for layer in base_model.layers:
layer.trainable = False

We can then use the standard Keras API to compile the model and define a loss function and optimizer:

# Compile the model
model.compile(loss='sparse_categorical_crossentropy',
optimizer='adam',
metrics=['accuracy'])

Finally, we can train the model using the fit method:

# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

This trains the model for 10 epochs, where an epoch is a complete pass through the training dataset. In each epoch, the model processes the training data in batches, using the optimizer to update the model’s weights based on the computed gradients.

In conclusion, transfer learning is a powerful technique in machine learning that allows you to leverage the knowledge gained from a previously trained model and apply it to a new, related problem. It has been shown to significantly improve the performance of models and reduce training times, making it an attractive option for organizations looking to deploy machine learning models in a timely manner.

Transfer learning is widely used in both computer vision and natural language processing (NLP) for a variety of tasks, including image classification, object detection, language modeling, and text generation.

In this article, we saw what transfer learning is and how you can implement an application of this using either PyTorch or Keras.

My name is Dimitris Poulopoulos, and I’m a machine learning engineer working for Arrikto. I have designed and implemented AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA.

If you are interested in reading more posts about Machine Learning, Deep Learning, Data Science, and DataOps, follow me on Medium, LinkedIn, or @james2pl on Twitter.

Opinions expressed are solely my own and do not express the views or opinions of my employer.