Cutout, Mixup, and Cutmix: Implementing Modern Image Augmentations in PyTorch | by Leonie Monigatti | Apr, 2023

By Jessie Hobb On Apr 14, 2023

Data augmentation techniques for Computer Vision implemented in Python

Cutmix image augmentation (Background image drawn by the author, artificial photograph of statue generated with DALLE)

In most cases, applying data augmentations will improve the performance of your neural network. Augmentations are a regularization technique that artificially expands your training data and helps your Deep Learning model generalize better.

Image augmentations can improve the model performance

Classical image augmentation techniques for convolutional neural networks in computer vision include scaling, cropping, flipping, and rotating an image.

In a recent article about intermediate Deep Learning techniques, we learned that the most effective image augmentation techniques aside from the classical ones are Cutout, Mixup, and Cutmix.

Data Augmentation Techniques: Mixup, Cutout, Cutmix

This article will briefly describe the above image augmentations and their implementations in Python for the PyTorch Deep Learning framework.

This tutorial will use a toy example of a “vanilla” image classification problem. The task is to classify images of tulips and roses:

Toy dataset [1] for image classification: Roses or tulips.

Insert your data here! To follow along in this article, your dataset should look something like this:

Toy dataset [1] for image classification. Insert your data here.
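The ExampleDataset class in this article reads its inputs from data['file_paths'] and data['labels'], so one plausible layout is a pandas DataFrame with those two columns. The following is a minimal sketch under that assumption; the file paths and the label encoding (0 for roses, 1 for tulips) are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical file paths and labels; replace them with your own data.
train_df = pd.DataFrame({
    "file_paths": [
        "data/roses/rose_001.jpg",
        "data/roses/rose_002.jpg",
        "data/tulips/tulip_001.jpg",
        "data/tulips/tulip_002.jpg",
    ],
    # Integer class indices (assumed encoding: 0 = rose, 1 = tulip)
    "labels": [0, 0, 1, 1],
})

print(train_df.head())
```

Integer class indices are used here because nn.CrossEntropyLoss() expects them as targets.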

This tutorial uses PyTorch (version 1.11.0), OpenCV (version 4.5.4), and albumentations (version 1.3.0).

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import torch.utils.data as data_utils

import cv2
import numpy as np
import albumentations as A
from albumentations.pytorch import ToTensorV2

The PyTorch Dataset loads an image with OpenCV.

class ExampleDataset(Dataset):
    def __init__(self,
                 data,
                 transform = None):
        self.file_paths = data['file_paths'].values
        self.labels = data['labels'].values
        self.transform = transform

    def __len__(self):
        return len(self.file_paths)

    def __getitem__(self, idx):
        # Get file_path and label for index
        label = self.labels[idx]
        file_path = self.file_paths[idx]
        # Read an image with OpenCV
        image = cv2.imread(file_path)
        # Convert the image to RGB color space.
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Apply augmentations
        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']
        return image, label

The PyTorch DataLoader then partitions the dataset into batches of 8 images each for this example. The basic image transformation resizes the images to 256 by 256 pixels.

transforms = A.Compose([
    A.Resize(256, 256), # Resize images
    ToTensorV2()])

example_dataset = ExampleDataset(train_df,
                                 transform = transforms)
train_dataloader = DataLoader(example_dataset,
                              batch_size = 8,
                              shuffle = True,
                              num_workers = 0)

Below you can see the training code for a vanilla image classification problem. All distracting code parts are omitted to simplify the code and only bring attention to the relevant parts.

Since we have a binary classification problem, we will be using nn.CrossEntropyLoss(). This is noteworthy because we will be implementing a custom loss function later.

# Define device, model, optimizer, learning rate scheduler
device = ...
model = ...
criterion = nn.CrossEntropyLoss()
optimizer = ...
scheduler = ...

for epoch in range(NUM_EPOCHS):
    # Train
    model.train()
    # Define any variables for metrics
    ...
    # Iterate over data
    for samples, labels in train_dataloader:
        samples, labels = samples.to(device), labels.to(device)
        # Normalize
        samples = samples / 255
        # Zero the parameter gradients
        ...
        with torch.set_grad_enabled(True):
            # Forward: Get model outputs and calculate loss
            output = model(samples)
            loss = criterion(output, labels)
            # Backward: Optimize, step optimizer and calculate predictions
            ...
    # Step scheduler
    ...
    # Calculate any statistics
    ...
    # Validate
    ...

Cutout [2] was introduced in a paper called “Improved regularization of convolutional neural networks with cutout.” by DeVries & Taylor in 2017.

Brief description

The core idea behind Cutout image augmentation is to randomly remove a square region of pixels in an input image during training.

Cutout can prevent overfitting by forcing the model to learn more robust features.
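To make the idea concrete, here is a minimal NumPy sketch that zeroes out one random square in an image. The function name cutout() and its parameters are chosen for illustration only and are not part of any library. As a simplification, this sketch keeps the square fully inside the image.

```python
import numpy as np

def cutout(image, size, rng=None):
    """Zero out one randomly placed square of side `size` in an HxWxC image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Pick the top-left corner so that the square lies fully inside the image
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    out = image.copy()  # leave the input image untouched
    out[top:top + size, left:left + size] = 0
    return out

image = np.full((256, 256, 3), 255, dtype=np.uint8)  # dummy all-white image
augmented = cutout(image, size=128, rng=np.random.default_rng(0))

# Fraction of pixels that were removed: 128 * 128 / (256 * 256)
removed = (augmented == 0).all(axis=-1).mean()
print(removed)  # 0.25
```

With a 128 by 128 square on a 256 by 256 image, a quarter of the pixels are removed, which illustrates why Cutout can hide important features in sparse images.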

Strengths:

Weaknesses:

Can remove important features, especially in sparse images

Implementation in Python with PyTorch

Luckily, Cutout is available in Albumentations, an image augmentation library. You can use the CoarseDropout class (a Cutout class was available in earlier library versions but has been deprecated).

transforms_cutout = A.Compose([
    A.Resize(256, 256),
    A.CoarseDropout(max_holes = 1, # Maximum number of regions to zero out. (default: 8)
                    max_height = 128, # Maximum height of the hole. (default: 8)
                    max_width = 128, # Maximum width of the hole. (default: 8)
                    min_holes=None, # Minimum number of regions to zero out. (default: None, which equals max_holes)
                    min_height=None, # Minimum height of the hole. (default: None, which equals max_height)
                    min_width=None, # Minimum width of the hole. (default: None, which equals max_width)
                    fill_value=0, # Value for dropped pixels.
                    mask_fill_value=None, # Fill value for dropped pixels in mask.
                    always_apply=False,
                    p=0.5 # Probability of applying the transform
                    ),
    ToTensorV2(),
])

The returned sample batch looks as follows:

Cutout image augmentation applied to sample batch.

Mixup [4] was introduced in a paper called “mixup: Beyond empirical risk minimization” by Zhang, Cisse, Dauphin, & Lopez-Paz also in 2017.

Brief description

The core idea behind Mixup image augmentation is to mix a random pair of input images and their labels during training.

Mixup can prevent overfitting by creating more diverse training samples and thus forcing the model to learn more generalizable features invariant to small changes in the images.
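As a small worked example, the sketch below blends two dummy images and their one-hot labels with a fixed mixing coefficient lam. The 2 by 2 "images", the label encoding, and the value of lam are assumptions made purely for illustration.

```python
import numpy as np

lam = 0.7  # mixing coefficient, fixed here for illustration

# Two dummy 2x2 grayscale "images" with one-hot labels (assumed: 0 = rose, 1 = tulip)
image_a, label_a = np.full((2, 2), 1.0), np.array([1.0, 0.0])
image_b, label_b = np.full((2, 2), 0.0), np.array([0.0, 1.0])

# The same weights are applied to the pixels and to the labels
mixed_image = lam * image_a + (1 - lam) * image_b
mixed_label = lam * label_a + (1 - lam) * label_b

print(mixed_image)  # every pixel equals 0.7
print(mixed_label)  # [0.7 0.3]
```

The mixed sample is 70 % rose and 30 % tulip, both in its pixels and in its label.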

Strengths:

Weaknesses:

Can create blurred images, especially for images with complex textures

Implementation in Python with PyTorch

You must implement a mixup() function to apply Mixup image augmentation to your Deep Learning training pipeline. The following code was originally taken from this Kaggle Notebook by Riad and modified for this article.

The mixup() function applies Mixup to a full batch. The pairs are generated by shuffling the batch and selecting one image from the original batch and one from the shuffled batch.

# Copied and edited from https://www.kaggle.com/code/riadalmadani/fastai-effb0-base-model-birdclef2023
def mixup(data, targets, alpha):
    # Pair each image with a randomly chosen partner from the same batch
    indices = torch.randperm(data.size(0))
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]

    # Draw the mixing coefficient from a Beta(alpha, alpha) distribution
    lam = np.random.beta(alpha, alpha)
    new_data = data * lam + shuffled_data * (1 - lam)
    new_targets = [targets, shuffled_targets, lam]
    return new_data, new_targets

In addition to the function that augments the images and labels, we must modify the loss function with a custom mixup_criterion() function. This function returns the loss for the two labels according to the lam.
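One way to see why a weighted sum of two losses is the right choice: for cross-entropy, it gives the same value as a single loss computed against the mixed (soft) labels. The NumPy sketch below checks this numerically. The helper functions log_softmax() and cross_entropy() are written here for illustration and assume mean reduction, which is the default of nn.CrossEntropyLoss(); the logits and labels are dummy values.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, targets):
    """Mean cross-entropy for integer class targets."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 2))   # dummy model outputs: 4 samples, 2 classes
targets1 = np.array([0, 1, 1, 0])  # labels of the original batch
targets2 = np.array([1, 1, 0, 0])  # labels of the shuffled batch
lam = 0.8

# Weighted sum of two hard-label losses, mirroring mixup_criterion()
loss_weighted = (lam * cross_entropy(logits, targets1)
                 + (1 - lam) * cross_entropy(logits, targets2))

# Single loss against the mixed (soft) labels
soft = lam * np.eye(2)[targets1] + (1 - lam) * np.eye(2)[targets2]
loss_soft = -(soft * log_softmax(logits)).sum(axis=1).mean()

print(np.isclose(loss_weighted, loss_soft))  # True
```

The weighted-sum form has the practical advantage that the targets can stay integer class indices.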

# Copied and edited from https://www.kaggle.com/code/riadalmadani/fastai-effb0-base-model-birdclef2023
def mixup_criterion(preds, targets):
    targets1, targets2, lam = targets[0], targets[1], targets[2]
    criterion = nn.CrossEntropyLoss()
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)

The mixup() and mixup_criterion() functions are not applied in the PyTorch Dataset but in the training code, as shown below.

Since the augmentation is applied to the full batch, we will also add a variable p_mixup that controls the portion of batches that will be augmented. E.g. p_mixup = 0.5 would apply Mixup augmentation to 50 % of batches in an epoch.

for epoch in range(NUM_EPOCHS):
    # Train
    model.train()
    # Define any variables for metrics
    ...
    for samples, labels in train_dataloader:
        samples, labels = samples.to(device), labels.to(device)
        # Normalize
        samples = samples / 255
        ############################
        # Apply Mixup augmentation #
        ############################
        p = np.random.rand()
        if p < p_mixup:
            samples, labels = mixup(samples, labels, 0.8)
        # Zero the parameter gradients
        ...
        with torch.set_grad_enabled(True):
            # Forward: Get model outputs and calculate loss
            output = model(samples)
            ############################
            # Apply Mixup criterion    #
            ############################
            if p < p_mixup:
                loss = mixup_criterion(output, labels)
            else:
                loss = criterion(output, labels)
            # Backward: Optimize, step optimizer and calculate predictions
            ...
    # Step scheduler, Calculate any statistics, validate
    ...

The returned sample batch looks as follows:

Mixup image augmentation applied to sample batch.

Cutmix [3] was introduced in a paper called “Cutmix: Regularization strategy to train strong classifiers with localizable features.” by Yun, Han, Oh, Chun, Choe & Yoo in 2019.

Brief description

The core idea behind Cutmix image augmentation is to randomly select a pair of input images during training, cut a random patch of pixels from the first image and paste it onto the second image, and then mix their labels proportionally to the area of the patch.

Cutmix can prevent overfitting by forcing the model to learn more robust and discriminative features.
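To illustrate what "proportionally to the area of the patch" means, the sketch below derives a patch size from a mixing coefficient lam and checks the resulting label weights. Here, lam is assumed to be the share of pixels that remain from the image receiving the patch; the image size and the value of lam are arbitrary.

```python
import numpy as np

height, width = 256, 256
lam = 0.75  # share of the pixels kept from the image that receives the patch

# The patch covers (1 - lam) of the area, so each side is scaled by sqrt(1 - lam)
cut_ratio = np.sqrt(1.0 - lam)
cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)

# Label weight of the pasted image equals the patch's share of the area
patch_share = (cut_h * cut_w) / (height * width)

print(cut_h, cut_w)     # 128 128
print(patch_share)      # 0.25
print(1 - patch_share)  # 0.75
```

If the patch is clipped at the image border, its real area is smaller than planned, which is why the label weight is best recomputed from the actual patch size.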

Cutmix combines the strengths and weaknesses of Cutout and Mixup:

Strengths:

Weaknesses:

Can create unrealistic images due to unnatural compositions
Can remove important features, especially in sparse images

Implementation in Python with PyTorch

The implementation for Cutmix is similar to the implementation of Mixup.

First, you will also need a custom function cutmix() that applies the image augmentation. The following code was originally taken from this Kaggle Notebook by Riad and modified for this article.

# Copied and edited from https://www.kaggle.com/code/riadalmadani/fastai-effb0-base-model-birdclef2023
def cutmix(data, targets, alpha):
    # Pair each image with a randomly chosen partner from the same batch
    indices = torch.randperm(data.size(0))
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]

    lam = np.random.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox(data.size(), lam)
    # Paste the patch from the shuffled batch (note: modifies data in place)
    data[:, :, bbx1:bbx2, bby1:bby2] = shuffled_data[:, :, bbx1:bbx2, bby1:bby2]
    # adjust lambda to exactly match pixel ratio
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (data.size()[-1] * data.size()[-2]))
    new_targets = [targets, shuffled_targets, lam]
    return data, new_targets

def rand_bbox(size, lam):
    # Extents of the two spatial dimensions of the batch (dims 2 and 3)
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)
    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)
    # Clip the box so that it stays inside the image
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)
    return bbx1, bby1, bbx2, bby2

The rest is the same as for Mixup:

Define a cutmix_criterion() function to handle the custom loss (see the implementation of mixup_criterion())
Define a variable p_cutmix to control the portion of batches that will be augmented (see p_mixup)
Apply cutmix() and cutmix_criterion() according to p_cutmix in the training code (see the implementation of Mixup)
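As a minimal sketch of the first step, cutmix_criterion() could simply mirror mixup_criterion(), with lam now being the area ratio returned by cutmix(). This is an assumption based on the description above, not code from the original notebook; the random logits and labels only serve as a smoke test.

```python
import torch
import torch.nn as nn

def cutmix_criterion(preds, targets):
    """Weight the loss for both label sets by the area ratio lam."""
    targets1, targets2, lam = targets[0], targets[1], targets[2]
    criterion = nn.CrossEntropyLoss()
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)

# Dummy check with random logits for a batch of 8 images and 2 classes
preds = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
shuffled_labels = labels[torch.randperm(8)]

loss = cutmix_criterion(preds, [labels, shuffled_labels, 0.75])
print(loss.item())
```

In the training loop, the two calls would then sit behind an `if p < p_cutmix:` check in the same places where mixup() and mixup_criterion() are used.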

The returned sample batch looks as follows:

Cutmix image augmentation applied to sample batch.

This article has summarized three modern, effective data augmentation techniques for computer vision:

Cutout [2]: randomly remove a square region of pixels in an input image
Mixup [4]: mix a random pair of input images and their labels
Cutmix [3]: randomly select a pair of input images, cut a random patch of pixels from the first image and paste it to the second image, and then mix their labels proportionally to the area of the patch.

Data Augmentation Techniques: Mixup, Cutout, Cutmix (Image by the author)

While Cutout applies the augmentation to a single image, Mixup and Cutmix create a new image from a pair of input images.

All of the discussed image augmentation techniques are relatively easy to implement. For Cutout, the Albumentations library has an implementation available out of the box. Mixup and Cutmix each require an augmentation function and a custom loss function.

