
Hands-on Generative AI with GANs using Python: DCGAN | by Marcello Politi | Apr, 2023



Photo by Vinicius “amnx” Amano on Unsplash

Improving synthetic image generation with convolutional layers in PyTorch

Introduction

In my previous article, we saw how to use GANs to generate MNIST-like images. We achieved good results and succeeded in our intent. However, the two networks, the generator G and the discriminator D, were composed mostly of dense layers. By now you know that when working with images we usually use CNNs (convolutional neural networks), which rely on convolutional layers. So let's see how to improve our GANs by using these types of layers. GANs that use convolutional layers are called DCGANs (Deep Convolutional GANs).

What is transposed convolution?

Typically when we work with CNNs we are used to working with convolutional layers. In this case, though, we also need the "inverse" operation: the transposed convolution, sometimes (imprecisely) called deconvolution.

This operation allows us to upsample the feature map. For example, an image represented by a 5×5 grid can be "enlarged" to 28×28.

What happens is in principle quite simple: you insert zeros between the elements of the initial feature map to enlarge it, then apply a normal convolution with a certain kernel size, stride, and padding.

For example, suppose we want to transform a 5×5 feature map into an 8×8 one. First, by inserting zeros we create a 9×9 feature map; then, by applying a 2×2 filter, we shrink it to 8×8. Let's look at a graphical example.

Transposed Convolution (Image By Author)
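To verify this arithmetic yourself, here is a minimal sketch of my own (not part of the original walkthrough): with kernel_size=2, stride=2 and padding=1, PyTorch's nn.ConvTranspose2d reproduces exactly the 5×5 → 8×8 example above.

import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)  # one single-channel 5x5 feature map

# stride=2 inserts one zero between elements (5x5 -> 9x9),
# padding=1 trims the implicit border, and the 2x2 kernel maps 9x9 -> 8x8
upsample = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=2, stride=2, padding=1)
print(upsample(x).shape)  # torch.Size([1, 1, 8, 8])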

Also in this network, we are going to use batch-normalization layers, which help with the internal covariate shift problem. In a nutshell, they normalize each batch before a layer so that the distribution of the data does not drift during training.
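As a quick illustration of my own (not from the article), a BatchNorm2d layer in training mode brings each batch back to roughly zero mean and unit variance per channel:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=8)                # one mean/variance pair per channel
feature_maps = torch.randn(32, 8, 14, 14) * 5 + 3  # batch with shifted statistics
normalized = bn(feature_maps)
print(normalized.mean().item(), normalized.std().item())  # close to 0 and 1, respectively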

Generator Architecture

The generator will thus be formed by a sequence of transposed convolutional layers that bring the initial random vector z up to the size of the image we want to produce, in this case 28×28. The depth of the feature maps, on the other hand, gets smaller and smaller, the opposite of what happens with convolutional layers.

Generator Architecture (Image By Author)

Discriminator Architecture

The discriminator, on the other hand, is a classical CNN that has to classify images. So we will have a sequence of convolutional layers until we reach a single number: the probability that the input is real rather than fake.

Discriminator Architecture (Image By Author)

Let’s code!

First, check if you have a GPU available on your hardware.

import torch
print(torch.__version__)
print("GPU available: ", torch.cuda.is_available())
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

If you are working on Google Colab, you need to mount your drive. Then import the necessary libraries.

from google.colab import drive
drive.mount('/content/drive/')
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

Now we define the function that creates the generator network G, as described earlier.

def make_generator(input_size: int, n_filters: int) -> nn.Module:
    model = nn.Sequential(
        # (input_size, 1, 1) -> (n_filters*4, 4, 4)
        nn.ConvTranspose2d(in_channels=input_size, out_channels=n_filters*4, kernel_size=4, stride=1, padding=0, bias=False),
        nn.BatchNorm2d(num_features=n_filters*4),
        nn.LeakyReLU(0.2),
        # (n_filters*4, 4, 4) -> (n_filters*2, 7, 7)
        nn.ConvTranspose2d(in_channels=n_filters*4, out_channels=n_filters*2, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(num_features=n_filters*2),
        nn.LeakyReLU(0.2),
        # (n_filters*2, 7, 7) -> (n_filters, 14, 14)
        nn.ConvTranspose2d(in_channels=n_filters*2, out_channels=n_filters, kernel_size=4, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(num_features=n_filters),
        nn.LeakyReLU(0.2),
        # (n_filters, 14, 14) -> (1, 28, 28)
        nn.ConvTranspose2d(in_channels=n_filters, out_channels=1, kernel_size=4, stride=2, padding=1, bias=False),
        nn.Tanh()  # outputs in [-1, 1], matching the normalized MNIST images
    )
    return model

To define the discriminator D, we instead use a Python class, since we need to customize the output in the forward method.

class Discriminator(nn.Module):
    def __init__(self, n_filters):
        super().__init__()
        self.network = nn.Sequential(
            # (1, 28, 28) -> (n_filters, 14, 14)
            nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=4, stride=2, padding=1, bias=False),
            nn.LeakyReLU(0.2),
            # (n_filters, 14, 14) -> (n_filters*2, 7, 7)
            nn.Conv2d(in_channels=n_filters, out_channels=n_filters*2, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(num_features=n_filters*2),
            nn.LeakyReLU(0.2),
            # (n_filters*2, 7, 7) -> (n_filters*4, 4, 4)
            nn.Conv2d(in_channels=n_filters*2, out_channels=n_filters*4, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(num_features=n_filters*4),
            nn.LeakyReLU(0.2),
            # (n_filters*4, 4, 4) -> (1, 1, 1): one probability per image
            nn.Conv2d(in_channels=n_filters*4, out_channels=1, kernel_size=4, stride=1, padding=0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        output = self.network(input)
        return output.view(-1, 1).squeeze(0)

Now we can finally instantiate our G and D networks. Let's also print the models to see a summary of their layers.

z_size = 100
image_size = (28,28)
n_filters = 32

gen_model = make_generator(z_size, n_filters).to(device)
print(gen_model)

disc_model = Discriminator(n_filters).to(device)
print(disc_model)
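As a quick sanity check (my own addition, not in the original code), we can push a random batch through both networks and confirm the shapes match the architecture diagrams:

with torch.no_grad():
    z = torch.randn(16, z_size, 1, 1, device=device)
    fake = gen_model(z)
    print(fake.shape)              # torch.Size([16, 1, 28, 28])
    print(disc_model(fake).shape)  # torch.Size([16, 1])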

As usual, we need to define the loss function and the optimizers if we want to train the networks.

loss_fn = nn.BCELoss()
g_optimizer = torch.optim.Adam(gen_model.parameters(), lr=0.0003)
d_optimizer = torch.optim.Adam(disc_model.parameters(), lr=0.0002)

The input vector z is a random vector drawn from some distribution, in our case either uniform or normal.

def create_noise(batch_size: int, z_size: int, mode_z: str):
    if mode_z == 'uniform':
        # rescale U[0, 1) to U[-1, 1)
        input_z = torch.rand(batch_size, z_size, 1, 1) * 2 - 1
    elif mode_z == 'normal':
        input_z = torch.randn(batch_size, z_size, 1, 1)
    return input_z
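A quick usage example of my own, to confirm the shape and that the uniform mode indeed lands in [-1, 1):

z = create_noise(4, z_size, 'uniform')
print(z.shape)                                   # torch.Size([4, 100, 1, 1])
print(z.min().item() >= -1, z.max().item() < 1)  # True True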

Now let's define the training function for the discriminator D. As in the previous article, D must be trained on both real and fake images. The real images come directly from the MNIST dataset, while for the fake ones we create an input z on the fly, pass it to the generator G, and take G's output. We can create the labels ourselves, knowing they will be all ones for real images and all zeros for fake ones. The final loss is the sum of the loss on the real images and the loss on the fake ones.

def d_train(x):
    disc_model.zero_grad()

    # train discriminator on a real batch
    batch_size = x.size(0)
    x = x.to(device)
    d_labels_real = torch.ones(batch_size, 1, device=device)
    d_proba_real = disc_model(x)
    d_loss_real = loss_fn(d_proba_real, d_labels_real)

    # train discriminator on a fake batch
    input_z = create_noise(batch_size, z_size, mode_z).to(device)
    g_output = gen_model(input_z)
    d_proba_fake = disc_model(g_output)
    d_labels_fake = torch.zeros(batch_size, 1, device=device)
    d_loss_fake = loss_fn(d_proba_fake, d_labels_fake)

    # gradient backprop & optimize D params
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    d_optimizer.step()
    return d_loss.item(), d_proba_real.detach(), d_proba_fake.detach()

The generator's loss is based on the discriminator's output: G has to see whether D figured out that the image is fake. Accordingly, G is trained against "real" labels, since its goal is to make D classify its output as real.

## train generator
def g_train(x):
    gen_model.zero_grad()
    batch_size = x.size(0)
    input_z = create_noise(batch_size, z_size, mode_z).to(device)
    # G wants D to output "real", so we train against labels of ones
    g_labels_real = torch.ones(batch_size, 1, device=device)

    g_output = gen_model(input_z)
    d_proba_fake = disc_model(g_output)
    g_loss = loss_fn(d_proba_fake, g_labels_real)

    # gradient backprop & optimize G params
    g_loss.backward()
    g_optimizer.step()
    return g_loss.item()

We are now ready to import the dataset for training. With PyTorch, importing MNIST is very easy, since there are methods already implemented to do it.

import torchvision
from torchvision import transforms

image_path = './'
transform = transforms.Compose([
    transforms.ToTensor(),
    # map pixel values from [0, 1] to [-1, 1], matching the generator's Tanh output
    transforms.Normalize(mean=(0.5,), std=(0.5,)),
])

mnist_dataset = torchvision.datasets.MNIST(
    root=image_path, train=True,
    transform=transform, download=True
)

Now that we have the dataset we can instantiate the dataloader.

from torch.utils.data import DataLoader

batch_size = 32
dataloader = DataLoader(mnist_dataset, batch_size, shuffle=True)  # shuffle batches each epoch

input_real, label = next(iter(dataloader))
# unlike the dense-layer GAN of the previous article, the convolutional
# discriminator consumes images in their (1, 28, 28) shape, so no flattening is needed

mode_z = 'uniform'
input_z = create_noise(batch_size, z_size, mode_z)

print('input_real shape: ', input_real.shape)
print('input_z shape: ', input_z.shape)

Since at the end of training we would like to see how image generation improved over time, we create a function that generates and saves a batch of images at each epoch.

def create_samples(g_model, input_z):
    g_output = g_model(input_z)
    images = torch.reshape(g_output, (batch_size, image_size[0], image_size[1]))
    # map the Tanh output from [-1, 1] back to [0, 1] for plotting
    return (images + 1) / 2.0
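If you also want to write the samples to disk, torchvision's save_image helper works well; here is a minimal sketch (the file name is my own choice):

from torchvision.utils import save_image

samples = create_samples(gen_model, input_z.to(device))             # values in [0, 1]
save_image(samples.unsqueeze(1).cpu(), 'samples_grid.png', nrow=8)  # restore the channel dim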

Finally, we are ready to start training. Choose the number of epochs: for good results it should be around 100. I only ran 10, so my output will look "uglier".

num_epochs = 10
fixed_z = create_noise(batch_size, z_size, mode_z).to(device)  # fixed noise to track progress across epochs
epoch_samples = []
d_losses = []
g_losses = []

for epoch in range(1, num_epochs + 1):
    gen_model.train()

    for i, (x, _) in enumerate(dataloader):
        d_loss, d_proba_real, d_proba_fake = d_train(x)
        d_losses.append(d_loss)
        g_losses.append(g_train(x))

    print(f'Epoch {epoch:03d} | Avg Losses >>'
          f' G/D {torch.FloatTensor(g_losses).mean():.4f}'
          f'/{torch.FloatTensor(d_losses).mean():.4f}')
    gen_model.eval()
    epoch_samples.append(
        create_samples(gen_model, fixed_z).detach().cpu().numpy()
    )

Training for 100 epochs should take about an hour, though of course that depends a lot on the hardware you have available.
Let's plot the results to see whether the network has learned to generate these synthetic images.

selected_epochs = [1, 2, 4, 6, 10]
fig = plt.figure(figsize=(10, 14))
for i, e in enumerate(selected_epochs):
    for j in range(5):
        ax = fig.add_subplot(6, 5, i * 5 + j + 1)
        ax.set_xticks([])
        ax.set_yticks([])
        if j == 0:
            ax.text(-0.06, 0.5, f'Epoch {e}', rotation=90, size=18, color='red',
                    horizontalalignment='right', verticalalignment='center',
                    transform=ax.transAxes)
        image = epoch_samples[e - 1][j]
        ax.imshow(image, cmap='gray_r')
plt.show()

Synthetic Images (Image By Author)
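We also collected the per-step losses during the loop above; here is a quick sketch of my own to visualize how the adversarial game evolved:

plt.figure(figsize=(10, 4))
plt.plot(g_losses, label='Generator loss')
plt.plot(d_losses, label='Discriminator loss')
plt.xlabel('Training step')
plt.ylabel('BCE loss')
plt.legend()
plt.show()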

In this article we went beyond the simple GAN by adding convolution operations, which are very effective when working with images, thus creating what is called a DCGAN. To generate these synthetic images we built two networks, a generator G and a discriminator D, that play an adversarial game. If this article was helpful to you, follow me for my upcoming articles on generative networks! 😉

Marcello Politi

Linkedin, Twitter, Website



