
Hands-on Generative AI with GANs using Python: Image Generation



Image by Author

Learn how to implement GANs with PyTorch to generate synthetic images

Introduction

In my previous article, we learned about Autoencoders; now let’s continue talking about Generative AI. By now everyone is talking about it, and everyone is excited about the practical applications that have been developed. But here we keep exploring the foundations of these AIs step by step.

There are several Machine Learning models that allow us to build generative AI; to name a few, we have Variational Autoencoders (VAEs), autoregressive models, and even normalizing flow models. In this article, however, we will focus on GANs.

Autoencoders and GANs

In the previous article, we dealt with autoencoders and saw their architecture, their use and implementation in PyTorch.

In short, Autoencoders receive an input x, compress it into a smaller vector z, called the latent vector, and finally reconstruct from z a more or less approximate version of x.

With an Autoencoder there is no data generation, but simply an approximate reconstruction of the input. Now imagine that we break the Autoencoder in two and keep only the second part, the one that reconstructs the image from the latent vector z.

Output Generation (Image By Author)

In this case, we can say that the architecture is generative: given a vector of numbers as input, it creates an image! Essentially, this is what a generative AI does. The main difference with respect to autoencoders, though, is that we know exactly the probability distribution from which the latent vector z is drawn, for example a Gaussian N(0, 1).

We thus have a way to generate images from random numbers drawn from a Gaussian distribution; changing these random numbers changes the images we get as output.

Generative Model (Image By Author)

GANs Architecture

The orange network shown in the previous image can be described as a function G that, given the input z, generates the synthetic output x_cap, so x_cap = G(z).

The network is initialized with random weights, so at first it will not be able to generate output that looks real, only images of noise. We therefore need some training to improve the performance of our network.

So let’s imagine that we have a human annotator telling us each time whether the output is good or not, that is, whether it looks real or not.

Towards GANs (Image By Author)

Obviously, we cannot train a network by expecting a person to continuously judge its outputs. So what can we do?

If you think about it, what the annotator does in this case is binary classification! And in Machine Learning we are great at building classifiers. So we can simply train a classifier, which we’ll call the Discriminator and denote by the function D(), to distinguish synthetic (fake) images from real ones. We will therefore feed it both fake images and real images.

So this is how our architecture changes.

GANs Architecture (Image By Author)

In short, the architecture is not too complex. The difficulty comes when we have to train these two networks, G and D.

Since the two networks have to improve together during training, they need to find some kind of balance. If, for example, D gets too good at distinguishing fake images from real ones before G gets good at generating them, G will never improve and we will never have a generator ready to be used.

So the two networks are said to play an adversarial game in which G must fool D, and D must not be fooled by G.

GANs Objective Function

If we want to be a bit more precise, we can say that D and G have two opposing goals. Let’s suppose we want to generate images.

We denote by D(x) the probability that x is a real image. Obviously, the discriminator wants to maximize its probability of correctly recognizing real inputs. So we want to maximize D(x) when x is drawn from our distribution of real images.

In contrast, the purpose of the generator G is to fool the discriminator. If G(z) is the fake image generated by G, then D(G(z)) is the probability that D classifies a fake image as real, and 1 − D(G(z)) is the probability that D correctly recognizes a fake image as fake. G’s goal is therefore to minimize 1 − D(G(z)), since it wants to fool D.

So in the end, we can sum up this game of maximization and minimization in the formula found in the original paper (the formula looks a bit more compact, but the concept is the one we have just seen):

Objective Function (src: https://arxiv.org/pdf/1406.2661.pdf)
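Written out, the minimax objective from the original paper (Goodfellow et al., 2014) is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

D tries to maximize V(D, G) by pushing D(x) towards 1 on real images and D(G(z)) towards 0 on fakes, while G tries to minimize it by pushing D(G(z)) towards 1.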

GANs Implementation

We will now implement a GAN capable of generating MNIST-like images.

As usual, I will run my code in a cloud-based environment, Deepnote, but you can use Google Colab as well, so even those who don’t have a GPU on their laptop can run this code.

We start by checking whether our hardware actually has a GPU.
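A minimal check, assuming PyTorch is installed, could look like this:

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")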

Now if you’re using Colab you can connect to Google Drive.

from google.colab import drive
# Mount Google Drive at /content/drive/ so data and results persist across sessions
drive.mount('/content/drive/')

Let’s import the needed libraries.
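The original import cell is not shown here; a minimal set that covers everything used in the snippets below would be:

import numpy as np
import matplotlib.pyplot as plt  # for plotting losses and samples later

import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader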

Now we need to create the functions that will define our networks, generator and discriminator.

The MNIST images have 784 pixels (since the images are 28×28). So the generator, given as input a random vector z of length 20, will have to output a vector of 784 values, which will be our fake image.

The discriminator, instead, will receive as input a 28×28 = 784-pixel image, and it will have a single output neuron that classifies the image as real or fake.

Generator (Image By Author)

This function is used to instantiate the generator. Each hidden layer uses a LeakyReLU (a variation of the ReLU that tends to work best in GANs) as its activation function, while the output layer is followed by a Hyperbolic Tangent (Tanh), which outputs numbers in the range [-1, 1].
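Since the code in the image may not transfer here, this is a minimal sketch of such a function; the hidden size of 100 is my assumption:

def make_generator(input_size=20, num_hidden_layers=1, num_hidden_units=100, output_size=784):
    model = nn.Sequential()
    for i in range(num_hidden_layers):
        model.add_module(f'fc_g{i}', nn.Linear(input_size, num_hidden_units))
        model.add_module(f'lrelu_g{i}', nn.LeakyReLU())  # LeakyReLU after each hidden layer
        input_size = num_hidden_units
    # Output layer: one value per pixel, squashed into [-1, 1] by Tanh
    model.add_module('fc_g_out', nn.Linear(input_size, output_size))
    model.add_module('tanh_g', nn.Tanh())
    return model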

Discriminator (Image By Author)

This function, instead, defines the discriminator network, which has the particular feature of using dropout after its hidden layers (in the base case, a single hidden layer). The output goes through a sigmoid function, since it must give us the probability of the image being real or fake.
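A corresponding sketch, again with an assumed hidden size of 100 and a dropout probability of 0.5:

def make_discriminator(input_size=784, num_hidden_layers=1, num_hidden_units=100):
    model = nn.Sequential()
    for i in range(num_hidden_layers):
        model.add_module(f'fc_d{i}', nn.Linear(input_size, num_hidden_units))
        model.add_module(f'lrelu_d{i}', nn.LeakyReLU())
        # Dropout after each hidden layer regularizes the discriminator
        model.add_module(f'dropout_d{i}', nn.Dropout(p=0.5))
        input_size = num_hidden_units
    # Single output neuron with a sigmoid: probability that the image is real
    model.add_module('fc_d_out', nn.Linear(input_size, 1))
    model.add_module('sigmoid_d', nn.Sigmoid())
    return model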

Now we also download the MNIST dataset that we are going to use. MNIST pixel values are in the range [0, 255], but we want them in the range [-1, 1], so that real images look like the data produced by the Generator network (whose Tanh output lies in that range). So we also apply some preprocessing to achieve this.
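A possible version of this step; the batch size of 64 is my assumption:

image_path = './'
# ToTensor() maps pixel values to [0, 1]; Normalize(0.5, 0.5) then maps them to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5,), std=(0.5,)),
])

mnist_dataset = torchvision.datasets.MNIST(
    root=image_path, train=True, transform=transform, download=True
)

batch_size = 64
mnist_dl = DataLoader(mnist_dataset, batch_size=batch_size, shuffle=True, drop_last=True)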

Now we come to the most important part. We need to create the functions that define the training of our networks. We have already said that we should train the discriminator separately from the generator, so we will have two functions.

The discriminator will be trained both on fake data and on real data. When we train it on real data, the labels will always be “real” = 1, so we create a vector of ones with d_labels_real = torch.ones(batch_size, 1, device = device). Then we feed the input x to the model and calculate the loss using Binary Cross Entropy.

We do the same thing with fake data. Here the labels will all be zero: d_labels_fake = torch.zeros(batch_size, 1, device = device). The input, instead, will be the fake data, that is, the output of the generator: g_output = gen_model(input_z). And we calculate the loss in the same way.

The final loss will be the sum of the two losses.
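Putting the pieces described above together, the discriminator step might look like this sketch; it relies on gen_model, disc_model, loss_fn, d_optimizer, and z_size, which we instantiate a little further below:

def d_train(x):
    disc_model.zero_grad()
    batch_size = x.size(0)

    # Real images, labeled 1
    x = x.view(batch_size, -1).to(device)
    d_labels_real = torch.ones(batch_size, 1, device=device)
    d_proba_real = disc_model(x)
    d_loss_real = loss_fn(d_proba_real, d_labels_real)

    # Fake images from the generator, labeled 0
    input_z = torch.randn(batch_size, z_size, device=device)
    g_output = gen_model(input_z)
    d_proba_fake = disc_model(g_output)
    d_labels_fake = torch.zeros(batch_size, 1, device=device)
    d_loss_fake = loss_fn(d_proba_fake, d_labels_fake)

    # The final loss is the sum of the real and fake parts
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    d_optimizer.step()
    return d_loss.item()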

As for the generator’s training function, the implementation is slightly different. The generator’s loss is computed from the discriminator’s output on the generated images, since G has to see whether D recognized its fakes as fake or real; the better it fools D, the lower its loss.
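A sketch of this step, following the same conventions as d_train:

def g_train(x):
    gen_model.zero_grad()
    batch_size = x.size(0)

    input_z = torch.randn(batch_size, z_size, device=device)
    # The generator wants D to label its output as real (1)
    g_labels_real = torch.ones(batch_size, 1, device=device)

    g_output = gen_model(input_z)
    d_proba_fake = disc_model(g_output)
    g_loss = loss_fn(d_proba_fake, g_labels_real)

    g_loss.backward()
    g_optimizer.step()
    return g_loss.item()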

Now we can initialize our two networks.
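For example, with Adam and a learning rate of 0.0002 (both my assumptions; any reasonable optimizer works):

z_size = 20

gen_model = make_generator(input_size=z_size).to(device)
disc_model = make_discriminator().to(device)

loss_fn = nn.BCELoss()
g_optimizer = torch.optim.Adam(gen_model.parameters(), lr=0.0002)
d_optimizer = torch.optim.Adam(disc_model.parameters(), lr=0.0002)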

Let’s define a function to create network-generated samples, so that as we go along we can see how the fake images improve as the training epochs increase.
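One way to do this is to keep a fixed batch of noise vectors and regenerate images from it after every epoch:

fixed_z = torch.randn(batch_size, z_size, device=device)  # reused across epochs

def create_samples(g_model, input_z):
    g_output = g_model(input_z)
    images = torch.reshape(g_output, (input_z.size(0), 28, 28))
    # Map the Tanh output from [-1, 1] back to [0, 1] for display
    return (images + 1) / 2.0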

Now we can finally train the net! We save the losses each time in a list so we can plot them later.
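A sketch of the training loop; the number of epochs is my assumption and can be adjusted to your hardware:

num_epochs = 100
d_losses, g_losses, epoch_samples = [], [], []

for epoch in range(1, num_epochs + 1):
    d_epoch_losses, g_epoch_losses = [], []
    for x, _ in mnist_dl:
        d_epoch_losses.append(d_train(x))
        g_epoch_losses.append(g_train(x))
    d_losses.append(np.mean(d_epoch_losses))
    g_losses.append(np.mean(g_epoch_losses))
    print(f'Epoch {epoch:3d} | D loss {d_losses[-1]:.4f} | G loss {g_losses[-1]:.4f}')

    # Generate samples from the fixed noise to track the generator's progress
    gen_model.eval()
    with torch.no_grad():
        epoch_samples.append(create_samples(gen_model, fixed_z).cpu().numpy())
    gen_model.train()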

The training should take about an hour, depending of course on the hardware you use. But in the end, you can print out your fake data and get something like this.

In my case, I trained for only a few epochs, so the results are not great, but you can already begin to see that the network was learning to generate MNIST-like images.

Fake Data (Image By Author)

In this article, we looked at the architecture of GANs in more detail. We studied their objective function and implemented a network capable of generating images from the MNIST dataset! The operation of these networks is not too complicated, but their training certainly is, since we need to find the balance that allows both networks to learn. If you enjoyed this article, follow me to read the next one on DCGANs.😉

Marcello Politi

Linkedin, Twitter, CV




