
Hands-on Generative AI with GANs using Python: Image Generation



Image by Author

Learn how to implement GANs with PyTorch to generate synthetic images

Introduction

In my previous article, we learned about Autoencoders; now let’s continue talking about Generative AI. By now everyone is talking about it, and everyone is excited about the practical applications that have been developed. But here we keep exploring the foundations of these AIs step by step.

There are several Machine Learning models that allow us to build generative AI; to name a few, we have Variational Autoencoders (VAEs), autoregressive models, and even normalizing flow models. In this article, however, we will focus on GANs.

Autoencoders and GANs

In the previous article, we dealt with autoencoders and saw their architecture, their use and implementation in PyTorch.

In short, Autoencoders receive an input x, compress it into a smaller vector z, called the latent vector, and finally reconstruct from z a more or less approximate version of x.

With an Autoencoder there is no data generation, but simply an approximate reconstruction of the input. Now imagine that we break the Autoencoder in two and keep only the second part, the one that reconstructs the image from the latent vector z.

Output Generation (Image By Author)

In this case, we can say that the architecture is generative: given a vector of numbers as input, it creates an image! Essentially, this is what a generative AI does. The main difference with respect to autoencoders, though, is that we know exactly the probability distribution from which the latent vector z is drawn, for example a Gaussian N(0, 1).

We thus have a way to generate images from random numbers drawn from a Gaussian distribution; changing these random numbers changes the images we get as output.

Generative Model (Image By Author)

GANs Architecture

The orange network shown in the previous image can be described as a function G that, given the input z, generates the synthetic output x_cap, so x_cap = G(z).

The network is initialized with random weights, so at first it will not be able to generate output that looks real, only images of noise. We therefore need some training to improve the performance of our network.

So let’s imagine that we have a human annotator telling us each time whether the output is good or not, that is, whether it looks real or not.

Towards GANs (Image By Author)

Obviously, we cannot train a network by expecting a person to continuously judge its outputs. So what can we do?

If you think about it, what the annotator does in this case is binary classification! And in Machine Learning we are great at building classifiers. So we can simply train a classifier, which we’ll call the Discriminator and denote by the function D(), to distinguish synthetic (fake) images from real ones. We will therefore feed it both fake images and real images.

So this is how our architecture changes.

GANs Architecture (Image By Author)

In short, the architecture is not too complex. The difficulty comes when we have to train these two networks, G and D.

Since the two networks have to improve together during training, they need to find some kind of balance. If, for example, D gets too good at distinguishing fake images from real ones before G gets good at generating them, G will never improve and we will never have a generator ready to be used.

So the two networks are said to play an adversarial game in which G must fool D, and D must not be fooled by G.

GANs Objective Function

If we want to be a bit more precise, we can say that D and G have two opposing goals. Let’s suppose we want to generate images.

We denote by D(x) the probability that x is a real image. Obviously, the discriminator wants to maximize its probability of correctly recognizing real inputs. So we want to maximize D(x) when x is drawn from our distribution of real images.

In contrast, the purpose of the generator G is to fool the discriminator. If G(z) is the fake image generated by G, then D(G(z)) is the probability that D classifies a fake image as real, and 1 − D(G(z)) is the probability that D correctly recognizes a fake image as fake. G’s goal is therefore to minimize 1 − D(G(z)), since it wants to fool D.

So in the end, we can sum up this game of maximization and minimization in the formula found in the original paper (the formula looks a bit more compact, but the concept is the one we have just seen):

Objective Function (src: https://arxiv.org/pdf/1406.2661.pdf)
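Written out, the minimax objective from the original paper (Goodfellow et al., 2014) is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

D tries to maximize V(D, G) by pushing D(x) towards 1 on real images and D(G(z)) towards 0 on fakes, while G tries to minimize it by pushing D(G(z)) towards 1.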

GANs Implementation

We will now implement a GAN capable of generating MNIST-like images.

As usual, I will run my code in a cloud-based environment, Deepnote, but you can use Google Colab as well, so even those who don’t have a GPU on their laptop can run this code.

We start by checking whether our hardware actually has a GPU.
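A minimal check, assuming PyTorch is installed, could look like this:

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")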

Now if you’re using Colab you can connect to Google Drive.

from google.colab import drive
# Mount Google Drive at /content/drive/ so data and results persist across sessions
drive.mount('/content/drive/')

Let’s import the needed libraries.
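The original import cell is not shown here; a minimal set that covers everything used in the snippets below would be:

import numpy as np
import matplotlib.pyplot as plt  # for plotting losses and samples later

import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader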

Now we need to create the functions that will define our networks, generator and discriminator.

The MNIST images have 784 pixels (since the images are 28×28). So the generator, given as input a random vector z of length 20, will have to output a vector of 784 values, which will be our fake image.

The discriminator, instead, will receive as input a 28×28 = 784-pixel image, and it will have a single output neuron that classifies the image as real or fake.

Generator (Image By Author)

This function is used to instantiate the generator. Each hidden layer uses a LeakyReLU (a variation of the ReLU that tends to work best in GANs) as its activation function, while the output layer is followed by a Hyperbolic Tangent (Tanh), which outputs numbers in the range [-1, 1].
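Since the code in the image may not transfer here, this is a minimal sketch of such a function; the hidden size of 100 is my assumption:

def make_generator(input_size=20, num_hidden_layers=1, num_hidden_units=100, output_size=784):
    model = nn.Sequential()
    for i in range(num_hidden_layers):
        model.add_module(f'fc_g{i}', nn.Linear(input_size, num_hidden_units))
        model.add_module(f'lrelu_g{i}', nn.LeakyReLU())  # LeakyReLU after each hidden layer
        input_size = num_hidden_units
    # Output layer: one value per pixel, squashed into [-1, 1] by Tanh
    model.add_module('fc_g_out', nn.Linear(input_size, output_size))
    model.add_module('tanh_g', nn.Tanh())
    return model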

Discriminator (Image By Author)

This function, instead, defines the discriminator network, which has the particular feature of using dropout after its hidden layers (in the base case, a single hidden layer). The output goes through a sigmoid function, since it must give us the probability of the image being real or fake.
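A corresponding sketch, again with an assumed hidden size of 100 and a dropout probability of 0.5:

def make_discriminator(input_size=784, num_hidden_layers=1, num_hidden_units=100):
    model = nn.Sequential()
    for i in range(num_hidden_layers):
        model.add_module(f'fc_d{i}', nn.Linear(input_size, num_hidden_units))
        model.add_module(f'lrelu_d{i}', nn.LeakyReLU())
        # Dropout after each hidden layer regularizes the discriminator
        model.add_module(f'dropout_d{i}', nn.Dropout(p=0.5))
        input_size = num_hidden_units
    # Single output neuron with a sigmoid: probability that the image is real
    model.add_module('fc_d_out', nn.Linear(input_size, 1))
    model.add_module('sigmoid_d', nn.Sigmoid())
    return model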

Now we also download the MNIST dataset that we are going to use. MNIST pixel values are in the range [0, 255], but we want them in the range [-1, 1], so that real images look like the data produced by the Generator network (whose Tanh output lies in that range). So we also apply some preprocessing to achieve this.
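A possible version of this step; the batch size of 64 is my assumption:

image_path = './'
# ToTensor() maps pixel values to [0, 1]; Normalize(0.5, 0.5) then maps them to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5,), std=(0.5,)),
])

mnist_dataset = torchvision.datasets.MNIST(
    root=image_path, train=True, transform=transform, download=True
)

batch_size = 64
mnist_dl = DataLoader(mnist_dataset, batch_size=batch_size, shuffle=True, drop_last=True)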

Now we come to the most important part. We need to create the functions that define the training of our networks. We have already said that we should train the discriminator separately from the generator, so we will have two functions.

The discriminator will be trained both on fake data and on real data. When we train it on real data, the labels will always be “real” = 1, so we create a vector of ones with d_labels_real = torch.ones(batch_size, 1, device = device). Then we feed the input x to the model and calculate the loss using Binary Cross Entropy.

We do the same thing with fake data. Here the labels will all be zero: d_labels_fake = torch.zeros(batch_size, 1, device = device). The input, instead, will be the fake data, that is, the output of the generator: g_output = gen_model(input_z). And we calculate the loss in the same way.

The final loss will be the sum of the two losses.
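Putting the pieces described above together, the discriminator step might look like this sketch; it relies on gen_model, disc_model, loss_fn, d_optimizer, and z_size, which we instantiate a little further below:

def d_train(x):
    disc_model.zero_grad()
    batch_size = x.size(0)

    # Real images, labeled 1
    x = x.view(batch_size, -1).to(device)
    d_labels_real = torch.ones(batch_size, 1, device=device)
    d_proba_real = disc_model(x)
    d_loss_real = loss_fn(d_proba_real, d_labels_real)

    # Fake images from the generator, labeled 0
    input_z = torch.randn(batch_size, z_size, device=device)
    g_output = gen_model(input_z)
    d_proba_fake = disc_model(g_output)
    d_labels_fake = torch.zeros(batch_size, 1, device=device)
    d_loss_fake = loss_fn(d_proba_fake, d_labels_fake)

    # The final loss is the sum of the real and fake parts
    d_loss = d_loss_real + d_loss_fake
    d_loss.backward()
    d_optimizer.step()
    return d_loss.item()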

As for the generator’s training function, the implementation is slightly different. The generator’s loss is computed from the discriminator’s output on the generated images, since G has to see whether D recognized its fakes as fake or real; the better it fools D, the lower its loss.
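A sketch of this step, following the same conventions as d_train:

def g_train(x):
    gen_model.zero_grad()
    batch_size = x.size(0)

    input_z = torch.randn(batch_size, z_size, device=device)
    # The generator wants D to label its output as real (1)
    g_labels_real = torch.ones(batch_size, 1, device=device)

    g_output = gen_model(input_z)
    d_proba_fake = disc_model(g_output)
    g_loss = loss_fn(d_proba_fake, g_labels_real)

    g_loss.backward()
    g_optimizer.step()
    return g_loss.item()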

Now we can initialize our two networks.
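For example, with Adam and a learning rate of 0.0002 (both my assumptions; any reasonable optimizer works):

z_size = 20

gen_model = make_generator(input_size=z_size).to(device)
disc_model = make_discriminator().to(device)

loss_fn = nn.BCELoss()
g_optimizer = torch.optim.Adam(gen_model.parameters(), lr=0.0002)
d_optimizer = torch.optim.Adam(disc_model.parameters(), lr=0.0002)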

Let’s define a function to create network-generated samples, so that as we go along we can see how the fake images improve as the training epochs increase.
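One way to do this is to keep a fixed batch of noise vectors and regenerate images from it after every epoch:

fixed_z = torch.randn(batch_size, z_size, device=device)  # reused across epochs

def create_samples(g_model, input_z):
    g_output = g_model(input_z)
    images = torch.reshape(g_output, (input_z.size(0), 28, 28))
    # Map the Tanh output from [-1, 1] back to [0, 1] for display
    return (images + 1) / 2.0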

Now we can finally train the net! We save the losses each time in a list so we can plot them later.
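A sketch of the training loop; the number of epochs is my assumption and can be adjusted to your hardware:

num_epochs = 100
d_losses, g_losses, epoch_samples = [], [], []

for epoch in range(1, num_epochs + 1):
    d_epoch_losses, g_epoch_losses = [], []
    for x, _ in mnist_dl:
        d_epoch_losses.append(d_train(x))
        g_epoch_losses.append(g_train(x))
    d_losses.append(np.mean(d_epoch_losses))
    g_losses.append(np.mean(g_epoch_losses))
    print(f'Epoch {epoch:3d} | D loss {d_losses[-1]:.4f} | G loss {g_losses[-1]:.4f}')

    # Generate samples from the fixed noise to track the generator's progress
    gen_model.eval()
    with torch.no_grad():
        epoch_samples.append(create_samples(gen_model, fixed_z).cpu().numpy())
    gen_model.train()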

The training should take about an hour, depending of course on the hardware you use. But in the end, you can print out your fake data and get something like this.

In my case, I trained for only a few epochs, so the results are not great, but you can already begin to see that the network was learning to generate MNIST-like images.

Fake Data (Image By Author)

In this article, we looked at the architecture of GANs in more detail. We studied their objective function and implemented a network capable of generating images from the MNIST dataset! The operation of these networks is not too complicated, but their training certainly is, since we need to find the balance that allows both networks to learn. If you enjoyed this article, follow me to read the next one on DCGANs.😉

Marcello Politi

Linkedin, Twitter, CV




