Techno Blender

Generative Adversarial Learning. From generative to “plus adversarial” | by Arun Jagota | Nov, 2022



From generative to “plus adversarial”

Photo by Germán Kaser on Unsplash

Say we have a data set of real images, such as pictures of lions in various settings. From this data set, we want to machine-learn to generate new images that look like the real ones.

Generative Adversarial Networks, GANs for short, are a compelling approach to this problem.

A GAN comprises two models, a generator and a discriminator. The generator generates synthetic images. The discriminator is trained to distinguish between real and synthetic images.

The generator learns via feedback from the discriminator. The discriminator identifies which of the synthetic images it detected as fakes. The generator should obviously generate fewer images like these, since they are easily detectable fakes.

The generator and the discriminator are locked in battle. They try to outdo each other. The back-and-forth competition improves both.

The process stops once (and if) the generator gets good enough that the discriminator is unable to reliably distinguish between the real and the synthetic images, i.e. it can do no better than random guessing.

Let’s Start With Generative

To appreciate the role the discriminator plays, first let’s leave it out of the mix.

Say we have trained an initial unsupervised generative model on our data set of real images. How realistic are the images it generates? If not realistic enough, how can we machine-learn to improve it?

A natural way to assess the generator’s quality is via a typical unsupervised measure of model goodness, such as how well the model fits a validation set, i.e. a held-out subset of the training set, using maximum likelihood or a simpler criterion such as sum-of-squares.
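To make this concrete, here is a minimal sketch of held-out evaluation, with a one-dimensional Gaussian standing in (as a deliberately simplified assumption) for a trained image model: fit the model on a training split, then score the validation split by average log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real data": 1-D samples standing in for images.
data = rng.normal(loc=5.0, scale=2.0, size=1000)
train, valid = data[:800], data[800:]

# Fit a simple generative model (a Gaussian) to the training split.
mu, sigma = train.mean(), train.std()

def avg_log_likelihood(x, mu, sigma):
    """Average Gaussian log-likelihood of samples x under N(mu, sigma^2)."""
    return np.mean(-0.5 * np.log(2 * np.pi * sigma**2)
                   - (x - mu) ** 2 / (2 * sigma**2))

fit = avg_log_likelihood(valid, mu, sigma)
print(f"held-out average log-likelihood: {fit:.3f}")
```

A better (higher) score means the model fits the held-out data better; as the article argues next, though, a good fit score alone does not guarantee realistic samples.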

In realistic settings, this is not effective enough.

Why not?

Realistic universes are often huge. Training sets are tiny fractions drawn from complex distributions. In fact, the entire population of real images may be a tiny, tiny, tiny fraction of the universe. A random sample from the universe will almost certainly be detectable as a fake.

Take an example: binary images with one million pixels. The universe’s size is two to the power one million. Even a training set of a billion real pictures of lions would be a tiny fraction of it. These images are drawn from a very complex distribution, meaning that a superficial small distortion to a real lion’s image can make it look unreal.
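The arithmetic behind this claim is easy to check. Working in log base 10 avoids constructing the astronomically large integer:

```python
import math

# A binary image with one million pixels has 2**1_000_000 possible values.
universe_bits = 1_000_000
training_set_size = 10**9  # a billion real images

# Compare sizes in log10 rather than materializing the huge number.
log10_universe = universe_bits * math.log10(2)   # about 301,030
log10_train = math.log10(training_set_size)      # 9

print(f"universe has ~10^{log10_universe:.0f} images")
print(f"training set covers ~10^-{log10_universe - log10_train:.0f} of it")
```

So the billion-image training set covers roughly one part in 10^301,021 of the universe — "tiny fraction" is an understatement.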

Let’s elaborate on “superficial small distortion” using a different example.

Say we have a very large data set of actual handwritten 0s and 1s. Say all the images of 1s are made up of straight lines. Say the first version of the generator trained on this data set generates many 1s whose contours are curvy. (Perhaps the generator got confused by the 0s in the data set. They are curvy.)

How can we tell that the generator is not yet good enough because it is generating curvy 1s?

Assessing our model’s fitness on the validation set may not work well here: the validation set does not contain any curvy 1s, so the fit score never penalizes the generator for producing them.

This tells us the following. To evaluate the generator’s fitness, we should also look at the sample of synthetic images it generates and how they relate to the real ones.

Here is an analogy from educational settings. It’s not always easy to tell which students have mastered a topic and which have not without giving tests and evaluating each student’s answers against the correct ones.

Tests in educational settings can also reveal which areas a particular student is currently weak on, to guide towards improvement. The same holds here.

Onto “Plus Discriminative”

So we have concluded that we should evaluate the generator’s quality by comparing the synthetic images it generates against the real ones. How exactly do we make this comparison?

We use a discriminative classifier.

We have a data set of real images and one of the synthetic ones. Combining the two yields a labeled data set for a binary classifier.

Train the binary classifier on this data set and assess its predictive accuracy. (Use a train-test split if needed.)

If the classifier predicts better than random, pick out those images labeled synthetic that it also predicts as synthetic. These are the fakes that are easy to distinguish from the reals. Use this feedback to retrain the generator.

In our example, we would expect that the discriminator will reveal that the generator is generating many curvy 1s which are easily predictable as synthetic.
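A minimal sketch of this discriminator step, under a simplifying assumption: each image is summarized by a single hypothetical “curviness” feature, with real 1s straight (low curviness) and many synthetic 1s curvy (high curviness). A tiny logistic-regression discriminator then separates the two and flags the easy fakes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1-D "curviness" feature per image.
real = rng.normal(0.1, 0.05, size=200)   # real 1s: straight lines
fake = rng.normal(0.6, 0.20, size=200)   # synthetic 1s: often curvy

X = np.concatenate([real, fake])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = synthetic

# Tiny logistic-regression discriminator trained by gradient descent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * X + b)))   # predicted P(synthetic)
    w -= 1.0 * np.mean((p - y) * X)
    b -= 1.0 * np.mean(p - y)

p = 1 / (1 + np.exp(-(w * X + b)))
accuracy = np.mean((p > 0.5) == y)

# Easily detected fakes: labeled synthetic AND predicted synthetic.
easy_fakes = X[(y == 1) & (p > 0.5)]
print(f"accuracy = {accuracy:.2f}, easy fakes found: {len(easy_fakes)}")
```

The `easy_fakes` set is exactly the feedback the text describes: samples the generator should learn to stop producing.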

Why a Discriminative Discriminator?

Let’s reason with our example. It would be great if our classifier were to somehow automatically learn that an image that has curvy lines and some other feature(s) that distinguishes it from a 0 is not real.

A discriminative classifier has a fair chance of learning features that can flush out fakes of this type. A generative classifier probably can’t. It is incapable of learning features that discriminate between the two classes. Especially not “image has curvy lines and some other feature(s) that distinguishes it from a 0”.

That a line is “curvy” is a higher-order feature. The universe of high-order features is superlinearly huge compared to the data universe. We have already noted that the latter is huge. Generative models have difficulty finding the ‘right’ high-order features because they only have model fitness to work with. Discriminative models have a much better chance in view of their use of discriminative learning.

The Recipe

Okay, so now we have the basic recipe. We repeat the following steps.

1. Train the generator.
2. Generate synthetic images from it.
3. Train the discriminator on the real+synthetic images.
4. Identify the regime (if any) in which the generator is weak. Use this as feedback when going back to step 1.

Note that training happens in rounds, with generator training and discriminator training alternating inside a round.

A natural way to implement this recipe is as a repeating pipeline consisting of these four steps. Each step’s input and output are well-defined. We have some freedom to choose its implementation. For instance, we can use any binary classifier for step 3.
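The pipeline’s control flow can be sketched as below. The four “training” functions here are hypothetical stubs (the real ones would train actual models); the point is the round structure and how step 4’s feedback loops back into step 1.

```python
def train_generator(feedback):
    """Stub: return a 'generator' that avoids the flagged weak regime."""
    return {"avoid": feedback}

def generate(generator, n):
    """Stub: produce n synthetic samples."""
    return [f"sample-{i}" for i in range(n)]

def train_discriminator(real, synthetic):
    """Stub: return a 'discriminator' trained on both classes."""
    return {"n_real": len(real), "n_synthetic": len(synthetic)}

def weak_regime(discriminator, synthetic):
    """Stub: identify samples the discriminator easily flags as fake."""
    return synthetic[: len(synthetic) // 2]

real_images = [f"real-{i}" for i in range(8)]
feedback = None
for round_ in range(3):                                          # rounds
    generator = train_generator(feedback)                        # step 1
    synthetic = generate(generator, 8)                           # step 2
    discriminator = train_discriminator(real_images, synthetic)  # step 3
    feedback = weak_regime(discriminator, synthetic)             # step 4

print(f"finished {round_ + 1} rounds")
```

Because each step has well-defined inputs and outputs, any of the stubs can be swapped out — e.g. any binary classifier can serve as `train_discriminator`.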

Incremental Learning And Loss Functions

In this section, we look at the training a bit differently: from the perspective of iterative training and loss functions. This perspective also offers a glimpse of how Generative Adversarial Networks (GANs) operate.

Both the generator and the discriminator learn via feedback from the latest instance of the discriminator. We start by discussing what form this feedback takes.

Let D(x) denote the score that the discriminator D assigns to a datum x. D(x) is low if D believes that x is fake, and high if D believes that x is real.

Now onto the training of the generator G. Assume the discriminator has already been trained from the output of an initial generator combined with the real data set.

Let’s imagine doing the following repeatedly.

1. Sample a synthetic image s from G.
2. If D(s) is low, update G’s parameters so as to reduce the likelihood that G will sample this particular s subsequently.

When in step 2 we say “this particular s”, what we really mean is “images with the same characteristics as this particular s”. In other words, we expect incremental training to generalize: it reduces the likelihood of sampling not only this particular s but also images with characteristics similar to it.
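The two-step sampling loop above can be sketched with a deliberately tiny generator: a categorical distribution over three hypothetical image “types”, with fixed discriminator scores (the “curvy-1” type being an obvious fake).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy generator G: a categorical distribution over three image "types".
# Hypothetical discriminator scores D: "curvy-1" is an obvious fake.
types = ["straight-1", "round-0", "curvy-1"]
D = {"straight-1": 0.9, "round-0": 0.8, "curvy-1": 0.1}
probs = np.array([1/3, 1/3, 1/3])

for _ in range(300):
    s = rng.choice(3, p=probs)   # 1. sample a synthetic image from G
    if D[types[s]] < 0.5:        # 2. D(s) is low: s looks fake,
        probs[s] *= 0.8          #    so reduce G's probability of s
        probs /= probs.sum()     #    and renormalize

print({t: round(p, 3) for t, p in zip(types, probs)})
```

After a few hundred rounds the probability of the detectable-fake type collapses toward zero, while the two plausible types absorb the freed-up mass — the generalization the text describes.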

Next up is discriminator training. For this, we suppose that in addition to the data set of real images, we have a data set of synthetic images from the latest version of the generator. Let’s combine the two. Let x denote a datum and y its label: real or synthetic. Now imagine presenting each instance (x, y) in the labeled data set one by one to the discriminator. If y is real, we update D’s parameters so as to increase D(x). If y is synthetic, we update D’s parameters so as to decrease D(x).
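This per-example update rule is exactly stochastic gradient descent on a logistic discriminator. A minimal sketch, assuming each image is reduced to a single feature:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: one feature per image; real images cluster low, synthetic high.
real = rng.normal(0.0, 0.3, size=100)
synthetic = rng.normal(1.5, 0.3, size=100)

# Labeled data set: y = 1 for real, y = 0 for synthetic.
data = [(x, 1.0) for x in real] + [(x, 0.0) for x in synthetic]
order = rng.permutation(len(data))

w, b, lr = 0.0, 0.0, 0.5

def D(x):
    """Discriminator score in (0, 1); high means 'looks real'."""
    return 1 / (1 + np.exp(-(w * x + b)))

# Present each (x, y) one by one: nudge D(x) up for real, down for synthetic.
for _ in range(20):                    # a few epochs
    for i in order:
        x, y_label = data[i]
        p = D(x)
        w += lr * (y_label - p) * x    # raises D(x) when y is real (y = 1),
        b += lr * (y_label - p)        # lowers it when y is synthetic (y = 0)

print(f"mean D(real) = {np.mean([D(x) for x in real]):.2f}, "
      f"mean D(synthetic) = {np.mean([D(x) for x in synthetic]):.2f}")
```

The update `(y - D(x))` has the desired sign automatically: positive for real examples (pushing D(x) up), negative for synthetic ones (pushing it down).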

The generator learns only from the discriminator’s scores on the synthetic images. The discriminator, by contrast, learns from its scores on both the real and the synthetic images. It would not be good were the discriminator’s learning limited to the synthetic images: it might then assign low scores to all images, even the real ones, which would be bad feedback for training the generator in the next round.

Mode Collapse

There is nothing to prevent the generator from favoring certain modes in the data, in the extreme case just one. Consider our digits example, expanded to 0 through 9. If the generator is having difficulty generating realistic 8s, it may abandon generating 8s altogether. The discriminator is only able to distinguish reals from fakes; it cannot tell that 8s are missing from the synthetic images. The modes are not even labeled in the real data set.
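Since the discriminator cannot flag a missing mode, one common diagnostic (outside the GAN’s own loss) is to label the generated samples with a separate, pre-trained classifier and inspect the class histogram. A sketch, assuming such hypothetical labels are already available:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical digit labels of 1000 generated images, as assigned by a
# separate pre-trained digit classifier (not the GAN's discriminator).
# In this scenario the generator has silently stopped producing 8s.
generated_labels = rng.choice([0, 1, 2, 3, 4, 5, 6, 7, 9], size=1000)

counts = np.bincount(generated_labels, minlength=10)
missing = [d for d in range(10) if counts[d] == 0]
print(f"per-digit counts: {counts.tolist()}")
print(f"missing modes: {missing}")
```

The empty bin makes the collapsed mode visible at a glance — information the real-vs-fake loss alone never surfaces.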

Summary

In this post, we discussed generative adversarial learning in the setting of learning to generate data that closely resembles the data in a given data set.

We started with generative learning and reasoned why it’s unlikely to work adequately well on this task. We then introduced discriminative learning into the mix.

The generator and the discriminator compete with each other. This forces both to improve.

Further Reading

  1. Generative adversarial network — Wikipedia
  2. Lecture 13: Generative Models
  3. https://developers.google.com/machine-learning/gan/generator

