Combating Overfitting with Dropout Regularization | by Rohan Vij | Mar, 2023

By Jessie Hobb On Mar 3, 2023

Discover the Process of Implementing Dropout in Your Own Machine Learning Models

Overfitting is a common challenge that most of us have encountered, or will eventually encounter, when training and using a machine learning model. Researchers have been trying to combat overfitting for as long as machine learning has existed. One technique they came up with is dropout regularization, in which neurons in the model are temporarily removed at random during training. In this article, we will explore how dropout regularization works, how you can implement it in your own model, as well as its benefits and disadvantages when compared to other methods.

What is Overfitting?

Overfitting is when a model fits its training data too closely, leading it to perform poorly on new data. In striving to be as accurate as possible, the model focuses too much on fine details and noise within its training dataset. These attributes are often not present in real-world data, so the model tends to perform poorly on it. Overfitting can occur when a model has too many parameters relative to the amount of data. This can lead the model to hyper-focus on smaller details that are not relevant to the general patterns the model must learn. For example, suppose a complex model (many parameters) is trained to identify whether a horse is present in a picture or not. In that case, it might start focusing on details about the sky or environment rather than the horse itself. This can happen when:

The model is too complex (has too many parameters) for its own good.
The model is trained for too long.
The dataset the model was trained on is too small.
The model is trained and tested on the same data, which also hides the problem because the test score still looks good.
The dataset the model is trained on has repetitive features that make it prone to overfitting.
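
To make the first and third causes concrete, here is a minimal sketch that uses polynomial curve fitting as a stand-in for a neural network. The data, the noise level, and the polynomial degrees are all assumptions chosen for illustration; the point is only that a model with many parameters can fit a small training set more closely than a simple one, which says nothing about how it behaves on unseen data.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Hypothetical data: a straight line plus random noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(12)    # a deliberately small training set
x_test, y_test = make_data(200)     # unseen data from the same source

simple_fit = np.polyfit(x_train, y_train, deg=1)    # 2 parameters
complex_fit = np.polyfit(x_train, y_train, deg=9)   # 10 parameters

train_mse_simple = mse(simple_fit, x_train, y_train)
train_mse_complex = mse(complex_fit, x_train, y_train)
test_mse_simple = mse(simple_fit, x_test, y_test)
test_mse_complex = mse(complex_fit, x_test, y_test)

print("train MSE (simple, complex):", train_mse_simple, train_mse_complex)
print("test MSE  (simple, complex):", test_mse_simple, test_mse_complex)
```

The complex fit always matches the training points at least as closely as the simple one. Comparing the two test errors is what reveals whether that extra closeness was real learning or just memorized noise.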

Why is Overfitting Important?

Overfitting is more than a simple annoyance — it can destroy entire models. It gives the illusion that a model is performing well, even though it will have failed to make proper generalizations about the data provided.

Overfitting can have extremely serious consequences, especially in fields such as healthcare, where AI is becoming increasingly prevalent. An AI that was neither properly trained nor properly tested because of overfitting can produce incorrect diagnoses.

Dropout as a Regularization Technique

Ideally, the best way to combat overfitting would be to train many models with different architectures on the same dataset and then average their outputs. The problem with this approach is that it is incredibly resource- and time-intensive. While it might be affordable with relatively small models, large models that take a long time to train could easily overwhelm anyone’s resources.
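
As a rough illustration of why averaging helps, the sketch below simulates ten "models" as the true labels plus independent random noise. These are hypothetical stand-ins, not trained networks, and the noise level is an assumption; the sketch only shows the averaging step itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground-truth labels for 500 examples
truth = rng.integers(0, 2, size=500).astype(float)

# Stand-ins for 10 separately trained models: each one predicts
# the truth plus its own independent noise (an assumption)
n_models = 10
predictions = truth + rng.normal(scale=0.4, size=(n_models, truth.size))

# Error of each model on its own
individual_mse = np.mean((predictions - truth) ** 2, axis=1)

# Error after averaging the ten outputs for every example
ensemble_prediction = predictions.mean(axis=0)
ensemble_mse = float(np.mean((ensemble_prediction - truth) ** 2))

print("average individual MSE:", float(individual_mse.mean()))
print("ensemble MSE:", ensemble_mse)
```

The squared error of the averaged prediction can never exceed the average of the individual squared errors. The catch, as noted above, is that a real ensemble requires training every one of those models.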

Dropout works by “dropping” neurons from the input or hidden layers during training. The dropped neurons are temporarily removed from the network, meaning they practically do not exist for that training step: their incoming and outgoing connections are ignored as well. This artificially creates a multitude of smaller, less complex networks that share the same weights. As a result, the model cannot become solely dependent on one neuron; it has to diversify its approach and develop multiple ways to achieve the same result. For instance, going back to the horse example, if one neuron is primarily responsible for detecting the tree in the image, dropping it forces the model to focus more on other features. Dropout can also be applied directly to the input neurons, meaning that entire features go missing for that step.

Applying Dropout to a Neural Network

Dropout is applied to a neural network by randomly dropping neurons in the chosen layers (the hidden layers and, optionally, the input layer). A pre-defined dropout rate determines the chance of each neuron being dropped. For example, a dropout rate of 0.25 means that each neuron has a 25% chance of being dropped. A fresh random set of neurons is dropped at every training step, and dropout is switched off when the trained model makes predictions.
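
The mechanics can be sketched in a few lines of NumPy. The `dropout` function below is a hypothetical helper written for this article, not a library API. It follows the "inverted dropout" convention, in which the surviving activations are scaled up by 1 / (1 - rate) so that their expected sum stays the same; the Keras documentation describes the same scaling for its Dropout layer.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Hypothetical helper: zero each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        # At prediction time every neuron stays active
        return activations
    keep_prob = 1.0 - rate
    # True where a neuron survives, False where it is dropped
    mask = rng.random(activations.shape) < keep_prob
    # Scale the survivors so the expected total activation is unchanged
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
hidden = np.ones((1, 8))  # pretend output of an 8-neuron hidden layer

train_out = dropout(hidden, rate=0.25, training=True, rng=rng)
infer_out = dropout(hidden, rate=0.25, training=False, rng=rng)

print("training: ", train_out)
print("inference:", infer_out)
```

During training, each entry is either 0 (dropped) or 1 / 0.75 (kept and rescaled). At inference the activations pass through untouched.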

Keep in mind that there is no ideal dropout value — it heavily depends on the hyperparameters and end goal of the model.

Dropout and Sexual Reproduction

Think back to your freshman biology class: you probably covered meiosis, the cell division behind sexual reproduction. During meiosis, genes are shuffled and recombined, and random mutations can occur. This means that the resulting offspring might have traits that neither parent shows. This randomness, over time, allows populations of organisms to become more suited to their environment. This process is called evolution, and without it, we would not exist today.

Both dropout and sexual reproduction seek to increase diversity and stop a system from becoming reliant on one set of parameters, with no room for improvement.

Dataset

Let’s start with a dataset that might be prone to overfitting:

# Columns: has tail, has face, has green grass, tree in background, has blue sky | is a horse image (1) or not (0)
survey = np.array([
    [1, 1, 1, 1, 1, 1], # tail, face, green grass, tree, blue sky | is a horse image
    [1, 1, 1, 1, 1, 1], # tail, face, green grass, tree, blue sky | is a horse image
    [0, 0, 0, 0, 0, 0], # no tail, no face, no green grass, no tree, no blue sky | is not a horse image
    [0, 0, 0, 0, 0, 0], # no tail, no face, no green grass, no tree, no blue sky | is not a horse image
])

This data ties back to our example of the horse and its environment. We have abstracted the qualities of the image into a simple format that is easy to interpret. As can be clearly seen, the data is not ideal: every image with a horse in it also happens to contain a tree, green grass, and a blue sky. They might be in the same picture, but one does not influence the other.
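
A quick check makes the problem visible. The sketch below rebuilds the same array and tests, column by column, whether a feature is identical to the label in every row. The `feature_names` list is added here only for printing.

```python
import numpy as np

survey = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])
feature_names = ["tail", "face", "green grass", "tree", "blue sky"]

X = survey[:, :-1]  # the five feature columns
y = survey[:, -1]   # the label column

# For each feature column, is it equal to the label in every single row?
matches_label = (X == y[:, None]).all(axis=0)

for name, same in zip(feature_names, matches_label):
    print(f"{name}: identical to label = {same}")
```

Every column matches the label perfectly, so from the model's point of view "blue sky" is exactly as good a predictor of a horse as "tail". Nothing in this data tells it which features actually matter.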

The MLP Model

Let’s quickly create a simple MLP using Keras:

# Imports
from keras.models import Sequential
from keras.layers import Dense, Dropout
import numpy as np

# Columns: has tail, has face, has green grass, tree in background, has blue sky | is a horse image (1) or not (0)
survey = np.array([
    [1, 1, 1, 1, 1, 1], # tail, face, green grass, tree, blue sky | is a horse image
    [1, 1, 1, 1, 1, 1], # tail, face, green grass, tree, blue sky | is a horse image
    [0, 0, 0, 0, 0, 0], # no tail, no face, no green grass, no tree, no blue sky | is not a horse image
    [0, 0, 0, 0, 0, 0], # no tail, no face, no green grass, no tree, no blue sky | is not a horse image
])

# Define the model
model = Sequential([
    Dense(16, input_dim=5, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model
X = survey[:, :-1] # the five feature columns
y = survey[:, -1] # the label column
model.fit(X, y, epochs=1000, batch_size=1)

# Test the model on a new example
test_example = np.array([[1, 1, 0, 0, 0]])
prediction = model.predict(test_example)
print(prediction)

I highly recommend using Python notebooks such as Jupyter Notebook to organize your code so you can quickly rerun cells without having to retrain the model. Split the code along each comment.

Let’s further analyze the data we are testing the model with:

test_example = np.array([[1, 1, 0, 0, 0]])

Essentially, we have an image with all the attributes of a horse, but without any of the environmental factors we included in the data (green grass, tree, blue sky). In the run shown here, the model outputs:

0.02694458

Ouch! Even though the image has a face and a tail, which are what we are using to identify the horse, the model is only 2.7% sure that it is a horse image.

Implementing Dropout in an MLP

Keras makes implementing dropout, among other methods to prevent overfitting, shockingly simple. We just have to go back to the list containing the layers of the model:

# Define the model
model = Sequential([
    Dense(16, input_dim=5, activation='relu'),
    Dense(8, activation='relu'),
    Dense(1, activation='sigmoid')
])

And add some dropout layers!

# Define the model
model = Sequential([
    Dense(16, input_dim=5, activation='relu'),
    Dropout(0.5),
    Dense(8, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

Now the model outputs:

0.98883545

It is 99% sure that the horse image, even though it does not contain the environmental variables, is a horse!

The Dropout(0.5) line indicates that, at each training step, each neuron in the layer above it has a 50% chance of being “dropped,” or temporarily removed, as far as the following layers are concerned. By implementing dropout, we have essentially trained many smaller sub-networks of the MLP in a resource-efficient manner.
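
To get a feel for how many sub-networks are involved, here is a small back-of-the-envelope sketch. It assumes the two hidden layers of our MLP (16 and 8 neurons) and the training setup used earlier (4 samples, batch_size=1, 1000 epochs, so 4000 training steps), and it simulates the random masks with NumPy rather than reading them out of Keras.

```python
import numpy as np

hidden_units = [16, 8]          # neurons followed by a Dropout layer
droppable = sum(hidden_units)   # 24 neurons that can each be on or off

# Every on/off combination of those neurons is a different sub-network
possible_subnetworks = 2 ** droppable
print("possible sub-networks:", possible_subnetworks)

# 4 samples with batch_size=1 give 4 steps per epoch, for 1000 epochs
steps = 4 * 1000

# Simulate one random keep/drop mask per training step (rate 0.5)
rng = np.random.default_rng(0)
masks = rng.random((steps, droppable)) < 0.5

# Count how many different masks actually showed up
distinct_masks = len({mask.tobytes() for mask in masks})
print("training steps:", steps)
print("distinct sub-networks sampled:", distinct_masks)
```

With 24 droppable neurons there are 16,777,216 possible sub-networks, so nearly every training step ends up updating a different one. All of them share the same weights, which is why this costs no more than training a single model.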

Choosing a Dropout Rate

The best way to find the ideal dropout rate for your model is through trial and error; there is no one-size-fits-all value. Start with a low dropout rate, around 0.1 or 0.2, and slowly increase it until you reach your desired accuracy on data the model was not trained on. Using our horse MLP, a dropout rate of 0.05 results in the model being 16.5% confident that the image is that of a horse. On the other hand, a dropout rate of 0.95 simply drops out too many neurons for the model to function well, and the confidence only reaches 54.1%. These values are not appropriate for this model, but that does not mean they cannot be the right fit for others.
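
Some simple arithmetic helps explain why the extreme rates behave so differently. The two helper functions below are hypothetical and written for this article; they assume each neuron is dropped independently, and they use the hidden layer sizes of our MLP (16 and 8).

```python
def expected_active(units, rate):
    """Average number of neurons left active in a layer at one training step."""
    return units * (1.0 - rate)

def prob_layer_silenced(units, rate):
    """Chance that every neuron in a layer is dropped at the same step."""
    return rate ** units

for rate in (0.05, 0.5, 0.95):
    for units in (16, 8):
        active = expected_active(units, rate)
        silenced = prob_layer_silenced(units, rate)
        print(f"rate={rate:<4} units={units:<2} "
              f"expected active={active:.2f} "
              f"P(all dropped)={silenced:.4f}")
```

At a rate of 0.95, the 8-neuron layer keeps less than half a neuron on average, and it is completely silenced in roughly two-thirds of training steps, so very little signal gets through. At 0.05, fewer than one neuron per layer is dropped on average, which leaves the network almost unchanged.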

Let’s recap: dropout is a powerful technique used in machine learning to prevent overfitting and improve overall model performance. It does this by randomly “dropping” neurons from the input and hidden layers during training. This allows the classifier to train hundreds to thousands of unique sub-networks in one training session, preventing it from hyper-focusing on certain features.

In the coming articles, we will discover new techniques used in the field of machine learning as an alternative or addition to dropout. Stay tuned for more!