Overfitting, Underfitting, and Regularization
by Cassie Kozyrkov, February 2023


In Part 1, we covered much of the basic terminology as well as a few key insights about the bias-variance formula (MSE = Bias² + Variance), including this misquote from Anna Karenina:

All perfect models are alike, but each unhappy model can be unhappy in its own way.

To make the most of this article, I suggest taking a look at Part 1 to make sure you’re well-situated to absorb this one.
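Quick aside for those who like their formulas executable: here’s a minimal numpy simulation (all numbers invented for illustration, nothing from Part 1) showing the decomposition in action for a deliberately biased estimator of a population mean.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the numbers are reproducible
true_mean = 5.0
n, trials = 20, 100_000

# Draw many datasets and apply a deliberately biased estimator:
# shrink each sample mean toward zero by 20%.
samples = rng.normal(true_mean, 2.0, size=(trials, n))
estimates = 0.8 * samples.mean(axis=1)

mse = np.mean((estimates - true_mean) ** 2)
bias_sq = (estimates.mean() - true_mean) ** 2
variance = estimates.var()

print(f"MSE         = {mse:.3f}")
print(f"Bias² + Var = {bias_sq + variance:.3f}")  # same number, as promised
```

The two printed numbers agree (up to simulation noise), which is the whole point of the formula.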

Under vs over… fitting. Image by the author.

Let’s say you have a model that is as good as you’re going to get for the information you have.

To have an even better model, you need better data. In other words, more data (quantity) or more relevant data (quality).

When I say as good as you’re going to get, I mean “good” in terms of MSE performance on data your model hasn’t seen before. (It’s supposed to predict, not postdict.) You’ve done a perfect job of getting what you can from the information you have — the rest is error you can’t do anything about with your information.

Reality = Best Model + Unavoidable Error
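To see what “unavoidable” means, here’s a tiny sketch (the linear “reality” and noise level are made up for illustration): even if a genie handed you the true model, its MSE on fresh data would bottom out at the noise variance.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100_000)
noise = rng.normal(0, 0.5, x.size)  # the unavoidable error, sd = 0.5
y = 2 * x + 1 + noise               # reality = best model + unavoidable error

best_model_predictions = 2 * x + 1  # cheating: we peeked at reality's recipe
print(np.mean((y - best_model_predictions) ** 2))  # ≈ 0.25, the noise variance
```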

But here’s the problem… we’ve jumped ahead; you don’t have this model yet.

All you have is a pile of old data to learn this model from. Eventually, if you’re smart, you’ll validate this model on data it hasn’t seen before, but first you have to learn the model by finding useful patterns in data and trying to inch closer and closer to the stated objective: an MSE that’s as low as possible.

Unfortunately, during the learning process, you don’t get to observe the MSE you’re after (the one that comes from reality). You only get to compute a shoddy version from your current training dataset.
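How shoddy? Here’s a sketch reusing the same invented setup as above: hand a mildly flexible model its own training data to grade itself on, and it flatters itself relative to fresh data.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, 2 * x + 1 + rng.normal(0, 0.5, n)  # same invented "reality"

x_train, y_train = make_data(30)
x_test, y_test = make_data(100_000)

coeffs = np.polyfit(x_train, y_train, deg=5)  # a mildly flexible polynomial
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"training MSE: {train_mse:.3f}")  # the optimistic, shoddy version
print(f"test MSE:     {test_mse:.3f}")   # the one you actually care about
```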


Oh, and also, in this example “you” are not a human, you’re an optimization algorithm that was told by your human boss to twiddle the dials in the model’s settings until the MSE is as low as it will go.

You say, “Sweet! I can do this!! Boss, if you give me an extremely flexible model with lots of settings to fiddle with (neural networks, anyone?), I can give you a perfect training MSE. No bias and no variance.”

The way to get a better training MSE than the true model’s test MSE is to fit all the noise (errors you have no predictively useful information about) along with the signal. How do you achieve this little miracle? By making the model more complicated. Connecting the dots, essentially.

This is called overfitting. Such a model has an excellent training MSE but a whopper of a variance when you try to use it for anything practical. That’s what you get for trying to cheat by creating a solution with more complexity than your information supports.
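Connecting the dots is easy to demo. In the sketch below (setup invented for illustration), a degree-14 polynomial threads all 15 noisy training points exactly, so its training MSE is essentially zero while its test MSE is a horror show. (numpy may warn about poor conditioning here, which is rather the point.)

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = np.linspace(-1, 1, 15)
y_train = 2 * x_train + 1 + rng.normal(0, 0.5, x_train.size)
x_test = rng.uniform(-1, 1, 100_000)
y_test = 2 * x_test + 1 + rng.normal(0, 0.5, x_test.size)

for deg in (1, 14):  # an honest line vs. connect-the-dots
    c = np.polyfit(x_train, y_train, deg)
    train = np.mean((np.polyval(c, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(c, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train:.4g}, test MSE {test:.4g}")
```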

The boss is too smart for your tricks. Knowing that a flexible, complicated model allows you to score too well on your training set, the boss changes the scoring function to penalize complexity. This is called regularization. (Frankly, I wish we had more regularization of engineers’ antics, to stop them from doing complicated things for complexity’s sake.)

Regularization essentially says, “Each extra bit of complexity is going to cost you, so don’t do it unless it improves the fit by at least this amount…”
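In code, the boss’s new scoring function might look something like this sketch (an L2 complexity tax is one common choice, ridge-style; the knob lam is hypothetical, not anything from this article):

```python
import numpy as np

def penalized_score(coeffs, x, y, lam):
    """Training MSE plus a complexity tax on the polynomial's coefficients."""
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    complexity = np.sum(coeffs ** 2)  # one crude way to measure complexity
    return mse + lam * complexity     # lower is better; lam sets the tax rate
```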

If the boss regularizes too much — getting tyrannical about simplicity — your performance review is going to go terribly unless you oversimplify the model, so that’s what you end up doing.

This is called underfitting. Such a model has an excellent penalized training score (mostly because of all the simplicity bonuses it won) but a whopper of a bias in reality. That’s what you get for insisting that solutions should be simpler than your problem requires.
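Here’s the whole drama in one self-contained sketch (all values invented for illustration): with no complexity tax, the optimizer picks the fanciest polynomial on offer; with a tyrannical tax, it picks the flattest one, no matter what the data says.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 30)
y = 2 * x + 1 + rng.normal(0, 0.3, x.size)  # reality: a line plus noise

def penalized_score(coeffs, lam):
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return mse + lam * np.sum(coeffs ** 2)  # fit plus complexity tax

for lam in (0.0, 0.1, 100.0):  # no tax, a reasonable tax, a tyrannical tax
    degree = min(range(6),
                 key=lambda d: penalized_score(np.polyfit(x, y, d), lam))
    print(f"lam = {lam:>5}: winning polynomial degree = {degree}")
```

Somewhere between “no tax” and “tyranny” sits a rate that lands on (or near) the honest straight line; finding it is exactly the tuning game regularization sets up.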

And with that, we’re ready for Part 3, where we bring it all together and cram the bias-variance tradeoff into a convenient nutshell for you.

If you had fun here and you’re looking for an entire applied AI course designed to be fun for beginners and experts alike, here’s the one I made for your amusement:

Here are some of my favorite 10-minute walkthroughs:


