
Optimizing the Random Seed. Is there any point in doing that? | by Agnis Liukis | Aug, 2022



Is there any point in doing that?

At first glance, the answer might seem obvious. But let’s not rush to conclusions before digging a little deeper.

Photo by Markus Spiske on Unsplash

I’ll start with a short story to create the context.

Random Seed optimization in practice

About 7 years ago, I started learning Data Science, and at about the same time I also started competing in Machine Learning competitions on the Kaggle platform. I knew only a little bit of theory and didn’t have much experience building real models. As a beginner in Machine Learning, I did exactly what many beginners do on Kaggle: I took a high-scoring public example script and tried to modify the model’s parameters in the hope of improving the result.

It was an XGBoost model with about 10 parameters. At that time, I didn’t know much about the meaning of each parameter, so I simply started changing each of them in both directions, increasing and decreasing the original value. After each change I re-trained the model and submitted the new result, comparing the score to the previous one. Depending on the results, I kept moving the parameter value in the better direction, with each iteration slightly improving the score.

One of the parameters of this model was the random seed. Obviously, changing the random seed changes the CV score. It can become a little lower or a little higher, depending on luck. Or, more precisely, on randomness. This gives us the possibility to optimize the random seed: we can find the seed value for which the CV score is the best. This is exactly what I did in that competition, finding that the best random seed for me was 51.
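To make this concrete, here is a minimal sketch of that kind of seed search. It uses scikit-learn and a synthetic dataset as stand-ins for the original XGBoost model and competition data, which I no longer have:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy data standing in for the competition dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Try a handful of seeds and keep the one with the best mean CV score.
scores = {}
for seed in range(5):
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    scores[seed] = cross_val_score(model, X, y, cv=3).mean()

best_seed = max(scores, key=scores.get)
```

Whichever seed wins here wins only by the luck of the random draw, which is exactly the question we turn to next.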

But now comes the main question of this whole article.

Does it make sense?

Does it make sense to optimize the random seed? In other words, does a better CV score for one seed mean the model is genuinely better than the same model trained with another random seed that yields a lower CV score?

Let’s try to find the answer to this question.

As I already mentioned, at first glance the answer might seem obvious. By changing the random seed, we simply get random changes in the CV score. Therefore, we can assume that these CV changes don’t mean anything, and a model with a better CV score due purely to a different random seed is no better than the same model trained with the original seed.

If you are not yet convinced, there are more arguments suggesting that random seed optimization won’t help. Take any model and find the random seed value with the best CV score. Then make any slight change to the model: add one new feature, change some other hyper-parameter, or even simply shuffle the training rows around. Then check whether the previously found best random seed still gives the best CV score. Almost certainly, it won’t. This means there is no single best random seed.

So, are we done? Optimizing random seed doesn’t make any sense?

Photo by Nadine Shaabana on Unsplash

Well, not really. Let’s dig a bit deeper.

Digging deeper

First, let’s understand what exactly the random seed does. Machine Learning models use it for various tasks, in all places where randomness is required. For tree-based models, randomness is needed to ensure that the trees are not all the same and have different splits. The more diverse the trees in a tree-based model, the better the result should be.

We can use various techniques to make trees more diverse. For example, we can use sub-sampling, dropping some rows or some features (columns). In each iteration, a different random subset of rows or columns is removed, which ensures that the model creates a different tree each time.
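As an illustration (using scikit-learn decision trees rather than any particular boosting library), limiting the number of features considered at each split makes the seed decide which splits the tree can even see:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# max_features=3: each split considers a random 3-feature subset,
# so the random seed determines which splits are available to the tree.
t1 = DecisionTreeClassifier(max_features=3, random_state=1).fit(X, y)
t2 = DecisionTreeClassifier(max_features=3, random_state=2).fit(X, y)

# Feature index used at the root split of each tree; with a small
# random subset per split, these typically differ between seeds.
root1 = t1.tree_.feature[0]
root2 = t2.tree_.feature[0]
```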

So, the random seed impacts (in a random fashion) which trees will be created. Now the question is: can one tree be better than another?

Yes, it can. For example, imagine we have two features. One contains a true signal, while the other contains mostly noise. It might happen that in one tree the true feature is removed by sub-sampling, making it unavailable when the model considers the best possible splits, so the model splits on the noisy feature first. In some other tree, the noisy feature is removed instead, meaning the first split is made on the strong feature. In this situation, the second tree, built on the true strong feature, will generally be better.
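This two-feature situation is easy to reproduce with synthetic data. The sketch below (my own toy construction, not the article’s dataset) fits one depth-1 tree that can only see the signal feature and one that can only see the noise feature:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)   # carries the true signal
noise = rng.normal(size=n)    # pure noise, independent of the target
y = (signal > 0).astype(int)  # target depends only on the signal feature

# A stump that sees only the signal feature vs. one that sees only noise.
stump_signal = DecisionTreeClassifier(max_depth=1).fit(signal.reshape(-1, 1), y)
stump_noise = DecisionTreeClassifier(max_depth=1).fit(noise.reshape(-1, 1), y)

acc_signal = stump_signal.score(signal.reshape(-1, 1), y)  # near perfect
acc_noise = stump_noise.score(noise.reshape(-1, 1), y)     # near chance
```

A tree whose sub-sample happened to drop the signal column is in the same position as the second stump: the best it can do is fit noise.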

Now, what if one random seed is lucky enough to generate more strong trees than another? In that case, the model trained with the first seed will indeed be better than the model trained with the other seed, which has more weak trees with splits on the noisy feature.

Putting the conclusions together

So, once again: does it make sense to optimize the random seed? It depends. In most cases, it doesn’t. But in some special circumstances it can make a difference. For example, if the following two conditions are met:

1) The dataset has only a few features, with significantly different predictive power.

2) Sub-sampling or similar technique is used to randomly drop a subset of features in each iteration.

On a dataset meeting these two conditions (https://www.kaggle.com/datasets/alijs1/artificial-data-leaks), I managed to get models differing only in the random seed but showing a noticeable difference in performance. One seed was chosen as the best seed, optimized against the validation set, and the other as the worst. The final performance of the two chosen seeds was then measured on different data, not on the validation set on which the seeds were optimized. This procedure was repeated 100 times with different validation sets to achieve statistically significant results.
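The evaluation protocol above can be sketched roughly like this, with a synthetic dataset standing in for the linked Kaggle data and far fewer seeds and repeats than the real experiment:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           random_state=0)

def seed_gap(split_seed, seeds=range(4)):
    # Split into train / validation / test.
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X, y, test_size=0.5, random_state=split_seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=split_seed)
    # One model per seed; max_features=1 makes seeds matter more.
    models = {s: RandomForestClassifier(n_estimators=20, max_features=1,
                                        random_state=s).fit(X_tr, y_tr)
              for s in seeds}
    # Pick best/worst seed on validation, measure the gap on the test set.
    val = {s: m.score(X_val, y_val) for s, m in models.items()}
    best = max(val, key=val.get)
    worst = min(val, key=val.get)
    return models[best].score(X_te, y_te) - models[worst].score(X_te, y_te)

gaps = [seed_gap(r) for r in range(10)]  # 10 repeats instead of 100, for speed
mean_gap = float(np.mean(gaps))
```

If seed optimization generalizes at all, the mean test-set gap between the validation-best and validation-worst seed should come out positive; on most data it hovers around zero.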

One additional observation concerned feature importance scores. Depending on how lucky a particular random seed was, the features with true signal ranked slightly higher or lower on the feature importance list.

In conclusion: there are cases (even if this particular one was artificially created) where random seed optimization can lead to better models.

Some final words

In real-world problems, I have never optimized the random seed. If you have two models trained with different seeds, the best approach is usually not to pick one of them but to blend both together. In most cases, the blended (averaged) result will be better than each individual result.
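A minimal sketch of such a blend, assuming scikit-learn random forests as the two models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=0)

# Two models identical except for the random seed.
m1 = RandomForestClassifier(n_estimators=30, random_state=1).fit(X_tr, y_tr)
m2 = RandomForestClassifier(n_estimators=30, random_state=2).fit(X_tr, y_tr)

# Instead of picking one model, average their predicted probabilities.
p_blend = (m1.predict_proba(X_te)[:, 1] + m2.predict_proba(X_te)[:, 1]) / 2
acc_blend = ((p_blend > 0.5).astype(int) == y_te).mean()
```

Averaging probabilities cancels out some of each seed’s unlucky trees, which is why the blend usually beats either single-seed model.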

But if, for some reason, I had to choose only one of two models differing only in the random seed, I would choose the one with the higher validation score. Just in case.

Thanks for reading! Hopefully, you got some food for thought.

Don’t forget to follow me if you are interested in reading more Data Science and Machine Learning articles in the future.


