
Prediction Performance Drift: The Other Side of the Coin | by Valeria Fonseca Diaz | Feb, 2023



Two sides of prediction performance drift (Image by author)

The world of machine learning has moved and grown so fast that, in less than two decades, we are already at the next stage. The models are built, and now we need to know whether they provide accurate predictions in the short, medium, and long term. So many methods, theoretical approaches, schools of thought, paradigms, and digital tools are in our pockets when it comes to building our models. Now, then, we want to better understand what's at stake when it comes to prediction performance over time.

One may think that the prediction performance of a model is guaranteed by the quality measured on a test set: a separate set of samples that were not involved in training the model at all, yet were measured or observed at that same moment. The reality is that, while the model may have been delivered with a certain (hopefully very satisfactory) prediction performance based on that test set, the observations that arrive over time will not fully share the same properties or trends as the training and test sets. Recall that models are machines, not thinking entities. If they see something new for which they were not trained, they cannot be critical of these new things. We, humans, can.

Prediction performance drift: Side A

The way we have been thinking about prediction performance involves separating the variability distributions that influence a model. There's the distribution of the input features P(X), the distribution of the labels P(Y), and the conditional distribution of the labels given the inputs P(Y|X). A change in any of these components is a potential cause for the model to make less accurate predictions with respect to its original performance. With this in mind, if we monitor these components and something in them changes at some point, that's the moment for a deep check-up to see what's happening and fix it if necessary.
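
As a concrete (and deliberately minimal) sketch of this kind of monitoring, the snippet below compares the distribution of a single input feature between a reference window and a recent production window using a two-sample Kolmogorov-Smirnov test from SciPy. The data, window sizes, and significance threshold are illustrative assumptions, not a prescription.

```python
# Minimal sketch: monitor drift in P(X) for one feature with a two-sample KS test.
# The arrays below are synthetic stand-ins for a training feature and a live feature.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
x_reference = rng.normal(loc=0.0, scale=1.0, size=1000)   # feature values seen at training time
x_production = rng.normal(loc=0.4, scale=1.0, size=1000)  # hypothetical shifted live values

statistic, p_value = ks_2samp(x_reference, x_production)
if p_value < 0.01:                                         # illustrative threshold
    print(f"Possible drift in P(X): KS statistic={statistic:.3f}, p={p_value:.4f}")
else:
    print("No strong evidence of drift in this feature.")
```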

Prediction performance drift: Side B

While the above is unquestionable, we can also look at another dimension of prediction performance drift: the types of drift. When we look at prediction performance, it's possible to discriminate between different types of drift. Now, why would this be of any use? Well, not only will we detect that something is changing, but we'll also more easily know how to fix it.

There are essentially 4 types of drift: Bias, Slope, Variance, and Non-linearities. While these names might hint more particularly at a linear regression model, they apply to the general menu of machine learning models. More general names for these types could be: constant shift, rotation with respect to the prediction boundary, dispersion colliding with the boundary, and change of boundary shape. We may agree that the latter names are long and time-consuming to write and talk about, so let's stick to the former. And while doing so, let's not confuse any of the types with the nature of the model: any of the 4 types of drift can occur in any type of model.

Types of drift for regression models (Image by author)

By checking for the different types of drift, not only will we detect that something is changing, but we'll also more easily know how to fix it.

Side B: The 4 types

A bias drift refers to a constant shift. In a regression model, a bias drift is happening if we observe that our predictions are off from the observed values by a constant. In a classification model, a bias drift happens if the features of each class undergo a constant shift, which may also be class-specific. This type of drift may simply happen because of a change in the average of the population, without any other change in the relationships among the features and their impact on the target variable. Being a simple change, it may be easily repaired by re-shifting the bias of the model or the average of the sample.
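
A hedged sketch of what such a repair could look like for a regression model: if recent residuals settle around a stable non-zero mean, that mean estimates the constant shift, and the model's intercept can be re-shifted accordingly. The synthetic data, the tolerance of 1.0, and the repair-by-intercept choice below are my own illustrative assumptions.

```python
# Illustrative sketch: detect a constant shift in recent residuals and
# repair it by re-shifting the model's intercept. Data and tolerance are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = 3.0 * X_train[:, 0] + 5.0 + rng.normal(0, 0.5, size=200)
model = LinearRegression().fit(X_train, y_train)

# New observations whose target has shifted up by a constant (bias drift).
X_new = rng.uniform(0, 10, size=(100, 1))
y_new = 3.0 * X_new[:, 0] + 5.0 + 2.0 + rng.normal(0, 0.5, size=100)

residual_mean = np.mean(y_new - model.predict(X_new))
if abs(residual_mean) > 1.0:            # illustrative tolerance
    model.intercept_ += residual_mean   # re-shift the bias of the model
print(f"Estimated constant shift: {residual_mean:.2f}")
```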

A slope drift happens when there's a rotation with respect to the center of the regression function or decision boundary. This type of drift can be very common in learning tasks involving images that are taken from different angles while being essentially the same image. Depending on the degree of rotation, and on whether the rotation is class-specific in classification tasks, a slope drift may be fixed by a simple adjustment of the rotation or by completely retraining the model.
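
One possible diagnostic for slope drift in regression, offered here as an assumption rather than a method from the article: regress the newly observed targets on the model's predictions and check whether the fitted slope has moved away from 1.

```python
# Assumed diagnostic (not the article's): slope of observed vs predicted values.
import numpy as np

def slope_drift_check(y_observed, y_predicted, tolerance=0.1):
    """Return the fitted slope of observed vs predicted and a drift flag."""
    slope, intercept = np.polyfit(y_predicted, y_observed, deg=1)
    return slope, abs(slope - 1.0) > tolerance

# Hypothetical example: the true relationship steepened after deployment.
rng = np.random.default_rng(2)
y_pred = np.linspace(0.0, 10.0, 50)                 # model predictions
y_obs = 1.3 * y_pred + rng.normal(0, 0.2, size=50)  # rotated observed values
slope, drifted = slope_drift_check(y_obs, y_pred)
print(f"slope={slope:.2f}, slope drift suspected: {drifted}")
```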

Example of types of drift for a linear classifier (Image by author)

A variance drift, as its name specifies, is an increase in dispersion. It becomes damaging to the prediction performance if the dispersion pushes samples across the task boundaries. In regression, a variance drift manifests as larger residuals. In classification, it may be an expansion of the original dispersion of the features in each class. Here again, the drift may be class-specific. It may occur for different reasons: it may correspond to heavier sampling close to the decision boundaries or to an uneven shift of the input features. If it is due to sampling, the problem might be solved by recalculating the model with updated data that is more representative of the current sampling.
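
A minimal sketch of flagging such a dispersion change: compare the residuals from a reference window against a recent window with Levene's test from SciPy (which does not assume normality). The synthetic residuals and the 0.01 threshold are illustrative assumptions.

```python
# Minimal sketch: has the dispersion of the residuals changed?
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(3)
residuals_reference = rng.normal(0, 1.0, size=500)  # residuals at validation time
residuals_recent = rng.normal(0, 1.8, size=500)     # hypothetical inflated residuals

statistic, p_value = levene(residuals_reference, residuals_recent)
if p_value < 0.01:
    print(f"Residual dispersion has changed (Levene statistic={statistic:.2f}, p={p_value:.4f}).")
else:
    print("No strong evidence of a variance drift.")
```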

Example of types of drift for a non-linear classifier (Image by author)

The last type of drift is called non-linear drift. While it looks very specific and systematic in the examples shown, this type of drift may contain all kinds of combinations of the three aforementioned types. The famous case when teaching artificial neural networks involves 2 input features used to separate 2 classes. In a 2D space, the 2 classes may be separated by a linear function, but the simple rotation of 2 points induces a non-linear change. If the classes can no longer be separated by one line, the problem has become non-linear. Just as in the other cases, for classification tasks this drift may also be class-specific, as shown in the linear classification example above. As this is the most complex type of drift, the non-linear patterns observed in the prediction quality of the model may well force a full retraining of the model.
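
The sketch below is my own reconstruction of that classic teaching example: a logistic regression separates the original four points perfectly, but once two of them swap sides and the layout becomes XOR-like, no single line can classify all of them, and the fitted model's accuracy collapses.

```python
# My own reconstruction of the classic 2-feature teaching example:
# a linearly separable layout that drifts into the XOR pattern.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_before = np.array([0, 0, 1, 1])   # separable by the vertical line x1 = 0.5
y_after = np.array([0, 1, 1, 0])    # two points "swap sides": the XOR pattern

clf = LogisticRegression().fit(X, y_before)
print("accuracy before drift:", clf.score(X, y_before))  # 1.0
print("accuracy after drift:", clf.score(X, y_after))    # 0.5: no line separates all four points
```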

Example from linear to non-linear classification (Image by author)

When is the prediction performance degrading?

Drifts are not always problematic. Let's dive a bit into this. A change in the center of X without a change in the center of Y, or vice versa, will lead to a bias drift. However, if both X and Y move consistently with the model function, there won't be a prediction performance drift, as the model will continue making accurate predictions. When it comes to slope, while in regression there's less room for rotation, in classification tasks the classes may rotate without damaging performance as long as the rotation does not collide with the decision boundary. This may be a rather theoretical fact with little incidence in practice, but hey, that's how it goes.
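
A tiny numeric illustration of the first point, using a hypothetical linear model y = 2x + 1: shifting X and Y together along the model function leaves the predictions exact, while shifting X alone leaves every prediction off by a constant, i.e., a bias drift.

```python
# Tiny illustration with an assumed linear model y = 2x + 1.
import numpy as np

def model(x):
    return 2.0 * x + 1.0

x = np.linspace(0.0, 5.0, 6)
y = model(x)
delta = 3.0

# X and Y move together along the model function: predictions stay exact.
err_joint = np.abs(model(x + delta) - (y + 2.0 * delta)).max()
# Only X moves: every prediction is off by the constant 2 * delta (bias drift).
err_x_only = np.abs(model(x + delta) - y).max()
print(err_joint, err_x_only)  # 0.0 and 6.0
```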

Variance drifts may be more realistic for classification problems than slope drifts. The model can handle a variance drift as long as the dispersion does not collide with the decision boundary. When points start crossing decision boundaries, the prediction performance is compromised. The same is true for regression models, although, just as with slope drift, there's very little tolerance for dispersion expansion before predictions start degrading.

Non-linear drifts are the least permissive. These changes are almost always problematic. Models can be robust to many changes corresponding to the previous drifts, but they are designed to handle a specific task with a specific shape. That shape is the representation of the concept we created the model to predict, and if that shape starts changing, the whole concept might be changing. So here it's hard to think of cases in which a non-linear change is not problematic.

Let’s keep flipping the coin

So here we have a new way to think about the degradation of our models' prediction performance. Equipped not only with the causes but now also with the types, we can keep flipping the coin when monitoring the prediction quality of our models. That way, we learn not only what is changing and when, but also why and how it is changing.

Are we going to end up formulating the “who” and “where”?

