
Machine Learning Does Not Only Predict the Future, It Actively Creates It

by Samuel Flender, January 2023



Image generated with Stable Diffusion

Standard Machine Learning curricula teach that ML models learn from patterns that exist in the past in order to make predictions about the future.

This is a neat simplification, but things change dramatically once the predictions from these models are used in production, where they create feedback loops: now the model predictions themselves are shaping the very world that the model is trying to learn from. Our models no longer just predict the future, they actively create it.

One such feedback loop is position bias, a phenomenon that’s been observed across the industry in ranking models: the models that power search engines, recommender systems, social media feeds, and ad rankers.

What is position bias?

Position bias means that the highest-ranked items (videos on Netflix, pages on Google, products on Amazon, posts on Facebook, or Tweets on Twitter) are the ones that generate the most engagement, not because they’re actually the best content for the user, but simply because they’re ranked highest.

This bias arises either because the ranking model is so good that users blindly trust the top-ranked item without looking any further (“blind trust bias”), or because users never even consider other, potentially better, items that were ranked too low for them to notice (“presentation bias”).

Why is this a problem?

Let’s take a step back. The goal of ranking models is to show the most relevant content, sorted in order of engagement probability. These models are trained on implicit user data: each time a user clicks on an item on the search results page or in the ranked feed, we use that click as a positive label in the next model training iteration.

If users start engaging with content just because of its rank and not its relevance, our training data is polluted: instead of learning what users really want, the model simply learns from its own past predictions. Over time, the predictions become static and lack diversity. As a result, users may get bored or annoyed, and eventually go somewhere else.

Another problem with position bias is that offline tests become unreliable. By definition, position-biased user engagement data will always be biased in favor of the existing production model, because that’s the model that produced the ranks that users saw. A new model that’s actually better may still look worse in offline tests, and may be prematurely discarded. Only online tests would reveal the truth.

How can we mitigate position bias?

Models learn from data, so in order to de-bias the model, we need to de-bias its training data. As shown by Joachims et al (2016), this can be done by weighting each training sample by the inverse of its position bias, giving more weight to samples with low bias and less weight to samples with high bias. Intuitively, this makes sense: a click on the first-ranked item (with high position bias) is probably less informative than a click on the 10th-ranked item (with low position bias).
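To make the re-weighting concrete, here’s a minimal sketch in Python. The propensity values, the toy data, and the use of scikit-learn sample weights are illustrative assumptions on my part, not details from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical examination propensities per position (index 0 = top position).
# In practice these come from result randomization or intervention harvesting,
# as discussed below.
propensity = np.array([1.00, 0.62, 0.45, 0.33, 0.25, 0.20, 0.16, 0.13, 0.12, 0.11])

# Toy click log: item features, click labels, and the position each item was shown at.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)           # 1 = click, 0 = no click
positions = rng.integers(0, 10, size=1000)  # 0-indexed rank at impression time

# Inverse-propensity weights: clicks at deep (low-propensity) positions count more.
# In the spirit of Joachims et al., only clicked samples are re-weighted here.
weights = np.where(y == 1, 1.0 / propensity[positions], 1.0)

model = LogisticRegression().fit(X, y, sample_weight=weights)
```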

The problem of mitigating position bias therefore boils down to measuring it. How can we do that?

One way is result randomization: for a small subset of the serving population, simply re-rank the top N items randomly, and then measure how engagement changes as a function of rank within that population. This works, but it’s costly: random search results or recommendations, especially for large N, create a poor user experience, which can hurt user retention and therefore business revenue.
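Here’s a hedged sketch of what that estimation could look like on logs from such a randomized bucket; the toy log and the pandas aggregation are made up for illustration:

```python
import pandas as pd

# Hypothetical engagement log from the small randomized bucket, where the top-N
# items were shuffled uniformly at random before being shown.
log = pd.DataFrame({
    "position": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "clicked":  [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})

# Because items are randomly ordered, average relevance is the same at every
# position, so the per-position click-through rate reflects position bias alone.
ctr_by_position = log.groupby("position")["clicked"].mean()

# Normalizing to position 1 gives relative propensities that can serve as the
# inverse-propensity weights from the previous section.
propensity = ctr_by_position / ctr_by_position.loc[1]
print(propensity)
```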

A better alternative may therefore be intervention harvesting, proposed by Agarwal et al (2018) in the context of full-text document search, and in parallel by Aslanyan et al (2019) in the context of e-commerce search. The key idea is that the logged user engagement data in a mature ranking system already contains ranks from multiple different ranking models, for example from historic A/B tests or simply from different versions of the production model that have been rolled out over time. This historic diversity creates an inherent randomness in ranks, which we can “harvest” to estimate position bias, without any costly interventions.
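A simplified, hypothetical sketch of the harvesting idea could look like this; the toy log is invented, and the real estimators in the cited papers use more careful weighting than this naive per-position average:

```python
import pandas as pd

# Hypothetical merged logs from several historical rankers. Each row records the
# query, the item, the position that ranker showed it at, and whether it was clicked.
log = pd.DataFrame({
    "query":    ["q1", "q1", "q1", "q1", "q2", "q2", "q2", "q2"],
    "item":     ["a",  "a",  "b",  "b",  "c",  "c",  "d",  "d"],
    "position": [1,    3,    3,    1,    2,    4,    4,    2],
    "clicked":  [1,    0,    1,    1,    1,    0,    0,    1],
})

# "Harvested interventions": (query, item) pairs that different rankers happened
# to show at more than one position. Within such a pair the item's relevance is
# fixed, so click-rate differences across positions reflect position bias.
n_positions = log.groupby(["query", "item"])["position"].nunique()
intervened_pairs = n_positions[n_positions > 1].index
harvested = log.set_index(["query", "item"]).loc[intervened_pairs]

# Naive estimate: per-position CTR within the harvested sets, normalized to the
# top observed position.
ctr = harvested.groupby("position")["clicked"].mean().sort_index()
propensity = ctr / ctr.iloc[0]
print(propensity)
```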

Lastly, there’s an even simpler idea, namely Google’s “Rule 36”. The rule suggests simply adding the rank itself as yet another feature when training the model, and then setting that feature to a default value (such as -1) at inference time. The intuition is that, by providing all information to the model upfront, it will implicitly learn both the engagement model and a position bias model under the hood. No extra steps needed.
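A hedged sketch of that setup, using made-up data and a generic scikit-learn classifier rather than any particular production model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy training data: a few relevance-related features plus the logged position.
rng = np.random.default_rng(0)
n = 5000
features = rng.normal(size=(n, 4))
positions = rng.integers(1, 11, size=n)            # position the item was shown at
X_train = np.column_stack([features, positions])   # the rank is just another feature
y_train = rng.integers(0, 2, size=n)               # toy click labels

model = GradientBoostingClassifier().fit(X_train, y_train)

# At inference time the position isn't known yet (we are about to rank), so the
# position feature is set to a fixed default such as -1 for every candidate.
candidates = rng.normal(size=(100, 4))
X_serve = np.column_stack([candidates, np.full(len(candidates), -1)])
scores = model.predict_proba(X_serve)[:, 1]        # rank candidates by these scores
```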

Final thoughts

Let’s recap. Position bias is real and has been observed across the industry. It’s a problem because it can limit a ranking model’s diversity in the long run. But we can mitigate it by de-biasing the training data with a bias estimate, which we can get from either result randomization or intervention harvesting. Another mitigation strategy is to use the ranks directly as a model feature, and let the model learn the bias implicitly, with no extra steps required.

If you think about it more holistically, the existence of position bias is really kind of ironic. If we make our ranking models better and better over time, those improvements may lead to more and more users blindly trusting the top-ranked result, thereby amplifying position bias and ultimately degrading our model. Unless we take deliberate steps to monitor and mitigate position bias, any model improvement may eventually become self-defeating.


