
Learning to Rank the Bayesian Way | by Dr. Robert Kübler | Aug, 2022



Implement the Bradley–Terry model in PyMC

Photo by Pietro Mattia on Unsplash

Imagine that a bunch of players compete in some game one versus one. Then, a natural question arises:

How to rank the players?

In theory, this problem should not be too hard — let them play a bunch of games and check the players’ win ratios. Unfortunately, this natural approach has some issues:

  • you cannot tell if a win ratio of close to 100% means that the player is exceptional, or if the player only stomped weak opponents, and
  • if a player only played a few games, the estimate of this player’s strength should come with high uncertainty, which a raw win ratio does not deliver.

You can encounter problems that can be phrased as one-versus-one games in many places:

  • when ranking players in an actual contest: tennis, racing, card games, Pokémon fights, …
  • when ranking search results: one search result is better than another if it is more relevant to the user’s query.
  • when ranking recommendations: one recommendation is better than another if it is more relevant to what the user might want to buy.

In this article, I will show you how to build a simple Bayesian model in PyMC that solves this problem. In case you do not know what I’m talking about, take a look at my introduction to the Bayesian world using PyMC3, the predecessor of PyMC with nearly identical syntax.

The model we will use is called the Bradley–Terry model, to which we will give a Bayesian twist. This sounds scary, but it is actually quite easy. Just follow along.

Model Intuition

The whole model boils down to only two assumptions:

  1. We assume that each player (human, Pokémon, search result, recommendation, …) has some strength (skill, relevance, …).
  2. If Player 1 with strength s₁ and Player 2 with strength s₂ compete, Player 1 wins with a probability of

P(Player 1 wins) = σ(s₁ − s₂)

where σ is just your old pal, the sigmoid function. That’s it.
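
To make this concrete, here is a tiny illustration in code. The helper names are my own, not from the article:

import numpy as np

# A small illustration of the model: the probability that Player 1 wins
# depends only on the difference of the two strengths.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def win_probability(s1, s2):
    # probability that Player 1 (strength s1) beats Player 2 (strength s2)
    return sigmoid(s1 - s2)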

Note that we do not use any player characteristics here, for example, features like height or weight if the players are actual humans. This means that we can apply this model to a variety of problems.

However, if we happen to have player features, we can also incorporate them into the model and end up with something like Microsoft’s RankNet [1]. The authors use a neural network to explicitly build the players’ strengths s = f(x) from the features x, while we directly treat the strengths s as parameters in our Bayesian approach.

For the people who are into neural networks: we could use an embedding layer to build a frequentist version of the Bradley–Terry model in a deep learning framework of your choice, as sketched below.
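
A minimal sketch of this idea in PyTorch. The player count and the game data below are made up for illustration; none of this comes from the article:

import torch

# One learnable scalar strength per player, stored in an embedding layer.
n_players = 4
strength = torch.nn.Embedding(n_players, 1)

def win_logit(player_1, player_2):
    # logit of P(Player 1 wins) = s1 - s2
    return (strength(player_1) - strength(player_2)).squeeze(-1)

# Hypothetical game data: (Player 1, Player 2, did Player 1 win?)
p1 = torch.tensor([0, 1, 2])
p2 = torch.tensor([1, 2, 0])
wins = torch.tensor([0.0, 0.0, 1.0])

# One gradient step on the binary cross-entropy, i.e. the model's
# negative log-likelihood.
optimizer = torch.optim.Adam(strength.parameters(), lr=0.1)
loss = torch.nn.functional.binary_cross_entropy_with_logits(win_logit(p1, p2), wins)
loss.backward()
optimizer.step()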

Let us do a small sanity check to see if this definition makes sense. Okay, so each player has a strength. This is exactly what we need if we want to order the players according to their strength, so this is great.

If Player 1 has a much higher strength than Player 2, i.e. s₁ − s₂ is a large number, then σ(s₁ − s₂) is close to 1. So Player 1 wins with an overwhelming probability, which is exactly what we want in this case. The same argument works if Player 1 has a much lower strength than Player 2. If both players have the same strength, each of them wins with a probability of σ(0) = 50%. Perfect!
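
Plugging numbers into the win_probability helper from the sketch above confirms exactly this behavior:

print(win_probability(3.0, 0.0))  # ~0.95: the much stronger player almost always wins
print(win_probability(0.0, 3.0))  # ~0.05: the much weaker player almost always loses
print(win_probability(1.0, 1.0))  # 0.5: equal strengths mean a coin flip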

Creating a Dataset

Before we come to the modeling part, let us create an artificial dataset consisting of game results that will look like this:

Image by the author.

The advantage of a synthetic dataset is that we know which properties the model should be able to find.

To create this dataset, you can use the following code:

import pandas as pd
import numpy as np

np.random.seed(0)

def determine_winner(player_1, player_2):
    # Simulate a single game and return 1 if player_1 wins, else 0.
    if player_1 == 0 and player_2 == 1:
        return np.random.binomial(n=1, p=0.05)
    if player_1 == 0 and player_2 == 2:
        return np.random.binomial(n=1, p=0.05)
    if player_1 == 1 and player_2 == 0:
        return np.random.binomial(n=1, p=0.9)
    if player_1 == 1 and player_2 == 2:
        return np.random.binomial(n=1, p=0.1)
    if player_1 == 2 and player_2 == 0:
        return np.random.binomial(n=1, p=0.9)
    if player_1 == 2 and player_2 == 1:
        return np.random.binomial(n=1, p=0.85)

# Draw 1000 random pairings and drop games of players against themselves.
games = pd.DataFrame({
    "Player 1": np.random.randint(0, 3, size=1000),
    "Player 2": np.random.randint(0, 3, size=1000)
}).query("`Player 1` != `Player 2`")

games["Player 1 wins"] = games.apply(
    lambda row: determine_winner(row["Player 1"], row["Player 2"]),
    axis=1
)

Here, we have created a dataset that consists of three players challenging each other randomly. The function determine_winner does exactly that: it takes two player indices (0, 1, 2) and outputs whether player_1 wins. For example, in the game (0, 1), handled by the first case in the function, player number 0 wins with a probability of p=0.05 against player number 1.

If you check the probabilities carefully, you will see that player number 2 should be the best, number 1 in the middle, and number 0 the weakest one.

To spice things up, let us introduce a fourth player that only played two games.

new_games = pd.DataFrame({
    "Player 1": [3, 3],
    "Player 2": [2, 2],
    "Player 1 wins": [1, 1]
})

games = pd.concat(
    [games, new_games],
    ignore_index=True
)

Player number 3 played twice against number 2 and won both times. So number 3 should have quite a high strength as well, but we cannot say if number 3 is truly better than number 2, or if it was mere luck.

Building a Model in PyMC

We can now build the model in PyMC. Note that we will use Gaussian priors for the players’ strengths. Also, I’ll let the model infer posteriors for five players, although there is no data for the last player with the number 4. We will see how the model deals with that.

Another important thing is that I will not use the sigmoid function explicitly. If we pass the difference of the players’ strengths via the logit_p parameter instead of p, the pm.Bernoulli object will take care of it.

import pymc as pm

with pm.Model() as model:
    # Standard normal priors for all five strengths; player 4 has no game data.
    strength = pm.Normal("strength", 0, 1, shape=5)

    # The strength difference is the logit of P(Player 1 wins).
    diff = strength[games["Player 1"]] - strength[games["Player 2"]]

    obs = pm.Bernoulli(
        "wins",
        logit_p=diff,
        observed=games["Player 1 wins"]
    )

    trace = pm.sample()

After the magic has happened, we can check how the posteriors are distributed. On the left, you can see the posterior distributions as density plots; on the right, you can see a forest plot that lets you easily compare the strength posteriors.
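
The article does not show the plotting code, but plots like these can be created with ArviZ; the exact calls below are my assumption:

import arviz as az

# Density plots of the strength posteriors, plus a forest plot for
# side-by-side comparison of the players.
az.plot_posterior(trace, var_names=["strength"])
az.plot_forest(trace, var_names=["strength"])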

Image by the author.

Here, you can see that number 0 is indeed the weakest player, followed by number 1. Numbers 2 and 3 are the best, as expected. Number 3’s posterior distribution has a slightly lower mean than that of number 2, but the HDI (highest density interval) is much larger, showing that there is more uncertainty about number 3’s strength compared to number 2’s. The posterior strength of number 4 is the same as the prior: normally distributed with mean 0 and standard deviation 1. The model could not learn anything about this player.

If you like numbers more, here they are:

Image by the author.

From there, we can also see that the MCMC sampling seems to have converged well, since the r_hat values are all equal to 1.
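
A table like this, including means, HDIs, and r_hat diagnostics, is what ArviZ’s summary function produces; a sketch, assuming the trace from above:

import arviz as az

# Summary statistics and convergence diagnostics for the strength posteriors.
print(az.summary(trace, var_names=["strength"]))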

We can also see that some players have a negative strength, but this is totally fine since we only use the difference in strength between two players anyway. If you do not like this for some reason, you can either replace the strength priors with a HalfNormal distribution, or add some constant like 5 to the posteriors so that all means and HDIs are in the positive range, as sketched below.
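
A minimal sketch of the constant-shift variant, again assuming the trace from above:

# Shifting every strength by the same constant leaves all differences
# s1 - s2, and hence all win probabilities, unchanged.
shifted = trace.posterior["strength"] + 5
print(shifted.mean(dim=("chain", "draw")))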

In this article, we have seen how to build a model that lets you rank a set of players (actual players, recommendations, search results, …): a Bayesian model that works even without any player features. This is not a restriction, however, since we can incorporate features as well.

The model started off with some prior beliefs about the strength levels of the players, which then got updated via data. The more games a player plays, the smaller the uncertainty about this player’s strength. In the extreme case of a player who never played a single game, the posterior distribution of their strength equals the prior distribution, which makes sense.

[1] Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G., Learning to Rank using Gradient Descent (2005), Proceedings of the 22nd International Conference on Machine Learning, pp. 89–96.

I hope that you learned something new, interesting, and useful today. Thanks for reading!

As the last point, if you

  1. want to support me in writing more about machine learning and
  2. plan to get a Medium subscription anyway,

why not do it via this link? This would help me a lot! 😊

To be transparent, the price for you does not change, but about half of the subscription fees go directly to me.

Thanks a lot, if you consider supporting me!

If you have any questions, write me on LinkedIn!

