
Importance Sampling with Python

by Marcello Politi | May 2023



Photo by Edge2Edge Media on Unsplash

Learn how to sample from a distribution when you only have access to another distribution

Introduction

Among the various sampling methods that a data scientist must know, one of the most important is Importance Sampling.

This method allows us to sample from one distribution even though we are actually only able to sample from a different distribution! Let’s see how it works.

Importance Sampling

Suppose that taking samples x from a distribution g(x) is infeasible, for example because it is too expensive. At the same time, we have a distribution f(x), which we call the importance distribution, from which we are able to sample.

We can use samples from the distribution f(x) to compute statistics about the distribution we are really interested in, g(x). Let’s see how.

Imagine that we have a distribution f(x) representing the probability of each face of a die. If the die is “fair,” each face has a probability of 1/6, so we can represent the distribution as follows.

Fair Distribution (Image By Author)

We also have another distribution g(x), the one we would actually like to sample from but, for some reason, cannot. In this case the die is not fair, so the distribution is biased: some faces have a higher probability than others.

Biased Distribution (Image By Author)

For each distribution, we can calculate the expected value. For example, E[x] under the first distribution f(x) is:

E_f[x] = Σₓ x · f(x) = (1 + 2 + 3 + 4 + 5 + 6) · 1/6 = 3.5

The same thing naturally applies to g(x).

E_g[x] = Σₓ x · g(x)

Now imagine that we want to calculate statistics from our population by sampling. For example, we want to calculate the average value of the result of n rolls of a die.

We roll a fair die n times, and we know from the law of large numbers that, as n increases, this average will tend to the expected value.

(x₁ + x₂ + … + xₙ)/n → E_f[x] as n → ∞

Now we would like to compute the same statistic by rolling the other die, the unfair one with a different probability distribution, n times. The problem is that this die does not exist, so we cannot run the experiment!

So how do we do it? We can compute this statistic while still using only the fair die, thanks to a “trick.”

We know how to write the expected value of the biased die under the distribution g(x). The trick is to multiply and divide each term by the same quantity f(x).

E_g[x] = Σₓ x · g(x) = Σₓ x · (g(x)/f(x)) · f(x)

Now, if we treat the whole first part inside the summation, x · g(x)/f(x), as a new quantity, this expression is simply the expected value under f(x) of the weighted x.

E_g[x] = E_f[ x · g(x)/f(x) ]

So we can use the same idea as before and sample from f, weighting each draw by g(x)/f(x); the result should come close to the expected value of g.
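To make the weighting concrete (using the biased probabilities we will plug in below, g(1) = 0.4 and g(6) = 0.04, against f(x) = 1/6): a draw of face 1 gets weight 0.4/(1/6) = 2.4 and contributes 1 · 2.4 = 2.4 to the running average, while a draw of face 6 gets weight 0.04/(1/6) = 0.24 and contributes only 6 · 0.24 = 1.44. Faces that are more likely under g than under f are boosted; the others are shrunk.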

All very nice but does it really work? Let’s do some experiments!

Let’s Code!

First, we import some libraries that we will need.

import numpy as np
import matplotlib.pyplot as plt

We now represent the distributions of the two dice with two NumPy arrays. The first is the fair die, in which each face has the same probability; in the second, each face has a different probability.

f = np.array([1/6, 1/6, 1/6, 1/6, 1/6, 1/6])  # fair die
g = np.array([0.4, 0.3, 0.1, 0.1, 0.06, 0.04])  # biased die
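As a quick sanity check (my own addition, not in the original snippet), we can verify that both arrays are valid probability distributions and compute their exact expected values, to compare later against the sampled averages.

faces = np.arange(1, 7)
print(f.sum(), g.sum())    # both sum to 1.0: valid distributions
print((faces * f).sum())   # exact expected value of the fair die: 3.5
print((faces * g).sum())   # exact expected value of the biased die: 2.24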

If you want, you can plot the distributions using Matplotlib.

plt.bar(np.arange(1, 7), f)
plt.show()

Image By Author

plt.bar(np.arange(1, 7), g)
plt.show()

Image By Author

Now let’s define a function that calculates the empirical mean of the experiments. That is, we roll a die n times, and then divide the sum of the results obtained by n. The greater the number n chosen, the closer the empirical mean will be to the expected value.

def compute_avg(distr, n=10_000):
    # roll the die n times and accumulate the results
    total = 0
    for i in range(n):
        total += np.random.choice(a=np.arange(1, 7), p=distr)
    # divide by n to get the empirical mean
    print(f"average from sampling: {total/n}")

If you now compute this empirical mean for the two distributions f and g, you should find values close to the expected ones (about 3.5 and 2.24, respectively).

compute_avg(f)
compute_avg(g)

Now assume a case in which you cannot directly compute the empirical mean of g because, for example, you do not own such a die and therefore cannot run the experiment. The only die you can roll is the fair one, although you know the distributions of both dice.

As we saw earlier, we can write a function that samples from the distribution f and weights each drawn number by g(x)/f(x), so that the mean is computed as if the numbers had been drawn from g. All we have to do is write the function and check that it actually works.

def importance_sampling(f, g, n=10_000):
    total = 0
    for i in range(n):
        # sample a face from the importance distribution f
        x = np.random.choice(a=np.arange(1, 7), p=f)
        # weight the draw by the likelihood ratio g(x)/f(x)
        weight = g[x - 1] / f[x - 1]
        total += x * weight
    print(f"average from sampling: {total/n}")

We then run this importance sampling with n = 10,000 rolls, using the f and g distributions defined earlier.

importance_sampling(f,g)

The result of this run is 2.24920, which is very close to the true expected value of g, 2.24. So we have shown concretely, with Python code, that the method works!
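For completeness, here is a vectorized sketch of the same estimator (my own rewrite, not from the original article). It draws all the samples at once, computes the weights with array indexing, and returns the estimate instead of printing it.

def importance_sampling_vectorized(f, g, n=10_000):
    # draw all n samples from the importance distribution f at once
    x = np.random.choice(np.arange(1, 7), size=n, p=f)
    # likelihood-ratio weights g(x)/f(x), looked up per sample
    w = g[x - 1] / f[x - 1]
    # the weighted mean estimates E_g[x]
    return (x * w).mean()

print(importance_sampling_vectorized(f, g))  # ≈ 2.24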

In this article, we looked at one of the most important sampling techniques for a data scientist. Importance sampling allows us to sample from one distribution even if we only have access to another. This can happen when, for example, sampling from the target distribution is too expensive or impossible for some other reason. I hope you learned something useful from this article, and if you are interested in more articles like this, follow me here on Medium! 😉

Marcello Politi

Linkedin, Twitter, Website



