How Bayesian statistics works in updating probabilities
By Giovanni Organtini


How experiments update knowledge leading to accurate probability estimates

Dice-players and a bird-seller gathered around a stone slab — Master of the Gamblers — oil on canvas (public domain image taken from Wikipedia)

By studying a cheat’s winnings, and applying Bayes’ Theorem, it is possible to find out with what probability he scores points, without knowing the means he uses. In this story we see how performing some “experiments” makes it possible to correctly estimate the probability of an event happening, even if we do not know the composition of the sample from which it is drawn.

According to Bayesian theory, probabilities are subjective. It may sound odd that a subjective guess may lead to meaningful results. The purpose of this story is to show, with an example, how this is possible.

After all, probabilities must be subjective: there is no way to estimate an objective probability unless you already know it, having prepared the system in a certain way yourself.

Consider an urn with 6 balls inside. The balls can be red (R), green (G) or blue (B). You can know the objective probability of drawing, e.g., a red ball, only if you know how many there are in the urn. Otherwise, you can only guess and, before drawing a ball, you can only assign a subjective probability to it: you can, in other words, bet that you will draw a red ball on the first draw based only on your feelings, depending on how lucky you feel. However, observing other people’s bets, you can get an accurate idea of that probability.

In our example, the urn contains three red balls, two green balls and one blue ball. However, suppose we know nothing about the contents of the urn. We can only say that there are the following possibilities.

  1. There are no red balls in the urn. We denote this condition as P(R|0)=0, where P(R|0) is the probability of drawing a red ball, given the assumption of 0 red balls. P(R|0) is called the conditional probability of drawing R, when there are no red balls in the urn.
  2. There is just one red ball in the urn. In this case P(R|1)=1/6, i.e., there is one possibility out of 6 of drawing a red ball.
  3. If there are 2 red balls in the urn, P(R|2)=2/6=1/3.
  4. P(R|3)=3/6=1/2 is the probability of drawing a red ball, if there are 3 red balls in the urn.
  5. P(R|4)=4/6=2/3 is the probability of drawing a red ball if there are four of them in the urn.
  6. If the number of red balls is 5, P(R|5)=5/6.
  7. Of course, if all the balls are red, P(R|6)=1.

The above probabilities are objective (in general, P(R|H)=H/6, where H is the number of red balls in the urn); however, we know nothing about which hypothesis is true, and for that we need to make a guess. According to Bayes’ Theorem,

P(H|x) = P(x|H) P(H) / P(x)

In this formula P(H|x), the posterior, represents the updated probability that H is true, given that the event x has occurred. It is given by the product of the prior P(H) and the likelihood P(x|H), divided by P(x), which is nothing but a normalisation factor and is called the evidence.

In our case P(H) is our (prior) probability that hypothesis H is true. Our prior probability can only be subjective and we can only make an educated guess. We must believe that P(H) has a given value. It is called “prior” because it must be evaluated before the experiment. Since we have no reason to prefer one hypothesis over another, we can set P(H)=1/7 for each of the seven possible hypotheses.
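
A minimal sketch in Python (the names are mine, not taken from the notebook linked at the end of this story) sets up the seven hypotheses and the uniform prior; exact fractions keep numbers such as 1/7 readable:

```python
from fractions import Fraction

# Hypothesis H = number of red balls in the urn, from 0 to 6.
hypotheses = list(range(7))

# Uniform prior: with no information, each hypothesis gets probability 1/7.
prior = {H: Fraction(1, 7) for H in hypotheses}
```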

The likelihood P(x|H) represents how likely it is to draw an x from the urn, given the hypothesis H. The P(x|H) for x=R are listed above and depend on H, of course. For example, P(x|H)=2/3 for x=R and H=4.

The evidence P(x) is the probability of drawing an x, regardless of the hypothesis. It’s called the evidence, because it can be evaluated as the outcome of an experiment. Because it works as a normalising factor, it can be evaluated by marginalising the probabilities, as we shall see below.
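
Continuing the sketch, the likelihood P(R|H)=H/6 and the evidence, obtained by marginalising, i.e. summing P(x|H)P(H) over all hypotheses, can be written as:

```python
def likelihood_red(H):
    # P(R|H): probability of drawing a red ball when the urn holds H red balls.
    return Fraction(H, 6)

def evidence(prob, likelihood):
    # P(x): marginalise over the hypotheses, summing P(x|H) P(H).
    return sum(prob[H] * likelihood(H) for H in prob)
```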

After the first draw, we can update our knowledge by calculating the posterior probability (i.e., the probability after the experiment has been done), according to Bayes’ Theorem. In other words, we want to evaluate the probability P(H|x) that hypothesis H is true, given that the event x occurred.

Suppose that we draw a green (G) ball (x=G) on our first attempt. We can then build the following table.

  H   P(H)   P(R|H)   1-P(R|H)   P(H)P(x|H)   P(H|x)
  0   1/7    0        1          1/7          6/21
  1   1/7    1/6      5/6        5/42         5/21
  2   1/7    1/3      2/3        2/21         4/21
  3   1/7    1/2      1/2        1/14         3/21
  4   1/7    2/3      1/3        1/21         2/21
  5   1/7    5/6      1/6        1/42         1/21
  6   1/7    1        0          0            0

P(H), in the second column, is the probability of hypothesis H, listed in the first column. The likelihood P(R|H) is shown, for convenience, in the third column. In our case the event is x=G, so we need the likelihood of drawing a ball which is not red, which is 1-P(R|H). The product P(H)P(x|H), i.e. the joint probability that hypothesis H is true and the event x=G occurs, is shown in the penultimate column.

The sum of all the probabilities in the penultimate column gives P(x), regardless of H: it is 1/2. Dividing the penultimate column of the table by this number gives the posterior probability, shown in the last column.

The sum of P(H) for all H, as well as that of P(H|x), must be 1, as shown. From the table we see how our belief has evolved thanks to the experiment. Hypothesis 6 (all balls are red) has been ruled out: its probability vanishes. The probability of each hypothesis, after the experiment, decreases with the number of red balls in the urn. Indeed, if the urn contained 5 red balls, we would most likely have drawn a red one, whereas, if it contains just one red ball, the probability of drawing a green ball is much higher.
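
With these ingredients, a single application of Bayes’ Theorem reproduces the table above; the sketch below computes the posterior after the first draw of a green ball:

```python
def update(prob, likelihood):
    # Bayes' Theorem: posterior = prior * likelihood / evidence.
    norm = evidence(prob, likelihood)
    return {H: prob[H] * likelihood(H) / norm for H in prob}

def not_red(H):
    # P(G or B|H) = 1 - P(R|H): the likelihood of drawing a non-red ball.
    return 1 - likelihood_red(H)

posterior = update(prior, not_red)
print(evidence(prior, not_red))    # 1/2, the evidence P(x=G)
print(posterior[0], posterior[6])  # 2/7 and 0: hypothesis 6 is ruled out
```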

Each time we perform an experiment, we update our probability. Now, our prior is the posterior obtained after the first experiment. We are more confident that there is one red ball in the urn rather than five, and we are sure that not all the balls are red. Let’s then rebuild the table with these new priors.

Then let’s perform another experiment, in which we suppose we draw a red ball (x=R). In this case the numerator of Bayes’ Theorem is the product P(H)P(R|H), and the table changes as follows.

  H   P(H)   P(R|H)   P(H)P(R|H)
  0   6/21   0        0
  1   5/21   1/6      5/126
  2   4/21   1/3      8/126
  3   3/21   1/2      9/126
  4   2/21   2/3      8/126
  5   1/21   5/6      5/126
  6   0      1        0

Dividing each numerator by 5/18, the sum of the numerators, we get the posterior:

  H   P(H|x=R)
  0   0
  1   5/35
  2   8/35
  3   9/35
  4   8/35
  5   5/35
  6   0

It is worth noting that the posterior probabilities are now symmetrical with respect to Hypothesis 3. The symmetry is due to the fact that we have drawn one green ball and one red ball: after these experiments, we can conclude that the probability of drawing a red ball (assumed to coincide with the observed frequency) is equal to that of drawing a non-red ball.

We have also ruled out another hypothesis: H=0. In fact, having drawn a red ball, it is impossible for H=0 to be true.

Now, the hypothesis H=3 is the most probable one (which is, actually, the true one).
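
In the sketch, the second update simply takes the first posterior as the new prior and P(R|H) as the likelihood:

```python
# Second experiment: a red ball is drawn, so the likelihood is P(R|H).
posterior2 = update(posterior, likelihood_red)
print(posterior2[3])                        # 9/35, the largest value
print(max(posterior2, key=posterior2.get))  # 3, the most probable hypothesis
```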

We can iterate the process many times. Each time, we update our knowledge about the contents of the urn. It is interesting to plot the evolution of the probabilities as a function of the experiment number (we simulated 200 draws and each time computed the posterior probability):
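
A minimal simulation of this iteration might look as follows; here we use the fact, unknown to the observer in the story, that the true urn holds three red balls out of six:

```python
import random

P_RED_TRUE = 0.5    # the urn really contains 3 red balls out of 6
prob = dict(prior)  # restart from the uniform prior

for _ in range(200):
    # Simulate one draw (with replacement) and update accordingly.
    drew_red = random.random() < P_RED_TRUE
    prob = update(prob, likelihood_red if drew_red else not_red)

print(float(prob[3]))  # typically very close to 1 after 200 draws
```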

As we can see, all the probabilities eventually go to zero, except the right one, P(3), which tends to 1, as predicted by the Law of Large Numbers (i.e., by the frequentist approach).

Here you can find a Jupyter notebook that computes the posterior after randomly drawing a ball and iterates the process. With this tool you can see how the posterior probability evolves with the number of experiments, and understand how Bayes’ Theorem leads to an objective probability starting from a subjective one.

By playing with this tool you can even change the initial priors and see that the final result is independent of the initial guess. You can, for example, start with completely random priors P(H). Eventually, it always turns out that P(3) tends to one.
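
To try this with the sketch above, start from random, normalised priors and repeat the loop; after enough draws H=3 should still come out on top:

```python
# Random starting priors, normalised so that they sum to one.
weights = [random.random() for _ in hypotheses]
prob = {H: w / sum(weights) for H, w in zip(hypotheses, weights)}

for _ in range(200):
    drew_red = random.random() < P_RED_TRUE
    prob = update(prob, likelihood_red if drew_red else not_red)

print(max(prob, key=prob.get))  # almost always 3 after 200 draws
```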

Quoting Bruno De Finetti, “probability does not exist”, in the sense that it is in no way objective, i.e., it is not intrinsic to any real experiment. Although there is no such thing as an objective probability, Bayes’ Theorem leads to an estimate of it whose value evolves with our knowledge of the system, and is updated after each experiment, leading to an estimate that is as objective as possible. In this story we have illustrated how the updating process works and why it leads to the correct answer.

