
Introduction to Hypothesis Testing with Examples | by Neeraj Krishna | Jan, 2023



Most tutorials I’ve seen on hypothesis testing start with a prior assumption about the distribution, list some definitions and formulae, and apply them directly to solve a problem.

However, in this tutorial, we will work from first principles. This is an example-driven tutorial: we start with a basic example and build our way up to the foundations of hypothesis testing.

Let’s get started.

Photo by Brett Jordan on Unsplash

Imagine there are two indistinguishable dice in front of you. One is fair, and the other is loaded. You randomly pick a die and toss it. After observing which face it lands on, can you determine which die you picked?

The probability distributions of the two dice are shown below:

Die 1 (fair):
P(X = x) = 1/6 for x ∈ {1, 2, 3, 4, 5, 6}

Die 2 (loaded):
P(X = x) = 1/4 for x ∈ {1, 2}
P(X = x) = 1/8 for x ∈ {3, 4, 5, 6}
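These two distributions are easy to write down in code. Here's a quick sketch in Python using exact fractions to avoid rounding; the names `die1` and `die2` are mine:

```python
from fractions import Fraction

# PMF of die 1: fair, every face has probability 1/6
die1 = {x: Fraction(1, 6) for x in range(1, 7)}

# PMF of die 2: loaded toward faces 1 and 2
die2 = {x: Fraction(1, 4) if x <= 2 else Fraction(1, 8) for x in range(1, 7)}

# Sanity check: both are valid probability distributions
assert sum(die1.values()) == 1
assert sum(die2.values()) == 1
```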

In binary hypothesis testing problems, we are presented with two choices, called hypotheses, and we have to decide whether to pick one or the other.

The hypotheses are denoted H₀ and H₁ and are called the null and alternative hypotheses respectively. In hypothesis testing, we either reject or accept the null hypothesis.

In our example, die 1 corresponds to the null hypothesis and die 2 to the alternative hypothesis.

If you think about it intuitively, if the die lands on 1 or 2, it’s more likely to be die 2, because die 2 has a higher probability of landing on 1 or 2. So the decision to accept or reject the null hypothesis depends on the distribution of the observations.

So we can say the goal of hypothesis testing is to draw a boundary and separate the observation space into two regions: the rejection region and the acceptance region.

If the observation falls in the rejection region, we reject the null hypothesis; otherwise, we accept it. Now, the decision boundary isn’t going to be perfect, and we’re going to make errors. For example, it’s possible that die 1 lands on 1 or 2 and we mistake it for die 2; but this is relatively unlikely. We’ll learn how to calculate the probabilities of these errors in the next section.

How do we determine the decision boundary? There’s a simple and effective method called the likelihood ratio test, which we’ll discuss next.

First, you have to realize that the distribution of the observations depends on which hypothesis is true. Below, I’ve plotted the distributions in our example under the two hypotheses:

Probability distributions of the observations under both hypotheses

Now, P(X=x ; H₀) and P(X=x ; H₁) represent the likelihood of an observation under hypotheses H₀ and H₁ respectively. Their ratio tells us how much more likely one hypothesis is than the other for each observation.

This ratio is called the likelihood ratio and is represented by L(X). L(X) is a random variable that depends on the observation x.

Likelihood ratio:

L(x) = P(X=x ; H₁) / P(X=x ; H₀)

In the likelihood ratio test, we reject the null hypothesis if the ratio is above a certain value, i.e., reject the null hypothesis if L(X) > 𝜉, and accept it otherwise. 𝜉 is called the critical ratio.

So this is how we can draw a decision boundary: we separate the observations for which the likelihood ratio is greater than the critical ratio from the observations for which it isn’t.

So the observations of the form {x | L(x) > 𝜉} fall into the rejection region while the rest of them fall into the acceptance region.

Let’s illustrate it with our dice example. The likelihood ratio can be calculated as:

L(x) = (1/4) / (1/6) = 3/2 if x ∈ {1, 2}
L(x) = (1/8) / (1/6) = 3/4 if x ∈ {3, 4, 5, 6}
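As a quick check, the ratio can also be computed mechanically from the two PMFs. This is a sketch; `die1` and `die2` are my names for the distributions given earlier:

```python
from fractions import Fraction

die1 = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, H0
die2 = {x: Fraction(1, 4) if x <= 2 else Fraction(1, 8) for x in range(1, 7)}  # loaded die, H1

# Likelihood ratio L(x) = P(X=x; H1) / P(X=x; H0)
L = {x: die2[x] / die1[x] for x in range(1, 7)}
# L[x] == 3/2 for x in {1, 2} and 3/4 for x in {3, 4, 5, 6}
```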

The plot of the likelihood ratio looks like this:

Now the placement of the decision boundary comes down to choosing the critical ratio. Let’s assume the critical ratio lies between 3/4 and 3/2, i.e., 3/4 < 𝜉 < 3/2. Then our decision boundary looks like this:

If 3/4 < 𝜉 < 3/2:

L(x) > 𝜉 if x ∈ {1, 2} (rejection region)
L(x) < 𝜉 if x ∈ {3, 4, 5, 6} (acceptance region)

Rejection and acceptance regions when the critical ratio is between 3/4 and 3/2
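The split can be computed directly: pick any 𝜉 in (3/4, 3/2) and collect the observations whose likelihood ratio exceeds it. A sketch with my own names; `xi = 1` is an arbitrary choice in that interval:

```python
from fractions import Fraction

die1 = {x: Fraction(1, 6) for x in range(1, 7)}  # H0
die2 = {x: Fraction(1, 4) if x <= 2 else Fraction(1, 8) for x in range(1, 7)}  # H1

xi = Fraction(1)  # any 3/4 < xi < 3/2 yields the same regions
rejection = {x for x in range(1, 7) if die2[x] / die1[x] > xi}
acceptance = set(range(1, 7)) - rejection
# rejection == {1, 2}; acceptance == {3, 4, 5, 6}
```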

Let’s discuss the errors associated with this decision. The first type of error occurs when the observation x falls in the rejection region even though the null hypothesis is true. In our example, this means die 1 lands on 1 or 2.

This is called the false rejection error or the type 1 error. The probability of this error is represented by 𝛼 and can be computed as:

False Rejection Error:

𝛼 = P(L(X) > 𝜉 ; H₀)

The second type of error occurs when the observation x falls in the acceptance region even though the alternative hypothesis is true. This is called the false acceptance error, or type 2 error. The probability of this error is denoted 𝛽 and can be computed as:

False Acceptance Error:

𝛽 = P(L(X) < 𝜉 ; H₁)

In our example, the false rejection and the false acceptance error can be calculated as:

Computing errors in the dice example:

𝛼 = P(L(X) > 𝜉 ; H₀)
= P(X ∈ {1, 2} ; H₀)
= 2 × 1/6
= 1/3

𝛽 = P(L(X) < 𝜉 ; H₁)
= P(X ∈ {3, 4, 5, 6} ; H₁)
= 4 × 1/8
= 1/2
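The same numbers fall out in code: 𝛼 sums the H₀ probabilities over the rejection region, and 𝛽 sums the H₁ probabilities over the acceptance region. A sketch with my own variable names:

```python
from fractions import Fraction

die1 = {x: Fraction(1, 6) for x in range(1, 7)}  # H0
die2 = {x: Fraction(1, 4) if x <= 2 else Fraction(1, 8) for x in range(1, 7)}  # H1

rejection = {1, 2}          # region where L(x) > xi, for 3/4 < xi < 3/2
acceptance = {3, 4, 5, 6}

alpha = sum(die1[x] for x in rejection)    # false rejection probability: 1/3
beta = sum(die2[x] for x in acceptance)    # false acceptance probability: 1/2
```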

Let’s consider two other scenarios where the critical ratio takes the following values: 𝜉 > 3/2 and 𝜉 < 3/4.

Critical ratio < 3/4
Critical ratio > 3/2

The type 1 and type 2 errors can be computed similarly.

𝛼 = 0 if 𝜉 > 3/2
= 1/3 if 3/4 < 𝜉 < 3/2
= 1 if 𝜉 < 3/4

𝛽 = 1 if 𝜉 > 3/2
= 1/2 if 3/4 < 𝜉 < 3/2
= 0 if 𝜉 < 3/4
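This piecewise behavior can be reproduced by wrapping the test in a function of 𝜉. A sketch; `errors` is my name for the helper:

```python
from fractions import Fraction

die1 = {x: Fraction(1, 6) for x in range(1, 7)}  # H0
die2 = {x: Fraction(1, 4) if x <= 2 else Fraction(1, 8) for x in range(1, 7)}  # H1

def errors(xi):
    """Return (alpha, beta) for the likelihood ratio test with critical ratio xi."""
    rejection = {x for x in range(1, 7) if die2[x] / die1[x] > xi}
    alpha = sum(die1[x] for x in rejection)                          # false rejection
    beta = sum(die2[x] for x in range(1, 7) if x not in rejection)   # false acceptance
    return alpha, beta
```

Evaluating `errors` at a 𝜉 from each of the three ranges reproduces the piecewise values above: (0, 1) for 𝜉 > 3/2, (1/3, 1/2) for 3/4 < 𝜉 < 3/2, and (1, 0) for 𝜉 < 3/4.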

Let’s plot both the errors for different values of 𝜉.

As the critical value 𝜉 increases, the rejection region becomes smaller. As a result, the false rejection probability 𝛼 decreases, while the false acceptance probability 𝛽 increases.

We could draw a boundary anywhere in the observation space. So why do we need to compute the likelihood ratio and go through all that? Let’s see why.

Below I’ve calculated the type I and type II errors for different boundaries.

Type I and Type II errors for different boundaries.

'|' is the separator - {rejection region | acceptance region}

1. {| 1, 2, 3, 4, 5, 6}
𝛼 = P(X ∈ {} ; H₀) = 0
𝛽 = P(X ∈ {1, 2, 3, 4, 5, 6} ; H₁) = 1
𝛼 + 𝛽 = 1

2. {1 | 2, 3, 4, 5, 6}
𝛼 = P(X ∈ {1} ; H₀) = 1/6
𝛽 = P(X ∈ {2, 3, 4, 5, 6} ; H₁) = 1/4 + 1/2 = 3/4
𝛼 + 𝛽 = 11/12 ≈ 0.917

3. {1, 2 | 3, 4, 5, 6}
𝛼 = P(X ∈ {1, 2} ; H₀) = 1/3
𝛽 = P(X ∈ {3, 4, 5, 6} ; H₁) = 1/2
𝛼 + 𝛽 = 5/6 ≈ 0.833

4. {1, 2, 3 | 4, 5, 6}
𝛼 = P(X ∈ {1, 2, 3} ; H₀) = 1/2
𝛽 = P(X ∈ {4, 5, 6} ; H₁) = 3/8
𝛼 + 𝛽 = 7/8 = 0.875

5. {1, 2, 3, 4 | 5, 6}
𝛼 = P(X ∈ {1, 2, 3, 4} ; H₀) = 2/3
𝛽 = P(X ∈ {5, 6} ; H₁) = 1/4
𝛼 + 𝛽 = 11/12 ≈ 0.917

6. {1, 2, 3, 4, 5 | 6}
𝛼 = P(X ∈ {1, 2, 3, 4, 5} ; H₀) = 5/6
𝛽 = P(X ∈ {6} ; H₁) = 1/8
𝛼 + 𝛽 = 23/24 ≈ 0.958

7. {1, 2, 3, 4, 5, 6 |}
𝛼 = P(X ∈ {1, 2, 3, 4, 5, 6} ; H₀) = 1
𝛽 = P(X ∈ {} ; H₁) = 0
𝛼 + 𝛽 = 1
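The table above can be generated by sweeping the boundary over all seven positions; the minimum of 𝛼 + 𝛽 lands exactly at the boundary the likelihood ratio test picks. A sketch with my own names:

```python
from fractions import Fraction

die1 = {x: Fraction(1, 6) for x in range(1, 7)}  # H0
die2 = {x: Fraction(1, 4) if x <= 2 else Fraction(1, 8) for x in range(1, 7)}  # H1

totals = {}
for k in range(7):  # reject on {1, ..., k}; k = 0 rejects nothing
    alpha = sum(die1[x] for x in range(1, k + 1))
    beta = sum(die2[x] for x in range(k + 1, 7))
    totals[k] = alpha + beta

best_k = min(totals, key=totals.get)
# best_k == 2: rejecting on {1, 2} minimizes alpha + beta at 5/6
```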

The plot of Type I and Type II errors with their sum for different boundaries looks like this:

We can see that for the optimal critical ratio obtained from the likelihood ratio test, the sum of the type I and type II errors is smallest.

In other words, for a given false rejection probability, the likelihood ratio test offers the smallest possible false acceptance probability.

This is called the Neyman-Pearson Lemma. I’ve referenced the theoretical proof at the end of the article.

In the above example, we didn’t discuss how to choose the value of the critical ratio 𝜉. The probability distributions were discrete, so a small change in the critical ratio 𝜉 will not affect the boundary.

When we are dealing with continuous distributions, we fix the value of the false rejection probability 𝛼 and calculate the critical ratio based on that.

P(L(X) > 𝜉 ; H₀) = 𝛼

But again, the process would be the same. Once we obtain the value of the critical ratio, we separate the observation space.

Typical choices for 𝛼 are 𝛼 = 0.1, 𝛼 = 0.05, or 𝛼 = 0.01, depending on how undesirable false rejection is.

For example, if we’re dealing with a normal distribution, we could standardize it and look up the Z-table to find 𝜉 for a given 𝛼.
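For instance, Python’s standard library can do this lookup without a printed Z-table. A sketch for a one-sided test on a standard normal under H₀, using `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05  # chosen false rejection probability

# Critical value z such that P(Z > z; H0) = alpha, for Z ~ N(0, 1)
z_crit = NormalDist().inv_cdf(1 - alpha)
# z_crit is roughly 1.645: reject H0 when the standardized observation exceeds it
```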

In this article, we’ve looked at the idea behind hypothesis testing and the intuition behind the process. The whole process can be summarized in the diagram below:

We start with two hypotheses H₀ and H₁ such that the distribution of the underlying data depends on which hypothesis is true. The goal is to decide whether to accept or reject the null hypothesis H₀ by finding a decision rule that maps the realized value of the observation x to one of the two hypotheses. Finally, we calculate the errors associated with the decision rule.

However, in the real world, the distinction between the two hypotheses wouldn’t be straightforward. So we’d have to do some workarounds to perform hypothesis testing. Let’s discuss this in the next article.

Hope you’ve enjoyed this article. Let’s connect.


You can also reach out to me on LinkedIn and Twitter.




