
Type I & II Errors and Sample Size Calculation in Hypothesis Testing | by Aaron Zhu | Feb, 2023



Photo by Scott Graham on Unsplash

In the world of statistics and data analysis, hypothesis testing is a fundamental concept that plays a vital role in making informed decisions. In this blog, we will delve deeper into hypothesis testing, specifically focusing on how to reduce type I and type II errors. We will discuss the factors that influence these errors, such as significance level, sample size, and data variability. So, let’s dive in and explore the intricacies of hypothesis testing!

We will use the following example throughout this blog.

The average student GPA in the previous semester was 2.70. A tutoring program was launched in the current semester. We're interested in performing the following hypothesis test to study whether the tutoring program improves students' GPA.

At the end of the current semester, we collect 20 random GPA records and assume that student GPA is normally distributed with a standard deviation (σ) of 0.5. μ represents the average GPA of the population.

  • The null hypothesis: μ = 2.70 (i.e., the tutoring program is not helpful in improving students’ GPA.)
  • The alternative hypothesis: μ > 2.70 (i.e., the tutoring program is helpful.)

The school funding is very limited, so we would like to minimize the risk of committing a type I error (falsely concluding that the tutoring program is helpful when it is not).
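Before discussing the errors, here is a minimal sketch of how this one-sided z-test could be run in Python. The GPA sample below is simulated (the true mean of 2.9 is an assumption chosen purely for illustration), so the printed numbers are not from the article:

```python
from math import sqrt

import numpy as np
from scipy.stats import norm

# Simulate a hypothetical sample of 20 GPAs (assumed true mean 2.9, SD 0.5)
rng = np.random.default_rng(0)
gpa = rng.normal(loc=2.9, scale=0.5, size=20)

mu0, sigma, n = 2.7, 0.5, len(gpa)          # null mean, known SD, sample size
z = (gpa.mean() - mu0) / (sigma / sqrt(n))  # standardized sample mean
p_value = norm.sf(z)                        # upper-tail probability P(Z > z)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

We would reject the null hypothesis if this p-value falls below the chosen significance level.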

You might ask

What are the factors we need to consider to reduce the type I error?

1. Significance Level (α)

The significance level (α) is a predefined maximum probability of committing a type I error that we're willing to accept.

With the significance level, we can find the critical value for rejecting the null hypothesis. For this upper-tailed test, the critical value is μ + Z × (σ/√n), where:

  • μ is the population parameter (e.g., the population mean) in the null hypothesis.
  • σ is the population standard deviation. If σ is unknown, we can estimate it using the sample standard deviation, s.
  • n is the sample size.
  • Z is the z-value associated with a given α. If σ is unknown or the sample size is less than 30, we would use the t statistic instead to produce a more reliable result.

We would reject the null hypothesis if the observed sample statistic (e.g., sample mean) is equal to or more extreme than the critical value.

When we make a decision based on the significance level (α), there is at most an α × 100% risk of committing a type I error.

P(type I error) = P(rejecting the null hypothesis, i.e., X̄ > critical value, when the null hypothesis is true) = α × 100%


For example, with α = 0.1, Z ≈ 1.28 and the critical value is 2.7 + 1.28 × 0.5/√20 ≈ 2.84; with α = 0.05, Z ≈ 1.645 and the critical value is ≈ 2.88.
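A minimal Python sketch (using scipy.stats.norm and the critical-value formula above) that reproduces these cutoffs:

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n = 2.7, 0.5, 20  # null mean, population SD, sample size

for alpha in (0.10, 0.05, 0.01):
    z = norm.ppf(1 - alpha)             # upper-tail z-value for alpha
    cutoff = mu0 + z * sigma / sqrt(n)  # one-sided rejection cutoff
    print(f"alpha={alpha:.2f}: z={z:.3f}, critical value={cutoff:.3f}")
# alpha=0.10 -> about 2.84; alpha=0.05 -> about 2.88
```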

The lower the significance level (α) is, the lower the risk of committing a type I error.

2. Sample Size

Another factor that affects the type I error is the sample size. Let's see how increasing the sample size (e.g., from n = 20 to n = 100) affects the probability of a type I error, holding the rejection cutoff fixed at 2.84 (the critical value computed for α = 0.1 and n = 20).

For example,

when α = 0.1, n = 20,

P(type I error of rejecting the null hypothesis when X̄ > 2.84) = P(Z > (2.84 − 2.7)/(0.5/√20)) = 10%

when α = 0.1, n = 100,

P(type I error of rejecting the null hypothesis when X̄ > 2.84) = P(Z > (2.84 − 2.7)/(0.5/√100)) = 0.26%
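The same arithmetic in a short Python sketch; norm.sf gives the upper-tail probability, and keeping the cutoff at 2.84 for both sample sizes is the assumption described above:

```python
from math import sqrt
from scipy.stats import norm

def type1_error(cutoff, mu0, sigma, n):
    """P(X-bar > cutoff | true mean = mu0): the type I error rate
    when the rejection cutoff is held fixed."""
    return norm.sf((cutoff - mu0) / (sigma / sqrt(n)))

for n in (20, 100):
    print(f"n={n}: P(type I error) = {type1_error(2.84, 2.7, 0.5, n):.4f}")
# n=20 -> about 0.10 (10%); n=100 -> about 0.0026 (0.26%)
```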

We can repeat this calculation for different significance levels and observe the same pattern.

Across different significance levels, as the sample size increases (with the critical value held fixed), the probability of a type I error decreases.

This makes intuitive sense: the larger the sample size, the more information we have about the population. The precision of the test statistic improves, so it becomes less likely that the sample mean exceeds the fixed cutoff by chance when the null hypothesis is true.

3. Data Variability

Data variability can also affect the type I error. If the data variability decreases (i.e., the population standard deviation becomes smaller), we would expect a smaller probability of committing a type I error, again holding the critical value fixed.

For example,

When α = 0.1, SD = 0.5,

P(type I error of rejecting the null hypothesis when X̄ > 2.84) = P(Z > (2.84 − 2.7)/(0.5/√20)) = 10%

When α = 0.1, SD = 0.3,

P(type I error of rejecting the null hypothesis when X̄ > 2.84) = P(Z > (2.84 − 2.7)/(0.3/√20)) = 1.8%
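The same sketch, this time varying the standard deviation instead of the sample size (again with the cutoff fixed at 2.84):

```python
from math import sqrt
from scipy.stats import norm

cutoff, mu0, n = 2.84, 2.7, 20
for sigma in (0.5, 0.3):
    p = norm.sf((cutoff - mu0) / (sigma / sqrt(n)))
    print(f"sigma={sigma}: P(type I error) = {p:.4f}")
# sigma=0.5 -> about 0.10 (10%); sigma=0.3 -> about 0.018 (1.8%)
```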

We can repeat this calculation for different significance levels and observe the same pattern.

Across different significance levels, the probability of type I error would decrease as the standard deviation decreases.

On the other hand, if the population being studied is more variable, the test statistic has a larger spread, so more of the null distribution falls beyond the fixed cutoff and the probability of a type I error increases. The extra spread also makes it harder to distinguish the null hypothesis from the alternative.

Next, we will discuss how to reduce the type II error of a hypothesis test.

But first,

How do we compute the type II error of a hypothesis test when the alternative hypothesis is correct?

If we fail to reject the null hypothesis when the alternative hypothesis is correct, we are committing a type II error.

In this example, if the alternative hypothesis is true (e.g., the true population GPA mean is 3.0), the probability of a type II error can be computed as

P(type II error) = P(failing to reject the null hypothesis, i.e., X̄ < critical value, when the alternative hypothesis is true) = β × 100%


In many cases, we’re also interested in calculating the power of the hypothesis test.

The power (computed as 1-β) of a hypothesis test is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is correct.
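Here is a sketch of the β and power calculation for this example, assuming a true mean of 3.0 and the α = 0.1 cutoff of 2.84 from before:

```python
from math import sqrt
from scipy.stats import norm

def type2_error(cutoff, mu_true, sigma, n):
    """P(X-bar < cutoff | true mean = mu_true): the type II error rate
    of the one-sided test with a fixed rejection cutoff."""
    return norm.cdf((cutoff - mu_true) / (sigma / sqrt(n)))

beta = type2_error(2.84, 3.0, 0.5, 20)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
# beta is about 0.08 (8%), so the power is about 92%
```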

What are the factors we need to consider to reduce the type II error (or increase the power)?

1. Significance Level (α)

The significance level (α) also affects the type II error, but in the opposite direction.

For example,

When α = 0.1, SD = 0.5, n = 20, true μ = 3.0,

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.0)/(0.5/√20)) = 8%

When α = 0.05, SD = 0.5, n = 20, true μ = 3.0,

P(type II error of failing to reject the null hypothesis when X̄ < 2.88) = P(Z < (2.88 − 3.0)/(0.5/√20)) = 14%
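The same calculation, comparing the two significance levels via their cutoffs (2.84 for α = 0.1 and 2.88 for α = 0.05, as computed earlier):

```python
from math import sqrt
from scipy.stats import norm

mu_true, sigma, n = 3.0, 0.5, 20
for alpha, cutoff in ((0.10, 2.84), (0.05, 2.88)):
    beta = norm.cdf((cutoff - mu_true) / (sigma / sqrt(n)))
    print(f"alpha={alpha:.2f}: beta = {beta:.3f}")
# alpha=0.10 -> beta about 0.08; alpha=0.05 -> beta about 0.14
```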


A decrease in the significance level (α) causes an increase in the probability of a type II error, i.e., a decrease in power.

2. Sample Size

The sample size affects the type II error in the same way as the type I error.

For example,

When α = 0.1, SD = 0.5, n = 20, true μ = 3.0,

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.0)/(0.5/√20)) = 8%

When α = 0.1, SD = 0.5, n = 100, true μ = 3.0,

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.0)/(0.5/√100)) = 0.069%


Across different significance levels, as the sample size increases, the probability of a type II error would decrease.

3. Data Variability

The data variability also affects the type II error in the same way as the type I error.

For example,

When α = 0.1, SD = 0.5, n = 20, true μ = 3.0,

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.0)/(0.5/√20)) = 8%

When α = 0.1, SD = 0.3, n = 20, true μ = 3.0,

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.0)/(0.3/√20)) = 0.85%


Across different significance levels, the probability of type II error would decrease as the standard deviation decreases.

4. Effect Size

The effect size is the magnitude of the difference between the null hypothesis and the alternative hypothesis (e.g., when the true population GPA mean is 3.0, the effect size is 3.0 − 2.7 = 0.3).

If the effect size increases, it’s easier to detect a true effect, and the probability of a type II error decreases.

For example,

When α = 0.1, SD = 0.5, n = 20, effect size = 0.3 (true μ = 3.0),

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.0)/(0.5/√20)) = 8%

When α = 0.1, SD = 0.5, n = 20, effect size = 0.4 (true μ = 3.1),

P(type II error of failing to reject the null hypothesis when X̄ < 2.84) = P(Z < (2.84 − 3.1)/(0.5/√20)) = 1%
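And a sketch sweeping the assumed true mean, i.e., the effect size:

```python
from math import sqrt
from scipy.stats import norm

cutoff, mu0, sigma, n = 2.84, 2.7, 0.5, 20
for mu_true in (3.0, 3.1):
    beta = norm.cdf((cutoff - mu_true) / (sigma / sqrt(n)))
    print(f"effect size = {mu_true - mu0:.1f}: beta = {beta:.3f}")
# effect size 0.3 -> beta about 0.08; effect size 0.4 -> beta about 0.01
```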


To summarize the relationships between type I & II errors and these factors:

  • Lowering the significance level (α) decreases the type I error but increases the type II error.
  • Increasing the sample size (with the critical value held fixed) decreases both the type I and type II errors.
  • Lower data variability (a smaller σ) decreases both the type I and type II errors.
  • A larger effect size decreases the type II error (the type I error does not depend on the effect size).

Now we understand that type I and II errors are functions of various factors.

How can we decrease the probabilities of type I and II errors simultaneously?

The simple answer: for a given test, increasing the sample size is the only factor under our control that can decrease the risks of both type I and type II errors simultaneously (α trades off against β, and the data variability is usually not ours to change).

But how do we compute the sample size?

Power analysis is a popular tool to calculate the sample size to achieve the desired level of type I & II errors.

We would need the following information to calculate the sample size.

1. Significance Level (α): we typically determine the α value in advance. The common α values are 0.01, 0.05, and 0.1.

2. Power (1-β): the probability that the test detects a true effect. The higher the power, the more likely we are to detect the effect and the lower the risk of a type II error. We usually set the power to 80% (i.e., β = 20%).

3. Data variability (i.e., the standard deviation, σ): the data variability is not up to us. We would either need domain knowledge from experts or an analysis of existing sample data.

4. Effect size (δ): In practice, we wouldn’t know the true effect size because we’re only working with the sample data. Instead, we can determine the minimal important difference (MID), the smallest difference in a measured outcome that is considered to be meaningful or clinically relevant. We can set the effect size equal to the MID.

  • If the true effect size is smaller than the MID, the effect is not practically significant. For example, if tutoring only increases students' GPA by 0.1 (i.e., effect size = 0.1), would this be a meaningful improvement? Probably not. Therefore, a sample size large enough to detect an effect smaller than the MID would be a waste of resources.
  • If the true effect size is bigger than the MID, then we’re more likely to detect the true effect. Therefore, the sample size based on the MID is sufficient for a hypothesis test.

Here is the basic formula to compute the sample size for a one-sided test like ours: n = ((Z₁₋α + Z₁₋β) × σ / δ)², where Z₁₋α and Z₁₋β are the z-values corresponding to the chosen significance level and power.
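A sketch of this power-analysis calculation; the α, power, and MID values in the demo are illustrative assumptions, not prescriptions:

```python
from math import ceil
from scipy.stats import norm

def sample_size(alpha, power, sigma, delta):
    """Minimum n for a one-sided z-test, given the significance level,
    desired power, population SD, and minimal important difference."""
    z_alpha = norm.ppf(1 - alpha)  # z-value for the significance level
    z_beta = norm.ppf(power)       # z-value for the desired power
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# e.g., alpha = 0.05, power = 80%, sigma = 0.5, MID = 0.3
print(sample_size(alpha=0.05, power=0.80, sigma=0.5, delta=0.3))
# about 18 students under these assumptions
```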

From the formula, we can summarize the relationships between the sample size and these factors:

  • A lower significance level (α) requires a larger sample size.
  • Higher power (1 − β) requires a larger sample size.
  • Higher data variability (σ) requires a larger sample size.
  • A smaller effect size (δ) requires a larger sample size.

In conclusion, when performing a hypothesis test, it is essential to consider the type I and type II errors, and the factors that can affect them. By carefully considering these factors and balancing the risks of both types of errors, we can make more accurate and informed decisions based on the results of our hypothesis tests.

If you would like to explore more posts related to Statistics, please check out my articles:

If you enjoy this article and would like to Buy Me a Coffee, please click here.

You can sign up for a membership to unlock full access to my articles, and have unlimited access to everything on Medium. Please subscribe if you’d like to get an email notification whenever I post a new article.

