ANOVA Test, Simply Explained. An easy explanation of the ANOVA… | by Egor Howell | Jun, 2022


An easy explanation of the ANOVA statistical test and its concepts.


Many of you will have heard of the Z-Test and the T-Test, which I have covered in two previous posts.

These tests allow us to determine if two population or sample means are statistically significantly different. However, what if we wanted to test the means between three samples?

One would have to carry out three different T-Tests in this scenario, and with four groups we would need six. The number of pairwise tests, m(m-1)/2 for m groups, grows quickly as the number of groups increases.
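As a quick illustration of how fast the number of pairwise tests grows, here is a minimal Python sketch. The helper name `pairwise_tests` is made up for this example; it simply counts the ways of choosing two groups out of m.

```python
from math import comb


def pairwise_tests(num_groups: int) -> int:
    """Number of distinct two-group comparisons among num_groups groups."""
    return comb(num_groups, 2)


for m in (3, 4, 10):
    print(f"{m} groups -> {pairwise_tests(m)} pairwise T-Tests")
```

Three groups need 3 tests and four groups need 6, as stated above, while ten groups would already need 45.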

This is where the ANOVA test comes in! The ANOVA (Analysis of Variance) test uses variance to measure the differences in means across multiple groups.

The ANOVA test is an Omnibus test as it may tell you that the means are different but not how many or which specific ones are significantly different.

In this article, we will explain some pre-requisite concepts for ANOVA, the process of carrying out a hypothesis test, and go through an example problem where we will use ANOVA!

The ANOVA test relies on the following assumptions:

  • The populations the groups are sampled from are normally distributed.
  • Groups are sampled independently.
  • The populations the samples are drawn from have equal variances.

We will now run through some key ideas we need to understand to carry out an ANOVA test. Some of these things will seem quite abstract at first, but I promise they will make much more sense when we run through an example at the end!

Variance

The main concept in the ANOVA test is variance, which is a measure of the spread/dispersion of the data. The variance is defined as:

$$ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

Where x̄ is the mean of the data, x_i are the individual data points, n is the number of data points and σ² is the variance.

Note: The denominator is n when we are considering a whole population and n-1 when we are considering a sample. There is a great Stats Stack Exchange thread that explains this difference very well.
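To make the difference between the two denominators concrete, here is a small sketch using Python's standard `statistics` module. The five values are purely illustrative (they happen to match Group 1 in the example later on).

```python
import statistics

# Five illustrative data points (assumed values for this sketch).
data = [10, 23, 20, 15, 11]

# Population variance: sum of squared deviations divided by n.
population_variance = statistics.pvariance(data)

# Sample variance: the same sum divided by n - 1.
sample_variance = statistics.variance(data)

print(f"population variance (n):   {population_variance:.2f}")
print(f"sample variance (n - 1):   {sample_variance:.2f}")
```

The numerator is identical in both cases; only the divisor changes, so the sample variance is always the larger of the two.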

Sum of Squares

The variance is closely tied to the Sum of Squares (Error) (SSE), which is just the numerator of the variance equation above:

$$ SSE = \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

In the ANOVA test, you calculate three different sums of squares:

  • Sum of Squares within groups (SSW): This is just the SSE within each individual sample, added up over all samples.
  • Sum of Squares between groups (SSB): This is the SSE between the mean of each sample and the global/grand mean, with each group's squared deviation weighted by its number of data points. The grand mean is the mean of all data points, which equals the mean of the group means when the groups are the same size.
  • Sum of Squares Total (SST): This is the SSE of the whole dataset, calculated by combining all the samples together.

A known result is that SST = SSB + SSW.

F-Statistic

The test statistic for the ANOVA test is the F-Statistic, which is calculated as follows:

$$ F = \frac{SSB / n_1}{SSW / n_2} $$

where n_1 and n_2 are the degrees of freedom for each Sum of Squares:

$$ n_1 = m - 1, \qquad n_2 = n - m $$

Where m is the number of groups and n is the total number of data points.

The F-Statistic is fundamentally the ratio of two Chi-Square distributed values, each divided by its corresponding degrees of freedom. A Chi-Square variable is in turn the sum of squares of independent standard normal random variables. Therefore, it makes sense that we use the F-Distribution in the ANOVA test, as we are squaring normally distributed variables and dividing them!

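To learn more about the F-Distribution and the Chi-Square Distribution, see my previous posts on them.

Let’s briefly go over the fundamental steps involved in carrying out a statistical hypothesis test. The example that follows works through them in order:

  • State the null and alternative hypotheses.
  • Choose a significance level.
  • Calculate the test statistic.
  • Compare the test statistic with the critical value and decide whether to reject the null hypothesis.

To gain a further understanding of these topics, refer to my T-Test and Z-Test posts mentioned at the top of the article!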

Now let's put all that theory into practice!

We are investigating the effect of three different types of pill on weight loss. We give each of three separate groups of people one of the three pills and observe the following loss in weight (in kg):

   Group 1     Group 2     Group 3
-----------------------------------
   10          11          20
   23          12          27
   20          28          14
   15          14          29
   11          30          31

This is a one-way ANOVA test, as we are just looking at how one variable (type of pill) affects the groups. If we wanted to see how both exercise and the pills affect weight loss, this would be a two-way ANOVA test.

Hypotheses and Significance Level

Let's formulate our hypotheses:

  • Null, H_0: all three pills have the same effect on weight loss, so the population means of the groups are equal:
$$ H_0: \mu_1 = \mu_2 = \mu_3 $$
  • Alternative, H_1: at least one of the pills leads to more or less weight loss than the others, so the group means are not all equal:
$$ H_1: \mu_i \neq \mu_j \text{ for at least one pair } i \neq j $$

For this test we will use a significance level of 5%, which is the standard.

SSW

Now we calculate the Sum of Squares within each group using the formula:

$$ SSW = \sum_{i=1}^{m} \sum_{j=1}^{n_i} (x_{i,j} - \bar{x}_i)^2 $$

Where n_i is the number of data points in group i, m is the number of groups, x̄_i is the mean of group i and x_{i,j} are the data points.

Applying this formula to our data we get:

   Group 1 (Mean 15.8)   Group 2 (Mean 19)   Group 3 (Mean 24.2)
-----------------------------------------------------------------
   (10-15.8)^2           (11-19)^2           (20-24.2)^2
   (23-15.8)^2           (12-19)^2           (27-24.2)^2
   (20-15.8)^2           (28-19)^2           (14-24.2)^2
   (15-15.8)^2           (14-19)^2           (29-24.2)^2
   (11-15.8)^2           (30-19)^2           (31-24.2)^2
-----------------------------------------------------------------
   Total: 126.8          Total: 340          Total: 198.8

SSW = 126.8 + 340 + 198.8 = 665.6
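The within-group totals can be double-checked directly from the raw data table. The following is a minimal sketch in plain Python; the names `groups` and `sum_sq_dev` are just illustrative choices for this example.

```python
# Weight loss (kg) per group, copied from the data table above.
groups = {
    "Group 1": [10, 23, 20, 15, 11],
    "Group 2": [11, 12, 28, 14, 30],
    "Group 3": [20, 27, 14, 29, 31],
}


def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# One sum of squares per group, then add them up to get SSW.
within = {name: sum_sq_dev(values) for name, values in groups.items()}
ssw = sum(within.values())

for name, total in within.items():
    print(f"{name}: {total:.2f}")
print(f"SSW = {ssw:.2f}")
```

Computed this way, the three group totals come to 126.8, 340 and 198.8, giving SSW = 665.6.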

SSB

Now we calculate the Sum of Squares between groups using the following formula:

$$ SSB = \sum_{i=1}^{m} n_i (\bar{x}_i - \bar{x})^2 $$

Where n_i is the number of data points in group i, m is the number of groups, x̄_i is the mean of group i and x̄ is the grand mean.

   Grand Mean: 295/15 ≈ 19.67

   Group 1 (Mean 15.8)   Group 2 (Mean 19)   Group 3 (Mean 24.2)
-----------------------------------------------------------------
   5(15.8-19.67)^2       5(19-19.67)^2       5(24.2-19.67)^2
   = 74.76               = 2.22              = 102.76

SSB = 74.76 + 2.22 + 102.76 ≈ 179.73

These values use the unrounded grand mean; rounding it to 19.67 before squaring shifts each term slightly.
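As a cross-check, here is a short sketch that computes SSB from the raw data with the unrounded grand mean and also confirms the identity SST = SSB + SSW on this dataset. The variable names are illustrative only.

```python
# Weight loss (kg) per group, copied from the data table in the example.
groups = [
    [10, 23, 20, 15, 11],
    [11, 12, 28, 14, 30],
    [20, 27, 14, 29, 31],
]


def mean(values):
    return sum(values) / len(values)


all_points = [x for group in groups for x in group]
grand_mean = mean(all_points)

# Between groups: squared distance of each group mean from the grand mean,
# weighted by the number of points in that group.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within groups: squared distance of each point from its own group mean.
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Total: squared distance of each point from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_points)

print(f"grand mean = {grand_mean:.4f}")
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, SST = {sst:.2f}")
print(f"SSB + SSW = {ssb + ssw:.2f}")
```

Because the grand mean is not rounded here, SSB comes out at about 179.73, and SSB + SSW matches SST (about 845.33) up to floating-point error.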

These formulas come from an external source linked in the original article, which also gives the full derivation of SSW and SSB.

F-Statistic and Critical Value

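Therefore, our F-Statistic is:

$$ F = \frac{SSB / (m - 1)}{SSW / (n - m)} = \frac{179.73 / 2}{665.6 / 12} \approx 1.620 $$

We can compare this to our critical value, which we can find using the F-Distribution Table. With m = 3 groups and n = 15 data points, our degrees of freedom are 2 and 12, so the critical value at the 5% significance level is 3.89.

Since 1.620 < 3.89, we fail to reject the null hypothesis: the data give no statistically significant evidence that the pills differ in their effect on weight loss. Note that this is not proof that the pills have the same effect.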
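The whole test can be reproduced in a few lines of plain Python. The sketch below is not a general-purpose implementation: it relies on a closed form for the F-Distribution that only holds when the numerator degrees of freedom equal 2 (as they do here with three groups), where P(F > x) = (1 + 2x/n_2)^(-n_2/2). For other cases one would normally reach for a statistics library such as SciPy's `scipy.stats.f`.

```python
# Weight loss (kg) per group, copied from the data table in the example.
groups = [
    [10, 23, 20, 15, 11],
    [11, 12, 28, 14, 30],
    [20, 27, 14, 29, 31],
]


def mean(values):
    return sum(values) / len(values)


all_points = [x for group in groups for x in group]
grand_mean = mean(all_points)
m = len(groups)       # number of groups
n = len(all_points)   # total number of data points

ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = m - 1
df_within = n - m
f_stat = (ssb / df_between) / (ssw / df_within)

alpha = 0.05

# Assumption: the closed forms below are valid only for df_between == 2.
assert df_between == 2
critical_value = (df_within / 2) * (alpha ** (-2 / df_within) - 1)
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"F = {f_stat:.3f}, critical value = {critical_value:.3f}, p = {p_value:.3f}")
print("reject H_0" if f_stat > critical_value else "fail to reject H_0")
```

The critical value computed this way agrees with the table value of 3.89, and the p-value of roughly 0.24 is well above the 5% significance level, which leads to the same decision as the comparison against the critical value.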

You can see that the further the group means are from the grand mean, the more likely it is that they are statistically different. This can be seen mathematically: the value of SSB increases, which leads to a larger F-Statistic.

In this article we have described the key concepts behind the ANOVA test and gone through a simple example of a one-way test. The ANOVA test is used primarily when we want to test for differences between the means of three or more groups.


