
In establishing statistical significance, the p-value criterion is almost universally used. The criterion is to reject the null hypothesis (H0) in favour of the alternative (H1), when the p-value is less than the level of significance (α). The conventional values for this decision threshold include 0.05, 0.10, and 0.01.

By definition, the p-value measures how compatible the sample information is with H0: i.e., P(D|H0), the probability or likelihood of the data (D) under H0. However, as made clear in the statement of the American Statistical Association (Wasserstein and Lazar, 2016), the p-value criterion as a decision rule has a number of serious deficiencies. The main deficiencies are that

1. the p-value is a decreasing function of sample size;
2. the criterion completely ignores P(D|H1), the compatibility of data with H1; and
3. the conventional values of α (such as 0.05) are arbitrary with little scientific justification.

One of the consequences is that the p-value criterion frequently rejects H0 when it is violated by a practically negligible margin. This is especially so when the sample size is large or massive. This situation occurs because, while the p-value is a decreasing function of sample size, its threshold (α) is fixed and does not decrease with sample size. On this point, Wasserstein and Lazar (2016) strongly recommend that the p-value be supplemented or even replaced with other alternatives.
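This sample-size effect is easy to see in a small simulation. The sketch below (hypothetical data, Python used for illustration) tests H0: μ = 0 when the true mean deviates from H0 by a practically negligible margin: the violation is fixed, but the p-value collapses as n grows.

```python
import numpy as np
from scipy import stats

# H0: mu = 0, while the true mean is a practically negligible 0.01.
# The size of the violation never changes, yet the p-value shrinks
# with n because its threshold (alpha) stays fixed.
rng = np.random.default_rng(42)
p_vals = []
for n in (100, 10_000, 1_000_000):
    x = rng.normal(loc=0.01, scale=1.0, size=n)
    t_stat, p_val = stats.ttest_1samp(x, popmean=0.0)
    p_vals.append(p_val)
    print(f"n = {n:>9,} -> p-value = {p_val:.4g}")
```

With a large enough n, any fixed α will reject H0 here, even though the deviation from H0 is of no practical importance.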

In this post, I introduce a range of simple, but more sensible, alternatives to the p-value criterion which can overcome the above-mentioned deficiencies. They can be classified into three categories:

1. Balancing P(D|H0) and P(D|H1) (Bayesian method);
2. Adjusting the level of significance (α); and

These alternatives are simple to compute and can provide more sensible inferential outcomes than the p-value criterion alone, as will be demonstrated in an application with R code.

Consider a linear regression model

Y = β0 + β1 X1 + … + βk Xk + u,

where Y is the dependent variable, X’s are independent variables, and u is a random error term following a normal distribution with zero mean and fixed variance. We consider testing for

H0: β1 = … = βq = 0,

against the alternative H1 that H0 does not hold, where q ≤ k. A simple example is H0: β1 = 0 against H1: β1 ≠ 0, where q = 1.
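For concreteness, such a joint restriction is conventionally tested with an F-test built from the restricted and unrestricted residual sums of squares. A minimal sketch, using simulated (hypothetical) data with q = 2:

```python
import numpy as np
from scipy import stats

# Hypothetical data for Y = b0 + b1*X1 + b2*X2 + u; test H0: b1 = b2 = 0.
rng = np.random.default_rng(1)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 0.5 + 0.3 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(size=n)

Xf = np.column_stack([np.ones(n), X])           # unrestricted design
beta = np.linalg.lstsq(Xf, y, rcond=None)[0]
rss_u = np.sum((y - Xf @ beta) ** 2)            # unrestricted RSS
rss_r = np.sum((y - y.mean()) ** 2)             # restricted: intercept only

q, df = k, n - k - 1
F = ((rss_r - rss_u) / q) / (rss_u / df)        # F-statistic, F(q, n-k-1)
p_value = stats.f.sf(F, q, df)                  # p-value = P(F(q, df) > F)
print(f"F = {F:.3f}, p-value = {p_value:.4g}")
```

The p-value produced here is exactly the quantity P(D|H0) that the alternatives below re-weigh against P(D|H1).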

Borrowing from Bayesian statistical inference, we define the following probabilities:

P(H0|D): posterior probability of H0, i.e., the probability or likelihood of H0 after the researcher observes the data D;

P(H1|D) ≡ 1 − P(H0|D): posterior probability of H1;

P(D|H0): (marginal) likelihood of the data under H0;

P(D|H1): (marginal) likelihood of the data under H1;

P(H0): prior probability of H0, representing the researcher's belief about H0 before she observes the data;

P(H1) = 1 − P(H0): prior probability of H1.

These probabilities are related by Bayes' rule as

P(H0|D) = P(D|H0)P(H0) / [P(D|H0)P(H0) + P(D|H1)P(H1)].
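Combining these quantities takes only a few lines. The numbers below are purely hypothetical, chosen to show how the posterior weighs the two likelihoods against the prior:

```python
def posterior_h0(lik_h0, lik_h1, prior_h0=0.5):
    """P(H0|D) = P(D|H0)P(H0) / [P(D|H0)P(H0) + P(D|H1)P(H1)]."""
    prior_h1 = 1.0 - prior_h0
    num = lik_h0 * prior_h0
    return num / (num + lik_h1 * prior_h1)

# Hypothetical likelihoods: the data are four times as likely under H1.
# With equal priors, the posterior reduces to lik_h0 / (lik_h0 + lik_h1).
print(posterior_h0(lik_h0=0.2, lik_h1=0.8))              # equal priors
print(posterior_h0(lik_h0=0.2, lik_h1=0.8, prior_h0=0.9))  # prior favours H0
```

Unlike the p-value criterion, this balances P(D|H0) against P(D|H1) rather than ignoring the latter.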
