
Linear Regression with OLS: Unbiased, Consistent, BLUE, Best (Efficient) Estimator | by Aaron Zhu | May, 2022



Understand OLS Linear Regression with a bit of math

Image by Author

The OLS estimator is known to be unbiased, consistent and BLUE (Best Linear Unbiased Estimator). But what do these properties mean? Why are they important for a linear regression model? In this article, we will discuss these properties.

A typical linear regression looks something like the following. The response variable (i.e., Y) is explained as a linear combination of the explanatory variables (e.g., the intercept, X1, X2, X3, …), and ε is the error term (i.e., a random variable) that represents the difference between the actual response value and the value explained by the model.

Figure 1 (Image by author)

In order for OLS to work, we need to make sure some assumptions are true.

Assumption 1 - Linearity in Parameters: The model is linear in its parameters.

Figure 2 (Image by author)

Moreover, the OLS estimator itself is linear in Y: we can rewrite the OLS closed-form solution as follows (by substituting Y from Figure 1 into Figure 3). The matrix algebra below relies on this linearity, which is why the assumption matters.

Figure 3 (Image by author)
Figure 4 (Image by author)
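As a rough illustration of the closed-form solution in Figure 3, here is a minimal NumPy sketch; the simulated data, coefficient values, and sample size are arbitrary assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from the population model Y = X @ beta + eps (Figure 1).
n = 500
beta_true = np.array([2.0, 0.5, -1.5])          # intercept, X1, X2 (assumed values)
X = np.column_stack([np.ones(n),                # intercept column
                     rng.normal(size=n),        # X1
                     rng.normal(size=n)])       # X2
eps = rng.normal(scale=1.0, size=n)             # error term
Y = X @ beta_true + eps

# Closed-form OLS solution: beta_hat = (X'X)^(-1) X'Y (Figure 3).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)  # should be close to beta_true
```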

Assumption 2 - Random Sampling: The observed data represent i.i.d. (Independent and Identically Distributed) random samples that follow the population model (see Figure 1). If the data are collected cross-sectionally, we need to make sure they are sampled randomly. The bottom line is that the observed data should be representative of the population data.

Assumption 3 - No Perfect Collinearity: No explanatory variable can be expressed as a linear combination of the other explanatory variable(s). The reason is that the inverse matrix (in Figure 3) exists only if X has full rank; if there is perfect collinearity, the closed-form solution does not exist.
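To see why, here is a small sketch with made-up data in which one column is an exact multiple of another, so X'X is singular and the inverse in Figure 3 is unavailable:

```python
import numpy as np

# Made-up example: the third column is exactly 2x the second, i.e., perfect collinearity.
X = np.column_stack([np.ones(5),
                     np.arange(1.0, 6.0),
                     2.0 * np.arange(1.0, 6.0)])

print(np.linalg.matrix_rank(X))   # 2, not 3: X lacks full column rank
print(np.linalg.det(X.T @ X))     # 0 (up to rounding), so (X'X)^(-1) does not exist
```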

Assumption 4 - Zero Conditional Mean: The expected value of the error term is zero conditional on all values of the explanatory variables (i.e., E[ε|X] = 0).

Assumption 5 - Homoscedasticity and No Autocorrelation: The error term should have constant variance and be uncorrelated across observations. In other words, the diagonal values in the variance-covariance matrix of the error term should all equal a constant (σ²) and the off-diagonal values should all be 0, i.e., Var(ε|X) = σ²·I.

Assumption 6 - Normality of Errors: The error term is normally distributed. This assumption is not required for the validity of the OLS method, but it allows us to have reliable standard errors of the estimates and make meaningful statistical inferences.

β vs β^ vs E(β^)

You might have seen some variations of β (e.g., β, β^, E(β^)) in statistics textbooks. Let’s discuss their definitions and differences.

β is a conceptual value: the true (and usually unknown) parameter value(s) (i.e., constant values) that explain the relationship between the explanatory variable(s) and the dependent variable in the population.

In most cases, we won’t be using population data because it is not available or too large to process. Therefore, we would use sample data (with a finite number of observations) to develop our linear regression model.

Under the assumption of Random Sampling, the observed sample data represent an i.i.d. random sample of size n, which follows the population model. Suppose we have multiple sets of sample data (by drawing samples from the population repeatedly) and run the model separately in each dataset.

In a given sample dataset, we would have an OLS estimate, β^, which can be computed with the closed-form solution (Figure 3).

It is very likely that we would get a different set of estimates (i.e., β^) in different datasets. Therefore, β^ is a random variable with a sampling distribution that has its own mean and variance.

E(β^) is the expected value of this random variable, β^. In layman’s terms, if we run the linear model on many sets of samples, record the values of the estimates each time, and take the average, that average approximates the expected value, E(β^).
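A minimal simulation sketch of this idea, assuming a made-up population model and reusing the closed-form solution from above (the coefficients, sample size, and number of repetitions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
beta_true = np.array([2.0, 0.5, -1.5])   # assumed "population" parameters
n, n_datasets = 200, 5000

estimates = []
for _ in range(n_datasets):
    # Draw a fresh i.i.d. sample from the population model.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    Y = X @ beta_true + rng.normal(size=n)
    estimates.append(np.linalg.solve(X.T @ X, X.T @ Y))   # beta_hat for this sample

estimates = np.array(estimates)
print(estimates.mean(axis=0))   # approximates E(beta_hat); close to beta_true (unbiasedness)
print(estimates.std(axis=0))    # spread of the sampling distribution
```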

Unbiasedness

Under its finite-sample properties, we say the OLS estimator is unbiased, meaning the expected value of the OLS estimator, E(β^), equals the true population parameter, β.

Unbiasedness does NOT imply that the OLS estimate we get from the observed data (i.e., one set of random samples) will equal the exact population parameter value, because the linear model still can’t fully explain the relationship due to the irreducible error term ε.

Instead, the unbiasedness property implies that if we run the linear regression model repeatedly on different sets of random samples from the same population, then the expected value of the estimator would equal the true population parameter as proven below.

Figure 5 (Image by author)

Although the OLS estimates we get from the observed data don’t equal the exact population parameter values, as long as the observed data are a good representation of the population and the linear model is correctly specified under the assumptions, the coefficient estimates we get from the observed data should be very close to the true population parameter values.

Otherwise, if the observed data are NOT a good representation of the population, if the data suffer from measurement error, or if the linear model is NOT correctly specified due to common issues (e.g., omitted variables or endogeneity), then the coefficient estimates we get from the observed data will be biased.

Consistency

Under its asymptotic properties, we say the OLS estimator is consistent, meaning the OLS estimator converges to the true population parameter as the sample size gets larger and tends to infinity.

From Jeffrey Wooldridge’s textbook, Introductory Econometrics, C.3, we can show that the probability limit of the OLS estimator equals the true population parameter as the sample size gets larger, provided the assumptions hold.

Figure 6 (Image by author)
  • When E[ε|X] = 0 holds, it implies Cov(X, ε) = 0, so the second term in Figure 6 equals 0. This shows that as the sample size gets larger, the OLS estimator converges to the true population parameter; therefore the OLS estimator is consistent (a small simulation below illustrates this).
  • If Cov(X, ε) ≠ 0, then we have an inconsistent estimator. The inconsistency won’t go away as the sample size increases. At the same time, the OLS estimator is biased as well.
  • If Cov(X, ε) > 0, meaning X is positively correlated with the error term, then the asymptotic bias is upward.
  • If Cov(X, ε) < 0, meaning X is negatively correlated with the error term, then the asymptotic bias is downward.
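Here is a minimal sketch of what consistency looks like in simulation, again under an assumed population model with arbitrary coefficients: the estimate β^ tends to settle closer to β as n grows.

```python
import numpy as np

rng = np.random.default_rng(7)
beta_true = np.array([2.0, 0.5])   # assumed intercept and slope

for n in [50, 500, 5000, 50000]:
    # One sample of size n from the population model Y = X @ beta + eps.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta_true + rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(n, beta_hat)   # beta_hat tends to get closer to beta_true as n grows
```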

You might be wondering why we are interested in large-sample properties, such as consistency, when in practice we have finite samples.

The answer is that if we can show that an estimator is consistent as the sample size gets larger, then we may be more confident and optimistic about the estimator in finite samples. On the other hand, if an estimator is inconsistent, we know that the estimator is biased in finite samples.

Efficiency

To evaluate an estimator of a linear regression model, we assess its efficiency based on its bias and variance.

  • An estimator that is unbiased but does not have the minimum variance is not the best.
  • An estimator that has the minimum variance but is biased is not the best.
  • An estimator that is unbiased and has the minimum variance is the best (efficient).
  • Under the assumptions above, the OLS estimator is the best (efficient) estimator because it has the least variance among all linear unbiased estimators (the Gauss-Markov theorem).
Figure 7 (Image by author)

We can prove the Gauss-Markov theorem with a bit of matrix algebra.

Figure 8 (Image by author)
Figure 9 (Image by author)

Now we’ve shown that the variance of the OLS estimator is no larger than that of any other linear unbiased estimator. Therefore, OLS is the best (efficient) linear unbiased estimator.
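As an informal check of this result, here is a simulation sketch comparing OLS with another linear unbiased estimator: a weighted least-squares estimator with arbitrary, non-optimal weights. The model, weights, and sample sizes are assumptions for illustration. Under homoscedastic errors both estimators are unbiased, but OLS should show the smaller sampling variance.

```python
import numpy as np

rng = np.random.default_rng(123)
beta_true = np.array([2.0, 0.5])
n, n_datasets = 200, 5000

ols_est, wls_est = [], []
for _ in range(n_datasets):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ beta_true + rng.normal(size=n)          # homoscedastic errors

    # OLS: (X'X)^(-1) X'Y
    ols_est.append(np.linalg.solve(X.T @ X, X.T @ Y))

    # Another linear unbiased estimator: weighted least squares with
    # arbitrary (non-optimal) fixed weights W = diag(w).
    w = np.linspace(0.5, 2.0, n)                    # assumed arbitrary weights
    XtW = X.T * w                                   # X'W
    wls_est.append(np.linalg.solve(XtW @ X, XtW @ Y))

ols_est, wls_est = np.array(ols_est), np.array(wls_est)
print(ols_est.mean(axis=0), wls_est.mean(axis=0))   # both close to beta_true (unbiased)
print(ols_est.var(axis=0), wls_est.var(axis=0))     # OLS variance is no larger (Gauss-Markov)
```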

Final Notes

  • An estimator is unbiased if the expected value of the sampling distribution of the estimator is equal to the true population parameter value.
  • An estimator is consistent if, as the sample size increases and tends to infinity, the estimates converge to the true population parameter. In other words, consistency means that, as the sample size increases, the sampling distribution of the estimator becomes more concentrated around the population parameter value and its variance becomes smaller.
  • Under the OLS assumptions, the OLS estimator is BLUE (it has the least variance among all linear unbiased estimators). Therefore, it is the best (efficient) estimator.


You can sign up for a membership to unlock full access to my articles, and have unlimited access to everything on Medium. Please subscribe if you’d like to get an email notification whenever I post a new article.

