A P P E N D I X

4.3 Sampling Distribution of the OLS Estimator

In this appendix, we show that the OLS estimator $\hat{\beta}_1$ is unbiased and, in large samples, has the normal sampling distribution given in Key Concept 4.4.

Representation of $\hat{\beta}_1$ in Terms of the Regressors and Errors

We start by providing an expression for $\hat{\beta}_1$ in terms of the regressors and errors. Because $Y_i = \beta_0 + \beta_1 X_i + u_i$, $Y_i - \bar{Y} = \beta_1(X_i - \bar{X}) + u_i - \bar{u}$, so the numerator of the formula for $\hat{\beta}_1$ in Equation (4.25) is

$$\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n}(X_i - \bar{X})[\beta_1(X_i - \bar{X}) + (u_i - \bar{u})]$$
$$= \beta_1 \sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}). \quad (4.27)$$

Now $\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n}(X_i - \bar{X})u_i - \sum_{i=1}^{n}(X_i - \bar{X})\bar{u} = \sum_{i=1}^{n}(X_i - \bar{X})u_i$, where the final equality follows from the definition of $\bar{X}$, which implies that $\sum_{i=1}^{n}(X_i - \bar{X})\bar{u} = \left(\sum_{i=1}^{n}X_i - n\bar{X}\right)\bar{u} = 0$. Substituting $\sum_{i=1}^{n}(X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n}(X_i - \bar{X})u_i$ into the final expression in Equation (4.27) yields $\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \beta_1\sum_{i=1}^{n}(X_i - \bar{X})^2 + \sum_{i=1}^{n}(X_i - \bar{X})u_i$. Substituting this expression in turn into the formula for $\hat{\beta}_1$ in Equation (4.25) yields

$$\hat{\beta}_1 = \beta_1 + \frac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2}. \quad (4.28)$$
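The identity in Equation (4.28) is easy to check numerically. Below is a minimal simulation sketch (not from the text; the data-generating process and all parameter values are illustrative assumptions): it draws a sample, computes the OLS slope from Equation (4.25), and confirms that it equals $\beta_1$ plus the ratio of sample averages on the right-hand side of Equation (4.28).

```python
# Minimal sketch checking Equation (4.28) numerically on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1 = 1000, 1.0, 2.0

X = rng.normal(10.0, 2.0, size=n)
u = rng.normal(0.0, 1.0, size=n)
Y = beta0 + beta1 * X + u

Xbar, Ybar = X.mean(), Y.mean()

# OLS slope from Equation (4.25)
beta1_hat = ((X - Xbar) * (Y - Ybar)).sum() / ((X - Xbar) ** 2).sum()

# Right-hand side of Equation (4.28)
rhs = beta1 + ((X - Xbar) * u).mean() / ((X - Xbar) ** 2).mean()

print(beta1_hat, rhs)             # the two numbers agree up to rounding error
assert np.isclose(beta1_hat, rhs)
```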


Proof That $\hat{\beta}_1$ Is Unbiased

The argument that $\hat{\beta}_1$ is unbiased under the first least squares assumption uses the law of iterated expectations [Equation (2.20)]. First, obtain $E(\hat{\beta}_1 \mid X_1, \ldots, X_n)$ by taking the conditional expectation of both sides of Equation (4.28):

$$E(\hat{\beta}_1 \mid X_1, \ldots, X_n) = \beta_1 + E\left[\left.\frac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2}\,\right|\,X_1, \ldots, X_n\right]$$
$$= \beta_1 + \frac{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})E(u_i \mid X_1, \ldots, X_n)}{\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}(X_i - \bar{X})^2}. \quad (4.29)$$

By the second least squares assumption, $u_i$ is distributed independently of $X$ for all observations other than $i$, so $E(u_i \mid X_1, \ldots, X_n) = E(u_i \mid X_i)$. By the first least squares assumption, however, $E(u_i \mid X_i) = 0$. Thus the second term in the final line of Equation (4.29) is 0, from which it follows that $E(\hat{\beta}_1 \mid X_1, \ldots, X_n) = \beta_1$.

Because $\hat{\beta}_1$ is unbiased given $X_1, \ldots, X_n$, it is unbiased after averaging over all samples $X_1, \ldots, X_n$. Thus the unbiasedness of $\hat{\beta}_1$ follows from Equation (4.29) and the law of iterated expectations: $E(\hat{\beta}_1) = E[E(\hat{\beta}_1 \mid X_1, \ldots, X_n)] = \beta_1$.
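Unbiasedness is also easy to see in a Monte Carlo exercise. The sketch below (not from the text; the normal design, the error distribution, and the parameter values are illustrative assumptions) repeatedly draws samples satisfying $E(u_i \mid X_i) = 0$ and averages the OLS slope across samples; the average is close to $\beta_1$.

```python
# Minimal Monte Carlo sketch illustrating E(beta1_hat) = beta1.
import numpy as np

rng = np.random.default_rng(1)
n, reps, beta0, beta1 = 100, 20_000, 1.0, 2.0

slopes = np.empty(reps)
for r in range(reps):
    X = rng.normal(10.0, 2.0, size=n)
    u = rng.normal(0.0, 1.0, size=n)       # E(u_i | X_i) = 0 holds by construction
    Y = beta0 + beta1 * X + u
    slopes[r] = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()

print(slopes.mean())   # close to beta1 = 2.0, consistent with unbiasedness
```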

Large-Sample Normal Distribution of the OLS Estimator

The large-sample normal approximation to the limiting distribution of $\hat{\beta}_1$ (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.28).

First, consider the numerator of this term. Because $\bar{X}$ is consistent, if the sample size is large, $\bar{X}$ is nearly equal to $\mu_X$. Thus, to a close approximation, the term in the numerator of Equation (4.28) is the sample average $\bar{v}$, where $v_i = (X_i - \mu_X)u_i$. By the first least squares assumption, $v_i$ has a mean of 0. By the second least squares assumption, $v_i$ is i.i.d. The variance of $v_i$ is $\sigma^2_v = \mathrm{var}[(X_i - \mu_X)u_i]$, which, by the third least squares assumption, is nonzero and finite. Therefore, $\bar{v}$ satisfies all the requirements of the central limit theorem (Key Concept 2.7).

Thus $\bar{v}/\sigma_{\bar{v}}$ is, in large samples, distributed $N(0, 1)$, where $\sigma^2_{\bar{v}} = \sigma^2_v/n$. Therefore the distribution of $\bar{v}$ is well approximated by the $N(0, \sigma^2_v/n)$ distribution.

Next consider the expression in the denominator in Equation (4.28); this is the sample variance of $X$ (except dividing by $n$ rather than $n - 1$, which is inconsequential if $n$ is large). As discussed in Section 3.2 [Equation (3.8)], the sample variance is a consistent estimator of the population variance, so in large samples it is arbitrarily close to the population variance of $X$.

Combining these two results, we have that, in large samples, $\hat{\beta}_1 - \beta_1 \cong \bar{v}/\mathrm{var}(X_i)$, so that the sampling distribution of $\hat{\beta}_1$ is, in large samples, $N(\beta_1, \sigma^2_{\hat{\beta}_1})$, where $\sigma^2_{\hat{\beta}_1} = \mathrm{var}(\bar{v})/[\mathrm{var}(X_i)]^2 = \mathrm{var}[(X_i - \mu_X)u_i]/\{n[\mathrm{var}(X_i)]^2\}$, which is the expression in Equation (4.19).
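A simulation makes the approximation concrete. In the sketch below (not from the text; the design is a hypothetical one in which $u_i$ is drawn independently of $X_i$, so $\mathrm{var}[(X_i - \mu_X)u_i]$ simplifies to $\mathrm{var}(X_i)\,\mathrm{var}(u_i)$), the Monte Carlo standard deviation of $\hat{\beta}_1$ is close to the standard deviation implied by Equation (4.19).

```python
# Minimal sketch comparing the Monte Carlo spread of the OLS slope with the
# large-sample variance formula sigma^2 = var[(X - mu_X) u] / (n [var(X)]^2).
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta0, beta1 = 500, 10_000, 1.0, 2.0
mu_X, sd_X, sd_u = 10.0, 2.0, 1.0

slopes = np.empty(reps)
for r in range(reps):
    X = rng.normal(mu_X, sd_X, size=n)
    u = rng.normal(0.0, sd_u, size=n)
    Y = beta0 + beta1 * X + u
    slopes[r] = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()

# With u independent of X and E(u) = 0, var[(X - mu_X) u] = var(X) * var(u),
# so Equation (4.19) simplifies to the expression below.
var_beta1 = (sd_X**2 * sd_u**2) / (n * (sd_X**2) ** 2)

print(slopes.std(ddof=1), np.sqrt(var_beta1))   # the two standard deviations are close
```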


Some Additional Algebraic Facts About OLS

The OLS residuals and predicted values satisfy

$$\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0, \quad (4.30)$$

$$\frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i = \bar{Y}, \quad (4.31)$$

$$\sum_{i=1}^{n}\hat{u}_i X_i = 0 \ \text{and} \ s_{\hat{u}X} = 0, \ \text{and} \quad (4.32)$$

$$TSS = SSR + ESS. \quad (4.33)$$

Equations (4.30) through (4.33) say that the sample average of the OLS residuals is 0; the sample average of the OLS predicted values equals $\bar{Y}$; the sample covariance $s_{\hat{u}X}$ between the OLS residuals and the regressors is 0; and the total sum of squares is the sum of the squared residuals and the explained sum of squares. [The ESS, TSS, and SSR are defined in Equations (4.12), (4.13), and (4.15).]

To verify Equation (4.30), note that the definition of $\hat{\beta}_0$ lets us write the OLS residuals as $\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i = (Y_i - \bar{Y}) - \hat{\beta}_1(X_i - \bar{X})$; thus

$$\sum_{i=1}^{n}\hat{u}_i = \sum_{i=1}^{n}(Y_i - \bar{Y}) - \hat{\beta}_1\sum_{i=1}^{n}(X_i - \bar{X}).$$

But the definitions of $\bar{Y}$ and $\bar{X}$ imply that $\sum_{i=1}^{n}(Y_i - \bar{Y}) = 0$ and $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, so $\sum_{i=1}^{n}\hat{u}_i = 0$.

To verify Equation (4.31), note that $Y_i = \hat{Y}_i + \hat{u}_i$, so $\sum_{i=1}^{n}Y_i = \sum_{i=1}^{n}\hat{Y}_i + \sum_{i=1}^{n}\hat{u}_i = \sum_{i=1}^{n}\hat{Y}_i$, where the second equality is a consequence of Equation (4.30).

To verify Equation (4.32), note that $\sum_{i=1}^{n}\hat{u}_i = 0$ implies $\sum_{i=1}^{n}\hat{u}_i X_i = \sum_{i=1}^{n}\hat{u}_i(X_i - \bar{X})$, so

$$\sum_{i=1}^{n}\hat{u}_i X_i = \sum_{i=1}^{n}[(Y_i - \bar{Y}) - \hat{\beta}_1(X_i - \bar{X})](X_i - \bar{X})$$
$$= \sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X}) - \hat{\beta}_1\sum_{i=1}^{n}(X_i - \bar{X})^2 = 0, \quad (4.34)$$

where the final equality in Equation (4.34) is obtained using the formula for $\hat{\beta}_1$ in Equation (4.25). This result, combined with the preceding results, implies that $s_{\hat{u}X} = 0$.

Equation (4.33) follows from the previous results and some algebra:

$$TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2$$
$$= \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + 2\sum_{i=1}^{n}(Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y})$$
$$= SSR + ESS + 2\sum_{i=1}^{n}\hat{u}_i\hat{Y}_i = SSR + ESS, \quad (4.35)$$

where the final equality follows from $\sum_{i=1}^{n}\hat{u}_i\hat{Y}_i = \sum_{i=1}^{n}\hat{u}_i(\hat{\beta}_0 + \hat{\beta}_1 X_i) = \hat{\beta}_0\sum_{i=1}^{n}\hat{u}_i + \hat{\beta}_1\sum_{i=1}^{n}\hat{u}_i X_i = 0$ by the previous results.
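All four algebraic facts can be confirmed on any data set. The following minimal check (not from the text; the simulated data and parameter values are illustrative assumptions) computes the OLS fit directly from Equation (4.25) and verifies Equations (4.30) through (4.33) up to floating-point rounding.

```python
# Minimal numerical check of Equations (4.30)-(4.33) on simulated data.
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = rng.normal(10.0, 2.0, size=n)
Y = 1.0 + 2.0 * X + rng.normal(0.0, 1.0, size=n)

beta1_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
beta0_hat = Y.mean() - beta1_hat * X.mean()
Y_hat = beta0_hat + beta1_hat * X
u_hat = Y - Y_hat

TSS = ((Y - Y.mean()) ** 2).sum()
ESS = ((Y_hat - Y.mean()) ** 2).sum()
SSR = (u_hat ** 2).sum()

print(u_hat.mean())              # ~0, Equation (4.30)
print(Y_hat.mean() - Y.mean())   # ~0, Equation (4.31)
print((u_hat * X).sum())         # ~0, Equation (4.32)
print(TSS - (SSR + ESS))         # ~0, Equation (4.33)
```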


A P P E N D I X

4.4 The Least Squares Assumptions for Prediction

Section 4.4 provides the least squares assumptions for estimation of a causal effect. There is a parallel set of least squares assumptions for prediction. The difference between the two stems from the difference between the two problems. For estimation of a causal effect, X must be randomly assigned or as-if randomly assigned, which leads to least squares assumption 1 in Key Concept 4.3. In contrast, as discussed in Section 4.3, the goal of prediction is to provide accurate out-of-sample predictions. To do so, the estimated regression line must be relevant to the observation being predicted. This is the case if the data used for estimation and the observation being predicted are drawn from the same population distribution.

For example, return to the superintendent’s and father’s problems. The superintendent is interested in the causal effect on TestScore of a change in STR. Ideally, to answer her question we would have data from an experiment in which students were randomly assigned to different size classes. Absent such an experiment, she may or may not be satisfied with the regression of TestScore on STR using California data—that depends on whether least squares assumption 1 is satisfied, where $\beta_1$ is defined to be the causal effect.

In contrast, the father is interested in predicting test scores in a California district that did not report its test scores, so for his purposes he is interested in the population regression line relating TestScore and STR in California, the slope of which may or may not be the causal effect.

To make this precise, we introduce some additional notation. Let $(X^{oos}, Y^{oos})$ denote the out-of-sample (“oos”) observation for which the prediction is to be made, and continue to let $(X_i, Y_i)$, $i = 1, \ldots, n$, be the data used to estimate the regression coefficients. The least squares assumptions for prediction are

$E(Y \mid X) = \beta_0 + \beta_1 X$ and $u = Y - E(Y \mid X)$, where

1. $(X^{oos}, Y^{oos})$ are randomly drawn from the same population distribution as $(X_i, Y_i)$, $i = 1, \ldots, n$;

2. $(X_i, Y_i)$, $i = 1, \ldots, n$, are independent and identically distributed (i.i.d.) draws from their joint distribution; and

3. Large outliers are unlikely: $X_i$ and $Y_i$ have nonzero finite fourth moments.

There are two differences between these assumptions and the assumptions in Key Concept 4.3. The first is the definition of $\beta_1$. The best predictor is given by $E(Y \mid X)$ (where the best predictor is defined in terms of the mean squared prediction error; see Appendix 2.2).

With the assumption of linearity, for prediction $\beta_1$ is defined to be the slope of this conditional expectation, which may or may not be the causal effect. Second, because the regression line is estimated using in-sample observations but is used to predict an out-of-sample observation, the first assumption is that these are drawn from the same population.

The second and third assumptions are the same as those for estimation of causal effects in Section 4.4. They ensure that the OLS estimators are consistent for the coefficients of the population prediction line and are normally distributed when n is large.


Under the least squares assumptions for prediction, the OLS predicted value of $Y^{oos}$ is unbiased:

$$E(\hat{Y}^{oos} \mid X^{oos} = x^{oos}) = E(\hat{\beta}_0 + \hat{\beta}_1 X^{oos} \mid X^{oos} = x^{oos})$$
$$= E(\hat{\beta}_0) + E(\hat{\beta}_1)x^{oos}, \quad (4.36)$$

where the second equality follows because $(X^{oos}, Y^{oos})$ are independent of the observations used to compute the OLS estimators. For the prediction problem, $u$ is defined to be $u = Y - E(Y \mid X)$, so by definition $E(u_i \mid X_i) = 0$ and the algebra in Appendix 4.3 applies directly. Thus $E(\hat{\beta}_0) + E(\hat{\beta}_1)x^{oos} = \beta_0 + \beta_1 x^{oos} = E(Y^{oos} \mid X^{oos} = x^{oos})$. Combining this expression with the first expression in Equation (4.36), we have that $E(Y^{oos} - \hat{Y}^{oos} \mid X^{oos} = x^{oos}) = 0$; that is, the OLS prediction is unbiased.

The least squares assumptions for prediction also ensure that the regression SER estimates the variance of the out-of-sample prediction error, $\hat{u}^{oos} = Y^{oos} - \hat{Y}^{oos}$. To show this, it is useful to write the out-of-sample prediction error as the sum of two terms: the error that would be made were the regression coefficients known and the error made by needing to estimate them. Write $\hat{u}^{oos} = Y^{oos} - (\hat{\beta}_0 + \hat{\beta}_1 X^{oos}) = \beta_0 + \beta_1 X^{oos} + u^{oos} - (\hat{\beta}_0 + \hat{\beta}_1 X^{oos}) = u^{oos} - [(\hat{\beta}_0 - \beta_0) + (\hat{\beta}_1 - \beta_1)X^{oos}]$. Thus $\mathrm{var}(\hat{u}^{oos}) = \mathrm{var}(u^{oos}) + \mathrm{var}(\hat{\beta}_0 + \hat{\beta}_1 X^{oos})$ (Exercise 4.15). The second term in this final expression is the contribution of the estimation error to the out-of-sample prediction error. When the sample size is large, the first term in this final expression is much larger than the second term. Because the in- and out-of-sample observations are from the same population, $\mathrm{var}(u^{oos}) = \mathrm{var}(u_i) = \sigma^2_u$, so the standard deviation of $\hat{u}^{oos}$ is estimated by the SER.
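To illustrate, here is a minimal simulation sketch (not from the text; the data-generating process and parameter values are illustrative assumptions). It repeatedly estimates the regression on an in-sample data set, predicts one out-of-sample observation drawn from the same population, and compares the standard deviation of the out-of-sample prediction errors with the SER; with $n$ large, both are close to $\sigma_u$.

```python
# Minimal sketch: the SER approximates the standard deviation of the
# out-of-sample prediction error when n is large.
import numpy as np

rng = np.random.default_rng(4)
n, reps, beta0, beta1, sd_u = 1_000, 5_000, 1.0, 2.0, 1.0

oos_errors = np.empty(reps)
for r in range(reps):
    # In-sample data used to estimate the regression line
    X = rng.normal(10.0, 2.0, size=n)
    Y = beta0 + beta1 * X + rng.normal(0.0, sd_u, size=n)
    b1 = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
    b0 = Y.mean() - b1 * X.mean()

    # One out-of-sample draw from the same population
    X_oos = rng.normal(10.0, 2.0)
    Y_oos = beta0 + beta1 * X_oos + rng.normal(0.0, sd_u)
    oos_errors[r] = Y_oos - (b0 + b1 * X_oos)

# SER from the last in-sample fit (any of the fits would do for illustration)
u_hat = Y - (b0 + b1 * X)
ser = np.sqrt((u_hat ** 2).sum() / (n - 2))

print(oos_errors.std(ddof=1), ser)   # both are close to sd_u = 1.0 when n is large
```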


This chapter continues the treatment of linear regression with a single regressor.

Chapter 4 explained how the OLS estimator $\hat{\beta}_1$ of the slope coefficient $\beta_1$ differs from one sample to the next—that is, how $\hat{\beta}_1$ has a sampling distribution. In this chapter, we show how knowledge of this sampling distribution can be used to make statements about $\beta_1$ that accurately summarize the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of $\hat{\beta}_1$. Section 5.1 provides an expression for this standard error (and for the standard error of the OLS estimator of the intercept) and then shows how to use $\hat{\beta}_1$ and its standard error to test hypotheses. Section 5.2 explains how to construct confidence intervals for $\beta_1$. Section 5.3 takes up the special case of a binary regressor.

Sections 5.1 through 5.3 assume that the three least squares assumptions of Key Concept 4.3 hold. If, in addition, some stronger technical conditions hold, then some stronger results can be derived regarding the distribution of the OLS estimator. One of these stronger conditions is that the errors are homoskedastic, a concept introduced in Section 5.4. Section 5.5 presents the Gauss–Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators. Section 5.6 discusses the distribution of the OLS estimator when the population distribution of the regression errors is normal.
