2.3 Small Sample Properties of the OLS Estimator


2.3.1 The Gauss–Markov Assumptions

Whether or not the OLS estimator b provides a good approximation to the unknown parameter vector 𝛽 depends crucially upon the assumptions that are made about the distribution of 𝜀i and its relation to xi. A standard case in which the OLS estimator has good properties is characterised by the Gauss–Markov conditions. Later, in Section 2.6, Chapter 4 and Section 5.1, we shall consider weaker conditions under which OLS still has some attractive properties. For now, it is important to realize that the Gauss–Markov conditions are not all strictly needed to justify the use of the ordinary least squares estimator. They just constitute a simple case in which the small sample properties of b are easily derived.

For the linear regression model in (2.25), given by yi = x′i𝛽 + 𝜀i, the Gauss–Markov conditions are

\[
E\{\varepsilon_i\} = 0, \qquad i = 1, \ldots, N \tag{A1}
\]

\[
\{\varepsilon_1, \ldots, \varepsilon_N\} \text{ and } \{x_1, \ldots, x_N\} \text{ are independent} \tag{A2}
\]

\[
V\{\varepsilon_i\} = \sigma^2, \qquad i = 1, \ldots, N \tag{A3}
\]

\[
\mathrm{cov}\{\varepsilon_i, \varepsilon_j\} = 0, \qquad i, j = 1, \ldots, N, \; i \neq j. \tag{A4}
\]

Assumption (A1) says that the expected value of the error term is zero, which means that, on average, the regression line should be correct. Assumption (A3) states that all error terms have the same variance, which is referred to as homoskedasticity, while assumption (A4) imposes zero correlation between different error terms. This excludes any form of autocorrelation. Taken together, (A1), (A3) and (A4) imply that the error terms are uncorrelated drawings from a distribution with expectation zero and constant variance 𝜎². Using the matrix notation introduced earlier, it is possible to rewrite these three conditions as

\[
E\{\varepsilon\} = 0 \quad \text{and} \quad V\{\varepsilon\} = \sigma^2 I_N, \tag{2.29}
\]

where IN is the N × N identity matrix. This says that the covariance matrix of the vector of error terms 𝜀 is a diagonal matrix with 𝜎² on the diagonal. Assumption (A2) implies that X and 𝜀 are independent. Loosely speaking, this means that knowing X does not tell us anything about the distribution of the error terms in 𝜀. This is a fairly strong assumption.

It implies that

\[
E\{\varepsilon \mid X\} = E\{\varepsilon\} = 0 \tag{2.30}
\]

and

\[
V\{\varepsilon \mid X\} = V\{\varepsilon\} = \sigma^2 I_N. \tag{2.31}
\]

That is, the matrix of regressor values X does not provide any information about the expected values of the error terms or their (co)variances. The two conditions (2.30) and (2.31) combine the necessary elements from the Gauss–Markov assumptions needed for the results below to hold. By conditioning on X, we may act as if X were nonstochastic.

The reason for this is that the outcomes in the matrix X can be taken as given without affecting the properties of 𝜀, that is, one can derive all properties conditional upon X.

For simplicity, we shall take this approach in this section and Section 2.5. Under the Gauss–Markov assumptions (A1) and (A2), the linear model can be interpreted as the conditional expectation of yi given xi, that is, E{yi|xi} = x′i𝛽. This is a direct implication of (2.30).

2.3.2 Properties of the OLS Estimator

Under assumptions (A1)–(A4), the OLS estimator b for 𝛽 has several desirable properties.

First of all, it is unbiased. This means that, in repeated sampling, we can expect that the OLS estimator is on average equal to the true value 𝛽. We formulate this as E{b} = 𝛽. It is instructive to see the proof:

\[
E\{b\} = E\{(X'X)^{-1}X'y\} = E\{\beta + (X'X)^{-1}X'\varepsilon\}
= \beta + E\{(X'X)^{-1}X'\varepsilon\} = \beta.
\]

In the second step we have substituted (2.26). The final step is the essential one and follows from

\[
E\{(X'X)^{-1}X'\varepsilon\} = E\{(X'X)^{-1}X'\}E\{\varepsilon\} = 0,
\]

because, from assumption (A2), X and 𝜀 are independent and, from (A1), E{𝜀} = 0.

Note that we did not use assumptions (A3) and (A4) in the proof. This shows that the OLS estimator is unbiased as long as the error terms are mean zero and independent of all explanatory variables, even if heteroskedasticity or autocorrelation are present. We shall come back to this issue in Chapter 4. If an estimator is unbiased, this means that its probability distribution has an expected value that is equal to the true unknown parameter it is estimating.
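To make the unbiasedness result concrete, the following minimal Monte Carlo sketch draws repeated samples from a data-generating process satisfying (A1)–(A4) and averages the OLS estimates; the sample size, coefficient values and error variance are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
beta = np.array([1.0, 0.5])        # "true" coefficients (illustrative)
R = 5000                           # number of replications

estimates = np.empty((R, 2))
for r in range(R):
    x = rng.normal(size=N)
    X = np.column_stack([np.ones(N), x])               # regressor matrix with intercept
    eps = rng.normal(scale=2.0, size=N)                 # errors satisfying (A1)-(A4)
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)    # b = (X'X)^{-1} X'y

print(estimates.mean(axis=0))      # close to (1.0, 0.5): b is unbiased
```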

In addition to knowing that we are, on average, correct, we would also like to make statements about how (un)likely it is to be far off in a given sample. This means we would like to know the distribution of b (around its mean 𝛽). First of all, the variance of b (conditional upon X) is given by

\[
V\{b \mid X\} = \sigma^2 (X'X)^{-1} = \sigma^2 \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1}, \tag{2.32}
\]

which, for simplicity, we shall denote by V{b}. The K × K matrix V{b} is a variance–covariance matrix, containing the variances of b1, b2, . . . , bK on the diagonal, and their covariances as off-diagonal elements. The proof is fairly easy and goes as follows:

\[
V\{b\} = E\{(b - \beta)(b - \beta)'\} = E\{(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\}
= (X'X)^{-1}X'(\sigma^2 I_N)X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.
\]

Without using matrix notation the proof goes as follows:

\[
V\{b\} = V\left\{ \left( \sum_i x_i x_i' \right)^{-1} \sum_i x_i \varepsilon_i \right\}
= \left( \sum_i x_i x_i' \right)^{-1} V\left\{ \sum_i x_i \varepsilon_i \right\} \left( \sum_i x_i x_i' \right)^{-1}
\]
\[
= \left( \sum_i x_i x_i' \right)^{-1} \sigma^2 \left( \sum_i x_i x_i' \right) \left( \sum_i x_i x_i' \right)^{-1}
= \sigma^2 \left( \sum_i x_i x_i' \right)^{-1}. \tag{2.33}
\]

This requires assumptions (A1)–(A4).
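To see where assumptions (A3) and (A4) enter in the middle step, the variance of the sum can be spelled out (conditional upon the regressors) as

\[
V\left\{ \sum_i x_i \varepsilon_i \right\} = \sum_i \sum_j x_i x_j' \, \mathrm{cov}\{\varepsilon_i, \varepsilon_j\}
= \sum_i x_i x_i' \, V\{\varepsilon_i\} = \sigma^2 \sum_i x_i x_i',
\]

because the cross terms with i ≠ j vanish by (A4) and each variance equals 𝜎² by (A3).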

The last result is collected in the Gauss–Markov theorem, which says that under assumptions (A1)–(A4) the OLS estimator b is the best linear unbiased estimator for 𝛽. In short we say that b is BLUE for 𝛽. To appreciate this result, consider the class of linear unbiased estimators. A linear estimator is a linear function of the elements in y and can be written as b̃ = Ay, where A is a K × N matrix. The estimator is unbiased if E{Ay} = 𝛽. (Note that the OLS estimator is obtained for A = (X′X)⁻¹X′.) Then the theorem states that the difference between the covariance matrices of b̃ = Ay and the OLS estimator b is always positive semi-definite. What does this mean? Suppose we are interested in some linear combination of 𝛽 coefficients, given by d′𝛽, where d is a K-dimensional vector.

Then the Gauss–Markov result implies that the variance of the OLS estimator d′b for d′𝛽 is not larger than the variance of any other linear unbiased estimator d′b̃, that is,

\[
V\{d'\tilde{b}\} \geq V\{d'b\} \quad \text{for any vector } d,
\]

because positive semi-definiteness of V{b̃} − V{b} means precisely that d′(V{b̃} − V{b})d ≥ 0 for every d. As a special case this holds for the kth element and we have

\[
V\{\tilde{b}_k\} \geq V\{b_k\}.
\]

Thus, under the Gauss–Markov assumptions, the OLS estimator is the most accurate (linear) unbiased estimator for 𝛽. More details on the Gauss–Markov result can be found in Greene (2012, Section 4.3).
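As an illustration of the theorem, the sketch below compares the OLS slope in a simple regression with another linear unbiased estimator of the slope, namely the line through the observations with the smallest and largest regressor values; the design and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, R = 50, 10_000
x = rng.uniform(0, 10, size=N)            # regressor values, kept fixed across replications
i_lo, i_hi = x.argmin(), x.argmax()
ols = np.empty(R)
endpoints = np.empty(R)

for r in range(R):
    y = 1.0 + 0.5 * x + rng.normal(size=N)                      # true slope 0.5 (illustrative)
    xd, yd = x - x.mean(), y - y.mean()
    ols[r] = (xd @ yd) / (xd @ xd)                               # OLS slope estimator
    endpoints[r] = (y[i_hi] - y[i_lo]) / (x[i_hi] - x[i_lo])     # alternative linear unbiased estimator

print(ols.mean(), endpoints.mean())       # both are close to 0.5: both unbiased
print(ols.var(), endpoints.var())         # the OLS variance is the smaller of the two
```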

To estimate the variance of b we need to replace the unknown error variance 𝜎² with an estimate. An obvious candidate is the sample variance of the residuals ei = yi − x′ib, that is,

\[
\tilde{s}^2 = \frac{1}{N-1} \sum_{i=1}^{N} e_i^2 \tag{2.34}
\]

(recalling that the average residual is zero). However, because ei is different from 𝜀i, it can be shown that this estimator is biased for 𝜎². An unbiased estimator is given by

\[
s^2 = \frac{1}{N-K} \sum_{i=1}^{N} e_i^2. \tag{2.35}
\]

This estimator has a degrees of freedom correction as it divides by the number of observations minus the number of regressors (including the intercept). An intuitive argument for this is that K parameters were chosen so as to minimize the residual sum of squares and thus to minimize the sample variance of the residuals. Consequently, s̃² is expected to underestimate the variance of the error term 𝜎². The estimator s², with a degrees of freedom correction, is unbiased under assumptions (A1)–(A4); see Greene (2012, Section 4.3) for a proof. The variance of b can thus be estimated by

\[
\hat{V}\{b\} = s^2 (X'X)^{-1} = s^2 \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1}. \tag{2.36}
\]

The estimated variance of an element bk is given by s²ckk, where ckk is the (k, k) element in (∑i xix′i)⁻¹. The square root of this estimated variance is usually referred to as the standard error of bk. We shall denote it as se(bk). It is the estimated standard deviation of bk and is a measure for the accuracy of the estimator. Under assumptions (A1)–(A4), it holds that se(bk) = s√ckk. When the error terms are not homoskedastic or exhibit autocorrelation, the standard error of the OLS estimator bk will have to be computed in a different way (see Chapter 4).
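In code, the recipe in (2.35) and (2.36) amounts to only a few lines. The sketch below computes b, s² and the standard errors se(bk) for a simulated data set that stands in for a real sample; the design and parameter values are assumptions made for the illustration.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional standard errors under (A1)-(A4)."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                  # b = (X'X)^{-1} X'y
    e = y - X @ b                          # residuals e_i = y_i - x_i'b
    s2 = e @ e / (N - K)                   # unbiased error-variance estimator, cf. (2.35)
    se = np.sqrt(s2 * np.diag(XtX_inv))    # se(b_k) = s * sqrt(c_kk), cf. (2.36)
    return b, s2, se

# illustrative data
rng = np.random.default_rng(1)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 0.5 * x + rng.normal(scale=2.0, size=N)

b, s2, se = ols_with_se(X, y)
print(b, s2, se)
```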

In general the expression for the estimated covariance matrix in (2.36) does not allow derivation of analytical expressions for the standard error of a single element bk. As an illustration, however, let us consider the regression model with two explanatory variables and a constant:

\[
y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i.
\]

In this case it is possible to derive that the variance of the OLS estimator b2 for 𝛽2 is given by

\[
V\{b_2\} = \frac{\sigma^2}{1 - r_{23}^2} \left[ \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2 \right]^{-1},
\]

where r23 is the sample correlation coefficient between xi2 and xi3, and x̄2 is the sample average of xi2. We can rewrite this as

\[
V\{b_2\} = \frac{\sigma^2}{1 - r_{23}^2} \, \frac{1}{N} \left[ \frac{1}{N} \sum_{i=1}^{N} (x_{i2} - \bar{x}_2)^2 \right]^{-1}. \tag{2.37}
\]

This shows that the variance of b2 is driven by four elements. First, the term in square brackets denotes the sample variance of x2: more variation in the regressor values leads to a more accurate estimator. Second, the term 1/N is inversely related to the sample size: having more observations increases precision. Third, the larger the error variance 𝜎², the larger the variance of the estimator. A low value for 𝜎² implies that observations are typically close to the regression line, which obviously makes it easier to estimate it.

Finally, the variance is driven by the correlation between the regressors. The variance of b2 is inflated if the correlation between xi2 and xi3 is high (either positive or negative). In the extreme case where r23 = 1 or −1, xi2 and xi3 are perfectly correlated and the above variance becomes infinitely large. This is the case of perfect collinearity, and the OLS estimator in (2.7) cannot be computed (see Section 2.8).
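A small numerical check of (2.37), with an illustrative design and 𝜎² = 1, shows both that the formula reproduces the corresponding diagonal element of 𝜎²(X′X)⁻¹ and how the variance grows as the correlation between the two regressors approaches one.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma2 = 200, 1.0

for rho in [0.0, 0.5, 0.9, 0.99]:
    z1, z2 = rng.normal(size=(2, N))
    x2 = z1
    x3 = rho * z1 + np.sqrt(1 - rho**2) * z2       # regressors with correlation about rho
    X = np.column_stack([np.ones(N), x2, x3])
    V_b = sigma2 * np.linalg.inv(X.T @ X)          # V{b} = sigma^2 (X'X)^{-1}, cf. (2.32)
    r23 = np.corrcoef(x2, x3)[0, 1]
    V_b2 = sigma2 / (1 - r23**2) / ((x2 - x2.mean())**2).sum()   # cf. (2.37)
    print(f"r23 = {r23:5.2f}   V{{b2}} from matrix: {V_b[1, 1]:.5f}   from formula: {V_b2:.5f}")
```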

Assumptions (A1)–(A4) state that the error terms 𝜀i are mutually uncorrelated, are independent of X, have zero mean and have a constant variance, but do not specify the shape of the distribution. For exact statistical inference from a given sample of N observations, explicit distributional assumptions have to be made.5 The most common assumption is that the errors are jointly normally distributed.6 In this case the uncorrelatedness of (A4) is equivalent to independence of all error terms. The precise assumption is as follows:

\[
\varepsilon \sim N(0, \sigma^2 I_N), \tag{A5}
\]

saying that the vector of error terms 𝜀 has an N-variate normal distribution with mean vector 0 and covariance matrix 𝜎²IN. Assumption (A5) thus replaces (A1), (A3) and (A4).

An alternative way of formulating (A5) is

\[
\varepsilon_i \sim NID(0, \sigma^2), \tag{A5}
\]

which is a shorthand way of saying that the error terms 𝜀i are independent drawings from a normal distribution (‘normally and independently distributed’, or n.i.d.) with mean zero and variance 𝜎². Even though error terms are unobserved, this does not mean that we are free to make any assumption we like. For example, if error terms are assumed to follow a normal distribution, this means that yi (for given values of xi) also follows a normal distribution. Clearly, we can think of many variables whose distribution (conditional upon a given set of xi variables) is not normal, in which case the assumption of normal error terms is inappropriate. Fortunately, not all assumptions are equally crucial for the validity of the results that follow and, moreover, the majority of the assumptions can be tested empirically; see Chapters 3, 4 and 6.

To make things simpler, let us consider the X matrix as fixed and deterministic or, alternatively, let us work conditionally upon the outcomes X. Then the following result holds. Under assumptions (A2) and (A5) the OLS estimator b is normally distributed with mean vector 𝛽 and covariance matrix 𝜎²(X′X)⁻¹, that is,

\[
b \sim N(\beta, \sigma^2 (X'X)^{-1}). \tag{2.38}
\]

The proof of this follows directly from the result that b is a linear combination of all 𝜀i and is omitted here. The result in (2.38) implies that each element in b is normally distributed, for example

\[
b_k \sim N(\beta_k, \sigma^2 c_{kk}), \tag{2.39}
\]

where, as before, ckk is the (k, k) element in (X′X)⁻¹. These results provide the basis for statistical tests based upon the OLS estimator b.
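Anticipating the tests in Section 2.5, a direct consequence of (2.39) is that the standardized estimator has a standard normal distribution,

\[
\frac{b_k - \beta_k}{\sigma \sqrt{c_{kk}}} \sim N(0, 1),
\]

so that, if 𝜎 were known, exact probability statements about how far bk is likely to be from 𝛽k would follow immediately.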

5 Later we shall see that for approximate inference in large samples this is not necessary.

6 The distributions used in this text are explained in Appendix B.

2.3.3 Example: Individual Wages (Continued)

Let us now turn back to our wage example. We can formulate a (fairly trivial) econometric model as

\[
wage_i = \beta_1 + \beta_2 \, male_i + \varepsilon_i,
\]

where wagei denotes the hourly wage rate of individual i and malei = 1 if i is male and 0 otherwise. Imposing that E{𝜀i} = 0 and E{𝜀i|malei} = 0 gives 𝛽1 the interpretation of the expected wage rate for females, while E{wagei|malei = 1} = 𝛽1 + 𝛽2 is the expected wage rate for males. Thus, 𝛽2 is the expected wage differential between an arbitrary male and female. These parameters are unknown population quantities, and we may wish to estimate them. Assume that we have a random sample, implying that different observations are independent. Also assume that 𝜀i is independent of the regressors, in particular, that the variance of 𝜀i does not depend upon gender (malei). Then the OLS estimator for 𝛽 is unbiased and its covariance matrix is given by (2.32). The estimation results are given in Table 2.1. In addition to the OLS estimates, identical to those presented before, we now also know something about the accuracy of the estimates, as reflected in the reported standard errors. We can now say that our estimate of the expected hourly wage differential 𝛽2 between males and females is $1.17 with a standard error of $0.11. Combined with the normal distribution, this allows us to make statements about 𝛽2. For example, we can test the hypothesis that 𝛽2 = 0. If this hypothesis is true, the wage differential between males and females in our sample is nonzero only by chance. Section 2.5 discusses how to test hypotheses regarding 𝛽.
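Estimates and standard errors of this kind can be reproduced with any standard regression routine. Because the underlying data set is not reproduced here, the sketch below generates a synthetic stand-in (the female base wage and the error scale are purely illustrative) and then estimates the model with statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-in for the individual wage data; values are illustrative only.
rng = np.random.default_rng(3)
n = 3000
male = rng.integers(0, 2, size=n)
wage = 5.0 + 1.17 * male + rng.normal(scale=3.0, size=n)
df = pd.DataFrame({'wage': wage, 'male': male})

X = sm.add_constant(df['male'])            # intercept plus the male dummy
result = sm.OLS(df['wage'], X).fit()
print(result.params)                        # estimates of beta1 and beta2
print(result.bse)                           # standard errors, s * sqrt(c_kk)
```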
