2.6 Asymptotic Properties of the OLS Estimator


In many cases, the small sample properties of the OLS estimator may deviate from those discussed above. For example, if the error terms $\varepsilon_i$ in the linear model do not follow a normal distribution, it is no longer the case that the sampling distribution of the OLS estimator $b$ is normal. If assumption (A2) of the Gauss–Markov conditions is violated, it can no longer be shown that $b$ has an expected value of $\beta$. In fact, the linear regression model under the Gauss–Markov assumptions and with normal error terms is one of the very few cases in econometrics where the exact sampling distribution of the estimator is known. As soon as we relax some of these assumptions or move to alternative models, the small sample properties of the estimator are typically unknown. In such cases we use an alternative approach to evaluate the quality of an estimator, which is based on asymptotic theory. Asymptotic theory refers to the question as to what happens if, hypothetically, the sample size grows infinitely large. Asymptotically, econometric estimators usually have nice properties, like normality, and we use the asymptotic properties to approximate the properties in the finite sample that we happen to have. This section presents a first discussion of the asymptotic properties of the OLS estimator. More details are provided in Pesaran (2015, Chapter 8).

2.6.1 Consistency

Let us start with the linear model under the Gauss–Markov assumptions. In this case we know that the OLS estimator $b$ has the following first two moments:

$$E\{b\} = \beta \tag{2.65}$$

$$V\{b\} = \sigma^2 \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} = \sigma^2 (X'X)^{-1}. \tag{2.66}$$

Unless we assume that the error terms are normal, the shape of the distribution of $b$ is unknown. It is, however, possible to say something about the distribution of $b$, at least approximately. A first starting point is the so-called Chebyshev's inequality, which says that the probability that a random variable $z$ deviates more than a positive number $\delta$ from its mean is bounded by its variance divided by $\delta^2$, that is,

$$P\{|z - E\{z\}| > \delta\} \leq \frac{V\{z\}}{\delta^2}, \quad \text{for all } \delta > 0. \tag{2.67}$$

For the OLS estimator this implies that its $k$th element satisfies

$$P\{|b_k - \beta_k| > \delta\} \leq \frac{V\{b_k\}}{\delta^2} = \frac{\sigma^2 c_{kk}}{\delta^2} \quad \text{for all } \delta > 0, \tag{2.68}$$

where $c_{kk}$, as before, is the $(k, k)$ element in $(X'X)^{-1} = \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1}$. This inequality becomes useful if we fix $\delta$ at some small positive number, and then let, in our mind, the sample size $N$ grow to infinity. Then what happens? It is clear that $\sum_{i=1}^{N} x_i x_i'$ increases as the number of terms grows, so that the variance of $b$ decreases as the sample size increases. If we assume that¹⁶

$$\frac{1}{N} \sum_{i=1}^{N} x_i x_i' \text{ converges to a finite nonsingular matrix } \Sigma_{xx} \tag{A6}$$

if the sample size $N$ becomes infinitely large, it follows directly from the above inequality that

$$\lim_{N\to\infty} P\{|b_k - \beta_k| > \delta\} = 0 \quad \text{for all } \delta > 0. \tag{2.69}$$

This says that, asymptotically, the probability that the OLS estimator deviates more than $\delta$ from the true parameter value is zero. We usually refer to this property as 'the probability limit of $b$ is $\beta$', or '$b$ converges in probability to $\beta$', or just¹⁷

$$\text{plim } b = \beta. \tag{2.70}$$

Note that $b$ is a vector of random variables whose distribution depends on $N$, and $\beta$ is a vector of fixed (unknown) numbers. When an estimator for $\beta$ converges to the true value, we say that it is a consistent estimator. Any estimator that satisfies (2.69) is a consistent estimator for $\beta$, even if it is biased.

Consistency is a large sample property and, loosely speaking, says that, if we obtain more and more observations, the probability that our estimator is more than some positive number away from the true value $\beta$ becomes smaller and smaller. Values that $b$ may take that are not close to $\beta$ become increasingly unlikely. In many cases, one cannot prove that an estimator is unbiased, and it is possible that no unbiased estimator exists (e.g. in nonlinear or dynamic models). In these cases, a minimum requirement for an estimator to be useful is that it is consistent. We shall therefore mainly be concerned with consistency of an estimator, not with its (un)biasedness in small samples.
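The idea can be made concrete with a small simulation. The sketch below is a minimal illustration, not taken from the text: it assumes NumPy, an arbitrary true slope and a model without intercept, and shows the OLS estimate settling around the true value as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(42)
beta = 1.0  # hypothetical true slope

for N in [25, 100, 400, 1600, 6400]:
    x = rng.normal(size=N)
    eps = rng.normal(size=N)       # error uncorrelated with x
    y = beta * x + eps
    b = (x @ y) / (x @ x)          # OLS slope in a model without intercept
    print(f"N = {N:5d}   b = {b:.4f}   |b - beta| = {abs(b - beta):.4f}")
```

Each line uses a single random sample, so the deviations need not shrink monotonically, but on average they decline at rate $1/\sqrt{N}$.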

A useful property of probability limits (plims) is the following. If plim $b = \beta$ and $g(\cdot)$ is a continuous function, it also holds that

$$\text{plim } g(b) = g(\beta). \tag{2.71}$$

This guarantees that the parameterization employed is irrelevant for consistency. For example, if $s^2$ is a consistent estimator for $\sigma^2$, then $s$ is a consistent estimator for $\sigma$. Note that this result does not hold for unbiasedness, as $E\{s\}^2 \neq E\{s^2\}$ (see Appendix B).
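This contrast between consistency and unbiasedness is easy to check numerically. A hedged sketch (illustrative $\sigma$ and sample sizes; the degrees-of-freedom-corrected sample variance stands in for the regression-based $s^2$):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0  # hypothetical true standard deviation

for N in [5, 20, 100, 1000]:
    draws = rng.normal(0.0, sigma, size=(20000, N))
    s2 = draws.var(axis=1, ddof=1)   # unbiased for sigma^2 at every N
    s = np.sqrt(s2)                  # biased downward for sigma, yet consistent
    print(f"N = {N:4d}   mean of s2 = {s2.mean():.3f}   mean of s = {s.mean():.3f}")
```

The average of $s^2$ stays near $\sigma^2 = 4$ for every $N$, while the average of $s$ lies below $\sigma = 2$ in small samples and approaches it only as $N$ grows.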

¹⁶ The nonsingularity of $\Sigma_{xx}$ requires that, asymptotically, there is no multicollinearity. The requirement that the limit is finite is a 'regularity' condition, which will be satisfied in most empirical applications. A sufficient condition is that the regressors are independent drawings from the same distribution with a finite variance. Violations typically occur in time series contexts where one or more of the x-variables may be trended. We shall return to this issue in Chapters 8 and 9.

¹⁷ Unless indicated otherwise, lim and plim refer to the (probability) limit for the sample size $N$ going to infinity ($N \to \infty$).


The OLS estimator is consistent under substantially weaker conditions than the Gauss–Markov assumptions employed earlier. To see this, let us write the OLS estimator as

$$b = \left( \frac{1}{N} \sum_{i=1}^{N} x_i x_i' \right)^{-1} \frac{1}{N} \sum_{i=1}^{N} x_i y_i = \beta + \left( \frac{1}{N} \sum_{i=1}^{N} x_i x_i' \right)^{-1} \frac{1}{N} \sum_{i=1}^{N} x_i \varepsilon_i. \tag{2.72}$$

This expression states that the OLS estimator $b$ equals the vector of true population coefficients $\beta$ plus a vector of estimation errors that depend upon the sample averages of $x_i x_i'$ and $x_i \varepsilon_i$. This decomposition plays a key role in establishing the properties of the OLS estimator and stresses again that this requires assumptions on $\varepsilon_i$ and its relation with the explanatory variables. If the sample size increases, the sample averages in (2.72) are taken over increasingly more observations. It seems reasonable to assume, and it can be shown to be true under very weak conditions,¹⁸ that in the limit these sample averages converge to the corresponding population means. Then, under assumption (A6), we have

$$\text{plim}(b - \beta) = \Sigma_{xx}^{-1} E\{x_i \varepsilon_i\}, \tag{2.73}$$

which shows that the OLS estimator is consistent if it holds that

$$E\{x_i \varepsilon_i\} = 0. \tag{A7}$$

This condition simply says that the error term is mean zero and uncorrelated with any of the explanatory variables. Note that $E\{\varepsilon_i | x_i\} = 0$ implies (A7), while the converse is not necessarily true.¹⁹ Thus we can conclude that the OLS estimator $b$ is consistent for $\beta$ under conditions (A6) and (A7), which are much weaker than the Gauss–Markov conditions (A1)–(A4) required for unbiasedness. We shall discuss the relevance of this below.
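Condition (A7) can be seen at work in a simulation. The sketch below is an invented illustration (NumPy assumed, arbitrary parameter values): with a very large $N$ the OLS estimate approximates its probability limit, which equals $\beta$ when $E\{x_i \varepsilon_i\} = 0$ and deviates from it otherwise, exactly as (2.73) predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, N = 1.0, 200_000  # large N to approximate the probability limit

u = rng.normal(size=N)
x = rng.normal(size=N)            # E{x_i eps_i} = 0: (A7) holds
y = beta * x + u
print("(A7) holds:  b =", (x @ y) / (x @ x))   # close to 1.0

x = rng.normal(size=N) + 0.5 * u  # regressor correlated with the error
y = beta * x + u
# plim b = beta + E{x eps}/E{x^2} = 1 + 0.5/1.25 = 1.4
print("(A7) fails:  b =", (x @ y) / (x @ x))   # close to 1.4
```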

Similarly, the least squares estimator $s^2$ for the error variance $\sigma^2$ is consistent under conditions (A6), (A7) and (A3) (and some weak regularity conditions). The intuition is that, with $b$ converging to $\beta$, the residuals $e_i$ become asymptotically equivalent to the error terms $\varepsilon_i$, so that the sample variance of $e_i$ will converge to the error variance $\sigma^2$, as defined in (A3).

2.6.2 Asymptotic Normality

If the small sample distribution of an estimator is unknown, the best we can do is try to find some approximation. In most cases, one uses an asymptotic approximation (for $N$ going to infinity) based on the asymptotic distribution. Most estimators in econometrics can be shown to be asymptotically normally distributed (under weak regularity conditions). By the asymptotic distribution of a consistent estimator $\hat{\beta}$ we mean the distribution of $\sqrt{N}(\hat{\beta} - \beta)$ as $N$ goes to infinity. The reason for the factor $\sqrt{N}$ is that asymptotically $\hat{\beta}$ is equal to $\beta$ with probability one for all consistent estimators. That is, $\hat{\beta} - \beta$ has a degenerate distribution for $N \to \infty$ with all probability mass at zero.

¹⁸ The result that sample averages converge to population means is provided in several versions of the law of large numbers (see Davidson and MacKinnon, 2004, Section 4.5 or Greene, 2012, Appendix D).

¹⁹ To be precise, $E\{\varepsilon_i | x_i\} = 0$ implies that $E\{\varepsilon_i g(x_i)\} = 0$ for any function $g$ (see Appendix B).

If we multiply by $\sqrt{N}$ and consider the asymptotic distribution of $\sqrt{N}(\hat{\beta} - \beta)$, this will usually be a nondegenerate normal distribution. In that case $\sqrt{N}$ is referred to as the rate of convergence, and it is sometimes said that the corresponding estimator is root-$N$-consistent. In later chapters we shall see a few cases where the rate of convergence differs from root $N$.

For the OLS estimator it can be shown that under the Gauss–Markov conditions (A1)–(A4) combined with (A6) we have

$$\sqrt{N}(b - \beta) \to \mathcal{N}(0, \sigma^2 \Sigma_{xx}^{-1}), \tag{2.74}$$

where $\to$ means 'is asymptotically distributed as'. Thus, the OLS estimator $b$ is consistent and asymptotically normal (CAN), with variance–covariance matrix $\sigma^2 \Sigma_{xx}^{-1}$. In practice, where we necessarily have a finite sample, we can use this result to approximate the distribution of $b$ as

$$b \overset{a}{\sim} \mathcal{N}(\beta, \sigma^2 \Sigma_{xx}^{-1} / N), \tag{2.75}$$

where $\overset{a}{\sim}$ means 'is approximately distributed as'.

Because the unknown matrix $\Sigma_{xx}$ will be consistently estimated by the sample mean $(1/N) \sum_{i=1}^{N} x_i x_i'$, this approximate distribution is estimated as

$$b \overset{a}{\sim} \mathcal{N}\left( \beta, \; s^2 \left( \sum_{i=1}^{N} x_i x_i' \right)^{-1} \right). \tag{2.76}$$

This provides a distributional result for the OLS estimator $b$ based upon asymptotic theory, which is approximately valid in small samples. The quality of the approximation increases as the sample size grows, and in a given application it is typically hoped that the sample size will be sufficiently large for the approximation to be reasonably accurate. Because the result in (2.76) corresponds exactly to what is used in the case of the Gauss–Markov assumptions combined with the assumption of normal error terms, it follows that all the distributional results for the OLS estimator reported above, including those for t- and F-statistics, are approximately valid, even if the errors are not normally distributed.
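This robustness to non-normal errors can be checked by simulation. A minimal sketch under assumed settings (NumPy, skewed mean-zero errors, arbitrary $N$ and replication count): the t-ratio built from (2.76) should exceed 1.96 in absolute value roughly 5% of the time even though the errors are far from normal.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, N, R = 1.0, 200, 5000

z = np.empty(R)
for r in range(R):
    x = rng.normal(size=N)
    eps = rng.exponential(1.0, size=N) - 1.0  # skewed errors with mean zero
    y = beta * x + eps
    b = (x @ y) / (x @ x)
    e = y - b * x
    s2 = (e @ e) / (N - 1)
    se = np.sqrt(s2 / (x @ x))                # standard error based on (2.76)
    z[r] = (b - beta) / se                    # t-ratio evaluated at the truth

print("P(|t| > 1.96) ≈", np.mean(np.abs(z) > 1.96))  # close to 0.05
```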

Because, asymptotically, a $t_{N-K}$ distributed variable converges to a standard normal one, it is not uncommon to use the critical values from a standard normal distribution (like the 1.96 at the 5% level) for all inferences, while not imposing normality of the errors. Thus, to test the hypothesis that $\beta_k = \beta_k^0$ for some given value $\beta_k^0$, we proceed on the basis that (see (2.44))

$$t_k = \frac{b_k - \beta_k^0}{se(b_k)}$$

approximately has a standard normal distribution (under the null), under assumptions (A1)–(A4) and (A6). Similarly, to test the multiple restrictions $R\beta = q$, we proceed on the basis that (see (2.62))

$$\xi = (Rb - q)' \hat{V}\{Rb\}^{-1} (Rb - q)$$

has an approximate Chi-squared distribution with $J$ degrees of freedom, where $J$ is the number of restrictions that is tested.
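As a worked illustration of the Wald statistic $\xi$, the sketch below tests two restrictions jointly in a three-regressor model. The design, the restriction matrix $R$ and the vector $q$ are invented for the example, $\hat{V}\{b\} = s^2(X'X)^{-1}$ as in (2.76), and NumPy and SciPy are assumed available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N, beta = 500, np.array([0.0, 1.0, 1.0])

X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ beta + rng.normal(size=N)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = (e @ e) / (N - X.shape[1])
Vb = s2 * np.linalg.inv(X.T @ X)        # estimated covariance matrix of b

# H0: beta_2 = 1 and beta_3 = 1 (true here), written as R beta = q with J = 2
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 1.0])
diff = R @ b - q
xi = diff @ np.linalg.inv(R @ Vb @ R.T) @ diff
print("xi =", xi, " p-value =", 1 - stats.chi2.cdf(xi, df=len(q)))
```

Under the null, $\xi$ behaves approximately as a Chi-squared variable with 2 degrees of freedom, so values above 5.99 would lead to rejection at the 5% level.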


It is possible to relax the assumptions further without affecting the validity of the results in (2.74) and (2.76). In particular, we can relax assumption (A2) to

$$x_i \text{ and } \varepsilon_i \text{ are independent.} \tag{A8}$$

This condition does not rule out a dependence between $x_i$ and $\varepsilon_j$ for $i \neq j$, which is of interest for models with lagged dependent variables. Note that (A8) implies (A7). Further discussion on the asymptotic distribution of the OLS estimator and how it can be estimated is provided in Chapters 4 and 5.

2.6.3 Small Samples and Asymptotic Theory

The linear regression model under the Gauss–Markov conditions is one of the very few cases in which the finite sample properties of the estimator and test statistics are known.

In many other circumstances and models it is not possible, or extremely difficult, to derive the small sample properties of an econometric estimator. In such cases, most econometricians are (necessarily) satisfied with knowing 'approximate' properties. As discussed above, such approximate properties are typically derived from asymptotic theory, in which one considers what happens to an estimator or test statistic if the size of the sample is (hypothetically) growing to infinity. As a result, one expects that approximate properties based on asymptotic theory will work reasonably well if the sample size is sufficiently large.

Unfortunately, there is no unambiguous definition of what is ‘sufficiently large’.

In simple circumstances a sample size of 30 may be sufficient, whereas in more complicated or extreme cases a sample of 1000 may still be insufficient for the asymptotic approximation to be reasonably accurate. To obtain some idea about the small sample properties, Monte Carlo simulation studies are often performed. In a Monte Carlo study, a large number (e.g. 1000) of simulated samples are drawn from a data generating process specified by the researcher. Each random sample is used to compute an estimator and/or a test statistic, and the distributional characteristics over the different replications are analysed.

As an illustration, consider the data generating process

$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i,$$

corresponding to the simple linear regression model. To conduct a simulation, we need to choose the distribution of $x_i$, or fix a set of values for $x_i$; we need to specify the values for $\beta_1$ and $\beta_2$; and we need to specify the distribution of $\varepsilon_i$. Suppose we consider samples of size $N$, with fixed values $x_i = 1$ for $i = 1, \ldots, N/2$ (males, say) and $x_i = 0$ otherwise (females).²⁰ If $\varepsilon_i \sim NID(0, 1)$, independently of $x_i$, the endogenous variable $y_i$ is also normally distributed with mean $\beta_1 + \beta_2 x_i$ and unit variance. Given these assumptions, a computer can easily generate a sample of $N$ values for $y_i$. Next, we use this sample to compute the OLS estimator. Replicating this $R$ times, with $R$ newly drawn samples, produces $R$ estimates for $\beta$: $b^{(1)}, \ldots, b^{(R)}$, say. Assuming $\beta_1 = 0$ and $\beta_2 = 1$, Figure 2.2 presents a histogram of $R = 1000$ OLS estimates for $\beta_2$ based on 1000 simulated samples of size $N = 100$. Because we know that the OLS estimator is unbiased under these assumptions, we expect that $b^{(r)}$ is, on average, close to the true value of 1. Moreover, from the results

²⁰ $N$ is taken to be an even number.

[Figure 2.2 Histogram of 1000 OLS estimates with normal density (Monte Carlo results).]

in Subsection 2.3.2, and because the $R$ replications are generated independently, we know that the slope coefficient in $b^{(r)}$ is distributed as

$$b_2^{(r)} \sim NID(\beta_2, c_{22}),$$

where $\beta_2 = 1$ and

$$c_{22} = \left[ \sum_{i=1}^{N} (x_i - \bar{x})^2 \right]^{-1} = 4/N.$$

The larger the number of replications, the more the histogram in Figure 2.2 will resemble the normal distribution. For ease of comparison, the normal density is also drawn.
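The experiment behind Figure 2.2 is easy to replicate. A minimal sketch (NumPy assumed; same design as above, with $\beta_1 = 0$, $\beta_2 = 1$, $N = 100$ and $R = 1000$) that compares the simulated mean and standard deviation of $b_2^{(r)}$ with the theoretical values 1 and $\sqrt{4/N} = 0.2$:

```python
import numpy as np

rng = np.random.default_rng(4)
N, R = 100, 1000
beta1, beta2 = 0.0, 1.0
x = np.concatenate([np.ones(N // 2), np.zeros(N // 2)])  # fixed regressor values
xd = x - x.mean()

b2 = np.empty(R)
for r in range(R):
    y = beta1 + beta2 * x + rng.normal(size=N)   # eps ~ NID(0, 1)
    b2[r] = (xd @ y) / (xd @ xd)                 # OLS slope (intercept included)

# Theory: b2 ~ NID(1, 4/N), i.e. a standard deviation of 0.2 for N = 100
print("mean =", b2.mean(), "  std =", b2.std(ddof=1))
```

A histogram of the array b2 would reproduce Figure 2.2 up to simulation noise.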

A Monte Carlo study allows us to investigate the exact sampling distribution of an estimator or a test statistic as a function of the way in which the data are generated. This is useful in cases where one of the model assumptions (A2), (A3), (A4) or (A5) is violated and exact distributional results are unavailable. For example, a consistent estimator may exhibit small sample biases, and a Monte Carlo study may help us in identifying cases in which this small sample bias is substantial and other cases where it can be ignored. When the distribution of a test statistic is approximated on the basis of asymptotic theory, the significance level of the test (e.g. 5%) also holds approximately. The chosen level is then referred to as the nominal significance level or nominal size, while the actual probability of a type I error may be quite different (often larger). A Monte Carlo study allows us to investigate the difference between the nominal and actual significance levels. In addition, we can use a Monte Carlo experiment to analyse the distribution of a test statistic when the null hypothesis is false. This way we can investigate the power of a test, that is, the probability of rejecting the null hypothesis when it is actually false. For example, we may analyse the probability that the null hypothesis that $\beta_2 = 0.5$ is rejected as a function of the true value of $\beta_2$ (and the sample size $N$). If the true value is 0.5 this gives us the (actual) size of the test, whereas for $\beta_2 \neq 0.5$ we obtain the power of the test. Finally, we can use a simulation study to analyse the properties of an estimator on the basis of a


model that deviates from the data generating process, for example a model that omits a relevant explanatory variable.
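Continuing the same data generating process, the hedged sketch below estimates the actual size and power of the t-test of $H_0$: $\beta_2 = 0.5$ at the nominal 5% level, for a few hypothetical true values of $\beta_2$ (the rejection frequency at $\beta_2 = 0.5$ estimates the size; the other values trace out the power).

```python
import numpy as np

rng = np.random.default_rng(5)
N, R = 100, 2000
x = np.concatenate([np.ones(N // 2), np.zeros(N // 2)])
xd = x - x.mean()

for beta2_true in [0.5, 0.7, 1.0]:           # 0.5: null true; others: null false
    reject = 0
    for r in range(R):
        y = beta2_true * x + rng.normal(size=N)
        b2 = (xd @ y) / (xd @ xd)
        a = y.mean() - b2 * x.mean()          # intercept estimate
        e = y - a - b2 * x
        s2 = (e @ e) / (N - 2)
        se = np.sqrt(s2 / (xd @ xd))
        reject += abs((b2 - 0.5) / se) > 1.96
    print(f"true beta2 = {beta2_true}:  rejection frequency = {reject / R:.3f}")
```

With this design the rejection frequency at $\beta_2 = 0.5$ should be close to the nominal 5%, and it increases as the true $\beta_2$ moves away from 0.5.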

While Monte Carlo studies are useful, their results usually depend strongly upon the choices for $x_i$, $\beta$, $\sigma^2$ and the sample size $N$, and therefore cannot necessarily be extrapolated to different settings. Nevertheless, they provide interesting information about the statistical properties of an estimator or test statistic under controlled circumstances.

Fortunately, for the linear regression model the asymptotic approximation usually works quite well. As a result, for most applications it is reasonably safe to state that the OLS estimator is approximately normally distributed. More information about Monte Carlo experiments is provided in Davidson and MacKinnon (1993, Chapter 21), while a simple illustration is provided in Patterson (2000, Section 8.2).
