Bootstrap Tests for Serial Correlation Whenever the regression function is nonlinear or contains lagged dependentvariables, or whenever the distribution of the error terms is unknown, no
Trang 17.7 Testing for Serial Correlation 279
the alternative that ρ > 0 An investigator will reject the null hypothesis if
d < d L , fail to reject if d > d U , and come to no conclusion if d L < d < d U
For example, for a test at the 05 level when n = 100 and k = 8, including the constant term, the bounding critical values are d L = 1.528 and d U = 1.826 Therefore, one would reject the null hypothesis if d < 1.528 and not reject it
if d > 1.826 Notice that, even for this not particularly small sample size, the
indeterminate region between 1.528 and 1.826 is quite large
It should by now be evident that the Durbin-Watson statistic, despite itspopularity, is not very satisfactory Using it with standard tables is relativelycumbersome and often yields inconclusive results Moreover, the standardtables only allow us to perform one-tailed tests against the alternative that
ρ > 0 Since the alternative that ρ < 0 is often of interest as well, the inability
to perform a two-tailed test, or a one-tailed test against this alternative, using
standard tables is a serious limitation Although exact P values for both tailed and two-tailed tests, which depend on the X matrix, can be obtained
one-by using appropriate software, many computer programs do not offer thiscapability In addition, the DW statistic is not valid when the regressorsinclude lagged dependent variables, and it cannot easily be generalized to testfor higher-order processes Happily, the development of simulation-based testshas made the DW statistic obsolete
Monte Carlo Tests for Serial Correlation
We discussed simulation-based tests, including Monte Carlo tests and strap tests, at some length in Section 4.6 The techniques discussed there canreadily be applied to the problem of testing for serial correlation in linear andnonlinear regression models
boot-All the test statistics we have discussed, namely, tGNR, tSR, and d, are pivotal under the null hypothesis that ρ = 0 when the assumptions of the classical
normal linear model are satisfied This makes it possible to perform MonteCarlo tests that are exact in finite samples Pivotalness follows from twoproperties shared by all these statistics The first of these is that they dependonly on the residuals ˜u t obtained by estimation under the null hypothesis.The distribution of the residuals depends on the exogenous explanatory vari-
ables X, but these are given and the same for all DGPs in a classical normal linear model The distribution does not depend on the parameter vector β of the regression function, because, if y = Xβ + u, then M X y = M X u what- ever the value of the vector β.
The second property that all the statistics we have considered share is scaleinvariance By this, we mean that multiplying the dependent variable by
an arbitrary scalar λ leaves the statistic unchanged In a linear regression model, multiplying the dependent variable by λ causes the residuals to be multiplied by λ But the statistics defined in (7.51), (7.52), and (7.53) are
clearly unchanged if all the residuals are multiplied by the same constant, and
so these statistics are scale invariant Since the residuals ˜u are equal to M X u,
Trang 2it follows that multiplying σ by an arbitrary λ multiplies the residuals by λ Consequently, the distributions of the statistics are independent of σ2 as well
as of β This implies that, for the classical normal linear model, all three
statistics are pivotal
We now outline how to perform Monte Carlo tests for serial correlation in thecontext of the classical normal linear model Let us call the test statistic we
are using τ and its realized value ˆ τ If we want to test for AR(1) errors, the best choice for the statistic τ is the t statistic tGNR from the GNR (7.43), but
it could also be the DW statistic, the t statistic tSRfrom the simple regression(7.46), or even ˜ρ itself If we want to test for AR(p) errors, the best choice for τ would be the F statistic from the GNR (7.45), but it could also be the
F statistic from a regression of ˜ u t on ˜u t−1 through ˜u t−p
The first step, evidently, is to compute ˆτ The next step is to generate B sets
of simulated residuals and use each of them to compute a simulated test
statistic, say τ ∗
j , for j = 1, , B Because the parameters do not matter,
we can simply draw B vectors u ∗
j from the N (0, I) distribution and regress each of them on X to generate the simulated residuals M X u ∗
j, which are then
used to compute τ ∗
j This can be done very inexpensively The final step is to
calculate an estimated P value for whatever null hypothesis is of interest For example, for a two-tailed test of the null hypothesis that ρ = 0, the P value would be the proportion of the τ ∗
j that exceed ˆτ in absolute value:
We would then reject the null hypothesis at level α if ˆ p ∗(ˆτ ) < α As we saw
in Section 4.6, such a test will be exact whenever B is chosen so that α(B + 1)
is an integer
Bootstrap Tests for Serial Correlation
Whenever the regression function is nonlinear or contains lagged dependentvariables, or whenever the distribution of the error terms is unknown, none ofthe standard test statistics for serial correlation will be pivotal Nevertheless,
it is still possible to obtain very accurate inferences, even in quite small ples, by using bootstrap tests The procedure is essentially the one described
sam-in the previous subsection We still generate B simulated test statistics and use them to compute a P value according to (7.54) or its analog for a one-
tailed test For best results, the test statistic used should be asymptotically
valid for the model that is being tested In particular, we should avoid d and
tSR whenever there are lagged dependent variables
It is extremely important to generate the bootstrap samples in such a way thatthey are compatible with the model under test Ways of generating bootstrapsamples for regression models were discussed in Section 4.6 If the model
Trang 37.7 Testing for Serial Correlation 281
is nonlinear or includes lagged dependent variables, we need to generate y ∗
j
rather than just u ∗
j For this, we need estimates of the parameters of theregression function If the model includes lagged dependent variables, wemust generate the bootstrap samples recursively, as in (4.66) Unless we aregoing to assume that the error terms are normally distributed, we shoulddraw the bootstrap error terms from the EDF of the residuals for the modelunder test, after they have been appropriately rescaled Recall that there ismore than one way to do this The simplest approach is just to multiply each
Heteroskedasticity-Robust Tests
The tests for serial correlation that we have discussed are based on the tion that the error terms are homoskedastic When this crucial assumption isviolated, the asymptotic distributions of all the test statistics will differ fromwhatever distributions they are supposed to follow asymptotically However,
assump-as we saw in Section 6.8, it is not difficult to modify GNR-bassump-ased tests to makethem robust to heteroskedasticity of unknown form
Suppose we wish to test the linear regression model (7.42), in which the errorterms are serially uncorrelated, against the alternative that the error terms
follow an AR(p) process Under the assumption of homoskedasticity, we could simply run the GNR (7.45) and use an asymptotic F test If we let Z denote
an n × p matrix with typical element Z ti = ˜u t−i, where any missing laggedresiduals are replaced by zeros, this GNR can be written as
˜
u = Xb + Zc + residuals (7.55)
The ordinary F test for c = 0 in (7.55) is not robust to heteroskedasticity, but
a heteroskedasticity-robust test can easily be computed using the proceduredescribed in Section 6.8 This procedure works as follows:
1 Create the matrices ˜UX and ˜ UZ by multiplying the tthrow of X and the tthrow of Z by ˜ u t for all t.
2 Create the matrices ˜U −1 X and ˜ U −1 Z by dividing the tthrow of X and the tthrow of Z by ˜ u t for all t.
3 Regress each of the columns of ˜U −1 X and ˜ U −1 Z on ˜ UX and ˜ UZ jointly.
Save the resulting matrices of fitted values and call them ¯X and ¯ Z,
respectively
Trang 44 Regress ι, a vector of 1s, on ¯ X Retain the sum of squared residuals from this regression, and call it RSSR Then regress ι on ¯ X and ¯ Z jointly,
retain the sum of squared residuals, and call it USSR
5 Compute the test statistic RSSR − USSR, which will be asymptotically distributed as χ2(p) under the null hypothesis.
Although this heteroskedasticity-robust test is asymptotically valid, it willnot be exact in finite samples In principle, it should be possible to obtain
more reliable results by using bootstrap P values instead of asymptotic ones.
However, none of the methods of generating bootstrap samples for regressionmodels that we have discussed so far (see Section 4.6) is appropriate for amodel with heteroskedastic error terms Several methods exist, but they arebeyond the scope of this book, and there currently exists no method that wecan recommend with complete confidence; see Davison and Hinkley (1997)and Horowitz (2001)
Other Tests Based on OLS Residuals
The tests for serial correlation that we have discussed in this section are by
no means the only scale-invariant tests based on least squares residuals thatare regularly encountered in econometrics Many tests for heteroskedasticity,skewness, kurtosis, and other deviations from the NID assumption also havethese properties For example, consider tests for heteroskedasticity based
on regression (7.28) Nothing in that regression depends on y except for the
squared residuals that constitute the regressand Further, it is clear that both
the F statistic for the hypothesis that b γ = 0 and n times the centered R2 are
scale invariant Therefore, for a classical normal linear model with X and Z
fixed, these statistics are pivotal Consequently, Monte Carlo tests based on
them, in which we draw the error terms from the N (0, 1) distribution, are
exact in finite samples
When the normality assumption is not appropriate, we have two options Ifsome other distribution that is known up to a scale parameter is thought to be
appropriate, we can draw the error terms from it instead of from the N (0, 1)
distribution If the assumed distribution really is the true one, we obtain
an exact test Alternatively, we can perform a bootstrap test in which theerror terms are obtained by resampling the rescaled residuals This is alsoappropriate when there are lagged dependent variables among the regressors.The bootstrap test will not be exact, but it should still perform well in finitesamples no matter how the error terms actually happen to be distributed
7.8 Estimating Models with Autoregressive Errors
If we decide that the error terms of a regression model are serially correlated,either on the basis of theoretical considerations or as a result of specification
Trang 57.8 Estimating Models with Autoregressive Errors 283testing, and we are confident that the regression function itself is not misspec-ified, the next step is to estimate a modified model which takes account ofthe serial correlation The simplest such model is (7.40), which is the originalregression model modified by having the error terms follow an AR(1) process.For ease of reference, we rewrite (7.40) here:
y t = X t β + u t , u t = ρu t−1 + ε t , ε t ∼ IID(0, σ ε2) (7.56)
In many cases, as we will discuss in the next section, the best approach mayactually be to specify a more complicated, dynamic, model for which theerror terms are not serially correlated In this section, however, we ignore thisimportant issue and simply discuss how to estimate the model (7.56) undervarious assumptions
Estimation by Feasible GLS
We have seen that, if the u t follow a stationary AR(1) process, that is, if
|ρ| < 1 and Var(u1) = σ2
u = σ2
ε /(1 − ρ2), then the covariance matrix of
the entire vector u is the n × n matrix Ω(ρ) given in (7.32) In order to compute GLS estimates, we need to find a matrix Ψ with the property that
Ψ Ψ > = Ω −1 This property will be satisfied whenever the covariance matrix
of Ψ > u is proportional to the identity matrix, which it will be if we choose Ψ
in such a way that Ψ > u = ε.
For t = 2, , n, we know from (7.29) that
and this allows us to construct the rows of Ψ >except for the first row The
tthrow must have 1 in the tthposition, −ρ in the (t − 1)st position, and 0severywhere else
For the first row of Ψ >, however, we need to be a little more careful Under
the hypothesis of stationarity of u, the variance of u1 is σ2
u Further, since
the ε t are innovations, u1 is uncorrelated with the ε t for t = 2, , n Thus,
if we define ε1 by the formula
ε1 = (σ ε /σ u )u1 = (1 − ρ2)1/2 u1, (7.58)
it can be seen that the n vector ε, with the first component ε1 defined
by (7.58) and the remaining components ε t defined by (7.57), has a
covar-iance matrix equal to σ2
εI
Putting together (7.57) and (7.58), we conclude that Ψ >should be defined
as an n × n matrix with all diagonal elements equal to 1 except for the first, which is equal to (1 − ρ2)1/2, and all other elements equal to 0 except for
Trang 6the ones on the diagonal immediately below the principal diagonal, which are
equal to −ρ In terms of Ψ rather than of Ψ >, we have:
the transformation for the first observation would involve taking the squareroot of a negative number Unfortunately, the estimator ˜ρ is not guaranteed
to satisfy the stationarity condition, although, in practice, it is very likely to
do so when the model is correctly specified, even if the true value of ρ is quite
large in absolute value
Whether ρ is known or estimated, the next step in GLS estimation is to form the vector Ψ > y and the matrix Ψ > X It is easy to do this without having to store the n × n matrix Ψ in computer memory The first element of Ψ > y is (1 − ρ2)1/2 y1, and the remaining elements have the form y t − ρy t−1 Each
column of Ψ > X has precisely the same form as Ψ > y and can be calculated in
precisely the same way
The final step is to run an OLS regression of Ψ > y on Ψ > X This regression
yields the (feasible) GLS estimates
ˆ
βGLS= (X > Ψ Ψ > X) −1 X > Ψ Ψ > y (7.60)
along with the estimated covariance matrix
dVar( ˆβGLS) = s2(X > Ψ Ψ > X) −1 , (7.61)
where s2 is the usual OLS estimate of the variance of the error terms Ofcourse, the estimator (7.60) is formally identical to (7.04), since (7.60) is valid
for any Ψ matrix.
Trang 77.8 Estimating Models with Autoregressive Errors 285Estimation by Nonlinear Least Squares
If we ignore the first observation, then (7.56), the linear regression modelwith AR(1) errors, can be written as the nonlinear regression model (7.41).Since the model (7.41) is written in such a way that the error terms are inno-vations, NLS estimation is consistent whether the explanatory variables areexogenous or merely predetermined NLS estimates can be obtained by anystandard nonlinear minimization algorithm of the type that was discussed
in Section 6.4, where the function to be minimized is SSR(β, ρ), the sum of squared residuals for observations 2 through n Such procedures generally
work well, and they can also be used for models with higher-order sive errors; see Exercise 7.17 However, some care must be taken to ensurethat the algorithm does not terminate at a local minimum which is not alsothe global minimum There is a serious risk of this, especially for models withlagged dependent variables among the regressors.2
autoregres-Whether or not there are lagged dependent variables in X t, a valid estimatedcovariance matrix can always be obtained by running the GNR (6.67), whichcorresponds to the model (7.41), with all variables evaluated at the NLSestimates ˆβ and ˆ ρ This GNR is
u1> (X − ˆ ρX1) uˆ1> uˆ1
#−1
where the n×k matrix X1has typical row X t−1, and the vector ˆu1has typical
element y t−1 − X t−1 β This is the estimated covariance matrix that a goodˆ
nonlinear regression package should print The first factor in (7.63) is just
the NLS estimate of σ2
ε The SSR is divided by n − k − 2 because there are
k + 1 parameters in the regression function, one of which is ρ, and we estimate using only n − 1 observations.
It is instructive to compute the limit in probability of the matrix (7.63) when
n → ∞ for the case in which all the explanatory variables in X t are exogenous.The parameters are all estimated consistently by NLS, and so the estimates
converge to the true parameter values β0, ρ0, and σ2
ε as n → ∞ In computing
the limit of the denominator of the simple estimator ˜ρ given by (7.47), we saw that n −1 uˆ1> uˆ1 tends to σ2
ε /(1 − ρ2
0) The limit of n −1 (X − ˆ ρX1)> uˆ1 is the
2 See Dufour, Gaudry, and Liem (1980) and Betancourt and Kelejian (1981).
Trang 8same as that of n −1 (X − ρ0X1)> uˆ1 by the consistency of ˆρ In addition, given the exogeneity of X, and thus also of X1, it follows at once from the law of
large numbers that n −1 (X − ρ0X1)> uˆ1 tends to zero Thus, in this special
case, the asymptotic covariance matrix of n 1/2( ˆβ − β0) and n 1/2(ˆρ − ρ0) is
exo-iance matrix will just be (7.63) without its last row and column It is easy to
see that n times this matrix tends to the top left block of (7.65) as n → ∞.
The lower right-hand element of the matrix (7.65) tells us that, when all the
regressors are exogenous, the asymptotic variance of n 1/2(ˆρ − ρ0) is 1 − ρ2
0
A sensible estimate of the variance is therefore dVar(ˆρ) = n −1 (1 − ˆ ρ2) It mayseem surprising that the variance of ˆρ does not depend on σ2
ε However, we sawearlier that, with exogenous regressors, the consistent estimator ˜ρ of (7.47) is
scale invariant The same is true, asymptotically, of the NLS estimator ˆρ, and
so its asymptotic variance is independent of σ2
ε.Comparison of GLS and NLS
The most obvious difference between estimation by GLS and estimation byNLS is the treatment of the first observation: GLS takes it into account, andNLS does not This difference reflects the fact that the two procedures areestimating slightly different models With NLS, all that is required is the
stationarity condition that |ρ| < 1 With GLS, on the other hand, the error
process must actually be stationary Recall that the stationarity condition isnecessary but not sufficient for stationarity of the process A sufficient con-
dition requires, in addition, that Var(u1) = σ2
The second major difference between estimation by GLS and estimation by
NLS is that the former method estimates β conditional on ρ, while the latter
Trang 97.8 Estimating Models with Autoregressive Errors 287
method estimates β and ρ jointly Except in the unlikely case in which the value of ρ is known, the first step in GLS is to estimate ρ consistently If the explanatory variables in the matrix X are all exogenous, there are several procedures that will deliver a consistent estimate of ρ The weak point is
that the estimate is not unique, and in general it is not optimal One possiblesolution to this difficulty is to iterate the feasible GLS procedure, as suggested
at the end of Section 7.4, and we will consider this solution below
A more fundamental weakness of GLS arises whenever one or more of theexplanatory variables are lagged dependent variables, or, more generally, pre-determined but not exogenous variables Even with a consistent estimator
of ρ, one of the conditions for the applicability of feasible GLS, condition (7.23), does not hold when any elements of X t are not exogenous It is notsimple to see directly just why this is so, but, in the next paragraph, we willobtain indirect evidence by showing that feasible GLS gives an invalid estima-tor of the covariance matrix Fortunately, there is not much temptation to useGLS if the non-exogenous explanatory variables are lagged variables, becauselagged variables are not observed for the first observation In all events, theconclusion is simple: We should avoid GLS if the explanatory variables arenot all exogenous
The GLS covariance matrix estimator is (7.61), which is obtained by regressing
Ψ >(ˆρ)y on Ψ >(ˆρ)X for some consistent estimate ˆ ρ Since Ψ > (ρ)u = ε by construction, s2 is an estimator of σ2
ε Moreover, the first observation has no
impact asymptotically Therefore, the limit as n → ∞ of n times (7.61) is the
In contrast, the NLS covariance matrix estimator is (7.63) With exogenous
regressors, n times (7.63) tends to the same limit as (7.65), of which the top
left block is just (7.66) But when the regressors are not all exogenous, the
argument that the off-diagonal blocks of n times (7.63) tend to zero no longer
works, and, in fact, the limits of these blocks are in general nonzero When amatrix that is not block-diagonal is inverted, the top left block of the inverse
is not the same as the inverse of the top left block of the original matrix;see Exercise 7.11 In fact, as readers are asked to show in Exercise 7.12, thetop left block of the inverse is greater by a positive semidefinite matrix thanthe inverse of the top left block Consequently, the GLS covariance matrixestimator underestimates the true covariance matrix asymptotically
NLS has only one major weak point, which is that it does not take account ofthe first observation Of course, this is really an advantage if the error processsatisfies the stationarity condition without actually being stationary, or ifsome of the explanatory variables are not exogenous But with a stationaryerror process and exogenous regressors, we wish to retain the information inthe first observation, because it appears that retaining the first observationcan sometimes lead to a noticeable efficiency gain in finite samples The
Trang 10reason is that the transformation for observation 1 is quite different from thetransformation for all the other observations In consequence, the transformedfirst observation may well be a high leverage point; see Section 2.6 This
is particularly likely to happen if one or more of the regressors is stronglytrending If so, dropping the first observation can mean throwing away a lot
of information See Davidson and MacKinnon (1993, Section 10.6) for a muchfuller discussion and references
Efficient Estimation by GLS or NLS
When the error process is stationary and all the regressors are exogenous, it
is possible to obtain an estimator with the best features of GLS and NLS bymodifying NLS so that it makes use of the information in the first observationand therefore yields an efficient estimator The first-order conditions (7.07)for GLS estimation of the model (7.56) can be written as
any value instead of just ˜β.
In Section 7.4, we mentioned the possibility of using an iterated feasible GLSprocedure We can now see precisely how such a procedure would work forthis model In the first step, we obtain the OLS parameter vector ˜β In the
Trang 117.8 Estimating Models with Autoregressive Errors 289
second step, the formula (7.69) is evaluated at β = ˜ β to obtain ˜ ρ, a consistent estimate of ρ In the third step, we use (7.60) to obtain the feasible GLS
estimate ˆβF, thus solving the first-order conditions (7.67) At this point, we
go back to the second step and insert ˆβF into (7.69) for an updated estimate
of ρ, which we subsequently use in (7.60) for the next estimate of β The
iterative procedure may then be continued until convergence, assuming that
it does converge If so, then the final estimates, which we will call ˆβ and ˆ ρ,
must satisfy the two equations
feasible GLS, without the first observation, is identical to NLS If the first
observation is retained, then iterated feasible GLS improves on NLS by takingaccount of the first observation
We can also modify NLS to take account of the first observation To do this,
we extend the GNR (6.67), which is given by (7.62) when evaluated at ˆβ
and ˆρ, by giving it a first observation For this observation, the regressand
is (1 − ρ2)1/2 (y1− X1β), the regressors corresponding to β are given by the row vector (1 − ρ2)1/2 X1, and the regressor corresponding to ρ is zero The
conditions that the extended regressand should be orthogonal to the extendedregressors are exactly the conditions (7.70)
Two asymptotically equivalent procedures can be based on this extended
GNR Both begin by obtaining the NLS estimates of β and ρ without the
first observation and evaluating the extended GNR at those preliminary NLSestimates The OLS estimates from the extended GNR can be thought of as
a vector of corrections to the initial estimates For the first procedure, thefinal estimator is a one-step estimator, defined as in (6.59) by adding the cor-rections to the preliminary estimates For the second procedure, this process
is iterated The variables of the extended GNR are evaluated at the one-stepestimates, another set of corrections is obtained, these are added to the pre-vious estimates, and iteration continues until the corrections are negligible Ifthis happens, the iterated estimates once more satisfy the conditions (7.70),and so they are equal to the iterated GLS estimates
Although the iterated feasible GLS estimator generally performs well, it does
have one weakness: There is no way to ensure that |ˆ ρ| < 1 In the unlikely but not impossible event that |ˆ ρ| ≥ 1, the estimated covariance matrix (7.61)
will not be valid, the second term in (7.67) will be negative, and the first
observation will therefore tend to have a perverse effect on the estimates of β.
Trang 12In Chapter 10, we will see that maximum likelihood estimation shares thegood properties of iterated feasible GLS while also ensuring that the estimate
of ρ satisfies the stationarity condition.
The iterated feasible GLS procedure considered above has much in commonwith a very old, but still widely-used, algorithm for estimating models withstationary AR(1) errors This algorithm, which is called iterated Cochrane-Orcutt, was originally proposed in a classic paper by Cochrane and Orcutt(1949) It works in exactly the same way as iterated feasible GLS, except that
it omits the first observation The properties of this algorithm are explored
in Exercises 7.18-19
7.9 Specification Testing and Serial Correlation
Models estimated using time-series data frequently appear to have error termswhich are serially correlated However, as we will see, many types of misspec-
ification can create the appearance of serial correlation Therefore, finding
evidence of serial correlation does not mean that it is necessarily appropriate
to model the error terms as following some sort of autoregressive or movingaverage process If the regression function of the original model is misspecified
in any way, then a model like (7.41), which has been modified to incorporateAR(1) errors, will probably also be misspecified It is therefore extremelyimportant to test the specification of any regression model that has been
“corrected” for serial correlation
The Appearance of Serial Correlation
There are several types of misspecification of the regression function that canincorrectly create the appearance of serial correlation For instance, it may bethat the true regression function is nonlinear in one or more of the regressorswhile the estimated one is linear In that case, depending on how the dataare ordered, the residuals from a linear regression model may well appear to
be serially correlated All that is needed is for the independent variables onwhich the dependent variable depends nonlinearly to be correlated with time
As a concrete example, consider Figure 7.1, which shows 200 hypothetical
observations on a regressor x and a regressand y, together with an OLS
re-gression line and the fitted values from the true, nonlinear model For thelinear model, the residuals are always negative for the smallest and largest
values of x, and they tend to be positive for the intermediate values As a
consequence, they appear to be serially correlated: If the observations are
ordered according to the value of x, the estimate ˜ ρ obtained by regressing the OLS residuals on themselves lagged once is 0.298, and the t statistic for ρ = 0
is 4.462 Thus, if the data are ordered in this way, there appears to be strong
evidence of serial correlation But this evidence is misleading Either plotting
the residuals against x or including x2 as an additional regressor will quicklyreveal the true nature of the misspecification
Trang 137.9 Specification Testing and Serial Correlation 291
.
.
.
.
.
.
. .
.
. .
.
.
.
. .
.
.
.
.
.
.
.
.
. .
. .
.
.
.
.
Regression line for linear model
Fitted values for true model
x y
Figure 7.1 The appearance of serial correlation
The true regression function in this example contains a term in x2 Since the linear model omits this term, it is underspecified, in the sense discussed
in Section 3.7 Any sort of underspecification has the potential to create the appearance of serial correlation if the incorrectly omitted variables are themselves serially correlated Therefore, whenever we find evidence of serial correlation, our first reaction should be to think carefully about the specifica-tion of the regression funcspecifica-tion Perhaps one or more addispecifica-tional independent variables should be included among the regressors Perhaps powers, cross-products, or lags of some of the existing independent variables need to be included Or perhaps the regression function should be made dynamic by including one or more lags of the dependent variable
Common Factor Restrictions
It is very common for linear regression models to suffer from dynamic mis-specification The simplest example is failing to include a lagged dependent variable among the regressors More generally, dynamic misspecification oc-curs whenever the regression function incorrectly omits lags of the dependent variable or of one or more independent variables A somewhat mechanical, but often very effective, way to detect dynamic misspecification in models with autoregressive errors is to test the common factor restrictions that are implicit in such models The idea of testing these restrictions was initially pro-posed by Sargan (1964) and further developed by Hendry and Mizon (1978), Mizon and Hendry (1980), Sargan (1980), and others See Hendry (1995) for
a detailed treatment of dynamic specification in linear regression models
Trang 14The easiest way to understand what common factor restrictions are and howthey got their name is to consider a linear regression model with errors thatapparently follow an AR(1) process In this case, there are really three nestedmodels The first of these is the original linear regression model with errorterms that are assumed to be serially independent:
H0: y t = X t β + u t , u t ∼ IID(0, σ2) (7.71)
The second is the nonlinear model (7.41) that is obtained when the errorterms in (7.71) follow the AR(1) process (7.29) Although we have alreadydiscussed this model extensively, we rewrite it here for convenience:
H1: y t = ρy t−1 + X t β − ρX t−1 β + ε t , ε t ∼ IID(0, σ ε2) (7.72)
The third is the linear model that can be obtained by relaxing the nonlinearrestrictions which are implicit in (7.72) This model is
H2: y t = ρy t−1 + X t β + X t−1 γ + ε t , ε t ∼ IID(0, σ ε2), (7.73) where γ, like β, is a k vector When all three of these models are estimated over the same sample period, the original model, H0, is a special case of the
nonlinear model H1, which in turn is a special case of the unrestricted linear
model H2 Of course, in order to estimate H1 and H2, we need to drop thefirst observation
The nonlinear model H1 imposes on H2 the restrictions that γ = −ρβ The
reason for calling these restrictions “common factor” restrictions can easily beseen if we rewrite both models using lag operator notation (see Section 7.6)
When we do this, H1 becomes
(1 − ρL)y t = (1 − ρL)X t β + ε t , (7.74) and H2 becomes
(1 − ρL)y t = X t β + LX t γ + ε t (7.75)
It is evident that in (7.74), but not in (7.75), the common factor 1 − ρL
appears on both sides of the equation This is where the term “commonfactor restrictions” comes from
How Many Common Factor Restrictions Are There?
There is one feature of common factor restrictions that can be tricky: It isoften not obvious just how many restrictions there are For the case of testing
H1 against H2, there appear to be k restrictions The null hypothesis, H1,
has k + 1 parameters (the k vector β and the scalar ρ), and the alternative hypothesis, H2, seems to have 2k + 1 parameters (the k vectors β and γ, and the scalar ρ) Therefore, the number of restrictions appears to be the difference between 2k + 1 and k + 1, which is k In fact, however, the number
Trang 157.9 Specification Testing and Serial Correlation 293
of restrictions will almost always be less than k, because, except in rare cases, the number of identifiable parameters in H2 will be less than 2k + 1 We now
show why this is the case
Let us consider a simple example Suppose the regression function for the
original model H0 is
β1+ β2z t + β3t + β4z t−1 + β5y t−1 , (7.76) where z t is the tthobservation on some independent variable, and t is the tth
observation on a linear time trend The regression function for the unrestricted
model H2 that corresponds to (7.76) is
β1+ β2z t + β3t + β4z t−1 + β5y t−1 + ρy t−1 + γ1+ γ2z t−1 + γ3(t − 1) + γ4z t−2 + γ5y t−2 (7.77)
At first glance, this regression function appears to have 11 parameters ever, it really has only 7, because 4 of them are unidentifiable We cannot
How-estimate both β1 and γ1, because there cannot be two constant terms
Like-wise, we cannot estimate both β4 and γ2, because there cannot be two
coef-ficients of z t−1 , and we cannot estimate both β5 and ρ, because there cannot
be two coefficients of y t−1 We also cannot estimate γ3 along with β3 and
the constant, because t, t − 1, and the constant term are perfectly collinear, since t − (t − 1) = 1 The version of H2 that can actually be estimated hasregression function
δ1+ β2z t + δ2t + δ3z t−1 + δ4y t−1 + γ4z t−2 + γ5y t−2 , (7.78)
where
δ1= β1+ γ1− γ3, δ2 = β3+ γ3, δ3= β4+ γ2, and δ4= ρ + β5.
We see that (7.78) has only 7 identifiable parameters: β2, γ4, γ5, δ1, δ2,
δ3, and δ4, instead of the 11 parameters, many of them not identifiable, ofexpression (7.77) In contrast, the regression function for the restricted model,
H1, has 6 parameters: β1 through β5, and ρ Therefore, in this example, H1imposes just one restriction on H2
The phenomenon illustrated in this example arises, to a greater or lesserextent, for almost every model with common factor restrictions Constantterms, many types of dummy variables (notably, seasonal dummies and timetrends), lagged dependent variables, and independent variables that appear
with more than one time subscript always lead to an unrestricted model H2
with some parameters that cannot be identified The number of identifiable
parameters will almost always be less than 2k + 1, and, in consequence, the number of restrictions will almost always be less than k.
Trang 16Testing Common Factor Restrictions
Any of the techniques discussed in Sections 6.7 and 6.8 can be used to testcommon factor restrictions In practice, if the error terms are believed to be
homoskedastic, the easiest approach is probably to use an asymptotic F test.
For the example of equations (7.72) and (7.73), the restricted sum of squared
residuals, RSSR, is obtained from NLS estimation of H1, and the unrestricted
one, USSR, is obtained from OLS estimation of H2 Then the test statistic is
(RSSR − USSR)/r USSR/(n − k − r − 2)
a
∼ F (r, n − k − r − 2), (7.79)
where r is the number of restrictions The number of degrees of freedom in the denominator reflects the fact that the unrestricted model has k + r + 1 parameters and is estimated using the n − 1 observations for t = 2, , n.
Of course, since both the null and alternative models involve lagged dependent
variables, the test statistic (7.79) does not actually follow the F (r, n−k−r−2)
distribution in finite samples Therefore, when the sample size is not large,
it is a good idea to bootstrap the test As Davidson and MacKinnon (1999a)
have shown, highly reliable P values may be obtained in this way, even for
very small sample sizes The bootstrap samples are generated recursively from
the restricted model, H1, using the NLS estimates of that model As withbootstrap tests for serial correlation, the bootstrap error terms may either bedrawn from the normal distribution or obtained by resampling the rescaledNLS residuals; see the discussion in Sections 4.6 and 7.7
Although this bootstrap procedure is conceptually simple, it may be quiteexpensive to compute, because the nonlinear model (7.72) must be estimatedfor every bootstrap sample It may therefore be more attractive to follow theidea in Exercises 6.17 and 6.18 by bootstrapping a GNR-based test statistic
that requires no nonlinear estimation at all For the H1 model (7.72), thecorresponding GNR is (7.62), but now we wish to evaluate it, not at the NLSestimates from (7.72), but at the estimates ´β and ´ ρ obtained by estimating the linear H2 model (7.73) These estimates are root-n consistent under H2,
and so also under H1, which is contained in H2 as a special case Thus the
GNR for H1, which was introduced in Section 6.6, is
y t − ´ ρy t−1 − X t β + ´´ ρX t−1 β´
= (X t − ´ ρX t−1 )b + b ρ (y t−1 − X t−1 β) + residual.´ (7.80) Since H2 is a linear model, the regressors of the GNR that corresponds to itare just the regressors in (7.73), and the regressand is the same as in (7.80);
recall Section 6.5 However, in order to construct the GNR-based F statistic,
which has exactly the same form as (7.79), it is not necessary to run the
GNR for model H2 at all Since the regressand of (7.80) is just the dependentvariable of (7.73) plus a linear combination of the independent variables, the
Trang 177.9 Specification Testing and Serial Correlation 295residuals from (7.73) are the same as those from its GNR Consequently, wecan evaluate (7.79) with USSR from (7.73) and RSSR from (7.80).
In Section 6.6, we gave the impression that ´β and ´ ρ are simply the OLS timates of β and ρ from (7.73) When X contains neither lagged dependent
es-variables nor multiple lags of any independent variable, this is true ever, when these conditions are not satisfied, the parameters of (7.73) do notcorrespond directly to those of (7.72), and this makes it a little more compli-cated to obtain consistent estimates of these parameters Just how to do sowas discussed in Section 10.3 of Davidson and MacKinnon (1993) and will beillustrated in Exercise 7.16
How-Tests of Nested Hypotheses
The models H0, H1, and H2defined in (7.71) through (7.73) form a sequence
of nested hypotheses Such sequences occur quite frequently in many branches
of econometrics, and they have an interesting property Asymptotically, the F statistic for testing H0 against H1 is independent of the F statistic for testing
H1 against H2 This is true whether we actually estimate H1 or merely use
a GNR, and it is also true for other test statistics that are asymptotically
equivalent to F statistics In fact, the result is true for any sequence of nested hypotheses where the test statistics follow χ2distributions asymptotically; seeDavidson and MacKinnon (1993, Supplement) and Exercise 7.21
The independence property of tests in a nested sequence has a useful
impli-cation Suppose that τ ij denotes the statistic for testing H i , which has k i parameters, against H j , which has k j > k i parameters, where i = 0, 1 and
j = 1, 2, with j > i Then, if each of the test statistics is asymptotically distributed as χ2(k j − k i),
This result implies that, at least asymptotically, each of the component test
statistics is bounded above by the test statistic for H0 against H2
The result (7.81) is not particularly useful in the case of (7.71), (7.72), and(7.73), where all of the test statistics are quite easy to compute However, itcan sometimes come in handy Suppose, for example, that it is easy to test
H0 against H2 but hard to test H0 against H1 Then, if τ02 is small enough
that it would not cause us to reject H0 against H1 when compared with the
appropriate critical value for the χ2(k1− k0) distribution, we do not need to
bother calculating τ01, because it will be even smaller
Trang 187.10 Models for Panel Data
Many data sets are measured across two dimensions One dimension is time,and the other is usually called the cross-section dimension For example, wemay have 40 annual observations on 25 countries, or 100 quarterly observations
on 50 states, or 6 annual observations on 3100 individuals Data of this typeare often referred to as panel data It is likely that the error terms for a modelusing panel data will display certain types of dependence, which should betaken into account when we estimate such a model
For simplicity, we restrict our attention to the linear regression model
y it = X it β + u it , i = 1, , m, t = 1, , T, (7.82)
where X it is a 1 × k vector of observations on explanatory variables There are assumed to be m cross-sectional units and T time periods, for a total
of n = mT observations If each u it has expectation zero conditional on its
corresponding X it, we can estimate equation (7.82) by ordinary least squares
But the OLS estimator is not efficient if the u it are not IID, and the IIDassumption is rarely realistic with panel data
If certain shocks affect the same cross-sectional unit at all points in time,
the error terms u it and u is will be correlated for all t 6= s Similarly, if
certain shocks affect all cross-sectional units at the same point in time, the
error terms u it and u jt will be correlated for all i 6= j In consequence, if
we use OLS, not only will we obtain inefficient parameter estimates, but wewill also obtain an inconsistent estimate of their covariance matrix; recall
the discussion of Section 5.5 If the expectation of u it conditional on X it is
not zero, then, for reasons mentioned in Section 7.4, OLS will actually yield inconsistent parameter estimates This will happen, for example, when X it contains lagged dependent variables and the u it are serially correlated.Error-Components Models
The two most popular approaches for dealing with panel data are both based
on what are called error-components models The idea is to specify the error
term u it in (7.82) as consisting of two or three separate shocks, each of which
is assumed to be independent of the others A fairly general specification is
Here e t affects all observations for time period t, v i affects all observations
for cross-sectional unit i, and ε it affects only observation it It is ally assumed that the e t are independent across t, the v i are independent
gener-across i, and the ε it are independent across all i and t Classic papers on
error-components models include Balestra and Nerlove (1966), Fuller and Battese(1974), and Mundlak (1978)
Trang 197.10 Models for Panel Data 297
In order to estimate an error-components model, the e t and v ican be regarded
as being either fixed or random, in a sense that we will explain If the e t and v i are thought of as fixed effects, then they are treated as parameters
to be estimated It turns out that they can then be estimated by OLS usingdummy variables If they are thought of as random effects, then we must
figure out the covariance matrix of the u it as functions of the variances of
the e t , v i , and ε it, and use feasible GLS Each of these approaches can beappropriate in some circumstances but may be inappropriate in others
In what follows, we simplify the error-components specification (7.83) by
elim-inating the e t Thus we assume that there are shocks specific to each sectional unit, or group, but no time-specific shocks This assumption is oftenmade in empirical work, and it considerably simplifies the algebra In addi-
cross-tion, we assume that the X itare exogenous The presence of lagged dependentvariables in panel data models raises a number of issues that we do not wish
to discuss here; see Arellano and Bond (1991) and Arellano and Bover (1995).Fixed-Effects Estimation
The model that underlies fixed-effects estimation, based on equation (7.82)and the simplified version of equation (7.83), can be written as follows:
y = Xβ + Dη + ε, E(εε > ) = σ2
where y and ε are n vectors with typical elements y it and ε it, respectively,
and D is an n × m matrix of dummy variables, constructed in such a way that the element in the row corresponding to observation it, for i = 1, , m and t = 1, , T, and column j, for j = 1, , m, is equal to 1 if i = j
and equal to 0 otherwise.3 The m vector η has typical element v i, and so
it follows that the n vector Dη has element v i in the row corresponding to
observation it Note that there is exactly one element of D equal to 1 in each row, which implies that the n vector ι with each element equal to 1 is a linear combination of the columns of D Consequently, in order to avoid collinear regressors, the matrix X should not contain a constant.
The vector η plays the role of a parameter vector, and it is in this sense that the v i are called fixed effects They could in fact be random; the essential thing
is that they must be independent of the error terms ε it They may, however,
be correlated with the explanatory variables in the matrix X Whether or not this is the case, the model (7.84), interpreted conditionally on η, implies
that the moment conditions
Trang 20are satisfied The fixed-effects estimator, which is the OLS estimator of β
in equation (7.84), is based on these moment conditions Because of the way
it is computed, this estimator is sometimes called the least squares dummyvariables, or LSDV, estimator
Let M D denote the projection matrix I − D(D > D) −1 D > Then, by the FWL
Theorem, we know that the OLS estimator of β in (7.84) can be obtained
by regressing M D y, the residuals from a regression of y on D, on M D X, the matrix of residuals from regressing each of the columns of X on D The
fixed-effects estimator is therefore
ˆ
βFE = (X > M D X) −1 X > M D y (7.85) For any n vector x, let ¯ x i denote the group mean T −1PT
t=1 x it Then it
is easy to check that element it of the vector M D x is equal to x it − ¯ x i,the deviation from the group mean Since all the variables in (7.85) are
premultiplied by M D, it follows that this estimator makes use only of the
information in the variation around the mean for each of the m groups For this reason, it is often called the within-groups estimator Because X and D
are exogenous, this estimator is unbiased Moreover, since the conditions ofthe Gauss-Markov theorem are satisfied, we can conclude that the fixed-effectsestimator is BLUE
The fixed-effects estimator (7.85) has advantages and disadvantages It is
easy to compute, even when m is very large, because it is never necessary to make direct use of the n × n matrix M D All that is needed is to compute
the m group means for each variable In addition, the estimates ˆ η of the fixed
effects may well be of interest in their own right However, the estimatorcannot be used with an explanatory variable that takes on the same value forall the observations in each group, because such a column would be collinear
with the columns of D More generally, if the explanatory variables in the matrix X are well explained by the dummy variables in D, the parameter vector β will not be estimated at all precisely It is of course possible to
estimate a constant, simply by taking the mean of the estimates ˆη.
Random-Effects Estimation
It is possible to improve on the efficiency of the fixed-effects estimator if one
is willing to impose restrictions on the model (7.84) For that model, all we
require is that the matrix X of explanatory variables and the cross-sectional errors v i should both be independent of the ε it, but this does not rule outthe possibility of a correlation between them The restrictions imposed for
random-effects estimation require that the v i should be independent of X.
This independence assumption is by no means always plausible For example,
in a panel of observations on individual workers, an observed variable likethe hourly wage rate may well be correlated with an unobserved variable
Trang 217.10 Models for Panel Data 299
like ability, which implicitly enters into the individual-specific error term v i.However, if the assumption is satisfied, it follows that
E(u it | X) = E(v i + ε it | X) = 0, (7.86)
since v i and ε it are then both independent of X Condition (7.86) is precisely
the condition which ensures that OLS estimation of the model (7.82), ratherthan the model (7.84), will yield unbiased estimates
However, OLS estimation of equation (7.82) is not in general efficient, because
the u it are not IID We can calculate the covariance matrix of the u it if we
assume that the v i are IID random variables with mean zero and variance σ2
v.This assumption accounts for the term “random” effects From (7.83), setting
e t = 0 and using the assumption that the shocks are independent, it is easy
to see that
Var(u it ) = σ2
v + σ2
ε , Cov(u it u is ) = σ2v , and Cov(u it u js ) = 0 for all i 6= j.
These define the elements of the n × n covariance matrix Ω, which we need
for GLS estimation If the data are ordered by the cross-sectional units in
m blocks of T observations each, this matrix has the form
v everywhere else Here ι is a T vector of 1s.
To obtain GLS estimates of β, we would need to know the values of σ2
ε and σ2
v,
or, at least, the value of their ratio, since, as we saw in Section 7.3, GLS
estimation requires only that Ω should be specified up to a factor To obtain
feasible GLS estimates, we need a consistent estimate of that ratio However,the reader may have noticed that we have made no use in this section so far
of asymptotic concepts, such as that of a consistent estimate This is because,
in order to obtain definite results, we must specify what happens to both m and T when n = mT tends to infinity.
Consider the fixed-effects model (7.84) If m remains fixed as T → ∞, then the number of regressors also remains fixed as n → ∞, and standard asymptotic theory applies But if T remains fixed as m → ∞, then the number of parameters to be estimated tends to infinity, and the m vector ˆ η of estimates
Trang 22of the fixed effects is not consistent, because each estimated effect depends
only on T observations It is nevertheless possible to show that, even in this
case, ˆβ remains consistent; see Exercise 7.23.
It is always possible to find a consistent estimate of σ2
ε by estimating the
model (7.84), because, no matter how m and T may behave as n → ∞, there are n residuals Thus, if we divide the SSR from (7.84) by n − m − k, we will obtain an unbiased and consistent estimate of σ2
ε, since the error terms for this
model are just the ε it But the natural estimator of σ2
v, namely, the sample
variance of the m elements of ˆ η, is not consistent unless m → ∞ In practice,
therefore, it is probably undesirable to use the random-effects estimator when
ˆ
βBG = (X > P D X) −1 X > P D y (7.89) Although regression (7.88) appears to have n = mT observations, it really has only m, because the regressand and all the regressors are the same for every
observation in each group The estimator bears the name “between-groups”
because it uses only the variation among the group means If m < k, note that the estimator (7.89) does not even exist, since the matrix X > P D X can have rank at most m.
If the restrictions of the random-effects model are not satisfied, the estimatorˆ
βBG, if it exists, is in general biased and inconsistent To see this, observethat unbiasedness and consistency require that the moment conditions
E¡(P D X) >
it (y it − X it β)¢= 0 (7.90) should hold, where (P D X) it is the row labelled it of the n × k matrix P D X Since y it − X it β = v i + ε it , and since ε it is independent of everything else
in condition (7.90), this condition is equivalent to the absence of correlation
between the v i and the elements of the matrix X.
As readers are asked to show in Exercise 7.24, the variance of the error terms
in regression (7.88) is σ2
v + σ2
ε /T Therefore, if we run it as a regression with m observations, divide the SSR by m − k, and then subtract 1/T times our estimate of σ2
ε, we will obtain a consistent, but not necessarily positive,
Trang 237.10 Models for Panel Data 301the between-groups estimator (7.89) For the former to be consistent, we needonly the assumptions of the fixed-effects model, but for the latter we need inaddition the restrictions of the random-effects model Thus both the OLSestimator of (7.82) and the feasible GLS estimator are consistent only if thebetween-groups estimator is consistent.
For the OLS estimator of (7.82),
ˆ
β = (X > X) −1 X > y
= (X > X) −1 (X > M D y + X > P D y)
= (X > X) −1 X > M D X ˆ βFE+ (X > X) −1 X > P D X ˆ βBG,
which shows that the estimator is indeed a matrix-weighted average of ˆβFE
and ˆβBG As readers are asked to show in Exercise 7.25, the GLS estimator
of the random-effects model can be obtained by running the OLS regression
(I − λP D )y = (I − λP D )Xβ + residuals, (7.91) where the scalar λ is defined by
matrix-identical to the OLS estimator when λ = 0, which happens when σ2
v = 0,
and equal to the within-groups, or fixed-effects, estimator when λ = 1, which happens when σ2
ε = 0 Except in these two special cases, the GLS estimator
is more efficient, in the context of the random-effects model, than either theOLS estimator or the fixed-effects estimator But equation (7.91) also impliesthat the random-effects estimator is inconsistent whenever the between-groupsestimator is inconsistent
Unbalanced Panels
Up to this point, we have assumed that we are dealing with a balanced panel,
that is, a data set for which there are precisely T observations for each
cross-sectional unit However, it is quite common to encounter unbalanced panels,for which the number of observations is not the same for every cross-sectionalunit The fixed-effects estimator can be used with unbalanced panels withoutany real change It is still based on regression (7.84), and the only change is
that the matrix of dummy variables D will no longer have the same number
of 1s in each column The random-effects estimator can also be used withunbalanced panels, but it needs to be modified slightly
Trang 24Let us assume that the data are grouped by cross-sectional units Let T i denote the number of observations associated with unit i, and partition y and
X as follows:
y = [y1 y2 ··· y m ], X = [X1 X2 ··· X m ], where y i and X i denote the T i rows of y and X that correspond to the ith
unit By analogy with (7.92), make the definition
Let ¯y i denote a T i vector, each element of which is the mean of the elements
of y i Similarly, let ¯X i denote a T i × k matrix, each element of which is the mean of the corresponding column of X i Then the random-effects estimatorcan be computed by running the linear regression
reduces to regression (7.91) in that special case
Group Effects and Individual Data
Error-components models are also relevant for regressions on cross-sectiondata with no time dimension, but where the observations naturally belong togroups For example, each observation might correspond to a household living
in a certain state, and each group would then consist of all the householdsliving in a particular state In such cases, it is plausible that the error terms forindividuals within the same group are correlated An error-components model
that combines a group-specific error v i , with variance σ2
v, and an
individual-specific error ε it , with variance σ2
ε, is a natural way to model this sort ofcorrelation Such a model implies that the correlation between the error terms
for observations in the same group is ρ ≡ σ2
Trang 257.11 Final Remarks 303panel, because this model takes account of between-group variation Thiscan be seen from equation (7.93): Collinearity of the transformed group-levelvariables on the right-hand side occurs only if the explanatory variables are
collinear to begin with The estimates of σ2
ε and σ2
v needed to compute the
λ i may be obtained in various ways, some of which were discussed in thesubsection on random-effects estimation As we remarked there, these work
well only if the number of groups m is not too small.
If it is thought that the within-group correlation ρ is small, it may be tempting
to ignore it and use OLS estimation, with the usual OLS covariance matrix
This can be a serious mistake unless ρ is actually zero, since the OLS dard errors can be drastic underestimates even with small values of ρ, as
stan-Kloek (1981) and Moulton (1986, 1990) have pointed out The problem isparticularly severe when the number of observations per group is large, asreaders are asked to show in Exercise 7.26 The correlation of the error termswithin groups means that the effective sample size is much smaller than theactual sample size when there are many observations per group
In this section, we have presented just a few of the most basic ideas concerning estimation with panel data. Of course, GLS is not the only method that can be used to estimate models for data of this type. The generalized method of moments (Chapter 9) and the method of maximum likelihood (Chapter 10) are also commonly used. For more detailed treatments of various models for panel data, see, among others, Chamberlain (1984), Hsiao (1986, 2001), Baltagi (1995), Greene (2000, Chapter 14), Ruud (2000, Chapter 24), Arellano and Honoré (2001), and Wooldridge (2001).
7.11 Final Remarks
Several important concepts were introduced in the first four sections of this chapter, which dealt with the basic theory of generalized least squares estimation. The concept of an efficient MM estimator, which we introduced in Section 7.2, will be encountered again in the context of generalized instrumental variables estimation (Chapter 8) and generalized method of moments estimation (Chapter 9). The key idea of feasible GLS estimation, namely, that an unknown covariance matrix may in some circumstances be replaced by a consistent estimate of that matrix without changing the asymptotic properties of the resulting estimator, will also be encountered again in Chapter 9.

The remainder of the chapter dealt with the treatment of heteroskedasticity and serial correlation in linear regression models, and with error-components models for panel data. Although this material is of considerable practical importance, most of the techniques we discussed, although sometimes complicated in detail, are conceptually straightforward applications of feasible GLS estimation, NLS estimation, and methods for testing hypotheses that were introduced in Chapters 4 and 6.
7.12 Exercises
7.1 Using the fact that E(uu^⊤ | X) = Ω for regression (7.01), show directly, without appeal to standard OLS results, that the covariance matrix of the GLS estimator β̂_GLS is given by (7.05).
7.2 Show that the matrix (7.11), reproduced here for easy reference,

X^⊤Ω^{-1}X − X^⊤W(W^⊤ΩW)^{-1}W^⊤X,

is positive semidefinite. As in Section 6.2, this may be done by showing that this matrix can be expressed in the form Z^⊤MZ, for some n × k matrix Z and some n × n orthogonal projection matrix M. It is helpful to express Ω^{-1} as ΨΨ^⊤, as in (7.02).
7.3 Using the data in the file earnings.data, run the regression

y_t = β_1 d_{1t} + β_2 d_{2t} + β_3 d_{3t} + u_t,

which was previously estimated in Exercise 5.3. Recall that the d_{it} are dummy variables. Then test the null hypothesis that E(u_t^2) = σ^2 against the alternative that E(u_t^2) = γ_1 d_{1t} + γ_2 d_{2t} + γ_3 d_{3t}. Report P values for F and nR^2 tests.
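For readers who want to see what the computations in this exercise involve, the following sketch computes both test statistics and their P values; the arrays y, d1, d2, d3 are hypothetical placeholders for the variables in earnings.data, whose layout is not reproduced here, and this is an illustration rather than the official solution.

```python
# Hypothetical sketch: test E(u_t^2) = sigma^2 against E(u_t^2) = gamma'd_t.
# Regress the squared OLS residuals on the three dummies; since the dummies span
# a constant, the null imposes 2 restrictions, so nR^2 is asymptotically
# chi-squared(2) and the F statistic has (2, n - 3) degrees of freedom.
import numpy as np
from scipy import stats

def hetero_tests(y, d1, d2, d3):
    D = np.column_stack([d1, d2, d3])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    u2 = (y - D @ beta) ** 2                      # squared residuals
    n = len(y)

    g, *_ = np.linalg.lstsq(D, u2, rcond=None)    # test regression
    ssr_unr = np.sum((u2 - D @ g) ** 2)
    ssr_res = np.sum((u2 - u2.mean()) ** 2)       # restricted: constant only
    r2 = 1.0 - ssr_unr / ssr_res
    nR2 = n * r2
    F = ((ssr_res - ssr_unr) / 2) / (ssr_unr / (n - 3))

    return (F, stats.f.sf(F, 2, n - 3)), (nR2, stats.chi2.sf(nR2, df=2))
```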
7.4 If u_t follows the stationary AR(1) process

u_t = ρu_{t-1} + ε_t,   ε_t ∼ IID(0, σ_ε^2),   |ρ| < 1,

show that Cov(u_t, u_{t-j}) = Cov(u_t, u_{t+j}) = ρ^j σ_ε^2/(1 − ρ^2). Then use this result to show that the correlation between u_t and u_{t-j} is just ρ^j.
7.5 Consider the nonlinear regression model y_t = x_t(β) + u_t. Derive the GNR for testing the null hypothesis that the u_t are serially uncorrelated against the alternative that they follow an AR(1) process.
7.6 Show how to test the null hypothesis that the error terms of the linear regression model y = Xβ + u are serially uncorrelated against the alternative that they follow an AR(4) process by means of a GNR. Derive the test GNR from first principles.
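As an illustration of what such a GNR test amounts to in practice (a sketch, not the derivation the exercise asks for), one can regress the OLS residuals on X and on four lags of the residuals, with the unavailable pre-sample lags set to zero, and test the four lag coefficients jointly. The arrays y and X below are placeholders.

```python
# Hypothetical sketch of the GNR test for AR(4) errors in y = X beta + u:
# regress the residuals u_hat on X and on u_hat lagged 1 to 4 times (pre-sample
# lags set to zero), then test the 4 lag coefficients with an F statistic.
import numpy as np
from scipy import stats

def gnr_ar4_test(y, X):
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    lags = np.column_stack(
        [np.concatenate([np.zeros(j), u[:-j]]) for j in range(1, 5)]
    )
    Z = np.column_stack([X, lags])
    g, *_ = np.linalg.lstsq(Z, u, rcond=None)
    ssr_unr = np.sum((u - Z @ g) ** 2)
    ssr_res = np.sum(u ** 2)          # regressing u_hat on X alone explains nothing
    F = ((ssr_res - ssr_unr) / 4) / (ssr_unr / (n - k - 4))
    return F, stats.f.sf(F, 4, n - k - 4)
```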
7.7 Consider the following three models, where u_t is assumed to be IID(0, σ^2):

H_0: y_t = β + u_t
H_1: y_t = β + ρ(y_{t-1} − β) + u_t
H_2: y_t = β + u_t + αu_{t-1}

Explain how to test H_0 against H_1 by using a GNR. Then show that exactly the same test statistic is also appropriate for testing H_0 against H_2.
7.8 Write the trace in (7.50) explicitly in terms of P_X rather than M_X, and show that the terms containing one or more factors of P_X all vanish asymptotically.
7.9 By direct matrix multiplication, show that, if Ψ is given by (7.59), then ΨΨ^⊤ is equal to the matrix

\[
\begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\
0 & -\rho & 1+\rho^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho \\
0 & 0 & 0 & \cdots & -\rho & 1
\end{pmatrix}.
\]

Show further, by direct calculation, that this matrix is proportional to the inverse of the matrix Ω given in (7.32).
7.10 Show that equation (7.30), relating u to ε, can be modified to take account of the definition (7.58) of ε_1, with the result that

u_t = ε_t + ρε_{t-1} + ρ^2 ε_{t-2} + · · · + ρ^{t-1}(1 − ρ^2)^{-1/2} ε_1.   (7.94)

The relation Ψ^⊤u = ε implies that u = (Ψ^⊤)^{-1}ε. Use the result (7.94) to show that Ψ^{-1} can be written as

\[
\begin{pmatrix}
\theta & \rho\theta & \rho^2\theta & \cdots & \rho^{n-1}\theta \\
0 & 1 & \rho & \cdots & \rho^{n-2} \\
0 & 0 & 1 & \cdots & \rho^{n-3} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix},
\]

where θ ≡ (1 − ρ^2)^{-1/2}. Verify by direct calculation that this matrix is the inverse of the Ψ given by (7.59).
7.11 Consider a square, symmetric, nonsingular matrix partitioned as follows:

\[
H \equiv \begin{pmatrix} A & B \\ B^{\top} & C \end{pmatrix}. \qquad (7.95)
\]

…

7.12 Suppose that the matrix H of the previous question is positive definite. It therefore follows (see Section 3.4) that there exists a square matrix X such that H = X^⊤X. Partition X as [X_1 X_2], so that

\[
H = X^{\top}X = \begin{pmatrix} X_1^{\top}X_1 & X_1^{\top}X_2 \\ X_2^{\top}X_1 & X_2^{\top}X_2 \end{pmatrix},
\]
where the blocks of the matrix on the right-hand side are the same as the blocks in (7.95). Show that the top left block D of H^{-1} can be expressed as (X_1^⊤M_2X_1)^{-1}, where M_2 = I − X_2(X_2^⊤X_2)^{-1}X_2^⊤. Use this result to show that D − A^{-1} = (X_1^⊤M_2X_1)^{-1} − (X_1^⊤X_1)^{-1} is a positive semidefinite matrix.
7.13 Consider testing for first-order serial correlation of the error terms in the regression model

y = βy_1 + u,   u ∼ IID(0, σ_u^2 I),   (7.96)

where y_1 is the vector with typical element y_{t-1}, by use of the statistics t_GNR and t_SR defined in (7.51) and (7.52), respectively. Show first that the vector denoted as M_X ũ_1 in (7.51) and (7.52) is equal to −β̃M_X y_2, where y_2 is the vector with typical element y_{t-2}, and β̃ is the OLS estimate of β from (7.96). Then show that, as n → ∞, t_GNR tends to the random variable τ ≡ σ_u^{-2} plim n^{-1/2}(βy_1 − y_2)^⊤u, whereas t_SR tends to the same random variable times β. Show finally that t_GNR, but not t_SR, provides an asymptotically correct test, by showing that the random variable τ is asymptotically distributed as N(0, 1).
7.14 The file money.data contains seasonally adjusted quarterly data for the logarithm of the real money supply, m_t, real GDP, y_t, and the 3-month Treasury Bill rate, r_t, for Canada for the period 1967:1 to 1998:4. A conventional demand for money function is

m_t = β_1 + β_2 r_t + β_3 y_t + β_4 m_{t-1} + u_t.   (7.97)

Estimate this model over the period 1968:1 to 1998:4, and then test it for AR(1) errors using two different GNRs that differ in their treatment of the first observation.
7.15 Use nonlinear least squares to estimate, over the period 1968:1 to 1998:4,
the model that results if u_t in (7.97) follows an AR(1) process. Then test the common factor restrictions that are implicit in this model. Calculate an
asymptotic P value for the test.
7.16 Test the common factor restrictions of Exercise 7.15 again using a GNR.
Calculate both an asymptotic P value and a bootstrap P value based on at least B = 99 bootstrap samples. Hint: To obtain a consistent estimate of ρ for the GNR, use the fact that the coefficient of r_{t-1} in the unrestricted model (7.73) is equal to −ρ times the coefficient of r_t.
7.17 Use nonlinear least squares to estimate, over the period 1968:1 to 1998:4,
the model that results if u_t in (7.97) follows an AR(2) process. Is there any evidence that an AR(2) process is needed here?
7.18 … next step in this procedure? Complete the description of iterated Cochrane-Orcutt as iterated feasible GLS, showing how each step of the procedure can be carried out using an OLS regression.
Show that, when the algorithm converges, conditions (7.68) for NLS estimation are satisfied. Also show that, unlike iterated feasible GLS including observation 1, this algorithm must eventually converge, although perhaps only
to a local, rather than the global, minimum of SSR(β, ρ).
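A minimal sketch of the iterated Cochrane-Orcutt algorithm, in which every step is an OLS regression and observation 1 is dropped from the quasi-differenced data; the arrays y and X are assumed to be already loaded, and the convergence tolerance and starting value of ρ are illustrative choices rather than prescriptions from the text.

```python
# Sketch of iterated Cochrane-Orcutt, assuming y (n,) and X (n, k) are in memory.
# Each step is an OLS regression: first of the residuals on their own lag to
# update rho, then of the quasi-differenced data to update beta.
import numpy as np

def ols(y, X):
    """Return OLS coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cochrane_orcutt(y, X, rho=0.0, tol=1e-8, max_iter=100):
    beta = ols(y, X)
    for _ in range(max_iter):
        u = y - X @ beta                                      # residuals at beta
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])        # OLS of u_t on u_{t-1}
        y_star = y[1:] - rho_new * y[:-1]                     # drop observation 1
        X_star = X[1:] - rho_new * X[:-1]
        beta_new = ols(y_star, X_star)
        if abs(rho_new - rho) < tol and np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, rho_new
        beta, rho = beta_new, rho_new
    return beta, rho
```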
7.19 Consider once more the model that you estimated in Exercise 7.15. Estimate this model using the iterated Cochrane-Orcutt algorithm, using a sequence of OLS regressions, and see how many iterations are needed to achieve the same estimates as those achieved by NLS. Compare this number with the number
of iterations used by NLS itself.
Repeat the exercise with a starting value of 0.5 for ρ instead of the value of 0
that is conventionally used.
7.20 Test the hypothesis that the error terms of the linear regression model (7.97) are serially uncorrelated against the alternatives that they follow the simple AR(4) process u_t = ρ_4 u_{t-4} + ε_t and that they follow a general AR(4) process. Test the hypothesis that the error terms of the nonlinear regression model you estimated in Exercise 7.15 are serially uncorrelated against the same two alternative hypotheses. Use Gauss-Newton regressions.
7.21 Consider the linear regression model
y = X_0β_0 + X_1β_1 + X_2β_2 + u,   u ∼ IID(0, σ^2 I),   (7.98)

where there are n observations, and k_0, k_1, and k_2 denote the numbers of parameters in β_0, β_1, and β_2, respectively. Let H_0 denote the hypothesis that β_1 = 0 and β_2 = 0, H_1 denote the hypothesis that β_2 = 0, and H_2 denote the model (7.98) with no restrictions.
Show that the F statistics for testing H_0 against H_1 and for testing H_1 against H_2 are asymptotically independent of each other.
7.22 This question uses data on daily returns for the period 1989–1998 for shares
of Mobil Corporation from the file daily-crsp.data. These data are made available by courtesy of the Center for Research in Security Prices (CRSP); see the comments at the bottom of the file. Regress these returns on a constant and themselves lagged once, twice, three, and four times, dropping the first four observations. Then test the null hypothesis that all coefficients except the constant term are equal to zero, as they should be if market prices fully reflect all available information. Perform a heteroskedasticity-robust test by
running two HRGNRs, and report P values for both tests.
7.23 Consider the fixed-effects model (7.84). Show that, under mild regularity conditions, which you should specify, the OLS estimator β̂_FE tends in probability to the true parameter vector β_0 as m, the number of cross-sectional units, tends to infinity, while T, the number of time periods, remains fixed.
7.24 Suppose that

y = Xβ + v + ε,   (7.99)

where there are n = mT observations, y is an n vector with typical element y_{it}, X is an n × k matrix with typical row X_{it}, ε is an n vector with typical element ε_{it}, and v is an n vector with v_i repeated in the positions that correspond to y_{i1} through y_{iT}. Let the v_i have variance σ_v^2 and the ε_{it} have variance σ_ε^2. Given these assumptions, show that the variance of the error terms in regression (7.88) is σ_v^2 + σ_ε^2/T.
7.25 Show that, for Σ defined in (7.87),

Σ^{-1/2} = (1/σ_ε)(I_T − λP_ι),

where P_ι ≡ ι(ι^⊤ι)^{-1}ι^⊤ = (1/T)ιι^⊤, and

λ ≡ 1 − σ_ε/(σ_ε^2 + Tσ_v^2)^{1/2}.

Then use this result to show that the GLS estimates of β may be obtained by running regression (7.91). What is the covariance matrix of the GLS estimator?
7.26 Suppose that, in the error-components model (7.99), none of the columns of X
displays any within-group variation. Recall that, for this model, the data are balanced, with m groups and T observations per group. Show that the OLS and GLS estimators are identical in this special case. Then write down the true covariance matrix of both these estimators. How is this covariance matrix related to the usual one for OLS that would be computed by a regression package under classical assumptions? What happens to this relationship as
T and ρ, the correlation of the error terms within groups, change?
Chapter 8

Instrumental Variables Estimation
8.1 Introduction
In Section 3.3, the ordinary least squares estimator β̂ was shown to be consistent under condition (3.10), according to which the expectation of the error term u_t associated with observation t is zero conditional on the regressors X_t for that same observation. As we saw in Section 4.5, this condition can also be expressed either by saying that the regressors X_t are predetermined or by saying that the error terms u_t are innovations. When condition (3.10) does not hold, the consistency proof of Section 3.3 is not applicable, and the OLS estimator will, in general, be biased and inconsistent.
It is not always reasonable to assume that the error terms are innovations. In fact, as we will see in the next section, there are commonly encountered situations in which the error terms are necessarily correlated with some of the regressors for the same observation. Even in these circumstances, however, it is usually possible, although not always easy, to define an information set Ω_t for each observation such that

E(u_t | Ω_t) = 0.   (8.01)
… dealt with briefly. A more general class of MM estimators, of which both OLS and IV are special cases, will be the subject of Chapter 9.
8.2 Correlation Between Error Terms and Regressors
We now briefly discuss two common situations in which the error terms will be correlated with the regressors and will therefore not have mean zero conditional on them. The first one, usually referred to by the name errors in variables, occurs whenever the independent variables in a regression model are measured with error. The second situation, often simply referred to as simultaneity, occurs whenever two or more endogenous variables are jointly determined by a system of simultaneous equations.

Errors in Variables
For a variety of reasons, many economic variables are measured with error. For example, macroeconomic time series are often based, in large part, on surveys, and they must therefore suffer from sampling variability. Whenever there are measurement errors, the values economists observe inevitably differ, to a greater or lesser extent, from the true values that economic agents presumably act upon. As we will see, measurement errors in the dependent variable of a regression model are generally of no great consequence, unless they are very large. However, measurement errors in the independent variables cause the error terms to be correlated with the regressors that are measured with error, and this causes OLS to be inconsistent.
The problems caused by errors in variables can be seen quite clearly in the context of the simple linear regression model. Consider the model

y°_t = β_1 + β_2 x°_t + u°_t,   u°_t ∼ IID(0, σ^2),   (8.02)

where the variables x°_t and y°_t are not actually observed. Instead, we observe

x_t = x°_t + v_{1t},   y_t = y°_t + v_{2t}.   (8.03)

Here v_{1t} and v_{2t} are measurement errors which are assumed, perhaps not realistically in some cases, to be IID with variances ω_1^2 and ω_2^2, respectively.
If we suppose that the true DGP is a special case of (8.02) along with (8.03), we see from (8.03) that x°_t = x_t − v_{1t} and y°_t = y_t − v_{2t}. If we substitute these into (8.02), we find that

y_t = β_1 + β_2(x_t − v_{1t}) + u°_t + v_{2t}
    = β_1 + β_2 x_t + u°_t + v_{2t} − β_2 v_{1t}.   (8.04)

The error term in equation (8.04), which we denote u_t, is therefore u°_t + v_{2t} − β_2 v_{1t}.
The measurement error in the independent variable also increases the variance of the error terms, but it has another, much more severe, consequence as well. Because x_t = x°_t + v_{1t}, and u_t depends on v_{1t}, u_t will be correlated with x_t whenever β_2 ≠ 0. In fact, since the random part of x_t is v_{1t}, we see that

E(u_t | x_t) = E(u_t | v_{1t}) = −β_2 v_{1t},   (8.05)

because we assume that v_{1t} is independent of u°_t and v_{2t}. From (8.05), we can see, using the fact that E(u_t) = 0 unconditionally, that

Cov(x_t, u_t) = E(x_t u_t) = E(x_t E(u_t | x_t))
             = −E((x°_t + v_{1t})β_2 v_{1t}) = −β_2 ω_1^2.

This covariance is negative if β_2 > 0 and positive if β_2 < 0, and, since it does not depend on the sample size n, it will not go away as n becomes large. An exactly similar argument shows that the assumption that E(u_t | X_t) = 0 is false whenever any element of X_t is measured with error. In consequence, the OLS estimator will be biased and inconsistent.
Errors in variables are a potential problem whenever we try to estimate a consumption function, especially if we are using cross-section data. Many economic theories (for example, Friedman, 1957) suggest that household consumption will depend on "permanent" income or "life-cycle" income, but surveys of household behavior almost never measure this. Instead, they typically provide somewhat inaccurate estimates of current income. If we think of y_t as measured consumption, x°_t as permanent income, and x_t as estimated current income, then the above analysis applies directly to the consumption function. The marginal propensity to consume is β_2, which must be positive, causing the correlation between u_t and x_t to be negative. As readers are asked to show in Exercise 8.1, the probability limit of β̂_2 is less than the true value β_20. In consequence, the OLS estimator β̂_2 is biased downward, even asymptotically.
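A short simulation, with made-up parameter values, makes the attenuation concrete. Under the additional assumption, used here only for illustration, that the true regressor x°_t is IID, independent of the measurement errors, with some variance Var(x°_t), the standard attenuation result gives plim β̂_2 = β_20 Var(x°_t)/(Var(x°_t) + ω_1^2); in the sketch below the OLS slope settles near that value rather than near β_20.

```python
# Sketch (not from the text): simulate the errors-in-variables model (8.02)-(8.03)
# with made-up parameter values and compare the OLS slope with the attenuation
# formula beta2 * var(x_star) / (var(x_star) + omega1_sq).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1, beta2 = 1.0, 2.0
omega1_sq, omega2_sq, sigma_sq = 1.0, 0.5, 1.0

x_star = rng.normal(scale=2.0, size=n)            # true regressor, variance 4
y_star = beta1 + beta2 * x_star + rng.normal(scale=np.sqrt(sigma_sq), size=n)
x = x_star + rng.normal(scale=np.sqrt(omega1_sq), size=n)   # observed with error
y = y_star + rng.normal(scale=np.sqrt(omega2_sq), size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope:          ", beta_hat[1])
print("attenuation formula:", beta2 * 4.0 / (4.0 + omega1_sq))   # about 1.6 < 2.0
```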
Of course, if our objective is simply to estimate the relationship between the observed dependent variable y_t and the observed independent variable x_t, there is nothing wrong with using ordinary least squares to estimate equation (8.04). In that case, u_t would simply be defined as the difference between y_t and its expectation conditional on x_t. But our analysis shows that the OLS estimators of β_1 and β_2 in equation (8.04) are not consistent for the corresponding parameters of equation (8.02). In most cases, it is parameters like these that we want to estimate on the basis of economic theory.

There is an extensive literature on ways to avoid the inconsistency caused by errors in variables. See, among many others, Hausman and Watson (1985), Leamer (1987), and Dagenais and Dagenais (1997). The simplest and most widely-used approach is just to use an instrumental variables estimator.

Simultaneous Equations
Economic theory often suggests that two or more endogenous variables are determined simultaneously. In this situation, as we will see shortly, all of the endogenous variables will necessarily be correlated with the error terms in all of the equations. This means that none of them may validly appear in the regression functions of models that are to be estimated by least squares.
A classic example, which well illustrates the econometric problems caused by simultaneity, is the determination of price and quantity for a commodity at the partial equilibrium of a competitive market. Suppose that q_t is quantity and p_t is price, both of which would often be in logarithms. A linear (or loglinear) model of demand and supply is

q_t = γ^d p_t + X_t^d β^d + u_t^d   (8.06)
q_t = γ^s p_t + X_t^s β^s + u_t^s,   (8.07)

where X_t^d and X_t^s are row vectors of observations on exogenous or predetermined variables that appear, respectively, in the demand and supply functions, β^d and β^s are corresponding vectors of parameters, γ^d and γ^s are scalar parameters, and u_t^d and u_t^s are the error terms in the demand and supply functions. Economic theory predicts that, in most cases, γ^d < 0 and γ^s > 0, which is equivalent to saying that the demand curve slopes downward and the supply curve slopes upward.
Equations (8.06) and (8.07) are a pair of linear simultaneous equations for the two unknowns p_t and q_t. For that reason, these equations constitute what is called a linear simultaneous equations model. In this case, there are two dependent variables, quantity and price. For estimation purposes, the key feature of the model is that quantity depends on price in both equations. Since there are two equations and two unknowns, it is straightforward to solve equations (8.06) and (8.07) for p_t and q_t. This is most easily done by rewriting them in matrix notation as
\[
\begin{pmatrix} 1 & -\gamma^{d} \\ 1 & -\gamma^{s} \end{pmatrix}
\begin{pmatrix} q_t \\ p_t \end{pmatrix}
=
\begin{pmatrix} X_t^{d}\beta^{d} \\ X_t^{s}\beta^{s} \end{pmatrix}
+
\begin{pmatrix} u_t^{d} \\ u_t^{s} \end{pmatrix}. \qquad (8.08)
\]
The solution to (8.08), which will exist whenever γ^d ≠ γ^s, so that the matrix on the left-hand side of (8.08) is nonsingular, is
\[
\begin{pmatrix} q_t \\ p_t \end{pmatrix}
=
\begin{pmatrix} 1 & -\gamma^{d} \\ 1 & -\gamma^{s} \end{pmatrix}^{-1}
\left(
\begin{pmatrix} X_t^{d}\beta^{d} \\ X_t^{s}\beta^{s} \end{pmatrix}
+
\begin{pmatrix} u_t^{d} \\ u_t^{s} \end{pmatrix}
\right). \qquad (8.09)
\]
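To see the simultaneity problem numerically, the following sketch, with made-up parameter values and a single exogenous variable in each equation, solves the system (8.08) observation by observation, as in (8.09), and checks that the equilibrium price is correlated with the demand-equation error term, so that OLS applied to the demand equation would be inconsistent.

```python
# Sketch (made-up parameters): solve the demand-supply system (8.08) for q_t, p_t
# and check that p_t is correlated with the demand error u_t^d.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
gamma_d, gamma_s = -1.0, 0.5          # demand slopes down, supply slopes up
beta_d, beta_s = 2.0, 1.0             # one exogenous variable in each equation

x_d = rng.normal(size=n)              # demand shifter (e.g. income)
x_s = rng.normal(size=n)              # supply shifter (e.g. an input price)
u_d = rng.normal(size=n)
u_s = rng.normal(size=n)

A = np.array([[1.0, -gamma_d],
              [1.0, -gamma_s]])
rhs = np.column_stack([beta_d * x_d + u_d, beta_s * x_s + u_s])
qp = np.linalg.solve(A, rhs.T).T      # each row is (q_t, p_t), as in (8.09)
q, p = qp[:, 0], qp[:, 1]

print("corr(p_t, u_t^d):", np.corrcoef(p, u_d)[0, 1])   # clearly nonzero
```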