Bootstrap Tests for Serial Correlation Whenever the regression function is nonlinear or contains lagged dependentvariables, or whenever the distribution of the error terms is unknown, no
Trang 17.7 Testing for Serial Correlation 279
the alternative that ρ > 0 An investigator will reject the null hypothesis if
d < d L , fail to reject if d > d U , and come to no conclusion if d L < d < d U
For example, for a test at the 05 level when n = 100 and k = 8, including the constant term, the bounding critical values are d L = 1.528 and d U = 1.826 Therefore, one would reject the null hypothesis if d < 1.528 and not reject it
if d > 1.826 Notice that, even for this not particularly small sample size, the
indeterminate region between 1.528 and 1.826 is quite large
It should by now be evident that the Durbin-Watson statistic, despite itspopularity, is not very satisfactory Using it with standard tables is relativelycumbersome and often yields inconclusive results Moreover, the standardtables only allow us to perform one-tailed tests against the alternative that
ρ > 0 Since the alternative that ρ < 0 is often of interest as well, the inability
to perform a two-tailed test, or a one-tailed test against this alternative, using
standard tables is a serious limitation Although exact P values for both tailed and two-tailed tests, which depend on the X matrix, can be obtained
one-by using appropriate software, many computer programs do not offer thiscapability In addition, the DW statistic is not valid when the regressorsinclude lagged dependent variables, and it cannot easily be generalized to testfor higher-order processes Happily, the development of simulation-based testshas made the DW statistic obsolete
Monte Carlo Tests for Serial Correlation
We discussed simulation-based tests, including Monte Carlo tests and strap tests, at some length in Section 4.6 The techniques discussed there canreadily be applied to the problem of testing for serial correlation in linear andnonlinear regression models
boot-All the test statistics we have discussed, namely, tGNR, tSR, and d, are pivotal under the null hypothesis that ρ = 0 when the assumptions of the classical
normal linear model are satisfied This makes it possible to perform MonteCarlo tests that are exact in finite samples Pivotalness follows from twoproperties shared by all these statistics The first of these is that they dependonly on the residuals ˜u t obtained by estimation under the null hypothesis.The distribution of the residuals depends on the exogenous explanatory vari-
ables X, but these are given and the same for all DGPs in a classical normal linear model The distribution does not depend on the parameter vector β of the regression function, because, if y = Xβ + u, then M X y = M X u what- ever the value of the vector β.
The second property that all the statistics we have considered share is scaleinvariance By this, we mean that multiplying the dependent variable by
an arbitrary scalar λ leaves the statistic unchanged In a linear regression model, multiplying the dependent variable by λ causes the residuals to be multiplied by λ But the statistics defined in (7.51), (7.52), and (7.53) are
clearly unchanged if all the residuals are multiplied by the same constant, and
so these statistics are scale invariant Since the residuals ˜u are equal to M X u,
Trang 2it follows that multiplying σ by an arbitrary λ multiplies the residuals by λ Consequently, the distributions of the statistics are independent of σ2 as well
as of β This implies that, for the classical normal linear model, all three
statistics are pivotal
We now outline how to perform Monte Carlo tests for serial correlation in thecontext of the classical normal linear model Let us call the test statistic we
are using τ and its realized value ˆ τ If we want to test for AR(1) errors, the best choice for the statistic τ is the t statistic tGNR from the GNR (7.43), but
it could also be the DW statistic, the t statistic tSRfrom the simple regression(7.46), or even ˜ρ itself If we want to test for AR(p) errors, the best choice for τ would be the F statistic from the GNR (7.45), but it could also be the
F statistic from a regression of ˜ u t on ˜u t−1 through ˜u t−p
The first step, evidently, is to compute ˆτ The next step is to generate B sets
of simulated residuals and use each of them to compute a simulated test
statistic, say τ ∗
j , for j = 1, , B Because the parameters do not matter,
we can simply draw B vectors u ∗
j from the N (0, I) distribution and regress each of them on X to generate the simulated residuals M X u ∗
j, which are then
used to compute τ ∗
j This can be done very inexpensively The final step is to
calculate an estimated P value for whatever null hypothesis is of interest For example, for a two-tailed test of the null hypothesis that ρ = 0, the P value would be the proportion of the τ ∗
j that exceed ˆτ in absolute value:
We would then reject the null hypothesis at level α if ˆ p ∗(ˆτ ) < α As we saw
in Section 4.6, such a test will be exact whenever B is chosen so that α(B + 1)
is an integer
Bootstrap Tests for Serial Correlation
Whenever the regression function is nonlinear or contains lagged dependentvariables, or whenever the distribution of the error terms is unknown, none ofthe standard test statistics for serial correlation will be pivotal Nevertheless,
it is still possible to obtain very accurate inferences, even in quite small ples, by using bootstrap tests The procedure is essentially the one described
sam-in the previous subsection We still generate B simulated test statistics and use them to compute a P value according to (7.54) or its analog for a one-
tailed test For best results, the test statistic used should be asymptotically
valid for the model that is being tested In particular, we should avoid d and
tSR whenever there are lagged dependent variables
It is extremely important to generate the bootstrap samples in such a way thatthey are compatible with the model under test Ways of generating bootstrapsamples for regression models were discussed in Section 4.6 If the model
Trang 37.7 Testing for Serial Correlation 281
is nonlinear or includes lagged dependent variables, we need to generate y ∗
j
rather than just u ∗
j For this, we need estimates of the parameters of theregression function If the model includes lagged dependent variables, wemust generate the bootstrap samples recursively, as in (4.66) Unless we aregoing to assume that the error terms are normally distributed, we shoulddraw the bootstrap error terms from the EDF of the residuals for the modelunder test, after they have been appropriately rescaled Recall that there ismore than one way to do this The simplest approach is just to multiply each
Heteroskedasticity-Robust Tests
The tests for serial correlation that we have discussed are based on the tion that the error terms are homoskedastic When this crucial assumption isviolated, the asymptotic distributions of all the test statistics will differ fromwhatever distributions they are supposed to follow asymptotically However,
assump-as we saw in Section 6.8, it is not difficult to modify GNR-bassump-ased tests to makethem robust to heteroskedasticity of unknown form
Suppose we wish to test the linear regression model (7.42), in which the errorterms are serially uncorrelated, against the alternative that the error terms
follow an AR(p) process Under the assumption of homoskedasticity, we could simply run the GNR (7.45) and use an asymptotic F test If we let Z denote
an n × p matrix with typical element Z ti = ˜u t−i, where any missing laggedresiduals are replaced by zeros, this GNR can be written as
˜
u = Xb + Zc + residuals (7.55)
The ordinary F test for c = 0 in (7.55) is not robust to heteroskedasticity, but
a heteroskedasticity-robust test can easily be computed using the proceduredescribed in Section 6.8 This procedure works as follows:
1 Create the matrices ˜UX and ˜ UZ by multiplying the tthrow of X and the tthrow of Z by ˜ u t for all t.
2 Create the matrices ˜U −1 X and ˜ U −1 Z by dividing the tthrow of X and the tthrow of Z by ˜ u t for all t.
3 Regress each of the columns of ˜U −1 X and ˜ U −1 Z on ˜ UX and ˜ UZ jointly.
Save the resulting matrices of fitted values and call them ¯X and ¯ Z,
respectively
Trang 44 Regress ι, a vector of 1s, on ¯ X Retain the sum of squared residuals from this regression, and call it RSSR Then regress ι on ¯ X and ¯ Z jointly,
retain the sum of squared residuals, and call it USSR
5 Compute the test statistic RSSR − USSR, which will be asymptotically distributed as χ2(p) under the null hypothesis.
Although this heteroskedasticity-robust test is asymptotically valid, it willnot be exact in finite samples In principle, it should be possible to obtain
more reliable results by using bootstrap P values instead of asymptotic ones.
However, none of the methods of generating bootstrap samples for regressionmodels that we have discussed so far (see Section 4.6) is appropriate for amodel with heteroskedastic error terms Several methods exist, but they arebeyond the scope of this book, and there currently exists no method that wecan recommend with complete confidence; see Davison and Hinkley (1997)and Horowitz (2001)
Other Tests Based on OLS Residuals
The tests for serial correlation that we have discussed in this section are by
no means the only scale-invariant tests based on least squares residuals thatare regularly encountered in econometrics Many tests for heteroskedasticity,skewness, kurtosis, and other deviations from the NID assumption also havethese properties For example, consider tests for heteroskedasticity based
on regression (7.28) Nothing in that regression depends on y except for the
squared residuals that constitute the regressand Further, it is clear that both
the F statistic for the hypothesis that b γ = 0 and n times the centered R2 are
scale invariant Therefore, for a classical normal linear model with X and Z
fixed, these statistics are pivotal Consequently, Monte Carlo tests based on
them, in which we draw the error terms from the N (0, 1) distribution, are
exact in finite samples
When the normality assumption is not appropriate, we have two options Ifsome other distribution that is known up to a scale parameter is thought to be
appropriate, we can draw the error terms from it instead of from the N (0, 1)
distribution If the assumed distribution really is the true one, we obtain
an exact test Alternatively, we can perform a bootstrap test in which theerror terms are obtained by resampling the rescaled residuals This is alsoappropriate when there are lagged dependent variables among the regressors.The bootstrap test will not be exact, but it should still perform well in finitesamples no matter how the error terms actually happen to be distributed
7.8 Estimating Models with Autoregressive Errors
If we decide that the error terms of a regression model are serially correlated,either on the basis of theoretical considerations or as a result of specification
Trang 57.8 Estimating Models with Autoregressive Errors 283testing, and we are confident that the regression function itself is not misspec-ified, the next step is to estimate a modified model which takes account ofthe serial correlation The simplest such model is (7.40), which is the originalregression model modified by having the error terms follow an AR(1) process.For ease of reference, we rewrite (7.40) here:
y t = X t β + u t , u t = ρu t−1 + ε t , ε t ∼ IID(0, σ ε2) (7.56)
In many cases, as we will discuss in the next section, the best approach mayactually be to specify a more complicated, dynamic, model for which theerror terms are not serially correlated In this section, however, we ignore thisimportant issue and simply discuss how to estimate the model (7.56) undervarious assumptions
Estimation by Feasible GLS
We have seen that, if the u t follow a stationary AR(1) process, that is, if
|ρ| < 1 and Var(u1) = σ2
u = σ2
ε /(1 − ρ2), then the covariance matrix of
the entire vector u is the n × n matrix Ω(ρ) given in (7.32) In order to compute GLS estimates, we need to find a matrix Ψ with the property that
Ψ Ψ > = Ω −1 This property will be satisfied whenever the covariance matrix
of Ψ > u is proportional to the identity matrix, which it will be if we choose Ψ
in such a way that Ψ > u = ε.
For t = 2, , n, we know from (7.29) that
and this allows us to construct the rows of Ψ >except for the first row The
tthrow must have 1 in the tthposition, −ρ in the (t − 1)st position, and 0severywhere else
For the first row of Ψ >, however, we need to be a little more careful Under
the hypothesis of stationarity of u, the variance of u1 is σ2
u Further, since
the ε t are innovations, u1 is uncorrelated with the ε t for t = 2, , n Thus,
if we define ε1 by the formula
ε1 = (σ ε /σ u )u1 = (1 − ρ2)1/2 u1, (7.58)
it can be seen that the n vector ε, with the first component ε1 defined
by (7.58) and the remaining components ε t defined by (7.57), has a
covar-iance matrix equal to σ2
εI
Putting together (7.57) and (7.58), we conclude that Ψ >should be defined
as an n × n matrix with all diagonal elements equal to 1 except for the first, which is equal to (1 − ρ2)1/2, and all other elements equal to 0 except for
Trang 6the ones on the diagonal immediately below the principal diagonal, which are
equal to −ρ In terms of Ψ rather than of Ψ >, we have:
the transformation for the first observation would involve taking the squareroot of a negative number Unfortunately, the estimator ˜ρ is not guaranteed
to satisfy the stationarity condition, although, in practice, it is very likely to
do so when the model is correctly specified, even if the true value of ρ is quite
large in absolute value
Whether ρ is known or estimated, the next step in GLS estimation is to form the vector Ψ > y and the matrix Ψ > X It is easy to do this without having to store the n × n matrix Ψ in computer memory The first element of Ψ > y is (1 − ρ2)1/2 y1, and the remaining elements have the form y t − ρy t−1 Each
column of Ψ > X has precisely the same form as Ψ > y and can be calculated in
precisely the same way
The final step is to run an OLS regression of Ψ > y on Ψ > X This regression
yields the (feasible) GLS estimates
ˆ
βGLS= (X > Ψ Ψ > X) −1 X > Ψ Ψ > y (7.60)
along with the estimated covariance matrix
dVar( ˆβGLS) = s2(X > Ψ Ψ > X) −1 , (7.61)
where s2 is the usual OLS estimate of the variance of the error terms Ofcourse, the estimator (7.60) is formally identical to (7.04), since (7.60) is valid
for any Ψ matrix.
Trang 77.8 Estimating Models with Autoregressive Errors 285Estimation by Nonlinear Least Squares
If we ignore the first observation, then (7.56), the linear regression modelwith AR(1) errors, can be written as the nonlinear regression model (7.41).Since the model (7.41) is written in such a way that the error terms are inno-vations, NLS estimation is consistent whether the explanatory variables areexogenous or merely predetermined NLS estimates can be obtained by anystandard nonlinear minimization algorithm of the type that was discussed
in Section 6.4, where the function to be minimized is SSR(β, ρ), the sum of squared residuals for observations 2 through n Such procedures generally
work well, and they can also be used for models with higher-order sive errors; see Exercise 7.17 However, some care must be taken to ensurethat the algorithm does not terminate at a local minimum which is not alsothe global minimum There is a serious risk of this, especially for models withlagged dependent variables among the regressors.2
autoregres-Whether or not there are lagged dependent variables in X t, a valid estimatedcovariance matrix can always be obtained by running the GNR (6.67), whichcorresponds to the model (7.41), with all variables evaluated at the NLSestimates ˆβ and ˆ ρ This GNR is
u1> (X − ˆ ρX1) uˆ1> uˆ1
#−1
where the n×k matrix X1has typical row X t−1, and the vector ˆu1has typical
element y t−1 − X t−1 β This is the estimated covariance matrix that a goodˆ
nonlinear regression package should print The first factor in (7.63) is just
the NLS estimate of σ2
ε The SSR is divided by n − k − 2 because there are
k + 1 parameters in the regression function, one of which is ρ, and we estimate using only n − 1 observations.
It is instructive to compute the limit in probability of the matrix (7.63) when
n → ∞ for the case in which all the explanatory variables in X t are exogenous.The parameters are all estimated consistently by NLS, and so the estimates
converge to the true parameter values β0, ρ0, and σ2
ε as n → ∞ In computing
the limit of the denominator of the simple estimator ˜ρ given by (7.47), we saw that n −1 uˆ1> uˆ1 tends to σ2
ε /(1 − ρ2
0) The limit of n −1 (X − ˆ ρX1)> uˆ1 is the
2 See Dufour, Gaudry, and Liem (1980) and Betancourt and Kelejian (1981).
Trang 8same as that of n −1 (X − ρ0X1)> uˆ1 by the consistency of ˆρ In addition, given the exogeneity of X, and thus also of X1, it follows at once from the law of
large numbers that n −1 (X − ρ0X1)> uˆ1 tends to zero Thus, in this special
case, the asymptotic covariance matrix of n 1/2( ˆβ − β0) and n 1/2(ˆρ − ρ0) is
exo-iance matrix will just be (7.63) without its last row and column It is easy to
see that n times this matrix tends to the top left block of (7.65) as n → ∞.
The lower right-hand element of the matrix (7.65) tells us that, when all the
regressors are exogenous, the asymptotic variance of n 1/2(ˆρ − ρ0) is 1 − ρ2
0
A sensible estimate of the variance is therefore dVar(ˆρ) = n −1 (1 − ˆ ρ2) It mayseem surprising that the variance of ˆρ does not depend on σ2
ε However, we sawearlier that, with exogenous regressors, the consistent estimator ˜ρ of (7.47) is
scale invariant The same is true, asymptotically, of the NLS estimator ˆρ, and
so its asymptotic variance is independent of σ2
ε.Comparison of GLS and NLS
The most obvious difference between estimation by GLS and estimation byNLS is the treatment of the first observation: GLS takes it into account, andNLS does not This difference reflects the fact that the two procedures areestimating slightly different models With NLS, all that is required is the
stationarity condition that |ρ| < 1 With GLS, on the other hand, the error
process must actually be stationary Recall that the stationarity condition isnecessary but not sufficient for stationarity of the process A sufficient con-
dition requires, in addition, that Var(u1) = σ2
The second major difference between estimation by GLS and estimation by
NLS is that the former method estimates β conditional on ρ, while the latter
Trang 97.8 Estimating Models with Autoregressive Errors 287
method estimates β and ρ jointly Except in the unlikely case in which the value of ρ is known, the first step in GLS is to estimate ρ consistently If the explanatory variables in the matrix X are all exogenous, there are several procedures that will deliver a consistent estimate of ρ The weak point is
that the estimate is not unique, and in general it is not optimal One possiblesolution to this difficulty is to iterate the feasible GLS procedure, as suggested
at the end of Section 7.4, and we will consider this solution below
A more fundamental weakness of GLS arises whenever one or more of theexplanatory variables are lagged dependent variables, or, more generally, pre-determined but not exogenous variables Even with a consistent estimator
of ρ, one of the conditions for the applicability of feasible GLS, condition (7.23), does not hold when any elements of X t are not exogenous It is notsimple to see directly just why this is so, but, in the next paragraph, we willobtain indirect evidence by showing that feasible GLS gives an invalid estima-tor of the covariance matrix Fortunately, there is not much temptation to useGLS if the non-exogenous explanatory variables are lagged variables, becauselagged variables are not observed for the first observation In all events, theconclusion is simple: We should avoid GLS if the explanatory variables arenot all exogenous
The GLS covariance matrix estimator is (7.61), which is obtained by regressing
Ψ >(ˆρ)y on Ψ >(ˆρ)X for some consistent estimate ˆ ρ Since Ψ > (ρ)u = ε by construction, s2 is an estimator of σ2
ε Moreover, the first observation has no
impact asymptotically Therefore, the limit as n → ∞ of n times (7.61) is the
In contrast, the NLS covariance matrix estimator is (7.63) With exogenous
regressors, n times (7.63) tends to the same limit as (7.65), of which the top
left block is just (7.66) But when the regressors are not all exogenous, the
argument that the off-diagonal blocks of n times (7.63) tend to zero no longer
works, and, in fact, the limits of these blocks are in general nonzero When amatrix that is not block-diagonal is inverted, the top left block of the inverse
is not the same as the inverse of the top left block of the original matrix;see Exercise 7.11 In fact, as readers are asked to show in Exercise 7.12, thetop left block of the inverse is greater by a positive semidefinite matrix thanthe inverse of the top left block Consequently, the GLS covariance matrixestimator underestimates the true covariance matrix asymptotically
NLS has only one major weak point, which is that it does not take account ofthe first observation Of course, this is really an advantage if the error processsatisfies the stationarity condition without actually being stationary, or ifsome of the explanatory variables are not exogenous But with a stationaryerror process and exogenous regressors, we wish to retain the information inthe first observation, because it appears that retaining the first observationcan sometimes lead to a noticeable efficiency gain in finite samples The
Trang 10reason is that the transformation for observation 1 is quite different from thetransformation for all the other observations In consequence, the transformedfirst observation may well be a high leverage point; see Section 2.6 This
is particularly likely to happen if one or more of the regressors is stronglytrending If so, dropping the first observation can mean throwing away a lot
of information See Davidson and MacKinnon (1993, Section 10.6) for a muchfuller discussion and references
Efficient Estimation by GLS or NLS
When the error process is stationary and all the regressors are exogenous, it
is possible to obtain an estimator with the best features of GLS and NLS bymodifying NLS so that it makes use of the information in the first observationand therefore yields an efficient estimator The first-order conditions (7.07)for GLS estimation of the model (7.56) can be written as
any value instead of just ˜β.
In Section 7.4, we mentioned the possibility of using an iterated feasible GLSprocedure We can now see precisely how such a procedure would work forthis model In the first step, we obtain the OLS parameter vector ˜β In the
Trang 117.8 Estimating Models with Autoregressive Errors 289
second step, the formula (7.69) is evaluated at β = ˜ β to obtain ˜ ρ, a consistent estimate of ρ In the third step, we use (7.60) to obtain the feasible GLS
estimate ˆβF, thus solving the first-order conditions (7.67) At this point, we
go back to the second step and insert ˆβF into (7.69) for an updated estimate
of ρ, which we subsequently use in (7.60) for the next estimate of β The
iterative procedure may then be continued until convergence, assuming that
it does converge If so, then the final estimates, which we will call ˆβ and ˆ ρ,
must satisfy the two equations
feasible GLS, without the first observation, is identical to NLS If the first
observation is retained, then iterated feasible GLS improves on NLS by takingaccount of the first observation
We can also modify NLS to take account of the first observation To do this,
we extend the GNR (6.67), which is given by (7.62) when evaluated at ˆβ
and ˆρ, by giving it a first observation For this observation, the regressand
is (1 − ρ2)1/2 (y1− X1β), the regressors corresponding to β are given by the row vector (1 − ρ2)1/2 X1, and the regressor corresponding to ρ is zero The
conditions that the extended regressand should be orthogonal to the extendedregressors are exactly the conditions (7.70)
Two asymptotically equivalent procedures can be based on this extended
GNR Both begin by obtaining the NLS estimates of β and ρ without the
first observation and evaluating the extended GNR at those preliminary NLSestimates The OLS estimates from the extended GNR can be thought of as
a vector of corrections to the initial estimates For the first procedure, thefinal estimator is a one-step estimator, defined as in (6.59) by adding the cor-rections to the preliminary estimates For the second procedure, this process
is iterated The variables of the extended GNR are evaluated at the one-stepestimates, another set of corrections is obtained, these are added to the pre-vious estimates, and iteration continues until the corrections are negligible Ifthis happens, the iterated estimates once more satisfy the conditions (7.70),and so they are equal to the iterated GLS estimates
Although the iterated feasible GLS estimator generally performs well, it does
have one weakness: There is no way to ensure that |ˆ ρ| < 1 In the unlikely but not impossible event that |ˆ ρ| ≥ 1, the estimated covariance matrix (7.61)
will not be valid, the second term in (7.67) will be negative, and the first
observation will therefore tend to have a perverse effect on the estimates of β.
Trang 12In Chapter 10, we will see that maximum likelihood estimation shares thegood properties of iterated feasible GLS while also ensuring that the estimate
of ρ satisfies the stationarity condition.
The iterated feasible GLS procedure considered above has much in commonwith a very old, but still widely-used, algorithm for estimating models withstationary AR(1) errors This algorithm, which is called iterated Cochrane-Orcutt, was originally proposed in a classic paper by Cochrane and Orcutt(1949) It works in exactly the same way as iterated feasible GLS, except that
it omits the first observation The properties of this algorithm are explored
in Exercises 7.18-19
7.9 Specification Testing and Serial Correlation
Models estimated using time-series data frequently appear to have error termswhich are serially correlated However, as we will see, many types of misspec-
ification can create the appearance of serial correlation Therefore, finding
evidence of serial correlation does not mean that it is necessarily appropriate
to model the error terms as following some sort of autoregressive or movingaverage process If the regression function of the original model is misspecified
in any way, then a model like (7.41), which has been modified to incorporateAR(1) errors, will probably also be misspecified It is therefore extremelyimportant to test the specification of any regression model that has been
“corrected” for serial correlation
The Appearance of Serial Correlation
There are several types of misspecification of the regression function that canincorrectly create the appearance of serial correlation For instance, it may bethat the true regression function is nonlinear in one or more of the regressorswhile the estimated one is linear In that case, depending on how the dataare ordered, the residuals from a linear regression model may well appear to
be serially correlated All that is needed is for the independent variables onwhich the dependent variable depends nonlinearly to be correlated with time
As a concrete example, consider Figure 7.1, which shows 200 hypothetical
observations on a regressor x and a regressand y, together with an OLS
re-gression line and the fitted values from the true, nonlinear model For thelinear model, the residuals are always negative for the smallest and largest
values of x, and they tend to be positive for the intermediate values As a
consequence, they appear to be serially correlated: If the observations are
ordered according to the value of x, the estimate ˜ ρ obtained by regressing the OLS residuals on themselves lagged once is 0.298, and the t statistic for ρ = 0
is 4.462 Thus, if the data are ordered in this way, there appears to be strong
evidence of serial correlation But this evidence is misleading Either plotting
the residuals against x or including x2 as an additional regressor will quicklyreveal the true nature of the misspecification
Trang 137.9 Specification Testing and Serial Correlation 291
.
.
.
.
.
.
. .
.
. .
.
.
.
. .
.
.
.
.
.
.
.
.
. .
. .
.
.
.
.
Regression line for linear model
Fitted values for true model
x y
Figure 7.1 The appearance of serial correlation
The true regression function in this example contains a term in x2 Since the linear model omits this term, it is underspecified, in the sense discussed
in Section 3.7 Any sort of underspecification has the potential to create the appearance of serial correlation if the incorrectly omitted variables are themselves serially correlated Therefore, whenever we find evidence of serial correlation, our first reaction should be to think carefully about the specifica-tion of the regression funcspecifica-tion Perhaps one or more addispecifica-tional independent variables should be included among the regressors Perhaps powers, cross-products, or lags of some of the existing independent variables need to be included Or perhaps the regression function should be made dynamic by including one or more lags of the dependent variable
Common Factor Restrictions
It is very common for linear regression models to suffer from dynamic mis-specification The simplest example is failing to include a lagged dependent variable among the regressors More generally, dynamic misspecification oc-curs whenever the regression function incorrectly omits lags of the dependent variable or of one or more independent variables A somewhat mechanical, but often very effective, way to detect dynamic misspecification in models with autoregressive errors is to test the common factor restrictions that are implicit in such models The idea of testing these restrictions was initially pro-posed by Sargan (1964) and further developed by Hendry and Mizon (1978), Mizon and Hendry (1980), Sargan (1980), and others See Hendry (1995) for
a detailed treatment of dynamic specification in linear regression models
Trang 14The easiest way to understand what common factor restrictions are and howthey got their name is to consider a linear regression model with errors thatapparently follow an AR(1) process In this case, there are really three nestedmodels The first of these is the original linear regression model with errorterms that are assumed to be serially independent:
H0: y t = X t β + u t , u t ∼ IID(0, σ2) (7.71)
The second is the nonlinear model (7.41) that is obtained when the errorterms in (7.71) follow the AR(1) process (7.29) Although we have alreadydiscussed this model extensively, we rewrite it here for convenience:
H1: y t = ρy t−1 + X t β − ρX t−1 β + ε t , ε t ∼ IID(0, σ ε2) (7.72)
The third is the linear model that can be obtained by relaxing the nonlinearrestrictions which are implicit in (7.72) This model is
H2: y t = ρy t−1 + X t β + X t−1 γ + ε t , ε t ∼ IID(0, σ ε2), (7.73) where γ, like β, is a k vector When all three of these models are estimated over the same sample period, the original model, H0, is a special case of the
nonlinear model H1, which in turn is a special case of the unrestricted linear
model H2 Of course, in order to estimate H1 and H2, we need to drop thefirst observation
The nonlinear model H1 imposes on H2 the restrictions that γ = −ρβ The
reason for calling these restrictions “common factor” restrictions can easily beseen if we rewrite both models using lag operator notation (see Section 7.6)
When we do this, H1 becomes
(1 − ρL)y t = (1 − ρL)X t β + ε t , (7.74) and H2 becomes
(1 − ρL)y t = X t β + LX t γ + ε t (7.75)
It is evident that in (7.74), but not in (7.75), the common factor 1 − ρL
appears on both sides of the equation This is where the term “commonfactor restrictions” comes from
How Many Common Factor Restrictions Are There?
There is one feature of common factor restrictions that can be tricky: It isoften not obvious just how many restrictions there are For the case of testing
H1 against H2, there appear to be k restrictions The null hypothesis, H1,
has k + 1 parameters (the k vector β and the scalar ρ), and the alternative hypothesis, H2, seems to have 2k + 1 parameters (the k vectors β and γ, and the scalar ρ) Therefore, the number of restrictions appears to be the difference between 2k + 1 and k + 1, which is k In fact, however, the number
Trang 157.9 Specification Testing and Serial Correlation 293
of restrictions will almost always be less than k, because, except in rare cases, the number of identifiable parameters in H2 will be less than 2k + 1 We now
show why this is the case
Let us consider a simple example Suppose the regression function for the
original model H0 is
β1+ β2z t + β3t + β4z t−1 + β5y t−1 , (7.76) where z t is the tthobservation on some independent variable, and t is the tth
observation on a linear time trend The regression function for the unrestricted
model H2 that corresponds to (7.76) is
β1+ β2z t + β3t + β4z t−1 + β5y t−1 + ρy t−1 + γ1+ γ2z t−1 + γ3(t − 1) + γ4z t−2 + γ5y t−2 (7.77)
At first glance, this regression function appears to have 11 parameters ever, it really has only 7, because 4 of them are unidentifiable We cannot
How-estimate both β1 and γ1, because there cannot be two constant terms
Like-wise, we cannot estimate both β4 and γ2, because there cannot be two
coef-ficients of z t−1 , and we cannot estimate both β5 and ρ, because there cannot
be two coefficients of y t−1 We also cannot estimate γ3 along with β3 and
the constant, because t, t − 1, and the constant term are perfectly collinear, since t − (t − 1) = 1 The version of H2 that can actually be estimated hasregression function
δ1+ β2z t + δ2t + δ3z t−1 + δ4y t−1 + γ4z t−2 + γ5y t−2 , (7.78)
where
δ1= β1+ γ1− γ3, δ2 = β3+ γ3, δ3= β4+ γ2, and δ4= ρ + β5.
We see that (7.78) has only 7 identifiable parameters: β2, γ4, γ5, δ1, δ2,
δ3, and δ4, instead of the 11 parameters, many of them not identifiable, ofexpression (7.77) In contrast, the regression function for the restricted model,
H1, has 6 parameters: β1 through β5, and ρ Therefore, in this example, H1imposes just one restriction on H2
The phenomenon illustrated in this example arises, to a greater or lesserextent, for almost every model with common factor restrictions Constantterms, many types of dummy variables (notably, seasonal dummies and timetrends), lagged dependent variables, and independent variables that appear
with more than one time subscript always lead to an unrestricted model H2
with some parameters that cannot be identified The number of identifiable
parameters will almost always be less than 2k + 1, and, in consequence, the number of restrictions will almost always be less than k.
Trang 16Testing Common Factor Restrictions
Any of the techniques discussed in Sections 6.7 and 6.8 can be used to testcommon factor restrictions In practice, if the error terms are believed to be
homoskedastic, the easiest approach is probably to use an asymptotic F test.
For the example of equations (7.72) and (7.73), the restricted sum of squared
residuals, RSSR, is obtained from NLS estimation of H1, and the unrestricted
one, USSR, is obtained from OLS estimation of H2 Then the test statistic is
(RSSR − USSR)/r USSR/(n − k − r − 2)
a
∼ F (r, n − k − r − 2), (7.79)
where r is the number of restrictions The number of degrees of freedom in the denominator reflects the fact that the unrestricted model has k + r + 1 parameters and is estimated using the n − 1 observations for t = 2, , n.
Of course, since both the null and alternative models involve lagged dependent
variables, the test statistic (7.79) does not actually follow the F (r, n−k−r−2)
distribution in finite samples Therefore, when the sample size is not large,
it is a good idea to bootstrap the test As Davidson and MacKinnon (1999a)
have shown, highly reliable P values may be obtained in this way, even for
very small sample sizes The bootstrap samples are generated recursively from
the restricted model, H1, using the NLS estimates of that model As withbootstrap tests for serial correlation, the bootstrap error terms may either bedrawn from the normal distribution or obtained by resampling the rescaledNLS residuals; see the discussion in Sections 4.6 and 7.7
Although this bootstrap procedure is conceptually simple, it may be quiteexpensive to compute, because the nonlinear model (7.72) must be estimatedfor every bootstrap sample It may therefore be more attractive to follow theidea in Exercises 6.17 and 6.18 by bootstrapping a GNR-based test statistic
that requires no nonlinear estimation at all For the H1 model (7.72), thecorresponding GNR is (7.62), but now we wish to evaluate it, not at the NLSestimates from (7.72), but at the estimates ´β and ´ ρ obtained by estimating the linear H2 model (7.73) These estimates are root-n consistent under H2,
and so also under H1, which is contained in H2 as a special case Thus the
GNR for H1, which was introduced in Section 6.6, is
y t − ´ ρy t−1 − X t β + ´´ ρX t−1 β´
= (X t − ´ ρX t−1 )b + b ρ (y t−1 − X t−1 β) + residual.´ (7.80) Since H2 is a linear model, the regressors of the GNR that corresponds to itare just the regressors in (7.73), and the regressand is the same as in (7.80);
recall Section 6.5 However, in order to construct the GNR-based F statistic,
which has exactly the same form as (7.79), it is not necessary to run the
GNR for model H2 at all Since the regressand of (7.80) is just the dependentvariable of (7.73) plus a linear combination of the independent variables, the
Trang 177.9 Specification Testing and Serial Correlation 295residuals from (7.73) are the same as those from its GNR Consequently, wecan evaluate (7.79) with USSR from (7.73) and RSSR from (7.80).
In Section 6.6, we gave the impression that ´β and ´ ρ are simply the OLS timates of β and ρ from (7.73) When X contains neither lagged dependent
es-variables nor multiple lags of any independent variable, this is true ever, when these conditions are not satisfied, the parameters of (7.73) do notcorrespond directly to those of (7.72), and this makes it a little more compli-cated to obtain consistent estimates of these parameters Just how to do sowas discussed in Section 10.3 of Davidson and MacKinnon (1993) and will beillustrated in Exercise 7.16
How-Tests of Nested Hypotheses
The models H0, H1, and H2defined in (7.71) through (7.73) form a sequence
of nested hypotheses Such sequences occur quite frequently in many branches
of econometrics, and they have an interesting property Asymptotically, the F statistic for testing H0 against H1 is independent of the F statistic for testing
H1 against H2 This is true whether we actually estimate H1 or merely use
a GNR, and it is also true for other test statistics that are asymptotically
equivalent to F statistics In fact, the result is true for any sequence of nested hypotheses where the test statistics follow χ2distributions asymptotically; seeDavidson and MacKinnon (1993, Supplement) and Exercise 7.21
The independence property of tests in a nested sequence has a useful
impli-cation Suppose that τ ij denotes the statistic for testing H i , which has k i parameters, against H j , which has k j > k i parameters, where i = 0, 1 and
j = 1, 2, with j > i Then, if each of the test statistics is asymptotically distributed as χ2(k j − k i),
This result implies that, at least asymptotically, each of the component test
statistics is bounded above by the test statistic for H0 against H2
The result (7.81) is not particularly useful in the case of (7.71), (7.72), and(7.73), where all of the test statistics are quite easy to compute However, itcan sometimes come in handy Suppose, for example, that it is easy to test
H0 against H2 but hard to test H0 against H1 Then, if τ02 is small enough
that it would not cause us to reject H0 against H1 when compared with the
appropriate critical value for the χ2(k1− k0) distribution, we do not need to
bother calculating τ01, because it will be even smaller
Trang 187.10 Models for Panel Data
Many data sets are measured across two dimensions One dimension is time,and the other is usually called the cross-section dimension For example, wemay have 40 annual observations on 25 countries, or 100 quarterly observations
on 50 states, or 6 annual observations on 3100 individuals Data of this typeare often referred to as panel data It is likely that the error terms for a modelusing panel data will display certain types of dependence, which should betaken into account when we estimate such a model
For simplicity, we restrict our attention to the linear regression model
y it = X it β + u it , i = 1, , m, t = 1, , T, (7.82)
where X it is a 1 × k vector of observations on explanatory variables There are assumed to be m cross-sectional units and T time periods, for a total
of n = mT observations If each u it has expectation zero conditional on its
corresponding X it, we can estimate equation (7.82) by ordinary least squares
But the OLS estimator is not efficient if the u it are not IID, and the IIDassumption is rarely realistic with panel data
If certain shocks affect the same cross-sectional unit at all points in time,
the error terms u it and u is will be correlated for all t 6= s Similarly, if
certain shocks affect all cross-sectional units at the same point in time, the
error terms u it and u jt will be correlated for all i 6= j In consequence, if
we use OLS, not only will we obtain inefficient parameter estimates, but wewill also obtain an inconsistent estimate of their covariance matrix; recall
the discussion of Section 5.5 If the expectation of u it conditional on X it is
not zero, then, for reasons mentioned in Section 7.4, OLS will actually yield inconsistent parameter estimates This will happen, for example, when X it contains lagged dependent variables and the u it are serially correlated.Error-Components Models
The two most popular approaches for dealing with panel data are both based
on what are called error-components models The idea is to specify the error
term u it in (7.82) as consisting of two or three separate shocks, each of which
is assumed to be independent of the others A fairly general specification is
Here e t affects all observations for time period t, v i affects all observations
for cross-sectional unit i, and ε it affects only observation it It is ally assumed that the e t are independent across t, the v i are independent
gener-across i, and the ε it are independent across all i and t Classic papers on
error-components models include Balestra and Nerlove (1966), Fuller and Battese(1974), and Mundlak (1978)
Trang 197.10 Models for Panel Data 297
In order to estimate an error-components model, the e t and v ican be regarded
as being either fixed or random, in a sense that we will explain If the e t and v i are thought of as fixed effects, then they are treated as parameters
to be estimated It turns out that they can then be estimated by OLS usingdummy variables If they are thought of as random effects, then we must
figure out the covariance matrix of the u it as functions of the variances of
the e t , v i , and ε it, and use feasible GLS Each of these approaches can beappropriate in some circumstances but may be inappropriate in others
In what follows, we simplify the error-components specification (7.83) by
elim-inating the e t Thus we assume that there are shocks specific to each sectional unit, or group, but no time-specific shocks This assumption is oftenmade in empirical work, and it considerably simplifies the algebra In addi-
cross-tion, we assume that the X itare exogenous The presence of lagged dependentvariables in panel data models raises a number of issues that we do not wish
to discuss here; see Arellano and Bond (1991) and Arellano and Bover (1995).Fixed-Effects Estimation
The model that underlies fixed-effects estimation, based on equation (7.82)and the simplified version of equation (7.83), can be written as follows:
y = Xβ + Dη + ε, E(εε > ) = σ2
where y and ε are n vectors with typical elements y it and ε it, respectively,
and D is an n × m matrix of dummy variables, constructed in such a way that the element in the row corresponding to observation it, for i = 1, , m and t = 1, , T, and column j, for j = 1, , m, is equal to 1 if i = j
and equal to 0 otherwise.3 The m vector η has typical element v i, and so
it follows that the n vector Dη has element v i in the row corresponding to
observation it Note that there is exactly one element of D equal to 1 in each row, which implies that the n vector ι with each element equal to 1 is a linear combination of the columns of D Consequently, in order to avoid collinear regressors, the matrix X should not contain a constant.
The vector η plays the role of a parameter vector, and it is in this sense that the v i are called fixed effects They could in fact be random; the essential thing
is that they must be independent of the error terms ε it They may, however,
be correlated with the explanatory variables in the matrix X Whether or not this is the case, the model (7.84), interpreted conditionally on η, implies
that the moment conditions
Trang 20are satisfied The fixed-effects estimator, which is the OLS estimator of β
in equation (7.84), is based on these moment conditions Because of the way
it is computed, this estimator is sometimes called the least squares dummyvariables, or LSDV, estimator
Let M D denote the projection matrix I − D(D > D) −1 D > Then, by the FWL
Theorem, we know that the OLS estimator of β in (7.84) can be obtained
by regressing M D y, the residuals from a regression of y on D, on M D X, the matrix of residuals from regressing each of the columns of X on D The
fixed-effects estimator is therefore
ˆ
βFE = (X > M D X) −1 X > M D y (7.85) For any n vector x, let ¯ x i denote the group mean T −1PT
t=1 x it Then it
is easy to check that element it of the vector M D x is equal to x it − ¯ x i,the deviation from the group mean Since all the variables in (7.85) are
premultiplied by M D, it follows that this estimator makes use only of the
information in the variation around the mean for each of the m groups For this reason, it is often called the within-groups estimator Because X and D
are exogenous, this estimator is unbiased Moreover, since the conditions ofthe Gauss-Markov theorem are satisfied, we can conclude that the fixed-effectsestimator is BLUE
The fixed-effects estimator (7.85) has advantages and disadvantages It is
easy to compute, even when m is very large, because it is never necessary to make direct use of the n × n matrix M D All that is needed is to compute
the m group means for each variable In addition, the estimates ˆ η of the fixed
effects may well be of interest in their own right However, the estimatorcannot be used with an explanatory variable that takes on the same value forall the observations in each group, because such a column would be collinear
with the columns of D More generally, if the explanatory variables in the matrix X are well explained by the dummy variables in D, the parameter vector β will not be estimated at all precisely It is of course possible to
estimate a constant, simply by taking the mean of the estimates ˆη.
Random-Effects Estimation
It is possible to improve on the efficiency of the fixed-effects estimator if one
is willing to impose restrictions on the model (7.84) For that model, all we
require is that the matrix X of explanatory variables and the cross-sectional errors v i should both be independent of the ε it, but this does not rule outthe possibility of a correlation between them The restrictions imposed for
random-effects estimation require that the v i should be independent of X.
This independence assumption is by no means always plausible For example,
in a panel of observations on individual workers, an observed variable likethe hourly wage rate may well be correlated with an unobserved variable
Trang 217.10 Models for Panel Data 299
like ability, which implicitly enters into the individual-specific error term v i.However, if the assumption is satisfied, it follows that
E(u it | X) = E(v i + ε it | X) = 0, (7.86)
since v i and ε it are then both independent of X Condition (7.86) is precisely
the condition which ensures that OLS estimation of the model (7.82), ratherthan the model (7.84), will yield unbiased estimates
However, OLS estimation of equation (7.82) is not in general efficient, because
the u it are not IID We can calculate the covariance matrix of the u it if we
assume that the v i are IID random variables with mean zero and variance σ2
v.This assumption accounts for the term “random” effects From (7.83), setting
e t = 0 and using the assumption that the shocks are independent, it is easy
to see that
Var(u it ) = σ2
v + σ2
ε , Cov(u it u is ) = σ2v , and Cov(u it u js ) = 0 for all i 6= j.
These define the elements of the n × n covariance matrix Ω, which we need
for GLS estimation If the data are ordered by the cross-sectional units in
m blocks of T observations each, this matrix has the form
v everywhere else Here ι is a T vector of 1s.
To obtain GLS estimates of β, we would need to know the values of σ2
ε and σ2
v,
or, at least, the value of their ratio, since, as we saw in Section 7.3, GLS
estimation requires only that Ω should be specified up to a factor To obtain
feasible GLS estimates, we need a consistent estimate of that ratio However,the reader may have noticed that we have made no use in this section so far
of asymptotic concepts, such as that of a consistent estimate This is because,
in order to obtain definite results, we must specify what happens to both m and T when n = mT tends to infinity.
Consider the fixed-effects model (7.84) If m remains fixed as T → ∞, then the number of regressors also remains fixed as n → ∞, and standard asymptotic theory applies But if T remains fixed as m → ∞, then the number of parameters to be estimated tends to infinity, and the m vector ˆ η of estimates
Trang 22of the fixed effects is not consistent, because each estimated effect depends
only on T observations It is nevertheless possible to show that, even in this
case, ˆβ remains consistent; see Exercise 7.23.
It is always possible to find a consistent estimate of σ2
ε by estimating the
model (7.84), because, no matter how m and T may behave as n → ∞, there are n residuals Thus, if we divide the SSR from (7.84) by n − m − k, we will obtain an unbiased and consistent estimate of σ2
ε, since the error terms for this
model are just the ε it But the natural estimator of σ2
v, namely, the sample
variance of the m elements of ˆ η, is not consistent unless m → ∞ In practice,
therefore, it is probably undesirable to use the random-effects estimator when
ˆ
βBG = (X > P D X) −1 X > P D y (7.89) Although regression (7.88) appears to have n = mT observations, it really has only m, because the regressand and all the regressors are the same for every
observation in each group The estimator bears the name “between-groups”
because it uses only the variation among the group means If m < k, note that the estimator (7.89) does not even exist, since the matrix X > P D X can have rank at most m.
If the restrictions of the random-effects model are not satisfied, the estimatorˆ
βBG, if it exists, is in general biased and inconsistent To see this, observethat unbiasedness and consistency require that the moment conditions
E¡(P D X) >
it (y it − X it β)¢= 0 (7.90) should hold, where (P D X) it is the row labelled it of the n × k matrix P D X Since y it − X it β = v i + ε it , and since ε it is independent of everything else
in condition (7.90), this condition is equivalent to the absence of correlation
between the v i and the elements of the matrix X.
As readers are asked to show in Exercise 7.24, the variance of the error terms
in regression (7.88) is σ2
v + σ2
ε /T Therefore, if we run it as a regression with m observations, divide the SSR by m − k, and then subtract 1/T times our estimate of σ2
ε, we will obtain a consistent, but not necessarily positive,
Trang 237.10 Models for Panel Data 301the between-groups estimator (7.89) For the former to be consistent, we needonly the assumptions of the fixed-effects model, but for the latter we need inaddition the restrictions of the random-effects model Thus both the OLSestimator of (7.82) and the feasible GLS estimator are consistent only if thebetween-groups estimator is consistent.
For the OLS estimator of (7.82),
ˆ
β = (X > X) −1 X > y
= (X > X) −1 (X > M D y + X > P D y)
= (X > X) −1 X > M D X ˆ βFE+ (X > X) −1 X > P D X ˆ βBG,
which shows that the estimator is indeed a matrix-weighted average of ˆβFE
and ˆβBG As readers are asked to show in Exercise 7.25, the GLS estimator
of the random-effects model can be obtained by running the OLS regression
(I − λP D )y = (I − λP D )Xβ + residuals, (7.91) where the scalar λ is defined by
matrix-identical to the OLS estimator when λ = 0, which happens when σ2
v = 0,
and equal to the within-groups, or fixed-effects, estimator when λ = 1, which happens when σ2
ε = 0 Except in these two special cases, the GLS estimator
is more efficient, in the context of the random-effects model, than either theOLS estimator or the fixed-effects estimator But equation (7.91) also impliesthat the random-effects estimator is inconsistent whenever the between-groupsestimator is inconsistent
Unbalanced Panels
Up to this point, we have assumed that we are dealing with a balanced panel,
that is, a data set for which there are precisely T observations for each
cross-sectional unit However, it is quite common to encounter unbalanced panels,for which the number of observations is not the same for every cross-sectionalunit The fixed-effects estimator can be used with unbalanced panels withoutany real change It is still based on regression (7.84), and the only change is
that the matrix of dummy variables D will no longer have the same number
of 1s in each column The random-effects estimator can also be used withunbalanced panels, but it needs to be modified slightly
Trang 24Let us assume that the data are grouped by cross-sectional units Let T i denote the number of observations associated with unit i, and partition y and
X as follows:
y = [y1 y2 ··· y m ], X = [X1 X2 ··· X m ], where y i and X i denote the T i rows of y and X that correspond to the ith
unit By analogy with (7.92), make the definition
Let ¯y i denote a T i vector, each element of which is the mean of the elements
of y i Similarly, let ¯X i denote a T i × k matrix, each element of which is the mean of the corresponding column of X i Then the random-effects estimatorcan be computed by running the linear regression
reduces to regression (7.91) in that special case
Group Effects and Individual Data
Error-components models are also relevant for regressions on cross-sectiondata with no time dimension, but where the observations naturally belong togroups For example, each observation might correspond to a household living
in a certain state, and each group would then consist of all the householdsliving in a particular state In such cases, it is plausible that the error terms forindividuals within the same group are correlated An error-components model
that combines a group-specific error v i , with variance σ2
v, and an
individual-specific error ε it , with variance σ2
ε, is a natural way to model this sort ofcorrelation Such a model implies that the correlation between the error terms
for observations in the same group is ρ ≡ σ2
Trang 257.11 Final Remarks 303panel, because this model takes account of between-group variation Thiscan be seen from equation (7.93): Collinearity of the transformed group-levelvariables on the right-hand side occurs only if the explanatory variables are
collinear to begin with The estimates of σ2
ε and σ2
v needed to compute the
λ i may be obtained in various ways, some of which were discussed in thesubsection on random-effects estimation As we remarked there, these work
well only if the number of groups m is not too small.
If it is thought that the within-group correlation ρ is small, it may be tempting
to ignore it and use OLS estimation, with the usual OLS covariance matrix
This can be a serious mistake unless ρ is actually zero, since the OLS dard errors can be drastic underestimates even with small values of ρ, as
stan-Kloek (1981) and Moulton (1986, 1990) have pointed out The problem isparticularly severe when the number of observations per group is large, asreaders are asked to show in Exercise 7.26 The correlation of the error termswithin groups means that the effective sample size is much smaller than theactual sample size when there are many observations per group
In this section, we have presented just a few of the most basic ideas concerning estimation with panel data. Of course, GLS is not the only method that can be used to estimate models for data of this type. The generalized method of moments (Chapter 9) and the method of maximum likelihood (Chapter 10) are also commonly used. For more detailed treatments of various models for panel data, see, among others, Chamberlain (1984), Hsiao (1986, 2001), Baltagi (1995), Greene (2000, Chapter 14), Ruud (2000, Chapter 24), Arellano and Honoré (2001), and Wooldridge (2001).
7.11 Final Remarks
Several important concepts were introduced in the first four sections of this chapter, which dealt with the basic theory of generalized least squares estimation. The concept of an efficient MM estimator, which we introduced in Section 7.2, will be encountered again in the context of generalized instrumental variables estimation (Chapter 8) and generalized method of moments estimation (Chapter 9). The key idea of feasible GLS estimation, namely, that an unknown covariance matrix may in some circumstances be replaced by a consistent estimate of that matrix without changing the asymptotic properties of the resulting estimator, will also be encountered again in Chapter 9.

The remainder of the chapter dealt with the treatment of heteroskedasticity and serial correlation in linear regression models, and with error-components models for panel data. Although this material is of considerable practical importance, most of the techniques we discussed, although sometimes complicated in detail, are conceptually straightforward applications of feasible GLS estimation, NLS estimation, and methods for testing hypotheses that were introduced in Chapters 4 and 6.
7.12 Exercises
7.1 Using the fact that E(uu^⊤ | X) = Ω for regression (7.01), show directly, without appeal to standard OLS results, that the covariance matrix of the GLS estimator β̂_GLS is given by (7.05).
7.2 Show that the matrix (7.11), reproduced here for easy reference,

X^⊤Ω^{-1}X − X^⊤W(W^⊤ΩW)^{-1}W^⊤X,

is positive semidefinite. As in Section 6.2, this may be done by showing that this matrix can be expressed in the form Z^⊤MZ, for some n × k matrix Z and some n × n orthogonal projection matrix M. It is helpful to express Ω^{-1} as ΨΨ^⊤, as in (7.02).
7.3 Using the data in the file earnings.data, run the regression

y_t = β_1 d_{1t} + β_2 d_{2t} + β_3 d_{3t} + u_t,

which was previously estimated in Exercise 5.3. Recall that the d_{it} are dummy variables. Then test the null hypothesis that E(u_t^2) = σ^2 against the alternative that E(u_t^2) = γ_1 d_{1t} + γ_2 d_{2t} + γ_3 d_{3t}. Report P values for F and nR^2 tests.
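For readers who want to see what the computations in this exercise involve, the following sketch computes both test statistics and their P values; the arrays y, d1, d2, d3 are hypothetical placeholders for the variables in earnings.data, whose layout is not reproduced here, and this is an illustration rather than the official solution.

```python
# Hypothetical sketch: test E(u_t^2) = sigma^2 against E(u_t^2) = gamma'd_t.
# Regress the squared OLS residuals on the three dummies; since the dummies span
# a constant, the null imposes 2 restrictions, so nR^2 is asymptotically
# chi-squared(2) and the F statistic has (2, n - 3) degrees of freedom.
import numpy as np
from scipy import stats

def hetero_tests(y, d1, d2, d3):
    D = np.column_stack([d1, d2, d3])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    u2 = (y - D @ beta) ** 2                      # squared residuals
    n = len(y)

    g, *_ = np.linalg.lstsq(D, u2, rcond=None)    # test regression
    ssr_unr = np.sum((u2 - D @ g) ** 2)
    ssr_res = np.sum((u2 - u2.mean()) ** 2)       # restricted: constant only
    r2 = 1.0 - ssr_unr / ssr_res
    nR2 = n * r2
    F = ((ssr_res - ssr_unr) / 2) / (ssr_unr / (n - 3))

    return (F, stats.f.sf(F, 2, n - 3)), (nR2, stats.chi2.sf(nR2, df=2))
```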
7.4 If u_t follows the stationary AR(1) process

u_t = ρu_{t-1} + ε_t,   ε_t ∼ IID(0, σ_ε^2),   |ρ| < 1,

show that Cov(u_t, u_{t-j}) = Cov(u_t, u_{t+j}) = ρ^j σ_ε^2/(1 − ρ^2). Then use this result to show that the correlation between u_t and u_{t-j} is just ρ^j.
7.5 Consider the nonlinear regression model y_t = x_t(β) + u_t. Derive the GNR for testing the null hypothesis that the u_t are serially uncorrelated against the alternative that they follow an AR(1) process.
7.6 Show how to test the null hypothesis that the error terms of the linear regression model y = Xβ + u are serially uncorrelated against the alternative that they follow an AR(4) process by means of a GNR. Derive the test GNR from first principles.
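As an illustration of what such a GNR test amounts to in practice (a sketch, not the derivation the exercise asks for), one can regress the OLS residuals on X and on four lags of the residuals, with the unavailable pre-sample lags set to zero, and test the four lag coefficients jointly. The arrays y and X below are placeholders.

```python
# Hypothetical sketch of the GNR test for AR(4) errors in y = X beta + u:
# regress the residuals u_hat on X and on u_hat lagged 1 to 4 times (pre-sample
# lags set to zero), then test the 4 lag coefficients with an F statistic.
import numpy as np
from scipy import stats

def gnr_ar4_test(y, X):
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    lags = np.column_stack(
        [np.concatenate([np.zeros(j), u[:-j]]) for j in range(1, 5)]
    )
    Z = np.column_stack([X, lags])
    g, *_ = np.linalg.lstsq(Z, u, rcond=None)
    ssr_unr = np.sum((u - Z @ g) ** 2)
    ssr_res = np.sum(u ** 2)          # regressing u_hat on X alone explains nothing
    F = ((ssr_res - ssr_unr) / 4) / (ssr_unr / (n - k - 4))
    return F, stats.f.sf(F, 4, n - k - 4)
```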
7.7 Consider the following three models, where u_t is assumed to be IID(0, σ^2):

H_0: y_t = β + u_t
H_1: y_t = β + ρ(y_{t-1} − β) + u_t
H_2: y_t = β + u_t + αu_{t-1}

Explain how to test H_0 against H_1 by using a GNR. Then show that exactly the same test statistic is also appropriate for testing H_0 against H_2.
7.8 Write the trace in (7.50) explicitly in terms of P_X rather than M_X, and show that the terms containing one or more factors of P_X all vanish asymptotically.
7.9 By direct matrix multiplication, show that, if Ψ is given by (7.59), then ΨΨ^⊤ is equal to the matrix

\[
\begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 & 0 \\
-\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\
0 & -\rho & 1+\rho^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1+\rho^2 & -\rho \\
0 & 0 & 0 & \cdots & -\rho & 1
\end{pmatrix}.
\]

Show further, by direct calculation, that this matrix is proportional to the inverse of the matrix Ω given in (7.32).
7.10 Show that equation (7.30), relating u to ε, can be modified to take account of the definition (7.58) of ε_1, with the result that

u_t = ε_t + ρε_{t-1} + ρ^2 ε_{t-2} + · · · + ρ^{t-1}(1 − ρ^2)^{-1/2} ε_1.   (7.94)

The relation Ψ^⊤u = ε implies that u = (Ψ^⊤)^{-1}ε. Use the result (7.94) to show that Ψ^{-1} can be written as

\[
\begin{pmatrix}
\theta & \rho\theta & \rho^2\theta & \cdots & \rho^{n-1}\theta \\
0 & 1 & \rho & \cdots & \rho^{n-2} \\
0 & 0 & 1 & \cdots & \rho^{n-3} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix},
\]

where θ ≡ (1 − ρ^2)^{-1/2}. Verify by direct calculation that this matrix is the inverse of the Ψ given by (7.59).
7.11 Consider a square, symmetric, nonsingular matrix partitioned as follows:

\[
H \equiv \begin{pmatrix} A & B \\ B^{\top} & C \end{pmatrix}. \qquad (7.95)
\]

…

7.12 Suppose that the matrix H of the previous question is positive definite. It therefore follows (see Section 3.4) that there exists a square matrix X such that H = X^⊤X. Partition X as [X_1 X_2], so that

\[
H = X^{\top}X = \begin{pmatrix} X_1^{\top}X_1 & X_1^{\top}X_2 \\ X_2^{\top}X_1 & X_2^{\top}X_2 \end{pmatrix},
\]
where the blocks of the matrix on the right-hand side are the same as the blocks in (7.95). Show that the top left block D of H^{-1} can be expressed as (X_1^⊤M_2X_1)^{-1}, where M_2 = I − X_2(X_2^⊤X_2)^{-1}X_2^⊤. Use this result to show that D − A^{-1} = (X_1^⊤M_2X_1)^{-1} − (X_1^⊤X_1)^{-1} is a positive semidefinite matrix.
7.13 Consider testing for first-order serial correlation of the error terms in the regression model

y = βy_1 + u,   u ∼ IID(0, σ_u^2 I),   (7.96)

where y_1 is the vector with typical element y_{t-1}, by use of the statistics t_GNR and t_SR defined in (7.51) and (7.52), respectively. Show first that the vector denoted as M_X ũ_1 in (7.51) and (7.52) is equal to −β̃M_X y_2, where y_2 is the vector with typical element y_{t-2}, and β̃ is the OLS estimate of β from (7.96). Then show that, as n → ∞, t_GNR tends to the random variable τ ≡ σ_u^{-2} plim n^{-1/2}(βy_1 − y_2)^⊤u, whereas t_SR tends to the same random variable times β. Show finally that t_GNR, but not t_SR, provides an asymptotically correct test, by showing that the random variable τ is asymptotically distributed as N(0, 1).
7.14 The file money.data contains seasonally adjusted quarterly data for the logarithm of the real money supply, m_t, real GDP, y_t, and the 3-month Treasury Bill rate, r_t, for Canada for the period 1967:1 to 1998:4. A conventional demand for money function is

m_t = β_1 + β_2 r_t + β_3 y_t + β_4 m_{t-1} + u_t.   (7.97)

Estimate this model over the period 1968:1 to 1998:4, and then test it for AR(1) errors using two different GNRs that differ in their treatment of the first observation.
7.15 Use nonlinear least squares to estimate, over the period 1968:1 to 1998:4,
the model that results if u_t in (7.97) follows an AR(1) process. Then test the common factor restrictions that are implicit in this model. Calculate an
asymptotic P value for the test.
7.16 Test the common factor restrictions of Exercise 7.15 again using a GNR.
Calculate both an asymptotic P value and a bootstrap P value based on at least B = 99 bootstrap samples. Hint: To obtain a consistent estimate of ρ for the GNR, use the fact that the coefficient of r_{t-1} in the unrestricted model (7.73) is equal to −ρ times the coefficient of r_t.
7.17 Use nonlinear least squares to estimate, over the period 1968:1 to 1998:4,
the model that results if u_t in (7.97) follows an AR(2) process. Is there any evidence that an AR(2) process is needed here?
7.18 … next step in this procedure? Complete the description of iterated Cochrane-Orcutt as iterated feasible GLS, showing how each step of the procedure can be carried out using an OLS regression.
Show that, when the algorithm converges, conditions (7.68) for NLS estimation are satisfied. Also show that, unlike iterated feasible GLS including observation 1, this algorithm must eventually converge, although perhaps only
to a local, rather than the global, minimum of SSR(β, ρ).
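A minimal sketch of the iterated Cochrane-Orcutt algorithm, in which every step is an OLS regression and observation 1 is dropped from the quasi-differenced data; the arrays y and X are assumed to be already loaded, and the convergence tolerance and starting value of ρ are illustrative choices rather than prescriptions from the text.

```python
# Sketch of iterated Cochrane-Orcutt, assuming y (n,) and X (n, k) are in memory.
# Each step is an OLS regression: first of the residuals on their own lag to
# update rho, then of the quasi-differenced data to update beta.
import numpy as np

def ols(y, X):
    """Return OLS coefficients of y on X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cochrane_orcutt(y, X, rho=0.0, tol=1e-8, max_iter=100):
    beta = ols(y, X)
    for _ in range(max_iter):
        u = y - X @ beta                                      # residuals at beta
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])        # OLS of u_t on u_{t-1}
        y_star = y[1:] - rho_new * y[:-1]                     # drop observation 1
        X_star = X[1:] - rho_new * X[:-1]
        beta_new = ols(y_star, X_star)
        if abs(rho_new - rho) < tol and np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, rho_new
        beta, rho = beta_new, rho_new
    return beta, rho
```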
7.19 Consider once more the model that you estimated in Exercise 7.15. Estimate this model using the iterated Cochrane-Orcutt algorithm, using a sequence of OLS regressions, and see how many iterations are needed to achieve the same estimates as those achieved by NLS. Compare this number with the number
of iterations used by NLS itself.
Repeat the exercise with a starting value of 0.5 for ρ instead of the value of 0
that is conventionally used.
7.20 Test the hypothesis that the error terms of the linear regression model (7.97) are serially uncorrelated against the alternatives that they follow the simple AR(4) process u_t = ρ_4 u_{t-4} + ε_t and that they follow a general AR(4) process. Test the hypothesis that the error terms of the nonlinear regression model you estimated in Exercise 7.15 are serially uncorrelated against the same two alternative hypotheses. Use Gauss-Newton regressions.
7.21 Consider the linear regression model
y = X_0β_0 + X_1β_1 + X_2β_2 + u,   u ∼ IID(0, σ^2 I),   (7.98)

where there are n observations, and k_0, k_1, and k_2 denote the numbers of parameters in β_0, β_1, and β_2, respectively. Let H_0 denote the hypothesis that β_1 = 0 and β_2 = 0, H_1 denote the hypothesis that β_2 = 0, and H_2 denote the model (7.98) with no restrictions.
Show that the F statistics for testing H_0 against H_1 and for testing H_1 against H_2 are asymptotically independent of each other.
7.22 This question uses data on daily returns for the period 1989–1998 for shares
of Mobil Corporation from the file daily-crsp.data. These data are made available by courtesy of the Center for Research in Security Prices (CRSP); see the comments at the bottom of the file. Regress these returns on a constant and themselves lagged once, twice, three, and four times, dropping the first four observations. Then test the null hypothesis that all coefficients except the constant term are equal to zero, as they should be if market prices fully reflect all available information. Perform a heteroskedasticity-robust test by
running two HRGNRs, and report P values for both tests.
7.23 Consider the fixed-effects model (7.84). Show that, under mild regularity conditions, which you should specify, the OLS estimator β̂_FE tends in probability to the true parameter vector β_0 as m, the number of cross-sectional units, tends to infinity, while T, the number of time periods, remains fixed.
7.24 Suppose that

y = Xβ + v + ε,   (7.99)

where there are n = mT observations, y is an n vector with typical element y_{it}, X is an n × k matrix with typical row X_{it}, ε is an n vector with typical element ε_{it}, and v is an n vector with v_i repeated in the positions that correspond to y_{i1} through y_{iT}. Let the v_i have variance σ_v^2 and the ε_{it} have variance σ_ε^2. Given these assumptions, show that the variance of the error terms in regression (7.88) is σ_v^2 + σ_ε^2/T.
7.25 Show that, for Σ defined in (7.87),

Σ^{-1/2} = (1/σ_ε)(I_T − λP_ι),

where P_ι ≡ ι(ι^⊤ι)^{-1}ι^⊤ = (1/T)ιι^⊤, and

λ ≡ 1 − σ_ε/(σ_ε^2 + Tσ_v^2)^{1/2}.

Then use this result to show that the GLS estimates of β may be obtained by running regression (7.91). What is the covariance matrix of the GLS estimator?
7.26 Suppose that, in the error-components model (7.99), none of the columns of X
displays any within-group variation. Recall that, for this model, the data are balanced, with m groups and T observations per group. Show that the OLS and GLS estimators are identical in this special case. Then write down the true covariance matrix of both these estimators. How is this covariance matrix related to the usual one for OLS that would be computed by a regression package under classical assumptions? What happens to this relationship as
T and ρ, the correlation of the error terms within groups, change?
Chapter 8

Instrumental Variables Estimation
8.1 Introduction
In Section 3.3, the ordinary least squares estimator β̂ was shown to be consistent under condition (3.10), according to which the expectation of the error term u_t associated with observation t is zero conditional on the regressors X_t for that same observation. As we saw in Section 4.5, this condition can also be expressed either by saying that the regressors X_t are predetermined or by saying that the error terms u_t are innovations. When condition (3.10) does not hold, the consistency proof of Section 3.3 is not applicable, and the OLS estimator will, in general, be biased and inconsistent.
It is not always reasonable to assume that the error terms are innovations. In fact, as we will see in the next section, there are commonly encountered situations in which the error terms are necessarily correlated with some of the regressors for the same observation. Even in these circumstances, however, it is usually possible, although not always easy, to define an information set Ω_t for each observation such that

E(u_t | Ω_t) = 0.   (8.01)
… dealt with briefly. A more general class of MM estimators, of which both OLS and IV are special cases, will be the subject of Chapter 9.
8.2 Correlation Between Error Terms and Regressors
We now briefly discuss two common situations in which the error terms will be correlated with the regressors and will therefore not have mean zero conditional on them. The first one, usually referred to by the name errors in variables, occurs whenever the independent variables in a regression model are measured with error. The second situation, often simply referred to as simultaneity, occurs whenever two or more endogenous variables are jointly determined by a system of simultaneous equations.

Errors in Variables
For a variety of reasons, many economic variables are measured with error. For example, macroeconomic time series are often based, in large part, on surveys, and they must therefore suffer from sampling variability. Whenever there are measurement errors, the values economists observe inevitably differ, to a greater or lesser extent, from the true values that economic agents presumably act upon. As we will see, measurement errors in the dependent variable of a regression model are generally of no great consequence, unless they are very large. However, measurement errors in the independent variables cause the error terms to be correlated with the regressors that are measured with error, and this causes OLS to be inconsistent.
The problems caused by errors in variables can be seen quite clearly in the context of the simple linear regression model. Consider the model

y°_t = β_1 + β_2 x°_t + u°_t,   u°_t ∼ IID(0, σ^2),   (8.02)

where the variables x°_t and y°_t are not actually observed. Instead, we observe

x_t = x°_t + v_{1t},   y_t = y°_t + v_{2t}.   (8.03)

Here v_{1t} and v_{2t} are measurement errors which are assumed, perhaps not realistically in some cases, to be IID with variances ω_1^2 and ω_2^2, respectively.
If we suppose that the true DGP is a special case of (8.02) along with (8.03), we see from (8.03) that x°_t = x_t − v_{1t} and y°_t = y_t − v_{2t}. If we substitute these into (8.02), we find that

y_t = β_1 + β_2(x_t − v_{1t}) + u°_t + v_{2t}
    = β_1 + β_2 x_t + u°_t + v_{2t} − β_2 v_{1t}.   (8.04)

The error term in equation (8.04), which we denote u_t, is therefore u°_t + v_{2t} − β_2 v_{1t}.
The measurement error in the independent variable also increases the variance of the error terms, but it has another, much more severe, consequence as well. Because x_t = x°_t + v_{1t}, and u_t depends on v_{1t}, u_t will be correlated with x_t whenever β_2 ≠ 0. In fact, since the random part of x_t is v_{1t}, we see that

E(u_t | x_t) = E(u_t | v_{1t}) = −β_2 v_{1t},   (8.05)

because we assume that v_{1t} is independent of u°_t and v_{2t}. From (8.05), we can see, using the fact that E(u_t) = 0 unconditionally, that

Cov(x_t, u_t) = E(x_t u_t) = E(x_t E(u_t | x_t))
             = −E((x°_t + v_{1t})β_2 v_{1t}) = −β_2 ω_1^2.

This covariance is negative if β_2 > 0 and positive if β_2 < 0, and, since it does not depend on the sample size n, it will not go away as n becomes large. An exactly similar argument shows that the assumption that E(u_t | X_t) = 0 is false whenever any element of X_t is measured with error. In consequence, the OLS estimator will be biased and inconsistent.
Errors in variables are a potential problem whenever we try to estimate a consumption function, especially if we are using cross-section data. Many economic theories (for example, Friedman, 1957) suggest that household consumption will depend on "permanent" income or "life-cycle" income, but surveys of household behavior almost never measure this. Instead, they typically provide somewhat inaccurate estimates of current income. If we think of y_t as measured consumption, x°_t as permanent income, and x_t as estimated current income, then the above analysis applies directly to the consumption function. The marginal propensity to consume is β_2, which must be positive, causing the correlation between u_t and x_t to be negative. As readers are asked to show in Exercise 8.1, the probability limit of β̂_2 is less than the true value β_20. In consequence, the OLS estimator β̂_2 is biased downward, even asymptotically.
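A short simulation, with made-up parameter values, makes the attenuation concrete. Under the additional assumption, used here only for illustration, that the true regressor x°_t is IID, independent of the measurement errors, with some variance Var(x°_t), the standard attenuation result gives plim β̂_2 = β_20 Var(x°_t)/(Var(x°_t) + ω_1^2); in the sketch below the OLS slope settles near that value rather than near β_20.

```python
# Sketch (not from the text): simulate the errors-in-variables model (8.02)-(8.03)
# with made-up parameter values and compare the OLS slope with the attenuation
# formula beta2 * var(x_star) / (var(x_star) + omega1_sq).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1, beta2 = 1.0, 2.0
omega1_sq, omega2_sq, sigma_sq = 1.0, 0.5, 1.0

x_star = rng.normal(scale=2.0, size=n)            # true regressor, variance 4
y_star = beta1 + beta2 * x_star + rng.normal(scale=np.sqrt(sigma_sq), size=n)
x = x_star + rng.normal(scale=np.sqrt(omega1_sq), size=n)   # observed with error
y = y_star + rng.normal(scale=np.sqrt(omega2_sq), size=n)

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print("OLS slope:          ", beta_hat[1])
print("attenuation formula:", beta2 * 4.0 / (4.0 + omega1_sq))   # about 1.6 < 2.0
```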
Of course, if our objective is simply to estimate the relationship between the observed dependent variable y_t and the observed independent variable x_t, there is nothing wrong with using ordinary least squares to estimate equation (8.04). In that case, u_t would simply be defined as the difference between y_t and its expectation conditional on x_t. But our analysis shows that the OLS estimators of β_1 and β_2 in equation (8.04) are not consistent for the corresponding parameters of equation (8.02). In most cases, it is parameters like these that we want to estimate on the basis of economic theory.

There is an extensive literature on ways to avoid the inconsistency caused by errors in variables. See, among many others, Hausman and Watson (1985), Leamer (1987), and Dagenais and Dagenais (1997). The simplest and most widely-used approach is just to use an instrumental variables estimator.

Simultaneous Equations
Economic theory often suggests that two or more endogenous variables are determined simultaneously. In this situation, as we will see shortly, all of the endogenous variables will necessarily be correlated with the error terms in all of the equations. This means that none of them may validly appear in the regression functions of models that are to be estimated by least squares.
A classic example, which well illustrates the econometric problems caused by simultaneity, is the determination of price and quantity for a commodity at the partial equilibrium of a competitive market. Suppose that q_t is quantity and p_t is price, both of which would often be in logarithms. A linear (or loglinear) model of demand and supply is

q_t = γ^d p_t + X_t^d β^d + u_t^d   (8.06)
q_t = γ^s p_t + X_t^s β^s + u_t^s,   (8.07)

where X_t^d and X_t^s are row vectors of observations on exogenous or predetermined variables that appear, respectively, in the demand and supply functions, β^d and β^s are corresponding vectors of parameters, γ^d and γ^s are scalar parameters, and u_t^d and u_t^s are the error terms in the demand and supply functions. Economic theory predicts that, in most cases, γ^d < 0 and γ^s > 0, which is equivalent to saying that the demand curve slopes downward and the supply curve slopes upward.
Equations (8.06) and (8.07) are a pair of linear simultaneous equations for the two unknowns p_t and q_t. For that reason, these equations constitute what is called a linear simultaneous equations model. In this case, there are two dependent variables, quantity and price. For estimation purposes, the key feature of the model is that quantity depends on price in both equations. Since there are two equations and two unknowns, it is straightforward to solve equations (8.06) and (8.07) for p_t and q_t. This is most easily done by rewriting them in matrix notation as
\[
\begin{pmatrix} 1 & -\gamma^{d} \\ 1 & -\gamma^{s} \end{pmatrix}
\begin{pmatrix} q_t \\ p_t \end{pmatrix}
=
\begin{pmatrix} X_t^{d}\beta^{d} \\ X_t^{s}\beta^{s} \end{pmatrix}
+
\begin{pmatrix} u_t^{d} \\ u_t^{s} \end{pmatrix}. \qquad (8.08)
\]
The solution to (8.08), which will exist whenever γ^d ≠ γ^s, so that the matrix on the left-hand side of (8.08) is nonsingular, is
\[
\begin{pmatrix} q_t \\ p_t \end{pmatrix}
=
\begin{pmatrix} 1 & -\gamma^{d} \\ 1 & -\gamma^{s} \end{pmatrix}^{-1}
\left(
\begin{pmatrix} X_t^{d}\beta^{d} \\ X_t^{s}\beta^{s} \end{pmatrix}
+
\begin{pmatrix} u_t^{d} \\ u_t^{s} \end{pmatrix}
\right). \qquad (8.09)
\]
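To see the simultaneity problem numerically, the following sketch, with made-up parameter values and a single exogenous variable in each equation, solves the system (8.08) observation by observation, as in (8.09), and checks that the equilibrium price is correlated with the demand-equation error term, so that OLS applied to the demand equation would be inconsistent.

```python
# Sketch (made-up parameters): solve the demand-supply system (8.08) for q_t, p_t
# and check that p_t is correlated with the demand error u_t^d.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
gamma_d, gamma_s = -1.0, 0.5          # demand slopes down, supply slopes up
beta_d, beta_s = 2.0, 1.0             # one exogenous variable in each equation

x_d = rng.normal(size=n)              # demand shifter (e.g. income)
x_s = rng.normal(size=n)              # supply shifter (e.g. an input price)
u_d = rng.normal(size=n)
u_s = rng.normal(size=n)

A = np.array([[1.0, -gamma_d],
              [1.0, -gamma_s]])
rhs = np.column_stack([beta_d * x_d + u_d, beta_s * x_s + u_s])
qp = np.linalg.solve(A, rhs.T).T      # each row is (q_t, p_t), as in (8.09)
q, p = qp[:, 0], qp[:, 1]

print("corr(p_t, u_t^d):", np.corrcoef(p, u_d)[0, 1])   # clearly nonzero
```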