Estimation Using an ECM
We mentioned in Section 13.4 that an error correction model can be used even when the data are nonstationary. In order to justify this assertion, we suppose that the data are generated by the two equations (14.45). From the definition (14.41) of the
We may invert equations (14.45) as follows:
equations (14.50), then equation (14.49) becomes
that used in Section 13.3, it is easy enough to see that equation (14.51) is
a special case of an ECM like (13.62). Notice that it must be estimated by nonlinear least squares.
In general, equation (14.51) is an unbalanced regression, because it mixes the first differences, which are I(0), with the levels, which are I(1). But the linear
of the very special structure of the DGP (14.45). It is the parameter that appears in the equilibrium error that defines the cointegrating vector, not the
and short-run multipliers should be the same, and so, for the purposes of estimation and testing, equation (14.51) is normally replaced by
Unit Roots and Cointegration
Equation (14.52) is without doubt an unbalanced regression, and so we must expect that the OLS estimates will not have their usual distributions. It
as readers are invited to check by simulation in Exercise 14.20.
In the general case, with k cointegrated variables, we may estimate the cointegrating vector using the linear regression
Other Approaches
When we cannot, or do not want to, specify an ECM, at least two other methods are available for estimating a cointegrating vector. One, proposed
by Phillips and Hansen (1990), is called fully modified estimation. The idea
estimate of the bias. The result turns out to be asymptotically multivariate normal, and it is possible to estimate its asymptotic covariance matrix. To explain just how fully modified estimation works would require more space than we have available. Interested readers should consult the original paper or Banerjee, Dolado, Galbraith, and Hendry (1993, Chapter 7).
A second approach, which is due to Saikkonen (1991), is much simpler to describe and implement. We run the regression
by OLS. Observe that regression (14.54) is just regression (14.44) with the
Dickey-Fuller tests, the idea is to add enough leads and lags so that the error terms appear to be serially independent. Provided that p is allowed to increase at the appropriate rate as n → ∞, this regression yields estimates that are asymptotically efficient.
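Saikkonen's estimator is easy to try out in simulation. The sketch below is illustrative code, not the book's: NumPy is assumed, the function and variable names are my own, and the DGP is a simple cointegrated pair with y_t = 2x_t plus an I(0) error. The regression adds p leads and p lags of ∆x_t to the levels regression and reads off the cointegrating coefficient:

```python
import numpy as np

def dynamic_ols(y, x, p):
    """Saikkonen-style dynamic OLS: regress y_t on a constant, x_t,
    and p leads and p lags of Delta x_t; return the estimated
    cointegrating coefficient on x_t."""
    n = len(y)
    dx = np.diff(x)                       # dx[k] = x[k+1] - x[k]
    t = np.arange(p + 1, n - p)           # keep every lead and lag in sample
    cols = [np.ones(len(t)), x[t]]
    for j in range(-p, p + 1):
        cols.append(dx[t + j - 1])        # Delta x_{t+j}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    return beta[1]                        # coefficient on x_t

# Cointegrated pair: x_t is a random walk and y_t = 2 x_t + u_t, u_t I(0).
rng = np.random.default_rng(42)
n = 2000
x = np.cumsum(rng.standard_normal(n))
eps = rng.standard_normal(n)
u = eps + 0.5 * np.concatenate(([0.0], eps[:-1]))   # MA(1) equilibrium error
y = 2.0 * x + u
eta_hat = dynamic_ols(y, x, p=4)
print(eta_hat)
```

Because the estimator is super-consistent, the estimate should sit very close to the true value 2 even for moderate samples.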
Inference in Regressions with I(1) Variables
From what we have said so far, it might seem that standard asymptotic results never apply when a regression contains one or more regressors that are I(1). This is true for spurious regressions like (14.12), for unit root test regressions like (14.18), and for error-correction models like (14.52). In all these cases, certain statistics that are computed as ordinary t statistics actually follow nonstandard distributions asymptotically.
However, it is not true that the t statistic on every parameter in a regression that involves I(1) variables follows a nonstandard distribution asymptotically. It is not even true that the t statistic on every coefficient of an I(1) variable follows such a distribution. Instead, as Sims, Stock, and Watson (1990) showed in a famous paper, the t statistic on any parameter that appears only as the coefficient of an I(0) variable, perhaps after the regressors are rearranged, follows the standard normal distribution asymptotically. Similarly, an F statistic for a test of the hypothesis that any set of parameters is zero follows its usual asymptotic distribution if all the parameters can be written as coefficients of I(0) variables at the same time. On the other hand, t statistics and F statistics corresponding to parameters that do not satisfy this condition generally follow nonstandard limiting distributions, although there are certain exceptions that we will not discuss here; see West (1988) and Sims, Stock, and Watson (1990).
We will not attempt to prove these results, which are by no means trivial. Proofs may be found in the original paper by Sims et al., and there is a somewhat simpler discussion in Banerjee, Dolado, Galbraith, and Hendry (1993, Chapter 6). Instead, we will consider two examples that should serve to illustrate the nature of the results. First, consider a simple ECM in reparametrized form: we can rewrite equation (14.52) as
I(0) variable, and its t statistic is therefore asymptotically distributed as N(0, 1). Consequently, the t statistic on every coefficient in (14.52) is
asymptotically normally distributed. Despite this, it is not the case that an
distribution under the null hypothesis. This is because we cannot rewrite
would also be asymptotically normal, with the same rate of convergence, in
super-consistent. The phenomenon is explained by the fact, which we will not attempt to demonstrate in detail here, that the two random variables
therefore perfectly correlated asymptotically. It is straightforward (see Exercise 14.21) to show that this implies that
As a second example, consider the augmented Dickey-Fuller test regression
which is a special case of equation (14.32). This can be rewritten as
coefficient of an I(0) variable. In the second line of (14.58), it does multiply
asymptotic distribution. As we saw in Section 14.3, that is indeed the case,
(14.57) does follow the standard normal distribution asymptotically.
also yield statistics that follow the usual asymptotic F distribution. That
to include in the test regression (14.33) that is used to perform augmented Dickey-Fuller tests.
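The contrast between the two t statistics in this example can be checked by simulation. In the sketch below (illustrative code, not the book's; NumPy assumed), the testing regression ∆y_t = b₀ + βy_{t−1} + δ∆y_{t−1} + e_t is estimated on driftless random walks. Using ±1.96 critical values, the t statistic on δ, the coefficient of the I(0) variable ∆y_{t−1}, rejects at roughly the nominal 5% rate, while the t statistic on β, which follows the nonstandard Dickey-Fuller distribution, does not:

```python
import numpy as np

def adf_tstats(y):
    """OLS t statistics for beta and delta in the testing regression
    Delta y_t = b0 + beta * y_{t-1} + delta * Delta y_{t-1} + e_t."""
    dy = np.diff(y)
    Y = dy[1:]
    X = np.column_stack([np.ones(len(Y)), y[1:-1], dy[:-1]])
    XtXi = np.linalg.inv(X.T @ X)
    b = XtXi @ X.T @ Y
    e = Y - X @ b
    s2 = e @ e / (len(Y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(XtXi))
    return b[1] / se[1], b[2] / se[2]

rng = np.random.default_rng(3)
n, reps = 200, 2000
rej_beta = rej_delta = 0
for _ in range(reps):
    y = np.cumsum(rng.standard_normal(n + 2))    # driftless random walk
    t_beta, t_delta = adf_tstats(y)
    rej_beta += abs(t_beta) > 1.96               # nonstandard (DF) statistic
    rej_delta += abs(t_delta) > 1.96             # asymptotically N(0,1)
print(rej_beta / reps, rej_delta / reps)
```

The first rejection rate is far above 0.05 because ±1.96 are the wrong critical values for the Dickey-Fuller distribution; the second is close to 0.05.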
Estimation by a Vector Autoregression
The procedures we have discussed so far for estimating and making inferences about cointegrating vectors are all in essence single-equation methods. A very popular alternative to those methods is to estimate a vector autoregression, or VAR, for all of the possibly cointegrated variables. The best-known such methods were introduced by Johansen (1988, 1991) and initially applied by Johansen and Juselius (1990, 1992), and a similar approach was introduced independently by Ahn and Reinsel (1988, 1990). Johansen (1995) provides a detailed exposition. An advantage of these methods is that they can allow for more than one cointegrating relation among a set of more than two variables. Consider the VAR
a row vector of deterministic variables, such as a constant term and a trend,
The VAR (14.59) is written in levels. It can be reparametrized as
cointegrated by testing hypotheses about the g × g matrix Π, which is called the impact matrix.
If we assume, as usual, that the differenced variables are I(0), then everything
is to be satisfied, this term must be I(0) as well. It clearly is so if the matrix Π is a zero matrix. In this extreme case, there is no cointegration at all. However, it can also be I(0) if Π is nonzero but does not have full rank. In fact, the rank of Π is the number of cointegrating relations.
To see why this is so, suppose that the matrix Π has rank r, with 0 ≤ r < g.
In this case, we can always write
where η and α are both g × r matrices. Recall that the rank of a matrix is the number of linearly independent columns. Here, any set of r linearly independent columns of Π is a set of linear combinations of the r columns of η. See also Exercise 14.19. When equation (14.61) holds, we see that
follows that there are r independent cointegrating relations.
We can now see just how the number of cointegrating vectors is related to
the rank of the matrix Π. In the extreme case in which r = 0, there are no cointegrating vectors at all, and Π = O. When r = 1, there is a single
r = 3, there is a three-dimensional space of cointegrating vectors, spanned
linear combination of these elements would be stationary, which implies that
The system (14.60) with the constraint (14.61) imposed can be written as
Estimating this system of equations yields estimates of the r cointegrating
vectors. However, it can be seen from (14.62) that not all of the elements of
η and α can be identified, since the factorization (14.61) is not unique for a
given Π. In fact, if Θ is any nonsingular r × r matrix,
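This indeterminacy is easy to verify numerically. The sketch below (illustrative only; NumPy assumed) builds an impact matrix Π = ηα⊤ of rank r and checks that the transformed factors ηΘ and αΘ^{-⊤} reproduce exactly the same Π for a nonsingular Θ:

```python
import numpy as np

rng = np.random.default_rng(7)
g, r = 4, 2
eta = rng.standard_normal((g, r))       # g x r, full column rank (a.s.)
alpha = rng.standard_normal((g, r))
Pi = eta @ alpha.T                      # impact matrix of rank r

rank_Pi = np.linalg.matrix_rank(Pi)
print(rank_Pi)                          # 2

# The factorization is not unique: for any nonsingular Theta,
# (eta Theta) and (alpha Theta^{-T}) reproduce the same Pi.
Theta = np.array([[2.0, 1.0], [0.0, 3.0]])
same = np.allclose(Pi, (eta @ Theta) @ (alpha @ np.linalg.inv(Theta).T).T)
print(same)                             # True
```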
It is therefore necessary to make some additional assumption in order to convert equation (14.62) into an identified model.
We now consider the simpler case in which g = 2, r = 1, and p = 0. In this
case, the VAR (14.60) becomes
∆y_{t1} = X_t b_1 + π_{11} y_{t−1,1} + π_{21} y_{t−1,2} + u_{t1},
∆y_{t2} = X_t b_2 + π_{12} y_{t−1,1} + π_{22} y_{t−1,2} + u_{t2},   (14.64)
one unit eigenvalue and the other eigenvalue less than 1 in absolute value. This requirement is identical to requiring the matrix
to have one zero eigenvalue and the other between −2 and 0. Let the zero
Unlike equations (14.64), the restricted equations (14.65) are nonlinear. There are at least two convenient ways to estimate them. One is first to estimate the unrestricted equations (14.64) and then use the GNR (12.53) discussed in Section 12.3, possibly with continuous updating of the estimate of the contemporaneous covariance matrix. Another is to use maximum likelihood, on the assumption that the error terms are normally distributed. This second method extends straightforwardly to the estimation of the more general restricted VAR (14.62). The normality assumption is not really restrictive, since the ML estimator is a QMLE even when the normality assumption is not satisfied; see Section 10.4.
Maximum likelihood estimation of a system of nonlinear equations was treated in Section 12.3. We saw there that one approach is to minimize the determinant of the matrix of sums of squares and cross-products of the residuals. For simplicity, we suppose for the moment that X is an empty matrix. The general case will be dealt with in more detail in the next section. Then the
of algebra (see Exercise 14.22) shows that this determinant is equal to the
of the notation used in Section 12.5 in the context of LIML estimation, since the algebra of LIML is very similar to that used here. In the present simple case, the first-order condition for minimizing κ reduces to a quadratic equation
of κ given by equation (14.66) is smaller; see Exercise 14.23 for details.
As with the other methods we have discussed, estimating a cointegrating vector by a VAR yields a super-consistent estimator. Bias is in general less than with either the levels estimator (14.46) or the ECM estimator obtained by running regression (14.52). For small sample sizes, there appears to be a tendency for there to be outliers in the left-hand tail of the distribution, leading to a higher variance than with the other two methods. This phenomenon apparently disappears for samples of size greater than about 100, however; see Exercise 14.24.
14.6 Testing for Cointegration
The three methods discussed in the last section for estimating a cointegrating vector can all be extended to provide tests for whether cointegrating relations exist for a set of I(1) variables, and, in the case in which a VAR is used, to determine how many such relations exist. We begin with a method based on the cointegrating regression (14.44).
Engle-Granger Tests
The simplest, and probably still the most popular, way to test for cointegration was proposed by Engle and Granger (1987). The idea is to estimate the cointegrating regression (14.44) by OLS and then subject the resulting residuals to a unit root test. If the
variables are not cointegrated, there is no such linear combination, and the residuals, being a linear combination of I(1) variables, are themselves I(1).
It may seem curious to have a null hypothesis of no cointegration, but this follows inevitably from the nature of any unit root test. Recall from the simple model (14.36) that, when there is no cointegration, the matrix Φ of (14.37) is restricted so as to have two unit eigenvalues. The alternative hypothesis of cointegration implies that there is just one, the only constraint on the other
eigenvalue being that its absolute value should be less than 1. It is therefore natural from this point of view to have a test with a null hypothesis of no cointegration, with the restriction that there are two unit roots, against an alternative of cointegration, with only one. This feature applies to all the tests for cointegration that we consider.
from regression (14.44). An augmented Engle-Granger (EG) test is then performed in almost exactly the same way as an augmented Dickey-Fuller test, by running the regression
where p is chosen to remove any evidence of serial correlation in the residuals.
As with the ADF test, the test statistic may be either a τ statistic or a
defined by equation (14.34).
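A minimal version of the augmented Engle-Granger procedure can be sketched as follows. This is illustrative code, not the book's: NumPy is assumed, the helper names are mine, and the −3.34 cutoff used in the comment is the approximate asymptotic 5% critical value for g = 2 with a constant, taken from MacKinnon's response-surface tables:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, standard errors, and residuals."""
    XtXi = np.linalg.inv(X.T @ X)
    b = XtXi @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, np.sqrt(s2 * np.diag(XtXi)), e

def eg_tau(y, x, p=1):
    """Augmented Engle-Granger test: OLS cointegrating regression of y
    on a constant and x, then an ADF-style tau statistic for a unit
    root in the residuals.  Compare with Engle-Granger critical values
    (which depend on g), not with N(0,1) or ordinary DF tables."""
    _, _, e = ols(np.column_stack([np.ones(len(y)), x]), y)
    de = np.diff(e)
    Y = de[p:]
    cols = [e[p:-1]] + [de[p - j:len(de) - j] for j in range(1, p + 1)]
    b, se, _ = ols(np.column_stack(cols), Y)
    return b[0] / se[0]                  # tau: t statistic on e_{t-1}

rng = np.random.default_rng(5)
n = 500
x = np.cumsum(rng.standard_normal(n))
y_coint = x + rng.standard_normal(n)          # cointegrated with x
y_indep = np.cumsum(rng.standard_normal(n))   # independent random walk
tau_c, tau_i = eg_tau(y_coint, x), eg_tau(y_indep, x)
print(tau_c, tau_i)   # compare each with roughly -3.34 (5%, g = 2)
```

For the cointegrated pair the statistic is far below any plausible critical value; for the independent random walks it stays in the nonrejection region most of the time.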
As the above notation suggests, the asymptotic distributions of these test statistics depend on g. When g = 1, we have a limiting case, since there is then
alternative. Not surprisingly, for g = 1, the asymptotic distribution of each of the Engle-Granger statistics is identical to the asymptotic distribution of the
then running regression (14.67) is the same (except for the initial observations) as directly running an ADF testing regression like (14.32). If there is more than one variable, but some or all of the components of the cointegrating vector are known, then the proper value of g is 1 plus the number of parameters
we have g = 1 whatever the number of variables.
The densities move steadily to the left as g, the number of possibly cointegrated variables, increases. In consequence, the critical values become larger in absolute value, and the power of the test diminishes. The other Engle-Granger tests display similar patterns.
Since a set of g I(1) variables is cointegrated if there is a linear combination
of them that is I(0), any g independent linear combinations of the variables
[Figure 14.4: Asymptotic densities of Engle-Granger τ_c tests, plotted against the N(0, 1) density; the marked critical values are −2.861 for τ_c(1) and −6.112 for τ_c(12).]
is also a cointegrated set. In other words, cointegration is a property of the linear space spanned by the variables, not of the particular choice of variables that span the space. A problem with Engle-Granger test statistics is that they
asymptotic distribution of the test statistic under the null hypothesis is the
Consequently, Engle-Granger tests with the same data but different choices
ECM Tests
A second way to test for cointegration involves the estimation of an error-correction model. We can base an ECM test for the null hypothesis that the
in that equation must be zero. A suitable test statistic is thus the t statistic
1 × g vector, it follows the distribution that Ericsson and MacKinnon (2002)
When g = 1, the asymptotic distribution of the ECM statistic is identical to that of the corresponding Dickey-Fuller τ statistic. This follows immediately
[Figure 14.5: Asymptotic densities of ECM κ_c tests, plotted against the N(0, 1) density; the marked critical values are −2.861 for κ_c(1) = τ_c(1) and −5.183 for κ_c(12).]
from the fact that, for g = 1, equation (14.53) collapses to
which is equivalent to equation (14.31). However, when g > 1, the distributions of the various κ statistics are not the same as those of the corresponding Engle-Granger τ statistics.
Equation (14.53) is less likely to suffer from serial correlation than the Engle-Granger test regression (14.67) because the error-correction term often has considerable explanatory power when there really is cointegration. If serial
to equation (14.53) without affecting the asymptotic distributions of the test statistics. Indeed, one can add any stochastic variable that is I(0) and exogenous or predetermined, as well as nontrending deterministic variables. Thus it is possible to perform ECM tests within the context of a well-specified econometric model, of which equation (14.53) is a special case. Indeed, this is probably the best way to perform such a test, and it is one of the things that makes ECM tests attractive.
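In the two-variable case, a minimal ECM test might look like the following sketch. This is an assumed simple form of the test regression, not equation (14.53) itself: only a constant, ∆x_t, and the two lagged levels are included, and NumPy is assumed. κ is the t statistic on the lagged level of the dependent variable, to be compared with Ericsson-MacKinnon critical values:

```python
import numpy as np

def ecm_kappa(y, x):
    """ECM cointegration test statistic (sketch of the simplest case):
    regress Delta y_t on a constant, Delta x_t, y_{t-1}, and x_{t-1};
    kappa is the t statistic on y_{t-1}."""
    dy, dx = np.diff(y), np.diff(x)
    X = np.column_stack([np.ones(len(dy)), dx, y[:-1], x[:-1]])
    XtXi = np.linalg.inv(X.T @ X)
    b = XtXi @ X.T @ dy
    e = dy - X @ b
    s2 = e @ e / (len(dy) - X.shape[1])
    return b[2] / np.sqrt(s2 * XtXi[2, 2])

rng = np.random.default_rng(9)
n = 500
x = np.cumsum(rng.standard_normal(n))
y = x + rng.standard_normal(n)                 # cointegrated: y - x is I(0)
z = np.cumsum(rng.standard_normal(n))          # not cointegrated with x
kappa_y, kappa_z = ecm_kappa(y, x), ecm_kappa(z, x)
print(kappa_y, kappa_z)
```

When y and x are cointegrated, the error-correction coefficient is strongly negative and κ is far out in the left tail; for the independent pair it typically stays in the nonrejection region.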
figure is comparable to Figure 14.4. It can be seen that, for g > 1, the critical values are somewhat smaller in absolute value than they are for the corresponding EG tests. The distributions of the κ statistics are also more spread out than those of the corresponding τ statistics, with positive values much more likely to occur.
Under the alternative hypothesis of cointegration, an ECM test is more likely to reject the false null than an EG test. Consider equation (14.52)
is just a version of the Engle-Granger test regression (14.67). We remarked
expect (14.68) to fit better than (14.67) and to be less likely to suffer from serially correlated errors. Thus we should expect the EG test to have less power than the ECM test in most cases. It must be noted, however, that the ECM test shares with the EG test the disadvantage that it depends on the
For more detailed discussions of ECM tests, see Campos, Ericsson, and Hendry (1996), Banerjee, Dolado, and Mestre (1998), and Ericsson and MacKinnon (2002). The densities graphed in Figure 14.5 are taken from the last of these papers, which provides programs that can be used to compute critical values and P values for these tests.
Tests Based on a Vector Autoregression
A third way to test for cointegration is based on the VAR (14.60). The idea is to estimate this VAR subject to the constraint (14.61) for various values of the rank r of the impact matrix Π, using ML estimation based on the
independent across observations. Null hypotheses for which there are any number of cointegrating relations from 0 to g − 1 can then be tested against alternatives with a greater number of relations, up to a maximum of g. Of course, if there really were g cointegrating relations, all the variables would be I(0), and so this case is usually only of theoretical interest. The most convenient test statistics are likelihood ratio (LR) statistics.
We saw in the last section that a convenient way to obtain ML estimates of the restricted VAR (14.62) is to minimize the determinant of the matrix of sums of squares and cross-products of the residuals. We now describe how to do this in general, and how to use the result in order to compute estimates of sets of cointegrating vectors and LR test statistics. We will not enter into a discussion of why the recipes we provide work, since doing so would be rather complicated. But, since the methodology is in very common use in practice, we will give detailed instructions as to how it can be implemented. See Banerjee, Dolado, Galbraith, and Hendry (1993, Chapter 8), Davidson (2000, Chapter 16), and Johansen (1995) for more detailed treatments.
variables X_t and the lags ∆Y_{t−1} through ∆Y_{t−p}. This requires us to run
2g OLS regressions, all of which involve the same regressors, and yields two n × g matrices of residuals.
The next step is to compute the g × g sample covariance matrices
Then we choose the corresponding eigenvectors to be the columns of a g × g
eigenvalue-eigenvector relation implies that AW = WΛ, where the diagonal entries of
The purpose of solving equations (14.70) in this way is that the first r columns
necessary identifying restrictions so that α and η are uniquely determined; recall the indeterminacy expressed by equation (14.63). As we remarked in the last section, once η is given, the equations (14.62) are linear in the other parameters, which can therefore be estimated by least squares.
It can be shown that the maximized loglikelihood function for the restricted model (14.62) is
Thus we can calculate the maximized loglikelihood function for any value of the number of cointegrating vectors, once we have found the eigenvalues of the matrix (14.71). For given r, (14.73) depends on the r largest eigenvalues.
would not exist
As r increases, so does the value of the maximized loglikelihood function
given by expression (14.73). This makes sense, since we are imposing fewer
This is often called the trace statistic, because it can be thought of as the sum
of a subset of the elements on the principal diagonal of the diagonal matrix
−n log(I − Λ). Because the impact matrix Π cannot be written as a matrix of coefficients of I(0) variables (recall the discussion in the last section), the distributions of the trace statistic are nonstandard. These distributions have
is used to test the null hypothesis that there are r cointegrating vectors against the alternative that there are g of them.
When the null hypothesis is that there are r cointegrating vectors and the alternative is that there are r + 1 of them, there is just one term in the sum
that appears in expression (14.74). The test statistic is then
distributions of this statistic for various values of r have been tabulated.
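The recipe for the eigenvalues and the trace and λmax statistics can be sketched in a few lines for the simplest case, with no deterministic terms and no lagged differences, so that the "residuals" are just ∆Y_t and Y_{t−1} themselves. This is illustrative code (NumPy assumed; function names are mine), not a substitute for a full implementation with lags, deterministics, and the tabulated critical values:

```python
import numpy as np

def johansen_stats(Y):
    """Eigenvalues and trace / lambda-max statistics for the simplest
    Johansen setup (no deterministic terms, no lagged differences)."""
    R0 = np.diff(Y, axis=0)            # stands in for residuals of Delta Y_t
    R1 = Y[:-1]                        # stands in for residuals of Y_{t-1}
    n = R0.shape[0]
    S00 = R0.T @ R0 / n
    S11 = R1.T @ R1 / n
    S01 = R0.T @ R1 / n
    # eigenvalues of S11^{-1} S10 S00^{-1} S01: squared canonical correlations
    A = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    lam = np.sort(np.linalg.eigvals(A).real)[::-1]
    g = len(lam)
    trace = [-n * np.log(1 - lam[r:]).sum() for r in range(g)]
    lmax = [-n * np.log(1 - lam[r]) for r in range(g)]
    return lam, trace, lmax

# g = 2 with one cointegrating relation: y2 is a random walk, y1 tracks it.
rng = np.random.default_rng(11)
n = 500
y2 = np.cumsum(rng.standard_normal(n))
y1 = y2 + rng.standard_normal(n)
lam, trace, lmax = johansen_stats(np.column_stack([y1, y2]))
print(lam, trace, lmax)
```

With one cointegrating relation, the largest eigenvalue is well away from zero, so the r = 0 statistics are very large, while the r = 1 statistics are small, pointing to exactly one cointegrating vector.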
Like those of unit-root tests and single-equation cointegration tests, the
well be desirable to impose restrictions on the matrix B, and the distributions
also depend on what restrictions, if any, are imposed.
A further complication is that some of the I(1) variables may be known not
variables in one part as exogenous and those in the other part as potentially cointegrated. The distributions of the test statistics then depend on how many exogenous variables there are. For details, see Harbo, Johansen, Nielsen, and Rahbek (1998) and Pesaran, Shin, and Smith (2000).
[Figure 14.6: Asymptotic densities of some λmax tests, for r = 0, 1, 2, 3, 4, 5.]
that r = 0, 1, 2, 3, 4, 5 under one popular assumption about B, namely, that
by Osterwald-Lenum (1992). We see from the figure that the mean and
becomes more symmetrical. The mean and variance of the trace statistic,
rapidly as g − r increases. Figure 14.6 is based on results from MacKinnon, Haug, and Michelis (1999), which provides programs that can be used to compute critical values and P values for all the standard cases, including systems with exogenous I(1) variables.
combinations of them. We will not take the time to prove this important property, but it is a reasonably straightforward consequence of the definitions given in this section. Intuitively, it is a consequence of the fact that no particular variable or linear combination of variables is singled out in the specification of the VAR (14.62), in contrast to the specifications of the regressions used to implement EG and ECM tests.
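This invariance is easy to check numerically: replacing the variables by any nonsingular set of linear combinations of them leaves the eigenvalues, and hence the trace and λmax statistics, unchanged. A sketch (illustrative; NumPy assumed, simplest case with no deterministics or lagged differences):

```python
import numpy as np

def johansen_eigs(Y):
    """Squared canonical correlations between Delta Y_t and Y_{t-1},
    i.e. the eigenvalues behind the trace and lambda-max statistics."""
    R0, R1 = np.diff(Y, axis=0), Y[:-1]
    n = len(R0)
    S00, S11, S01 = R0.T @ R0 / n, R1.T @ R1 / n, R0.T @ R1 / n
    A = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    return np.sort(np.linalg.eigvals(A).real)

rng = np.random.default_rng(13)
Y = np.cumsum(rng.standard_normal((400, 3)), axis=0)
T = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [3.0, 0.0, 1.0]])  # nonsingular
lam1 = johansen_eigs(Y)
lam2 = johansen_eigs(Y @ T)        # replace variables by linear combinations
print(np.allclose(lam1, lam2))     # True: the statistics are invariant
```

Algebraically, transforming Y to YT turns the eigenvalue matrix into T⁻¹AT, a similar matrix with the same eigenvalues, which is why no normalization choice matters here.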
14.7 Final Remarks
This chapter has provided a reasonably brief introduction to the modeling of nonstationary time series, a topic which has engendered a massive literature in a relatively short period of time. A deeper treatment would have required a book instead of a chapter. The asymptotic theory that is applicable when some variables have unit roots is very different from the conventional asymptotic theory that we have encountered in previous chapters. Moreover, the enormous number of different tests, each with its own nonstandard limiting distribution, can be intimidating. However, we have seen that the same fundamental ideas underlie many of the techniques for both estimation and hypothesis testing in models that involve variables which have unit roots.

14.8 Exercises

14.1 Calculate the autocovariance E(w_t w_s), s < t, of the standardized random walk given by (14.01).
14.2 Suppose that (1 − ρ(L))u_t = e_t is the autoregressive representation of the series u_t, where e_t is white noise, and ρ(z) is a polynomial of degree p with no constant term. If u_t has exactly one unit root, show that the polynomial 1 − ρ(z) can be factorized as

1 − ρ(z) = (1 − z)(1 − ρ₀(z)),

where 1 − ρ₀(z) is a polynomial of degree p − 1 with no constant term and all its roots strictly outside the unit circle. Give the autoregressive representation of the first-differenced series (1 − L)u_t, and show that it implies that this series
by inductive arguments. That is, show directly that the results are true for n = 1, and then for each one show that, if the result is true for a given n, it is also true for n + 1.
14.4 Consider the following random walk, in which a second-order polynomial in time is included in the defining equation:

y_t = β₀ + β₁t + β₂t² + y_{t−1} + u_t,   u_t ∼ IID(0, σ²).

Show that y_t can be generated in terms of a standardized random walk w_t that satisfies (14.01) by the equation

y_t = y₀ + β₀t + (1/2)β₁t(t + 1) + (1/6)β₂t(t + 1)(2t + 1) + σw_t.

Can you obtain a similar result for the case in which the second-order polynomial is replaced by a polynomial of degree p in time?
14.5 For sample sizes of 50, 100, 200, 400, and 800, generate N pairs of data from
the DGP
y_t = ρ₁y_{t−1} + u_{t1},  y₀ = 0,  u_{t1} ∼ NID(0, 1),
x_t = ρ₂x_{t−1} + u_{t2},  x₀ = 0,  u_{t2} ∼ NID(0, 1),

for the following values of ρ₁ and ρ₂: −0.7, 0.0, 0.7, and 1. Then run regression (14.12) and record the proportion of the time that the ordinary t test for β₂ = 0 rejects the null hypothesis at the .05 level. Thus you need to perform 16 experiments for each of 5 sample sizes. Choose a reasonably large value of N, but not so large that you use an unreasonable amount of computer time. The smallest value that would probably make sense is N = 10,000.

For which values of ρ₁ and ρ₂ does it seem plausible that the t test based on the spurious regression (14.12) rejects the correct proportion of the time asymptotically? For which values is it clear that the test overrejects asymptotically? Are there any values for which it appears that the test underrejects asymptotically?
Was it really necessary to run all 16 experiments? Explain.
14.6 Repeat the previous exercise using regression (14.13) instead of regression (14.12). For which values of ρ₁ and ρ₂ does it seem plausible that the t test based on this regression rejects the correct proportion of the time asymptotically? For which values is it clear that the test overrejects asymptotically? Are there any values for which it appears that the test underrejects asymptotically?

14.7 Repeat some of the experiments in Exercise 14.5 with ρ₁ = ρ₂ = 0.8, using
a HAC covariance matrix estimator instead of the OLS covariance matrix estimator for the computation of the t statistic. A reasonable rule of thumb is to set the lag truncation parameter p equal to three times the fourth root of the sample size, rounded to the nearest integer. You should also do a few experiments with sample sizes between 1,000 and 5,000 in order to see how slowly the behavior of the t test approaches its nominal asymptotic behavior.
14.8 Repeat Exercise 14.7 with unit root processes in place of stationary AR(1) processes. You should find that the use of a HAC estimator alleviates the extent of spurious regression, in the sense that the probability of rejection tends to 1 more slowly as n → ∞. Intuitively, why should using a HAC estimator work, even if only in very large samples, with stationary AR(1) processes but not with unit root processes?
14.9 The HAC estimators used in the preceding two exercises are estimates of the covariance matrix
(X⊤X)⁻¹X⊤ΩX(X⊤X)⁻¹,   (14.76)

where Ω is the true covariance matrix of the error terms. Do just a few experiments for sample sizes of 20, 40, and 60, with AR(1) variables in some and unit root variables in others, in which you use the true Ω in (14.76) rather than using a HAC estimator. Hint: The result of Exercise 7.10 is useful for the construction of X⊤ΩX. You should find that the rejection rate is very close to nominal even for these small samples.
14.10 Consider the model with typical DGP
where w_t is the standardized random walk (14.02). Demonstrate that any pair of terms from either sum on the right-hand side of the above expression are uncorrelated. Let the fourth moment of the white-noise process ε_t be m₄. Then show that the variance of Σ_{t=1}^n
of order n⁴ as n → ∞. Hint: Use the results of Exercise 14.3.
14.12 Consider the standardized Wiener process W(r) defined by (14.26). Show that, for 0 ≤ r₁ < r₂ ≤ r₃ < r₄ ≤ 1, W(r₂) − W(r₁) and W(r₄) − W(r₃) are independent. This property is called the property of independent increments of the Wiener process. Show that the covariance of W(r) and W(s) is equal to min(r, s).

The process G(r), r ∈ [0, 1], defined by G(r) = W(r) − rW(1), where W(r) is a standardized Wiener process, is called a Brownian bridge. Show that G(r) ∼ N(0, r(1 − r)), and that the covariance of G(r) and G(s) is s(1 − r) for r > s.
14.13 By using arguments similar to those leading to the result (14.29), demonstrate the result (14.30). For this purpose, the result of Exercise 4.8 may be helpful.
14.14 Show that, if w_t is the standardized random walk (14.01), Σ_{t=1}^n w_t is of order n^{3/2} as n → ∞. By use of the definition (14.28) of the Riemann integral, show that

plim_{n→∞} n^{−3/2} Σ_{t=1}^n w_t = ∫₀¹ W(r) dr.

Demonstrate that this plim is distributed as N(0, 1/3).
Show that the probability limit of the formula (14.20) for the statistic z_c can be written in terms of a standardized Wiener process W(r) as
14.15 The file intrates-m.data contains several monthly interest rate series for the United States from 1955 to 2001. Let R_t denote the 10-year government bond rate. Using data for 1957 through 2001, test the hypothesis that this series has a unit root with ADF τ_c, τ_ct, τ_ctt, z_c, z_ct, and z_ctt tests, using whatever value(s) of p seem reasonable.
14.16 Consider the simplest ADF testing regression

∆y_t = β₀y_{t−1} + δ∆y_{t−1} + e_t,

and suppose that the data are generated by the simplest random walk: y_t = w_t, where w_t is the standardized random walk (14.01). If P₁ is the orthogonal projection on to the lagged dependent variable ∆y_{t−1}, and if w₋₁ is the n vector with typical element w_{t−1}, show that the expressions

Ax_i = λ_i x_i. Prove that the x_i are linearly independent.
14.18 Show that the expression n⁻¹ Σ_{t=1}^n v_{t1}v_{t2}, where v_{t1} and v_{t2} are given by (14.41), has an expectation and a variance which both tend to finite limits as n → ∞. For the variance, the easiest way to proceed is to express the v_{ti} as in (14.41), and to count the number of nonzero contributions to the variance.
14.19 If the p × q matrix A has rank r, where r ≤ p and r ≤ q, show that there exist a p × r matrix B and a q × r matrix C, both of full column rank r, such that A = BC⊤. Show further that any matrix of the form BC⊤, where B is p × r with r ≤ p and C is q × r with r ≤ q, has rank r if both B and C have rank r.
14.20 Generate two I(1) series y_{t1} and y_{t2} using the DGP given by (14.45) with x₁₁ = x₂₁ = 1, x₁₂ = 0.5, and x₂₂ = 0.3. The series v_{t1} and v_{t2} should be generated by (14.41), with λ₁ = 1 and λ₂ = 0.7, the series e_{t1} and e_{t2} being white noise with a contemporaneous covariance matrix

Σ = [ 1.0  0.7 ]
    [ 0.7  1.5 ].
Perform a set of simulation experiments for sample sizes n = 30, 50, 100, 200, and 500 in which the parameter η₂ of the stationary linear combination y_{t1} − η₂y_{t2} is estimated first by (14.46), and then as −δ̂₂/δ̂₁ from the regression (14.52). You should observe that the first estimator is substantially more biased than the second.

Verify the super-consistency of both estimators by computing the first two moments of n(η̂₂ − η₂) and showing that they are roughly constant as n varies, at least for larger values of n.
of the parameter η₂ of the cointegration relation. The easiest way to proceed
is to solve the quadratic equation (14.78), choosing the root for which κ is
smallest.
14.25 Let the p × p matrix A be symmetric, and suppose that A has two distinct eigenvalues λ₁ and λ₂, with corresponding eigenvectors z₁ and z₂. Prove that z₁ and z₂ are orthogonal.

Use this result to show that there is a g × g matrix Z, with Z⊤Z = I (that is, Z is an orthogonal matrix), such that AZ = ZΛ, where Λ is a diagonal matrix the entries of which are the eigenvalues of A.
14.26 Let r_t denote the logarithm of the 10-year government bond rate, and let s_t denote the logarithm of the 1-year government bond rate, where monthly data on both rates are available in the file intrates-m.data. Using data for 1957 through 2001, use whatever augmented Engle-Granger τ tests seem appropriate to test the null hypothesis that these two series are not cointegrated.

14.27 Consider once again the Canadian consumption data in the file consumption.data, for the period 1953:1 to 1996:4. Perform a variety of appropriate tests of the hypotheses that the levels of consumption and income have unit roots. Repeat the exercise for the logs of these variables.
Trang 21If you fail to reject the hypotheses that the levels or the logs of these variables have unit roots, proceed to test whether they are cointegrated, using two ver- sions of the EG test procedure, one with consumption, the other with income,
as the regressand in the cointegrating regression Similarly, perform two sions of the ECM test Finally, test the null hypothesis of no cointegration using Johansen’s VAR-based procedure.
We have already discussed a large number of procedures that can be used as specification tests. These include t and F tests for omitted variables and for parameter constancy (Section 4.4), along with similar tests for nonlinear regression models (Section 6.7) and IV regression (Section 8.5), tests for heteroskedasticity (Section 7.5), tests for serial correlation (Section 7.7), tests of common factor restrictions (Section 7.9), DWH tests (Section 8.7), tests of overidentifying restrictions (Sections 8.6, 9.4, 9.5, 12.4, and 12.5), and the three classical tests for models estimated by maximum likelihood, notably LM tests (Section 10.6).

In this chapter, we discuss a number of other procedures that are designed for testing the specification of econometric models. Some of these procedures explicitly involve testing a model against a less restricted alternative. Others do not make the alternative explicit and are intended to have power against a large number of plausible alternatives. In the next section, we discuss a variety of tests that are based on artificial regressions. Then, in Section 15.3, we discuss nonnested hypothesis tests, which are designed to test the specification of a model when alternative models are available. In Section 15.4, we discuss model selection based on information criteria. Finally, in Section 15.5, we introduce the concept of nonparametric estimation. Nonparametric methods avoid specification errors caused by imposing an incorrect functional form, and the validity of parametric models can be checked by comparing them with nonparametric ones.
15.2 Specification Tests Based on Artificial Regressions
In previous chapters, we have encountered numerous examples of artificial regressions. These include the Gauss-Newton regression (Section 6.7) and its heteroskedasticity-robust variant (Section 6.8), the OPG regression (Section 10.5), and the binary response model regression (Section 11.3). We can write any of these artificial regressions as

    r(θ) = R(θ)b + residuals,    (15.01)

where θ is a parameter vector of length k, r(θ) is a vector, often but by no means always of length equal to the sample size n, and R(θ) is a matrix with as many rows as r(θ) and k columns. For example, in the case of the GNR, r(θ) is a vector of residuals, written as a function of the data and parameters, and R(θ) is a matrix of derivatives of the regression function with respect to the parameters.
In order for (15.01) to be a valid artificial regression, the vector r(θ) and the matrix R(θ) must satisfy certain properties, which all of the artificial regressions we have studied do satisfy. These properties are given in outline in Exercise 8.20, and we restate them more formally here. We use a notation that was introduced in Section 9.5, whereby M denotes a model, µ denotes a DGP contained in M, and plim_µ denotes a probability limit taken under the DGP µ. See the discussion in Section 9.5.
An artificial regression of the form (15.01) corresponds to a model M with parameter vector θ, and to a root-n consistent, asymptotically normal estimator θ̂ of that parameter vector. It must have the following properties:

• The artificial regressand and the artificial regressors are orthogonal when they are evaluated at θ̂, that is, R⊤(θ̂)r(θ̂) = 0.

• The asymptotic covariance matrix of n^(1/2)(θ̂ − θ_µ) is consistently estimated either by

    n(R⊤(θ̂)R(θ̂))^(−1),    (15.02)

where n is the sample size, and N is the number of rows of r and R, or by

    n s²(R⊤(θ̂)R(θ̂))^(−1),    (15.03)

where s² denotes the estimated error variance from the artificial regression.
• The artificial regression allows for one-step estimation, in the sense that, if ´b denotes the vector of OLS parameter estimates obtained by running (15.01) with the regressand and regressors evaluated at any root-n consistent estimator ´θ, then

    plim n^(1/2)(´θ + ´b − θ̂) = 0.    (15.04)
    n→∞

The Gauss-Newton regression for a nonlinear regression model, together with the least-squares estimator of the parameters of the model, satisfies the above conditions. For the GNR, the asymptotic covariance matrix is given by equation (15.03). The OPG regression for any model that can be estimated by maximum likelihood, together with the ML estimator of its parameters, also satisfies the above conditions, but the asymptotic covariance matrix is given by equation (15.02). See Davidson and MacKinnon (2001) for a more detailed discussion of artificial regressions.
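Although the text proceeds analytically, the defining properties of an artificial regression are easy to check numerically. The following Python sketch (the model y_t = θ1 x_t^θ2 + u_t, the seed, and all parameter values are invented for the illustration, not taken from the text) obtains NLS estimates by iterating damped Gauss-Newton steps, then verifies the orthogonality property R⊤(θ̂)r(θ̂) = 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.1, 2.0, n)
y = 2.0 * x ** 0.5 + 0.1 * rng.standard_normal(n)  # true theta = (2, 0.5)

def resid(theta):
    # r(theta): residuals of the model y_t = theta1 * x_t**theta2 + u_t
    return y - theta[0] * x ** theta[1]

def jac(theta):
    # R(theta): derivatives of the regression function w.r.t. theta1, theta2
    return np.column_stack([x ** theta[1],
                            theta[0] * x ** theta[1] * np.log(x)])

theta = np.array([1.0, 1.0])
for _ in range(500):
    # b is the OLS coefficient vector from regressing r(theta) on R(theta);
    # a damped step is taken for numerical safety
    R, r = jac(theta), resid(theta)
    b = np.linalg.lstsq(R, r, rcond=None)[0]
    theta = theta + 0.5 * b

R, r = jac(theta), resid(theta)
print(theta)                      # close to (2, 0.5)
print(np.abs(R.T @ r).max())      # orthogonality at the NLS estimates: ~ 0
```

The one-step property (15.04) can be checked in the same framework: starting from any root-n consistent ´θ, a single Gauss-Newton step lands asymptotically on the NLS estimates.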
Now consider the artificial regression

    r(θ) = R(θ)b + Z(θ)c + residuals,    (15.05)

where Z(θ) is a matrix of r additional regressors. In Section 6.7, we encountered instances of regressions like (15.05), where both R(θ) and Z(θ) were matrices of derivatives, with R(θ) corresponding to the parameters of a restricted version of the model and Z(θ) corresponding to additional parameters that appear only in the unrestricted model. In such a case, running an artificial regression like (15.05) and testing the hypothesis that c = 0 provides a way of testing those restrictions; recall the discussion in Section 6.7, in which the regressions were evaluated at the estimates from the restricted model.
A great many specification tests may be based on artificial regressions of the form (15.05). The null hypothesis under test is that the model M to which regression (15.01) corresponds is correctly specified. It is not necessary that Z(θ) should correspond to any explicit alternative model: any matrix Z(θ) which satisfies the following three conditions can be used in (15.05) to obtain a valid specification test.
R1 For every DGP µ ∈ M,

    plim_µ n^(−1) Z⊤(θ_µ)r(θ_µ) = 0.    (15.06)
R2 Let r_µ, R_µ, and Z_µ denote r(θ_µ), R(θ_µ), and Z(θ_µ), respectively. Then, for any µ ∈ M, if the asymptotic covariance matrix is given by (15.02), the matrix

    plim_µ n^(−1) [R_µ Z_µ]⊤[R_µ Z_µ]    (15.07)

must be the asymptotic covariance matrix of the vector n^(−1/2)[R_µ Z_µ]⊤r_µ, which is required to be asymptotically multivariate normal. If instead the asymptotic covariance matrix is given by equation (15.03), then the matrix (15.07) must be multiplied by the probability limit of the estimated error variance from the artificial regression.
R3 The Jacobian matrix containing the partial derivatives of the elements of n^(−1)Z⊤(θ)r(θ) with respect to the elements of θ must, when evaluated at θ_µ, satisfy

    plim_µ n^(−1) ∂(Z⊤(θ)r(θ))/∂θ⊤ |_(θ=θ_µ) = −plim_µ n^(−1) Z_µ⊤R_µ.

Since a proof of the sufficiency of these conditions requires a good deal of algebra, we relegate it to a technical appendix.
When these conditions are satisfied, we can test the correct specification of the model M against an alternative in which equation (15.06) does not hold by testing the hypothesis that c = 0 in regression (15.05). If the asymptotic covariance matrix is given by equation (15.02), then the difference between the explained sum of squares from regression (15.05) and the ESS from regression (15.01) is asymptotically distributed as χ²(r) under the null hypothesis. This is not true when the asymptotic covariance matrix is given by equation (15.03), in which case we can use an asymptotic t test if r = 1 or an asymptotic F test if r > 1.
The RESET Test

One of the oldest specification tests for linear regression models, but one that is still widely used, is the regression specification error test, or RESET test, which was originally proposed by Ramsey (1969). The idea is to test the null hypothesis that the linear regression model

    y_t = X_t β + u_t    (15.08)

is correctly specified. One first estimates (15.08) by OLS, so as to obtain fitted values ŷ_t ≡ X_t β̂, and then runs the augmented regression

    y_t = X_t β + γ ŷ_t² + u_t.    (15.09)

The test statistic is the ordinary t statistic for γ = 0.
At first glance, the RESET procedure may not seem to be based on an artificial regression. But it is easy to show (Exercise 15.2) that the t statistic for γ = 0 in regression (15.09) is identical to the t statistic for c = 0 in the GNR

    û = Xb + c ŷ² + residuals,    (15.10)

where û is the vector of OLS residuals from (15.08), and ŷ² denotes the vector with typical element ŷ_t².
It is straightforward to verify that the three conditions for a valid specification test regression are satisfied. First, condition R1 holds, because under the null hypothesis the extra regressor is asymptotically uncorrelated with the error terms. Condition R2 is equally easy to check. For condition R3, let z(β) be the n-vector with typical element (X_t β)². Then n^(−1/2)z⊤(β0)u, where β0 denotes the true parameter vector, is a weighted sum of the error terms which, by a central limit theorem, is asymptotically normal with mean zero and finite variance. Thus condition R3 holds, and the RESET test, implemented by either of the regressions (15.09) or (15.10), is seen to be asymptotically valid.

Actually, the RESET test is not merely valid asymptotically. It is exact in finite samples whenever the model that is being tested satisfies the strong assumptions needed for t statistics to have their namesake distribution; see Section 4.4 for a statement of those assumptions. To see why, note that the fitted values ŷ depend on y only through P_X y which, under those assumptions, is independent of the residual vector û = M_X y. As readers are invited to show in Exercise 15.3, this implies that the t statistic for c = 0 yields an exact test under classical assumptions.
Like most specification tests, the RESET procedure is designed to have power against a variety of alternatives. However, it can also be derived as a test against a specific alternative. Suppose that

    y_t = τ(X_t β, δ) + u_t,    (15.11)

where the scalar function τ(x, δ) has the properties that τ(x, 0) = x and that it vanishes at x = 0. A simple example of such a function is

    τ(x, δ) = (e^(δx) − 1)/δ.

We first encountered the family of functions τ(·) in Section 11.3, in connection with tests of the functional form of binary response models.
By l'Hôpital's Rule, the nonlinear regression model (15.11) reduces to the linear regression model (15.08) when δ = 0. It is not hard to show, using equations (11.29), that the GNR for testing the null hypothesis that δ = 0 has ŷ_t², up to a constant factor, as its extra regressor, and a constant factor leaves the t statistic unchanged. Thus RESET can be derived as a test of δ = 0 in the nonlinear regression model (15.11). For more details, see MacKinnon and Magee (1990), which also discusses some other specification tests that can be used to test (15.08) against nonlinear models involving transformations of the dependent variable.

Some versions of the RESET procedure add the cube, and sometimes also the fourth power, of the fitted values to regression (15.09). Adding these extra powers cannot be justified if the alternative is (15.11), but it may give the test more power against some other alternatives. In general, however, we recommend the simplest version of the test, namely, the t test for γ = 0 in regression (15.09).
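As a concrete illustration, the RESET t statistic of regression (15.09) can be computed in a few lines of NumPy. This is only a sketch: the two data-generating processes, the seed, and all parameter values are invented for the example and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = rng.uniform(1.0, 3.0, n)
X = np.column_stack([np.ones(n), x])

def reset_tstat(y, X):
    """t statistic for gamma = 0 in y = X beta + gamma * yhat^2 + u,
    where yhat are the OLS fitted values from regressing y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    yhat = X @ beta
    Z = np.column_stack([X, yhat ** 2])      # augmented regressor matrix
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    u = y - Z @ coef
    s2 = u @ u / (len(y) - Z.shape[1])       # OLS error variance estimate
    cov = s2 * np.linalg.inv(Z.T @ Z)
    return coef[-1] / np.sqrt(cov[-1, -1])

y_null = 1.0 + 2.0 * x + rng.standard_normal(n)          # (15.08) is true
y_alt = np.exp(0.1 + 1.1 * x) + rng.standard_normal(n)   # nonlinear DGP
t_null, t_alt = reset_tstat(y_null, X), reset_tstat(y_alt, X)
print(t_null, t_alt)
```

Under the null, the statistic behaves like a drawing from N(0, 1); under the exponential alternative, the squared fitted values pick up the curvature that the linear model misses, and the statistic lies far out in the tail.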
Conditional Moment Tests

If a model M is correctly specified, many random quantities that are functions of the dependent variable(s) should have expectations of zero. Often, these expectations are taken conditional on some information set. For example, the error terms of a regression model should have mean zero conditional on the corresponding information sets Ω_t for all observations t. This sort of requirement, following from the hypothesis that M is correctly specified, is known as a moment condition.

A moment condition is purely theoretical. However, we can often calculate the empirical counterpart of a moment condition and use it as the basis of a conditional moment test. For a linear regression, of course, we already know how to test such a condition: we simply add an appropriate variable to the regression and use the t statistic for this additional regressor to have a coefficient of 0.
More generally, consider a moment condition of the form

    E_µ(m_t(y_t, θ)) = 0,    (15.12)

where m_t(·) is a moment function associated with observation t of the model M. As the notation implies, the expectation in (15.12) is computed using a DGP in M with parameter vector θ. The t subscript on the moment function indicates that it may depend on exogenous or predetermined variables. Equation (15.12) implies that the expectation of m_t is zero for every observation. With only one sample, we obviously cannot test whether condition (15.12) holds for each observation, but we can test whether it holds on average. Since we will be interested in asymptotic tests, it is natural to consider the probability limit of the average. Thus we can replace (15.12) by the somewhat weaker condition

    plim_µ n^(−1) Σ_t m_t(y_t, θ) = 0.    (15.13)

The sample average n^(−1) Σ_t m_t(y_t, θ̂), evaluated at the parameter estimates θ̂, is referred to as an empirical moment. We wish to test whether its value is significantly different from zero.
If the moment functions were evaluated at the true parameter vector θ_µ, their variance could be consistently estimated by the usual sample variance, and a test would be straightforward. But because the empirical moment is evaluated at θ̂, which depends on y, we have to take this parameter uncertainty into account.
The easiest way to see the effects of parameter uncertainty is to consider conditional moment tests based on artificial regressions. Suppose there is an artificial regression of the form (15.01) in correspondence with the model M and its estimator θ̂, so that the moment functions can be incorporated into that artificial regression. If the number N of artificial observations is not equal to the sample size n, some algebraic manipulation may be needed in order to express the moment functions in a convenient form, but we ignore such problems here and suppose that N = n.

Now consider the artificial regression of which the typical observation is

    r_t(θ) = R_t(θ)b + c m_t(y_t, θ) + residual.    (15.16)
The t statistic for c = 0 in this regression is a valid test statistic whenever equation (15.16) is evaluated at a root-n consistent estimator of θ. By solving the normal equations of the artificial regression and taking probability limits, it is not difficult to see that this t statistic is actually testing the hypothesis that

    plim n^(−1) m⊤(θ_µ) M_(R_µ) r_µ = 0,    (15.17)
    n→∞

where m(θ) denotes the vector with typical element m_t(y_t, θ), and M_(R_µ) is the orthogonal projection off the columns of R_µ. This hypothesis is implied by, but weaker than, the moment condition that we wish to test, as can be seen from the following argument: expanding the empirical moment around θ_µ and using conditions R1 through R3 shows that

    n^(−1/2) m⊤(θ̂) r(θ̂) = n^(−1/2) m⊤(θ_µ) M_(R_µ) r_µ + o_p(1),    (15.18)

whereas the empirical moment evaluated at θ_µ itself has leading-order term n^(−1/2) m⊤(θ_µ) r_µ. It is clear from expression (15.18) that, as we indicated above, the projection M_(R_µ) appears in the leading-order term for the former empirical moment but not in the leading-order term for the latter one. The reduction in variance caused by the projection is a phenomenon analogous to the loss of degrees of freedom in Hansen-Sargan tests caused by the need to estimate parameters; recall the discussion in Section 9.4. Indeed, since moment functions are zero functions, conditional moment tests can be interpreted as tests of overidentifying restrictions.
Examples of Conditional Moment Tests

Suppose the model under test is the nonlinear regression model (6.01), and the moment functions can be written as

    m_t(y_t, β) = z_t(β) u_t(β),    (15.19)

where u_t(β) ≡ y_t − x_t(β), and z_t(β) is a scalar function of exogenous or predetermined variables and the parameters. We are using β instead of θ to denote the vector of parameters here because the model is a regression model. A test of the moment condition (15.13) can be based on the following Gauss-Newton regression:

    u_t(β̂) = X_t(β̂)b + c z_t(β̂) + residual,    (15.20)

where X_t(β) is the row vector of derivatives of the regression function x_t(β) with respect to β.
In order to show that the t statistic for c = 0 in (15.20) is asymptotically valid under the usual regularity conditions for nonlinear regression, all we have to show is that conditions R1-R3 are satisfied by the GNR (15.20). Condition R1 is trivially satisfied, since what it requires is precisely what we wish to test. Condition R2, for the covariance matrix (15.03), follows easily from the fact that z_t(β) depends only on exogenous or predetermined variables.

Condition R3 requires a little more work, however. Let z(β) and u(β) be the n-vectors with typical elements z_t(β) and u_t(β). The Jacobian of n^(−1)z⊤(β)u(β) with respect to β is

    n^(−1) Σ_t u_t(β) ∂z_t(β)/∂β − n^(−1) z⊤(β)X(β).    (15.21)

Since the elements of z(β) are predetermined, so are those of its derivatives with respect to β. It then follows from a law of large numbers that the first term of expression (15.21) tends to zero in probability, so that the Jacobian tends to −plim n^(−1) z⊤(β)X(β), which is condition R3 for the GNR (15.20). Thus we conclude that this GNR can be used to test the moment condition (15.13).
The above reasoning can easily be generalized to allow us to test more than one moment condition at a time. Let Z(β) denote an n × r matrix of functions of the data, each column of which is asymptotically orthogonal to the vector u under the null hypothesis that is to be tested, in the sense of condition (15.06). Then the GNR

    u(β̂) = X(β̂)b + Z(β̂)c + residuals

can be used to test the r moment conditions jointly: the explained sum of squares, divided by a consistent estimate of the error variance, is asymptotically distributed as χ²(r) under the null hypothesis. An ordinary F test for c = 0 is also asymptotically valid.
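For a linear model, this GNR amounts to regressing the OLS residuals on X and Z, so the joint test is easy to sketch in Python. Everything below (the DGP, the two made-up moment variables in Z, the seed) is invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = rng.standard_normal((n, 2))       # r = 2 candidate moment variables
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta                       # restricted (OLS) residuals

# GNR for the joint conditions plim (1/n) Z'u = 0: regress u on [X Z].
# ESS divided by u'u/n equals n times the uncentred R-squared,
# which is asymptotically chi-squared with 2 degrees of freedom.
W = np.column_stack([X, Z])
b = np.linalg.lstsq(W, u, rcond=None)[0]
ssr = np.sum((u - W @ b) ** 2)
stat = n * (u @ u - ssr) / (u @ u)
print(stat)
```

Since the model here is correctly specified, the statistic should typically fall below the 5% critical value of χ²(2), which is about 5.99; the F test for c = 0 in the same regression is asymptotically equivalent.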
Conditional moment tests based on the GNR are often useful for linear and nonlinear regression models, but they evidently cannot be used when the GNR itself is not applicable. With models estimated by maximum likelihood, tests can be based on the OPG regression that was introduced in Section 10.5. This artificial regression is available whenever the ML estimator is root-n consistent and asymptotically normal; see Section 10.3.

The OPG regression was originally given in equation (10.72). It is repeated here for convenience with a minor change of notation:

    ι = G(θ)b + residuals.    (15.23)
The regressand is an n-vector of 1s, and the regressor matrix is the matrix of contributions to the gradient, with typical element defined by (10.26). The artificial regression corresponds to the model implicitly defined by the loglikelihood function. Let m(θ) denote the n-vector of moment functions with typical element m_t(y_t, θ), where once more the notation hides the dependence on the data. Then the testing regression is simplicity itself: we add m(θ) to regression (15.23) as an extra regressor, obtaining

    ι = G(θ)b + c m(θ) + residuals.    (15.24)
The test statistic is the t statistic on the extra regressor. The regressors here can be evaluated at any root-n consistent estimator, but it is most common to evaluate them at the ML estimates θ̂.

If several moment conditions are to be tested simultaneously, then we can form the n × r matrix M(θ), each column of which is a vector of moment functions. The testing regression is then

    ι = G(θ)b + M(θ)c + residuals.    (15.25)

Several asymptotically equivalent test statistics are available, including the explained sum of squares, n times the uncentered R², and the F statistic for c = 0. The first two of these statistics are asymptotically distributed as χ²(r) under the null hypothesis, as is r times the third. If the regressors in equation (15.25) are evaluated at the ML estimates θ̂, then the F statistic is asymptotically valid.
The artificial regression (15.23) is valid for a very wide variety of models. Condition R2 requires that we be able to apply a central limit theorem to the scaled sum of the contributions to the gradient and the moment functions, and it is rarely difficult to find a suitable central limit theorem. Condition R3 is also satisfied under very mild regularity conditions. What it requires is that the derivatives of the moment functions be related to the moment functions and the contributions to the gradient in a particular way. Formally, we require that

    plim_µ n^(−1) Σ_t ∂m_t(y_t, θ)/∂θ = −plim_µ n^(−1) Σ_t m_t(y_t, θ)G_t(y_t, θ),    (15.26)

where G_t is the t-th row of G(θ). It is not hard to show that equation (15.26) holds under the usual regularity conditions for ML estimation. This property and its use in conditional moment tests implemented by an OPG regression were first established by Newey (1985). It is straightforward to extend this result to the case in which we have a matrix M(θ) of moment functions.
As we noted in Section 10.5, many tests based on the OPG regression are prone to overreject the null hypothesis, sometimes very severely, in finite samples. It is therefore often a good idea to bootstrap conditional moment tests based on the OPG regression. Since the model under test is estimated by maximum likelihood, a fully parametric bootstrap is appropriate. It is generally quite easy to implement such a bootstrap, unless estimating the original model is unusually difficult or expensive.
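The OPG testing regression (15.24) is equally easy to sketch. Here, for a linear model estimated by ML under normality, we test the made-up moment condition E(u_t³) = 0 by adding m_t = û_t³ as the extra regressor; the data-generating process and seed are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([0.5, 1.0]) + rng.standard_normal(n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta
sig2 = u @ u / n                       # ML estimate of sigma^2

# G(theta): contributions to the gradient of the normal loglikelihood,
# evaluated at the ML estimates (each column sums to zero there)
G = np.column_stack([u[:, None] * X / sig2,               # scores for beta
                     (u ** 2 - sig2) / (2 * sig2 ** 2)])  # score for sigma^2
m = u ** 3                             # moment function being tested

iota = np.ones(n)
W = np.column_stack([G, m])            # regressors of (15.24)
b = np.linalg.lstsq(W, iota, rcond=None)[0]
ess = n - np.sum((iota - W @ b) ** 2)  # explained sum of squares
print(ess)                             # asymptotically chi^2(1) under the null
```

As the text warns, OPG-based statistics tend to overreject in finite samples, so in practice one would bootstrap a statistic like this parametrically rather than rely on the χ² critical value.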
Tests for Skewness and Kurtosis

One common application of conditional moment tests is checking the residuals from an econometric model for skewness and excess kurtosis. By “excess” kurtosis, we mean kurtosis in excess of 3, the kurtosis of the normal distribution; see Exercise 4.2. The presence of significant departures from normality may indicate that a model is misspecified, or it may indicate that we should use a different estimation method. For example, although least squares may still perform well in the presence of moderate skewness and excess kurtosis, it cannot be expected to do so when the error terms are extremely skewed or have very thick tails.

Both skewness and excess kurtosis are often encountered in returns data from financial markets, especially when the returns are measured over short periods of time. A good model should eliminate, or at least substantially reduce, the skewness and excess kurtosis that is generally evident in daily, weekly, and, to a lesser extent, monthly returns data. Thus one way to evaluate a model for financial returns, such as the ARCH models that were discussed in Section 13.5, is to test the residuals for skewness and excess kurtosis.
We cannot base tests for skewness and excess kurtosis in regression models on the GNR, because the GNR is designed only for testing against alternatives that involve the conditional mean of the dependent variable. There is no way to choose z_t(β) as a function of exogenous or predetermined variables in such a way that the moment function (15.19) corresponds to the condition we wish to test. Instead, one valid approach is to test the slightly stronger assumption that the error terms are normally distributed by using the OPG regression. We now discuss this approach and show that even simpler tests are available.

The OPG regression that corresponds to the linear regression model y_t = X_t β + u_t, with normal errors of variance σ², can be written as

    ι_t = (u_t(β)X_t/σ²)b1 + ((u_t²(β) − σ²)/σ³)b2 + residual.    (15.27)

Here u_t(β) ≡ y_t − X_t β, and the assumption that the error terms are normal implies that they are not skewed and do not suffer from excess kurtosis. To test the assumption that they are not skewed, the appropriate test regressor has typical element u_t³(β), and the test regression is

    ι_t = (u_t(β)X_t/σ²)b1 + ((u_t²(β) − σ²)/σ³)b2 + c u_t³(β) + residual.    (15.28)
This is just a special case of regression (15.24), and the test statistic is simply the t statistic for c = 0.

Regression (15.28) is unnecessarily complicated. First, observe that the test regressor is asymptotically orthogonal under the null to the regressor that corresponds to the parameter σ. To see this, evaluate the regressors at the true parameter values and note that, under normality,

    E(u_t³(u_t² − σ²)) = E(u_t⁵) − σ² E(u_t³) = 0,

and so we see that the t statistic for c = 0 in regression (15.28) is asymptotically unchanged if we simply omit the regressor corresponding to σ.
Second, the test regressor u_t³ is not asymptotically orthogonal to the regressors that correspond to β. However, the vector with typical element u_t is a linear combination of the regressors that correspond to β; recall the discussion in Section 2.4 in connection with the FWL Theorem. Thus, since we assumed that there is a constant term in the regression, the t statistic is unchanged if we replace the test regressor by u_t³ − 3σ²u_t, which is asymptotically orthogonal to all the regressors that correspond to β, as can be seen from the following calculation:

    E((u_t³ − 3σ²u_t)u_t X_t) = (E(u_t⁴) − 3σ² E(u_t²))X_t = (3σ⁴ − 3σ⁴)X_t = 0.

The above arguments imply that we can obtain a valid test simply by using the t statistic from the regression

    ι = c(u³ − 3σ²u) + residuals,    (15.30)

which is numerically identical to the t statistic for the sample mean of the single regressor here to be 0. Because the plim of the error variance is just 1, and since the regressor and regressand are asymptotically orthogonal, both of these t statistics are asymptotically equal to

    n^(−1/2) Σ_t (û_t³ − 3σ̂²û_t) / (E(u_t³ − 3σ²u_t)²)^(1/2).

Under normality, E(u_t⁶) = 15σ⁶, so that E(u_t³ − 3σ²u_t)² = (15 − 18 + 9)σ⁶ = 6σ⁶, and so the plim of the denominator is the square root of 6σ⁶. Moreover, because the residuals û_t sum to zero, Σ_t(û_t³ − 3σ̂²û_t) = Σ_t û_t³. We therefore arrive at the statistic

    τ3 = (1/(6n)^(1/2)) Σ_t e_t³,    (15.31)

where e_t ≡ û_t/σ̂ is a normalized residual. This statistic is asymptotically distributed as N(0, 1) under the null hypothesis that the error terms are normally distributed.
A similar argument applies to testing for excess kurtosis. The appropriate test regressor has typical element u_t⁴ − 3σ⁴, and it can be replaced by the modified regressor u_t⁴ − 6σ²u_t² + 3σ⁴, which is asymptotically orthogonal to the other regressors in (15.27), without changing the t statistic asymptotically. Under normality, E(u_t⁴ − 6σ²u_t² + 3σ⁴)² = 24σ⁸, and so running the test regression

    ι = c(e⁴ − 6e² + 3) + residuals,    (15.32)

which is defined in terms of the normalized residuals, provides an appropriate test statistic. As readers are invited to check in Exercise 15.8, this statistic is asymptotically equivalent to the simpler statistic

    τ4 = (1/(24n)^(1/2)) Σ_t (e_t⁴ − 3).    (15.33)

It is important that the denominator of the normalized residual be the ML estimate σ̂, computed with n rather than n − k in the denominator of σ̂². Like τ3, the statistic τ4 has an asymptotic N(0, 1) distribution under the null hypothesis of normality.
The squares of τ3 and τ4 are widely used as test statistics for skewness and excess kurtosis. However, we prefer to use the statistics themselves rather than their squares, since the statistic τ3 is positive when the residuals are skewed to the right and negative when they are skewed to the left. Similarly, the statistic τ4 is positive when there is positive excess kurtosis and negative when there is negative excess kurtosis.
It can be shown (see Exercise 15.8 again) that the test statistics (15.31) and (15.33) are asymptotically independent under the null. Therefore, a joint test for skewness and excess kurtosis can be based on the statistic

    τ3² + τ4²,    (15.34)

which is asymptotically distributed as χ²(2) under the null hypothesis. Tests of this type were proposed, in somewhat different forms, by Jarque and Bera (1980) and Kiefer and Salmon (1983); see also Bera and Jarque (1982). Many regression packages calculate these statistics as a matter of course.
The statistics τ3, τ4, and (15.34) depend solely on normalized residuals. This implies that, for a linear regression model with fixed regressors, they are pivotal under the null hypothesis of normality. Therefore, if we use the parametric bootstrap in this situation, we can obtain exact tests based on these statistics; see the discussion at the end of Section 7.7. Even for nonlinear regression models or models with lagged dependent variables, parametric bootstrap tests should work very much better than asymptotic tests.

These simple statistics are not appropriate, however, if the regression that furnishes the normalized residuals does not contain a constant or the equivalent. In such unusual cases, it is necessary to proceed differently, for instance, by using the full OPG regression (15.27) with one or two test regressors. The OPG regression can also be used to test for skewness and excess kurtosis in models that are not regression models, such as the models with ARCH errors that were discussed in Section 13.6.
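The statistics τ3, τ4, and their sum of squares are trivial to compute from the normalized residuals. A short sketch with simulated data (the regression design, parameter values, and seed are all invented for the example; note the ML estimate of σ², with n rather than n − k in the denominator):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)   # normal errors

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta
e = u / np.sqrt(u @ u / n)        # normalized residuals (ML sigma-hat)

tau3 = np.sum(e ** 3) / np.sqrt(6 * n)        # skewness statistic, asy N(0, 1)
tau4 = np.sum(e ** 4 - 3) / np.sqrt(24 * n)   # excess-kurtosis statistic, asy N(0, 1)
jb = tau3 ** 2 + tau4 ** 2                    # joint statistic, asy chi^2(2)
print(tau3, tau4, jb)
```

Replacing the normal errors with a thick-tailed draw such as rng.standard_t(5, n) drives τ4, and hence the joint statistic, far into the right tail.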
Information Matrix Tests

In Section 10.3, we first encountered the information matrix equality. This famous result, which is given in equation (10.34), tells us that, for a model estimated by maximum likelihood with parameter vector θ, the asymptotic information matrix, I(θ), is equal to minus the asymptotic Hessian, H(θ). The proof of this result, which was given in Exercises 10.6 and 10.7, depends on the DGP being a special case of the model. Therefore, we should expect that, in general, the information matrix equality does not hold when the model we are estimating is misspecified. This suggests that testing this equality is one way to test the specification of a statistical model. This idea was first