Chapter 15
Testing the Specification of Econometric Models

15.1 Introduction
As we first saw in Section 3.7, estimating a misspecified regression model generally yields biased and inconsistent parameter estimates. This is true for regression models whenever we incorrectly omit one or more regressors that are correlated with the regressors included in the model. Except in certain special cases, some of which we have discussed, it is also true for more general types of model and more general types of misspecification. This suggests that the specification of every econometric model should be thoroughly tested before we even tentatively accept its results.
We have already discussed a large number of procedures that can be used as specification tests. These include t and F tests for omitted variables and for parameter constancy (Section 4.4), along with similar tests for nonlinear regression models (Section 6.7) and IV regression (Section 8.5), tests for heteroskedasticity (Section 7.5), tests for serial correlation (Section 7.7), tests of common factor restrictions (Section 7.9), DWH tests (Section 8.7), tests of overidentifying restrictions (Sections 8.6, 9.4, 9.5, 12.4, and 12.5), and the three classical tests for models estimated by maximum likelihood, notably LM tests (Section 10.6).
In this chapter, we discuss a number of other procedures that are designed for testing the specification of econometric models. Some of these procedures explicitly involve testing a model against a less restricted alternative. Others do not make the alternative explicit and are intended to have power against a large number of plausible alternatives. In the next section, we discuss a variety of tests that are based on artificial regressions. Then, in Section 15.3, we discuss nonnested hypothesis tests, which are designed to test the specification of a model when alternative models are available. In Section 15.4, we discuss model selection based on information criteria. Finally, in Section 15.5, we introduce the concept of nonparametric estimation. Nonparametric methods avoid specification errors caused by imposing an incorrect functional form, and the validity of parametric models can be checked by comparing them with nonparametric ones.
15.2 Specification Tests Based on Artificial Regressions
In previous chapters, we have encountered numerous examples of artificial regressions. These include the Gauss-Newton regression (Section 6.7) and its heteroskedasticity-robust variant (Section 6.8), the OPG regression (Section 10.5), and the binary response model regression (Section 11.3). We can write any of these artificial regressions as

  r(θ) = R(θ)b + residuals,    (15.01)

where the regressand r(θ) and the matrix of regressors R(θ) depend on a parameter vector θ, and R(θ) has one column for each of the parameters.
In order for (15.01) to be a valid artificial regression, the vector r(θ) and the matrix R(θ) must satisfy certain properties, which all of the artificial regressions we have studied do satisfy. These properties are given in outline in Exercise 8.20, and we restate them more formally here. We use a notation that was introduced in Section 9.5, whereby M denotes a model, µ denotes a DGP contained in that model, and a plim with subscript µ indicates that the probability limit is calculated under the DGP µ. See the discussion in Section 9.5.
An artificial regression of the form (15.01) corresponds to a model M with parameter vector θ, and to a root-n consistent, asymptotically normal estimator θ̂ of that parameter vector, if it possesses the following properties:

• The artificial regressand and the artificial regressors are orthogonal when they are evaluated at θ̂, that is, R⊤(θ̂)r(θ̂) = 0.

• The estimated OLS covariance matrix from the artificial regression, evaluated at θ̂, provides a valid estimate of the covariance matrix of θ̂. The asymptotic covariance matrix of n^{1/2}(θ̂ − θ₀) is given either by

  plim_{n→∞} n(R⊤(θ₀)R(θ₀))^{-1},    (15.02)

where n is the sample size, and N, the number of rows of r and R, need not equal n, or by

  plim_{n→∞} s²n(R⊤(θ₀)R(θ₀))^{-1},    (15.03)

where s² denotes the error variance estimated by the artificial regression.

• The artificial regression allows for one-step estimation, in the sense that, if b́ denotes the vector of OLS parameter estimates obtained by running regression (15.01) with r and R evaluated at any root-n consistent estimator θ́, then

  plim_{n→∞} n^{1/2}(θ́ + b́ − θ̂) = 0,    (15.04)

so that the one-step estimator θ́ + b́ is asymptotically equivalent to θ̂.
The Gauss-Newton regression for a nonlinear regression model, together with the least-squares estimator of the parameters of the model, satisfies the above conditions. For the GNR, the asymptotic covariance matrix is given by equation (15.03). The OPG regression for any model that can be estimated by maximum likelihood, together with the ML estimator of its parameters, also satisfies the above conditions, but the asymptotic covariance matrix is given by equation (15.02). See Davidson and MacKinnon (2001) for a more detailed discussion of artificial regressions.
Now consider the artificial regression

  r(θ) = R(θ)b + Z(θ)c + residuals,    (15.05)

where Z(θ) is a matrix of r additional regressors. In earlier chapters, we encountered instances of regressions like (15.05), where both R(θ) and Z(θ) were matrices of derivatives, with R(θ) corresponding to the parameters of a restricted version of the model and Z(θ) corresponding to additional parameters that appear only in the unrestricted model. In such a case, if the model is estimated subject to the restrictions, running an artificial regression like (15.05) and testing the hypothesis that c = 0 provides a way of testing those restrictions; recall the discussion in Section 6.7, in which everything is evaluated at the estimates from the restricted model.
A great many specification tests may be based on artificial regressions of the form (15.05). The null hypothesis under test is that the model M to which regression (15.01) corresponds is correctly specified. It is not necessary that the columns of Z(θ) correspond to the parameters of any explicit alternative model: any matrix Z(θ) which satisfies the following three conditions can be used in (15.05) to obtain a valid specification test.

R1. For every DGP µ ∈ M,

  plim_{n→∞} (1/n) Z⊤(θ_µ)r(θ_µ) = 0,    (15.06)

where θ_µ denotes the parameter vector associated with µ.

R2. The vector n^{-1/2}Z⊤(θ̂)r(θ̂), which is required to be asymptotically multivariate normal, has asymptotic covariance matrix

  plim_{n→∞} (1/n) Z⊤(θ_µ)M_R(θ_µ)Z(θ_µ),    (15.07)

where M_R(θ) ≡ I − R(θ)(R⊤(θ)R(θ))^{-1}R⊤(θ), for any µ ∈ M, if the asymptotic covariance matrix is given by (15.02). If instead the asymptotic covariance matrix is given by equation (15.03), then the matrix (15.07) must be multiplied by the probability limit of the estimated error variance from the artificial regression.

R3. The Jacobian matrix containing the partial derivatives of the elements of (1/n)Z⊤(θ)r(θ) with respect to the elements of θ has probability limit equal to −plim_{n→∞} (1/n)Z⊤(θ_µ)R(θ_µ) for every µ ∈ M.

Since a proof of the sufficiency of these conditions requires a good deal of algebra, we relegate it to a technical appendix.
When these conditions are satisfied, we can test the correct specification of the model M against an alternative in which equation (15.06) does not hold by testing the hypothesis that c = 0 in regression (15.05). If the asymptotic covariance matrix is given by equation (15.02), then the difference between the explained sum of squares from regression (15.05) and the ESS from regression (15.01) is asymptotically distributed as χ²(r) under the null hypothesis. This is not true when the asymptotic covariance matrix is given by equation (15.03), in which case we can use an asymptotic t test if r = 1 or an asymptotic F test if r > 1.
The RESET Test
One of the oldest specification tests for linear regression models, but one that is still widely used, is the regression specification error test, or RESET test, which was originally proposed by Ramsey (1969). The idea is to test the null hypothesis that

  y = Xβ + u,   u ~ IID(0, σ²I),    (15.08)

where X is an n × k matrix that includes a constant, by running the regression

  y = Xβ + γŷ² + u,    (15.09)

where ŷ ≡ Xβ̂ is the vector of fitted values from OLS estimation of (15.08), and ŷ² denotes the vector with typical element ŷ_t². The test statistic is the ordinary t statistic for γ = 0.
At first glance, the RESET procedure may not seem to be based on an artificial regression. But it is easy to show (Exercise 15.2) that the t statistic for γ = 0 in regression (15.09) is identical to the t statistic for c = 0 in the GNR

  y − ŷ = Xb + cŷ² + residuals,    (15.10)

in which all quantities are evaluated at the OLS estimates of the null model (15.08).
To see that the RESET test is asymptotically valid, we check that the three conditions for a valid specification test regression are satisfied. First, condition R1 holds, because the squared fitted values are asymptotically orthogonal to the error terms under the null hypothesis. Condition R2 is equally easy to check. For condition R3, let z(β) be the n-vector with typical element (X_tβ)². The quantity n^{-1/2}z⊤(β₀)u, by a central limit theorem, is asymptotically normal with mean zero and finite variance. It is then straightforward to verify the condition on the Jacobian. Thus condition R3 holds, and the RESET test, implemented by either of the regressions (15.09) or (15.10), is seen to be asymptotically valid.

Actually, the RESET test is not merely valid asymptotically. It is exact in finite samples whenever the model that is being tested satisfies the strong assumptions needed for t statistics to have their namesake distribution; see Section 4.4 for a statement of those assumptions. To see why, note that the test regressor ŷ² depends on y only through the fitted values ŷ, which are independent of the OLS residuals under classical assumptions. As readers are invited to show in Exercise 15.3, this implies that the t statistic for c = 0 yields an exact test under classical assumptions.
Like most specification tests, the RESET procedure is designed to have power against a variety of alternatives. However, it can also be derived as a test against a specific alternative. Suppose that

  y = τ(Xβ, δ) + u,    (15.11)

where the scalar function τ(x, δ), applied element by element, is indexed by a parameter δ and equals zero at x = 0. A simple example of such a function is τ(x, δ) = (exp(δx) − 1)/δ.
We first encountered the family of functions τ(·) in Section 11.3, in connection with tests of the functional form of binary response models.

By l'Hôpital's Rule, the nonlinear regression model (15.11) reduces to the linear regression model (15.08) when δ = 0. It is not hard to show, using equations (11.29), that the GNR for testing the null hypothesis that δ = 0 is just regression (15.10). Thus RESET can be derived as a test of δ = 0 in the nonlinear regression model (15.11). For more details, see MacKinnon and Magee (1990), which also discusses some other specification tests that can be used to test (15.08) against nonlinear models involving transformations of the dependent variable. Some versions of the RESET procedure add the cube, and sometimes also the fourth power, of the fitted values to regression (15.09). Doing so serves no purpose if the alternative is (15.11), but it may give the test more power against some other alternatives. In general, however, we recommend the simplest version of the test, namely, the t test for γ = 0 in regression (15.09).
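For readers who want to see the mechanics, here is a minimal sketch of the RESET computation. It is illustrative code rather than anything from the original text, and it assumes that y is an n-vector and that X is an n × k matrix including a constant.

```python
import numpy as np

def reset_test(y, X):
    """t statistic for gamma = 0 in regression (15.09): y regressed on X
    and the squared fitted values from the null model (15.08)."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS on the null model
    yhat = X @ beta
    W = np.column_stack([X, yhat**2])                # regressors of (15.09)
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    resid = y - W @ coef
    s2 = resid @ resid / (n - k - 1)                 # OLS error variance estimate
    cov = s2 * np.linalg.inv(W.T @ W)
    return coef[-1] / np.sqrt(cov[-1, -1])           # t statistic for gamma = 0
```

Under the classical assumptions discussed above, this statistic can be compared with the Student's t distribution with n − k − 1 degrees of freedom.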
Conditional Moment Tests
If a model M is correctly specified, many random quantities that are functions of the dependent variable(s) should have expectations of zero. Often, these expectations are taken conditional on some information set. For example, the error terms of a regression model should have expectation zero conditional on appropriate information sets, one for each of the observations t. This sort of requirement, following from the hypothesis that M is correctly specified, is known as a moment condition.

A moment condition is purely theoretical. However, we can often calculate the empirical counterpart of a moment condition and use it as the basis of a conditional moment test. For a linear regression, of course, we already know how to perform such a test: we include the quantity in question as an additional regressor and use the t statistic for this additional regressor to have a coefficient of 0.
More generally, consider a moment condition of the form

  E_θ(m_t(y_t, θ)) = 0,    (15.12)

where m_t(·) is a moment function and θ is the parameter vector of the model M. As the notation implies, the expectation in (15.12) is computed using a DGP in M with parameter vector θ. The t subscript on the moment function indicates that it may depend on exogenous or predetermined variables. Equation (15.12) implies that the m_t(y_t, θ) are elementary zero functions in the sense of Section 9.5. We cannot test whether condition (15.12) holds for each observation, but we can test whether it holds on average. Since we will be interested in asymptotic tests, it is natural to consider the probability limit of the average. Thus we can replace (15.12) by the somewhat weaker condition

  plim_{n→∞} (1/n) Σ_{t=1}^{n} m_t(y_t, θ) = 0.    (15.13)

When the sample average on the left-hand side of (15.13) is evaluated at a vector of parameter estimates, it is referred to as an empirical moment. We wish to test whether its value is significantly different from zero.
If the moment functions could be evaluated at the true parameter vector, the empirical moment's variance could be consistently estimated by the usual sample variance. In practice, however, the empirical moment must be evaluated at a vector of estimates θ̂, and, on account of its dependence on y, we have to take this parameter uncertainty into account.
The easiest way to see the effects of parameter uncertainty is to consider conditional moment tests based on artificial regressions. Suppose there is an artificial regression of the form (15.01) in correspondence with the model M and the estimator θ̂, and suppose that the moment functions can be written as m_t(y_t, θ) = z_t(θ)r_t(θ), where r_t(θ) is the regressand of the artificial regression and z_t(θ) is a suitable test variable. If the number N of artificial observations is not equal to the sample size n, some algebraic manipulation may be needed in order to express the moment functions in a convenient form, but we ignore such problems here and suppose that N = n.
Now consider the artificial regression of which the typical observation is

  r_t(θ) = R_t(θ)b + c z_t(θ) + residual.    (15.16)

The t statistic for c = 0 in this regression is an asymptotically valid test statistic whenever equation (15.16) is evaluated at a root-n consistent estimator of θ. By applying the FWL Theorem to this equation and taking probability limits, it is not difficult to see that this t statistic is actually testing the hypothesis that

  plim_{n→∞} (1/n) z⊤(θ)M_R(θ)r(θ) = 0,    (15.17)

where z(θ) is the vector with typical element z_t(θ), and M_R(θ) projects orthogonally off S(R(θ)). This hypothesis is asymptotically equivalent to condition (15.13), the condition that we wish to test, as can be seen from the following argument:

  (1/n) z⊤(θ)M_R(θ)r(θ) = (1/n) Σ_{t=1}^{n} m_t(y_t, θ) − (1/n) z⊤(θ)P_R(θ)r(θ),    (15.18)

and the second term on the right-hand side tends to zero under the null hypothesis.
It is clear from expression (15.18) that, as we indicated above, parameter uncertainty matters: the estimated parameters affect the leading-order term of the raw empirical moment (1/n)Σ_t m_t, but not the leading-order term for the latter one, (1/n)z⊤M_R r. The reduction in variance caused by the projection is a phenomenon analogous to the loss of degrees of freedom in Hansen-Sargan tests caused by the need to estimate parameters; recall the discussion in Section 9.4. Indeed, since moment functions are zero functions, conditional moment tests can be interpreted as tests of overidentifying restrictions.
Examples of Conditional Moment Tests
Suppose the model under test is the nonlinear regression model (6.01), and the moment functions can be written as

  m_t(y_t, β) = z_t(β)u_t(β),    (15.19)

where u_t(β) ≡ y_t − x_t(β), and z_t(β) is a function of exogenous or predetermined variables and the parameters. We are using β instead of θ to denote the vector of parameters here because the model is a regression model. A test of the moment condition (15.13) can be based on the following Gauss-Newton regression:

  y_t − x_t(β̂) = X_t(β̂)b + c z_t(β̂) + residual,    (15.20)

where β̂ is the vector of NLS estimates of the parameters, and X_t(β) is the row vector of partial derivatives of x_t(β) with respect to β. To see that the t statistic for c = 0 in (15.20) is asymptotically valid under the usual regularity conditions for nonlinear regression, all we have to show is that conditions R1-R3 are satisfied by the GNR (15.20). Condition R1 is trivially satisfied, since what it requires is precisely what we wish to test. Condition R2, for the covariance matrix (15.03), follows easily from a central limit theorem, because the z_t(β) depend only on exogenous or predetermined variables.
Condition R3 requires a little more work, however. Let z(β) and u(β) be the n-vectors with typical elements z_t(β) and u_t(β), respectively. The Jacobian of (1/n)z⊤(β)u(β) with respect to β is

  (1/n) Σ_{t=1}^{n} u_t(β) ∂z_t(β)/∂β − (1/n) z⊤(β)X(β).    (15.21)

Since the elements of z(β) are predetermined, so are those of its derivative with respect to β. Because the error terms have mean zero conditional on predetermined variables, it follows from a law of large numbers that the first term of expression (15.21) tends to zero as n → ∞. The plim of the Jacobian is therefore −plim (1/n)z⊤(β)X(β), which is condition R3 for the GNR (15.20). Thus we conclude that this GNR can be used to test the moment condition (15.13).
The above reasoning can easily be generalized to allow us to test more than one moment condition at a time. Let Z(β) denote an n × r matrix of functions of the data, each column of which is asymptotically orthogonal to the vector u under the null hypothesis that is to be tested, in the sense of condition (15.06). Then n times the uncentered R² from the GNR

  y − x(β̂) = X(β̂)b + Z(β̂)c + residuals    (15.22)

is asymptotically distributed as χ²(r) under the null hypothesis. An ordinary F test for c = 0 is also asymptotically valid.
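As a concrete illustration of the single-condition case, the following sketch implements the test based on the GNR (15.20) for the linear special case x_t(β) = X_tβ. The function name and the argument z, an n-vector containing the z_t evaluated at the estimates, are our own notation rather than the book's.

```python
import numpy as np

def cm_test_gnr(y, X, z):
    """Conditional moment test from the GNR (15.20), linear null model:
    regress the OLS residuals on X and z; return the t statistic on z."""
    n, k = X.shape
    uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]  # residuals under the null
    W = np.column_stack([X, z])                          # GNR regressors
    coef = np.linalg.lstsq(W, uhat, rcond=None)[0]
    resid = uhat - W @ coef
    s2 = resid @ resid / (n - k - 1)
    cov = s2 * np.linalg.inv(W.T @ W)
    return coef[-1] / np.sqrt(cov[-1, -1])
```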
Conditional moment tests based on the GNR are often useful for linear and nonlinear regression models, but they evidently cannot be used when the GNR itself is not applicable. With models estimated by maximum likelihood, tests can be based on the OPG regression that was introduced in Section 10.5. This regression is available whenever the ML estimator is root-n consistent and asymptotically normal; see Section 10.3.
The OPG regression was originally given in equation (10.72). It is repeated here for convenience with a minor change of notation:

  ι = G(θ)b + residuals.    (15.23)

The regressand ι is an n-vector of 1s, and the regressor matrix G(θ) is the matrix of contributions to the gradient, with typical element defined by (10.26). The artificial regression corresponds to the model implicitly defined by the matrix G(θ). Now let m(θ) denote the vector of moment functions with typical element m_t(y_t, θ); once more the notation hides the dependence on the data. Then the testing regression is simplicity itself: We add m(θ) to regression (15.23) as an extra regressor, obtaining

  ι = G(θ)b + c m(θ) + residuals.    (15.24)

The test statistic is the t statistic on the extra regressor. The regressors here can be evaluated at any root-n consistent estimator, but it is most common to evaluate them at the ML estimates θ̂.
If several moment conditions are to be tested simultaneously, then we can form the n × r matrix M(θ), each column of which is a vector of moment functions. The testing regression is then

  ι = G(θ)b + M(θ)c + residuals.    (15.25)

Several asymptotically equivalent test statistics are available, including the explained sum of squares, n times the uncentered R², and the F statistic for c = 0. The first two of these statistics are asymptotically distributed as χ²(r) under the null hypothesis, as is r times the third. If the regressors in equation (15.25) are evaluated at the ML estimates θ̂, the F statistic is asymptotically valid.
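In code, the testing regression (15.25) amounts to a single least-squares fit. The sketch below is an illustration under explicit assumptions: G must contain the contributions to the gradient evaluated at the ML estimates, and M the moment functions evaluated at the same point.

```python
import numpy as np
from scipy import stats

def cm_test_opg(G, M):
    """OPG conditional moment test: regress 1s on [G, M]; the explained
    sum of squares, which equals n - SSR here, is asymptotically chi2(r)."""
    n = G.shape[0]
    M = np.asarray(M).reshape(n, -1)        # allow a single moment vector
    W = np.column_stack([G, M])
    iota = np.ones(n)
    coef = np.linalg.lstsq(W, iota, rcond=None)[0]
    resid = iota - W @ coef
    stat = n - resid @ resid                # ESS from the artificial regression
    r = M.shape[1]
    return stat, stats.chi2.sf(stat, df=r)
```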
The artificial regression (15.23) is valid for a very wide variety of models. Condition R2 requires that we be able to apply a central limit theorem to the quantity n^{-1/2}Σ_t m_t(y_t, θ). Since the m_t are zero functions, it is usually not difficult to find a suitable central limit theorem. Condition R3 is also satisfied under very mild regularity conditions. What it requires is that the derivatives of the moment functions with respect to the parameters be related in a specific way to the products of the moment functions and the contributions to the gradient. Formally, we require that

  plim_{n→∞} (1/n) Σ_{t=1}^{n} ∂m_t(y_t, θ)/∂θ = −plim_{n→∞} (1/n) Σ_{t=1}^{n} m_t(y_t, θ)G_t(θ),    (15.26)

where G_t(θ) denotes the t-th row of G(θ). By differentiating the identity E_θ(m_t(y_t, θ)) = 0 with respect to θ, it is possible to show that equation (15.26) holds under the usual regularity conditions for ML estimation. This property and its use in conditional moment tests implemented by an OPG regression were first established by Newey (1985).
It is straightforward to extend this result to the case in which we have a matrix M(θ) of moment functions.
As we noted in Section 10.5, many tests based on the OPG regression are prone to overreject the null hypothesis, sometimes very severely, in finite samples. It is therefore often a good idea to bootstrap conditional moment tests based on the OPG regression. Since the model under test is estimated by maximum likelihood, a fully parametric bootstrap is appropriate. It is generally quite easy to implement such a bootstrap, unless estimating the original model is unusually difficult or expensive.
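A parametric bootstrap of such a test can be organized around three user-supplied pieces, as in the following sketch. The callables fit, simulate, and statistic are hypothetical placeholders for model-specific code, not functions defined in the text.

```python
import numpy as np

def bootstrap_pvalue(y, fit, simulate, statistic, B=999, seed=12345):
    """Parametric bootstrap p value for a conditional moment test.
    fit(y) returns ML estimates; simulate(theta, rng) draws a sample of
    the same size from the fitted model; statistic(y) returns the test
    statistic, assumed to reject for large values."""
    rng = np.random.default_rng(seed)
    tau = statistic(y)
    theta_hat = fit(y)
    tau_star = np.array([statistic(simulate(theta_hat, rng)) for _ in range(B)])
    return (1 + np.sum(tau_star >= tau)) / (B + 1)
```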
Tests for Skewness and Kurtosis
One common application of conditional moment tests is checking the residuals from an econometric model for skewness and excess kurtosis. By "excess" kurtosis, we mean kurtosis greater than that of the normal distribution; see Exercise 4.2. The presence of significant departures from normality may indicate that a model is misspecified, or it may indicate that we should use a different estimation method. For example, although least squares may still perform well in the presence of moderate skewness and excess kurtosis, it cannot be expected to do so when the error terms are extremely skewed or have very thick tails.
Both skewness and excess kurtosis are often encountered in returns data from financial markets, especially when the returns are measured over short periods of time. A good model should eliminate, or at least substantially reduce, the skewness and excess kurtosis that is generally evident in daily, weekly, and, to a lesser extent, monthly returns data. Thus one way to evaluate a model for financial returns, such as the ARCH models that were discussed in Section 13.5, is to test the residuals for skewness and excess kurtosis.
We cannot base tests for skewness and excess kurtosis in regression models on the GNR, because the GNR is designed only for testing against alternatives that involve the conditional mean of the dependent variable. There is no way to choose the z_t(β) as functions of exogenous or predetermined variables in such a way that the moment function (15.19) corresponds to the condition we wish to test. Instead, one valid approach is to test the slightly stronger assumption that the error terms are normally distributed by using the OPG regression. We now discuss this approach and show that even simpler tests are available.
The OPG regression that corresponds to the linear regression model y = Xβ + u, with u ~ N(0, σ²I), has typical observation

  1 = (1/σ²)u_t(β)X_t b₁ + b₂(u_t²(β)/σ³ − 1/σ) + residual,    (15.27)

where u_t(β) ≡ y_t − X_tβ. The assumed normality of the error terms implies that they are not skewed and do not suffer from excess kurtosis. To test the assumption that they are not skewed, the appropriate test regressor has typical element u_t³(β), and the testing regression is

  1 = (1/σ²)u_t(β)X_t b₁ + b₂(u_t²(β)/σ³ − 1/σ) + c u_t³(β) + residual.    (15.28)

This is just a special case of regression (15.24), and the test statistic is simply the t statistic for c = 0.
Regression (15.28) is unnecessarily complicated. First, observe that the test regressor is asymptotically orthogonal under the null to the regressor that corresponds to the parameter σ. To see this, evaluate the regressors at the true parameter values and note that

  plim_{n→∞} (1/n) Σ_t u_t³(u_t²/σ³ − 1/σ) = E(u_t⁵)/σ³ − E(u_t³)/σ = 0,

because the odd central moments of the normal distribution are zero, and so we see that the t statistic for c = 0 in regression (15.28) is asymptotically unchanged if we simply omit the regressor corresponding to σ. Next, recall that a t statistic is unchanged if we subtract from the test regressor any linear combination of the regressors that correspond to β; recall the discussion in Section 2.4 in connection with the FWL Theorem. Thus, since we assumed that there is a constant term in the regression, the t statistic is unchanged if we replace the test regressor u_t³ by u_t³ − 3σ²u_t. This modified regressor is asymptotically orthogonal to all the regressors that correspond to β, as can be seen from the following calculation:

  plim_{n→∞} (1/n) Σ_t (u_t³ − 3σ²u_t)(u_t X_ti/σ²) = (E(u_t⁴) − 3σ²E(u_t²)) plim_{n→∞} (1/n) Σ_t X_ti/σ² = 0,

since E(u_t⁴) = 3σ⁴ for normally distributed error terms.
The above arguments imply that we can obtain a valid test simply by using the t statistic from the regression

  ι = c(û³ − 3σ̂²û) + residuals,    (15.29)

where û is the vector of OLS residuals and σ̂² is the ML estimate of the error variance. This t statistic is numerically identical to the t statistic for the sample mean of the single regressor here to be 0. Because the plim of the error variance is just 1, since the regressor and regressand are asymptotically orthogonal, both of these t statistics are asymptotically equal to

  (n^{-1/2} Σ_t û_t³) / (6σ̂⁶)^{1/2},    (15.31)

so the plim of the denominator is the square root of 6σ⁶, which is the variance of u_t³ − 3σ²u_t under the null hypothesis that the error terms are normally distributed.
A similar analysis applies to the test for excess kurtosis. The appropriate test regressor can be modified so that it is asymptotically orthogonal to the other regressors in (15.27) without changing the t statistic. The modified regressor has typical element u_t⁴ − 6σ²u_t² + 3σ⁴, so running the test regression

  1 = c(e_t⁴ − 6e_t² + 3) + residual,    (15.32)

which is defined in terms of the normalized residuals e_t ≡ û_t/σ̂, provides an appropriate test statistic. As readers are invited to check in Exercise 15.8, this statistic is asymptotically equivalent to the simpler statistic

  (24n)^{-1/2} Σ_t (e_t⁴ − 3).    (15.33)

It is important that the denominator of the normalized residual be the ML estimate σ̂, defined using n rather than n − k, in order for the statistic to have its asymptotic N(0, 1) distribution under the null hypothesis of normality.
Test statistics like (15.31) and (15.33) are often squared so as to have asymptotic χ²(1) distributions. However, we prefer to use the statistics themselves rather than their squares, since the skewness statistic (15.31) is positive when the residuals are skewed to the right and negative when they are skewed to the left. Similarly, the kurtosis statistic (15.33) is positive when there is positive excess kurtosis and negative when there is negative excess kurtosis.
It can be shown (see Exercise 15.8 again) that the test statistics (15.31) and (15.33) are asymptotically independent under the null. Therefore, a joint test for skewness and excess kurtosis can be based on the statistic

  (1/(6n))(Σ_t e_t³)² + (1/(24n))(Σ_t (e_t⁴ − 3))²,    (15.34)

which is the sum of the squares of (15.31) and (15.33) and is asymptotically distributed as χ²(2) under the null hypothesis. Statistics of this type were proposed, in somewhat different forms, by Jarque and Bera (1980) and Kiefer and Salmon (1983); see also Bera and Jarque (1982). Many regression packages calculate these statistics as a matter of course.
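The statistics (15.31), (15.33), and (15.34) are easy to compute directly from the residuals, as in this sketch; it assumes that uhat contains residuals from a regression that includes a constant term.

```python
import numpy as np
from scipy import stats

def normality_tests(uhat):
    """Skewness statistic (15.31), excess kurtosis statistic (15.33),
    and the joint chi-squared(2) statistic (15.34)."""
    n = uhat.size
    sigma = np.sqrt(uhat @ uhat / n)            # ML estimate: divide by n, not n - k
    e = uhat / sigma                            # normalized residuals
    tau_skew = np.sum(e**3) / np.sqrt(6.0 * n)
    tau_kurt = np.sum(e**4 - 3.0) / np.sqrt(24.0 * n)
    joint = tau_skew**2 + tau_kurt**2
    return tau_skew, tau_kurt, joint, stats.chi2.sf(joint, df=2)
```

The two t-type statistics may each be compared with the standard normal distribution, and the joint statistic with the χ²(2) distribution.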
The statistics (15.31), (15.33), and (15.34) depend solely on normalized residuals. This implies that, for a linear regression model with fixed regressors, they are pivotal under the null hypothesis of normality. Therefore, if we use the parametric bootstrap in this situation, we can obtain exact tests based on these statistics; see the discussion at the end of Section 7.7. Even for nonlinear regression models or models with lagged dependent variables, parametric bootstrap tests should work very much better than asymptotic tests.

Occasionally, it may happen that the regression which furnishes the normalized residuals does not contain a constant or the equivalent. In such unusual cases, it is necessary to proceed differently, for instance, by using the full OPG regression (15.27) with one or two test regressors. The OPG regression can also be used to test for skewness and excess kurtosis in models that are not regression models, such as the models with ARCH errors that were discussed in Section 13.6.
Information Matrix Tests
In Section 10.3, we first encountered the information matrix equality. This famous result, which is given in equation (10.34), tells us that, for a model estimated by maximum likelihood with parameter vector θ, the asymptotic information matrix, I(θ), is equal to minus the asymptotic Hessian, H(θ). The proof of this result, which was given in Exercises 10.6 and 10.7, depends on the DGP being a special case of the model. Therefore, we should expect that, in general, the information matrix equality does not hold when the model we are estimating is misspecified. This suggests that testing this equality is one way to test the specification of a statistical model. This idea was first suggested by White (1982), who called tests based on it information matrix tests, or IM tests. These tests were later reinterpreted as conditional moment tests by Newey (1985) and White (1987).
Consider a statistical model characterized by the loglikelihood function ℓ(θ) = Σ_t ℓ_t(θ), where θ is a k-vector of parameters. The moment conditions on which the IM test is based can be written as

  plim_{n→∞} (1/n) Σ_t ( ∂²ℓ_t(θ)/∂θ_i∂θ_j + (∂ℓ_t(θ)/∂θ_i)(∂ℓ_t(θ)/∂θ_j) ) = 0,    (15.35)

for i = 1, ..., k and j = 1, ..., i. Expression (15.35) is a typical element of the information matrix equality. The first term is an element of the asymptotic Hessian, and the second term is the corresponding element of the outer product of the gradient, the expectation of which is the asymptotic information matrix. Because the information matrix is symmetric, there are at most k(k + 1)/2 distinct conditions of the form (15.35).
Equation (15.35) is a conditional moment of the form (15.13). We can therefore calculate IM test statistics by means of the OPG regression, a procedure that was originally suggested by Chesher (1983) and Lancaster (1984). The matrix M(θ) that appears in regression (15.25) is constructed as an n × ½k(k + 1) matrix, of which the column indexed by the pair (i, j) has typical element ∂²ℓ_t(θ)/∂θ_i∂θ_j + (∂ℓ_t(θ)/∂θ_i)(∂ℓ_t(θ)/∂θ_j). This matrix and the other matrix of regressors G(θ) in (15.25) are usually evaluated at the ML estimates θ̂. The test statistic is then the explained sum of squares, or, equivalently, n − SSR from this regression. If the matrix of regressors does not have full column rank, as often happens, some of the redundant columns have to be dropped, and the number of degrees of freedom for the test reduced accordingly.
In Exercise 15.11, readers are asked to develop the OPG version of the information matrix test for a particular linear regression model. As the exercise shows, the IM test in this case is sensitive to excess kurtosis, skewness, skewness interacted with the regressors, and any form of heteroskedasticity that the test of White (1980) would detect; see Section 7.5. This suggests that we might well learn more about what is wrong with a regression model by testing for heteroskedasticity, skewness, and kurtosis separately instead of performing an information matrix test. We should certainly do that if the IM test rejects the null hypothesis.
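To make the construction concrete, here is a sketch of the OPG form of the IM test for the linear regression model with normal errors, in the spirit of Exercise 15.11. The column formulas come from differentiating the normal loglikelihood, and the rank-based degrees-of-freedom adjustment implements the dropping of redundant columns mentioned above.

```python
import numpy as np
from scipy import stats

def im_test_opg(y, X):
    """OPG information matrix test for y = X beta + u, u ~ N(0, sigma^2 I)."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    s2 = u @ u / n                               # ML estimate of sigma^2
    s = np.sqrt(s2)
    # Contributions to the gradient: k columns for beta, one for sigma
    G = np.column_stack([u[:, None] * X / s2, u**2 / s**3 - 1.0 / s])
    # Moment functions (15.35): Hessian terms plus score outer products
    cols = []
    for i in range(k):
        for j in range(i, k):                    # (beta_i, beta_j) pairs, j >= i
            cols.append(X[:, i] * X[:, j] * (u**2 / s2 - 1.0) / s2)
    for i in range(k):                           # (beta_i, sigma) pairs
        cols.append(X[:, i] * (u**3 / s**5 - 3.0 * u / s**3))
    cols.append(u**4 / s**6 - 5.0 * u**2 / s2**2 + 2.0 / s2)   # (sigma, sigma)
    M = np.column_stack(cols)
    W = np.column_stack([G, M])
    iota = np.ones(n)
    coef = np.linalg.lstsq(W, iota, rcond=None)[0]
    resid = iota - W @ coef
    stat = n - resid @ resid                     # ESS = n - SSR
    df = np.linalg.matrix_rank(W) - np.linalg.matrix_rank(G)
    return stat, stats.chi2.sf(stat, df=df)
```

When X contains a constant, one of the (beta, beta) columns is collinear with the sigma regressor, and the rank computation reduces the degrees of freedom accordingly.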
As we have remarked before, tests based on the OPG regression are extremely prone to overreject in finite samples. This is particularly true for information matrix tests when the number of parameters is not small; see Davidson and MacKinnon (1992, 1998). Fortunately, the OPG variant of the IM test is by no means the only one that can be used. Davidson and MacKinnon (1998) compare the OPG version of the IM test for linear regression models with two other versions. One of these is the efficient score, or ES, variant (see Section 10.5), and the other is based on the double-length regression, or DLR, originally proposed by Davidson and MacKinnon (1984a). They also compare the OPG variant of the IM test for probit models with an efficient score variant that was proposed by Orme (1988). Although the DLR and both ES versions of the IM test are much more reliable than the corresponding OPG versions, their finite-sample properties are far from ideal, and they too should be bootstrapped whenever the sample size is not extremely large.
15.3 Nonnested Hypothesis Tests
Hypothesis testing usually involves nested models, in which the model that represents the null hypothesis is a special case of a more general model that represents the alternative hypothesis. For such a model, we can always test the null hypothesis by testing the restrictions that it imposes on the alternative. But economic theory often suggests models that are nonnested. This means that neither model can be written as a special case of the other without imposing restrictions on both models. In such a case, we cannot simply test one of the models against the other, less restricted, one.
There is an extensive literature on nonnested hypothesis testing. It provides a number of ways to test the specification of statistical models when one or more nonnested alternatives exist. In this section, we briefly discuss some of the simplest and most widely-used nonnested hypothesis tests, primarily in the context of regression models.
Testing Nonnested Linear Regression Models
Suppose we have two competing economic theories which imply different linear regression models for the same dependent variable, each conditioned on its own information set. We can write the two models as

  H₁: y = Xβ + u₁,   H₂: y = Zγ + u₂,    (15.37)

where X is n × k₁ and Z is n × k₂. The error terms are assumed to satisfy the usual regularity conditions for whichever model actually generated the data, and we can base inferences on the usual OLS covariance matrix.
For the models H₁ and H₂ given in equations (15.37) to be nonnested, it must be the case that neither of them is a special case of the other. This implies that S(X) cannot be a subspace of S(Z), and vice versa. In other words, there must be at least one regressor among the columns of X that does not lie in S(Z), and there must be at least one regressor among the columns of Z that does not lie in S(X). We will assume that this is the case.
The simplest and most widely-used nonnested hypothesis tests start from the artificial comprehensive model

  y = (1 − α)Xβ + αZγ + u.    (15.38)

In principle, we could simply estimate this model and test whether α = 0. However, this is not possible, because at least one, and usually quite a few, of the parameters of (15.38) are not identified. There are k₁ + k₂ + 1 parameters in the regression function of the artificial model, but the number of parameters that can be identified is the dimension of the subspace S(X, Z). This cannot exceed k₁ + k₂, and it is usually smaller, because some regressors, or linear combinations of them, may appear in both regression functions.
The simplest way to base a test on equation (15.38) is to estimate a restricted version of it that is identified, namely, the inclusive regression

  y = Xβ + Z*γ* + u,    (15.39)

where Z* consists of those columns of Z that do not lie in S(X). We can estimate the model (15.39) by OLS and test the null hypothesis that γ* = 0 by means of an ordinary F test. Although this procedure has much to recommend it, it is not often thought of as a nonnested hypothesis test, and it does not generalize in a very satisfactory way to the case of nonlinear regression models. Moreover, it is generally less powerful than the nonnested tests discussed below when H₂ actually generated the data. We will have more to say about this test below.
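The F test based on the inclusive regression (15.39) might be coded as follows; the column-selection loop, which discards columns of Z lying in S(X), is our own simple-minded construction rather than anything prescribed in the text.

```python
import numpy as np
from scipy import stats

def inclusive_f_test(y, X, Z):
    """F test of gamma* = 0 in the inclusive regression (15.39)."""
    n, k1 = X.shape
    kept = []                                    # columns of Z outside S(X)
    for j in range(Z.shape[1]):
        trial = np.column_stack([X] + kept + [Z[:, j]])
        if np.linalg.matrix_rank(trial) > k1 + len(kept):
            kept.append(Z[:, j])
    Zstar = np.column_stack(kept)
    r = Zstar.shape[1]
    W = np.column_stack([X, Zstar])
    ssr_r = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0])**2)
    ssr_u = np.sum((y - W @ np.linalg.lstsq(W, y, rcond=None)[0])**2)
    F = ((ssr_r - ssr_u) / r) / (ssr_u / (n - k1 - r))
    return F, stats.f.sf(F, r, n - k1 - r)
```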
Another way to make equation (15.38) identified is to replace the unknown vector γ by a vector of parameter estimates. If γ is replaced by γ̂, the vector of OLS estimates from H₂, equation (15.38) becomes

  y = Xβ' + αZγ̂ + u
    = Xβ' + αP_Z y + u,    (15.40)

where β' ≡ (1 − α)β, and P_Z is the orthogonal projection on to S(Z). Estimating (15.40) by OLS yields a nonnested hypothesis test that Davidson and MacKinnon called the J test. It is based on the ordinary t statistic for α = 0 in equation (15.40), which they called the J statistic.¹
It is not at all obvious that the J statistic is asymptotically distributed as N(0, 1) under the null hypothesis. After all, as can be seen from the second equation of (15.40), the test regressor depends on the regressand. Thus one might expect the regressand to be positively correlated with the test regressor, even when the null hypothesis is true. This is generally the case, but only in finite samples. The proof that the J statistic is asymptotically valid depends on the fact that, under the null hypothesis, the numerator of the test statistic is

  y⊤P_Z M_X y = β₀⊤X⊤P_Z M_X u + u⊤P_Z M_X u,    (15.41)

which can easily be obtained by applying the FWL Theorem to the second line of (15.40) and using the fact that y = Xβ₀ + u under the null. There are only two terms on the right-hand side of the equation, because M_X annihilates Xβ₀.
The first term on the right-hand side of equation (15.41) is a weighted average of the elements of the vector u. Under standard regularity conditions, we may therefore apply a central limit theorem to it, and conclude that n^{-1/2} times this term is asymptotically normally distributed with mean zero and variance

  plim_{n→∞} (σ₁²/n) β₀⊤X⊤P_Z M_X P_Z Xβ₀.    (15.42)

The second term is bounded in probability under standard regularity conditions, since the quadratic forms u⊤P_Z u and u⊤P_X u are both O_p(1). So too, under standard regularity conditions, are the cross-product matrices of the regressors after division by n. Therefore, n^{-1/2} times the numerator of the J statistic has the same asymptotic distribution as n^{-1/2} times the first term of (15.41). The denominator of the J statistic consistently estimates the square root of the variance that appears in expression (15.42); see Exercise 15.12. The J statistic itself is therefore asymptotically distributed as N(0, 1) under the null hypothesis.

¹ This J statistic should not be confused with the Hansen-Sargan statistic discussed in Section 9.4, which some authors refer to as the J statistic.
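Computing the J statistic requires nothing more than two OLS regressions, as the following sketch illustrates; it assumes the two models are specified by regressor matrices X (the null) and Z (the alternative).

```python
import numpy as np

def j_test(y, X, Z):
    """J statistic: t statistic for alpha = 0 in regression (15.40),
    where the test regressor is P_Z y, the fitted values from H2."""
    n, k1 = X.shape
    yhat2 = Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # P_Z y
    W = np.column_stack([X, yhat2])
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    resid = y - W @ coef
    s2 = resid @ resid / (n - k1 - 1)
    cov = s2 * np.linalg.inv(W.T @ W)
    return coef[-1] / np.sqrt(cov[-1, -1])
```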
Although the J test is asymptotically valid, it is generally not exact in finite samples; there is, however, an exception in one very special case, which is treated in Exercise 15.13. In fact, because the second term on the right-hand side of equation (15.41) usually has a positive expectation under the null, the numerator of the J statistic generally has a positive mean, and so does the test statistic itself. In consequence, the J test tends to overreject, often quite severely, in finite samples. Theoretical results in Davidson and MacKinnon (2002a), which are consistent with the results of simulation experiments reported in a number of papers, suggest that the overrejection tends to be particularly severe when at least one of the following conditions holds:

• The sample size is small;
• The model under test does not fit very well.
Bootstrapping the J test dramatically improves its finite-sample performance. We may use either a parametric bootstrap DGP or a semiparametric bootstrap DGP, as discussed in Section 4.6. If the latter is used, it is very important to rescale the residuals before they are resampled. In most cases, the bootstrap J test is quite reliable, even in very small samples; see Godfrey (1998) and Davidson and MacKinnon (2002a). An even more reliable test may be obtained by using a more sophisticated bootstrapping procedure proposed by Davidson and MacKinnon (2002b).
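A semiparametric bootstrap of the J test, with the rescaling of residuals emphasized above, might look like the following sketch; it reuses the j_test function from the previous listing.

```python
import numpy as np

def bootstrap_j_test(y, X, Z, B=999, seed=42):
    """Bootstrap p value for the J test, drawing errors by resampling
    rescaled OLS residuals from the null model H1."""
    n, k1 = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    uhat = y - X @ beta
    u_rescaled = uhat * np.sqrt(n / (n - k1))        # rescale before resampling
    tau = j_test(y, X, Z)
    rng = np.random.default_rng(seed)
    tau_star = np.empty(B)
    for b in range(B):
        ystar = X @ beta + rng.choice(u_rescaled, size=n, replace=True)
        tau_star[b] = j_test(ystar, X, Z)
    # symmetric p value based on the absolute value of the statistic
    return (1 + np.sum(np.abs(tau_star) >= np.abs(tau))) / (B + 1)
```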
Another way to obtain a nonnested test that is more reliable than the J test is to replace γ̂ in regression (15.40) by another estimate of γ, namely,

  γ̃ ≡ (Z⊤Z)^{-1}Z⊤Xβ̂,    (15.43)

which is obtained by regressing the fitted values Xβ̂ from H₁ on Z. The test regression is then

  y = Xβ' + αP_Z P_X y + u,    (15.44)

and the test statistic is, once again, the t statistic for α = 0. This test statistic, which was originally proposed by Fisher and McAleer (1981), has better finite-sample properties under the null hypothesis than the ordinary J test. In fact, the test is exact under the assumptions of the classical normal linear model, for exactly the same reason that the RESET test is exact in a similar situation; see Godfrey (1983) and Exercise 15.3.
Unfortunately, the good finite-sample performance of this test under the null hypothesis is not accompanied by equally good performance under the alternative. As can be seen from the second of equations (15.44), the vector y is projected onto X before it is projected onto S(Z), and this initial projection discards much of the information in y about the alternative. As a consequence, the test is often considerably less powerful than the J test; see, for example, Davidson and MacKinnon (1982). A test with low power that fails to reject provides little information. In contrast, the J test, when bootstrapped, appears to be both reliable and powerful in samples of reasonable size.
A number of other nonnested tests have been proposed for linear regression models. In particular, several tests have been based on the pioneering work of Cox (1961, 1962), which we will discuss further below. The most notable of these were proposed by Pesaran (1974) and Godfrey and Pesaran (1983). However, since these tests are asymptotically equivalent to the J test, have finite-sample properties that are either dreadful (for the first test) or mediocre (for the second one), and are more complicated to compute than the J test, especially in the case of the second one, there appears to be no reason to employ them in practice.
Testing Nonnested Nonlinear Regression Models
The J test can readily be extended to nonlinear regression models. Suppose the two models are

  H₁: y = x(β) + u₁,   H₂: y = z(γ) + u₂.    (15.45)

When we say that these two models are nonnested, we mean that there are values of β, usually infinitely many of them, for which there is no admissible γ for which x(β) = z(γ), and, similarly, values of γ for which there is no admissible β such that z(γ) = x(β). In other words, neither model is a special case of the other unless we impose restrictions on both models. The artificial comprehensive model analogous to equation (15.38) is

  y = (1 − α)x(β) + αz(γ) + u,

and the J statistic is the t statistic for α = 0 in the nonlinear regression

  y = x(β) + αz(γ̂) + u,    (15.46)

where γ̂ denotes the vector of NLS estimates of γ from H₂; see Davidson and MacKinnon (1981).
Because some of the parameters of the nonlinear regression (15.46) may not be well identified, the J statistic can be difficult to compute. This difficulty can be avoided in the usual way, that is, by running the GNR which corresponds to (15.46), with all quantities evaluated at the estimates of the null model:

  y − x(β̂) = X(β̂)b + a(z(γ̂) − x(β̂)) + residuals.    (15.47)

The t statistic for a = 0 in regression (15.47) is called the P statistic. Under the null hypothesis, it is asymptotically equal to the corresponding J statistic. The P test is much easier to compute than the J test.
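Given the fitted values and Jacobian of the null model at the NLS estimates, the P statistic comes from a single OLS regression. The sketch below takes these as inputs (x_hat, X_jac, and z_hat are our labels, not the book's), so that no nonlinear optimization code is needed here.

```python
import numpy as np

def p_test(y, x_hat, X_jac, z_hat):
    """P statistic: t statistic for a = 0 in the GNR (15.47).
    x_hat = x(beta_hat), X_jac = X(beta_hat), z_hat = z(gamma_hat)."""
    n, k = X_jac.shape
    regressand = y - x_hat
    W = np.column_stack([X_jac, z_hat - x_hat])
    coef = np.linalg.lstsq(W, regressand, rcond=None)[0]
    resid = regressand - W @ coef
    s2 = resid @ resid / (n - k - 1)
    cov = s2 * np.linalg.inv(W.T @ W)
    return coef[-1] / np.sqrt(cov[-1, -1])
```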
Numerous other nonnested tests are available for nonlinear regression models, but most of them are either harder to compute or less reliable in finite samples. In contrast, a bootstrap version of the P test should be reasonably reliable and is not unduly burdensome to compute if computer time is not a constraint.
The J and P tests can both be made robust to heteroskedasticity of unknown form either by using heteroskedasticity-robust standard errors (Section 5.5) or by using the HRGNR (Section 6.8). Like ordinary J and P tests, these tests should be bootstrapped. However, bootstrapping heteroskedasticity-robust tests requires procedures different from those used to bootstrap ordinary t and F tests, because the bootstrap DGP has to preserve the relationship between the regressors and the variances of the error terms. This means that we cannot use IID errors or resampled residuals. For introductory discussions of bootstrap methods for regression models with heteroskedastic errors, see Horowitz (2001) and MacKinnon (2002).
It is straightforward to extend the J and P tests to handle more than two nonnested alternatives. For concreteness, suppose there are three competing models. To test any one of them, we include the vectors of fitted values from both of the other models as test regressors and use an asymptotic F test for the two extra coefficients to be zero as a test of that model.
The P test can also be extended to linear and nonlinear multivariate regression models; see Davidson and MacKinnon (1983). One starts by formulating an artificial comprehensive model analogous to (15.38), with just one additional parameter α, and then obtains a P test based on the multivariate GNR (12.53) for the model under test. Because there is more than one plausible way to specify the artificial comprehensive model, more than one such test can be computed.
Interpreting Nonnested Tests
All of the nonnested hypothesis tests that we have discussed are really just tests of the specification of one model, the one that plays the role of the null hypothesis. If we wish to test the other model as well, we must also test it. This can be done by interchanging the roles of the two models. For example, the J test of H₂ is based on the t statistic for the coefficient on the fitted values from H₁ in a regression of y on Z and those fitted values. When we test each of the two models against the other, there are four possible outcomes:

• Reject H₁ but not H₂;
• Reject H₂ but not H₁;
• Reject both models;
• Do not reject either model.
Since the first two outcomes lead us to prefer one of the models, it is tempting to see them as natural and desirable. However, the last two outcomes, which are by no means uncommon in practice, can also be very informative. If both models are rejected, then we need to find some other model that fits better. If neither model is rejected, then we have learned that the data appear to be compatible with both hypotheses.

Because nonnested hypothesis tests are designed as specification tests, rather than as procedures for choosing among competing models, it is not at all surprising that they sometimes do not lead us to choose one model over the other. If we simply want to choose the "best" model out of some set of competing models, whether or not any of them is satisfactory, then we should use a completely different approach, based on what are called information criteria. This approach will be discussed in the next section.
Encompassing Tests
The idea behind encompassing tests is that a correctly specified model should be able to account for, or encompass, the results obtained by estimating any other model. This idea is very similar to the idea behind indirect inference, a topic we briefly discussed in Section 13.3. Binding functions, as defined in the context of indirect inference, map the parameters of the model under test into the probability limits of the estimators of the parameters of other models. By evaluating the binding functions at the estimates of the null model, we obtain estimates of the values of the binding functions under the assumption that the null model generated the data.
As a concrete example, consider the linear case in which the two models are given by equations (15.37). If the data were generated by H₁ with parameter vector β₀, then, assuming the regressors of both models to be exogenous or predetermined, we see that

  plim_{n→∞} γ̂ = (plim (1/n)Z⊤Z)^{-1}(plim (1/n)Z⊤X)β₀.

We can estimate this probability limit by dropping the plims on the right-hand side and replacing β₀ by the OLS estimate β̂. The result is precisely the estimator γ̃ of equation (15.43). An encompassing test can therefore be based on the vector of contrasts γ̂ − γ̃. The asymptotic covariance matrix of this vector does not, in general, have full rank. Since some columns of Z generally lie in S(X), some linear combinations of the contrasts are identically zero. In practice, the test may therefore be performed by using only the columns of Z that do not lie in S(X) as test regressors, that is, by using the inclusive regression (15.39) as a testing regression. The resulting encompassing test is then just the F test we have already discussed.
The parallels between this sort of encompassing test and the DWH test discussed in Section 8.6 are illuminating. Both tests can be implemented as F tests; in the case of the DWH test, an F test based on regression (8.77). In both cases, the F test almost always has fewer degrees of freedom in the numerator than the number of parameters. The interested reader may find it worthwhile to show explicitly that a DWH test can be set up as a conditional moment test.
For a detailed discussion of the concept of encompassing and various tests that are based on it, see Hendry (1995, Chapter 14). Encompassing tests are available for a variety of nonlinear models; see Mizon and Richard (1986). However, there can be practical difficulties with these tests. These difficulties are similar to the ones that can arise with Hausman tests which are based directly on a vector of contrasts; see Section 8.6. The basic problem is that it can be difficult to ascertain the dimension of the space analogous to S(X, Z), and, in consequence, it can be difficult to determine the appropriate number of degrees of freedom for the test.
Cox Tests
Nonnested hypothesis tests are available for a large number of models that are not regression models. Most of these tests are based on one of two approaches. The first approach, which led above to the J and P tests, involves forming an artificial comprehensive model and then replacing the parameters of the model that is not under test by estimates of them. As an illustration of this approach, Exercise 15.19 asks readers to derive a test similar to the P test for binary response models. The second approach, which we briefly discuss in this subsection, is based on two classic papers by Cox (1961, 1962). It leads to what are generally called Cox tests.
Suppose the two nonnested models are each to be estimated by maximum likelihood, and that their loglikelihood functions are

  ℓ¹(θ₁) and ℓ²(θ₂),    (15.50)

where θ₁ and θ₂ are the parameter vectors of the two models. This notation, like the one used in Chapter 10, omits the dependence on the data for clarity. Cox's original idea was to extend the idea of a likelihood ratio test, and so he considered the difference between the two maximized loglikelihood functions. Because the models are nonnested, this difference, even when divided by n^{1/2}, does not in general have a well-defined asymptotic distribution. It is then convenient to center this variable by subtracting its expectation. Since, according to equations (15.50), both loglikelihood functions depend on unknown parameters, this expectation is not something we can readily calculate, and it is not obvious how to estimate the variance of the result. Cox solved this problem by showing that the statistic obtained by subtracting from n^{-1/2}(ℓ¹(θ̂₁) − ℓ²(θ̂₂)) an estimate of its expectation under H₁ is indeed asymptotically normally distributed, with mean 0 and a variance that can be estimated consistently using a formula given in his 1962 paper.