INFERENCE AND CAUSALITY IN ECONOMIC TIME SERIES MODELS
Causal orderings and their implications
3.1 A canonical form for wide sense stationary multiple time series
3.2 The implications of unidirectional causality
Some practical problems for further research
6.1 The parameterization problem and asymptotic distribution theory
Handbook of Econometrics, Volume II, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers B.V., 1984
1 Introduction
Many econometricians are apt to be uncomfortable when thinking about the concept “causality” (in part, because they usually do so under some duress). On the one hand, the concept is a primitive notion which is indispensable when thinking about economic phenomena, econometric models, and the relation between the two. On the other, the idea is notoriously difficult to formalize, as casual reading in the philosophy of science will attest. In this chapter we shall be concerned with a particular formalization that has proved useful in empirical work: hence the juxtaposition of “causality” and “inference”. It also bears close relation to notions of strictly exogenous and predetermined variables, which have considerable operational significance in statistical inference, and to the concepts of causal orderings and realizability which are important in model construction in econometrics and engineering, respectively.
Our concept of causality was introduced to economists by C. W. J. Granger [Granger (1963, 1969)], who built on earlier work by Wiener (1956). We shall refer to the concept as Wiener-Granger causality. It applies to relations among time series. Let $X = \{x_t, t \text{ real}\}$ and $Y = \{y_t, t \text{ real}\}$ be two time series, and let $X_t$ and $Y_t$ denote their entire histories up to and including time $t$: $X_t = \{x_{t-s}, s \ge 0\}$, $Y_t = \{y_{t-s}, s \ge 0\}$. Let $U_t$ denote all information accumulated as of time $t$, and suppose that $X_s \subseteq U_t$ if and only if $s \le t$, and $Y_s \subseteq U_t$ if and only if $s \le t$. If we are better able to predict $x_t$ using $U_{t-1}$ than we are using $U_{t-1} - Y_{t-1}$, then Y causes X. If we are better able to “predict” $x_t$ using $U_{t-1} \cup y_t$ than we are using $U_{t-1}$, then Y causes X instantaneously.1
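The comparison of predictors in this definition can be illustrated numerically. The sketch below is only one possible specialization: it assumes linear least-squares predictors, mean square error as the criterion, a universe of information consisting of the two series alone, and a single lag. The simulated system and the helper names (`simulate`, `one_step_mse`) are hypothetical and are not taken from the chapter.

```python
import numpy as np

def one_step_mse(target, predictors):
    """In-sample mean square error of the least-squares linear predictor."""
    design = np.column_stack([np.ones(len(target))] + predictors)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(np.mean(resid ** 2))

def simulate(n=4000, seed=0):
    """Hypothetical system: lagged y enters the x equation, but not the reverse."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + rng.standard_normal()
        x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()
    return x, y

x, y = simulate()

# Predict x_t from its own past only, then from the past of both series.
mse_without_y = one_step_mse(x[1:], [x[:-1]])
mse_with_y = one_step_mse(x[1:], [x[:-1], y[:-1]])

# The same comparison for y_t.
mse_y_without_x = one_step_mse(y[1:], [y[:-1]])
mse_y_with_x = one_step_mse(y[1:], [y[:-1], x[:-1]])

print("x:", mse_without_y, mse_with_y)
print("y:", mse_y_without_x, mse_y_with_x)
```

In this simulated system, adding the past of y should reduce the error in predicting x noticeably, while adding the past of x should leave the error in predicting y essentially unchanged. With a one-lag predictor this is only a rough check of the definition, not a test.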
Since Wiener-Granger causality is defined in terms of predictability, it cannot be an acceptable definition of causation for most philosophers of science [Bunge (1959, ch. 12)]. We do not take up that argument in this chapter. Rather, we concentrate on the operational usefulness of the definition in the construction, estimation, and application of econometric models. In Section 2, for example, we consider the logical relationships among Wiener-Granger causality, Simon’s (1952) definition of causal ordering, the engineer’s criterion of realizability [e.g. Zemanian (1972)], and the concept of structure set forth by Hurwicz (1962). Although Wiener-Granger causality is an empirical rather than a logical or ontological concept, it must be made much more specific before propositions like
1 Granger’s (1963, 1969) definitions assume that the time series are stationary, predictors are linear least-squares projections, and mean-square error is the criterion for comparison of forecasts. While these assumptions are convenient to make when conducting empirical tests of the proposition that causality of a certain type is absent, they are not sui generis and therefore have not been imposed here.
Ch. 19: Inference and Causality
“Y does not cause X” can be refuted, even in principle. One must always specify the set of “all information” assumed in the definition since Y may cause X for some sets but not others. One must also have a criterion for the comparison of predictors, and the validity of propositions like “Y does not cause X” can be assessed only for restricted classes of predictors and distribution functions. In Section 3 we take up the case, frequently assumed in application, in which $U_t = X_t \cup Y_t$, predictors are linear, and the time series are jointly wide sense stationary, purely nondeterministic, and have autoregressive representations.
In Sections 4 and 5 we move on to issues of statistical inference. In Section 4 it is shown that unidirectional causality from X to Y (i.e. Y does not cause X, and X may or may not cause Y) is logically equivalent to the existence of simultaneous equation models with X exogenous. It is also shown that unidirectional causality from X to Y is not equivalent to the assertion that X is predetermined in a particular behavioral relationship whose parameters are to be estimated. In Section 5 we take up the narrower problem of testing the proposition that Y does not cause X under the assumptions made in Section 3.
Section 6 is devoted to some of the problems which arise in testing the proposition of unidirectional causality using actual economic time series, due to the fact that these series need not satisfy the ideal assumptions made in Sections 3 and 5. We concentrate on parameterization problems, processes which are nonautoregressive or have deterministic components or are nonstationary, and inference about many variables. The reader who is only interested in the mechanics of testing hypotheses about unidirectional causality can skip Sections 2 and 4, and read Sections 3, 5, and 6 in order. The material in Sections 2 and 4, however, is essential in the interpretation of the results of those tests.
Whether or not Wiener-Granger causality is consistent with formal definitions of causality offered by philosophers of science is an open question. In most definitions, “cause” is similar in meaning to “force” or “produce” [e.g. Blalock (1961, pp. 9-10)], which are clearly not synonymous with “predict”. Perhaps the definition closest to Wiener-Granger causality is Feigl’s, in which “causation is defined in terms of predictability according to a law” [Feigl (1953, p. 408)]. It has been argued [Zellner (1979)] that statistical “laws” of the type embodied in Wiener-Granger causality are not admissible, as opposed to those of economic theory. Wiener-Granger causality is therefore “devoid of subject matter considerations, including subject matter theory, and thus is in conflict with others’ definitions, including Feigl’s, that do mention both predictability and laws” [Zellner (1979, p. 51)]. Bunge (1959, p. 30), on the other hand, argues forcefully
The usefulness of the concept of Wiener-Granger causality in the conceptualization, construction, estimation and manipulation of econometric models is independent of its consistency or inconsistency with formal definitions. To evaluate its usefulness, we review and formalize some operational concepts implicit in econometric modelling.2
A definition of causal ordering in any econometric model (as opposed to the real world) was proposed by Simon (1952). Suppose S is a space of possible outcomes, and that the model imposes two sets of restrictions, A and B, on these outcomes. The entire model imposes the restriction $A \cap B$ on S. Suppose that S is mapped into two spaces, X and Y, by $P_X$ and $P_Y$, respectively. Then the ordered pair of restrictions (A, B) implies a causal ordering from X to Y if A restricts X (if at all) but not Y, and B restricts Y (if at all) without further restricting X. Formally we have the following:
Definition
The ordered pair (A, B) of restrictions on S determines a causal ordering from X to Y if and only if $P_Y(A) = Y$ and $P_X(A \cap B) = P_X(A)$.
2 Much (but not all) of what follows in this section may be found in Sims (1977a).
A geometric interpretation of this definition is provided in Figure 2.1. Some examples may also be helpful. Perhaps the simplest one which can be constructed is the following. Let $S = \{(x, y) \in R^2\}$, and consider the restrictions:
and $P_Y(C \cap D) = P_Y(A \cap B) = c - ba$, and in fact one of these establishes a causal ordering from Y to X.
As a second example, let S be the family of pairs of random variables (x, y) with bivariate normal distribution. Consider the restrictions:

$x = u_1 \sim N(\mu_1, \sigma_1^2)$   “A”
$y + bx = u_2 \sim N(\mu_2, \sigma_2^2)$   “B”

on S, where $u_1$ and $u_2$ are independent. Suppose $P_X$ and $P_Y$ map S into the marginal distributions for x and y, respectively. Then (A, B) determines a causal ordering from the marginal for x to the marginal for y. The model consisting of A, B, and the stipulation that $u_1$ and $u_2$ are independent is the simplest example of a recursive model [Strotz and Wold (1960)]. As Basmann (1965) has pointed out, any outcome in S can be described by such a model; again, the causal ordering is a property of the model, not of the outcome.
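The point that a causal ordering belongs to the model rather than to the outcome can be checked mechanically on a small finite outcome space. The sketch below is an illustration under assumptions of our own: S is replaced by an integer grid, and the restrictions x = a, y + bx = c and y = c - ba are hypothetical choices modelled on the form of restriction B above. The function names are not from the chapter.

```python
from itertools import product

X_VALUES = range(-5, 6)
Y_VALUES = range(-5, 6)
S = set(product(X_VALUES, Y_VALUES))  # finite stand-in for the outcome space

def proj_x(subset):
    """Image of a set of outcomes under the projection onto X."""
    return {x for x, _ in subset}

def proj_y(subset):
    """Image of a set of outcomes under the projection onto Y."""
    return {y for _, y in subset}

def orders_x_to_y(first, second):
    """(first, second) determines a causal ordering from X to Y."""
    return (proj_y(first) == set(Y_VALUES)
            and proj_x(first & second) == proj_x(first))

def orders_y_to_x(first, second):
    """Mirror image of the definition: a causal ordering from Y to X."""
    return (proj_x(first) == set(X_VALUES)
            and proj_y(first & second) == proj_y(first))

a, b, c = 1, 2, 3  # hypothetical constants
A = {(x, y) for x, y in S if x == a}          # restricts x only
B = {(x, y) for x, y in S if y + b * x == c}  # relates y to x
C = {(x, y) for x, y in S if y == c - b * a}  # restricts y only
D = B

print(A & B == C & D)       # both models admit the same outcome
print(orders_x_to_y(A, B))  # ordering from X to Y in the model (A, B)
print(orders_y_to_x(C, D))  # ordering from Y to X in the model (C, D)
```

Both ordered pairs admit exactly the outcome (a, c - ba), yet they imply opposite orderings, which is why the ordering cannot be read off the outcome alone.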
Causal orderings, or recursive models, are intended to be more than just descriptive devices. Inherent in such models is the notion that if A is changed, the outcome will still be $A \cap B$, with B unchanged. Once the possibility of changing the first restriction in the ordered pair is granted, it makes a great deal of difference which causal ordering is inherent in the model: different models describe different sets of restrictions on S arising from manipulation of the first restriction. Hence attention is focused on B. We formalize the notion that B is unchanged when A is manipulated as follows.
Definition
The set $B \subset S$ accepts X as input if for any $A \subset S$ which constrains only X (i.e. $P_X^{-1}(P_X(A)) = A$), (A, B) determines a causal ordering from X to Y.
In econometric modelling, the notion that B should accept X as input is so entrenched and natural that it is common to think of B as the model itself, with
J. Geweke
little or no attention given to the set A which restricts the admissible inputs for the model, although these restrictions may be very important. Conventional manipulation of an econometric model for policy or predictive purposes assumes that the manipulated variables are accepted as input by the model.
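On a finite outcome space the definition of “accepts X as input” can be verified by brute force. The sketch below makes several assumptions: S is a small grid, only nonempty restrictions A are enumerated, and the two candidate sets `B_ok` and `B_bad` are hypothetical examples chosen for illustration.

```python
from itertools import chain, combinations, product

X_VALUES = (0, 1, 2)
Y_VALUES = (0, 1, 2, 3, 4)
S = set(product(X_VALUES, Y_VALUES))

def proj_x(subset):
    return {x for x, _ in subset}

def proj_y(subset):
    return {y for _, y in subset}

def causal_ordering_x_to_y(A, B):
    """P_Y(A) = Y and P_X(A ∩ B) = P_X(A)."""
    return proj_y(A) == set(Y_VALUES) and proj_x(A & B) == proj_x(A)

def restrictions_on_x_only():
    """Every nonempty A that constrains only X, i.e. A = C x Y for C ⊆ X."""
    subsets = chain.from_iterable(
        combinations(X_VALUES, r) for r in range(1, len(X_VALUES) + 1))
    for C in subsets:
        yield {(x, y) for x in C for y in Y_VALUES}

def accepts_x_as_input(B):
    """B accepts X as input if every such A gives an ordering from X to Y."""
    return all(causal_ordering_x_to_y(A, B) for A in restrictions_on_x_only())

# y is determined for every admissible x.
B_ok = {(x, y) for x, y in S if y == 2 * x}
# Same relation, but it silently rules out the input x = 1.
B_bad = {(x, y) for x, y in S if y == 2 * x and x != 1}

print(accepts_x_as_input(B_ok), accepts_x_as_input(B_bad))
```

The second set fails because it restricts the inputs themselves. This is the sense in which the restrictions on admissible inputs may matter even when attention is focused on B.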
In many applications X and Y are time series, as they were in the notation of Section 1. Consider the simple case in which X and Y are univariate, normally distributed, jointly stationary time series, and S is the family of bivariate, normally distributed, jointly stationary time series. Suppose that the restriction A is:

$A(L)x_t = u_t$,

where A(L) is one-sided (i.e. involves only non-negative powers of the lag operator L) and has all roots outside the unit circle; and $U = \{u_t, t \text{ real}\}$ is a serially uncorrelated, normally distributed, stationary time series. Let the restriction B be:
where B(L) has no roots on the unit circle, both B(L) and C(L) may be two-sided (i.e. involve negative powers of the lag operator L) and $W = \{w_t, t \text{ real}\}$ is a serially uncorrelated, normally distributed, stationary time series independent of U. Since A implies $x_t = A(L)^{-1}u_t$, it establishes the first time series without restricting the second, while B implies

which establishes the second without changing the first. Hence, the model establishes a causal ordering from X to Y, and if for any normally distributed, jointly stationary X the outcome of the model satisfies (2.1), then B accepts X as input. Such a model might or might not be interesting for purposes of manipulation, however. In general, $y_t$ will be a function of past, current, and future x, which is undesirable if B is supposed to describe the relation between actual inputs and outputs; the restriction that B(L) and C(L) be one-sided and that B(L) have no roots inside the unit circle would obviate this difficulty.
The notion that future inputs should not be involved in the determination of present outputs is known in the engineering literature as realizability [Zemanian (1972)], and we can formalize it in our notation as follows.
Definition
The set $B \subseteq S$ is realizable with time series X as input if B accepts X as input, and $P_{X_t}(A_1) = P_{X_t}(A_2)$ implies $P_{Y_\tau}(A_1 \cap B) = P_{Y_\tau}(A_2 \cap B)$ for all $A_1 \subset S$ and $A_2 \subset S$ which constrain only X, and all $t \ge \tau$.
If B accepts X as input but is not realizable, then a specification of inputs up to time t will not restrict outputs, but once outputs up to time t are restricted by B, then further restrictions on inputs - those occurring after time t - are implied. This is clearly an undesirable characteristic of any model which purports to treat time in a realistic fashion.
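Realizability can be illustrated with a deterministic linear filter, a much simpler setting than the stochastic restrictions discussed above. The sketch assumes that the relation B takes the form of a fixed set of filter weights. The weights, the sample length and the helper `apply_filter` are hypothetical.

```python
import numpy as np

def apply_filter(x, weights):
    """y_t = sum_k weights[k] * x_{t-k}; a negative k uses future inputs."""
    n = len(x)
    y = np.zeros(n)
    for t in range(n):
        for k, w in weights.items():
            if 0 <= t - k < n:
                y[t] += w * x[t - k]
    return y

one_sided = {0: 1.0, 1: 0.5, 2: 0.25}  # current and past inputs only
two_sided = {-1: 0.5, 0: 1.0, 1: 0.5}  # also uses x_{t+1}

rng = np.random.default_rng(1)
n, t = 20, 9
x1 = rng.standard_normal(n)
x2 = x1.copy()
x2[t + 1:] = rng.standard_normal(n - t - 1)  # same history up to t, new future

# Do two inputs that agree up to time t give outputs that agree up to time t?
same_one_sided = np.allclose(apply_filter(x1, one_sided)[:t + 1],
                             apply_filter(x2, one_sided)[:t + 1])
same_two_sided = np.allclose(apply_filter(x1, two_sided)[:t + 1],
                             apply_filter(x2, two_sided)[:t + 1])
print(same_one_sided, same_two_sided)
```

With one-sided weights, inputs that agree up to time t produce outputs that agree up to time t. With two-sided weights the output at time t already depends on the input at t + 1, so the relation is not realizable in the sense of the definition.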
The concepts of causal ordering, inputs, and realizability pertain to models. One can establish whether models possess these properties without reference to the phenomena which the models are supposed to describe. Of course, our interest in these models stems from the possibility that they do indeed describe actual phenomena. Hurwicz (1962) attributes the characteristic structural to models which meet this criterion.
This definition incorporates two terms which shall remain primitive: “implemented” and “true”. Whether or not $P_Y(P_X^{-1}(C) \cap B)$ is true for a given C is a question to which statistical inference can be addressed; at most, we can hope to attach a posterior probability to the truth of this statement. We can never know whether $P_Y(P_X^{-1}(C) \cap B)$ is true for any C: one can never prove that a model is structural, although by implementing one or more sets C serious doubts could be cast on the assertion. Since the definition allows any set $C \subseteq X$ to be implemented, those implementing inputs in real time are permitted to change their plans. It seems implausible that the current outputs of an actual system should depend on future inputs as yet undetermined. We formalize this idea as follows.
Axiom of causality
$B \subset S$ is structural for inputs X only if B is realizable with X as input.
The axiom of causality is a formalization of the idea that the future cannot cause the past, an idea which appears to be uniformly accepted in the philosophy of science despite differences about the relations between antecedence and causality. For example, Blalock (1964, p. 10) finds this condition indispensable:
“Since the forcing or producing idea is not contained in the notion of temporal sequences, as just noted, our conception of causality should not depend on temporal sequences, except for the impossibility of an effect preceding its cause.” Bunge argues that the condition is universally satisfied:
Even relativity admits the reversal of time series of physically disconnected events but excludes the reversal of causal connections, that is, it denies that effects can arise before they have been produced ... events whose order of succession is reversible cannot be causally connected with one another; at most they may have a common origin. To conclude, a condition for causality to hold is that C [the cause] be previous to or at most simultaneous with E [the event] (relative to a given reference system). [Bunge (1959, p. 67)]
It is important to note that the converse of the axiom of causality is the post hoc ergo propter hoc fallacy. The fallaciousness of the converse follows from the fact that there are many $B_j \subseteq S$ which are realizable with X as input, but for which $P_Y(P_X^{-1}(C) \cap B_j) \ne P_Y(P_X^{-1}(C) \cap B_k)$ when $j \ne k$ for some choices of C. For C which have actually been implemented, $B_j$ and $B_k$ may of course produce identical outputs in spite of their logical inconsistency: one cannot establish that a restriction is structural through statistical inference, even to a specified level of a posteriori probability.3 It may seem curious to provide the name “axiom of causality” to a statement which nowhere mentions the word “cause”. The name is chosen because of Sims’ (1972) result that (in our language, and with appropriate restrictions on classes of time series and predictors) B is realizable with X as input if and only if in B Wiener-Granger causality is unidirectional from X to Y. To develop this result we shall be quite specific about the structure of the time series X and Y.
3 Causal orderings and their implications
In any empirical application the concept of Wiener-Granger causality must be formulated more narrowly than it is in Granger’s definitions. The relevant universe of information must be specified, and the class of predictors to be considered must be limited. If formal, classical hypothesis testing is contemplated, then the question of whether or not Y is causing X must be made to depend on the values of parameters which are few in number relative to the number of observations at hand. The determination of the relevant universe of information rests primarily on a priori considerations from economic theory, in much the
3 An extended discussion of specific pitfalls encountered in using a finding that a restriction B which is realizable with X as input is in agreement with the data, to buttress a claim that B is structural, is provided by Sims (1977).
same way that the specification of which variables should enter a behavioral equation or system of equations does. Empirical studies which examine questions of Wiener-Granger causality differ greatly in the care with which the universe of information is chosen; in many instances, it is suggested by earlier work on substantively similar issues which did not address questions of causality. However, virtually all of these studies consider only predictors which are linear either in levels or logarithms. This choice is due mainly to the analytical convenience of the linearity specification, as it is elsewhere in econometric theory. In the present case it is especially attractive because only linear predictors are necessarily time invariant when time series are assumed to be wide sense stationary, the least restrictive class of time series for which a rich and useful theory of prediction is available. In this section we will discuss the portions of this theory essential for developing the testable implications of Wiener-Granger causality. Considerations of testing and inference are left to Section 5.
3.1 A canonical form for wide sense stationary multiple time series
We focus our attention on a wide sense stationary, purely non-deterministic time series $z_t$: $m \times 1$. By wide sense stationary, it is meant that the mean of $z_t$ exists and does not depend on t, and for all t and s $\operatorname{cov}(z_t, z_{t+s})$ exists and depends on s but not t. By purely non-deterministic, it is meant that the correlation of $z_{t+p}$ and $z_t$ vanishes as p increases so that in the limit the best linear forecast of $z_{t+p}$ conditional on $\{z_{t-s}, s \ge 0\}$ is the unconditional mean of $z_{t+p}$, which for convenience we take to be 0. It is presumed that the relevant universe of information at time t consists of $Z_t = \{z_{t-s}, s \ge 0\}$. These assumptions restrict the universe of information which might be considered, but they are no more severe than those usually made in standard linear or simultaneous equation models for the purposes of developing an asymptotic theory of inference.
We further suppose that there exists a moving average representation for $z_t$:

$z_t = \sum_{s=0}^{\infty} A_s \varepsilon_{t-s}$,   $E(\varepsilon_t) = 0$,   $\operatorname{var}(\varepsilon_t) = \Upsilon$.   (3.1)

In the moving average representation, all roots of the generating function $\sum_{s=0}^{\infty} A_s z^s$ have modulus not less than unity, the coefficients satisfy the square summability condition $\sum_{s=0}^{\infty} \|A_s\|^2 < \infty$,4 and the vector $\varepsilon_t$ is serially uncorrelated [Wold (1938)]. The existence of the moving average representation is important to

4 For any square complex matrix C, $\|C\|$ denotes the square root of the largest eigenvalue of $C'C$, and $|C|$ denotes the square root of the determinant of $C'C$.
us for two reasons. First, it is equivalent to the existence of the spectral density matrix $S_z(\lambda)$ of $z_t$ at almost all frequencies $\lambda \in [-\pi, \pi]$ [Doob (1953, pp. 499-500)]. Second, it provides a lower bound on the mean square error of one-step ahead minimum mean square error linear forecasts, which is:

Suppose now that $z_t$ has been partitioned into $k \times 1$ and $l \times 1$ subvectors $x_t$ and $y_t$, $z_t' = (x_t', y_t')$, reflecting an interest in causal relationships between X and Y. Adopt a corresponding partition of $S_z(\lambda)$:
The linear projection of $x_t$ on $X_{t-1}$ and $Y_{t-1}$, and of $y_t$ on $X_{t-1}$ and $Y_{t-1}$, is given by (3.3), which we partition:

The disturbance vectors $u_{2t}$ and $v_{2t}$ are each serially uncorrelated, but since each is uncorrelated with $X_{t-1}$ and $Y_{t-1}$, they can be correlated with each other only contemporaneously. We shall find the partition:
follows from the last $l$ equations.
We finally consider the linear projections of $x_t$ on $X_{t-1}$ and Y, and $y_t$ on $Y_{t-1}$ and X. Let $D(\lambda) = S_{xy}(\lambda)S_y(\lambda)^{-1}$, for all $\lambda \in [-\pi, \pi]$ for which the terms are defined and (3.4) is true. Because of (3.4), the inverse Fourier transform,

of $D(\lambda)$ satisfies the condition $\sum_{s=-\infty}^{\infty} \|D_s\|^2 < \infty$. From the spectral representation of Z it is evident that $w_t = x_t - \sum_{s=-\infty}^{\infty} D_s y_{t-s}$ is uncorrelated with all $y_t$, and that

therefore provides the linear projection of $x_t$ on Y. Since $S_w(\lambda) = S_x(\lambda) - S_{xy}(\lambda)S_y(\lambda)^{-1}S_{yx}(\lambda)$ is the inverse of the first k rows and columns of $S_z(\lambda)^{-1}$, $c^{-1}I_k \le S_w(\lambda) \le cI_k$ for almost all $\lambda$. Hence, $w_t$ possesses an autoregressive representation, which we write:
uncorrelated with $X_{t-1}$. Hence, (3.14) provides the linear projection of $x_t$ on $X_{t-1}$ and all Y, $\sum_{s=1}^{\infty} \|E_{4s}\|^2 < \infty$ and $\sum_{s=-\infty}^{\infty} \|F_{4s}\|^2 < \infty$. The same argument may be used to demonstrate that the linear projection of $y_t$ on $Y_{t-1}$ and X,

exists and all coefficients are square summable.
3.2 The implications of unidirectional causality6
If the universe of information at time t is $Z_t$, all predictors are linear, and the criterion for the comparison of forecasts is mean square error, then the Wiener-Granger definition of causality may be stated in terms of the parameters of the canonical form displayed in Table 3.1, whose existence was just demonstrated. For example, Y causes X if, and only if, $F_{2s} \not\equiv 0$; equivalent statements are $\Sigma_1 \ne \Sigma_2$ and $|\Sigma_1| > |\Sigma_2|$. Since $\Sigma_1 \ge \Sigma_2$ in any case, Y does not cause X if and only if $|\Sigma_1| = |\Sigma_2|$. Define the measure of linear feedback from Y to X:

$F_{Y \to X} \equiv \ln(|\Sigma_1| / |\Sigma_2|)$.

The statement “Y does not cause X” is equivalent to $F_{Y \to X} = 0$. Symmetrically, X does not cause Y if, and only if, the measure of linear feedback from X to Y,

$F_{X \to Y} \equiv \ln(|T_1| / |T_2|)$,

is zero.
7 In the case k = l = 1, our measure of linear dependence is the same as the measure of information per unit time contained in X about Y and vice versa, proposed by Gel’fand and Yaglom (1959). In the case l = 1, $F_{X \to Y} + F_{X \cdot Y} = -\ln(1 - R_*^2)$ and $F_{X,Y} = -\ln(1 - R_*^2(k))$, $R_*^2$ and $R_*^2(k)$ being proposed by Pierce (1979). In the case in which there is no instantaneous causality, Granger (1963) proposed that $1 - |\Sigma_2| / |\Sigma_1|$ be defined as the “strength of causality Y ⇒ X” and $1 - |T_2| / |T_1|$ be
Hence
and by an argument symmetric in X and Y:
(ii) By construction of (3.9), $\Sigma_3 = \Sigma_2 - C T_2^{-1} C'$, so $|\Sigma_3| \cdot |T_2| = |\Upsilon|$. Combining this result with $|\Sigma_4| \cdot |T_1| = |\Upsilon|$ from (3.16), (ii) is obtained.
(iii) Follows by symmetry with (ii).
(iv) Follows from $|\Sigma_3| \cdot |T_2| = |\Upsilon|$ and the symmetry of the right-hand side of that equation in X and Y.
We have seen that the measures $F_{X \cdot Y}$ and $F_{X,Y}$ preserve the notions of symmetry inherent in the concepts of instantaneous causality and dependence, in the case where relations are constrained to be linear and the metric of comparison is mean square error. Since

$F_{X,Y} = F_{Y \to X} + F_{X \to Y} + F_{X \cdot Y}$,

linear dependence can be decomposed additively into three kinds of linear feedback. Absence of a particular causal ordering is equivalent to one of these three types of feedback being zero. As we shall see in Section 5, the relations in this theorem provide a basis for tests of null hypotheses which assert the absence of one or more causal orderings.
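Sample analogues of the feedback measures can be computed from ordinary least-squares residual variances. The sketch below treats the scalar case k = l = 1 and makes assumptions that are not part of the chapter: the autoregressions are truncated at p lags, the simulated system is hypothetical, and instantaneous feedback is taken to be $\ln(|\Sigma_2| \cdot |T_2| / |\Upsilon|)$, which is the form implied by the additive decomposition above.

```python
import numpy as np

def lagged(series, p):
    """Columns series[t-1], ..., series[t-p] aligned with t = p, ..., n-1."""
    n = len(series)
    return np.column_stack([series[p - s:n - s] for s in range(1, p + 1)])

def residuals(targets, blocks):
    """Least-squares residuals of targets on a constant and the given blocks."""
    design = np.column_stack([np.ones(len(targets))] + blocks)
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return targets - design @ coef

def feedback_measures(x, y, p=4):
    """Sample analogues of the linear feedback measures for scalar x and y."""
    xt, yt = x[p:], y[p:]
    lx, ly = lagged(x, p), lagged(y, p)
    sigma1 = np.var(residuals(xt, [lx]))  # x on its own past
    tau1 = np.var(residuals(yt, [ly]))    # y on its own past
    joint = residuals(np.column_stack([xt, yt]), [lx, ly])  # joint past
    upsilon = np.cov(joint, rowvar=False, bias=True)
    sigma2, tau2 = upsilon[0, 0], upsilon[1, 1]
    det = np.linalg.det(upsilon)
    return {
        "y_to_x": np.log(sigma1 / sigma2),
        "x_to_y": np.log(tau1 / tau2),
        "instantaneous": np.log(sigma2 * tau2 / det),
        "total": np.log(sigma1 * tau1 / det),
    }

def simulate(n=4000, seed=0):
    """Hypothetical system with feedback from y to x only."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.5 * y[t - 1] + rng.standard_normal()
        x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()
    return x, y

x, y = simulate()
F = feedback_measures(x, y)
print(F)
```

Because the three components are built from the same residual covariance matrix, they add up to the total exactly in the sample, not only in population. In this simulated system only the feedback from y to x should be far from zero.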
It is a short step from this theorem to Sims’ (1972) result that Y does not cause X if, and only if, in the linear projection of Y on future, current and past X coefficients on future X are zero. The statement “Y does not cause X” is equivalent to $\Sigma_1 = \Sigma_2$ and $T_3 = T_4$, which is in turn equivalent to $H_{3s} = H_{4s}$. From our derivation of (3.14) from (3.11), coefficients on $X - X_t$ in (3.15) are zero if and only if the coefficients on $X - X_t$ in the projection of $y_t$ on X are zero. This implication provides yet another basis for tests of the null hypothesis that Y does not cause X.
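The role of coefficients on future X can be seen in a small simulation. The sketch below is a rough illustration, not the test discussed in Section 5: the projection is truncated to one lead and two lags of x, and both simulated systems are hypothetical.

```python
import numpy as np

def lead_coefficient(x, y):
    """Coefficient on x_{t+1} when y_t is regressed on x_{t+1}, x_t, x_{t-1}."""
    design = np.column_stack([np.ones(len(x) - 2), x[2:], x[1:-1], x[:-2]])
    coef, *_ = np.linalg.lstsq(design, y[1:-1], rcond=None)
    return float(coef[1])

def simulate(y_causes_x, n=5000, seed=0):
    """Two hypothetical systems, with and without feedback from y to x."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    v = rng.standard_normal(n)
    x = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        if y_causes_x:
            y[t] = 0.5 * y[t - 1] + v[t]
            x[t] = 0.5 * y[t - 1] + e[t]
        else:
            x[t] = 0.5 * x[t - 1] + e[t]
            y[t] = 0.8 * x[t - 1] + v[t]
    return x, y

lead_without_feedback = lead_coefficient(*simulate(y_causes_x=False))
lead_with_feedback = lead_coefficient(*simulate(y_causes_x=True))
print(lead_without_feedback, lead_with_feedback)
```

When y does not cause x, the estimated coefficient on the lead of x should be close to zero. When lagged y enters the x equation, the lead of x carries information about current y and its coefficient is clearly nonzero.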
3.3 Extensions
The concept of Wiener-Granger causality has recently been discussed in contexts less restrictive than the one presented here. The assumption that the multiple time series of interest is stationary and purely non-deterministic can be relaxed, attention need not be confined to linear relations, and characterizations of bidirectional causality have been offered. We briefly review the most important developments in each of these areas, providing citations but not proofs.
The extension to the case in which Z may be non-stationary and have deterministic components is relatively straightforward, so long as only linear relations are of interest. The definition of Wiener-Granger causality given in Section 1 remains pertinent with the understanding that only linear predictors are considered. If Z is non-stationary, then the linear predictors are in general not time invariant, as was the case in this section. Hosoya (1977) has shown that if Y does not cause X, then the difference between $y_t$ and its projection on $X_t$ is orthogonal to $x_{t+p}$ ($p \ge 1$). The latter condition is the one offered by Sims (1972) under the assumptions of stationarity and pure non-determinism, and is the natural analogue of the condition $\ln(|T_3| / |T_4|) = 0$. If Z contains deterministic components, then the condition that Y does not cause X implies that these components are linear functions of the deterministic part of X, plus a residual term which is uncorrelated with X at all leads and lags [Hosoya (1977)].
When we widen our attention to include possibly non-linear relations, more subtle issues arise. Consider again the condition that Y does not cause X. Corresponding natural extensions of the conditions we developed for the linear, stationary, purely non-deterministic case are:
(1) $x_{t+1}$ is independent of $Y_t$ conditional on $X_t$ for all t, for the restriction $F_{2s} = 0$ in (3.7);
(2) $y_t$ is independent of $x_{t+1}, x_{t+2}, \ldots$ conditional on $X_t$ for all t, for the restriction that the linear projections of $y_t$ on X and on $X_t$ be identical; and
(3) $y_t$ is independent of $x_{t+1}, x_{t+2}, \ldots$ conditional on $X_t$ and $Y_{t-1}$ for all t, for the restriction $H_{3s} = H_{4s}$ in (3.10) and (3.15).
Chamberlain (1982) has shown that under a weak regularity condition (analogous to $\sum_{s=0}^{\infty} \|A_s\|^2 < \infty$ introduced in Section 3.1) conditions (1) and (3) are equivalent, just as their analogues were in the linear case. However, (1) or (3) implies (2), but not conversely: the natural extension of Sims’ (1972) result is not true. Further discussion of these points is provided in Chamberlain’s paper and in the related work of Florens and Mouchart (1982).
When causality is unidirectional it is natural to seek to quantify its importance and provide summary characteristics of the effect of the uncaused on the caused series. When causality is bidirectional - as is perhaps the rule - these goals become even more pressing. The measures of linear feedback provide one practical answer to this question, since they are easy to estimate. Two more elaborate suggestions have been made, both motivated by problems in the interpretation of macroeconomic aggregate time series.
Sims (1980) renormalizes the moving average representation (3.1) in recursive form

with $A_0^*$ lower triangular and $\Upsilon^* = \operatorname{var}(\varepsilon_t^*)$ diagonal. [The renormalization can be computed from (3.1) by exploiting the Choleski decomposition $\Upsilon = MM'$, with M lower triangular. Denote $L = \operatorname{diag}(M)$. Then $LM^{-1}z_t = \sum_{s=0}^{\infty} LM^{-1}A_s\varepsilon_{t-s}$; $LM^{-1}A_0 = LM^{-1}$ is lower triangular with units on the main diagonal and $\operatorname{var}(LM^{-1}\varepsilon_t) = LL'$.] If we let $\sigma_i^2 = \operatorname{var}(\varepsilon_{it}^*)$ and $[A_s^*]_{ji} = a_{s,ji}^*$, it follows from the diagonality of $\Upsilon^*$ that the m-step-ahead forecast error variance for $z_{jt}$ is $\sigma^2(j, m) = \sum_i \sigma_i^2 \sum_{s=0}^{m-1} (a_{s,ji}^*)^2$. The function $\sigma_i^2 \sum_{s=0}^{m-1} (a_{s,ji}^*)^2 / \sigma^2(j, m)$ provides a measure of the relative contribution of the disturbance corresponding to $z_i$ in (3.17) to the m-step-ahead forecast error in $z_j$. This measure is somewhat similar to the measures of feedback discussed previously; when m = 1 and $\Upsilon$ is diagonal, there is a simple arithmetic relationship between them. An important advantage of this decomposition is that for large m it isolates relative contributions to movements in the variables which are, intuitively, “persistent”. An important disadvantage, however, is that the measures depend on the ordering of the variables through the renormalization of (3.1).
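A Choleski-based renormalization of this kind can be sketched in a few lines. The code below assumes one particular reading of the recursive form, namely $z_t = \sum_s (A_s M L^{-1})(L M^{-1} \varepsilon_{t-s})$, so that the rescaled disturbances have diagonal variance $LL'$. The first-order autoregressive example and the names `recursive_form` and `variance_shares` are hypothetical.

```python
import numpy as np

def recursive_form(ma_coefs, upsilon):
    """Renormalize z_t = sum_s A_s e_{t-s} so the disturbances are uncorrelated."""
    M = np.linalg.cholesky(upsilon)  # upsilon = M M', M lower triangular
    L = np.diag(np.diag(M))          # diagonal part of M
    scale = M @ np.linalg.inv(L)     # unit lower triangular
    star_coefs = [A @ scale for A in ma_coefs]
    star_var = L @ L.T               # variance of L M^{-1} e_t, diagonal
    return star_coefs, star_var

def variance_shares(star_coefs, star_var, m):
    """Entry [j, i]: share of disturbance i in the m-step error variance of z_j."""
    sig2 = np.diag(star_var)
    contrib = sum((A ** 2) * sig2 for A in star_coefs[:m])
    return contrib / contrib.sum(axis=1, keepdims=True)

# Hypothetical first-order autoregression, so that A_s = Phi^s.
Phi = np.array([[0.5, 0.2], [0.0, 0.4]])
upsilon = np.array([[1.0, 0.5], [0.5, 2.0]])
ma_coefs = [np.linalg.matrix_power(Phi, s) for s in range(10)]

star_coefs, star_var = recursive_form(ma_coefs, upsilon)
shares = variance_shares(star_coefs, star_var, m=4)
print(star_coefs[0])
print(star_var)
print(shares)
```

Reordering the variables changes the Choleski factor and hence the shares, which is the disadvantage noted above.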
Geweke (1982a) has shown that the measures of feedback $F_{Y \to X}$ and $F_{X \to Y}$ may be decomposed by frequency. Subject to some side conditions which as a practical matter are weak, there exist non-negative bounded functions $f_{Y \to X}(\lambda)$ and $f_{X \to Y}(\lambda)$ such that $F_{Y \to X} = (1/2\pi)\int_{-\pi}^{\pi} f_{Y \to X}(\lambda)\,d\lambda$ and $F_{X \to Y} = (1/2\pi)\int_{-\pi}^{\pi} f_{X \to Y}(\lambda)\,d\lambda$. The measures of feedback are thus decomposed into measures of feedback by frequency which correspond intuitively to the “long run” (low frequencies, small $\lambda$) and “short run” (high frequencies, large $\lambda$). In the case of low frequencies, this relationship has been formalized in terms of the implications of comparative statics models for time series [Geweke (1982b)].
The condition that Y not cause X, in the sense defined in Section 1, is very closely related to the condition that X be strictly exogenous in a stochastic model. The two are so closely related that tests of the hypothesis that Y does not cause X are often termed “exogeneity tests” in the literature [Sims (1977), Geweke (1978)]. The strict exogeneity of X is in turn invoked in inference in a wide variety of situations, for example the use of instrumental variables in the presence of serially correlated disturbances. The advantage of the strict exogeneity assumption is that there is often no loss in limiting one’s attention to distributions conditional on strictly exogenous X, and this limitation usually results in considerable simplification of problems of statistical inference. As we shall soon see, however, the condition that Y not cause X is not equivalent to the strict exogeneity of X. All that can be said is that if X is strictly exogenous in the complete dynamic simultaneous equation model, then Y does not cause X, where Y is endogenous in that model. This means that tests for the absence of a Wiener-Granger causal ordering can be used to refute the strict exogeneity specification in a certain class of stochastic models, but never to establish it. In addition, there are many circumstances in which nothing is lost by undertaking statistical inference conditional on a subset of variables which are not strictly exogenous - the best known being that in which there are predetermined variables in the complete dynamic simultaneous equation model. Unidirectional causality is therefore neither a necessary nor a sufficient condition for inference to proceed conditional on a subset of variables.
To establish these ideas, specific terminology is required. We begin by adopting a definition due to Koopmans and Hood (1953, pp. 117-120), as set forth by Christ (1966, p. 156).8
Definition
A strictly exogenous variable in a stochastic model is a variable whose value in each period is statistically independent of the values of all the random disturbances in the model in all periods.
Examples of strictly exogenous variables are provided in complete, dynamic simultaneous equation models in which all variables are normally distributed:9

$B(L)y_t + \Gamma(L)x_t = u_t$,
$A(L)u_t = \varepsilon_t$,
Roots of $|B(L)|$ and $|A(L)|$ have modulus greater than 1.
This model is similar to Koopmans’ (1950) and those discussed in most econometrics texts, except that serially correlated disturbances and possibly infinite lag lengths are allowed. The equation $A(L)B(L)y_t + A(L)\Gamma(L)x_t = \varepsilon_t$ corresponds to (3.10) in the canonical form derived in Section 3, and since $\varepsilon_t$ is uncorrelated with X it corresponds to (3.15) as well. Hence, $F_{Y \to X} = 0$: Y does not cause X. In view of our discussion in Section 2 and the fact that the complete dynamic simultaneous equation model is usually perceived as a structure which accepts X as input, this implication is not surprising.
If Y does not cause X then there exists a complete dynamic simultaneous
equation model with Y endogenous and X strictly exogenous, in the sense that
8 We use the term “strictly exogenous” where Christ used “exogenous” in order to distinguish this concept from weak exogeneity, to be introduced shortly.
9 The strong assumption of normality is made because of the strong condition of independence in our definition of strict exogeneity. As a practical matter, quasi-maximum likelihood methods are usually used, and the independence condition can then be modified to specify absence of correlation,
tions about serial correlation in the latter case should therefore be tested just as unidirectional causality should be tested when the weak exogeneity specification rests on strict exogeneity. In both cases, weak exogeneity will still rely on a priori assumptions; no set of econometric tests will substitute for careful formulation of the economic model.
5 Inference
Since the appearance of Sims’ (1972) seminal paper, causal orderings among many economic time series have been investigated. The empirical literature has been surveyed by Pierce (1977) and Pierce and Haugh (1977). Virtually all empirical studies have been conducted under the assumptions introduced in Section 3: time series are wide sense stationary, purely non-deterministic with autoregressive representation, the relevant universe of information at t is $Y_t$ and $X_t$, predictors are linear, and the criterion for comparison of forecasts is mean square error. A wide array of tests has been used. In this section we will describe and compare those tests which conceivably allow inference in large samples, i.e. those for which the probability of Type I error can, under suitable conditions, be approximated arbitrarily well as sample size increases. The development of a theory of inference is complicated in a non-trivial way by the fact that expression of all possible relations between wide sense stationary time series requires an infinite number of parameters, as illustrated in the canonical form derived in Section 3. This problem is not insurmountable, but considerable progress on the “parameterization problem” is required before a rigorous and useful theory of inference for time series is available; as we proceed, we shall take note of the major lacunae.
5.1 Alternative tests
Suppose that Y and X are two vector time series which satisfy the assumptions of Section 3. We find it necessary to make the additional assumption that Y and X are linear processes [Hannan (1970, p. 209)], which is equivalent to the specification that the disturbances $u_{it}$ and $v_{it}$ in Table 3.1 are serially independent, and not merely serially uncorrelated. Consider the problem of testing the null hypothesis that Y does not cause X. From the Theorem of Section 3, this may be done by testing (3.5) as a restriction on (3.7) or (3.10) as a restriction on (3.15).
We shall refer to tests based on the first restriction as “Granger tests”, since the restriction emerges immediately from Granger’s (1969) definition, and to tests