CHAPTER 67

Timeseries Analysis
A time series y with typical element y_s is a (finite or infinite) sequence of random variables. Usually, the subscript s goes from 1 to ∞, i.e., the time series is written y_1, y_2, ..., but it may have different (finite or infinite) starting or ending values.
67.1 Covariance Stationary Timeseries
A time series is covariance-stationary if and only if:
(67.1.1) E[y_s] = µ for all s
(67.1.2) var[y_s] < ∞ for all s
(67.1.3) cov[y_s, y_{s+k}] = γ_k for all s and k
I.e., the means do not depend on s, and the covariances only depend on the distances and not on s. A covariance stationary time series is characterized by the expected value of each observation µ, the variance of each observation σ², and the “autocorrelation function” ρ_k for k ≥ 1 or, alternatively, by µ and the “autocovariance function” γ_k for k ≥ 0. The autocovariance and autocorrelation functions are vectors containing the unique elements of the covariance and correlation matrices.
The simplest time series has all y_t ∼ IID(µ, σ²), i.e., all covariances between different elements are zero. If µ = 0 this is called “white noise.”
A covariance-stationary process y_t (t = 1, ..., n) with expected value µ = E[y_t] is said to be ergodic for the mean if

plim_{n→∞} (1/n) Σ_{t=1}^n y_t = µ.
We will usually require ergodicity along with stationarity.
Problem 548. [Ham94, pp. 46/7] Give a simple example of a stationary time series process which is not ergodic for the mean.
Answer. White noise plus a mean which is drawn once and for all from a N(0, τ²) distribution independent of the white noise.
67.1.1. Moving Average Processes. The following is based on [Gra89, pp. 63–91] and on [End95].
We just said that the simplest stationary process is a constant plus “white noise” (all autocorrelations zero). The next simplest process is a moving average process of order 1, also called an MA(1) process:
(67.1.5) y_t = µ + ε_t + βε_{t−1},  ε_t ∼ IID(0, σ²)
where the first y, say it is y_1, depends on the pre-sample ε_0.
Problem 549. Compute the autocovariance and autocorrelation function of the time series defined in (67.1.5), and show that the following process
(67.1.6) y_t = µ + η_t + (1/β)η_{t−1},  η_t ∼ IID(0, β²σ²)

generates a time series with the same statistical properties as (67.1.5).
Answer. (67.1.5): var[y_t] = σ²(1 + β²), cov[y_t, y_{t−1}] = βσ², and cov[y_t, y_{t−h}] = 0 for h > 1; therefore corr[y_t, y_{t−1}] = β/(1 + β²). (67.1.6) gives the same variance β²σ²(1 + 1/β²) = σ²(1 + β²) and the same correlation (1/β)/(1 + 1/β²) = β/(1 + β²).

The moving-average representation of a time series is therefore not unique. It is not possible to tell from observation of the time series alone whether the process generating it was (67.1.5) or (67.1.6). One can say in general that unless |β| = 1, every MA(1) process could have been generated by a process in which |β| < 1. This process is called the invertible form or the fundamental representation of the time series.
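This observational equivalence can be checked numerically. The following small simulation sketch (not part of the text; the variable names and the use of Python/NumPy are illustrative assumptions) generates long samples from (67.1.5) and (67.1.6) and compares their sample autocovariances:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, beta, sigma, T = 0.0, 0.5, 1.0, 200_000

# Process (67.1.5): y_t = mu + eps_t + beta*eps_{t-1}, eps_t ~ IID(0, sigma^2)
eps = rng.normal(0.0, sigma, T + 1)
y1 = mu + eps[1:] + beta * eps[:-1]

# Process (67.1.6): y_t = mu + eta_t + (1/beta)*eta_{t-1}, eta_t ~ IID(0, beta^2*sigma^2)
eta = rng.normal(0.0, beta * sigma, T + 1)
y2 = mu + eta[1:] + (1.0 / beta) * eta[:-1]

def autocov(y, k):
    """Sample autocovariance at lag k."""
    ybar = y.mean()
    return ((y[k:] - ybar) * (y[:len(y) - k] - ybar)).mean()

# Both columns converge to sigma^2*(1+beta^2), beta*sigma^2, and 0:
for k in (0, 1, 2):
    print(k, round(autocov(y1, k), 3), round(autocov(y2, k), 3))
```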
Problem 550. What are the implications for estimation of the fact that an MA process can have different data-generating processes?
Answer. Besides looking at how well the time series fits the data, the econometrician should also check whether the implied disturbances are plausible values in light of the actual history of the process, in order to ascertain that one is using the right representation.
The fundamental representation of the time series is needed for forecasting. Let us first look at the simplest situation: the time series at hand is generated by the process (67.1.5) with |β| < 1, the parameters µ and β are known, and one wants to forecast y_{t+1} on the basis of all past and present observations. Clearly, the past and present have no information about ε_{t+1}, therefore the best we can hope to do is to forecast y_{t+1} by µ + βε_t.
But do we know ε_t? If a time series is generated by an invertible process, then someone who knows µ, β, and the current and all past values of y can use this to reconstruct the value of the current disturbance. One sees this as follows: solving (67.1.5) for ε_t and substituting backwards,

(67.1.15) ε_t = (y_t − µ) − β(y_{t−1} − µ) + β²(y_{t−2} − µ) − ⋯ + (−β)^{t−1}(y_1 − µ) + (−β)^t ε_0.
If |β| < 1, the last term of the right hand side, which depends on the unobservable ε_0, becomes less and less important. Therefore, if µ and β are known, and all past values of y_t are known, this is enough information to compute the value of the present disturbance ε_t. Equation (67.1.15) can be considered the “inversion” of the MA(1) process, i.e., its representation as an infinite autoregressive process.
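Here is a minimal sketch of this inversion, assuming µ and β are known (the code and its starting guess ε̂_0 = 0 are illustrative, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, beta, sigma, T = 2.0, 0.6, 1.0, 500

eps = rng.normal(0.0, sigma, T + 1)   # eps[0] is the unobservable pre-sample eps_0
y = mu + eps[1:] + beta * eps[:-1]    # invertible MA(1), |beta| < 1

# Invert recursively: eps_t = (y_t - mu) - beta*eps_{t-1}, starting from the
# (wrong) guess eps_0 = 0 for the unobservable pre-sample disturbance.
eps_hat = np.zeros(T + 1)
for t in range(1, T + 1):
    eps_hat[t] = (y[t - 1] - mu) - beta * eps_hat[t - 1]

# The reconstruction error is exactly (-beta)^t * eps_0, which dies out:
print(abs(eps_hat[T] - eps[T]))       # negligible for large t
```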
The disturbance in the invertible process is called the “fundamental innovation” because every y_t is composed of a part which is determined by the history y_{t−1}, y_{t−2}, ..., plus ε_t, which is new to the present period.
The invertible representation can therefore be used for forecasting: the best predictor of y_{t+1} is µ + βε_t.
Even if a time series was actually generated by a non-invertible process, the formula based on the invertible process is still the best formula for prediction, but now it must be given a different interpretation.
All this can be generalized to higher order MA processes. [Ham94, pp. 64–68] says: for any noninvertible MA process (which is not borderline in the sense that |β| = 1) there is an invertible MA process which has the same means, variances, and autocorrelations. It is called the “fundamental representation” of this process.
The fundamental representation of a process is the one which leads to very simple equations for forecasting. It used to be a matter of course to assume at the same time that the true process which generated the time series must also be an invertible process, although the reasons given to justify this assumption were usually vague. The classic monograph [BJ76, p. 51] says, for instance: “The requirement of invertibility is needed if we are interested in associating present events with past happenings in a sensible manner.” [Dea92, p. 85] justifies the requirement of invertibility as follows: “Without [invertibility] the consumer would have no way of calculating the innovation from current and past values of income.”
But recently it has been discovered that certain economic models naturally lead to non-invertible data generating processes, see problem 552. This happens in processes in which the economic agents observe and act upon information which the econometrician cannot observe.
If one goes over to infinite MA processes, then one gets all indeterministic stationary processes. According to the so-called Wold decomposition, every stationary process can be represented as a (possibly infinite) moving average process plus a “linearly deterministic” term, i.e., a term which can be linearly predicted without error from its past. There is consensus that economic time series do not contain such linearly deterministic terms.
The errors in the infinite moving average representation also have to do with prediction: they can be considered the errors in the best one-step ahead linear prediction based on the infinite past [Rei93, p. 7].
A stationary process without a linearly deterministic term therefore has the form

(67.1.16) y_t = µ + Σ_{j=0}^∞ ψ_j ε_{t−j}

where the time series ε_s is white noise, and B is the backshift operator satisfying e_t^⊤B = e_{t−1}^⊤ (here e_t is the tth unit vector, which picks out the tth element of the time series). The coefficients satisfy Σ ψ_j² < ∞, and if they satisfy the stronger condition Σ |ψ_j| < ∞, then the process is called causal.
Problem 551. Show that without loss of generality ψ_0 = 1 in (67.1.16).
Answer. If, say, ψ_k is the first nonzero ψ, then simply write η_t = ψ_k ε_{t−k}; then y_t = µ + Σ_{i=0}^∞ (ψ_{k+i}/ψ_k) η_{t−i}, and the leading coefficient is 1.

Dually, one can also represent each fully indeterministic stationary process as an infinite AR process y_t − µ + Σ_{j=1}^∞ φ_j(y_{t−j} − µ) = ε_t. This representation is called invertible if it satisfies Σ |φ_j| < ∞.
67.1.2. The Box Jenkins Approach. Now assume that the operator Ψ(B) = Σ_{j=0}^∞ ψ_j B^j can be written as the product Ψ = Φ^{−1}Θ where Φ and Θ are finite polynomials in B. Again, without loss of generality, the leading coefficients in Φ and Θ can be assumed to be = 1. Then the time series can be written

Φ(B)(y_t − µ) = Θ(B)ε_t.
The Box-Jenkins approach is based on the assumption that empirically occurring stationary time series can be modeled as low-order ARMA processes. This would for instance be the case if the time series is built up recursively from its own past, with innovations which extend over more than one period.
If this general assumption is satisfied, this has the following implications for methodology:
• Some simple procedures have been developed for recognizing which of these time series one is dealing with.
• In the case of autoregressive time series, estimation is extremely simple and can be done using the regression framework.
67.1.3. Moving Average Processes. In order to see what order a finite moving average process is, one should look at the correlation coefficients. If the order is j, then the theoretical correlation coefficients are zero for all lags > j, and therefore the estimates of these correlation coefficients, which have the form

r_k = Σ_{t=k+1}^n (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1}^n (y_t − ȳ)²,

must be insignificant for those lags.
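The following sketch computes these sample autocorrelations for a simulated MA(1) series; the function name sample_acf and the simulated example are assumptions of this illustration, not part of the text:

```python
import numpy as np

def sample_acf(y, kmax):
    """Sample autocorrelations r_1, ..., r_kmax as in the formula above."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    denom = ((y - ybar) ** 2).sum()
    return np.array([((y[k:] - ybar) * (y[:-k] - ybar)).sum() / denom
                     for k in range(1, kmax + 1)])

rng = np.random.default_rng(2)
e = rng.normal(size=1001)
y = e[1:] + 0.7 * e[:-1]        # MA(1), so r_k should vanish for k > 1
r = sample_acf(y, 5)
band = 2 / np.sqrt(len(y))      # rough 95% band under the null of zero correlation
print(np.round(r, 3), "band:", round(band, 3))
```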
For estimation the preferred estimate is the maximum likelihood estimate. It cannot be represented in closed form, therefore we have to rely on numerical maximization procedures.
67.1.4. Autoregressive Processes. The common wisdom in econometrics is that economic time series are often built up recursively from their own past. An example of an AR(1) process is

(67.1.20) y_t = αy_{t−1} + ε_t,  ε_t ∼ IID(0, σ²)

where the first observation, say it is y_1, depends on the pre-sample y_0. (67.1.20) is called a difference equation.
This process generates a stationary time series only if |α| < 1. Proof: var[y_t] = var[y_{t−1}] means var[y_t] = α²var[y_t] + σ², and therefore var[y_t](1 − α²) = σ², and since σ² > 0 by assumption, it follows that 1 − α² > 0.
The solution (i.e., the Wold representation as an MA process) is

(67.1.21) y_t = y_0 α^t + (ε_t + αε_{t−1} + ⋯ + α^{t−1}ε_1).
As proof that this is a solution, write down αy_{t−1} and check that it is equal to y_t − ε_t.

67.1.5. Difference Equations. Let us make here a digression about nth order linear difference equations with constant coefficients, i.e., equations of the form y_t = α_1 y_{t−1} + ⋯ + α_n y_{t−n} + x_t with a given forcing term x_t (definition from [End95, p. 8]). The solution procedure has four steps:
(1) Find all solutions of the homogeneous equation, i.e., of the equation with the forcing term x_t set to zero.
(2) Find one particular solution of the full equation.
(3) Then the general solution is the sum of the particular solution and an arbitrary linear combination of all homogeneous solutions.
(4) Eliminate the arbitrary constant(s) by imposing the initial condition(s) on the general solution.
Let us apply this to y_t = αy_{t−1} + ε_t. The homogeneous equation is y_t = αy_{t−1}, and this has the general solution y_t = βα^t where β is an arbitrary constant. If the time series goes back to −∞, the particular solution is y_t = Σ_{i=0}^∞ α^i ε_{t−i}, but if the time series only exists for t ≥ 1 the particular solution is y_t = Σ_{i=0}^{t−1} α^i ε_{t−i}. This gives solution (67.1.21).
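As a quick numerical check of solution (67.1.21), one can compare the recursive computation of the AR(1) process with the closed-form expression; this sketch (illustrative only, with a hypothetical α = 0.8) does that:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, T = 0.8, 50
eps = rng.normal(size=T + 1)    # eps[1], ..., eps[T]; eps[0] is unused
y0 = 1.5

# Recursive computation of y_t = alpha*y_{t-1} + eps_t, starting from y_0:
y = np.zeros(T + 1)
y[0] = y0
for t in range(1, T + 1):
    y[t] = alpha * y[t - 1] + eps[t]

# Closed form (67.1.21): y_t = y_0*alpha^t + sum_{i=0}^{t-1} alpha^i * eps_{t-i}
closed = y0 * alpha**T + sum(alpha**i * eps[T - i] for i in range(T))
print(y[T], closed)             # identical up to rounding error
```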
Now let us look at a second order process: y_t = α_1 y_{t−1} + α_2 y_{t−2} + x_t. In order to get solutions of the homogeneous equation y_t = α_1 y_{t−1} + α_2 y_{t−2}, try y_t = βγ^t. This gives the following condition for γ: γ^t = α_1 γ^{t−1} + α_2 γ^{t−2}, or γ² − α_1 γ − α_2 = 0. The solutions of this quadratic equation are

γ = (α_1 ± √(α_1² + 4α_2)) / 2.

If this equation has two real roots, then everything is fine. If it has only one real root, i.e., if α_2 = −α_1²/4, then γ = α_1/2, i.e., y_t = β_1(α_1/2)^t is one solution. But there is also a second solution, which is not obvious: y_t = β_2 t(α_1/2)^t is a solution as well. One sees this by checking:
(67.1.24) t(α_1/2)^t = α_1(t − 1)(α_1/2)^{t−1} + α_2(t − 2)(α_1/2)^{t−2}
Simplify this, using α_2 = −α_1²/4, and you will see that it holds.
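One can also let a computer algebra system do the simplification; a sketch using sympy (an assumption of this illustration, not a tool used in the text):

```python
import sympy as sp

t, a = sp.symbols('t a', positive=True)
a1, a2 = a, -a**2 / 4           # the repeated-root case alpha_2 = -alpha_1^2/4
lhs = t * (a1 / 2)**t
rhs = a1 * (t - 1) * (a1 / 2)**(t - 1) + a2 * (t - 2) * (a1 / 2)**(t - 2)
print(sp.simplify(lhs - rhs))   # prints 0, so (67.1.24) holds identically
```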
If the roots of the characteristic equation are complex, one needs linear combinations of these complex roots, which are trigonometric functions. Here the homogeneous solution can be written in the form

(67.1.25) y_t = β_1 r^t cos(θt + β_2)

where r = √(−α_2) and θ is defined by cos θ = α_1/(2r). This formula is from [End95, p. 29], and more explanations can be found there.
But in all these cases the roots of the characteristic equation determine the character of the homogeneous solution. They also determine whether the difference equation is stable, i.e., whether the homogeneous solutions die out over time or not. For stability, all roots must lie inside the unit circle.

In terms of the coefficients themselves, these stability conditions are much more complicated; see [End95, pp. 31–33].
These stability conditions are also important for stochastic difference equations: in order to have stationary solutions, the equation must be stable.
It is easy to estimate AR processes: simply regress the time series on its lags. But before one can do this estimation one has to know the order of the autoregressive process. Useful tools for this are the partial autocorrelation coefficients.
We discussed partial correlation coefficients in chapter 19. The kth partial autocorrelation coefficient is the correlation between y_t and y_{t−k} with the influence of the intervening lags partialled out. The kth sample partial autocorrelation coefficient is the last coefficient in the regression of the time series on its first k lags. It is the effect which the kth lag has which cannot be explained by earlier lagged values. In an autoregressive process of order k, the “theoretical” partial autocorrelations are zero for lags greater than k, therefore the estimated partial autocorrelation coefficients should be insignificant for those lags. The asymptotic distribution of these estimates is normal with zero mean and variance 1/T, therefore one often finds lines at 2/√T and −2/√T in the plot of the estimated partial autocorrelation coefficients, which give an indication which values are significant at the 95% level and which are not.
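A sketch of the sample partial autocorrelations computed exactly as described, as the last coefficient of a regression on the first k lags (the function name and the simulated AR(1) example are illustrative assumptions):

```python
import numpy as np

def sample_pacf(y, kmax):
    """k-th sample partial autocorrelation = last coefficient in a regression
    of y_t on a constant and its first k lags."""
    y = np.asarray(y, dtype=float)
    out = []
    for k in range(1, kmax + 1):
        Y = y[k:]
        X = np.column_stack([np.ones(len(Y))] +
                            [y[k - j:len(y) - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        out.append(coef[-1])
    return np.array(out)

rng = np.random.default_rng(4)
y = np.zeros(1000)
for t in range(1, 1000):        # AR(1) with alpha = 0.7
    y[t] = 0.7 * y[t - 1] + rng.normal()
print(np.round(sample_pacf(y, 4), 3))   # first value near 0.7,
print(2 / np.sqrt(len(y)))              # the rest inside the +/- 2/sqrt(T) band
```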
67.1.6. ARMA(p,q) Processes. Sometimes it is appropriate to estimate a stationary process as having both autoregressive and moving average components (ARMA) or, if they are not stationary, they may be autoregressive or moving average after differencing them one or several times (ARIMA).

An ARMA(p, q) process is the solution of a pth order difference equation with an MA(q) process as driving term.
These models have been very successful. On the one hand, there is reason to believe on theoretical grounds that many economic time series are ARMA(p, q). [Gra89, p. 64] cites an interesting theorem which also contributes to the usefulness of ARMA processes: the sum of two independent series, one of which is ARMA(p_1, q_1) and the other ARMA(p_2, q_2), is ARMA(p_1 + p_2, max(p_1 + q_2, p_2 + q_1)).
Box and Jenkins recommend using the autocorrelations and partial autocorrelations for determining the order of the autoregressive or moving average parts, although this is more difficult for an ARMA process than for an MA or AR process.

The last step after what in the time series context is called “identification” (a more generally used term might be “specification” or “model selection”) and estimation is diagnostic checking, i.e., a check whether the results bear out the assumptions made by the model. Such diagnostic checks are necessary because mis-specification is possible if one follows this procedure. One way would be to see whether the residuals resemble a white noise process, by looking at the autocorrelation coefficients of the residuals. The so-called portmanteau test statistics test whether a given series is white noise: there is either the Box-Pierce statistic, which is the sum of the squared sample autocorrelations scaled by the sample size T,
Q = T Σ_{k=1}^p r_k²,
or the Ljung-Box statistic

Q′ = T(T + 2) Σ_{k=1}^p r_k²/(T − k).
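A sketch computing both portmanteau statistics (the function name and the white-noise example are illustrative; under the null, both are approximately χ²_p, so for p = 10 values near 10 are expected):

```python
import numpy as np

def portmanteau(y, p):
    """Box-Pierce and Ljung-Box statistics over the first p sample
    autocorrelations; both are approximately chi-squared with p degrees
    of freedom under the null hypothesis that y is white noise."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar = y.mean()
    denom = ((y - ybar) ** 2).sum()
    r = np.array([((y[k:] - ybar) * (y[:-k] - ybar)).sum() / denom
                  for k in range(1, p + 1)])
    box_pierce = T * (r ** 2).sum()
    ljung_box = T * (T + 2) * ((r ** 2) / (T - np.arange(1, p + 1))).sum()
    return box_pierce, ljung_box

rng = np.random.default_rng(5)
print(portmanteau(rng.normal(size=500), p=10))  # both should be near 10
```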
A second way to check the model is to overfit the model and see if the additional coefficients are zero. A third way would be to use the model for forecasting and to see whether important features of the original time series are captured (whether it can forecast turning points, etc.).
[Gre97, pp. 839–841] gives an example. Eyeballing the time series does not give the impression that it is a stationary process, but the statistics seem to suggest an AR(2) process.
67.2 Vector Autoregressive Processes

[JHG+88, Chapter 18.1] start with an example in which an economic time series is not adequately modelled by a function of its own past plus some present innovations, but where two time series are jointly determined by their past plus some innovation: a consumption function in which current consumption depends on current income as well as on lagged consumption and income. Rewriting the system so that only lagged values remain on the right hand side turns it into a first-order vector autoregression with disturbance vector (ε_{1t}, ε_{2t})^⊤. These disturbances have the same properties as the errors in simultaneous equations systems (the process is called vector white noise). VAR processes are special cases of multivariate time series. Therefore we will first take a look at multivariate time series in general. A good source here is [Rei93].
Covariance stationarity of multivariate time series is the obvious extension of the univariate definition (67.1.1)–(67.1.3):

(67.2.5) E[y_t] = µ
(67.2.6) var[y_{mt}] < ∞ for each component m
(67.2.7) cov[y_t, y_{t+k}] = Γ_k for all t and k
Such a process can be given an autoregressive representation

Σ_{j=0}^n (y_{t−j} − µ)^⊤ Θ_j = ε_t^⊤

where Θ_0 is lower diagonal and the covariance matrix of the disturbances is diagonal. For each permutation of the variables there is a unique lower diagonal Θ_0 which makes the covariance matrix of the disturbances the identity matrix; here prior knowledge about the order in which the variables depend on each other is necessary. But if one has a representation like this, one can build an impulse response function.
The condition for a VAR(n) process to be stationary is, in terms of (67.2.9), that all roots of the determinant of the autoregressive lag polynomial lie outside the unit circle.
Instead of using theory and prior knowledge to determine the number of lags, we use statistical criteria: minimize an adaptation of Akaike's AIC criterion

(67.2.12) AIC(n) = log det(Σ̃_n) + 2M²n/T

or the Schwarz criterion

(67.2.13) SC(n) = log det(Σ̃_n) + M²n log T/T

where M = number of variables in the system, T = sample size, n = number of lags included, and Σ̃_n has elements σ̃_ij = ε̂_i^⊤ε̂_j/T.
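A sketch of this lag-length selection for a simulated bivariate VAR(1); the OLS-per-equation estimator and the names var_sigma and aic_sc are assumptions of this illustration (and the residual covariance here divides by the effective sample size T − n rather than T):

```python
import numpy as np

def var_sigma(y, n):
    """Residual covariance matrix of a VAR(n) fitted by OLS equation by
    equation; y is a T x M data matrix."""
    T, M = y.shape
    Y = y[n:]
    X = np.column_stack([np.ones(T - n)] +
                        [y[n - j:T - j] for j in range(1, n + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    return E.T @ E / len(E)

def aic_sc(y, n):
    T, M = y.shape
    ld = np.log(np.linalg.det(var_sigma(y, n)))
    return ld + 2 * M**2 * n / T, ld + M**2 * n * np.log(T) / T

rng = np.random.default_rng(6)
A = np.array([[0.5, 0.1], [0.2, 0.4]])
y = np.zeros((400, 2))
for t in range(1, 400):                  # simulate a stationary VAR(1)
    y[t] = A @ y[t - 1] + rng.normal(size=2)
for n in (1, 2, 3, 4):
    print(n, np.round(aic_sc(y, n), 3))  # pick the n with the smallest criterion
```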
Again, diagnostic checks are necessary because mis-specification is possible.
What to do with the estimation once it is finished? (1) Forecasting is really easy; the AR framework gives natural forecasts. One-step ahead forecasts are obtained by simply using present and past values of the time series and setting the future innovations zero, and in order to get forecasts more than one step ahead, use the one-step etc. forecasts for those dates which have not yet been observed.
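A sketch of this iterated forecasting for a VAR(1) with hypothetical coefficient values (all names and numbers are illustrative):

```python
import numpy as np

def var1_forecast(y_last, A, c, steps):
    """Iterated forecasts from a VAR(1) y_t = c + A y_{t-1} + eps_t: set all
    future innovations to zero and feed each forecast back in as data."""
    f, path = y_last, []
    for _ in range(steps):
        f = c + A @ f
        path.append(f)
    return np.array(path)

A = np.array([[0.5, 0.1], [0.2, 0.4]])   # hypothetical estimated coefficients
c = np.array([1.0, 0.5])
print(var1_forecast(np.array([2.0, 1.0]), A, c, steps=3))
```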
67.2.1. Granger Causality. Granger causality tests are tests whether certain autoregressive coefficients are zero. It makes more sense to speak of Granger-noncausality: the time series x fails to Granger-cause y if y can be predicted as well from its own past as from the past of x and y. An equivalent expression is: in a regression of y_t on its own lagged values y_{t−1}, y_{t−2}, ..., and the lagged values x_{t−1}, x_{t−2}, ..., the coefficients of x_{t−1}, x_{t−2}, ... are not significantly different from zero.
An alternative test was proposed by Sims: x fails to Granger-cause y if, in a regression of y_t on lagged, current, and future values of x, the coefficients of the future values of x are zero.
I have this from [Mad88, pp. 329/30]. Leamer says that this should be called precedence, not causality, because all we are testing is precedence. I disagree; these tests do have implications on whether the researcher would want to draw causal inferences from his or her data, and the discussion of causality should be included in statistics textbooks.
Innovation accounting or impulse response functions: make a moving average representation, and then you can pick the time path of the innovations: perhaps a 1-period shock, or a stepped increase, whatever is of economic interest. Then you can see how these shocks are propagated through the system. A difficulty is that the innovations of the different equations are usually contemporaneously correlated, so that moving one of them alone is not a realistic experiment.
A way out would be to transform the innovations in such a way that their estimated covariance matrix is diagonal, and to experiment only with these diagonalized innovations. But there is more than one way to do this.

If one has the variables ordered in a halfways sensible way, then one could use the Cholesky decomposition, which diagonalizes according to this ordering of the variables.
Other approaches: the forecast error (MSE) can be decomposed into a sum of contributions coming from the different innovations: but this decomposition is not unique! The MA representation is then the answer to the question: how can one make policy recommendations with such a framework?
Here is an example of how an economic model can lead to a non-invertible VARMA process. It is from [AG97, p. 119], originally in [Qua90] and [BQ89]. Income at time t is the sum of a permanent and a transitory component, y_t = y_t^p + y_t^τ; the permanent component follows a random walk y_t^p = y_{t−1}^p + δ_t, while the transitory income is white noise, i.e., y_t^τ = ε_t; var[ε_t] = var[δ_t] = σ², and all disturbances are mutually independent. Consumers know which part of their income is transitory and which part is permanent; they have this information because they know their own particular circumstances, but this kind of information is not directly available to the econometrician. Consumers act on their privileged information: their increase in consumption is all of their increase in permanent income plus a fraction β < 1 of their transitory income: c_t − c_{t−1} = δ_t + βε_t. One can combine all this into
(67.2.14) y_t − y_{t−1} = δ_t + ε_t − ε_{t−1},  δ_t ∼ (0, σ²)
(67.2.15) c_t − c_{t−1} = δ_t + βε_t,  ε_t ∼ (0, σ²)
This is a vector-moving-average process for the first differences u_t = y_t − y_{t−1} and v_t = c_t − c_{t−1}; in matrix form, with L the lag operator,

(67.2.16) (u_t, v_t)^⊤ = [[1, 1 − L], [1, β]] (δ_t, ε_t)^⊤.

There is an invertible data generating process too, but it has the coefficients (67.2.17) written out in problem 552: in it, shocks affecting consumption this period also have an effect on this period's income and an opposite effect on next period's income. This is a quite different scenario, and in many respects the opposite scenario, than that in equation (67.2.16).
Problem 552. It is the purpose of this question to show that the following two vector moving averages are empirically indistinguishable:

u_t = δ_t + ε_t − ε_{t−1},  v_t = δ_t + βε_t

and

u_t = (ξ_t − (1−β)ξ_{t−1} + (1+β)ζ_t − βζ_{t−1})/√(1+β²),  v_t = (1+β²)ζ_t/√(1+β²),

where all error terms δ, ε, ξ, and ζ are independent with equal variances σ².
• a. Show that in both situations the vector (u_t, v_t)^⊤ has the same covariance matrix and the same covariances with its first lag (u_{t−1}, v_{t−1})^⊤, and that the higher lags have zero covariances.
Answer. First scenario: u_t = δ_t + ε_t − ε_{t−1} and v_t = δ_t + βε_t. Therefore var[u_t] = 3σ²; cov[u_t, v_t] = σ² + βσ²; var[v_t] = σ² + β²σ²; cov[u_t, u_{t−1}] = −σ²; cov[u_t, v_{t−1}] = −βσ²; cov[v_t, u_{t−1}] = cov[v_t, v_{t−1}] = 0.

Second scenario: leaving out the factor 1/√(1+β²) for the moment, we have u_t = ξ_t − (1−β)ξ_{t−1} + (1+β)ζ_t − βζ_{t−1} and v_t = (1+β²)ζ_t. Restoring the factor: var[u_t] = 3σ²; cov[u_t, v_t] = σ² + βσ²; var[v_t] = σ² + β²σ²; cov[u_t, u_{t−1}] = −σ²; cov[u_t, v_{t−1}] = −βσ²; cov[v_t, u_{t−1}] = cov[v_t, v_{t−1}] = 0.
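As a numerical cross-check of these computations (illustrative only, with a hypothetical β = 0.5), one can simulate both scenarios and compare the sample moments:

```python
import numpy as np

rng = np.random.default_rng(8)
beta, s, T = 0.5, 1.0, 400_000

# First scenario: u_t = delta_t + eps_t - eps_{t-1},  v_t = delta_t + beta*eps_t
d, e = rng.normal(0, s, T + 1), rng.normal(0, s, T + 1)
u1, v1 = d[1:] + e[1:] - e[:-1], d[1:] + beta * e[1:]

# Second scenario, including the factor 1/sqrt(1+beta^2)
k = 1 / np.sqrt(1 + beta**2)
xi, ze = rng.normal(0, s, T + 1), rng.normal(0, s, T + 1)
u2 = k * (xi[1:] - (1 - beta) * xi[:-1] + (1 + beta) * ze[1:] - beta * ze[:-1])
v2 = k * (1 + beta**2) * ze[1:]

print(np.var(u1), np.var(u2))             # both near 3*s^2
print(np.var(v1), np.var(v2))             # both near (1+beta^2)*s^2
print(np.cov(u1, v1)[0, 1], np.cov(u2, v2)[0, 1])   # both near (1+beta)*s^2
```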
• b. Show also that the first representation has characteristic root 1 − β, and the second has characteristic root 1/(1 − β). I.e., with 0 < β < 1, the first is not invertible but the second is.
Answer. Replace the lag operator L by the complex variable z and compute the determinant of the matrix of MA coefficients. For the first representation, det [[1, 1 − z], [1, β]] = β − 1 + z, which vanishes at z = 1 − β, inside the unit circle; for the second, the determinant is proportional to (1 − (1−β)z)(1 + β²), which vanishes at z = 1/(1 − β), outside the unit circle.