5.5 Regression model with autocorrelated disturbances
Consider the regression model
$$y_t = \alpha + \beta x_t + u_t, \tag{5.6}$$
$$u_t = \phi u_{t-1} + \varepsilon_t, \tag{5.7}$$
where we assume that $|\phi| < 1$, and $\varepsilon_t$ is a white-noise process, namely a serially uncorrelated process with a zero mean and a constant variance $\sigma^2_\varepsilon$:
$$\mathrm{Cov}(\varepsilon_t, \varepsilon_{t'}) = 0, \ \text{for } t \neq t', \quad \text{and} \quad \mathrm{Var}(\varepsilon_t) = \sigma^2_\varepsilon. \tag{5.8}$$
We assume that $\varepsilon_t$ and the regressors are uncorrelated, namely
$$E(\varepsilon_t \mid x_t, x_{t-1}, \ldots) = 0. \tag{5.9}$$
Note that condition (5.9) is weaker than the orthogonality assumption (5.3), where it is assumed that $\varepsilon_t$ is uncorrelated with future values of $x_t$, as well as with its current and past values.
By repeated substitution in (5.7), we have
$$u_t = \varepsilon_t + \phi\varepsilon_{t-1} + \phi^2\varepsilon_{t-2} + \cdots,$$
which is known as the moving average form for $u_t$. From the above expression, under $|\phi| < 1$, each disturbance $u_t$ embodies the entire past history of the $\varepsilon_t$, with the most recent observations receiving greater weight than those in the distant past. Since the successive values of $\varepsilon_t$ are uncorrelated, the variance of $u_t$ is
$$\mathrm{Var}(u_t) = \sigma^2_\varepsilon + \phi^2\sigma^2_\varepsilon + \phi^4\sigma^2_\varepsilon + \cdots = \frac{\sigma^2_\varepsilon}{1-\phi^2}.$$
The covariance between $u_t$ and $u_{t-1}$ is given by
$$\mathrm{Cov}(u_t, u_{t-1}) = E(u_t u_{t-1}) = E\left[(\phi u_{t-1} + \varepsilon_t)u_{t-1}\right] = \frac{\phi\,\sigma^2_\varepsilon}{1-\phi^2}.$$
To obtain the covariance between $u_t$ and $u_{t-s}$, for any $s$, first note that by repeated substitution equation (5.7) can be written as
$$u_t = \phi^s u_{t-s} + \sum_{i=0}^{s-1}\phi^i\varepsilon_{t-i}.$$
It follows that
$$\mathrm{Cov}(u_t, u_{t-s}) = E(u_t u_{t-s}) = E\left[\left(\phi^s u_{t-s} + \sum_{i=0}^{s-1}\phi^i\varepsilon_{t-i}\right)u_{t-s}\right] = \frac{\phi^s\,\sigma^2_\varepsilon}{1-\phi^2}.$$
To summarize, the covariance matrix of $\mathbf{u} = (u_1, u_2, \ldots, u_T)'$ is given by
$$\mathrm{Cov}(\mathbf{u}) = \frac{\sigma^2_\varepsilon}{1-\phi^2}
\begin{pmatrix}
1 & \phi & \cdots & \phi^{T-1} \\
\phi & 1 & \cdots & \phi^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\phi^{T-1} & \phi^{T-2} & \cdots & 1
\end{pmatrix}.$$
Note that, under $|\phi| < 1$, the values decline exponentially as we move away from the diagonal.
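To illustrate, the following sketch (in Python, using NumPy and SciPy; the function name `ar1_cov` is ours, not from the text) builds this covariance matrix for given values of $\phi$, $\sigma^2_\varepsilon$ and $T$, so the exponential decay away from the diagonal can be inspected directly.

```python
import numpy as np
from scipy.linalg import toeplitz

def ar1_cov(phi, sigma2_eps, T):
    """Covariance matrix of u when u_t = phi*u_{t-1} + eps_t and |phi| < 1."""
    # First row is (1, phi, phi^2, ..., phi^{T-1}); the matrix is Toeplitz.
    first_row = phi ** np.arange(T)
    return (sigma2_eps / (1.0 - phi**2)) * toeplitz(first_row)

# Example: covariances decay exponentially away from the diagonal.
Sigma = ar1_cov(phi=0.8, sigma2_eps=1.0, T=5)
print(np.round(Sigma, 3))
```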
5.5.1 Estimation
Suppose, initially, that the parameter $\phi$ in (5.7) is known. Then model (5.6) can be transformed so that the transformed equation satisfies the classical assumptions. To do this, first substitute $u_t = y_t - \alpha - \beta x_t$ in $u_t = \phi u_{t-1} + \varepsilon_t$ to obtain
$$y_t - \phi y_{t-1} = \alpha(1-\phi) + \beta(x_t - \phi x_{t-1}) + \varepsilon_t. \tag{5.10}$$
Define $y_t^* = y_t - \phi y_{t-1}$, and $x_t^* = x_t - \phi x_{t-1}$, then
$$y_t^* = \alpha(1-\phi) + \beta x_t^* + \varepsilon_t, \quad t = 2, 3, \ldots, T. \tag{5.11}$$
It is clear that in this transformed regression the disturbances $\varepsilon_t$ satisfy all the classical assumptions, and efficient estimators of $\alpha$ and $\beta$ can be obtained by the OLS regression of $y_t^*$ on $x_t^*$.
For the AR(2) error process we need to use the following transformations:
$$x_t^* = x_t - \phi_1 x_{t-1} - \phi_2 x_{t-2}, \quad t = 3, 4, \ldots, T, \tag{5.12}$$
$$y_t^* = y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2}, \quad t = 3, 4, \ldots, T. \tag{5.13}$$
The above procedure ignores the effect of initial observations. For example, for the AR(1) case we can allow for the initial observations using
$$x_1^* = x_1\sqrt{1-\phi^2}, \tag{5.14}$$
$$y_1^* = y_1\sqrt{1-\phi^2}. \tag{5.15}$$
Efficient estimators of $\alpha$ and $\beta$ that take account of the initial observations can now be obtained by running the OLS regression of
$$\mathbf{y}^* = (y_1^*, y_2^*, \ldots, y_T^*)', \tag{5.16}$$
on an intercept and
$$\mathbf{x}^* = (x_1^*, x_2^*, \ldots, x_T^*)'. \tag{5.17}$$
The estimators that make use of the initial observations and those that do not are asymptotically equivalent (i.e., there is little to choose between them when it is known that $|\phi| < 1$ and $T$ is relatively large).
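The transformation in (5.11), together with the treatment of the first observation in (5.14)–(5.15), can be sketched as follows (Python with NumPy; the function name is ours, and as one implementation choice the intercept column is transformed in the same way as the regressor, so that the estimated coefficient is $\alpha$ itself rather than $\alpha(1-\phi)$).

```python
import numpy as np

def gls_ar1_known_phi(y, x, phi):
    """OLS on quasi-differenced data (5.11), with the first observation
    scaled as in (5.14)-(5.15); phi is assumed known."""
    T = len(y)
    y_star, x_star, c_star = np.empty(T), np.empty(T), np.empty(T)
    y_star[0] = np.sqrt(1 - phi**2) * y[0]
    x_star[0] = np.sqrt(1 - phi**2) * x[0]
    c_star[0] = np.sqrt(1 - phi**2)          # transformed intercept column
    y_star[1:] = y[1:] - phi * y[:-1]
    x_star[1:] = x[1:] - phi * x[:-1]
    c_star[1:] = 1 - phi
    X_star = np.column_stack([c_star, x_star])
    alpha_hat, beta_hat = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return alpha_hat, beta_hat
```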
If $\phi = 1$, then $y_t^*$ and $x_t^*$ will be the same as the first differences of $y_t$ and $x_t$, and $\beta$ can be estimated by the regression of $\Delta y_t$ on $\Delta x_t$. There is no long-run relationship between the levels of $y_t$ and $x_t$.
When $\phi$ is unknown, $\phi$ and $\beta$ can be estimated using the Cochrane–Orcutt (C–O) two-step procedure. Let $\hat\phi(0)$ be the initial estimate of $\phi$, then generate the quasi-differenced variables
$$x_t^*(0) = x_t - \hat\phi(0)\,x_{t-1}, \tag{5.18}$$
$$y_t^*(0) = y_t - \hat\phi(0)\,y_{t-1}. \tag{5.19}$$
Then run a regression of $y_t^*(0)$ on $x_t^*(0)$ to obtain new estimates of $\alpha$ and $\beta$ (say $\hat\alpha(1)$ and $\hat\beta(1)$), and hence a new estimate of $\phi$, given by
$$\hat\phi(1) = \frac{\sum_{t=2}^{T}\hat u_t(1)\,\hat u_{t-1}(1)}{\sum_{t=2}^{T}\hat u_t^2(1)},$$
where $\hat u_t(1) = y_t - \hat\alpha(1) - \hat\beta(1)x_t$. Generate new transformed observations $x_t^*(1) = x_t - \hat\phi(1)x_{t-1}$ and $y_t^*(1) = y_t - \hat\phi(1)y_{t-1}$, and repeat the above steps until two successive estimates of $\beta$ are sufficiently close to one another.
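A minimal sketch of these C–O iterations (Python with NumPy; the function name, the use of OLS residuals to obtain the initial estimate $\hat\phi(0)$, the scaled intercept column, and the convergence tolerance are our choices, not prescriptions of the text):

```python
import numpy as np

def cochrane_orcutt(y, x, tol=1e-6, max_iter=50):
    """Iterated Cochrane-Orcutt estimation of y_t = alpha + beta*x_t + u_t
    with AR(1) errors; phi_hat(0) is obtained from the OLS residuals."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    alpha, beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    phi = 0.0
    for _ in range(max_iter):
        u = y - alpha - beta * x                          # residuals in levels
        phi = (u[1:] @ u[:-1]) / (u[1:] @ u[1:])          # update of phi, as in the text
        y_star = y[1:] - phi * y[:-1]                     # quasi-differenced data, t = 2,...,T
        x_star = x[1:] - phi * x[:-1]
        # Intercept column scaled by (1 - phi) so the estimated coefficient is alpha itself.
        X_star = np.column_stack([np.full(T - 1, 1 - phi), x_star])
        alpha_new, beta_new = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
        converged = abs(beta_new - beta) < tol
        alpha, beta = alpha_new, beta_new
        if converged:
            break
    return alpha, beta, phi
```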
5.5.2 Higher-order error processes
The above procedure can be readily extended to general regression models and higher-order error processes. Consider, for example, the general regression model
$$y_t = \boldsymbol{\beta}'\mathbf{x}_t + u_t, \quad t = 1, 2, \ldots, T,$$
where $u_t$ follows the AR(2) specification:
$$\text{AR(2)}: \quad u_t = \phi_1 u_{t-1} + \phi_2 u_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2), \quad t = 1, 2, \ldots, T. \tag{5.20}$$
Assuming the error process is stationary and has started a long time prior to the first observation date (i.e., $t = 1$), we have
$$\text{AR(1) case}: \quad \mathrm{Var}(u_1) = \frac{\sigma^2}{1-\phi^2},$$
$$\text{AR(2) case}: \quad
\begin{cases}
\mathrm{Var}(u_1) = \mathrm{Var}(u_2) = \dfrac{\sigma^2(1-\phi_2)}{(1+\phi_2)\left[(1-\phi_2)^2 - \phi_1^2\right]}, \\[2ex]
\mathrm{Cov}(u_1, u_2) = \dfrac{\sigma^2\phi_1}{(1+\phi_2)\left[(1-\phi_2)^2 - \phi_1^2\right]}.
\end{cases}$$
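As a numerical check on these expressions, the following sketch (plain Python; the function names and parameter values are illustrative) compares them with the values implied by the Yule–Walker equations for a stationary AR(2) process.

```python
def ar2_initial_moments(phi1, phi2, sigma2):
    """Var(u_1) = Var(u_2) and Cov(u_1, u_2) for a stationary AR(2) error
    process, using the closed-form expressions given in the text."""
    denom = (1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2)
    return sigma2 * (1 - phi2) / denom, sigma2 * phi1 / denom

def ar2_moments_via_yule_walker(phi1, phi2, sigma2):
    """The same two quantities from the Yule-Walker equations:
    rho_1 = phi1/(1 - phi2), rho_2 = phi1*rho_1 + phi2,
    gamma_0 = sigma2 / (1 - phi1*rho_1 - phi2*rho_2)."""
    rho1 = phi1 / (1 - phi2)
    rho2 = phi1 * rho1 + phi2
    gamma0 = sigma2 / (1 - phi1 * rho1 - phi2 * rho2)
    return gamma0, rho1 * gamma0

print(ar2_initial_moments(0.5, 0.3, 1.0))          # approx (2.244, 1.603)
print(ar2_moments_via_yule_walker(0.5, 0.3, 1.0))  # should agree
```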
The exact ML estimation procedure then allows for the effect of initial values on the parameter estimates by adding the logarithm of the density function of the initial values to the log-density function of the remaining observations obtained conditional on the initial values. For example, in the case of the AR(1) model the log-density function of $(u_2, u_3, \ldots, u_T)$ conditional on the initial value, $u_1$, is given by
$$\log f(u_2, u_3, \ldots, u_T \mid u_1) = -\frac{(T-1)}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(u_t - \phi u_{t-1})^2, \tag{5.21}$$
and
$$\log f(u_1) = -\frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1-\phi^2) - \frac{(1-\phi^2)}{2\sigma^2}u_1^2.$$
Combining the above log-densities yields the full (unconditional) log-density function of $(u_1, u_2, \ldots, u_T)$:
$$\log f(u_1, u_2, \ldots, u_T) = -\frac{T}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1-\phi^2) - \frac{1}{2\sigma^2}\left[\sum_{t=2}^{T}(u_t - \phi u_{t-1})^2 + (1-\phi^2)u_1^2\right]. \tag{5.22}$$
Asymptotically, the effect of the distribution of the initial values on the ML estimators is negligible, but it could be important in small samples where the $x_t$s are trended and $\phi$ is suspected to be near but not equal to unity. See Pesaran (1972) and Pesaran and Slater (1980) (Chs 2 and 3) for further details. Also see Judge et al. (1985), Davidson and MacKinnon (1993), and the papers by Hildreth and Dent (1974), and Beach and MacKinnon (1978). Strictly speaking, the ML estimation will be exact if lagged values of $y_t$ are not included amongst the regressors. For a discussion of the exact ML estimation of models with lagged dependent variables and serially correlated errors see Pesaran (1981a).
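For concreteness, a small sketch (Python with NumPy; the function name is ours) that evaluates the unconditional log-density (5.22) for a given vector of disturbances and given $(\phi, \sigma^2)$:

```python
import numpy as np

def ar1_exact_loglik_u(u, phi, sigma2):
    """Unconditional log-density (5.22) of u_1,...,u_T under a stationary
    AR(1) error process with parameters phi and sigma2."""
    u = np.asarray(u, dtype=float)
    T = u.shape[0]
    ss = np.sum((u[1:] - phi * u[:-1]) ** 2) + (1 - phi**2) * u[0] ** 2
    return (-0.5 * T * np.log(2 * np.pi * sigma2)
            + 0.5 * np.log(1 - phi**2)
            - ss / (2 * sigma2))
```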
5.5.3 The AR(1) case
For this case, the ML estimators can be computed by maximizing the log-likelihood function¹
$$LL_{AR1}(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) + \frac{1}{2}\log(1-\phi^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'R(\phi)(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}), \tag{5.23}$$
with respect to the unknown parameters $\theta = (\boldsymbol{\beta}', \sigma^2, \phi)'$, where $R(\phi)$ is the $T \times T$ matrix
$$R(\phi) =
\begin{pmatrix}
1 & -\phi & \cdots & 0 & 0 \\
-\phi & 1+\phi^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1+\phi^2 & -\phi \\
0 & 0 & \cdots & -\phi & 1
\end{pmatrix}, \tag{5.24}$$
and $|\phi| < 1$.
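The following sketch (Python with NumPy and SciPy; function name ours) constructs $R(\phi)$ and verifies numerically that it equals $\sigma^2$ times the inverse of the AR(1) covariance matrix derived in Section 5.5.

```python
import numpy as np
from scipy.linalg import toeplitz

def R_ar1(phi, T):
    """The T x T matrix R(phi) in (5.24)."""
    R = np.zeros((T, T))
    idx = np.arange(T)
    R[idx, idx] = 1 + phi**2
    R[0, 0] = R[-1, -1] = 1.0
    R[idx[:-1], idx[1:]] = -phi
    R[idx[1:], idx[:-1]] = -phi
    return R

# Numerical check: R(phi) equals sigma^2 times the inverse of Cov(u) for AR(1) errors.
phi, sigma2, T = 0.6, 2.0, 6
Sigma = (sigma2 / (1 - phi**2)) * toeplitz(phi ** np.arange(T))
print(np.allclose(R_ar1(phi, T), sigma2 * np.linalg.inv(Sigma)))  # True
```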
The computations are carried out by the ‘inverse interpolation’ method which is certain to converge. See Pesaran and Slater (1980) for further details.
The concentrated log-likelihood function in this case is given by
$$LL_{AR1}(\phi) = -\frac{T}{2}\left[1 + \log(2\pi)\right] + \frac{1}{2}\log(1-\phi^2) - \frac{T}{2}\log\left\{\tilde{\mathbf{u}}'R(\phi)\tilde{\mathbf{u}}/T\right\}, \quad |\phi| < 1, \tag{5.25}$$
where $\tilde{\mathbf{u}}$ is the $T \times 1$ vector of ML residuals:
$$\tilde{\mathbf{u}} = \mathbf{y} - \mathbf{X}\left[\mathbf{X}'R(\phi)\mathbf{X}\right]^{-1}\mathbf{X}'R(\phi)\mathbf{y}.$$

¹ This result follows readily from (5.22) and can be obtained by substituting $u_t = y_t - \boldsymbol{\beta}'\mathbf{x}_t$ in (5.22).
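A sketch of the exact ML computations for the AR(1) case (Python with NumPy and SciPy; function names ours). Note that the 'inverse interpolation' method used by the text is replaced here, purely for illustration, by a generic bounded one-dimensional optimizer applied to the concentrated log-likelihood (5.25).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def R_ar1(phi, T):
    """The T x T matrix R(phi) in (5.24)."""
    R = np.diag(np.r_[1.0, np.full(T - 2, 1 + phi**2), 1.0])
    R += np.diag(np.full(T - 1, -phi), 1) + np.diag(np.full(T - 1, -phi), -1)
    return R

def concentrated_loglik_ar1(phi, y, X):
    """Concentrated log-likelihood (5.25) evaluated at phi."""
    T = len(y)
    R = R_ar1(phi, T)
    XtR = X.T @ R
    beta = np.linalg.solve(XtR @ X, XtR @ y)      # GLS estimate of beta given phi
    u = y - X @ beta                               # ML residuals u_tilde
    return (-0.5 * T * (1 + np.log(2 * np.pi))
            + 0.5 * np.log(1 - phi**2)
            - 0.5 * T * np.log(u @ R @ u / T))

def exact_ml_ar1(y, X):
    """Maximize (5.25) over phi in (-1, 1) and recover beta_tilde and sigma2_tilde."""
    res = minimize_scalar(lambda p: -concentrated_loglik_ar1(p, y, X),
                          bounds=(-0.999, 0.999), method="bounded")
    phi = res.x
    T = len(y)
    R = R_ar1(phi, T)
    XtR = X.T @ R
    beta = np.linalg.solve(XtR @ X, XtR @ y)
    u = y - X @ beta
    return beta, phi, (u @ R @ u) / T
```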
5.5.4 The AR(2) case
For this case, the ML estimators are obtained by maximizing the log-likelihood function
$$LL_{AR2}(\theta) = -\frac{T}{2}\log(2\pi\sigma^2) + \log(1+\phi_2) + \frac{1}{2}\log\left[(1-\phi_2)^2 - \phi_1^2\right] - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'R(\boldsymbol{\phi})(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}), \tag{5.26}$$
with respect to $\theta = (\boldsymbol{\beta}', \sigma^2, \boldsymbol{\phi}')'$, where $\boldsymbol{\phi} = (\phi_1, \phi_2)'$ and
$$R(\boldsymbol{\phi}) =
\begin{pmatrix}
1 & -\phi_1 & -\phi_2 & 0 & \cdots & 0 & 0 \\
-\phi_1 & 1+\phi_1^2 & -\phi_1+\phi_1\phi_2 & -\phi_2 & \cdots & 0 & 0 \\
-\phi_2 & -\phi_1+\phi_1\phi_2 & 1+\phi_1^2+\phi_2^2 & -\phi_1+\phi_1\phi_2 & \cdots & 0 & 0 \\
0 & -\phi_2 & -\phi_1+\phi_1\phi_2 & 1+\phi_1^2+\phi_2^2 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1+\phi_1^2 & -\phi_1 \\
0 & 0 & 0 & 0 & \cdots & -\phi_1 & 1
\end{pmatrix}. \tag{5.27}$$
The estimation procedure imposes the restrictions
$$1+\phi_2 > 0, \quad 1-\phi_2+\phi_1 > 0, \quad 1-\phi_2-\phi_1 > 0, \tag{5.28}$$
needed if the AR(2) process, (5.20), is to be stationary.
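A one-line check of these restrictions (Python; the function name is ours):

```python
def ar2_is_stationary(phi1, phi2):
    """Check the stationarity restrictions (5.28) for an AR(2) error process."""
    return (1 + phi2 > 0) and (1 - phi2 + phi1 > 0) and (1 - phi2 - phi1 > 0)

print(ar2_is_stationary(0.5, 0.3))   # True
print(ar2_is_stationary(0.9, 0.5))   # False: phi1 + phi2 >= 1
```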
5.5.5 Covariance matrix of the exact ML estimators for the AR(1) and AR(2) disturbances
The estimates of the covariance matrix of the exact ML estimators defined in the above subsections are computed on the assumption that the regressors $\mathbf{x}_t$ do not include lagged values of the dependent variable.
For the AR(1) case we have
$$\tilde V(\tilde{\boldsymbol{\beta}}) = \hat\sigma^2\left[\mathbf{X}'R(\tilde\phi)\mathbf{X}\right]^{-1}, \tag{5.29}$$
$$\tilde V(\tilde\phi) = T^{-1}(1-\tilde\phi^2), \tag{5.30}$$
where $R(\tilde\phi)$ is already defined by (5.24), and $\hat\sigma^2$ is given by (5.39) below.
For the AR(2) case we have
$$\tilde V(\tilde{\boldsymbol{\beta}}) = \hat\sigma^2\left[\mathbf{X}'R(\tilde\phi_1, \tilde\phi_2)\mathbf{X}\right]^{-1}, \tag{5.31}$$
$$\tilde V(\tilde\phi_1) = \tilde V(\tilde\phi_2) = T^{-1}(1-\tilde\phi_2^2), \tag{5.32}$$
$$\mathrm{Cov}(\tilde\phi_1, \tilde\phi_2) = -T^{-1}\tilde\phi_1(1+\tilde\phi_2), \tag{5.33}$$
where $R(\tilde\phi_1, \tilde\phi_2)$ is defined by (5.27). Here the ML estimators are designated by $\sim$.
5.5.6 Adjusted residuals, $R^2$, $\bar R^2$, and other statistics
In the case of the exact ML estimators, the ‘adjusted’ residuals are computed as (see Pesaran and Slater (1980, pp. 49, 136)):
$$\tilde\varepsilon_1 = \tilde u_1\sqrt{\frac{(1+\tilde\phi_2)\left[(1-\tilde\phi_2)^2 - \tilde\phi_1^2\right]}{1-\tilde\phi_2}}, \tag{5.34}$$
$$\tilde\varepsilon_2 = \tilde u_2\sqrt{1-\tilde\phi_2^2} - \tilde u_1\tilde\phi_1\sqrt{\frac{1+\tilde\phi_2}{1-\tilde\phi_2}}, \tag{5.35}$$
$$\tilde\varepsilon_t = \tilde u_t - \tilde\phi_1\tilde u_{t-1} - \tilde\phi_2\tilde u_{t-2}, \quad t = 3, 4, \ldots, T, \tag{5.36}$$
where
$$\tilde u_t = y_t - \mathbf{x}_t'\tilde{\boldsymbol{\beta}}, \quad t = 1, 2, \ldots, T,$$
are the 'unadjusted' residuals, and
$$\tilde{\boldsymbol{\beta}} = \left[\mathbf{X}'R(\tilde{\boldsymbol{\phi}})\mathbf{X}\right]^{-1}\mathbf{X}'R(\tilde{\boldsymbol{\phi}})\mathbf{y}. \tag{5.37}$$
Recall that $\tilde{\boldsymbol{\phi}} = (\tilde\phi_1, \tilde\phi_2)'$. The programme also takes account of the specification of the AR-error process in computations of the fitted values. Denoting these adjusted (or conditional) fitted values by $\tilde y_t$, we have
$$\tilde y_t = \tilde E(y_t \mid y_{t-1}, y_{t-2}, \ldots;\; x_t, x_{t-1}, \ldots) = y_t - \tilde\varepsilon_t, \quad t = 1, 2, \ldots, T. \tag{5.38}$$
The standard error of the regression is computed using the formula
$$\hat\sigma^2 = \tilde{\mathbf{u}}'R(\tilde{\boldsymbol{\phi}})\tilde{\mathbf{u}}/(T-k-p), \tag{5.39}$$
where $p = 1$ for the AR(1) case, and $p = 2$ for the AR(2) case. Given the way the adjusted residuals $\tilde\varepsilon_t$ are defined above, we also have
$$\hat\sigma^2 = \tilde{\mathbf{u}}'R(\tilde{\boldsymbol{\phi}})\tilde{\mathbf{u}}/(T-k-p) = \sum_{t=1}^{T}\tilde\varepsilon_t^2\Big/(T-k-p). \tag{5.40}$$
Notice that this estimator of $\sigma^2$ differs from the ML estimator given by
$$\tilde\sigma^2 = \sum_{t=1}^{T}\tilde\varepsilon_t^2\Big/T,$$
and the estimator adopted in Pesaran and Slater (1980). The difference lies in the way the sum of squares of the residuals, $\sum_{t=1}^{T}\tilde\varepsilon_t^2$, is corrected for the loss in degrees of freedom arising from the estimation of the regression coefficients, $\boldsymbol{\beta}$, and the parameters of the error process, $\boldsymbol{\phi} = (\phi_1, \phi_2)'$.
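The adjusted residuals (5.34)–(5.36) and the estimator $\hat\sigma^2$ in (5.40) can be sketched as follows (Python with NumPy; function names ours, and the AR(2) ML estimates $\tilde\phi_1$, $\tilde\phi_2$ and the unadjusted residuals $\tilde u_t$ are assumed to be available already):

```python
import numpy as np

def adjusted_residuals_ar2(u, phi1, phi2):
    """Adjusted residuals (5.34)-(5.36) from the unadjusted residuals u."""
    u = np.asarray(u, dtype=float)
    e = np.empty_like(u)
    e[0] = u[0] * np.sqrt((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2) / (1 - phi2))
    e[1] = u[1] * np.sqrt(1 - phi2 ** 2) - u[0] * phi1 * np.sqrt((1 + phi2) / (1 - phi2))
    e[2:] = u[2:] - phi1 * u[1:-1] - phi2 * u[:-2]
    return e

def sigma2_hat(e, k, p):
    """Regression standard error squared, (5.40): sum of squared adjusted
    residuals divided by T - k - p."""
    e = np.asarray(e, dtype=float)
    return (e @ e) / (len(e) - k - p)
```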
The $R^2$, $\bar R^2$, and the $F$-statistic are computed from the adjusted residuals:
$$R^2 = 1 - \sum_{t=1}^{T}\tilde\varepsilon_t^2\Big/\sum_{t=1}^{T}(y_t - \bar y)^2,$$
$$\bar R^2 = 1 - \left(\hat\sigma^2/\hat\sigma_y^2\right), \tag{5.41}$$
where $\hat\sigma_y$ is the standard deviation of the dependent variable, defined as before by $\hat\sigma_y^2 = \sum_{t=1}^{T}(y_t - \bar y)^2/(T-1)$.
The $F$-statistics reported following the regression results are computed according to the formula
$$F\text{-statistic} = \left(\frac{R^2}{1-R^2}\right)\left(\frac{T-k-p}{k+p-1}\right) \overset{a}{\sim} F(k+p-1,\, T-k-p), \tag{5.42}$$
with $p = 1$ under the AR(1) error specification, and $p = 2$ under the AR(2) error specification. Notice that $R^2$ in (5.42) is given by (5.41). The above $F$-statistic can be used to test the joint hypothesis that, except for the intercept term, all the other regression coefficients and the parameters of the AR-error process are zero. Under this hypothesis the $F$-statistic is distributed approximately as $F$ with $k+p-1$ and $T-k-p$ degrees of freedom. The chi-squared version of this test can be based on $TR^2/(1-R^2)$, which under the null hypothesis of zero slope and AR coefficients is asymptotically distributed as a chi-squared variate with $k+p-1$ degrees of freedom.
The Durbin–Watson statistic is also computed using the adjusted residuals, $\tilde\varepsilon_t$:
$$DW = \frac{\sum_{t=2}^{T}(\tilde\varepsilon_t - \tilde\varepsilon_{t-1})^2}{\sum_{t=1}^{T}\tilde\varepsilon_t^2}.$$
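These summary statistics can be computed from the adjusted residuals as in the following sketch (Python with NumPy and SciPy; the function name is ours, $k$ is taken to be the total number of estimated regression coefficients including the intercept, consistent with the degrees of freedom in (5.42), and the $F$ p-value is an extra convenience not reported in the text):

```python
import numpy as np
from scipy import stats

def summary_stats(e, y, k, p):
    """R^2, adjusted R^2, F-statistic (5.42) and Durbin-Watson statistic,
    all computed from the adjusted residuals e."""
    e, y = np.asarray(e, float), np.asarray(y, float)
    T = len(y)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - (e @ e) / tss
    sigma2_hat = (e @ e) / (T - k - p)
    sigma2_y = tss / (T - 1)
    r2_bar = 1.0 - sigma2_hat / sigma2_y
    f_stat = (r2 / (1.0 - r2)) * (T - k - p) / (k + p - 1)
    f_pval = 1.0 - stats.f.cdf(f_stat, k + p - 1, T - k - p)
    dw = np.sum(np.diff(e) ** 2) / (e @ e)
    return {"R2": r2, "R2_bar": r2_bar, "F": f_stat, "F_pvalue": f_pval, "DW": dw}
```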
5.5.7 Log-likelihood ratio statistics for tests of residual serial correlation
The log-likelihood ratio statistic for the test of AR(1) against the non-autocorrelated error specification is given by
$$\chi^2_{AR1,OLS} = 2\left(LL_{AR1} - LL_{OLS}\right) \overset{a}{\sim} \chi^2_1.$$
The log-likelihood ratio statistic for the test of the AR(2)-error specification against the AR(1)-error specification is given by
$$\chi^2_{AR2,AR1} = 2\left(LL_{AR2} - LL_{AR1}\right) \overset{a}{\sim} \chi^2_1.$$
Both of the above statistics are asymptotically distributed, under the null hypothesis, as a chi-squared variate with one degree of freedom.
The log-likelihood values, $LL_{AR1}$ and $LL_{AR2}$, represent the maximized values of the log-likelihood functions defined by (5.23) and (5.26), respectively. $LL_{OLS}$ denotes the maximized value of the log-likelihood function for the OLS case.
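A minimal sketch of these LR tests (Python with SciPy; the helper name is ours), where the maximized log-likelihood values are obtained as described above:

```python
from scipy import stats

def lr_test(ll_unrestricted, ll_restricted, df=1):
    """Likelihood-ratio statistic 2*(LL_1 - LL_0) and its asymptotic
    chi-squared p-value."""
    lr = 2.0 * (ll_unrestricted - ll_restricted)
    return lr, 1.0 - stats.chi2.cdf(lr, df)

# e.g. AR(1) against OLS, then AR(2) against AR(1), each with one degree of freedom:
# lr_test(LL_AR1, LL_OLS); lr_test(LL_AR2, LL_AR1)
```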