Applied Econometrics
Lecture 9: Autocorrelation
“It is never possible to step twice into the same river”
1) Introduction
Autocorrelation (also called serial correlation) is a violation of the assumption that the error terms are uncorrelated; i.e., with autocorrelation E(∈i∈j) ≠ 0 for i ≠ j. That is, the error in period t is not independent of previous errors.
Since we do not know the population line, we do not know the actual errors (∈s), but we estimate them by the residuals (e). Hence we look at the residual plot for a regression that (i) has no autocorrelation; (ii) has positive autocorrelation; and (iii) has negative autocorrelation. Positive autocorrelation is the most common problem in economics.
2) Consequences of autocorrelation
Ordinary least squares (OLS) estimates in the presence of autocorrelation will not have the desirable statistical properties. With positive autocorrelation the standard errors are too low (underestimated). This adversely affects the t statistics (overestimated), so we may reject the null when it is in fact valid. Likewise, the R2 and the related F – statistic are likely to be overestimated.
3) Detecting autocorrelation
There are many ways to check for autocorrelation, such as (1) looking at the residual plot; (2) observing the correlogram; (3) using the runs test; and (4) using the Durbin – Watson statistic. This section presents the runs test and the Durbin – Watson test.
3.1) Runs test
Autocorrelation can show up in the residual plot. A non-autocorrelated error should jump around the mean (zero) in a random manner. With positive autocorrelation (which we are most likely to get with economic data) the error is more likely to stay above or below the mean for successive observations (with negative autocorrelation it will jump above and below very frequently).
We can formalize this approach in the runs test, by counting the number of runs in the data. A run is defined as a succession of positive or negative residuals (even a single observation counts as a run).
We saw also that if there is positive autocorrelation then there will be rather fewer runs than we should expect from a series with no autocorrelation. On the other hand, if there is negative autocorrelation then there are more runs than with no autocorrelation.
The table for the runs test gives a confidence interval – if the observed number of runs is outside this interval we reject the null hypothesis of no autocorrelation. If the actual number of runs is less than the lower bound of the confidence interval then we reject in favor of positive autocorrelation. If it is higher, we reject in favor of negative autocorrelation. We may sometimes need to calculate the interval ourselves:
E(R) = 2N1N2/n + 1

sR^2 = 2N1N2(2N1N2 – n) / [n^2(n – 1)]

where N1 is the number of positive residuals, N2 is the number of negative residuals, R is the total number of runs, and n is the number of observations (so n = N1 + N2).
The confidence interval at 5 percent level of significance is given by:
E(R) – 1.96 sR ≤ R ≤ E(R) +1.96 sR
We accept the null hypothesis of no autocorrelation if the observed number of runs falls within the confidence interval
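The runs-test calculation above can be sketched in a few lines of Python. The residual series here is made up purely for illustration; it is not from the lecture's data.

```python
# Runs test for autocorrelation: count runs, compute E(R) and sR,
# and build the 95 percent confidence interval.
import math

def runs_test(residuals, z=1.96):
    signs = [r >= 0 for r in residuals]
    n1 = sum(signs)                  # number of non-negative residuals
    n2 = len(signs) - n1             # number of negative residuals
    n = n1 + n2
    # a new run starts whenever the sign changes
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    e_r = 2 * n1 * n2 / n + 1
    var_r = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    s_r = math.sqrt(var_r)
    return runs, e_r, (e_r - z * s_r, e_r + z * s_r)

# long same-signed stretches suggest positive autocorrelation
e = [1.2, 0.8, 0.5, 0.9, -0.3, -0.7, -1.1, -0.4, -0.2, 0.6, 1.0, 0.3]
runs, expected, (lo, hi) = runs_test(e)
print(runs, round(expected, 2), (round(lo, 2), round(hi, 2)))
```

Here the observed number of runs (3) falls below the lower bound of the interval, so we would reject the null in favor of positive autocorrelation.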
3.2) Durbin – Watson test
A second (the most common) test is the Durbin – Watson (DW) test. The DW statistic is defined as:

d = [ Σ(t=2 to n) (et – et–1)^2 ] / [ Σ(t=1 to n) et^2 ]
Note that d ≈ 2(1 – ρ), where –1 ≤ ρ ≤ +1. Hence d will be 0 with extreme positive autocorrelation, 4 with extreme negative autocorrelation, and 2 if there is no autocorrelation.
The null hypothesis is that DW = 2, which corresponds to no autocorrelation
0 to dL: reject H0 – positive autocorrelation
dL to dU: zone of indecision
dU to 4 – dU: accept H0 – no autocorrelation
4 – dU to 4 – dL: zone of indecision
4 – dL to 4: reject H0 – negative autocorrelation
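The DW statistic is easy to compute directly from the residuals. A short sketch (with invented residual series) shows d falling well below 2 for positively autocorrelated residuals and above 2 for negatively autocorrelated ones; the dL/dU bounds must still be looked up in the DW tables for your n and k.

```python
# Durbin-Watson statistic from a residual series.

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(r ** 2 for r in e)
    return num / den

# long same-signed stretches: d well below 2
e_pos = [1.0, 0.9, 0.8, 0.7, 0.6, -0.5, -0.6, -0.7, -0.8, -0.9]
# alternating signs: d well above 2
e_neg = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(round(durbin_watson(e_pos), 3), round(durbin_watson(e_neg), 3))
```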
Testing for autocorrelation with a lagged dependent variable
If the model contains a lagged dependent variable, d will be biased towards 2 (this bias may lead us to accept the null when in fact autocorrelation is present). In such cases we must instead use Durbin's h:
h = (1 – d/2) · sqrt{ n / [1 – n·var(b1)] }
where var(b1) is the square of the standard error of the coefficient on the lagged dependent variable and n is the number of observations. The test may not be used if n·var(b1) is greater than one. The runs test and DW are not equivalent – they may give different answers. Also, the fact that DW may frequently fall in the zone of indecision means that some judgment is required. If DW is in the zone of indecision, but fairly close to dU, and the runs test indicates no autocorrelation, then you can assume no autocorrelation.
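Durbin's h can be computed directly from d, n and var(b1). The values below are hypothetical, chosen only to show the mechanics; compare |h| with 1.96 at the 5 percent level.

```python
# Durbin's h test statistic for a model with a lagged dependent variable.
import math

def durbin_h(d, n, var_b1):
    rho_hat = 1 - d / 2
    if n * var_b1 >= 1:
        # the statistic is undefined in this case (square root of a negative)
        raise ValueError("h may not be used when n*var(b1) >= 1")
    return rho_hat * math.sqrt(n / (1 - n * var_b1))

h = durbin_h(d=1.8, n=50, var_b1=0.005)   # hypothetical values
print(round(h, 3))
```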
4) Why do we get autocorrelation?
We test for autocorrelation on the residuals; but these are only a good proxy for the true error if the model is correct. The presence of autocorrelation will very often indicate misspecification, including:
Incorrect functional form
Omitted variable(s)1
Structural instability
Influential points
Spurious regression
Spurious regression is a very serious problem in time series data. A rule of thumb is that R2 > d indicates that a regression is spurious (if R2 > d the regression is almost certainly spurious, but if R2 < d the regression may still be spurious).
A note on cross-section data: autocorrelation must be a time series problem, as we can always remove autocorrelation from cross-section data by re-ordering the data. However, if the data are sorted by one of the independent variables then the apparent presence of autocorrelation can still indicate misspecification. Reordering is not usually an option in time – series data, and certainly not so if the equation includes any lags.
5) Remedial measures
The first thing to do is to interpret autocorrelation as a symptom of misspecification and so to carry out various specification tests (i.e. for omitted variables, structural breaks, etc). This will nearly always cure the problem. If the autocorrelation is genuine you can remove the autocorrelated errors by:
1 The exclusion of relevant variable(s) will bias the estimates of the coefficients of all variables included in the model (unless they happen to be orthogonal to the excluded variable). The normal t – tests cannot tell us if the model is misspecified on account of omitted variables.
5.1 The Cochrane – Orcutt procedure
Suppose we have the model: Yt = β1 + β2Xt + ut
It is usually assumed that the ut follow the first – order autoregressive scheme, namely,
ut = ρut-1 + εt
Cochrane and Orcutt (1949) then recommend the following steps to estimate ρ:
1 Estimate the two – variable model and calculate the residuals, et
2 Run the following regression: et = ρet-1 + vt
3 Using ρ, run the generalized difference equation:
(Yt – ρYt-1) = β1(1 – ρ) + β2(Xt – ρXt-1) + (ut – ρut-1)
or, in transformed variables, Yt* = β1* + β2*Xt* + et*
4 Calculate the new residuals: et** = Yt* – β1* – β2*Xt*
5 Estimate the regression: et** = ρet-1** + wt
This second-round estimate of ρ may not be the best estimate. We can go on to a third-round estimate and so on. We may stop iterating when successive estimates of ρ differ by a very small amount, say less than 0.01 to 0.005.
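The iterative steps above can be sketched as a loop. The AR(1) data here are simulated (true ρ = 0.6); this is not the crop data of the example in Section 6.

```python
# Iterative Cochrane-Orcutt estimation on simulated data with AR(1) errors.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()   # AR(1) errors, true rho = 0.6
y = 2.0 + 1.5 * x + u                      # true intercept 2.0, slope 1.5

rho = 0.0
for _ in range(20):                        # usually converges in a few rounds
    # step 3: generalized difference transform with the current rho
    ys = y[1:] - rho * y[:-1]
    xs = x[1:] - rho * x[:-1]
    X = np.column_stack([np.ones(n - 1), xs])
    b = np.linalg.lstsq(X, ys, rcond=None)[0]
    beta1 = b[0] / (1 - rho)               # recover the original intercept
    # step 4: residuals in the original units
    e = y - beta1 - b[1] * x
    # step 5: re-estimate rho by regressing e_t on e_{t-1}
    rho_new = float(e[1:] @ e[:-1] / (e[:-1] @ e[:-1]))
    if abs(rho_new - rho) < 0.005:         # stop when rho barely changes
        rho = rho_new
        break
    rho = rho_new

print(round(rho, 2), round(b[1], 2))
```

The estimated ρ should land near the true value of 0.6, and the slope near 1.5.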
5.2 The Durbin procedure
Durbin (1960) suggested an alternative method of estimating ρ. The generalized difference equation can be written as:
Yt = β1(1 – ρ) + β2Xt – ρβ2Xt-1 + ρYt-1 + et
The estimate of ρ is the coefficient on Yt-1 in this regression. Once an estimate of ρ is obtained, we regress the transformed variable Y* on X* as in:

Yt* = β1* + β2*Xt* + et*
5.3 The Theil – Nagar procedure
Theil and Nagar (1961) estimate ρ from the d statistic (in small samples) as:

ρ = [N^2(1 – d/2) + k^2] / (N^2 – k^2)

where N is the total number of observations, d is the DW statistic, and k is the number of coefficients including the intercept.
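As an illustration, plugging the OLS values from the example in Section 6 (d = 0.96, N = 30 observations, k = 3 coefficients) into the formula gives an estimate slightly above the simple 1 – d/2 value:

```python
# Theil-Nagar small-sample estimate of rho from the DW statistic.

def theil_nagar_rho(d, N, k):
    return (N**2 * (1 - d / 2) + k**2) / (N**2 - k**2)

print(round(theil_nagar_rho(d=0.96, N=30, k=3), 3))
```

Compare this with the simple estimate ρ = 1 – 0.96/2 = 0.52.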
5.4 The Hildreth – Lu procedure
From the first – order autoregressive scheme ut = ρut-1 + εt, Hildreth and Lu (1960) recommend selecting values of ρ between –1 and +1 at intervals of 0.1, transforming the data by the generalized difference equation, and obtaining the associated RSS for each. Hildreth and Lu suggest choosing the ρ which minimizes the RSS.
The differencing procedure loses one observation. To avoid this, the first observations on Y and X are transformed as follows: Y1(1 – ρ^2)^0.5 and X1(1 – ρ^2)^0.5 (Prais – Winsten, 1971).
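The grid search can be sketched as follows, on simulated data with true ρ set to 0.5 (so the selected value should land nearby). The first observation is kept with the Prais – Winsten transformation described above.

```python
# Hildreth-Lu grid search: try rho from -0.9 to 0.9 in steps of 0.1,
# keep the value that minimizes the residual sum of squares (RSS).
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal()   # AR(1) errors, true rho = 0.5
y = 1.0 + 2.0 * x + u

def rss_for_rho(rho):
    # generalized difference transform, Prais-Winsten correction on obs 1
    w = np.sqrt(1 - rho**2)
    ys = np.concatenate([[w * y[0]], y[1:] - rho * y[:-1]])
    xs = np.concatenate([[w * x[0]], x[1:] - rho * x[:-1]])
    ones = np.concatenate([[w], np.full(n - 1, 1 - rho)])
    X = np.column_stack([ones, xs])
    resid = ys - X @ np.linalg.lstsq(X, ys, rcond=None)[0]
    return float(resid @ resid)

grid = [r / 10 for r in range(-9, 10)]
best = min(grid, key=rss_for_rho)
print(best)
```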
5.5 Detrending by including a time trend as one of the regressors
Yt = β1 + β2Xt + β3t + ut

The first – difference transformation of this model is:

ΔYt = β2ΔXt + β3 + εt

There is an intercept term in the first – difference form. It signifies that there was a linear trend term in the original model. If β3 > 0, there is an upward trend in Y after removing the influence of the variable X.
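A short simulated illustration: differencing a model with a linear trend leaves the trend coefficient β3 as the intercept of the differenced regression. The series below are generated here; they are not the lecture's data.

```python
# First-difference regression of a trended model: the intercept of the
# differenced equation estimates the trend coefficient beta3.
import numpy as np

rng = np.random.default_rng(2)
n = 120
t = np.arange(n)
x = np.cumsum(rng.normal(size=n))                   # a trending regressor
y = 1.0 + 2.0 * x + 0.3 * t + rng.normal(size=n)    # beta2 = 2.0, beta3 = 0.3

dy, dx = np.diff(y), np.diff(x)
X = np.column_stack([np.ones(n - 1), dx])
b = np.linalg.lstsq(X, dy, rcond=None)[0]           # b[0] ~ beta3, b[1] ~ beta2
print(round(b[0], 2), round(b[1], 2))
```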
We emphasize that these techniques are only to be used if you are sure that there is no problem of misspecification.
6) An example
Regression of crop output on the price index and fertilizer input (Table 6.1) was found to be badly autocorrelated: the DW statistic was 0.96 against a critical value dL of 1.28. We found that the autocorrelation arose from a problem of omitted variable bias. But for illustrative purposes we shall see how the autocorrelation may be removed using the Cochrane – Orcutt correction. To do this we carry out the following steps:
(1) The estimated equation with OLS gives DW = 0.958; thus ρ = 1 – d/2 = 0.521
(2) Calculate Qt* = Qt – 0.521Qt-1, and similarly for P* and F* for observations 1962 to 1990. The results are shown in Table 6.1.
(3) Apply the Prais – Winsten transformation to get Q*1961 = (1 – 0.521^2)^1/2 Q1961, and similarly for the 1961 values of P* and F*. (Although we do have 1960 values for P and F, though not Q, and so could apply the Cochrane – Orcutt procedure to the 1960 observations, the fact that we use the Prais – Winsten transformation for one variable means that we must also use it for the others.) The resulting values are shown in Table 6.1.
Table 6.1: Application of Cochrane – Orcutt correction to crop production function data

Year    Q      Q*     P       P*     F       F*
1961    40.4   34.5   106.0   90.5   99.4    84.8
1962    36.4   15.4   108.1   52.9   100.8   49.0
1963    35.4   16.5   110.3   54.0   102.1   49.6
1964    37.9   19.4   110.1   52.6   102.9   49.7
1965    34.8   15.1   108.6   51.2   103.1   49.5
1966    27.9   9.8    103.8   47.2   104.2   50.5
1967    29.8   15.3   109.5   55.4   104.6   50.3
1968    34.7   19.1   102.6   45.5   105.6   51.1
1969    38.4   20.3   101.1   47.6   106.8   51.7
1970    33.6   13.6   100.9   48.2   106.7   51.1
1971    33.6   16.1   104.7   52.2   108.3   52.7
1972    32.2   14.7   107.3   52.7   108.6   52.2
1973    35.3   18.5   103.0   47.1   110.4   53.8
1974    39.4   21.1   116.4   62.8   111.2   53.7
1975    30.6   10.1   112.7   52.0   111.1   53.1
1976    30.5   14.5   108.0   49.3   110.7   52.8
1977    33.7   17.8   103.2   46.9   110.5   52.8
1978    35.8   18.2   101.0   47.2   112.0   54.4
1979    36.0   17.3   103.6   51.0   111.6   53.2
1980    37.0   18.3   109.6   55.6   113.2   55.0
1981    30.7   11.4   105.2   48.1   114.1   55.1
1982    28.0   12.0   98.7    43.9   114.8   55.4
1983    28.4   13.8   99.2    47.7   114.8   55.0
1984    27.6   12.8   94.8    43.2   114.4   54.7
1985    32.9   18.5   100.6   51.2   114.6   54.9
1986    37.1   20.0   104.5   52.1   114.5   54.8
1987    36.0   16.7   98.9    44.5   114.2   54.6
1988    36.6   17.9   101.8   50.2   115.5   55.9
1989    38.8   19.7   105.6   52.6   116.7   56.5
1990    37.1   16.9   108.7   53.6   118.0   57.2
(4) Estimate the following model:

Qt* = β1* + β2*Pt* + β3*Ft* + ∈t*
The regression results are given in Table 6.2 (which also repeats those for OLS estimation). Calculate the estimate of the intercept b1 = b1*/(1 – ρ), which equals –19.98.
Table 6.2: Regression results with Cochrane – Orcutt procedure

                   OLS                   Cochrane – Orcutt procedure
                   Coefficient (t)       Coefficient (t)
Intercept          10.21 (0.41)          –9.57 (–2.02)
Price (P)          0.26 (1.26)           0.28 (2.63)
Fertilizer (F)     –0.03 (–0.20)         0.22 (1.58)
R2                 0.12                  0.62
DW                 0.96                  1.50
(5) Comparing the two regressions, we see that the DW statistic is now 1.5. This value falls towards the upper end of the zone of indecision, so the evidence for autocorrelation is much weaker than in the OLS regression, though it may be thought worthwhile to repeat the procedure (using a new ρ of 0.25, calculated from the new DW).
Comparison of the slope coefficients from the two regressions shows price to be relatively unaffected. With the Cochrane – Orcutt procedure, the fertilizer variable produces the expected positive sign, though it remains insignificant. The continued insignificance of fertilizer is a further indication that we should have treated the initial autocorrelation as a sign of misspecification. In this case, the Cochrane – Orcutt procedure has suppressed the symptom of misspecification, but cannot provide the cure – which is to include the omitted variables.
If you believe the model to be correctly specified and there is autocorrelation, then the Cochrane – Orcutt procedure may be used to obtain efficient estimates
References
Bao, Nguyen Hoang (1995), 'Applied Econometrics', Lecture Notes and Readings, Vietnam–Netherlands Project for MA Program in Economics of Development.
Maddala, G.S. (1992), 'Introduction to Econometrics', Macmillan Publishing Company, New York.
Mukherjee, Chandan, Howard White and Marc Wuyts (1998), 'Econometrics and Data Analysis for Developing Countries', Routledge, London.
Workshop 9: Autocorrelation
1.1) Use the data given in the table below to regress output on the price index and fertilizer input. Draw the residual plot and count the number of runs.
Year    Output (Q)   Price Index (P)   Fertilizer input (F)   Rainfall (R)
1960    n.a          100.0             100.0                  184.2
1961    40.4         106.0             99.4                   155.3
1962    36.4         108.1             100.8                  107.3
1963    35.4         110.3             102.1                  110.1
1964    37.9         110.1             102.9                  169.3
1965    34.8         108.6             103.1                  81.9
1966    27.9         103.8             104.2                  22.0
1967    29.8         109.5             104.6                  31.5
1968    34.7         102.6             105.6                  198.2
1969    38.4         101.1             106.8                  147.5
1970    33.6         100.9             106.7                  76.5
1971    33.6         104.7             108.3                  90.2
1972    32.2         107.3             108.6                  54.3
1973    35.3         103.0             110.4                  178.3
1974    39.4         116.4             111.2                  70.6
1975    30.6         112.7             111.1                  53.0
1976    30.5         108.0             110.7                  74.4
1977    33.7         103.2             110.5                  136.4
1978    35.8         101.0             112.0                  131.1
1979    36.0         103.6             111.6                  109.2
1980    37.0         109.6             113.2                  112.8
1981    30.7         105.2             114.1                  43.0
1982    28.0         98.7              114.8                  53.5
1983    28.4         99.2              114.8                  20.6
1984    27.6         94.8              114.4                  56.4
1985    32.9         100.6             114.6                  78.3
1986    37.1         104.5             114.5                  151.8
1987    36.0         98.9              114.2                  145.2
1988    36.6         101.8             115.5                  112.3
1989    38.8         105.6             116.7                  161.0
1990    37.1         108.7             118.0                  94.5
1.2) Run the following regressions:
Q = α0 + α1P + α2F
and
Q = β0 + β1P + β2F + β3R + β4R–1
Use an F – test to test the hypothesis that the two rainfall variables may be excluded from the regression
1.3) In the light of your results, comment on the apparent problem of autocorrelation in the regression of output on the price index and fertilizer input.
2) Compile time – series data on the terms of trade for a country of your choice. Regress both the terms of trade and the logged terms of trade on time and graph the residuals in each case. How many runs are there in each case? Comment. Can you re-specify the equation to increase the number of runs?
3) Using the data given in data file SOCECON, regress the infant mortality rate on income per capita. Plot the residuals with the observations ordered: (a) alphabetically; and (b) by income per capita. Count the number of runs in each case. Comment on your results.
4) Using the Sri Lankan macroeconomic data set (SRINA), perform the simple regression of Ip on Ig and plot the residuals. Use both the runs and DW tests to check for autocorrelation. Add variables to the equation to improve the model specification and repeat the tests for autocorrelation. Use the Cochrane – Orcutt estimation procedure if you feel it is appropriate. Comment on your findings.