Note that X contains presample values.
Two problems: the lag length is often not known, and the X matrix is often highly collinear.
How to determine the lag length? Sometimes it is done by the adjusted ¯R2; [Mad88, p. 357] says this will lead to too long lags and proposes remedies.
Assume we know for sure that the lag length is not greater than M. [JHG+88, pp. 723–727] recommends the following “general-to-specific” specification procedure for finding the lag length: First run the regression with M lags; if the t-test for the parameter of the Mth lag is significant, we say the lag length is M. If it is insignificant, run the regression with M − 1 lags and test again the last coefficient: if the t-test for the parameter of the (M − 1)st coefficient is significant, we say the lag length is M − 1, etc.
The significance level of this test depends on M and on the true lag length. Since we never know the true lag length for sure, we will never know the true significance level for sure. The calculation which follows allows us to compute this significance level under the assumption that the N given by the test is the correct N. Furthermore this calculation only gives us the one-sided significance level: the null hypothesis is not that the true lag length is = N, but that the true lag length is ≤ N.
Assume the null hypothesis is true, i.e., that the true lag length is ≤ N. Since we assume we know for sure that the true lag length is ≤ M, the null hypothesis is equivalent to: βN+1 = βN+2 = · · · = βM = 0. Now assume that we apply the above procedure and the null hypothesis holds. The significance level of our test is the probability that our procedure rejects the null although the null is true. In other words, it is the probability that either the first t-test rejects, or the first t-test accepts and the second t-test rejects, or the first two t-tests accept and the third t-test rejects, etc., all under the assumption that the true βi are zero. In all, the lag length is overstated if at least one of the M − N t-tests rejects. Therefore if we define the event Ci to be the rejection of the ith t-test, and define Qj = C1 ∪ · · · ∪ Cj, then

Pr[Qj] = Pr[Qj−1 ∪ Cj] = Pr[Qj−1] + Pr[Cj] − Pr[Qj−1 ∩ Cj].

[JHG+88, p. 724] says, and a proof can be found in [And66] or [And71, pp. 34–43], that the test statistics of the different t-tests are independent of each other. Therefore one can write

Pr[Qj] = Pr[Qj−1] + Pr[Cj] − Pr[Qj−1] Pr[Cj].
Examples: Assume all t-tests are carried out at the 5% significance level, and two tests are insignificant before the first rejection occurs, i.e., the test indicates that the true lag length is ≤ M − 2. Assuming that the true lag length is indeed ≤ M − 2, the probability of falsely rejecting the hypothesis that the Mth and (M − 1)st lags are zero is 0.05 + 0.05 − 0.05² = 0.1 − 0.0025 = 0.0975. For three and four tests the levels are 0.1426 and 0.1855. For a 1% significance level and two tests it would be 0.01 + 0.01 − 0.01² = 0.0200 − 0.0001 = 0.0199. For a 1% significance level and three tests it would be 0.0199 + 0.01 − 0.000199 = 0.029701.
Problem 451. Here are excerpts from SAS outputs, estimating a consumption function. The dependent variable is always the same, GCN72, the quarterly personal consumption expenditure for nondurable goods, in 1972 constant dollars, 1948–1985. The explanatory variable is GYD72, personal income in 1972 constant dollars (deflated by the price deflator for nondurable goods), lagged 0–8 quarters.
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    65.61238269    0.88771664        73.911       0.0001
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    65.80966177    0.85890869        76.620       0.0001
• a. 3 points Make a sequential test of how long you would like to have the lag length.
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    65.87672382    0.84399982        78.053       0.0001
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    65.99593829    0.82873058        79.635       0.0001
• b. 5 points What is the probability of type I error of the test you just described?
Answer. For this use the fact that the t-statistics are independent. There is a 5% probability of incorrectly rejecting the first t-test and also a 5% probability of incorrectly rejecting the second; by independence, the combined probability of a type I error is 0.05 + 0.05 − 0.05² = 0.0975.
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    66.07544717    0.80366736        82.217       0.0001
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    66.15803761    0.78586731        84.185       0.0001
GYD72L3     1    -0.008377491   0.02330072        -0.360       0.7197
GYD72L4     1    -0.000826189   0.02396660        -0.034       0.9725
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    66.22787177    0.77701222        85.234       0.0001
• c. 3 points Which common problem of an estimation with lagged explanatory variables is apparent from this printout? What would be possible remedies for this problem?
Answer. The explanatory variables are highly multicollinear; therefore use Almon lags or something similar. Another type of problem is the increase of type I errors with an increasing number of tests.
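The multicollinearity can be illustrated with a small simulation: successive lags of a persistent series are nearly collinear. The AR(1) coefficient 0.95 and all other values below are our illustrative assumptions, not taken from the printout:

```python
import numpy as np

# Sketch: lagged values of a persistent (AR(1)) series are nearly
# collinear, which is what inflates the standard errors of the
# individual lag coefficients. rho = 0.95 is an assumption.
rng = np.random.default_rng(0)
n, rho = 2000, 0.95
x = np.empty(n)
x[0] = rng.standard_normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.standard_normal()

# correlation between the regressors x_t and x_{t-1}
corr = np.corrcoef(x[1:], x[:-1])[0, 1]
print(round(corr, 2))   # close to rho, i.e. close to 1
```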
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    66.29560292    0.77062598        86.028       0.0001
GYD72L3     1    -0.005511693   0.02366979        -0.233       0.8162
GYD72L4     1    -0.002789862   0.02372100        -0.118       0.9065
                 PARAMETER      STANDARD     T FOR H0:
VARIABLE   DF    ESTIMATE       ERROR        PARAMETER=0   PROB > |T|
INTERCEP    1    66.36142439    0.77075066        86.100       0.0001
GYD72L2     1    -0.002721499   0.02388376        -0.114       0.9094
GYD72L3     1    -0.001837498   0.02379826        -0.077       0.9386
β for which the dth difference can be computed.)
But here it is more convenient to incorporate these restrictions into the regression equation and in this way end up with a regression with fewer explanatory variables.
Any β with a polynomial lag structure has the form β = Hα for a (d + 1) × 1 vector α, where the columns of H simply are polynomials: the jth row of H is (1, j, j², . . . , j^d), so that βj = α0 + α1 j + · · · + αd j^d.
More examples for such H-matrices are in [JHG+88, p. 730]. Then the specification y = Xβ + ε becomes y = XHα + ε, i.e., one estimates the coefficients of α by an ordinary regression again, and even in the presence of polynomial distributed lags one can use the ordinary F-test, impose other linear constraints, do “GLS” in the usual way, etc. (SAS allows for an autoregressive error structure in addition to the lags.) The pdlreg procedure in SAS also uses an H whose first column contains a zero order polynomial, the second a first order polynomial, etc. But it does not use the exact polynomials shown above; it chooses the polynomials in such a way that they are orthogonal to each other. The elements of α are called X**0 (coefficient of the zero order polynomial), X**1, etc.
In order to determine the degree of the polynomial one might use the same procedure on this reparametrized regression which one used before to determine the lag length.
About endpoint restrictions: The polynomial determines the coefficients β0 through βM, with the other βj being zero. Endpoint restrictions (the SAS options last, first, or both) determine that the polynomial is such that its formula also gives βM+1 = 0 or β−1 = 0 or both. This may prevent, for instance, the last lagged coefficient from becoming negative if all the others are positive. But experience shows that in many cases such endpoint restrictions are not a good idea.

Alternative specifications of the lag coefficients: Shiller lag: In 1973, long before smoothing splines became popular, Shiller in [Shi73] proposed a joint minimization of the SSE and k times the squared sum of the (d + 1)st differences of the lag coefficients. He used a Bayesian approach; Maddala gave a classical method. This is the BLUE if one replaces the exact linear constraint by a random linear constraint.
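The Shiller-type minimization can be sketched as an augmented least squares problem: minimize SSE + k‖Dβ‖², where D takes (d + 1)st differences. The function name shiller_lag and all values are our illustrative assumptions, not Shiller's original implementation:

```python
import numpy as np

def shiller_lag(X, y, k, d):
    """Minimize ||y - X beta||^2 + k * ||D beta||^2, where D takes
    (d+1)st differences of the lag coefficients, by stacking the
    penalty as extra pseudo-observations with target zero."""
    m = X.shape[1]                       # number of lag coefficients
    D = np.eye(m)
    for _ in range(d + 1):               # apply differencing d+1 times
        D = np.diff(D, axis=0)
    X_aug = np.vstack([X, np.sqrt(k) * D])
    y_aug = np.concatenate([y, np.zeros(D.shape[0])])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta
```

With k = 0 this reduces to ordinary least squares; as k grows, the lag profile is shrunk toward a polynomial of degree d (the null space of D).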
Problem 452. Which problems does one face if one estimates a regression with lags in the explanatory variables? How can these problems be overcome?
49.1 Geometric lag

Even more popular than polynomial lags are geometric lags. Here the model is
(49.1.1)  yt = α + γxt + γλxt−1 + γλ²xt−2 + · · · + εt
(49.1.2)     = α + β(1 − λ)xt + β(1 − λ)λxt−1 + β(1 − λ)λ²xt−2 + · · · + εt
Here the second line is written in this somewhat funny way in order to make the wt = (1 − λ)λt, the weights with which β is distributed over the lags, sum to one. Here it is tempting to do the following Koyck transformation: lag this equation by one and premultiply by λ to get
(49.1.3)  λyt−1 = λα + β(1 − λ)λxt−1 + β(1 − λ)λ²xt−2 + β(1 − λ)λ³xt−3 + · · · + λεt−1.
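Subtracting (49.1.3) from (49.1.2), the infinite sums cancel term by term, leaving:

```latex
y_t - \lambda y_{t-1} = \alpha(1-\lambda) + \beta(1-\lambda)x_t
                        + \varepsilon_t - \lambda\varepsilon_{t-1}
```

so the transformed equation has only the current regressor and a lagged dependent variable, at the cost of a moving-average disturbance.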
49.2 Autoregressive Distributed Lag Models

[DM93, p. 679] say that (49.0.1) is not a good model because it is not a dynamic model, i.e., yt depends on lagged values of xt but not on lagged values of itself. As a consequence, only the current values of the error term εt affect yt. But if the error term is thought of as reflecting the combined influence of many variables that are unavoidably omitted from the regression, one might want to have the possibility that these omitted variables have a lagged effect on yt just as xt does. Therefore it is natural to allow lagged values of yt to enter the regression along with lagged values of xt:
(49.2.1)  yt = α + β1yt−1 + β2yt−2 + · · · + βpyt−p + γ0xt + γ1xt−1 + · · · + γqxt−q + εt,  εt ∼ IID(0, σ²)

This is called an ADL(p, q) model. A widely encountered special case is the ADL(1, 1) model
(49.2.2)  yt = α + β1yt−1 + γ0xt + γ1xt−1 + εt,  εt ∼ IID(0, σ²)
This has the following special cases: distributed lag model with geometric lags (γ1 = 0), static model with AR(1) errors (γ1 = −β1γ0), partial adjustment model (γ1 = 0), and the model in first differences (β1 = 1, γ1 = −γ0).
This lagged dependent variable is not an obstacle to running OLS, in light of theresults discussed under “random regressors.”
We will discuss two models which give rise to such a lag structure: either with the desired level achieved incompletely as the dependent variable (Partial Adjustment models), or with an adaptively formed expected level as the explanatory variable. In the first case, OLS on the Koyck transformation is consistent; in the other case it is not, but alternative methods are available.
Partial Adjustment. Here the model is that yt adjusts only partially, by the fraction 1 − λ, toward a desired level y*t = α + βxt + εt; combining the adjustment equation with the desired level gives

(49.2.5)  yt = α(1 − λ) + λyt−1 + β(1 − λ)xt + (1 − λ)εt

If one were to repeatedly lag this equation, premultiply by λ, and reinsert, one would get
(49.2.6)  yt = α + β(1 − λ)xt + β(1 − λ)λxt−1 + β(1 − λ)λ²xt−2 + · · ·
               + (1 − λ)εt + λ(1 − λ)εt−1 + · · ·

These are geometrically declining lags, and (49.2.5) is their Koyck transform. It should be estimated in the form (49.2.5). It has a lagged dependent variable, but this regressor is contemporaneously uncorrelated with the disturbance, therefore OLS is consistent and has all desired asymptotic properties.
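A small simulation illustrates this consistency claim; all parameter values and the seed are our illustrative assumptions:

```python
import numpy as np

# Sketch: OLS on the Koyck form (49.2.5) of the partial adjustment
# model is consistent, because the disturbance is contemporaneously
# uncorrelated with the lagged dependent variable.
rng = np.random.default_rng(42)
n, alpha, beta, lam = 50_000, 1.0, 2.0, 0.6

x = rng.standard_normal(n)
eps = rng.standard_normal(n)
y = np.empty(n)
y[0] = alpha
for t in range(1, n):
    y[t] = alpha*(1-lam) + lam*y[t-1] + beta*(1-lam)*x[t] + (1-lam)*eps[t]

# OLS of y_t on (1, y_{t-1}, x_t)
X = np.column_stack([np.ones(n - 1), y[:-1], x[1:]])
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
lam_hat = coef[1]
beta_hat = coef[2] / (1 - lam_hat)   # recover beta from beta(1-lambda)
# In large samples lam_hat is close to 0.6 and beta_hat close to 2.
```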
The next question is about Adaptive Expectations, an example where regression on the Koyck transformation leads to inconsistent results.
Problem 453. Suppose the simple regression model is modified so that yt is, up to a disturbance term, a linear function not of xt but of what the economic agents at time t consider to be the “permanent” level of x, call it x*t. One example would be a demand relationship in which the quantity demanded is a function of the permanent price. The demand for oil furnaces, for instance, depends on what people expect the price of heating oil to be in the long run. Another example is a consumption function with permanent income as explanatory variable. Then

(49.2.7)  yt = α + βx*t + εt,  εt ∼ IID(0, σ²)
Here x*t is the economic agents' perception of the permanent level of xt. Usually the x*t are not directly observed. In order to link x*t to the observed actual (as opposed to permanent) values xt, assume that in every time period t the agents modify their perception of the permanent level based on their current experience xt as follows:

(49.2.8)  x*t − x*t−1 = (1 − λ)(xt − x*t−1)

I.e., the adjustment which they apply to their perception of the permanent level in period t, x*t − x*t−1, depends on by how much the present period's actual level differs from last period's permanent level; more precisely, it is 1 − λ times this difference. Here 1 − λ represents some number between zero and one, which does not change over time. We are using 1 − λ instead of λ in order to make the formulas below a little simpler and to have the notation consistent with the partial adjustment model.
• a. 1 point Show that (49.2.8) is equivalent to

(49.2.9)  x*t − λx*t−1 = (1 − λ)xt
Answer. Lag (49.2.7) by 1 and premultiply by λ (the Koyck transformation) to get

(49.2.11)  λyt−1 = αλ + λβx*t−1 + λεt−1

Subtract this from (49.2.7) to get

(49.2.12)  yt − λyt−1 = α(1 − λ) + βx*t − βλx*t−1 + εt − λεt−1

Now use (49.2.9) in the form x*t − λx*t−1 = (1 − λ)xt to get (49.2.10). The new disturbances are
ηt = εt − λεt−1.
Answer. OLS is inconsistent because yt−1 and εt−1, therefore also yt−1 and ηt, are correlated. (It is also true that ηt and ηt−1 are correlated, but this is not the reason for the inconsistency.)
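This correlation can be exhibited directly by simulating the Koyck-transformed equation with disturbance ηt = εt − λεt−1; all parameter values below are illustrative assumptions:

```python
import numpy as np

# Sketch: in the Koyck-transformed adaptive expectations model the
# disturbance is eta_t = eps_t - lam*eps_{t-1}, and y_{t-1} contains
# eps_{t-1}, so regressor and disturbance are correlated and OLS is
# inconsistent.
rng = np.random.default_rng(1)
n, alpha, beta, lam = 100_000, 1.0, 2.0, 0.6

x = rng.standard_normal(n)
eps = rng.standard_normal(n)
eta = eps.copy()
eta[1:] -= lam * eps[:-1]            # eta_t = eps_t - lam*eps_{t-1}
y = np.empty(n)
y[0] = alpha
for t in range(1, n):
    y[t] = alpha*(1-lam) + lam*y[t-1] + beta*(1-lam)*x[t] + eta[t]

# sample correlation of the regressor y_{t-1} with the disturbance eta_t
corr = np.corrcoef(y[:-1], eta[1:])[0, 1]
print(round(corr, 2))   # clearly negative: cov(y_{t-1}, eta_t) = -lam*sigma^2
```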
• d. 1 point In order to get an alternative estimator, show that repeated application of (49.2.8) gives

(49.2.15)  x*t = (1 − λ)(xt + λxt−1 + λ²xt−2 + · · · + λt−1x1) + λt x*0
Answer. Rearranging (49.2.9) one obtains

(49.2.16)  x*t = (1 − λ)xt + λx*t−1
(49.2.17)      = (1 − λ)xt + λ((1 − λ)xt−1 + λx*t−2)
(49.2.18)      = (1 − λ)(xt + λxt−1) + λ²x*t−2
(49.2.19)      = (1 − λ)(xt + λxt−1 + λ²xt−2 + · · · + λt−1x1) + λt x*0
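The equivalence of the recursion (49.2.16) and the closed form (49.2.19) is easy to verify numerically; the values below are arbitrary:

```python
import numpy as np

# x*_t via the recursion x*_t = (1-lam)x_t + lam*x*_{t-1}   (49.2.16)
# versus the closed form
# x*_t = (1-lam)(x_t + lam x_{t-1} + ... + lam^{t-1} x_1) + lam^t x*_0   (49.2.19)
rng = np.random.default_rng(7)
lam, x_star0 = 0.7, 0.5
x = rng.standard_normal(10)          # x_1, ..., x_10

# recursion
x_star = x_star0
for xt in x:
    x_star = (1 - lam) * xt + lam * x_star

# closed form at t = 10 (x[t-1-k] is x_{t-k} in the 1-based notation)
t = len(x)
closed = (1 - lam) * sum(lam**k * x[t - 1 - k] for k in range(t)) + lam**t * x_star0

assert np.isclose(x_star, closed)    # both routes give the same x*_t
```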
• e. 2 points If λ is known, show how α and β can be estimated consistently by OLS from the following equation, which is gained from inserting (49.2.15) into (49.2.7):

(49.2.20)  yt = α + β(1 − λ)(xt + λxt−1 + λ²xt−2 + · · · + λt−1x1) + βx*0 λt + εt.

How many regressors are in this equation? Which are the unknown parameters? Describe exactly how you get these parameters from the coefficients of these regressors.
Answer. Three regressors: the intercept, (1 − λ)(xt + λxt−1 + λ²xt−2 + · · · + λt−1x1), and λt. In the last term, λt is the explanatory variable. A regression gives estimates of α, β, and a “prediction” of x*0. Note that the sum whose coefficient is β has many elements for high t, and few elements for low t. Also note that the λt-term becomes very small, i.e., only the first few observations of this “variable” count. This is why the estimate of x*0 is not consistent, i.e., increasing the sample size will not give an arbitrarily precise estimate of this value. Will the estimate of σ² be consistent?
• f. 1 point What do you do if λ is not known?
Answer. Since λ is usually not known, one can do the above procedure for all values along a grid from 0 to 1 and then pick the value of λ which gives the best SSE. Zellner and Geisel did this in [ZG70], and their regression can be reproduced in R with the commands data(geizel) and then plot((1:99)/100, geizel.regression(geizel$c, geizel$y, 99), xlab="lambda", ylab="sse"). They got two local minima for λ, and the local minimum which was smaller corresponded to a β > 1, which had to be ruled out for economic reasons. Their results are described in [Kme86, p. 534]. They were re-estimated with more recent data in [Gre93, pp. 531–533], where this paradox
Here is R code to compute the regressors in (49.2.19), and to search for the best λ:
"geizel.regressors" <- function(x, lambda)
{ lngth <- length(x)
  lampow <- z <- vector(mode="numeric",length=lngth)
  lampow[[1]] <- lambda
  z[[1]] <- x[[1]]
  for (t in 2:lngth) {
    lampow[[t]] <- lambda * lampow[[t-1]]    ## lambda^t
    z[[t]] <- x[[t]] + lambda * z[[t-1]]     ## x_t + lambda*x_{t-1} + ... + lambda^(t-1)*x_1
  }
  ## the two non-constant regressors in (49.2.20)
  cbind((1-lambda)*z, lampow)
}