Chapter 13: GENERALIZED METHOD OF MOMENTS (GMM)
I. ORTHOGONALITY CONDITION:
The classical model:
Y = Xβ + ε, with Y (n×1), X (n×k), β (k×1), ε (n×1)
(1) E(ε|X) = 0
(2) E(εε'|X) = σ²I
(3) X and ε are generated independently
If E(ε_i|X_i) = 0, then for observation i:
Y_i = X_iβ + ε_i, with X_i (1×k), β (k×1)
By the law of iterated expectations:
E(X_i'ε_i) = E[E(X_i'ε_i | X_i)] = E[X_i' E(ε_i|X_i)] = 0 (k×1)
→ Orthogonality condition
Moreover:
Cov(X_i', ε_i) = E{[X_i' − E(X_i')][ε_i − E(ε_i)]} = E(X_i'ε_i) − E(X_i')E(ε_i)
Since E(ε_i) = 0, the orthogonality condition also means Cov(X_i', ε_i) = E(X_i'ε_i) (k×1).
So for the classical model:
E(X_i'ε_i) = 0 (k×1)
II. METHOD OF MOMENTS
Method of moments involves replacing the population moments by the sample moments.
Example 1: For the classical model, the population moment (k×1) is:
E(X_i'ε_i) = E[X_i'(Y_i − X_iβ)] = 0
The sample moment of this:
(1/n) Σ_{i=1}^n X_i'(Y_i − X_iβ) (k×1)
Moment function: a function that depends on observable random variables and unknown parameters and that has zero expectation in the population when evaluated at the true parameters.
m(β): the moment function; it can be linear or non-linear. β (k×1) is the vector of unknown parameters.
Method of moments involves replacing the population moment E[m(β)] = 0 by its sample analogue.
Example 1 (continued): For the classical linear regression model:
The moment function: m(β) = X_i'ε_i = X_i'(Y_i − X_iβ)
The population moment: E[m(β)] = E[X_i'(Y_i − X_iβ)] = 0 (k×1)
The sample moment of E(X_i'ε_i) is:
m̄(β) = (1/n) Σ_{i=1}^n X_i'(Y_i − X_iβ) (k×1)
Substituting the sample moment for the population moment and solving for β̂:
(1/n) X'(Y − Xβ̂) = 0
X'Y − X'Xβ̂ = 0
X'Xβ̂ = X'Y
β̂_MOM = (X'X)^(-1) X'Y = β̂_OLS
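To make the result concrete, here is a minimal numerical sketch (simulated data; all variable names are illustrative, not from the notes): solving the k sample moment conditions reproduces the OLS estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3

# Simulated classical model: X is exogenous, so E(X_i' e_i) = 0 holds
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)

# MOM: solve the k sample moment conditions X'(Y - X beta) = 0, i.e. X'X beta = X'Y
beta_mom = np.linalg.solve(X.T @ X, X.T @ Y)

# OLS gives the identical answer
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_mom, beta_ols))  # True
```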
Example 2: If Cov(X_i, ε_i) ≠ 0, with X_i (1×k), the condition E(X_i'ε_i) = 0 fails, so the MOM/OLS estimator is inconsistent. Suppose there is a vector of instruments Z_i (1×L) that satisfies:
E(ε_i|Z_i) = 0, so that E(Z_i'ε_i) = 0 and Cov(Z_i', ε_i) = 0 (L×1)
We have the population moment (L×1):
E(Z_i'ε_i) = E[Z_i'(Y_i − X_iβ)] = 0
The sample moment of E[Z_i'(Y_i − X_iβ)] is:
(1/n) Σ_{i=1}^n Z_i'(Y_i − X_iβ) = (1/n) Z'(Y − Xβ), with Z (n×L), X (n×k)
Substituting the sample moment for the population moment:
(1/n) Z'(Y − Xβ̂) = 0 (*) (L×1)
a) If L < k: β̂ is not identified; (*) has fewer equations than unknowns, so there is no unique solution.
b) If L = k: exactly identified. Then Z'X is square (k×k) and:
Z'(Y − Xβ̂) = 0
Z'Xβ̂ = Z'Y
β̂_MOM = (Z'X)^(-1) Z'Y = β̂_IV
(a numerical sketch of this case follows the list)
c) If L > k: k parameters but L > k equations → in general no β̂ satisfies all of them exactly ("too many" equations) → GMM
III. GENERALIZED METHOD OF MOMENTS:
1. The general case:
Denote by m̄(β̂) the sample moment corresponding to the population moment E[m(β)] = 0.
Method of moments: solve the L equations
m̄(β̂) = 0 (L×1)
for the k unknowns β̂ = (β̂_1, β̂_2, …, β̂_k)'.
a) If L < k: no unique solution for β̂ (under-identified).
b) If L = k: unique solution for β̂, obtained by solving m̄(β̂) = 0 (L×1).
c) If L > k, how do we estimate β?
Hansen (1982) suggested that instead of solving the equations m̄(β̂) = 0 (L×1) for β̂, we solve the minimization problem:
min over β: q = m̄(β)' W m̄(β) (**)
where W (L×L) is any positive definite matrix that may depend on the data.
Note: if A (n×n) is a positive definite matrix, then for any non-zero vector a = (a_1, a_2, …, a_n)', a'Aa > 0 (1×1); hence q > 0 unless all L sample moments are exactly zero.
The β̂ that minimizes (**) is called the generalized method of moments (GMM) estimator of β, denoted β̂_GMM.
Hansen (1982) showed that β̂_GMM from (**) is a consistent estimator of β.
The problem here: what is the best W to use?
- Hansen (1982) indicated:
W = {VarCov[√n · m̄(β)]}^(-1) (L×L)
- With this W, β̂_GMM is efficient → it has the smallest variance:
VarCov(β̂_GMM) = (1/n) (G'WG)^(-1)
where:
G = plim ∂m̄(β̂)/∂β̂' (L×k)
For an arbitrary positive definite W, the variance takes the sandwich form:
VarCov(β̂_GMM) = (1/n) (G'WG)^(-1) G'W VarCov[√n · m̄(β)] W G (G'WG)^(-1)
which collapses to (1/n)(G'WG)^(-1) when W = {VarCov[√n · m̄(β)]}^(-1).
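As a sketch of Hansen's suggestion, the quadratic form (**) can be minimized numerically for any moment function m̄(β); the linear moment function, the identity weight matrix, and the simulated data below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 2000

# Illustrative over-identified setup: L = 3 moment conditions, k = 2 parameters
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.6 * z1 + 0.6 * z2 + rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])       # k = 2
Z = np.column_stack([np.ones(n), z1, z2])  # L = 3 > k

def m_bar(beta):
    # Sample moment m_bar(beta) = (1/n) sum_i Z_i'(y_i - X_i beta), an (L,) vector
    return Z.T @ (y - X @ beta) / n

W = np.eye(Z.shape[1])  # any positive definite weight matrix works for consistency

def q(beta):
    m = m_bar(beta)
    return m @ W @ m    # q = m_bar' W m_bar, the objective (**)

res = minimize(q, x0=np.zeros(X.shape[1]))  # beta_GMM = argmin q
print(res.x)                                # close to (1, 2)
```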
2. The linear model:
The sample moment becomes:
m̄(β) = (1/n) Σ_{i=1}^n Z_i'(Y_i − X_iβ) = (1/n) Z'(Y − Xβ)
The minimization problem:
min over β: q = [(1/n) Z'(Y − Xβ)]' W [(1/n) Z'(Y − Xβ)], with W (L×L)
First-order conditions (k equations):
(Z'X)' W (Z'Y − Z'Xβ̂) = 0
(Z'X)' W Z'Y = (Z'X)' W Z'X β̂
β̂_GMM = [(Z'X)' W (Z'X)]^(-1) (Z'X)' W Z'Y
For the linear regression model, this is:
β̂_GMM = (X'Z W Z'X)^(-1) X'Z W Z'Y (k×1)
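The closed form can be coded directly; a minimal sketch (the function name gmm_linear is mine):

```python
import numpy as np

def gmm_linear(Y, X, Z, W):
    """beta_GMM = (X'Z W Z'X)^(-1) X'Z W Z'Y for a given (L x L) weight matrix W."""
    XtZ = X.T @ Z
    A = XtZ @ W @ XtZ.T      # X'Z W Z'X  (k x k)
    b = XtZ @ W @ (Z.T @ Y)  # X'Z W Z'Y  (k x 1)
    return np.linalg.solve(A, b)
```

Using np.linalg.solve rather than explicitly inverting X'Z W Z'X is numerically more stable; the formula is unchanged.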
IV. GMM AND OTHER ESTIMATORS IN LINEAR MODELS:
1. Notice that if L = k (exactly identified) then Z'X is a square matrix (k×k), so that:
(X'Z W Z'X)^(-1) = (Z'X)^(-1) W^(-1) (X'Z)^(-1)
β̂_GMM = (Z'X)^(-1) W^(-1) (X'Z)^(-1) X'Z W Z'Y = (Z'X)^(-1) Z'Y
which is the IV estimator, and W drops out → the IV estimator is a special case of the GMM estimator.
2. If Z = X (the classical model): β̂_GMM = (X'X)^(-1) X'Y = β̂_OLS → the OLS estimator is a special case of the GMM estimator.
3. If L > k (over-identified), the choice of the matrix W is important; W is called the weight matrix.
β̂_GMM is consistent for any positive definite W.
The choice of W will affect the variance of β̂_GMM → we could choose W such that Var(β̂_GMM) is smallest → the efficient estimator.
4. If W = (Z'Z)^(-1) then:
β̂_GMM = [X'Z (Z'Z)^(-1) Z'X]^(-1) X'Z (Z'Z)^(-1) Z'Y = β̂_2SLS
→ the 2SLS estimator is also a special case of the GMM estimator (a numerical check follows).
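A quick numerical check of this equivalence, under an assumed simulated design (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])       # k = 2
Z = np.column_stack([np.ones(n), z1, z2])  # L = 3 > k

# GMM with W = (Z'Z)^(-1)
W = np.linalg.inv(Z.T @ Z)
XtZ = X.T @ Z
beta_gmm = np.linalg.solve(XtZ @ W @ XtZ.T, XtZ @ W @ Z.T @ y)

# Direct 2SLS: regress y on the fitted values X_hat = Z (Z'Z)^(-1) Z'X
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
print(np.allclose(beta_gmm, beta_2sls))  # True
```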
5. From Hansen (1982), the optimal W in the case of the linear model is:
W = {VarCov[√n · m̄(β)]}^(-1) = {VarCov[(1/√n) Σ_{i=1}^n Z_i'ε_i]}^(-1) (L×L)
6. The next problem is to estimate W:
a) If there is no heteroskedasticity and no autocorrelation:
Ŵ = [σ̂² · (1/n) Σ_{i=1}^n Z_i'Z_i]^(-1), with σ̂² = (1/n) Σ_{i=1}^n e_i²
Since multiplying W by a positive scalar does not change the minimizer of (**), this Ŵ yields:
β̂_GMM = [X'Z (Z'Z)^(-1) Z'X]^(-1) X'Z (Z'Z)^(-1) Z'Y = β̂_2SLS
We get the 2SLS estimator → there is no difference between β̂_2SLS and β̂_GMM in the case of no heteroskedasticity and no autocorrelation.
b) If there is heteroskedasticity in the error terms (but no autocorrelation) of unknown form:
Ŵ = [(1/n) Σ_{i=1}^n e_i² Z_i'Z_i]^(-1) (White's estimator)
→ efficiency gain over β̂_2SLS.
c) If there are both heteroskedasticity and autocorrelation of unknown forms, use the Newey-West estimator:
Ŵ = {(1/n) [Σ_{i=1}^n e_i² Z_i'Z_i + Σ_{j=1}^L w_j Σ_{i=j+1}^n e_i e_{i−j} (Z_i'Z_{i−j} + Z_{i−j}'Z_i)]}^(-1)
with Bartlett weights w_j = 1 − j/(L+1), where L here denotes the chosen maximum lag
→ efficiency gain over β̂_2SLS.
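A sketch of both estimators of Ŵ, computed from first-step residuals e (function names and the max_lag argument are mine; the Bartlett weights follow the formula above):

```python
import numpy as np

def w_hat_white(Z, e):
    """White: W_hat = [ (1/n) sum_i e_i^2 Z_i'Z_i ]^(-1)."""
    n = Z.shape[0]
    S = (Z * e[:, None] ** 2).T @ Z / n  # (1/n) sum_i e_i^2 Z_i'Z_i
    return np.linalg.inv(S)

def w_hat_newey_west(Z, e, max_lag):
    """Newey-West: adds autocovariance terms with Bartlett weights w_j = 1 - j/(max_lag+1)."""
    n = Z.shape[0]
    Ze = Z * e[:, None]  # row i is e_i * Z_i
    S = Ze.T @ Ze / n    # the j = 0 (White) term
    for j in range(1, max_lag + 1):
        w_j = 1.0 - j / (max_lag + 1.0)
        Gamma_j = Ze[j:].T @ Ze[:-j] / n  # (1/n) sum_i e_i e_{i-j} Z_i'Z_{i-j}
        S += w_j * (Gamma_j + Gamma_j.T)
    return np.linalg.inv(S)
```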
Notes:
If E(εε') = σ²Ω has a known form (for example, heteroskedasticity of the form σ_i² = σ² f(X_i), or autocorrelation with a known structure in u_t), we could perform GLS to get the efficient estimator β̂_GLS (using instrumental variables); GMM is not necessary here.
Usually the forms of autocorrelation and heteroskedasticity are not known → the GMM estimator is an important improvement in these cases.
V. GMM ESTIMATION PROCEDURE:
Step 1:
Use W = I or W = (Z'Z)^(-1) to obtain a consistent first-step estimator of β. Then estimate Ŵ from the first-step residuals by White's procedure (heteroskedasticity case) or the Newey-West procedure (general case).
Step 2:
Use the estimated Ŵ to compute the GMM estimator:
β̂_GMM = (X'Z Ŵ Z'X)^(-1) X'Z Ŵ Z'Y
Note: we always need to construct Ŵ in the first step.
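Putting the two steps together for the heteroskedasticity case, a minimal sketch (the function name is mine; step 1 uses W = (Z'Z)^(-1), i.e. 2SLS, and Ŵ comes from White's procedure):

```python
import numpy as np

def gmm_two_step(Y, X, Z):
    """Two-step GMM for the linear model with heteroskedasticity of unknown form."""
    n = Z.shape[0]
    XtZ = X.T @ Z

    # Step 1: consistent first-step estimator with W = (Z'Z)^(-1) (the 2SLS estimator)
    W1 = np.linalg.inv(Z.T @ Z)
    beta1 = np.linalg.solve(XtZ @ W1 @ XtZ.T, XtZ @ W1 @ Z.T @ Y)
    e = Y - X @ beta1  # first-step residuals

    # Estimate W_hat by White's procedure
    W_hat = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)

    # Step 2: efficient GMM with the estimated W_hat
    beta2 = np.linalg.solve(XtZ @ W_hat @ XtZ.T, XtZ @ W_hat @ Z.T @ Y)
    return beta2, W_hat
```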
VI. THE ADVANTAGES OF THE GMM ESTIMATOR:
1. If we don't know the form/pattern of heteroskedasticity or autocorrelation → we can correct the standard errors with robust (White or Newey-West) standard errors, but we are stuck with inefficient estimators.
2. The 2SLS estimator is consistent but still inefficient if the error terms are autocorrelated or heteroskedastic.
3. GMM gives an efficient estimator in the case of unknown heteroskedasticity and autocorrelation forms, with correct standard errors as well.
Potential drawbacks:
1. The choice of the weight matrix W for the first step is arbitrary; different choices will lead to different point estimates in the second step. One possible remedy is not to stop after two iterations but to continue updating the weight matrix W until convergence is achieved. This estimator can be obtained by using the "cue" (continuously updated estimator) option within ivreg2.
2. Inference problems arise because the optimal weight matrix is estimated → this can sometimes lead to downward bias in the estimated standard errors of the GMM estimator.
VII. VARIANCE OF THE GMM ESTIMATOR FOR LINEAR MODELS:
VarCov(β̂_GMM) = n (X'Z Ŵ Z'X)^(-1)
Writing Q̂ = (1/n) Z'X, this is VarCov(β̂_GMM) = (1/n) (Q̂' Ŵ Q̂)^(-1), and as n → ∞:
√n (β̂_GMM − β) → N(0, (Q'WQ)^(-1)) in distribution, where Q = plim Q̂
which justifies the formula above.
In practice Ŵ is noisy, since the residuals in the first step are affected by sampling error. The upshot is that the step-2 standard errors tend to be too small (overly optimistic). Methods now exist that enable you to correct for the sampling error in the first step (the Windmeijer procedure).
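In code, the variance formula reads as follows (a sketch; the function name is mine and Ŵ is assumed to come from a two-step procedure such as the one above):

```python
import numpy as np

def gmm_varcov(X, Z, W_hat):
    """VarCov(beta_GMM) = n * (X'Z W_hat Z'X)^(-1); standard errors are the sqrt of the diagonal."""
    n = Z.shape[0]
    XtZ = X.T @ Z
    return n * np.linalg.inv(XtZ @ W_hat @ XtZ.T)

# Usage: V = gmm_varcov(X, Z, W_hat); se = np.sqrt(np.diag(V))
```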
VIII. SPECIFICATION TESTS WITH GMM:
β̂_GMM^R: restricted estimator (under the constraints)
β̂_GMM: unrestricted estimator (no constraints)
With ε̂_R and ε̂ the restricted and unrestricted residuals, and the same Ŵ used for both fits:
J = (1/n) [(Z'ε̂_R)' Ŵ (Z'ε̂_R) − (Z'ε̂)' Ŵ (Z'ε̂)] ~ χ²(q)
q: number of restrictions
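A sketch of this restricted-vs-unrestricted test, assuming the restrictions set selected coefficients to zero and that the same Ŵ is used for both fits (all names are mine):

```python
import numpy as np
from scipy.stats import chi2

def gmm_distance_test(Y, X, Z, W_hat, zero_cols):
    """J = (1/n)[(Z'e_R)' W (Z'e_R) - (Z'e)' W (Z'e)] ~ chi2(q), q = len(zero_cols).
    Requires L >= k so the unrestricted model is identified."""
    n = Z.shape[0]

    def fit(Xm):
        XtZ = Xm.T @ Z
        return np.linalg.solve(XtZ @ W_hat @ XtZ.T, XtZ @ W_hat @ Z.T @ Y)

    # Unrestricted fit and residuals
    e_u = Y - X @ fit(X)

    # Restricted fit: dropping a column imposes the restriction beta_j = 0
    keep = [j for j in range(X.shape[1]) if j not in zero_cols]
    e_r = Y - X[:, keep] @ fit(X[:, keep])

    def crit(e):
        g = Z.T @ e
        return g @ W_hat @ g / n  # (1/n)(Z'e)' W (Z'e)

    J = crit(e_r) - crit(e_u)
    q = len(zero_cols)
    return J, chi2.sf(J, df=q)  # statistic and p-value
```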