Chapter 7
GENERALIZED LINEAR REGRESSION MODEL
Our basic model:
We will now generalize the specification of the error term:
$$E(\varepsilon) = 0, \qquad E(\varepsilon\varepsilon') = \sigma^2\Omega = \Sigma \quad (n \times n)$$
This allows for one or both of:
1. Heteroskedasticity
2. Autocorrelation
The model now is:
(1) $Y = X\beta + \varepsilon$, where $Y$ is $n \times 1$ and $X$ is $n \times k$
(2) $X$ is non-stochastic and $\mathrm{rank}(X) = k$
(3) $E(\varepsilon) = 0$ ($n \times 1$)
(4) $E(\varepsilon\varepsilon') = \Sigma = \sigma_\varepsilon^2\Omega$ ($n \times n$)
Heteroskedasticity case:
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
Autocorrelation case:
$$\Sigma = \sigma_\varepsilon^2 \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \cdots & \rho_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \cdots & 1 \end{pmatrix}$$
where $\rho_i = \mathrm{Corr}(\varepsilon_t, \varepsilon_{t-i})$ = the correlation between errors that are $i$ periods apart.
1. $\hat\beta = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$
$$E(\hat\beta) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta$$
so $\hat\beta$ is still an unbiased estimator.
2. $\mathrm{VarCov}(\hat\beta) = E[(\hat\beta - \beta)(\hat\beta - \beta)']$
$= E[((X'X)^{-1}X'\varepsilon)((X'X)^{-1}X'\varepsilon)']$
$= (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1}$
$= (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}$
so the standard formula for $\mathrm{VarCov}(\hat\beta)$ no longer holds, and $\hat\sigma^2(X'X)^{-1}$ is a biased estimator of the true $\mathrm{VarCov}(\hat\beta)$. The usual OLS output will therefore be misleading: the standard errors, t-statistics, etc. will be based on $\hat\sigma^2(X'X)^{-1}$, not on the correct formula $(X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}$. A numerical illustration follows below.
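As a check on this sandwich formula, here is a minimal numpy sketch, assuming a hypothetical design matrix and a known diagonal (heteroskedastic) $\Sigma$; it compares the naive OLS covariance with the correct one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # hypothetical design
sigma2_i = 0.5 + 2.0 * X[:, 1] ** 2           # assumed heteroskedastic error variances
Sigma = np.diag(sigma2_i)                     # E[ee'] = Sigma = sigma^2 * Omega

XtX_inv = np.linalg.inv(X.T @ X)
# Correct "sandwich" covariance: (X'X)^-1 X' Sigma X (X'X)^-1
varcov_true = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
# Naive OLS formula sigma^2 (X'X)^-1, using the average error variance as sigma^2
varcov_naive = sigma2_i.mean() * XtX_inv

print(np.sqrt(np.diag(varcov_true)))   # correct standard errors
print(np.sqrt(np.diag(varcov_naive)))  # misleading OLS standard errors
```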
3. OLS estimators are no longer best: they are inefficient.
Note: for non-stochastic $X$, we care about the efficiency of $\hat\beta$, because we already know that as $n\uparrow$, $\mathrm{Var}(\hat\beta_j)\downarrow$, so $\mathrm{plim}\,\hat\beta = \beta$ and $\hat\beta$ is consistent.
4. If $X$ is stochastic:
- OLS estimators are still consistent (when $E(\varepsilon|X) = 0$)
- IV estimators are still consistent (when $E(\varepsilon|X) \neq 0$)
- The usual covariance matrix estimator of $\mathrm{VarCov}(\hat\beta)$, which is $\hat\sigma^2(X'X)^{-1}$, will be inconsistent (as $n \to \infty$) for the true $\mathrm{VarCov}(\hat\beta)$.
We need to know how to deal with these issues. This will lead us to some generalized estimators.
III WHITE'S HETEROSKEDASTICITY-CONSISTENT ESTIMATOR OF VarCov($\hat\beta$)
(or the robust estimator of VarCov($\hat\beta$))
If we knew $\sigma^2\Omega$, then the estimator of $\mathrm{VarCov}(\hat\beta)$ would be:
$$\mathrm{VarCov}(\hat\beta) = (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1} = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{X'(\sigma^2\Omega)X}{n}\right)\left(\frac{X'X}{n}\right)^{-1}$$
If $\Sigma$ is unknown, we need a consistent estimator of $\frac{1}{n}X'\Sigma X$.
(Note that the number of unknowns in $\Sigma$ grows one-for-one with $n$, but $\frac{1}{n}X'\Sigma X$ is a $k \times k$ matrix: it does not grow with $n$.)
$$\frac{1}{n}X'\Sigma X = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n}\sigma_{ij}\,X_i X_j'$$
where $X_i$ is the $k \times 1$ vector of regressors for observation $i$.
In the case of heteroskedasticity:
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
so
$$\Sigma^* = \frac{1}{n}X'\Sigma X = \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 X_i X_i'$$
White (1980) showed that if:
$$\Sigma_0 = \frac{1}{n}\sum_{i=1}^{n}e_i^2 X_i X_i'$$
where $e_i$ is the $i$-th OLS residual, then $\mathrm{plim}(\Sigma_0) = \mathrm{plim}(\Sigma^*)$,
so we can estimate the model by OLS, and then a consistent estimator of $V = \mathrm{VarCov}(\hat\beta)$ will be:
$$\hat V = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n}e_i^2 X_i X_i'\right)\left(\frac{X'X}{n}\right)^{-1} = n(X'X)^{-1}\Sigma_0(X'X)^{-1}$$
$\hat V$ is a consistent estimator for $V$, so White's estimator for $\mathrm{VarCov}(\hat\beta)$ is:
$$\widehat{\mathrm{VarCov}}(\hat\beta) = (X'X)^{-1}X'\hat\Sigma X(X'X)^{-1} = \hat V$$
where:
$$\hat\Sigma = \begin{pmatrix} e_1^2 & 0 & \cdots & 0 \\ 0 & e_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e_n^2 \end{pmatrix}$$
(Note that $\Sigma_0 = \frac{1}{n}X'\hat\Sigma X$.)
$\hat V$ is consistent for $V = n(X'X)^{-1}\Sigma^*(X'X)^{-1} = (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}$ regardless of the (unknown) form of the heteroskedasticity (but only under heteroskedasticity).
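A minimal numpy sketch of White's estimator (the so-called HC0 form), assuming hypothetical data X and y:

```python
import numpy as np

def white_varcov(X, y):
    """White's heteroskedasticity-consistent (HC0) covariance estimator:
    (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, built from OLS residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y           # OLS coefficients
    e = y - X @ beta_hat                   # OLS residuals
    meat = (X * e[:, None] ** 2).T @ X     # X' diag(e^2) X without forming diag
    return XtX_inv @ meat @ XtX_inv

# hypothetical data for illustration
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))
print(np.sqrt(np.diag(white_varcov(X, y))))  # robust standard errors
```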
Newey and West produced a corresponding consistent estimator of $V$ for the case where there is autocorrelation and/or heteroskedasticity; White's estimator handles heteroskedasticity only. White's estimator just modifies the covariance matrix estimator, not $\hat\beta$. The t-statistics, F-statistics, etc. will be modified, but only in a manner that is appropriate asymptotically. Both estimators are illustrated in the sketch below.
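As a usage sketch, statsmodels can produce both covariance matrices for a fitted OLS model; the data here are hypothetical, and `cov_type="HC0"` / `cov_type="HAC"` are the statsmodels options for White's and Newey-West's estimators:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n) * (1 + X[:, 1] ** 2)

ols = sm.OLS(y, X).fit()                                         # usual OLS covariance
white = sm.OLS(y, X).fit(cov_type="HC0")                         # White's estimator
nw = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})   # Newey-West

print(ols.bse, white.bse, nw.bse, sep="\n")  # compare standard errors
```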
So if we have heteroskedasticity or autocorrelation, whether we modify the covariance matrix estimator or not, the usual t-statistics will be unreliable in finite samples (White's estimator of VarCov($\hat\beta$) is only useful when $n$ is very large: as $n \to \infty$, $\hat V \to \mathrm{VarCov}(\hat\beta)$).
$\to \hat\beta$ is still inefficient.
$\to$ To obtain efficient estimators, use generalized least squares (GLS).
A good practical solution is to use White's adjustment and then use the Wald test, rather than the F-test, for exact linear restrictions. Now let's turn to the estimation of $\beta$, taking account of the full process for the error term. The OLS estimator will be inefficient in finite samples.
IV GENERALIZED LEAST SQUARES (GLS)
1. Assume $E(\varepsilon\varepsilon') = \Sigma$ ($n \times n$) is known and positive definite.
$\to$ there exist vectors $C_j$ ($n \times 1$) and scalars $\lambda_j$, $j = 1, 2, \ldots, n$, such that $\Sigma C_j = \lambda_j C_j$ (characteristic vector $C_j$, eigenvalue $\lambda_j$)
$\to$ then $C'\Sigma C = \Lambda$, where $C = [C_1\ C_2\ \cdots\ C_n]$ with $C'C = I$ (so $C^{-1} = C'$), and
$$\Lambda = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = \Lambda^{1/2}\Lambda^{1/2}$$
So $C'\Sigma C = \Lambda = (\Lambda^{1/2})(\Lambda^{1/2})'$, and therefore
$$(\Lambda^{-1/2})'C'\Sigma C\Lambda^{-1/2} = I$$
$\to H\Sigma H' = I$, where $H = \Lambda^{-1/2}C'$
$\to \Sigma = H^{-1}I(H')^{-1} = H^{-1}(H')^{-1}$
$\to \Sigma^{-1} = H'H$
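A small numpy check of this decomposition, using an arbitrary positive definite $\Sigma$ (a sketch, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)          # arbitrary positive definite Sigma

lam, C = np.linalg.eigh(Sigma)           # Sigma C_j = lambda_j C_j
H = np.diag(lam ** -0.5) @ C.T           # H = Lambda^{-1/2} C'

print(np.allclose(H @ Sigma @ H.T, np.eye(5)))     # H Sigma H' = I
print(np.allclose(H.T @ H, np.linalg.inv(Sigma)))  # Sigma^{-1} = H'H
```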
Our model: $Y = X\beta + \varepsilon$.
Pre-multiply by $H$:
$$\underbrace{HY}_{Y^*} = \underbrace{HX}_{X^*}\beta + \underbrace{H\varepsilon}_{\varepsilon^*} \quad\to\quad Y^* = X^*\beta + \varepsilon^*$$
$\varepsilon^*$ will satisfy all the classical assumptions because:
$$E(\varepsilon^*\varepsilon^{*\prime}) = E[H(\varepsilon\varepsilon')H'] = H\Sigma H' = I$$
Since the transformed model meets the classical assumptions, applying OLS to the $(Y^*, X^*)$ data yields the BLUE.
$\to \hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^* = (X'H'HX)^{-1}X'H'HY = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$
Moreover:
$$\mathrm{VarCov}(\hat\beta_{GLS}) = (X^{*\prime}X^*)^{-1}X^{*\prime}E(\varepsilon^*\varepsilon^{*\prime})X^*(X^{*\prime}X^*)^{-1} = (X^{*\prime}X^*)^{-1} = (X'\Sigma^{-1}X)^{-1}$$
$\to \hat\beta_{GLS} \sim N[\beta, (X'\Sigma^{-1}X)^{-1}]$
Note that $\hat\beta_{GLS}$ is the BLUE of $\beta$, so $E(\hat\beta_{GLS}) = \beta$.
The GLS estimator is just OLS applied to the transformed model, which satisfies all the classical assumptions, so the Gauss-Markov theorem can be applied:
$\to \hat\beta_{GLS}$ is the BLUE of $\beta$
$\to \hat\beta_{OLS}$ must be inefficient in this case
$\to \mathrm{Var}(\hat\beta_{j,GLS}) \leq \mathrm{Var}(\hat\beta_{j,OLS})$
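A minimal numpy sketch of these formulas for a known $\Sigma$; the data and the diagonal form of $\Sigma$ are hypothetical:

```python
import numpy as np

def gls(X, y, Sigma):
    """GLS estimator and its covariance for known Sigma = E[ee']."""
    Sigma_inv = np.linalg.inv(Sigma)
    varcov = np.linalg.inv(X.T @ Sigma_inv @ X)   # (X' Sigma^-1 X)^-1
    beta_gls = varcov @ X.T @ Sigma_inv @ y       # (X' Sigma^-1 X)^-1 X' Sigma^-1 y
    return beta_gls, varcov

# hypothetical heteroskedastic example
rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2_i = 1 + 3 * X[:, 1] ** 2
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n) * np.sqrt(sigma2_i)

beta_hat, V = gls(X, y, np.diag(sigma2_i))
print(beta_hat, np.sqrt(np.diag(V)))
```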
Example:
$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix} \text{ known}$$
$$\to \Sigma^{-1} = \begin{pmatrix} 1/\sigma_1^2 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n^2 \end{pmatrix} = H'H$$
$$\to H = \begin{pmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{pmatrix}$$
$$Y^* = HY = \begin{pmatrix} Y_1/\sigma_1 \\ Y_2/\sigma_2 \\ \vdots \\ Y_n/\sigma_n \end{pmatrix}, \qquad X^* = HX = \begin{pmatrix} 1/\sigma_1 & X_{12}/\sigma_1 & \cdots & X_{1k}/\sigma_1 \\ 1/\sigma_2 & X_{22}/\sigma_2 & \cdots & X_{2k}/\sigma_2 \\ \vdots & \vdots & & \vdots \\ 1/\sigma_n & X_{n2}/\sigma_n & \cdots & X_{nk}/\sigma_n \end{pmatrix}$$
The transformed model has each observation divided by $\sigma_i$:
$$\frac{Y_i}{\sigma_i} = \beta_1\frac{1}{\sigma_i} + \beta_2\frac{X_{i2}}{\sigma_i} + \cdots + \beta_k\frac{X_{ik}}{\sigma_i} + \frac{\varepsilon_i}{\sigma_i}$$
Apply OLS to this transformed equation $\to$ "Weighted Least Squares" (see the sketch below).
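A minimal sketch of this weighted least squares step, assuming the $\sigma_i$ are known (data and the form of $\sigma_i$ are hypothetical):

```python
import numpy as np

def wls(X, y, sigma):
    """WLS: divide each observation by sigma_i, then run OLS on the result."""
    Xs = X / sigma[:, None]               # X* = HX
    ys = y / sigma                        # Y* = HY
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma = 1 + np.abs(X[:, 1])               # assumed known standard deviations
y = X @ np.array([2.0, -1.0]) + rng.normal(size=n) * sigma
print(wls(X, y, sigma))                   # equals GLS with diagonal Sigma
```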
Let:
$\hat\beta$ = the GLS estimator
$\hat\varepsilon = Y^* - X^*\hat\beta_{GLS}$
$\hat\sigma^2 = \dfrac{\hat\varepsilon'\hat\varepsilon}{n-k}$
Then, to test $H_0: R\beta = q$ (F/Wald test):
$$F = \frac{[R\hat\beta - q]'\,[R(X^{*\prime}X^*)^{-1}R']^{-1}\,[R\hat\beta - q]}{r\,\hat\sigma^2} \sim F(r, n-k) \quad \text{if } H_0 \text{ is true}$$
and
$$F = \frac{(\hat\varepsilon_c'\hat\varepsilon_c - \hat\varepsilon'\hat\varepsilon)/r}{\hat\varepsilon'\hat\varepsilon/(n-k)}$$
where $\hat\varepsilon_c = Y^* - X^*\hat\beta_{c,GLS}$ and
$$\hat\beta_{c,GLS} = \hat\beta_{GLS} - (X^{*\prime}X^*)^{-1}R'[R(X^{*\prime}X^*)^{-1}R']^{-1}(R\hat\beta_{GLS} - q)$$
is the "constrained" GLS estimator of $\beta$.
2. Feasible GLS estimation:
In practice, of course, $\Sigma$ is usually unknown, so $\hat\beta_{GLS}$ cannot be constructed: it is not feasible. The obvious solution is to estimate $\Sigma$ by some $\hat\Sigma$ and then construct:
$$\hat\beta_{FGLS} = (X'\hat\Sigma^{-1}X)^{-1}X'\hat\Sigma^{-1}Y$$
A practical issue: $\Sigma$ is $n \times n$, so it has $n(n+1)/2$ distinct parameters, allowing for symmetry, but we only have $n$ observations $\to$ we need to constrain $\Sigma$. Typically $\Sigma = \Sigma(\theta)$, where $\theta$ contains a small number of parameters.
Example (heteroskedasticity): $\mathrm{var}(\varepsilon_i) = \sigma^2(\theta_1 + \theta_2 Z_i)$
$$\Sigma = \sigma^2\begin{pmatrix} \theta_1 + \theta_2 z_1 & 0 & \cdots & 0 \\ 0 & \theta_1 + \theta_2 z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \theta_1 + \theta_2 z_n \end{pmatrix}$$
so just 2 parameters need to be estimated to form $\hat\Sigma$ (a two-step sketch follows below).
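A two-step FGLS sketch for this variance model, assuming the skedastic function $\mathrm{var}(\varepsilon_i) = \theta_1 + \theta_2 z_i$ (with $\sigma^2$ absorbed into $\theta$, which leaves the GLS estimator unchanged since scaling $\Sigma$ cancels): first estimate $\theta$ by regressing squared OLS residuals on $z_i$, then apply GLS with the fitted variances.

```python
import numpy as np

def fgls_hetero(X, y, z):
    """Two-step FGLS when var(e_i) = theta1 + theta2 * z_i (sigma^2 absorbed)."""
    # Step 1: OLS, then regress squared residuals on (1, z) to estimate theta
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta_ols) ** 2
    Z = np.column_stack([np.ones_like(z), z])
    theta, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    var_hat = np.clip(Z @ theta, 1e-6, None)      # fitted variances, kept positive
    # Step 2: GLS (here WLS) using the estimated variances
    w = 1 / np.sqrt(var_hat)
    beta_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_fgls

rng = np.random.default_rng(7)
n = 400
z = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * np.sqrt(0.5 + 1.5 * z)
print(fgls_hetero(X, y, z))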
Serial correlation:
$$\Sigma = \sigma_\varepsilon^2\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & & & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \cdots & \rho & 1 \end{pmatrix}$$
so only one parameter, $\rho$, needs to be estimated.
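A one-liner sketch of this AR(1) covariance matrix using scipy's Toeplitz helper; the values of $\sigma_\varepsilon^2$ and $\rho$ below are hypothetical:

```python
import numpy as np
from scipy.linalg import toeplitz

def ar1_sigma(n, rho, sigma2=1.0):
    """Sigma = sigma2 * toeplitz(1, rho, rho^2, ..., rho^(n-1))."""
    return sigma2 * toeplitz(rho ** np.arange(n))

print(ar1_sigma(4, 0.6))  # 4x4 example with rho = 0.6
```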
• If $\hat\Sigma$ is consistent for $\Sigma$, then $\hat\beta_{FGLS}$ will be asymptotically efficient for $\beta$.
• Of course, to apply FGLS we want to know the form of $\Sigma$ $\to$ we need to construct tests for it.