Chapter 1
REVIEW OF LEAST SQUARES & LIKELIHOOD METHODS
I LEAST SQUARES METHODS:
1 Model:
- We have N observations (individuals, firms, …) drawn randomly from a large population, i = 1, 2, …, N.
- On observation i we observe $Y_i$ and a K-dimensional column vector of explanatory variables $X_i = (X_{i1}, X_{i2}, \dots, X_{iK})'$, and we assume $X_{i1} = 1$ for all i = 1, 2, …, N.
- We are interested in explaining the distribution of $Y_i$ in terms of the explanatory variables $X_i$ using the linear model:

  $Y_i = X_i'\beta + \varepsilon_i$, where $\beta = (\beta_1, \beta_2, \dots, \beta_K)'$,

  or, written out,

  $Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + \dots + \beta_K X_{iK} + \varepsilon_i$.

  In matrix notation: $Y = X\beta + \varepsilon$.
Assumption 1: $\{X_i, Y_i\}_{i=1}^{n}$ are independent and identically distributed.
Assumption 2: $\varepsilon_i \mid X_i \sim N(0, \sigma^2)$.
Assumption 3: $\varepsilon_i \perp X_i$ (independence of the error and the regressors).
Assumption 4: $E[\varepsilon_i \mid X_i] = 0$ (mean independence).
Assumption 5: $E[\varepsilon_i X_i] = 0$ (orthogonality).
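As a quick illustration (not part of the original notes), here is a minimal sketch that simulates data satisfying this model and Assumptions 1-2; the sample size, coefficients, and variable names are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions and parameters (assumed values, not from the notes)
n, K = 500, 3
beta_true = np.array([1.0, 2.0, -0.5])        # beta = (beta_1, beta_2, beta_3)'
sigma = 1.5

# X_i = (1, X_i2, X_i3)' with X_i1 = 1 for all i (the constant term)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# epsilon_i | X_i ~ N(0, sigma^2), i.i.d. across observations
eps = rng.normal(scale=sigma, size=n)

# Y_i = X_i' beta + epsilon_i, i.e. Y = X beta + epsilon in matrix form
Y = X @ beta_true + eps
```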
The Ordinary Least Squares (OLS) estimator for $\beta$ solves:

  $\hat{\beta} = \arg\min_{\beta} \frac{1}{n}\sum_{i=1}^{n} \big(Y_i - X_i'\beta\big)^2$

This leads to:

  $\hat{\beta} = \Big(\sum_{i=1}^{n} X_i X_i'\Big)^{-1}\Big(\sum_{i=1}^{n} X_i Y_i\Big) = (X'X)^{-1}(X'Y)$
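A minimal numerical check of this closed form on simulated data (the design below is our own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)

# beta_hat = (X'X)^{-1} X'Y, computed by solving the normal equations (X'X) b = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # should be close to (1.0, 2.0, -0.5)
```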
The exact distribution of the OLS estimator under the normality assumption is:

  $\hat{\beta} \mid X \sim N\big(\beta,\ \sigma^2 (X'X)^{-1}\big)$
• Without normality of the $\varepsilon_i$, it is difficult to derive the exact distribution of $\hat{\beta}$. However, we can establish its asymptotic distribution:

  $\sqrt{n}\,\big(\hat{\beta} - \beta\big) \xrightarrow{d} N\big(0,\ \sigma^2\, \big(E[X_i X_i']\big)^{-1}\big)$
• We do not know $\sigma^2$, but we can consistently estimate it as:

  $\hat{\sigma}^2 = \frac{1}{n - K}\sum_{i=1}^{n} \big(Y_i - X_i'\hat{\beta}\big)^2$
• In practice, whether we have exact normality of the error terms or not, we will use the following approximate distribution for $\hat{\beta}$:

  $\hat{\beta} \approx N(\beta, \hat{V})$

  where $V = \sigma^2\,\big(E[X_i X_i']\big)^{-1}$ is the asymptotic variance of $\sqrt{n}(\hat{\beta} - \beta)$, and the variance of $\hat{\beta}$ itself is estimated by

  $\hat{V} = \hat{\sigma}^2 \Big(\sum_{i=1}^{n} X_i X_i'\Big)^{-1} = \hat{\sigma}^2 (X'X)^{-1}$
• If we are interested in a specific coefficient:

  $\hat{\beta}_k \approx N\big(\beta_k, \hat{V}_{kk}\big)$

  where $\hat{V}_{ij}$ is the (i, j) element of the matrix $\hat{V}$.
• The 95% confidence interval for $\beta_k$ would be:

  $\Big[\hat{\beta}_k - 1.96\sqrt{\hat{V}_{kk}},\ \ \hat{\beta}_k + 1.96\sqrt{\hat{V}_{kk}}\Big]$
• To test the hypothesis that $\beta_k = \alpha$:

  $t = \frac{\hat{\beta}_k - \alpha}{\sqrt{\hat{V}_{kk}}} \sim N(0, 1)$
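The following sketch puts these inference formulas together on simulated data ($\hat{\sigma}^2$, $\hat{V}$, the 95% interval, and the t statistic for a single coefficient); the design and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat

sigma2_hat = resid @ resid / (n - K)            # sigma_hat^2 = SSR / (n - K)
V_hat = sigma2_hat * np.linalg.inv(X.T @ X)     # V_hat = sigma_hat^2 * (X'X)^{-1}

k = 1                                           # coefficient of interest (0-based index)
se_k = np.sqrt(V_hat[k, k])
ci_95 = (beta_hat[k] - 1.96 * se_k, beta_hat[k] + 1.96 * se_k)
t_stat = (beta_hat[k] - 0.0) / se_k             # test H0: beta_k = 0
print(beta_hat[k], ci_95, t_stat)
```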
2 Robust Variances:
If we do not have the homoskedasticity assumption, then:

  $\sqrt{n}\,\big(\hat{\beta} - \beta\big) \xrightarrow{d} N\Big(0,\ \big(E[X_i X_i']\big)^{-1} E\big[\varepsilon_i^2 X_i X_i'\big]\, \big(E[X_i X_i']\big)^{-1}\Big)$

We can estimate the heteroskedasticity-consistent variance (White's estimator) as:

  $\hat{V} = \Big(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\Big)^{-1}\Big(\frac{1}{N}\sum_{i=1}^{N} \hat{\varepsilon}_i^2\, X_i X_i'\Big)\Big(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\Big)^{-1}$
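A sketch of White's sandwich formula in code; the heteroskedastic error design in the simulation is an assumption made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
eps = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))    # error variance depends on X: heteroskedastic
Y = X @ np.array([1.0, 2.0, -0.5]) + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e_hat = Y - X @ beta_hat

# Sandwich: A^{-1} B A^{-1}, with A = (1/N) sum X_i X_i' and B = (1/N) sum e_i^2 X_i X_i'
A_inv = np.linalg.inv(X.T @ X / n)
B = (X * e_hat[:, None] ** 2).T @ X / n
V_robust = A_inv @ B @ A_inv                # asymptotic variance of sqrt(n)(beta_hat - beta)

robust_se = np.sqrt(np.diag(V_robust) / n)  # standard errors for beta_hat itself
print(robust_se)
```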
II MAXIMUM LIKELIHOOD ESTIMATION:
1 Introduction:
• Linear regression model:

  $Y_i = X_i'\beta + \varepsilon_i$, with $\varepsilon_i \mid X_i \sim N(0, \sigma^2)$

  The OLS estimator is:

  $\hat{\beta} = \arg\min_{\beta} \frac{1}{n}\sum_{i=1}^{n} \big(Y_i - X_i'\beta\big)^2 = \Big(\sum_{i=1}^{n} X_i X_i'\Big)^{-1}\sum_{i=1}^{n} X_i Y_i$
• Maximum likelihood estimator:

  $\big(\hat{\beta}_{MLE}, \hat{\sigma}^2_{MLE}\big) = \arg\max_{\beta, \sigma^2} L(\beta, \sigma^2)$

  where:

  $L(\beta, \sigma^2) = \sum_{i=1}^{n}\Big[-\frac{1}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\big(Y_i - X_i'\beta\big)^2\Big] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\big(Y_i - X_i'\beta\big)^2$
Note: if $X \sim N(\mu, \sigma^2)$, the density function of X is:

  $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big(-\frac{(x - \mu)^2}{2\sigma^2}\Big)$
• This leads to the same estimator for $\beta$ as in OLS, and the MLE approach is a systematic way to deal with complex nonlinear models. The corresponding variance estimator is:

  $\hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n}\big(Y_i - X_i'\hat{\beta}\big)^2$
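A sketch that maximizes this Gaussian log-likelihood numerically (using scipy.optimize.minimize, which is our choice of optimizer, not something prescribed by the notes) and checks that the resulting $\beta$ coincides with OLS:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)

def neg_loglik(params):
    """Negative Gaussian log-likelihood; params = (beta_1, ..., beta_K, log sigma^2)."""
    beta, log_s2 = params[:-1], params[-1]
    s2 = np.exp(log_s2)                       # reparameterize so sigma^2 > 0
    resid = Y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * resid @ resid / s2

res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1), method="BFGS")
beta_mle, sigma2_mle = res.x[:-1], np.exp(res.x[-1])

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_mle, beta_ols, atol=1e-4))   # beta_MLE matches OLS
print(sigma2_mle)                                   # approx (1/n) * sum of squared residuals
```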
2 Likelihood function:
• Suppose we have independent and identically distributed random variables $Z_1, \dots, Z_n$ with common density $f(Z_i, \theta)$. The likelihood function given a sample $Z_1, Z_2, \dots, Z_n$ is:

  $L(\theta) = \prod_{i=1}^{n} f(Z_i, \theta)$

• The log-likelihood function:

  $\mathcal{L}(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(Z_i, \theta)$
• Building a likelihood function is the first step toward the job search theory model.
• An example of a maximum likelihood function:
  - An unemployed individual is assumed to receive job offers arriving at rate λ, such that the expected number of job offers arriving in a short interval of length dt is λdt.
  - Each offer consists of some wage rate w, drawn independently of previous wages, from a continuous distribution function $F_w(w)$.
  - If the offer is better than the reservation wage $\bar{w}$, that is, with probability $1 - F_w(\bar{w})$, the offer is accepted.
  - The reservation wage is set to maximize utility.
  - Suppose that the arrival rate is constant over time; then the optimal reservation wage is also constant over time.
  - The probability of receiving an acceptable offer in a short interval dt is θdt, with $\theta = \lambda\big(1 - F_w(\bar{w})\big)$.
  - The constant acceptance rate θ implies that the distribution of the unemployment duration is exponential with mean $1/\theta$ and density function:

    $f(y) = \theta e^{-\theta y}$

    where y is the unemployment duration (a random variable), with mean $1/\theta$ and variance $1/\theta^2$.

    $S(y) = 1 - F(y) = e^{-\theta y}$: survivor function.

    $h(y) = \lim_{dy \to 0} \dfrac{\Pr(y \le Y < y + dy \mid Y \ge y)}{dy} = \dfrac{f(y)}{S(y)} = \theta$: hazard function

    (the rate at which a job is offered and accepted).
Likelihood function:

a) If we observe the exact unemployment duration $y_i$:

  $L(\theta) = \prod_{i=1}^{n} f(y_i, \theta) = \prod_{i=1}^{n} h(y_i, \theta)\, S(y_i, \theta)$

b) We observe a number of people all becoming unemployed at the same point in time, but we only observe whether they exited unemployment before a fixed point in time, say c:

  $L(\theta) = \prod_{i=1}^{n} F(c, \theta)^{d_i}\big(1 - F(c, \theta)\big)^{1 - d_i} = \prod_{i=1}^{n} \big(1 - S(c, \theta)\big)^{d_i} S(c, \theta)^{1 - d_i}$

  where $d_i = 1$ denotes that individual i left unemployment before c and $d_i = 0$ denotes that this individual was still unemployed at time c.

c) If we observe the exact exit (failure) time when it occurs before c, but only an indicator when exit occurs after c:

  $L(\theta) = \prod_{i=1}^{n} f(y_i, \theta)^{d_i}\, S(c, \theta)^{1 - d_i}$

d) Denote by $c_i$ the specific censoring time of individual i. Letting $t_i$ denote the minimum of the exit time $y_i$ and the censoring time $c_i$, i.e. $t_i = \min(y_i, c_i)$:

  $L(\theta) = \prod_{i=1}^{n} f(y_i, \theta)^{d_i}\, S(c_i, \theta)^{1 - d_i} = \prod_{i=1}^{n} f(t_i, \theta)^{d_i}\, S(t_i, \theta)^{1 - d_i} = \prod_{i=1}^{n} h(t_i, \theta)^{d_i}\, S(t_i, \theta)$
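A sketch of case d) for the exponential duration model: simulate exit times with individual censoring, form the censored log-likelihood $\sum_i \big[d_i \ln\theta - \theta t_i\big]$ (the log of the last product above), and compare its maximizer with the closed-form solution $\hat{\theta} = \sum_i d_i / \sum_i t_i$ obtained by setting the score to zero. All numbers and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta_true = 1000, 0.4

y = rng.exponential(scale=1.0 / theta_true, size=n)   # latent exit times, mean 1/theta
c = rng.uniform(1.0, 6.0, size=n)                     # individual censoring times c_i
t = np.minimum(y, c)                                  # t_i = min(y_i, c_i)
d = (y <= c).astype(float)                            # d_i = 1 if the exit is observed

# Censored exponential log-likelihood: sum_i [d_i*ln f(t_i) + (1-d_i)*ln S(t_i)]
#                                    = sum_i [d_i*ln(theta) - theta*t_i]
def loglik(theta):
    return np.sum(d * np.log(theta) - theta * t)

theta_closed_form = d.sum() / t.sum()                 # exits divided by total exposure time

# Crude numerical maximization over a grid, just to confirm the closed form
grid = np.linspace(0.05, 2.0, 2000)
theta_grid = grid[np.argmax([loglik(th) for th in grid])]
print(theta_closed_form, theta_grid)                  # both close to theta_true = 0.4
```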
3 Properties of MLE:

  $\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} \sum_{i=1}^{n} \ln f(Z_i, \theta)$
a Consistency:
  For all $\varepsilon > 0$:

  $\lim_{n \to \infty} \Pr\big(\big|\hat{\theta}_{MLE} - \theta\big| > \varepsilon\big) = 0$
b Asymptotic normality:

  $\sqrt{n}\,\big(\hat{\theta}_{MLE} - \theta\big) \xrightarrow{d} N\Big(0,\ \Big[-E\Big(\frac{\partial^2 \ln f(Z_i, \theta)}{\partial \theta\, \partial \theta'}\Big)\Big]^{-1}\Big)$
4 Computation of the maximum likelihood estimator:
Newton-Raphson method:
• Approximate the objective function $Q(\theta) = -L(\theta)$ around some starting value $\theta_0$ by a quadratic function and find the exact minimum of that quadratic approximation. Call this minimizer $\theta_1$.
• Redo the quadratic approximation around $\theta_1$ and iterate until the updates converge.
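A minimal Newton-Raphson sketch on the exponential duration likelihood from the previous section (gradient and Hessian of $Q(\theta) = -L(\theta)$ written out by hand; the starting value and tolerance are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.5, size=1000)   # simulated durations, true theta = 1/2.5 = 0.4

# Exponential model: L(theta) = n*ln(theta) - theta*sum(y), so Q(theta) = -L(theta)
n, s = len(y), y.sum()
grad = lambda th: -n / th + s               # Q'(theta)
hess = lambda th: n / th**2                 # Q''(theta) > 0: the local quadratic has a minimum

theta = 0.5                                 # starting value theta_0 (needs to be reasonable)
for _ in range(100):
    step = grad(theta) / hess(theta)        # shift to the minimizer of the local quadratic
    theta -= step
    if abs(step) < 1e-10:                   # stop once the update is negligible
        break

print(theta, n / s)                         # Newton-Raphson result vs. closed-form MLE n/sum(y)
```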