Chapter 1: CLASSICAL LINEAR REGRESSION

I. MODEL:

Population model: Y = f(X1, X2, …, Xk) + ε

- f may be of any kind (linear, non-linear, parametric, non-parametric, …)

Sample information: {(Yi, X2i, X3i, …, Xki)}, i = 1, …, n

Yi = β1 + β2X2i + β3X3i + … + βkXki + εi

βk = ∂Yi/∂Xki, i.e., the effect of Xk on Y with other factors held constant.

EX: Ci = β1 + β2Yi + εi

where Ci is the dependent variable, Yi is the explanatory variable (or regressor), and εi is the disturbance (error).

MPC = ∂Ci/∂Yi = β2  ⇒  require 0 ≤ β2 ≤ 1

Denote:

Y =
| Y1 |
| Y2 |
| ⋮  |
| Yn |   (n×1)

X =
| 1   X12   X13   …   X1k |
| 1   X22   X23   …   X2k |
| ⋮                        |
| 1   Xn2   Xn3   …   Xnk |   (n×k)

β =
| β1 |
| β2 |
| ⋮  |
| βk |   (k×1)

and ε =
| ε1 |
| ε2 |
| ⋮  |
| εn |   (n×1)

⇒ We have:

Y = Xβ + ε
(n×1) = (n×k)(k×1) + (n×1)
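As a concrete illustration of stacking a sample into these arrays, here is a minimal numpy sketch; the data values and the choice of two regressors are hypothetical, not taken from the text.

```python
import numpy as np

# Hypothetical sample of n = 5 observations on Y and two regressors X2, X3.
Y = np.array([3.0, 4.5, 5.1, 6.0, 7.2])            # dependent variable
X2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X3 = np.array([2.0, 1.5, 3.5, 3.0, 4.0])

# First column of ones carries the intercept beta_1, so X is (n x k) with k = 3.
X = np.column_stack([np.ones_like(X2), X2, X3])

print(Y.shape, X.shape)   # (5,) (5, 3): Y = X beta + eps is (n x 1) = (n x k)(k x 1) + (n x 1)
```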

II. ASSUMPTIONS OF THE CLASSICAL REGRESSION MODEL:

Models are simplifications of reality. We'll make a set of simplifying assumptions for the model. The assumptions are the following:

Assumption 1: Linearity. The model is linear in the parameters:

Y = Xβ + ε

Assumption 2: Full rank

+ The Xij's are not random variables, or they are random variables that are uncorrelated with ε.
+ There are no exact linear dependencies among the columns of X: X is n×k and rank(X) = k.

This assumption is necessary for estimation of the parameters (only an EXACT linear dependence is ruled out).

Assumption 3: Exogeneity of the independent variables

E[εi | Xj1, Xj2, Xj3, …, Xjk] = 0

This means that the independent variables carry no useful information for predicting εi. It follows that E(εi) = 0, i = 1, …, n.

Assumption 4:

Var(εi) = σ²,  i = 1, …, n
Cov(εi, εj) = 0,  i ≠ j

For any random vector Z = [z1 z2 … zm]′, we can express its variance-covariance matrix as:

VarCov(Z) = E[(Z − E(Z))(Z − E(Z))′]

= E of the m×m matrix whose (i, j) element is (zi − E(zi))(zj − E(zj))

The jth diagonal element is Var(zj) = σjj = σj²; the ijth element (i ≠ j) is Cov(zi, zj) = σij. So:

VarCov(Z) =
| σ1²   σ12   …   σ1m |
| σ21   σ2²   …   σ2m |
| ⋮                    |
| σm1   σm2   …   σm² |

So we have the "covariance matrix" of the vector ε:

VarCov(ε) = E[(ε − E(ε))(ε − E(ε))′] = E(εε′)   (since E(ε) = 0)

Then assumption (4) is equivalent to:

E(εε′) =
| σ²   0    …   0  |
| 0    σ²   …   0  |
| ⋮                 |
| 0    0    …   σ² |
= σ²I

⇔  Var(εi) = σ²,  i = 1, …, n   (homoscedasticity)
    Cov(εi, εj) = 0,  i ≠ j     (no autocorrelation)

Assumption 5: Data generating process for the regressors (X is non-stochastic)

+ The Xij's are not random variables.

Note: This assumption is different from assumption 3.

Assumption 6: Normality of errors

ε ~ N[0, σ²I]

+ Normality is not necessary to obtain many of the results in the regression model.
+ It is possible to relax this assumption and retain most of the statistical results.

SUMMARY: The classical linear regression model is:

Y = Xβ + ε
ε ~ N[0, σ²I]
Rank(X) = k
X is non-stochastic
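As a quick illustration, here is a small simulation sketch of data satisfying these assumptions; the sample size, parameter values, and regressor design are arbitrary choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
beta = np.array([1.0, 0.5, -2.0])      # true parameters (illustrative values)
sigma = 1.5

# Non-stochastic, full-rank X with an intercept column (linearity, full rank, non-stochastic X).
x = np.linspace(0.0, 10.0, n)
X = np.column_stack([np.ones(n), x, x**2 / 10.0])

# Disturbances: mean zero, homoskedastic, uncorrelated, normal.
eps = rng.normal(loc=0.0, scale=sigma, size=n)

Y = X @ beta + eps                     # Y = X beta + eps

print(np.linalg.matrix_rank(X))        # 3 = k, i.e. full column rank
```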

III. LEAST SQUARES ESTIMATION:

(Ordinary Least Squares Estimation - OLS)

Our first task is to estimate the parameters of the model:

Y = Xβ + ε  with  ε ~ N[0, σ²I]

There are many possible procedures for doing this. The choice should be based on the "sampling properties" of the estimates.

Let's consider one possible estimation strategy: Least Squares.

Denote β̂ as the estimator of β, and e = [e1 e2 … en]′ as the vector of estimated residuals (of ε); that is, ei is the estimate of εi.

For the ith observation:

Yi = Xi′β + εi   (population, unobserved)
   = Xi′β̂ + ei   (sample, observed)

The sum of squared residuals is:

e′e = Σ_{i=1}^{n} ei² = (Y − Xβ̂)′(Y − Xβ̂) = Y′Y − β̂′X′Y − Y′Xβ̂ + β̂′X′Xβ̂
    = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂     (since β̂′X′Y = Y′Xβ̂)

The least squares estimator solves:

Min over β̂:  Y′Y − 2β̂′X′Y + β̂′X′Xβ̂

The necessary condition for a minimum:

∂[e′e]/∂β̂ = ∂[Y′Y − 2β̂′X′Y + β̂′X′Xβ̂]/∂β̂ = 0

Here β̂′ = [β̂1 β̂2 … β̂k] and

X′Y =
| ΣYi     |
| ΣX2iYi  |
| ⋮       |
| ΣXkiYi  |

so  β̂′X′Y = β̂1 ΣYi + β̂2 ΣX2iYi + … + β̂k ΣXkiYi.

Take the derivative w.r.t. each β̂j:

∂[β̂′X′Y]/∂β̂ = X′Y =
| ΣYi     |
| ΣX2iYi  |
| ⋮       |
| ΣXkiYi  |

Also,

X′X =
| n      ΣX2i       ΣX3i       …   ΣXki      |
| ΣX2i   ΣX2i²      ΣX2iX3i    …   ΣX2iXki   |
| ⋮                                           |
| ΣXki   ΣXkiX2i    ΣXkiX3i    …   ΣXki²     |

a symmetric matrix of sums of squares and cross products.

β̂′X′Xβ̂ is a quadratic form:

β̂′(X′X)β̂ = Σ_{i=1}^{k} Σ_{j=1}^{k} (X′X)ij β̂i β̂j

Take the derivative w.r.t. each β̂i:

∂[β̂′(X′X)β̂]/∂β̂i = Σ_j (X′X)ij β̂j + Σ_j (X′X)ji β̂j = 2 Σ_j (X′X)ij β̂j,  i = 1, …, k   (X′X is symmetric)

→ ∂[β̂′(X′X)β̂]/∂β̂ = 2(X′X)β̂

Therefore:

∂[e′e]/∂β̂ = −2X′Y + 2(X′X)β̂ = 0   (called the "Normal equations")

→ (X′X)β̂ = X′Y  →  β̂ = (X′X)⁻¹X′Y
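A minimal numpy sketch of this formula on hypothetical simulated data; forming the explicit inverse mirrors the algebra above, but numerically it is preferable to solve the normal equations or use a least-squares routine, as the last two lines show.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])   # (n x k), k = 3
true_beta = np.array([1.0, 2.0, -0.5])                                      # hypothetical values
Y = X @ true_beta + rng.normal(size=n)

# Normal equations: (X'X) beta_hat = X'Y  ->  beta_hat = (X'X)^(-1) X'Y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Numerically preferable equivalents (avoid forming the explicit inverse):
beta_solve = np.linalg.solve(X.T @ X, X.T @ Y)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(beta_hat, beta_solve), np.allclose(beta_hat, beta_lstsq))  # True True
```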

IV. ALGEBRAIC PROPERTIES OF LEAST SQUARES:

1. "Orthogonality condition":

⇔ −2X′Y + 2(X′X)β̂ = 0   (Normal equations)
⇔ X′(Y − Xβ̂) = 0
⇔ X′e = 0

Written out, X′e = 0 says that every column of X is orthogonal to the residual vector:

Σ_{i=1}^{n} Xij ei = 0,  j = 1, …, k

and the first column of ones gives Σ_{i=1}^{n} ei = 0.

2. Deviation-from-mean model (the fitted regression passes through (X̄, Ȳ)):

Yi = β̂1 + β̂2X2i + β̂3X3i + … + β̂kXki + ei,  i = 1, …, n

Sum over all n observations and divide by n (using Σei = 0):

Ȳ = β̂1 + β̂2X̄2 + β̂3X̄3 + … + β̂kX̄k

Then:

Yi − Ȳ = β̂2(X2i − X̄2) + β̂3(X3i − X̄3) + … + β̂k(Xki − X̄k) + ei,  i = 1, …, n

In the deviation-form model, the intercept is set aside and can be recovered afterwards.

3. The mean of the fitted values Ŷi is equal to the mean of the actual values Yi in the sample:

Yi = Xi′β̂ + ei = Ŷi + ei

(1/n) Σ_{i=1}^{n} Yi = (1/n) Σ_{i=1}^{n} Ŷi + (1/n) Σ_{i=1}^{n} ei,  and  Σ_{i=1}^{n} ei = 0  ⇒  Ȳ equals the mean of Ŷ

Note that these results use the fact that the regression model includes an intercept term.
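A short numpy check of these algebraic properties on hypothetical simulated data (the design and coefficients are arbitrary, chosen only so the model includes an intercept).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # includes the intercept column
Y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e = Y - Y_hat

print(np.allclose(X.T @ e, 0.0))                         # orthogonality: X'e = 0 (first row gives sum(e) = 0)
print(np.isclose(Y_hat.mean(), Y.mean()))                # mean of fitted values equals mean of actual Y
print(np.isclose(X.mean(axis=0) @ beta_hat, Y.mean()))   # fitted regression passes through the point of means
```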

V. PARTITIONED REGRESSION: FRISCH-WAUGH THEOREM:

1. Note: the fundamental idempotent matrix M:

e = Y − Xβ̂ = Y − X(X′X)⁻¹X′Y = [I − X(X′X)⁻¹X′] Y

Substituting Y = Xβ + ε:

e = Xβ + ε − X(X′X)⁻¹X′(Xβ + ε) = Xβ + ε − Xβ − X(X′X)⁻¹X′ε = [I − X(X′X)⁻¹X′] ε

Define the n×n matrix M = I − X(X′X)⁻¹X′.

So the residual vector e has two alternative representations:

e = MY   and   e = Mε

M is the "residual maker" in the regression of Y on X.

M is symmetric and idempotent, that is: M′ = M and MM = M.

M′ = [I − X(X′X)⁻¹X′]′ = I − X(X′X)⁻¹X′ = M     (note: (AB)′ = B′A′)

MM = [I − X(X′X)⁻¹X′][I − X(X′X)⁻¹X′] = I − 2X(X′X)⁻¹X′ + X(X′X)⁻¹X′X(X′X)⁻¹X′
   = I − X(X′X)⁻¹X′ = M

Also we have:

MX = [I − X(X′X)⁻¹X′]X = X − X(X′X)⁻¹X′X = X − X = 0   (n×k)
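A small numpy sketch verifying these properties of M numerically; the data are hypothetical random draws used only to build some X and Y.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # some hypothetical (n x k) regressor matrix
Y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # residual maker M = I - X(X'X)^(-1)X'
e = M @ Y                                          # residuals from regressing Y on X

print(np.allclose(M, M.T))                                         # M is symmetric
print(np.allclose(M @ M, M))                                       # M is idempotent
print(np.allclose(M @ X, 0.0))                                     # MX = 0
print(np.allclose(e, Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)))   # e = MY = Y - X beta_hat
```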

2. Partitioned Regression:

Suppose that our matrix of regressors is partitioned into two blocks:

X = [X1  X2],  where X1 is n×k1, X2 is n×k2, and k = k1 + k2

Y = X1β1 + X2β2 + ε,  with β̂ = [β̂1′  β̂2′]′ (β̂1 is k1×1, β̂2 is k2×1)

The normal equations:

(X′X)β̂ = X′Y  ⇔  [X1  X2]′[X1  X2] β̂ = [X1  X2]′ Y

| X1′X1   X1′X2 | | β̂1 |   | X1′Y |
| X2′X1   X2′X2 | | β̂2 | = | X2′Y |

⇔  (X1′X1)β̂1 + (X1′X2)β̂2 = X1′Y   (a)
    (X2′X1)β̂1 + (X2′X2)β̂2 = X2′Y   (b)

From (a) → (X1′X1)β̂1 = X1′(−X2β̂2 + Y)

or β̂1 = (X1′X1)⁻¹X1′(−X2β̂2 + Y)   (c)

Put (c) into (b):

(X2′X1)(X1′X1)⁻¹X1′(−X2β̂2 + Y) + (X2′X2)β̂2 = X2′Y
⇔ −X2′X1(X1′X1)⁻¹X1′X2β̂2 + (X2′X2)β̂2 = X2′Y − X2′X1(X1′X1)⁻¹X1′Y
⇔ X2′[I − X1(X1′X1)⁻¹X1′]X2 β̂2 = X2′[I − X1(X1′X1)⁻¹X1′]Y

Define the n×n matrix M1 = I − X1(X1′X1)⁻¹X1′. We have:

(X2′M1X2)β̂2 = X2′M1Y  →  β̂2 = (X2′M1X2)⁻¹X2′M1Y

Because M1 is symmetric and idempotent (M1′ = M1, M1M1 = M1):

β̂2 = (X2′M1′M1X2)⁻¹X2′M1′M1Y = (X2*′X2*)⁻¹X2*′Y*

where X2* = M1X2 and Y* = M1Y.

Interpretation:

• Y* = M1Y (n×1): regress Y (n×1) on X1 (n×k1) and keep the residuals,

  e1 = Y − Ŷ = Y − X1[(X1′X1)⁻¹X1′Y] = M1Y   (n×1)

• X2* = M1X2 (n×k2): regress (each column of) X2 (n×k2) on X1 (n×k1) and keep the matrix of residuals,

  E = X2 − X̂2 = X2 − X1(X1′X1)⁻¹X1′X2 = [I − X1(X1′X1)⁻¹X1′]X2 = M1X2 = X2*   (n×k2)

Now regress e1 (n×1) on E:

e1 = E β̃2 + u

Then we will have:

β̃2 = β̂2

We get the same result as if we had simply run the regression on the whole model. This result is called the "Frisch-Waugh" theorem.
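A numerical check of the theorem in numpy on hypothetical simulated data: the β̂2 from the full regression equals the coefficient from regressing the partialled-out Y on the partialled-out X2.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k1, k2 = 200, 2, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])        # first block (includes the intercept)
X2 = rng.normal(size=(n, k2))                                 # second block
Y = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Full regression: beta_hat_2 is the last k2 elements.
X = np.column_stack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ Y)
beta2_full = beta_full[k1:]

# Frisch-Waugh: partial X1 out of both Y and X2, then regress residuals on residuals.
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
Y_star = M1 @ Y                 # residuals of Y on X1
X2_star = M1 @ X2               # residuals of each column of X2 on X1
beta2_fw = np.linalg.solve(X2_star.T @ X2_star, X2_star.T @ Y_star)

print(np.allclose(beta2_full, beta2_fw))   # True: same beta_hat_2
```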

EX: Y = X1β1 + X2β2 + ε, with Y = wages and X1 = ability (test scores).

Y* = residuals from the regression of Y on X1 (= the variation in wages when controlling for ability)
X* = residuals from the regression of X2 on X1

Then regress Y* on X* to get β̂2:  Y* = X*β2 + u

Example: De-trending, de-seasonalizing data:

Y = X1β1 + X2β2 + ε,   X1 = t = [1  2  …  n]′ (a time trend)

Either include "t" in the model, or "de-trend" the X2 and Y variables by regressing them on "t" and taking the residuals; by the Frisch-Waugh theorem both give the same β̂2.

Note: Including a trend in the regression is an effective way of de-trending the data.

VI. GOODNESS OF FIT:

One way of measuring the "quality of the fitted regression line" is to measure the extent to which the sample variation in the Y variable is explained by the model. The sample variation of Y is measured by Σ_{i=1}^{n}(Yi − Ȳ)².

Y = Xβ̂ + e = Ŷ + e,  where  Ŷ = Xβ̂ = X(X′X)⁻¹X′Y

Now consider the following matrix:

M0 = I − (1/n)·1·1′   (n×n)

where 1 = [1 1 … 1]′ is the n×1 vector of ones.

Note that:

M0 Y = [I − (1/n)·1·1′] Y = Y − 1·Ȳ =
| Y1 − Ȳ |
| Y2 − Ȳ |
| ⋮      |
| Yn − Ȳ |

We have:

• M0·1 = 0
• (M0Y)′(M0Y) = Y′M0′M0Y = Y′M0Y = Σ_{i=1}^{n}(Yi − Ȳ)²   (M0 is symmetric and idempotent)

M0Y = M0Xβ̂ + M0e = M0Ŷ + M0e

Recall that:

Σ_{i=1}^{n} ei = 0  →  M0e = e,   and   e′M0X = e′X = 0′   (since X′e = 0)

Y′M0Y = (Xβ̂ + e)′(M0Xβ̂ + M0e)
      = (β̂′X′ + e′)(M0Xβ̂ + M0e)
      = β̂′X′M0Xβ̂ + β̂′X′M0e + e′M0Xβ̂ + e′M0e
      = β̂′X′M0Xβ̂ + e′M0e

So:

Σ_{i=1}^{n}(Yi − Ȳ)²  =  Σ_{i=1}^{n}(Ŷi − Ȳ)²  +  Σ_{i=1}^{n} ei²
       SST                     SSR                   SSE

(Here Ŷ = Xβ̂, so β̂′X′M0Xβ̂ = Ŷ′M0Ŷ = Σ_{i=1}^{n}(Ŷi − Ȳ)², using the fact that the mean of Ŷ equals Ȳ; and e′M0e = e′e.)

SST: Total sum of squares
SSR: Regression sum of squares
SSE: Error sum of squares

Coefficient of Determination:

R² = SSR/SST = 1 − SSE/SST   (only if an intercept is included in the model)

Since R² = 1 − SSE/SST ≤ 1 and R² = SSR/SST ≥ 0:

0 ≤ R² ≤ 1
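A brief numpy check of the decomposition and of the two equivalent R² expressions, on hypothetical simulated data with an intercept included.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # model WITH an intercept
Y = X @ np.array([2.0, 1.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e = Y - Y_hat

SST = np.sum((Y - Y.mean()) ** 2)
SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum(e ** 2)

print(np.isclose(SST, SSR + SSE))    # decomposition holds when the intercept is included
print(SSR / SST, 1 - SSE / SST)      # the two expressions for R^2 agree, both in [0, 1]
```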

What happens if we add any regressor(s) to the model?

(1)  Y = X1β1 + ε
(2)  Y = X1β1 + X2β2 + u

(A) Applying OLS to (2): min over (β̂1, β̂2) of û′û
(B) Applying OLS to (1): min over β̂1 of e′e

The minimized sum of squares in (A) must be ≤ that in (B), so û′û ≤ e′e.

→ Adding any regressor(s) to the model cannot increase (and typically decreases) the sum of squared residuals, so R² cannot fall. By itself, then, R² is not a very interesting measure of the quality of a regression.

For this reason, we often use the "adjusted" R², adjusted for degrees of freedom:

R̄² = 1 − [e′e/(n − k)] / [Y′M0Y/(n − 1)]

Note: e′e = Y′MY with rank(M) = n − k (degrees of freedom = n − k), and Y′M0Y = Σ_{i=1}^{n}(Yi − Ȳ)² with degrees of freedom = n − 1.

R̄² may rise or fall when variables are added. It may even be negative.
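The sketch below illustrates this behaviour on hypothetical simulated data: five irrelevant regressors are appended, R² cannot fall, while the adjusted R² is penalized for the lost degrees of freedom. The helper function and all data values are assumptions for illustration.

```python
import numpy as np

def r2_and_adj_r2(X, Y):
    """Return (R^2, adjusted R^2) for an OLS fit of Y on X (X must include an intercept column)."""
    n, k = X.shape
    e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
    sse = e @ e
    sst = np.sum((Y - Y.mean()) ** 2)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
    return r2, adj_r2

rng = np.random.default_rng(6)
n = 60
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X1 @ np.array([1.0, 0.8]) + rng.normal(size=n)

junk = rng.normal(size=(n, 5))            # irrelevant regressors, unrelated to Y
X2 = np.column_stack([X1, junk])

print(r2_and_adj_r2(X1, Y))
print(r2_and_adj_r2(X2, Y))   # R^2 cannot fall; adjusted R^2 will often fall when the additions are irrelevant
```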

Note that if the model does not include an intercept, then the equation SST = SSR + SSE does not hold, and we no longer have 0 ≤ R² ≤ 1. We must also be careful in comparing R² across different models. For example:

(1)  Ci = β1 + β2Yi + u    (levels)
(2)  log Ci = 0.2 + 0.7 log Yi + u,   R² = 0.7

In (1), R² relates to the sample variation of the variable C. In (2), R² relates to the sample variation of the variable log(C).

Reading at home: Greene, chapters 3 & 4.
