Class Notes in Statistics and Econometrics, Part 33


The seemingly unrelated regressions model consists of m regressions y_i = X_iβ_i + ε_i, each with its own explanatory variables and t_i observations. They may be time series covering different but partly overlapping time periods. This is why they are called "seemingly unrelated" regressions: the only connection between the regressions is that, for those observations which overlap in time, the disturbances for different regressions are contemporaneously correlated, and these correlations are assumed to be constant over time. In tiles, this model is the stacked system

(65.1.1) \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 & & \\ & \ddots & \\ & & X_m \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_m \end{pmatrix}

The covariance matrix of the disturbance term in (65.1.1) has the following "striped" form:

(65.1.2) \mathcal{V}[\varepsilon] = \begin{pmatrix} \sigma_{11} I_{11} & \cdots & \sigma_{1m} I_{1m} \\ \vdots & & \vdots \\ \sigma_{m1} I_{m1} & \cdots & \sigma_{mm} I_{mm} \end{pmatrix}

Here I_{ij} is the t_i × t_j matrix which has zeros everywhere except at the intersections of rows and columns denoting the same time period.

In the special case that all time periods are identical, i.e., all t_i = t, one can define the matrices Y = (y_1 ⋯ y_m) and E = (ε_1 ⋯ ε_m) and write the covariance matrix as a Kronecker product: \mathcal{V}[vec E] = Σ ⊗ I, since all I_{ij} in (65.1.2) are t × t identity matrices. If t = 5 and m = 3, the covariance matrix would be

\Sigma \otimes I_5 = \begin{pmatrix} \sigma_{11} I_5 & \sigma_{12} I_5 & \sigma_{13} I_5 \\ \sigma_{21} I_5 & \sigma_{22} I_5 & \sigma_{23} I_5 \\ \sigma_{31} I_5 & \sigma_{32} I_5 & \sigma_{33} I_5 \end{pmatrix}.

The model itself can then be written compactly as

(65.1.4) \operatorname{vec}(Y) = Z \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix} + \operatorname{vec}(E),

where Z contains the regressors arranged in a block-diagonal “supermatrix.”

If one knows Σ up to a multiplicative factor, and if all regressions cover the same time period, then one can apply (26.0.2) to (65.1.4) to get the following formula for the GLS estimator, which is at the same time the maximum likelihood estimator:

(65.1.6) \begin{pmatrix} \hat\beta_1 \\ \vdots \\ \hat\beta_m \end{pmatrix} = \begin{pmatrix} \sigma^{11} X_1^\top X_1 & \cdots & \sigma^{1m} X_1^\top X_m \\ \vdots & & \vdots \\ \sigma^{m1} X_m^\top X_1 & \cdots & \sigma^{mm} X_m^\top X_m \end{pmatrix}^{-1} \begin{pmatrix} X_1^\top \sum_{i=1}^m \sigma^{1i} y_i \\ \vdots \\ X_m^\top \sum_{i=1}^m \sigma^{mi} y_i \end{pmatrix},

where σ^{ij} denotes the (i, j) element of Σ^{-1}.

In the seemingly unrelated regression model, OLS on each equation singly is therefore less efficient than an approach which estimates all the equations simultaneously. If the numbers of observations in the different regressions are unequal, then the formula for the GLSE is no longer so simple; it is given in [JHG+88, (11.2.59) on p. 464].
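To make (65.1.6) concrete, here is a minimal numerical sketch in Python, assuming equal time periods, a known Σ, and purely hypothetical data; it computes the GLS estimator from the stacked form (65.1.4):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
t, m = 50, 3                           # observations per equation, equations

# Hypothetical regressors and coefficients for each equation
X = [np.column_stack([np.ones(t), rng.normal(size=t)]) for _ in range(m)]
beta_true = [np.array([1.0, 2.0]), np.array([-1.0, 0.5]), np.array([0.0, 1.5])]

# Contemporaneously correlated disturbances: each row of E is N(o, Sigma)
Sigma = np.array([[1.0, 0.7, 0.5],
                  [0.7, 1.0, 0.6],
                  [0.5, 0.6, 1.0]])
E = rng.multivariate_normal(np.zeros(m), Sigma, size=t)
y = [X[i] @ beta_true[i] + E[:, i] for i in range(m)]

# Stacked form (65.1.4): Z block-diagonal, covariance Sigma kron I_t
Z = block_diag(*X)
y_stacked = np.concatenate(y)
W = np.kron(np.linalg.inv(Sigma), np.eye(t))   # (Sigma kron I)^(-1)

# GLS / ML estimator (65.1.6)
beta_gls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y_stacked)
print(beta_gls.reshape(m, -1))                 # one row per equation
```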

65.2 The Likelihood Function

We know therefore what to do in the hypothetical case that Σ is known. What if it is not known? We will derive here the maximum likelihood estimator. For the exponent of the likelihood function we need the following mathematical tool:

Problem 532. Show that \sum_{s=1}^{t} a_s^\top \Omega a_s = \operatorname{tr}(A^\top \Omega A), where A = \begin{pmatrix} a_1 & \cdots & a_t \end{pmatrix}.

Answer.

A^\top \Omega A = \begin{pmatrix} a_1^\top \\ \vdots \\ a_t^\top \end{pmatrix} \Omega \begin{pmatrix} a_1 & \cdots & a_t \end{pmatrix} = \begin{pmatrix} a_1^\top \Omega a_1 & a_1^\top \Omega a_2 & \cdots & a_1^\top \Omega a_t \\ a_2^\top \Omega a_1 & a_2^\top \Omega a_2 & \cdots & a_2^\top \Omega a_t \\ \vdots & \vdots & & \vdots \\ a_t^\top \Omega a_1 & a_t^\top \Omega a_2 & \cdots & a_t^\top \Omega a_t \end{pmatrix}

The diagonal elements are a_s^\top \Omega a_s, so the trace is their sum. □
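A quick numerical check of this identity; the dimensions here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 3, 4
A = rng.normal(size=(n, t))                   # columns a_1, ..., a_t
Omega = rng.normal(size=(n, n))
Omega = Omega @ Omega.T                       # symmetric Omega

lhs = sum(A[:, s] @ Omega @ A[:, s] for s in range(t))
rhs = np.trace(A.T @ Omega @ A)
assert np.isclose(lhs, rhs)
```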


To derive the likelihood function, define the matrix function H(B) as follows: H(B) is a t × m matrix, the ith column of which is X_iβ_i; i.e., H(B) as a column-partitioned matrix is H(B) = (X_1β_1 · · · X_mβ_m).

The above notation follows [DM93, pp. 315–318]. [Gre97, p. 683 top] writes this same H as the matrix product

H(B) = ZΠ,

where Z has all the different regressors in the different regressions as columns (it is Z = (X_1 · · · X_m) with duplicate columns deleted), and the ith column of Π has zeros for those regressors which are not in the ith equation, and elements of B for those regressors which are in the ith equation.

Using H, the model is simply, as in (65.0.18),

(65.2.3) Y = H(B) + E,  vec(E) ∼ N(o, Σ ⊗ I).

This is a matrix generalization of (56.0.21).

The likelihood function which we are going to derive now is valid not only for this particular H but for more general, possibly nonlinear H. Define η_s(B) to be the sth row of H, written as a column vector; i.e., as a row-partitioned matrix we have

H(B) = \begin{pmatrix} \eta_1^\top(B) \\ \vdots \\ \eta_t^\top(B) \end{pmatrix}.

Writing y_s^\top for the sth row of Y, the density of Y is

(65.2.5) f_Y(Y) = (2\pi)^{-mt/2} (\det \Sigma)^{-t/2} \exp\Bigl(-\tfrac{1}{2} \sum_{s=1}^{t} (y_s - \eta_s(B))^\top \Sigma^{-1} (y_s - \eta_s(B))\Bigr) = (2\pi)^{-mt/2} (\det \Sigma)^{-t/2} \exp\Bigl(-\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} (Y - H(B))^\top (Y - H(B))\Bigr).

Problem 533. Explain exactly the step in the derivation of (65.2.5) in which the trace enters.

Answer. Write the quadratic form in the exponent as follows:

(65.2.9) \sum_{s=1}^{t} (y_s - \eta_s(B))^\top \Sigma^{-1} (y_s - \eta_s(B)) = \operatorname{tr}\bigl((Y - H(B))\, \Sigma^{-1} (Y - H(B))^\top\bigr)

(65.2.10) = \operatorname{tr}\bigl(\Sigma^{-1} (Y - H(B))^\top (Y - H(B))\bigr)

The first equality is Problem 532 with Ω = Σ^{-1} and A = (Y − H(B))^\top; the second is the cyclic property of the trace. □

The log likelihood function ℓ(Y; B, Σ) is therefore

(65.2.11) \ell(Y; B, \Sigma) = -\frac{mt}{2}\log 2\pi - \frac{t}{2}\log\det\Sigma - \frac{1}{2}\operatorname{tr}\,\Sigma^{-1}(Y - H(B))^\top (Y - H(B)).

In order to concentrate out Σ it is simpler to take the partial derivatives with respect to Σ^{-1} than those with respect to Σ itself. Using the matrix differentiation rules (C.1.24) and (C.1.16), and noting that −(t/2) log det Σ = (t/2) log det Σ^{-1}, one gets

\frac{\partial \ell}{\partial \Sigma^{-1}} = \frac{t}{2}\Sigma - \frac{1}{2}(Y - H(B))^\top (Y - H(B)) = O,

which is solved by

(65.2.13) \hat\Sigma(B) = \frac{1}{t}(Y - H(B))^\top (Y - H(B)).

We know therefore what the maximum likelihood estimator of Σ is if B is known: it is the sample covariance matrix of the residuals. And we know what the maximum likelihood estimator of B is if Σ is known: it is given by equation (65.1.6). In such a situation, one good numerical method is to iterate: start with an initial estimate of Σ (perhaps from the OLS residuals), get from this an estimate of B, then use this to get a second estimate of Σ, etc., until it converges. This iterative scheme is called iterated Zellner or iterated SUR. See [Ruu00, p. 706]; the original article is [Zel62].
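A minimal sketch of this iteration, reusing the hypothetical setup (X, y, t, m) from the earlier code block; it alternates between the two conditional estimators until the coefficients stop changing:

```python
import numpy as np
from scipy.linalg import block_diag

def iterated_sur(X, y, t, m, tol=1e-10, max_iter=100):
    """Iterated Zellner / iterated SUR: alternate between estimating Sigma
    given B (sample covariance of residuals) and B given Sigma (GLS)."""
    Z = block_diag(*X)
    y_stacked = np.concatenate(y)
    # Initial estimate: single-equation OLS
    beta = np.concatenate([np.linalg.lstsq(X[i], y[i], rcond=None)[0]
                           for i in range(m)])
    for _ in range(max_iter):
        resid = (y_stacked - Z @ beta).reshape(m, t).T   # t x m residual matrix
        Sigma_hat = resid.T @ resid / t                  # ML estimate given B
        W = np.kron(np.linalg.inv(Sigma_hat), np.eye(t))
        beta_new = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y_stacked)
        if np.max(np.abs(beta_new - beta)) < tol:        # converged
            break
        beta = beta_new
    return beta_new, Sigma_hat
```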

65.3 Concentrating out the Covariance Matrix (Incomplete)

One can rewrite (65.2.11) using (65.2.13) as a definition:

\ell(Y; B, \Sigma) = -\frac{mt}{2}\log 2\pi - \frac{t}{2}\bigl(\log\det\Sigma + \operatorname{tr}\,\Sigma^{-1}\hat\Sigma(B)\bigr).

Setting Σ = Σ̂(B) concentrates Σ out and leaves −(mt/2) log 2π − (t/2) log det Σ̂(B) − mt/2 to be maximized over B.
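In code, the concentrated objective is a one-liner; a sketch, assuming the residual matrix Ê = Y − H(B) is available as a t × m array:

```python
import numpy as np

def concentrated_loglik(resid):
    """Concentrated log likelihood, up to an additive constant, as a function
    of the t x m residual matrix E_hat = Y - H(B)."""
    t = resid.shape[0]
    Sigma_hat = resid.T @ resid / t       # (65.2.13)
    _, logdet = np.linalg.slogdet(Sigma_hat)
    return -t / 2 * logdet                # -(t/2) log det Sigma_hat(B)
```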

Here is a derivation of this using tile notation. We use the notation Ê = Y − H(B) for the matrix of residuals, and apply the chain rule to get the derivatives:

[tile diagram: the chain rule for ∂ℓ/∂Π^⊤; the factor ∂ℓ/∂Ê involves Σ^{-1}]

This is an array of rank 2, i.e., a matrix, but the other factors are arrays of rank 4. Using (C.1.22) we get

[tile diagram]

Finally, by (C.1.18),

[tile diagram: ∂Ê/∂Π^⊤ expressed in terms of Z]

Putting it all together, using the symmetry of the first term (65.3.5) (which has the effect that the term with the crossing arms is the same as the straight one), gives

[tile diagram: the combined derivative]

If the X matrices are identical, then it is not necessary to do GLS, because OLS on each equation separately gives exactly the same result. Question 534 gives three different proofs of this:

Problem 534. Given a set of disturbance-related regression equations

(65.4.1) y_i = Xβ_i + ε_i,  i = 1, …, m,

in which all X_i are equal to X; note that equation (65.4.1) has no subscript at the matrix of explanatory variables. In matrix notation the system reads

(65.4.2) Y = XB + E,  vec(E) ∼ (o, Σ ⊗ I).

• c. 4 points. For this part of the Question you will need the following properties of vec and ⊗: (A ⊗ B)^⊤ = A^⊤ ⊗ B^⊤; (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD); (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}; vec(A + B) = vec(A) + vec(B); and finally the important identity (B.5.19): vec(ABC) = (C^⊤ ⊗ A) vec(B).

By applying the vec operator to (65.4.2), show that the BLUE of the matrix B is B̂ = (X^⊤X)^{-1}X^⊤Y; i.e., show that, despite the fact that the dispersion matrix is not spherical, one simply has to apply OLS to every equation separately.

Answer. Use (B.5.19) to write (65.4.2) in vectorized form as

vec(Y) = (I ⊗ X) vec(B) + vec(E).

Since \mathcal{V}[vec(E)] = Σ ⊗ I, the GLS estimate is

vec(B̂) = ((I ⊗ X)^⊤(Σ ⊗ I)^{-1}(I ⊗ X))^{-1}(I ⊗ X)^⊤(Σ ⊗ I)^{-1} vec(Y)
= ((I ⊗ X^⊤)(Σ^{-1} ⊗ I)(I ⊗ X))^{-1}(I ⊗ X^⊤)(Σ^{-1} ⊗ I) vec(Y)
= (Σ^{-1} ⊗ X^⊤X)^{-1}(Σ^{-1} ⊗ X^⊤) vec(Y)
= (Σ ⊗ (X^⊤X)^{-1})(Σ^{-1} ⊗ X^⊤) vec(Y)
= (I ⊗ (X^⊤X)^{-1}X^⊤) vec(Y),

and applying (B.5.19) again, this is equivalent to

(65.4.3) B̂ = (X^⊤X)^{-1}X^⊤Y. □

• d. 3 points. [DM93, p. 313] appeals to Kruskal's theorem, which is Question 499, to prove this. Supply the details of this proof.

Answer. Look at the derivation of (65.4.3) again. The Σ^{-1} in numerator and denominator cancel out since they commute with Z = I ⊗ X: defining Ω = Σ ⊗ I, this "commuting" is the formula Ω(I ⊗ X) = (I ⊗ X)(Σ ⊗ I_k), which shows that the column space of I ⊗ X is invariant under Ω; this is precisely the condition under which Kruskal's theorem asserts that OLS and GLS coincide. □
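A quick numerical illustration of the result proved in parts c and d, with hypothetical data: when every equation has the same regressor matrix, GLS with an arbitrary positive definite Σ reproduces equation-by-equation OLS.

```python
import numpy as np

rng = np.random.default_rng(2)
t, m = 30, 3
X = np.column_stack([np.ones(t), rng.normal(size=t)])   # same X in every equation
Y = rng.normal(size=(t, m))                              # hypothetical dependent variables
Sigma = np.array([[1.0, 0.8, 0.3],
                  [0.8, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

B_ols = np.linalg.solve(X.T @ X, X.T @ Y)                # OLS, all equations at once

Z = np.kron(np.eye(m), X)                                # I kron X, as in the answer
W = np.kron(np.linalg.inv(Sigma), np.eye(t))             # (Sigma kron I)^(-1)
b_gls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ Y.T.ravel())  # Y.T.ravel() = vec(Y)

assert np.allclose(b_gls, B_ols.T.ravel())               # GLS coincides with OLS
```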

• e. Instead of maximizing the likelihood function with respect to B and Σ simultaneously, Theil in [The71, pp. 500–502] only maximizes it with respect to B for the given Σ, and finds a solution which is independent of Σ. The likelihood function of Y is (65.2.5) with H(B) = XB, i.e.,

(65.4.5) f_Y(Y) = (2\pi)^{-tm/2} (\det \Sigma)^{-t/2} \exp\bigl(-\tfrac{1}{2} \operatorname{tr}\,\Sigma^{-1}(Y - XB)^\top (Y - XB)\bigr).

Show that B̂ = (X^⊤X)^{-1}X^⊤Y maximizes (65.4.5) for any given Σ.

Answer. Let B = B̂ + A; then (Y − XB)^⊤(Y − XB) = (Y − XB̂)^⊤(Y − XB̂) + A^⊤X^⊤XA, because the cross product terms A^⊤X^⊤(Y − XB̂) = O, since B̂ satisfies the normal equation X^⊤(Y − XB̂) = O. The trace in the exponent can therefore be split up into tr(Σ^{-1}(Y − XB̂)^⊤(Y − XB̂)) + tr(Σ^{-1}A^⊤X^⊤XA); but this last term is equal to tr(XAΣ^{-1}A^⊤X^⊤), which is ≥ 0. The likelihood is therefore maximized when A = O, i.e., at B = B̂, whatever the value of Σ. □

Joint estimation has therefore the greatest efficiency gains over OLS if the correlations between the errors are high and the correlations between the explanatory variables are low.
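The following Monte Carlo sketch, with a hypothetical two-equation design, illustrates this claim: the variance advantage of GLS over OLS for the first equation's slope grows with the error correlation ρ when the regressors differ across equations:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
t = 40

def mc_variances(rho, n_rep=2000):
    """Monte Carlo variances of the OLS and GLS estimates of the slope
    in equation 1, for error correlation rho."""
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    W = np.kron(np.linalg.inv(Sigma), np.eye(t))
    ols_est, gls_est = [], []
    for _ in range(n_rep):
        X1 = np.column_stack([np.ones(t), rng.normal(size=t)])
        X2 = np.column_stack([np.ones(t), rng.normal(size=t)])  # unrelated regressors
        E = rng.multivariate_normal(np.zeros(2), Sigma, size=t)
        y1 = X1 @ np.array([1.0, 2.0]) + E[:, 0]
        y2 = X2 @ np.array([0.0, 1.0]) + E[:, 1]
        Z = block_diag(X1, X2)
        y = np.concatenate([y1, y2])
        ols_est.append(np.linalg.lstsq(X1, y1, rcond=None)[0][1])
        gls_est.append(np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)[1])
    return np.var(ols_est), np.var(gls_est)

for rho in (0.0, 0.5, 0.9):
    v_ols, v_gls = mc_variances(rho)
    print(f"rho={rho}: var(OLS slope)={v_ols:.5f}, var(GLS slope)={v_gls:.5f}")
```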

Problem 535. Are the following statements true or false?

• a. 1 point. In a seemingly unrelated regression framework, joint estimation of the whole model is much better than estimation of each equation singly if the errors are highly correlated. True or false?

• b. 1 point. In a seemingly unrelated regression framework, joint estimation of the whole model is much better than estimation of each equation singly if the independent variables in the different regressions are highly correlated. True or false?

What is the rationale for this? Since the first equation has fewer variables than the second, I know the disturbances better. For instance, if the equation did not have any variables at all, then I would know the disturbances exactly. But if I know these disturbances, and know that they are correlated with the disturbances of the second equation, then I can also say something about the disturbances of the second equation, and therefore estimate the parameters of the second equation better.

Problem 536. You have two disturbance-related equations

y_1 = X_1β_1 + ε_1,  y_2 = X_2β_2 + ε_2,

with contemporaneous disturbance covariance matrix \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, in which the explanatory variables of the first equation are a subset of those of the second. Which of the following statements is correct? (a) In order to estimate β_1, OLS on the first equation singly is as good as SUR. (b) In order to estimate β_2, OLS on the second equation singly is as good as SUR.

Answer. The first is true. One cannot obtain a more efficient estimator of β_1 by considering the whole system. This is [JGH+85, p. 469]. □

65.5 Unknown Covariance Matrix

What to do when we don't know Σ? There are two main possibilities. One is "feasible GLS", which uses the OLS residuals to estimate Σ, and then uses the GLS formula with the estimated elements of Σ. This is the most obvious method; unfortunately, if the numbers of observations are unequal, then this may no longer give a nonnegative definite matrix. The other is maximum likelihood estimation of B and Σ simultaneously. If one iterates the "feasible GLS" method, i.e., uses the residuals of the feasible GLS equation to get new estimates of Σ, then does feasible GLS with the new Σ, etc., then one will get the maximum likelihood estimator.

Problem 537. 4 points. Explain how to do iterated EGLS (i.e., GLS with an estimated covariance matrix) in a model with first-order autoregression, and in a seemingly unrelated regression model. Will you end up with the (normal) maximum likelihood estimator if you iterate until convergence?

Answer. You will only get the maximum likelihood estimator in the SUR case, not in the AR1 case, because the determinant term will never come in by iteration, and in the AR1 case EGLS is known to underestimate ρ. Of course, iterated EGLS is in both situations asymptotically as good as maximum likelihood, but the question was whether it is already equal to the ML in small samples. You can have asymptotically equivalent estimators which differ greatly in small samples. □

Asymptotically, feasible GLS is as good as maximum likelihood. This is really nothing new and nothing exciting; the two estimators may have quite different properties before the asymptotic limit is reached! But there is another, much stronger result: already for finite sample size, iterated feasible GLS is equal to the maximum likelihood estimator.

Problem 538. 5 points. Define "seemingly unrelated equations" and discuss the estimation issues involved.

CHAPTER 66

Simultaneous Equations Systems

This was a central part of econometrics in the fifties and sixties.

66.1 Examples

[JHG+88, 14.1 Introduction] gives examples. The first example is clearly not identified; indeed, it has no exogenous variables. But the idea of a simultaneous equations system is not dependent on this: the system consists of a demand equation, a supply equation, and an equilibrium condition, and y_d, y_s, and p are the jointly determined endogenous variables. The first equation describes the behavior of the consumers, the second the behavior of the producers.

Problem 539. [Gre97, p. 709ff] Here is a demand and supply curve, with q quantity, p price, y income, and ι the vector of ones; all vectors are t-vectors.

(66.1.4) q = α_0ι + α_1p + α_2y + ε_d,  ε_d ∼ (o, σ_d²I)  (demand)

(66.1.5) q = β_0ι + β_1p + ε_s,  ε_s ∼ (o, σ_s²I)  (supply)

ε_d and ε_s are independent of y, but amongst each other they are contemporaneously correlated, with their covariance constant over time:

(66.1.6) cov[ε_{dt}, ε_{su}] = 0 if t ≠ u, and = σ_{ds} if t = u.

• a. 1 point. Which variables are exogenous and which are endogenous?

Answer. p and q are called jointly dependent or endogenous; y is determined outside the system, i.e., it is exogenous. □

• b. 2 points. Assuming α_1 ≠ β_1, verify that the reduced-form equations for p and q are as follows:

(66.1.7) p = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1}\,\iota + \frac{\alpha_2}{\beta_1 - \alpha_1}\,y + \frac{\varepsilon_d - \varepsilon_s}{\beta_1 - \alpha_1}

(66.1.8) q = \frac{\beta_1\alpha_0 - \alpha_1\beta_0}{\beta_1 - \alpha_1}\,\iota + \frac{\beta_1\alpha_2}{\beta_1 - \alpha_1}\,y + \frac{\beta_1\varepsilon_d - \alpha_1\varepsilon_s}{\beta_1 - \alpha_1}

• c. 2 points. Show that one will in general not get consistent estimates of the supply equation parameters if one regresses q on p (with an intercept).

Answer. By (66.1.7) (the reduced form equation for p),

cov[ε_{st}, p_t] = cov\Bigl[\varepsilon_{st}, \frac{\varepsilon_{dt} - \varepsilon_{st}}{\beta_1 - \alpha_1}\Bigr] = \frac{\sigma_{sd} - \sigma_s^2}{\beta_1 - \alpha_1}.

This is in general ≠ 0, therefore inconsistency. □

• d. 2 points. If one estimates the supply function by instrumental variables, using y as an instrument for p and ι as an instrument for itself, write down the formula for the resulting estimator β̃_1 of β_1 and show that it is consistent. You are allowed to use, without proof, equation (52.0.12).

Answer.

\tilde\beta_1 = \frac{\frac{1}{n}\sum(y_i - \bar y)(q_i - \bar q)}{\frac{1}{n}\sum(y_i - \bar y)(p_i - \bar p)}.

Its plim is \frac{\operatorname{cov}[y, q]}{\operatorname{cov}[y, p]} = \frac{\beta_1\alpha_2 \operatorname{var}[y]/(\beta_1 - \alpha_1)}{\alpha_2 \operatorname{var}[y]/(\beta_1 - \alpha_1)} = \beta_1. These covariances were derived from (66.1.7) and (66.1.8). □
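A simulation sketch of parts c and d with hypothetical parameter values: the OLS regression of q on p is inconsistent for β_1, while the instrumental variables estimator using y recovers it:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000                                   # large n, so the plims are visible
alpha0, alpha1, alpha2 = 10.0, -1.0, 0.5      # demand parameters (hypothetical)
beta0, beta1 = 2.0, 1.0                       # supply parameters (hypothetical)

y = rng.normal(10, 2, size=n)                 # exogenous income
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=n)
eps_d, eps_s = eps[:, 0], eps[:, 1]

# Reduced form (66.1.7): equate demand and supply and solve for p, then q
p = (alpha0 - beta0 + alpha2 * y + eps_d - eps_s) / (beta1 - alpha1)
q = beta0 + beta1 * p + eps_s

beta1_ols = np.cov(p, q)[0, 1] / np.cov(p, p)[0, 1]   # inconsistent (part c)
beta1_iv = np.cov(y, q)[0, 1] / np.cov(y, p)[0, 1]    # consistent (part d)
print(f"OLS: {beta1_ols:.3f}  IV: {beta1_iv:.3f}  true beta1: {beta1}")
```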

• e. 2 points. Show that the Indirect Least Squares estimator of β_1 is identical to the instrumental variables estimator.

Answer. For indirect least squares one estimates the two reduced form equations by OLS: the slope parameter in (66.1.7), \frac{\alpha_2}{\beta_1 - \alpha_1}, is estimated by \frac{\sum(y_i - \bar y)(p_i - \bar p)}{\sum(y_i - \bar y)^2}, and the slope parameter in (66.1.8), \frac{\beta_1\alpha_2}{\beta_1 - \alpha_1}, by \frac{\sum(y_i - \bar y)(q_i - \bar q)}{\sum(y_i - \bar y)^2}. Divide to get β_1 estimated by \frac{\sum(y_i - \bar y)(q_i - \bar q)}{\sum(y_i - \bar y)(p_i - \bar p)}, which is the same β̃_1 as in part d. □

• f. 1 point. Since the error terms in the reduced form equations are contemporaneously correlated, wouldn't one get more precise estimates if one estimated the reduced form equations as a seemingly unrelated system, instead of OLS?

Answer. Not as long as one does not impose any constraints on the reduced form equations, since all regressors are the same. □

• g. 2 points. We have shown above that the regression of q on p does not give a consistent estimator of β_1. However, one does get a consistent estimator of β_1 if one regresses q on the predicted values of p from the reduced form equation. (This is 2SLS.) Show that this estimator is also the same as above.

Answer. This gives \tilde\beta_1 = \frac{\sum(\hat p_i - \bar p)(q_i - \bar q)}{\sum(\hat p_i - \bar p)^2}. Since \hat p_i - \bar p = b(y_i - \bar y), where b is the OLS slope of p on y, this equals \frac{b\sum(y_i - \bar y)(q_i - \bar q)}{b^2\sum(y_i - \bar y)^2} = \frac{\sum(y_i - \bar y)(q_i - \bar q)}{\sum(y_i - \bar y)(p_i - \bar p)}, the same estimator as in parts d and e. □
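Continuing the simulation sketch from part d, one can check numerically that 2SLS coincides with the instrumental variables estimator:

```python
# Continuation of the part-d sketch: first stage regresses p on y,
# second stage regresses q on the fitted values p_hat.
b_first = np.cov(y, p)[0, 1] / np.cov(y, y)[0, 1]
p_hat = p.mean() + b_first * (y - y.mean())

beta1_2sls = np.cov(p_hat, q)[0, 1] / np.cov(p_hat, p_hat)[0, 1]
assert np.isclose(beta1_2sls, beta1_iv)        # identical to the IV estimator
```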


• h. 1 point. So far we have only discussed estimators of the parameters in the supply function. How would you estimate the demand function?

Answer. You can't. The supply function can be estimated because it stays put while the demand function shifts around; therefore the observed intersection points lie on the same supply function but on different demand functions. The demand function itself cannot be estimated: it is not identified. □

Here is an example from [WW79, pp. 257–266]. Take a simple Keynesian income-expenditure model of a consumption function with investment i exogenous:

(66.1.9) c = α + βy + ε

(66.1.10) y = c + i

Exogenous means: determined outside the system. By definition this always means: it is independent of all the disturbance terms in the equations (here there is just one disturbance term). Then the first claim is: y is correlated with ε, because y and c are determined simultaneously once i and ε are given, and both depend on i and ε. Let us do that in more detail and write down the reduced form equation for y; that means, let us express y in terms of the exogenous variable and the disturbances only. Plugging (66.1.9) into (66.1.10) and solving for y gives

y = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta}\,i + \frac{1}{1-\beta}\,\varepsilon.

From this one can see

(66.1.15) cov[y, ε] = 0 + 0 + \frac{1}{1-\beta}\operatorname{cov}[\varepsilon, \varepsilon] = \frac{\sigma^2}{1-\beta}.

Therefore OLS applied to equation (66.1.9) gives inconsistent results.

Problem 540. 4 points. Show that OLS applied to equation (66.1.9) gives an estimate which is in the plim larger than the true β.
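A simulation sketch of this upward inconsistency, with hypothetical parameter values; from (66.1.15) and the reduced form, the OLS slope converges to β + (1 − β)σ²/(σ_i² + σ²), which exceeds β whenever β < 1:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
alpha, beta = 1.0, 0.8                         # hypothetical, with 0 < beta < 1
var_i, var_e = 4.0, 1.0                        # var of investment and of epsilon

i = rng.normal(5, np.sqrt(var_i), size=n)      # exogenous investment
eps = rng.normal(0, np.sqrt(var_e), size=n)

y = (alpha + i + eps) / (1 - beta)             # reduced form for y
c = alpha + beta * y + eps                     # consumption function (66.1.9)

b_ols = np.cov(y, c)[0, 1] / np.cov(y, y)[0, 1]
plim = beta + (1 - beta) * var_e / (var_i + var_e)
print(f"OLS: {b_ols:.4f}  plim: {plim:.4f}  true beta: {beta}")  # OLS > beta
```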
