Class Notes in Statistics and Econometrics, Part 31



In tiles, this model is: [tile diagram omitted]

Estimation under the assumption $\Sigma$ is known: To estimate $\bar\beta$ one can use the heteroskedastic model with error variances $\tau^2 x_t^\top \Sigma x_t$; call the resulting estimate $\hat{\bar\beta}$. The formula for the best linear unbiased predictor of $\beta_t$ itself can be derived (heuristically) as follows: Assume for a moment that $\bar\beta$ is known; then the model can be written as $y_t - x_t^\top\bar\beta = x_t^\top v_t$. Then we can use the formula for the Best Linear Predictor, equation (??), applied to the situation

$$\begin{bmatrix} x_t^\top v_t \\ v_t \end{bmatrix} \sim \left( \begin{bmatrix} 0 \\ o \end{bmatrix},\; \tau^2 \begin{bmatrix} x_t^\top \Sigma x_t & x_t^\top \Sigma \\ \Sigma x_t & \Sigma \end{bmatrix} \right) \qquad(61.0.33)$$

where $x_t^\top v_t$ is observed (its value is $y_t - x_t^\top\bar\beta$), but $v_t$ is not. Note that here we predict a whole vector on the basis of only one linear combination of its elements.
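The Best Linear Predictor formula referred to by equation (??) is not reproduced in this extract; assuming the usual form $v^* = C[v,w]\,(V[w])^{-1} w$ for zero-mean $v$ and observed $w$, the heuristic step gives
$$v_t^* = \tau^2\Sigma x_t\,(\tau^2 x_t^\top\Sigma x_t)^{-1}(y_t - x_t^\top\bar\beta) = \Sigma x_t\,(x_t^\top\Sigma x_t)^{-1}(y_t - x_t^\top\bar\beta),$$
hence the predicted coefficient vector is $\beta_t^* = \bar\beta + \Sigma x_t\,(x_t^\top\Sigma x_t)^{-1}(y_t - x_t^\top\bar\beta)$, with $\bar\beta$ replaced by $\hat{\bar\beta}$ in practice.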


Each error variance $\tau^2 x_t^\top\Sigma x_t$ can be written as $z_t^\top\alpha$, where $z_t$ is the vector containing the unique elements of the symmetric matrix $x_t x_t^\top$, with those elements not located on the diagonal multiplied by the factor 2, since they occur twice in the matrix, and $\alpha$ contains the corresponding unique elements of $\tau^2\Sigma$ (but no factors 2 here). For instance, if there are three variables, then
$$\tau^2 x_t^\top\Sigma x_t = x_{t1}^2\tau_{11} + x_{t2}^2\tau_{22} + x_{t3}^2\tau_{33} + 2x_{t1}x_{t2}\tau_{12} + 2x_{t1}x_{t3}\tau_{13} + 2x_{t2}x_{t3}\tau_{23},$$
where $\tau_{ij}$ are the elements of $\tau^2\Sigma$. Therefore $z_t$ consists of $x_{t1}^2,\ x_{t2}^2,\ x_{t3}^2,\ 2x_{t1}x_{t2},\ 2x_{t1}x_{t3},\ 2x_{t2}x_{t3}$, and $\alpha^\top = [\tau_{11}, \tau_{22}, \tau_{33}, \tau_{12}, \tau_{13}, \tau_{23}]$. Then construct the matrix $Z$ which has as its $t$th row the vector $z_t^\top$; it follows $V[\varepsilon] = \operatorname{diag}(\gamma)$ where $\gamma = Z\alpha$.
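Constructing $z_t$ and $Z$ is mechanical; the following NumPy sketch (all variable names are made up, nothing here comes from the notes) builds $Z$ for an $n \times k$ regressor matrix and checks the three-variable expansion above.

```python
import numpy as np

def build_Z(X):
    """Row t holds the unique elements of x_t x_t^T: squared regressors first,
    then each off-diagonal cross product multiplied by 2 (it occurs twice)."""
    n, k = X.shape
    cols = [X[:, i] ** 2 for i in range(k)]
    cols += [2.0 * X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    return np.column_stack(cols)

# check with three regressors: z_t' alpha reproduces tau^2 x_t' Sigma x_t
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Sigma = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.5]])
tau2 = 0.7
T = tau2 * Sigma
alpha = np.array([T[0, 0], T[1, 1], T[2, 2], T[0, 1], T[0, 2], T[1, 2]])
assert np.allclose(build_Z(X) @ alpha, [tau2 * x @ Sigma @ x for x in X])
```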


Using this notation and defining, as usual, $M = I - X(X^\top X)^{-1}X^\top$, writing $m_t$ for the $t$th column vector of $M$, writing $Q$ for the matrix whose elements are the squares of the elements of $M$, and writing $\delta_t$ for the vector that has 1 in the $t$th place and 0 elsewhere, one can derive:

$$\begin{aligned}
E[\hat\varepsilon_t^2] &= E[(\delta_t^\top\hat\varepsilon)^2] &&(61.0.37)\\
&= E[\hat\varepsilon^\top\delta_t\delta_t^\top\hat\varepsilon] &&(61.0.38)\\
&= E[\varepsilon^\top M\delta_t\delta_t^\top M\varepsilon] &&(61.0.39)\\
&= E[\operatorname{tr} M\delta_t\delta_t^\top M\varepsilon\varepsilon^\top] &&(61.0.40)\\
&= \operatorname{tr} M\delta_t\delta_t^\top M\operatorname{diag}(\gamma) = \operatorname{tr} m_t m_t^\top\operatorname{diag}(\gamma) &&(61.0.41)\\
&= m_t^\top\operatorname{diag}(\gamma)m_t = m_{t1}\gamma_1 m_{1t} + \cdots + m_{tn}\gamma_n m_{nt} &&(61.0.42)\\
&= q_t^\top\gamma = q_t^\top Z\alpha, &&(61.0.43)
\end{aligned}$$
and therefore, stacking over $t$,
$$E[\hat\varepsilon^2] = QZ\alpha, \qquad(61.0.44)$$

where $\alpha$ is as above. This allows one to get an estimate of $\alpha$ by regressing the vector $[\hat\varepsilon_1^2, \ldots, \hat\varepsilon_n^2]^\top$ on $QZ$, and then to use $Z\hat\alpha$ to get an estimate of the variances $\tau^2 x_t^\top\Sigma x_t$. Unfortunately, the estimated covariance matrix one gets in this way may not be nonnegative definite.
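The estimation route just described can be sketched with simulated data; this is only an illustration of regressing the squared residuals on $QZ$ (variable names and parameter values are invented), and the last lines show how rough the implied variance estimates can be.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(1)
n, k = 400, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

beta_bar = np.array([1.0, 2.0, -1.0])
tau2Sigma = np.array([[0.50, 0.10, 0.00],
                      [0.10, 0.30, 0.05],
                      [0.00, 0.05, 0.20]])          # this is tau^2 * Sigma

# simulate y_t = x_t' beta_t with beta_t = beta_bar + v_t, v_t ~ N(o, tau^2 Sigma)
V = rng.multivariate_normal(np.zeros(k), tau2Sigma, size=n)
y = np.sum(X * (beta_bar + V), axis=1)

# Z: one column per unique element of x_t x_t', off-diagonal entries doubled
pairs = list(combinations_with_replacement(range(k), 2))
Z = np.column_stack([(1.0 if i == j else 2.0) * X[:, i] * X[:, j] for i, j in pairs])

# OLS residuals and the matrices M, Q from (61.0.37)-(61.0.44)
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
e_hat = M @ y
Q = M ** 2                                          # element-wise squares of M

# regress squared residuals on QZ; implied variances are Z alpha_hat
alpha_hat, *_ = np.linalg.lstsq(Q @ Z, e_hat ** 2, rcond=None)
var_hat = Z @ alpha_hat
var_true = np.einsum("ti,ij,tj->t", X, tau2Sigma, X)
print(np.corrcoef(var_hat, var_true)[0, 1])         # positively correlated, but noisy;
                                                    # individual var_hat[t] can even be negative
```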


[Gre97, p. 669–674] presents this model in the following form (spelled out in Problem 513 below):

Problem 513. Let $y_i$ be the $i$th column of $Y$. The random coefficients model as discussed in [Gre97, p. 669–674] specifies $y_i = X_i\beta_i + \varepsilon_i$ with $\varepsilon_i \sim (o, \sigma_i^2 I)$ and $\varepsilon_i$ uncorrelated with $\varepsilon_j$ for $i \neq j$. Furthermore $\beta_i$ is also random; write it as $\beta_i = \beta + v_i$, with $v_i \sim (o, \tau^2\Gamma)$ for a positive definite $\Gamma$, and again $v_i$ uncorrelated with $v_j$ for $i \neq j$. Furthermore, all $v_i$ are uncorrelated with all $\varepsilon_j$.

• a. 4 points. In this model the disturbance term is really $w_i = \varepsilon_i + X_i v_i$, which has covariance matrix $V[w_i] = \sigma_i^2 I + \tau^2 X_i\Gamma X_i^\top$. As a preliminary calculation for the next part of the question show that
$$X_i^\top(V[w_i])^{-1} = \frac{1}{\tau^2}\,\Gamma^{-1}\bigl(X_i^\top X_i + \kappa_i^2\Gamma^{-1}\bigr)^{-1}X_i^\top \qquad(61.0.46)$$


where $\kappa_i^2 = \sigma_i^2/\tau^2$. You are allowed to use, without proof, formula (A.8.13), which reads for inverses, not generalized inverses: [formula (A.8.13) not reproduced in this extraction]. Applied here it gives
$$(V[w_i])^{-1} = \frac{1}{\sigma_i^2}\Bigl(I - X_i\bigl(X_i^\top X_i + \kappa_i^2\Gamma^{-1}\bigr)^{-1}X_i^\top\Bigr).$$
Premultiply this by $X_i^\top$ and add and subtract the same term:
$$X_i^\top(V[w_i])^{-1} = \frac{1}{\sigma_i^2}X_i^\top - \frac{1}{\sigma_i^2}\bigl(X_i^\top X_i + \kappa_i^2\Gamma^{-1} - \kappa_i^2\Gamma^{-1}\bigr)\bigl(X_i^\top X_i + \kappa_i^2\Gamma^{-1}\bigr)^{-1}X_i^\top. \qquad(61.0.49)\text{–}(61.0.50)$$


This is the product of three matrices each of which has an inverse:

• c. 2 points. Show that from (61.0.46) it also follows that the GLS estimator of each column of $Y$ separately is the OLS estimator $\hat\beta_i = (X_i^\top X_i)^{-1}X_i^\top y_i$.

• d. 2 points. Show that $V[\hat\beta_i] = \sigma_i^2(X_i^\top X_i)^{-1} + \tau^2\Gamma$.

Answer. Since $V[y_i] = \sigma_i^2 I + \tau^2 X_i\Gamma X_i^\top$, it follows that
$$V[\hat\beta_i] = \sigma_i^2(X_i^\top X_i)^{-1} + \tau^2(X_i^\top X_i)^{-1}X_i^\top X_i\Gamma X_i^\top X_i(X_i^\top X_i)^{-1} = \sigma_i^2(X_i^\top X_i)^{-1} + \tau^2\Gamma.$$
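A quick numerical check of (61.0.46) and of parts (c) and (d), with NumPy and arbitrary illustrative dimensions and parameter values (nothing below is taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 30, 3
sigma2, tau2 = 0.8, 0.5
kappa2 = sigma2 / tau2
Gamma = np.array([[1.0, 0.2, 0.0], [0.2, 0.7, 0.1], [0.0, 0.1, 0.5]])
Xi = rng.normal(size=(T, k))
Vw = sigma2 * np.eye(T) + tau2 * Xi @ Gamma @ Xi.T        # V[w_i]

# (61.0.46): X_i'(V[w_i])^{-1} = (1/tau^2) Gamma^{-1} (X_i'X_i + kappa^2 Gamma^{-1})^{-1} X_i'
Ginv = np.linalg.inv(Gamma)
lhs = Xi.T @ np.linalg.inv(Vw)
rhs = (1.0 / tau2) * Ginv @ np.linalg.inv(Xi.T @ Xi + kappa2 * Ginv) @ Xi.T
print(np.allclose(lhs, rhs))                              # True

# part (c): GLS with V[w_i] coincides with OLS
yi = rng.normal(size=T)
gls = np.linalg.solve(lhs @ Xi, lhs @ yi)
ols = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
print(np.allclose(gls, ols))                              # True

# part (d): (X'X)^{-1} X' V[w_i] X (X'X)^{-1} = sigma^2 (X'X)^{-1} + tau^2 Gamma
XtXinv = np.linalg.inv(Xi.T @ Xi)
sandwich = XtXinv @ Xi.T @ Vw @ Xi @ XtXinv
print(np.allclose(sandwich, sigma2 * XtXinv + tau2 * Gamma))   # True
```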

• e. 3 points. [Gre97, p. 670] describes a procedure for estimating the covariance matrices if they are unknown. Explain this procedure clearly in your own words, and spell out the conditions under which it is a consistent estimate.

Answer. If $\Gamma$ is unknown, it is possible to get it from the sample covariance matrix of the group-specific OLS estimates, as long as the $\sigma_i^2$ and the $X_i$ are such that asymptotically $\frac{1}{n-1}\sum(\hat\beta_i - \bar\beta)(\hat\beta_i - \bar\beta)^\top$ is the same as $\frac{1}{n}\sum(\hat\beta_i - \beta)(\hat\beta_i - \beta)^\top$, which again is asymptotically the same as $\frac{1}{n}\sum V[\hat\beta_i]$. Here $\bar\beta$ denotes the sample average of the $\hat\beta_i$. We also need that asymptotically $\frac{1}{n}\sum s_i^2(X_i^\top X_i)^{-1} = \frac{1}{n}\sum\sigma_i^2(X_i^\top X_i)^{-1}$. If these substitutions can be made, then $\operatorname{plim}\bigl[\frac{1}{n-1}\sum(\hat\beta_i - \bar\beta)(\hat\beta_i - \bar\beta)^\top - \frac{1}{n}\sum\sigma_i^2(X_i^\top X_i)^{-1}\bigr] = \tau^2\Gamma$, since $\frac{1}{n}\sum V[\hat\beta_i] = \tau^2\Gamma + \frac{1}{n}\sum\sigma_i^2(X_i^\top X_i)^{-1}$. This is [Gre97, (15-29) on p. 670].
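A simulation sketch of this procedure (group sizes, dimensions, and parameter values are invented; the difference of the two averages recovers $\tau^2\Gamma$ only approximately and need not be nonnegative definite in small samples):

```python
import numpy as np

rng = np.random.default_rng(3)
n_groups, T, k = 500, 40, 3
sigma2, tau2 = 0.6, 0.4
Gamma = np.array([[1.0, 0.2, 0.0], [0.2, 0.7, 0.1], [0.0, 0.1, 0.5]])
beta_mean = np.array([1.0, -0.5, 2.0])

b_hats = np.empty((n_groups, k))
noise = np.zeros((k, k))
for i in range(n_groups):
    Xi = rng.normal(size=(T, k))
    beta_i = beta_mean + rng.multivariate_normal(np.zeros(k), tau2 * Gamma)
    yi = Xi @ beta_i + rng.normal(scale=np.sqrt(sigma2), size=T)
    XtXinv = np.linalg.inv(Xi.T @ Xi)
    b_hats[i] = XtXinv @ Xi.T @ yi
    resid = yi - Xi @ b_hats[i]
    noise += (resid @ resid / (T - k)) * XtXinv       # s_i^2 (X_i'X_i)^{-1}

# sample covariance of the group OLS estimates minus the averaged sampling noise
tau2Gamma_hat = np.cov(b_hats, rowvar=False) - noise / n_groups
print(tau2Gamma_hat.round(2))
print((tau2 * Gamma).round(2))                        # the two should be close
```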


Problem 514. 5 points. Describe in words how the "Random Coefficient Model" differs from an ordinary regression model, how it can be estimated, and describe situations in which it may be appropriate. Use your own words instead of excerpting the notes; don't give unnecessary detail but give an overview which will allow one to decide whether this is a good model for a given situation.

Answer. If $\Sigma$ is known, estimation proceeds in two steps: first estimate $\bar\beta$ by a heteroskedastic GLS model, and then predict, or better retrodict, the actual values taken by the $\beta_t$ by the usual linear prediction formulas. But the most important aspect of the model is that it is possible to estimate $\Sigma$ if it is not known! This is possible because each $v_t$ imposes a different but known pattern of heteroskedasticity on the error terms; it so to say leaves its footprints, and if one has enough observations, it is possible to reconstruct the covariance matrix from these footprints.

Problem 515. 4 points. The specification is
$$y_t = \alpha + \beta_t x_t + \gamma x_t^2 \qquad(61.0.54)$$
(no separate disturbance term), where $\alpha$ and $\gamma$ are constants, and $\beta_t$ is the $t$th element of a random vector $\beta \sim (\iota\mu, \tau^2 I)$. Explain how you would estimate $\alpha$, $\gamma$, $\mu$, and $\tau^2$.

Answer. Set $v = \beta - \iota\mu$; then $v \sim (o, \tau^2 I)$ and one gets
$$y_t = \alpha + \mu x_t + \gamma x_t^2 + v_t x_t. \qquad(61.0.55)$$


This is a regression with a heteroskedastic disturbance term: the variance of the disturbance $v_t x_t$ is $\tau^2 x_t^2$. Therefore one has to specify weights $= 1/x_t^2$.
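A sketch of this estimation on simulated data. Dividing the equation through by $x_t$ amounts to weighted least squares with weights $1/x_t^2$; estimating $\tau^2$ from the residual variance of the transformed regression is one plausible route (the page that continues this answer is missing from the extraction, so that last step is an assumption).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0.5, 3.0, size=n)             # keep x away from 0 so 1/x^2 behaves
alpha, gamma, mu, tau2 = 2.0, -0.3, 1.5, 0.4
beta = mu + rng.normal(scale=np.sqrt(tau2), size=n)   # beta_t ~ (iota mu, tau^2 I)
y = alpha + beta * x + gamma * x ** 2         # (61.0.54), no separate disturbance

# divide through by x_t:  y/x = alpha*(1/x) + mu + gamma*x + v_t,  homoskedastic in v_t
Xw = np.column_stack([1.0 / x, np.ones(n), x])
coef, *_ = np.linalg.lstsq(Xw, y / x, rcond=None)
alpha_hat, mu_hat, gamma_hat = coef
resid = y / x - Xw @ coef
tau2_hat = resid @ resid / (n - 3)            # residual variance estimates tau^2
print(alpha_hat, mu_hat, gamma_hat, tau2_hat)
```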


CHAPTER 62

Multivariate Regression

62.1 Multivariate Econometric Models: A Classification

If the dependent variable $Y$ is a matrix, then there are three basic models. The simplest model is the multivariate regression model, in which all columns of $Y$ have the same explanatory variables but different regression coefficients. The most common application of these kinds of models is Vector Autoregressive Time Series models. If one adds the requirement that all coefficient vectors satisfy


the same kind of linear constraint, one gets a model which is sometimes called a growth curve model. These models will be discussed in the remainder of this chapter.

In a second basic model, the explanatory variables are different, but the coefficient vector is the same. In tiles: [tile diagram omitted]

In the third basic model, both explanatory variables and coefficient vectors are different. [tile diagram omitted]

These models are known under the name "seemingly unrelated" or "disturbance related" regression models. They will be discussed in chapter 65.


After this, chapter 66 will discuss "Simultaneous Equations Systems," in which the dependent variable in one equation may be the explanatory variable in another equation.

62.2 Multivariate Regression with Equal Regressors

The multivariate regression model with equal regressors reads
$$Y = XB + E,$$
where we make the following assumptions: $X$ is nonrandom and observed, $Y$ is random and observed, $B$ is nonrandom and not observed, $E$ is random and not observed, but we know $E[E] = O$ and the rows of $E$ are independent drawings from the same $(o, \Sigma)$ distribution with an unknown positive definite $\Sigma$.

This model has applications for Vector Autoregressive Time Series and Multivariate Analysis of Variance (MANOVA).

The usual estimator of $B$ in this model can be introduced by three properties: a least squares property, a BLUE property, and the maximum likelihood property (under the assumption of normality). In the univariate case, the least squares property is a scalar minimization, while the BLUE property involves matrix minimization. In the present case, the least squares property becomes a matrix minimization property, the BLUE property involves arrays of rank 4, and the maximum likelihood property


is scalar maximization. In the univariate case, the scalar parameter $\sigma^2$ could be estimated alongside the linear estimator of $\beta$, and now the whole covariance matrix $\Sigma$ can be estimated along with the estimator of $B$.

62.2.1 Least Squares Property. The least squares principle can be applied here in the following form: given a matrix of observations $Y$, estimate $B$ by that value $\hat B$ for which
$$B = \hat B \text{ minimizes } (Y - XB)^\top(Y - XB) \qquad(62.2.2)$$
in the matrix sense, i.e., $(Y - X\hat B)^\top(Y - X\hat B)$ is smaller by a nonnegative definite matrix than any other $(Y - XB)^\top(Y - XB)$. The minimizer $\hat B$ solves the normal equation
$$X^\top X\hat B = X^\top Y, \qquad(62.2.3)$$
and an unbiased estimator of $\Sigma$ is $\hat\Sigma = \frac{1}{n-k}(Y - X\hat B)^\top(Y - X\hat B)$.


Proof: This is Problem 232. Due to the normal equations, the cross product disappears:
$$(Y - XB)^\top(Y - XB) = (Y - X\hat B + X\hat B - XB)^\top(Y - X\hat B + X\hat B - XB)$$
$$= (Y - X\hat B)^\top(Y - X\hat B) + (X\hat B - XB)^\top(X\hat B - XB). \qquad(62.2.4)$$

Note that the normal equation (62.2.3) simply reduces to the OLS normal equation for each column $\beta_i$ of $B$, with the corresponding column $y_i$ of $Y$ as dependent variable. In other words, for the estimation of $\beta_i$, only the $i$th column $y_i$ is used.

62.2.2 BLUE. To show that $\hat B$ is the BLUE, write the equation $Y = XB + E$ in vectorized form, using (B.5.19), as
$$\operatorname{vec}(Y) = (I \otimes X)\operatorname{vec}(B) + \operatorname{vec}(E). \qquad(62.2.5)$$


Since $V[\operatorname{vec}(E)] = \Sigma \otimes I$, the GLS estimate is, according to (26.0.2),
$$\begin{aligned}
\operatorname{vec}(\hat B) &= \bigl[(I \otimes X)^\top(\Sigma \otimes I)^{-1}(I \otimes X)\bigr]^{-1}(I \otimes X)^\top(\Sigma \otimes I)^{-1}\operatorname{vec}(Y) &&(62.2.6)\\
&= \bigl[(I \otimes X^\top)(\Sigma^{-1} \otimes I)(I \otimes X)\bigr]^{-1}(I \otimes X^\top)(\Sigma^{-1} \otimes I)\operatorname{vec}(Y) &&(62.2.7)\\
&= \bigl[\Sigma^{-1} \otimes X^\top X\bigr]^{-1}(\Sigma^{-1} \otimes X^\top)\operatorname{vec}(Y) &&(62.2.8)\\
&= \bigl[I \otimes (X^\top X)^{-1}X^\top\bigr]\operatorname{vec}(Y), &&(62.2.9)
\end{aligned}$$
and applying (B.5.19) again, this is equivalent to
$$\hat B = (X^\top X)^{-1}X^\top Y. \qquad(62.2.10)$$

From this vectorization one can also derive the dispersion matrix $V[\operatorname{vec}(\hat B)] = \Sigma \otimes (X^\top X)^{-1}$. In other words, $C[\hat\beta_i, \hat\beta_j] = \sigma_{ij}(X^\top X)^{-1}$, which can be estimated by $\hat\sigma_{ij}(X^\top X)^{-1}$.
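A small simulation illustrating (62.2.10), the column-by-column equivalence noted earlier, and the estimated covariances $\hat\sigma_{ij}(X^\top X)^{-1}$ (all dimensions and names here are invented):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, r = 200, 4, 3                      # observations, regressors, dependent variables
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
B = rng.normal(size=(k, r))
Sigma = np.array([[1.0, 0.3, 0.1], [0.3, 0.8, 0.2], [0.1, 0.2, 0.6]])
E = rng.multivariate_normal(np.zeros(r), Sigma, size=n)   # rows iid (o, Sigma)
Y = X @ B + E

# (62.2.10): B_hat = (X'X)^{-1} X'Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# equivalently OLS column by column: only the i-th column of Y matters for beta_i
B_cols = np.column_stack([np.linalg.solve(X.T @ X, X.T @ Y[:, i]) for i in range(r)])
print(np.allclose(B_hat, B_cols))        # True

# unbiased Sigma_hat and estimated C[beta_hat_i, beta_hat_j] = sigma_hat_ij (X'X)^{-1}
E_hat = Y - X @ B_hat
Sigma_hat = E_hat.T @ E_hat / (n - k)
cov_12 = Sigma_hat[0, 1] * np.linalg.inv(X.T @ X)
print(Sigma_hat.round(2))
```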


62.2.3 Maximum Likelihood. To derive the likelihood function, write the model in the row-partitioned form. [row-partitioned form not reproduced in this extraction] Under normality the likelihood function is
$$\prod_{i=1}^n (2\pi)^{-r/2}(\det\Sigma)^{-1/2}\exp\Bigl(-\tfrac{1}{2}(y_i^\top - x_i^\top B)\Sigma^{-1}(y_i - B^\top x_i)\Bigr) \qquad(62.2.12)$$
$$= (2\pi)^{-nr/2}(\det\Sigma)^{-n/2}\exp\Bigl(-\tfrac{1}{2}\sum_i (y_i^\top - x_i^\top B)\Sigma^{-1}(y_i - B^\top x_i)\Bigr). \qquad(62.2.13)$$


The quadratic form in the exponent can be rewritten as follows:
$$\sum_{i=1}^n (y_i^\top - x_i^\top B)\Sigma^{-1}(y_i - B^\top x_i) = \sum_{i=1}^n \operatorname{tr}\Sigma^{-1}(y_i - B^\top x_i)(y_i^\top - x_i^\top B)$$
$$= \operatorname{tr}\Sigma^{-1}\sum_{i=1}^n (y_i^\top - x_i^\top B)^\top(y_i^\top - x_i^\top B) = \operatorname{tr}\Sigma^{-1}(Y - XB)^\top(Y - XB).$$


The first step is obvious: using (62.2.4), the quadratic form in the exponent becomes
$$\operatorname{tr}\Sigma^{-1}(Y - XB)^\top(Y - XB) = \operatorname{tr}\Sigma^{-1}(Y - X\hat B)^\top(Y - X\hat B) + \operatorname{tr}(X\hat B - XB)\Sigma^{-1}(X\hat B - XB)^\top.$$
The argument which minimizes this is $B = \hat B$, regardless of the value of $\Sigma$. Therefore the concentrated likelihood function becomes, using the notation $\hat E = Y - X\hat B$:
$$(2\pi)^{-nr/2}(\det\Sigma)^{-n/2}\exp\bigl(-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\hat E^\top\hat E\bigr). \qquad(62.2.14)$$

In order to find the value of $\Sigma$ which maximizes this we will use (A.8.21) in Theorem A.8.3 in the Mathematical Appendix. From (A.8.21) follows
$$(\det A)^{n/2}\,e^{-\frac{n}{2}\operatorname{tr} A} \le e^{-rn/2}. \qquad(62.2.15)$$
We want to apply (62.2.15). Set $A = \frac{1}{n}(\hat E^\top\hat E)^{1/2}\Sigma^{-1}(\hat E^\top\hat E)^{1/2}$; then $\exp(-\frac{n}{2}\operatorname{tr} A) = \exp(-\frac{1}{2}\operatorname{tr}\Sigma^{-1}\hat E^\top\hat E)$, and $\det A = \det(\frac{1}{n}\hat E^\top\hat E)/\det\Sigma$; therefore, using (62.2.15),
$$(2\pi)^{-nr/2}(\det\Sigma)^{-n/2}\exp\bigl(-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}\hat E^\top\hat E\bigr) \le (2\pi e)^{-nr/2}\det\bigl(\tfrac{1}{n}\hat E^\top\hat E\bigr)^{-n/2},$$
with equality holding when $A = I$, i.e., for the value $\hat\Sigma = \frac{1}{n}\hat E^\top\hat E$.
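A numerical check (a sketch, not from the notes) that $\hat\Sigma = \frac{1}{n}\hat E^\top\hat E$ attains the maximum of the concentrated log likelihood implied by (62.2.14), comparing it against random positive definite competitors:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, r = 100, 3, 2
X = rng.normal(size=(n, k))
Y = X @ rng.normal(size=(k, r)) + rng.normal(size=(n, r))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
E_hat = Y - X @ B_hat
S_ml = E_hat.T @ E_hat / n                         # claimed maximizer

def conc_loglik(S):
    """log of (62.2.14), dropping the constant -(n r / 2) log(2 pi)."""
    _, logdet = np.linalg.slogdet(S)
    return -n / 2 * logdet - 0.5 * np.trace(np.linalg.inv(S) @ E_hat.T @ E_hat)

best = conc_loglik(S_ml)
for _ in range(1000):
    P = 0.2 * rng.normal(size=(r, r))
    S_alt = S_ml + (P + P.T) / 2                   # symmetric perturbation
    if np.min(np.linalg.eigvalsh(S_alt)) <= 0:     # keep only positive definite competitors
        continue
    assert conc_loglik(S_alt) <= best + 1e-8
print("no competitor beat S_ml")
```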


(62.2.14) is the concentrated likelihood function even if one has prior knowledge about $\Sigma$; in this case, the maximization is more difficult.

62.2.4 Distribution of the Least Squares Estimators. $\hat B$ is normally distributed, with mean $B$ and dispersion matrix $V[\operatorname{vec}(\hat B)] = \Sigma \otimes (X^\top X)^{-1}$. From the univariate result that $\operatorname{vec}(\hat E)$ and $\operatorname{vec}(\hat B)$ are uncorrelated, or from the univariate proof which goes through for the multivariate situation, it follows in the Normal case that they are independent. Therefore $\hat B$ is also independent of $\hat\Sigma$. Since
$$\hat\Sigma = \frac{1}{n-k}\hat E^\top\hat E \sim \frac{1}{n-k}\,W(n-k, \Sigma)$$
and it is independent of $\hat B$

Let us look at the simplest example, in which $X = \iota$. Then $B$ is a row vector; write it as $B = \mu^\top$, and the model reads


in other words, each row of $Y$ is an independent drawing from the same $(\mu, \Sigma)$ distribution, and we want to estimate $\mu$ and $\Sigma$, and also the correlation coefficients.

An elementary and detailed discussion of this model is given in chapter 63.

62.2.5 Testing. We will first look at tests of hypotheses of the form $RB = U$. This is a quite specialized hypothesis, meaning that each column of $B$ is subject to the same linear constraint, although the values which these linear combinations take may differ from column to column. Remember in the univariate case we introduced several testing principles, the Wald test, the likelihood ratio test, and the Lagrange multiplier test, and showed that in the linear model they are equivalent. These principles can be directly transferred to the multivariate case. The Wald test consists in computing the unconstrained estimator $\hat B$, and assessing, in terms of the (estimated, i.e., "studentized") Mahalanobis distance, how far $R\hat B$ is away from $U$. The likelihood ratio test (applied to the least squares objective function) consists in running both the constrained and the unconstrained multivariate regression, and then determining how far the achieved values of the GLS objective function (which are matrices) are apart from each other.

Since the univariate $t$-test and its multivariate generalization, called Hotelling's $T^2$, are usually only applied to hypotheses where $R$ is a row vector, we will limit our discussion to this case as well. The simplest example of such a test would be to test whether $\mu = \mu_0$ in the above "simplest example" model with iid observations.


The OLS estimate of $\mu$ is $\bar y$, which one gets by taking the column means of $Y$. The dispersion matrix of this estimate is $\Sigma/n$. The Mahalanobis distance of this estimate from $\mu_0$ is therefore $n(\bar y - \mu_0)^\top\Sigma^{-1}(\bar y - \mu_0)$, and replacing $\Sigma$ by its unbiased estimate $S = W/(n-1)$, one gets the following test statistic: $T^2_{n-1} = n(\bar y - \mu_0)^\top S^{-1}(\bar y - \mu_0)$. Here we use the following definition: if $z \sim N(o, \Sigma)$ is an $r$-vector, and $W \sim W(m, \Sigma)$ is independent of $z$ with the same $\Sigma$, so that $S = W/m$ is an unbiased estimate of $\Sigma$, then
$$T^2_{r,m} = z^\top S^{-1} z \qquad(62.2.19)$$
is called a Hotelling $T^2$ with $r$ and $m$ degrees of freedom.
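A minimal computation of this $T^2$ statistic for testing $\mu = \mu_0$ on simulated iid rows, converted to an $F$ statistic with the relation quoted further below (here $m = n - 1$); SciPy is used only for the $F$ tail probability, and all data are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, r = 50, 3
mu_true = np.array([0.0, 0.5, -0.2])
Sigma = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])
Y = rng.multivariate_normal(mu_true, Sigma, size=n)

mu0 = np.zeros(r)                          # hypothesized mean
ybar = Y.mean(axis=0)                      # column means of Y
S = np.cov(Y, rowvar=False)                # unbiased estimate of Sigma, W/(n-1)

T2 = n * (ybar - mu0) @ np.linalg.solve(S, ybar - mu0)
m = n - 1
F = (m - r + 1) / (m * r) * T2             # ~ F_{r, m-r+1} under the null
p_value = stats.f.sf(F, r, m - r + 1)
print(T2, F, p_value)
```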

One sees easily that the distribution of $T^2_{r,m}$ is independent of $\Sigma$. It can be written in the form


hypothesis if at least one of these $t$-tests rejects. This principle of constructing tests for multivariate hypotheses from those of simple hypotheses is called the "union-intersection principle" in multivariate statistics.

Since the usual $F$-statistic in univariate regression can also be considered the estimate of a Mahalanobis distance, it might be worthwhile to point out the difference. The difference is that in the case of the $F$-statistic, the dispersion matrix was known up to a factor $\sigma^2$, and only this factor had to be estimated. In the case of the Hotelling $T^2$, the whole dispersion matrix is unknown and all of it must be estimated (but one also has multivariate rather than univariate observations). Just as the distribution of the $F$ statistic does not depend on the true value of $\sigma^2$, the distribution of Hotelling's $T^2$ does not depend on $\Sigma$. Indeed, its distribution can be expressed in terms of the $F$-distribution. This is a deep result which we will not prove here:

If $\Sigma$ is an $r \times r$ nonsingular matrix, then the distribution of Hotelling's $T^2_{r,m}$ with $r$ and $m$ degrees of freedom can be expressed in terms of the $F$-distribution as follows:
$$\frac{m - r + 1}{m\,r}\,T^2_{r,m} \sim F_{r,\,m-r+1}.$$
This apparatus with Hotelling's $T^2$ has been developed only for a very specific kind of hypothesis, namely, a hypothesis of the form $r^\top B = u^\top$. Now let us turn
