
Class Notes in Statistics and Econometrics, Part 12



is nonrandom, and “predictors” if it is random. For scalar random variables we will use the mean squared error as a criterion for closeness. Its definition is MSE[φ̂; φ] (read it: mean squared error of φ̂ as an estimator or predictor, whatever the case may be, of φ):

(23.0.1) MSE[φ̂; φ] = E[(φ̂ − φ)²]


For our purposes, therefore, the estimator (or predictor) φ̂ of the unknown parameter (or unobserved random variable) φ is no worse than the alternative φ̃ if MSE[φ̂; φ] ≤ MSE[φ̃; φ]. This is a criterion which can be applied before any observations are collected and actual estimations are made; it is an “initial” criterion regarding the expected average performance in a series of future trials (even though, in economics, usually only one trial is made).
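To make the “initial” criterion concrete, here is a small Python sketch (not from the notes) that estimates MSE[φ̂; φ] for two estimators of a population mean by simulating many future trials; the distribution, sample size, and shrinkage factor are arbitrary illustration choices.

    import numpy as np

    # Sketch: estimate E[(estimator - phi)^2] for two estimators of a mean by
    # simulating many "future trials".  All numeric choices are illustrative.
    rng = np.random.default_rng(0)
    phi = 2.0                       # true parameter
    n, trials = 10, 100_000

    samples = rng.normal(loc=phi, scale=1.0, size=(trials, n))
    phi_hat = samples.mean(axis=1)          # sample mean
    phi_tilde = 0.9 * phi_hat               # a shrunken alternative

    mse_hat = np.mean((phi_hat - phi) ** 2)
    mse_tilde = np.mean((phi_tilde - phi) ** 2)
    print(mse_hat, mse_tilde)               # compare the two estimated MSEs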

23.1 Comparison of Two Vector Estimators

If one wants to compare two vector estimators, say φ̂ and φ̃, it is often impossible to say which of the two estimators is better. It may be the case that φ̂1 is better than φ̃1 (in terms of MSE or some other criterion), but φ̂2 is worse than φ̃2. And even if every component φi is estimated better by φ̂i than by φ̃i, certain linear combinations t⊤φ of the components of φ may be estimated better by t⊤φ̃ than by t⊤φ̂.

Problem 294. 2 points. Construct an example of two vector estimators φ̂ and φ̃ of the same random vector φ = [φ1 φ2]⊤, so that MSE[φ̂i; φi] < MSE[φ̃i; φi] for i = 1, 2 but MSE[φ̂1 + φ̂2; φ1 + φ2] > MSE[φ̃1 + φ̃2; φ1 + φ2]. Hint: it is easiest to use an example in which all random variables are constants. Another hint: the geometric analog would be to find two vectors in a plane, φ̂ and φ̃. In each component (i.e., projection on the axes), φ̂ is closer to the origin than φ̃. But in the projection on the diagonal, φ̃ is closer to the origin than φ̂.

Answer. In the simplest counterexample, all variables involved are constants: φ = 0
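The specific constants of the answer are not fully preserved in this copy; the following Python sketch is one possible counterexample of the kind the hint describes, with values chosen here for illustration (not necessarily the author's).

    import numpy as np

    # Constants only, as the hint suggests; with constants, MSE is just the
    # squared error.  These particular values are an illustration.
    phi       = np.array([0.0, 0.0])
    phi_hat   = np.array([1.0, 1.0])
    phi_tilde = np.array([1.5, -1.5])

    print((phi_hat - phi) ** 2)                 # [1.   1.  ]  componentwise MSE of phi_hat
    print((phi_tilde - phi) ** 2)               # [2.25 2.25]  worse in every component...
    print((phi_hat.sum() - phi.sum()) ** 2)     # 4.0  ...but the sum is estimated worse
    print((phi_tilde.sum() - phi.sum()) ** 2)   # 0.0  by phi_hat than by phi_tilde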

The MSE-matrix of φ̂ as an estimator of φ is defined as

(23.1.1) MSE[φ̂; φ] = E[(φ̂ − φ)(φ̂ − φ)⊤]


Problem 295. 2 points. Let θ be a vector of possibly random parameters, and θ̂ an estimator of θ. Show that

(23.1.2) MSE[θ̂; θ] = V[θ̂ − θ] + (E[θ̂ − θ])(E[θ̂ − θ])⊤

Don't assume the scalar result but give a proof that is good for vectors and scalars.

Answer. For any random vector x it holds that

E[xx⊤] = E[(x − E[x] + E[x])(x − E[x] + E[x])⊤]
= E[(x − E[x])(x − E[x])⊤] + E[x − E[x]] E[x]⊤ + E[x] E[x − E[x]]⊤ + E[x] E[x]⊤
= V[x] + O + O + E[x] E[x]⊤.

Setting x = θ̂ − θ gives (23.1.2).

If θ is nonrandom, formula (23.1.2) simplifies slightly, since in this case V[θ̂ − θ] = V[θ̂]. In this case, the MSE-matrix is the covariance matrix plus the squared bias matrix. If θ is nonrandom and in addition θ̂ is unbiased, then the MSE-matrix coincides with the covariance matrix.
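A quick numerical check of decomposition (23.1.2), using a deliberately biased estimator of a made-up fixed θ (a sketch; all numbers are illustration values):

    import numpy as np

    # The simulated E[(theta_hat - theta)(theta_hat - theta)^T] should match
    # V[theta_hat - theta] plus the squared-bias matrix.
    rng = np.random.default_rng(1)
    theta = np.array([1.0, -2.0])
    trials, n = 100_000, 5

    data = rng.normal(size=(trials, n, 2)) + theta
    theta_hat = data.mean(axis=1) + np.array([0.3, 0.0])   # bias added on purpose

    err = theta_hat - theta
    mse_matrix = err.T @ err / trials
    bias = err.mean(axis=0)
    decomposition = np.cov(err, rowvar=False, bias=True) + np.outer(bias, bias)

    print(np.round(mse_matrix, 3))
    print(np.round(decomposition, 3))   # agrees up to simulation noise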

Theorem 23.1.1. Assume φ̂ and φ̃ are two estimators of the parameter φ (which is allowed to be random itself). Then conditions (23.1.3), (23.1.4), and (23.1.5) are equivalent.

To complete the proof, (23.1.5) has (23.1.3) as a special case if one sets Θ =

Problem 296. Show that if Θ and Σ are symmetric and nonnegative definite, then tr(ΘΣ) ≥ 0. You are allowed to use that tr(AB) = tr(BA), that the trace of a nonnegative definite matrix is ≥ 0, and Problem 129 (which is trivial).

Answer. Write Θ = RR⊤; then tr(ΘΣ) = tr(RR⊤Σ) = tr(R⊤ΣR) ≥ 0.
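A numerical sanity check of Problem 296 (a sketch with randomly generated nonnegative definite matrices):

    import numpy as np

    # Generate random symmetric nonnegative definite Theta and Sigma and confirm
    # numerically that tr(Theta @ Sigma) >= 0.
    rng = np.random.default_rng(2)
    for _ in range(1000):
        R = rng.normal(size=(4, 4))
        S = rng.normal(size=(4, 4))
        Theta = R @ R.T          # nonnegative definite by construction
        Sigma = S @ S.T
        assert np.trace(Theta @ Sigma) >= -1e-10   # tolerance for rounding error
    print("trace is nonnegative in all draws")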

Problem 297. Consider two very simple-minded estimators of the unknown nonrandom parameter vector φ = [φ1 φ2]⊤. Neither of these estimators depends on any observations; they are constants. The first estimator is φ̂ = [11], and the second is

Answer. φ̂ has smaller trace of the MSE-matrix.

φ̂ − φ = [1 1]⊤

Note that both MSE-matrices are singular, i.e., both estimators allow an error-free look at certain linear combinations of the parameter vector.

• b. 1 point. Give two vectors g = [g1 g2]⊤ and h = [h1 h2]⊤ satisfying MSE[g⊤φ̂; g⊤φ] < MSE[g⊤φ̃; g⊤φ] and MSE[h⊤φ̂; h⊤φ] > MSE[h⊤φ̃; h⊤φ]. (g and h are not unique; there are many possibilities.)

Answer. With g = [−1 1]⊤ and h = [1 1]⊤, for instance, we get g⊤φ̂ − g⊤φ = 0, g⊤φ̃ − g⊤φ = 4, h⊤φ̂ − h⊤φ = 2, h⊤φ̃ − h⊤φ = 0; therefore MSE[g⊤φ̂; g⊤φ] = 0, MSE[g⊤φ̃; g⊤φ] = 16, MSE[h⊤φ̂; h⊤φ] = 4, MSE[h⊤φ̃; h⊤φ] = 0. An alternative way to compute this is e.g.
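That alternative computation is not preserved in this copy; as a stand-in, here is a hedged numerical check working only from the error vectors implied by the answer above (φ̂ − φ = [1 1]⊤ and, implicitly, φ̃ − φ = [−2 2]⊤):

    import numpy as np

    # Constants, so each MSE is just a squared error.
    e_hat   = np.array([1.0, 1.0])     # phi_hat - phi
    e_tilde = np.array([-2.0, 2.0])    # phi_tilde - phi, as implied by the numbers above
    g = np.array([-1.0, 1.0])
    h = np.array([1.0, 1.0])

    print((g @ e_hat) ** 2, (g @ e_tilde) ** 2)   # 0.0 16.0
    print((h @ e_hat) ** 2, (h @ e_tilde) ** 2)   # 4.0  0.0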


CHAPTER 24

Sampling Properties of the Least Squares Estimator

The estimator β̂ was derived from a geometric argument, and everything which we showed so far are what [DM93, p. 3] calls its numerical as opposed to its statistical properties. But β̂ also has nice statistical or sampling properties. We are assuming right now the specification given in (18.1.3), in which X is an arbitrary matrix of full column rank, and we are not assuming that the errors must be Normally distributed. The assumption that X is nonrandom means that repeated samples are taken with the same X-matrix. This is often true for experimental data, but not in econometrics. The sampling properties which we are really interested in are those where also the X-matrix is random; we will derive those later. For this later derivation, the properties with fixed X-matrix, which we are going to discuss presently, will be needed as an intermediate step. The assumption of fixed X is therefore a preliminary technical assumption, to be dropped later.

In order to know how good the estimator β̂ is, one needs the statistical properties of its “sampling error” β̂ − β. This sampling error has the following formula:

(24.0.7) β̂ − β = (X⊤X)⁻¹X⊤ε

We will use the MSE-matrix as a criterion for how good an estimator of a vector of unobserved parameters is. Chapter 23 gave some reasons why this is a sensible criterion (compare [DM93, Chapter 5.5]).


24.1 The Gauss Markov Theorem

Returning to the least squares estimator β̂, one obtains, using (24.0.7), that

(24.1.1) MSE[β̂; β] = E[(β̂ − β)(β̂ − β)⊤] = (X⊤X)⁻¹X⊤ E[εε⊤] X(X⊤X)⁻¹ = σ²(X⊤X)⁻¹.

This is a very simple formula. Its most interesting aspect is that this MSE-matrix does not depend on the value of the true β. In particular this means that it is bounded with respect to β, which is important for someone who wants to be assured of a certain accuracy even in the worst possible situation.
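A Monte Carlo sketch of (24.1.1): simulate y = Xβ + ε with spherical disturbances and compare the simulated MSE-matrix of the OLS estimator with σ²(X⊤X)⁻¹ (X, β, and σ are arbitrary illustration choices).

    import numpy as np

    # Simulate many samples with the same fixed X and compare the simulated
    # MSE-matrix of OLS with the theoretical sigma^2 (X^T X)^{-1}.
    rng = np.random.default_rng(3)
    n, k, sigma = 50, 3, 2.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    beta = np.array([1.0, -0.5, 2.0])

    trials = 50_000
    eps = rng.normal(scale=sigma, size=(trials, n))
    y = X @ beta + eps                                  # one sample per row
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = y @ X @ XtX_inv                          # OLS for every trial at once

    err = beta_hat - beta
    print(np.round(err.T @ err / trials, 4))            # simulated MSE-matrix
    print(np.round(sigma ** 2 * XtX_inv, 4))            # theoretical value (24.1.1)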

Problem 298. 2 points. Compute the MSE-matrix MSE[ε̂; ε] = E[(ε̂ − ε)(ε̂ − ε)⊤] of the residuals as predictors of the disturbances.

Answer. Write ε̂ − ε = Mε − ε = (M − I)ε = −X(X⊤X)⁻¹X⊤ε; therefore MSE[ε̂; ε] = E[X(X⊤X)⁻¹X⊤εε⊤X(X⊤X)⁻¹X⊤] = σ²X(X⊤X)⁻¹X⊤. Alternatively, start with ε̂ − ε = y − ŷ − ε = Xβ − ŷ = X(β − β̂). This allows one to use MSE[ε̂; ε] = X MSE[β̂; β] X⊤ = σ²X(X⊤X)⁻¹X⊤.
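A corresponding check for the residuals, MSE[ε̂; ε] = σ²X(X⊤X)⁻¹X⊤ (again a sketch with a made-up design matrix):

    import numpy as np

    # Since eps_hat = M*eps, the difference eps_hat - eps equals -P*eps, where
    # P = X (X^T X)^{-1} X^T; its simulated MSE-matrix should approach sigma^2 P.
    rng = np.random.default_rng(4)
    n, sigma = 6, 1.5
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    P = X @ np.linalg.inv(X.T @ X) @ X.T               # hat matrix
    M = np.eye(n) - P                                  # residual maker

    trials = 200_000
    eps = rng.normal(scale=sigma, size=(trials, n))
    eps_hat = eps @ M.T                                # residuals (X*beta drops out)
    d = eps_hat - eps
    print(np.round(d.T @ d / trials, 3))               # simulated MSE-matrix
    print(np.round(sigma ** 2 * P, 3))                 # theoretical value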



Problem 299. 2 points. Let v be a random vector that is a linear transformation of y, i.e., v = Ty for some constant matrix T. Furthermore v satisfies E[v] = o. Show that from this follows v = Tε̂. (In other words, no other transformation of y with zero expected value is more “comprehensive” than ε̂. However there are many other transformations of y with zero expected value which are as “comprehensive” as ε̂.)

Answer. E[v] = TXβ must be o whatever the value of β. Therefore TX = O, from which follows TM = T. Since ε̂ = My, this gives immediately v = Tε̂. (This is the statistical implication of the mathematical fact that M is a deficiency matrix of X.)

Problem 300. 2 points. Show that β̂ and ε̂ are uncorrelated, i.e., cov[β̂i, ε̂j] = 0 for all i, j. Defining the covariance matrix C[β̂, ε̂] as that matrix whose (i, j) element is cov[β̂i, ε̂j], this can also be written as C[β̂, ε̂] = O. Hint: The covariance matrix satisfies the rules C[Ay, Bz] = A C[y, z] B⊤ and C[y, y] = V[y]. (Other rules for the covariance matrix, which will not be needed here, are C[z, y] = (C[y, z])⊤, C[x + y, z] = C[x, z] + C[y, z], C[x, y + z] = C[x, y] + C[x, z], and C[y, c] = O if c is nonrandom.)
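The answer is not preserved in this copy; the following sketch carries out the computation the hint suggests numerically, C[β̂, ε̂] = A V[y] B⊤ with A = (X⊤X)⁻¹X⊤ and B = M, which collapses to σ²(X⊤X)⁻¹X⊤M = O.

    import numpy as np

    # Numerical version of the covariance rule from the hint, for a made-up X.
    rng = np.random.default_rng(5)
    n, sigma = 8, 1.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    A = np.linalg.inv(X.T @ X) @ X.T                   # beta_hat = A y
    M = np.eye(n) - X @ A                              # eps_hat = M y
    C = sigma ** 2 * A @ M.T                           # A V[y] B^T with V[y] = sigma^2 I
    print(np.round(C, 12))                             # zero matrix (up to rounding)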


(here consisting of one row only) that contains all the covariances

(24.1.2) C[ȳ, β̂] ≡ [cov[ȳ, β̂1]  cov[ȳ, β̂2]  ···  cov[ȳ, β̂k]]

has the following form: C[ȳ, β̂] = (σ²/n) [1 0 ··· 0], where n is the number of observations. Hint: That the regression has an intercept term as first column of the X-matrix means that Xe(1) = ι, where e(1) is the unit vector having 1 in the first place and zeros elsewhere, and ι is the vector which has ones everywhere.

Answer. Write both ȳ and β̂ in terms of y, i.e., ȳ = (1/n) ι⊤y and β̂ = (X⊤X)⁻¹X⊤y. Therefore (24.1.3)

Theorem 24.1.1. t⊤β̂ is the best linear unbiased estimator (BLUE) of φ = t⊤β: every other unbiased linear estimator φ̃ = a⊤y of φ = t⊤β has a bigger MSE than t⊤β̂.

Proof. Write the alternative linear estimator φ̃ = a⊤y in the form

(24.1.4) φ̃ = (t⊤(X⊤X)⁻¹X⊤ + c⊤) y;


then the sampling error is φ̃ − φ = (t⊤(X⊤X)⁻¹X⊤ + c⊤) ε, and hence

MSE[φ̃; φ] = E[(φ̃ − φ)²] = E[(t⊤(X⊤X)⁻¹X⊤ + c⊤) εε⊤ (X(X⊤X)⁻¹t + c)] = σ²(t⊤(X⊤X)⁻¹X⊤ + c⊤)(X(X⊤X)⁻¹t + c) = σ² t⊤(X⊤X)⁻¹t + σ² c⊤c.

Here we needed again c⊤X = o⊤. Clearly, this is minimized if c = o, in which case φ̃ = t⊤β̂.

Answer. (Compare [DM93, p. 159].) Any other linear estimator β̃ of β can be written as β̃ = ((X⊤X)⁻¹X⊤ + C) y. Its expected value is E[β̃] = (X⊤X)⁻¹X⊤Xβ + CXβ. For β̃ to be unbiased, regardless of the value of β, C must satisfy CX = O. But then it follows

MSE[β̃; β] = V[β̃] = σ²((X⊤X)⁻¹X⊤ + C)(X(X⊤X)⁻¹ + C⊤) = σ²(X⊤X)⁻¹ + σ²CC⊤,

i.e., it exceeds the MSE-matrix of β̂ by a nonnegative definite matrix.
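A numerical illustration of this answer: build one alternative linear unbiased estimator with CX = O and check that its covariance matrix exceeds that of OLS by σ²CC⊤, a nonnegative definite matrix (the design matrix and C are made up).

    import numpy as np

    # Project an arbitrary matrix onto the space orthogonal to the columns of X
    # to obtain a C with C X = O, then compare the two covariance matrices.
    rng = np.random.default_rng(6)
    n, k, sigma = 12, 3, 1.0
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    A = np.linalg.inv(X.T @ X) @ X.T
    M = np.eye(n) - X @ A

    C = rng.normal(size=(k, n)) @ M                    # guarantees C X = O
    V_hat   = sigma ** 2 * A @ A.T                     # = sigma^2 (X^T X)^{-1}
    V_tilde = sigma ** 2 * (A + C) @ (A + C).T

    diff = V_tilde - V_hat                             # should equal sigma^2 C C^T
    print(np.round(diff - sigma ** 2 * C @ C.T, 10))   # ~ zero matrix
    print(np.linalg.eigvalsh(diff) >= -1e-10)          # nonnegative definite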

24.2 Digression about Minimax Estimators

Theorem 24.1.1 is a somewhat puzzling property of the least squares estimator, since there is no reason in the world to restrict one's search for good estimators to unbiased estimators. An alternative and more enlightening characterization of β̂ is the following:

Theorem 24.2.2. β̂ is a linear minimax estimator of the parameter vector β in the following sense: for every nonrandom coefficient vector t, t⊤β̂ is the linear minimax estimator of the scalar φ = t⊤β with respect to the MSE. I.e., for every other linear estimator φ̃ = a⊤y of φ one can find a value β = β0 for which φ̃ has a larger MSE than the largest possible MSE of t⊤β̂.

Proof: as in the proof of Theorem 24.1.1, write the alternative linear estimator in the form (24.1.4). Now there are two cases: if c⊤X = o⊤, then MSE[φ̃; φ] = σ² t⊤(X⊤X)⁻¹t + σ² c⊤c. This does not depend on β, and if c ≠ o then this MSE is larger than that for c = o. If c⊤X ≠ o⊤, then MSE[φ̃; φ] is unbounded, i.e., for any finite number ω one can always find a β0 for which MSE[φ̃; φ] > ω. Since MSE[φ̂; φ] is bounded, a β0 can be found that satisfies (24.2.1)
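A sketch of the unboundedness argument: for weights a with a⊤X ≠ t⊤, the MSE ((a⊤X − t⊤)β)² + σ²a⊤a grows without bound along a suitable direction of β, while MSE[t⊤β̂; t⊤β] stays constant (X, t, a, and the β path are illustration choices).

    import numpy as np

    # Weights a that do not reproduce t^T X: the estimator a^T y is biased for
    # t^T beta, and its MSE grows without bound along the bias direction.
    rng = np.random.default_rng(7)
    n, sigma = 10, 1.0
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    t = np.array([0.0, 1.0])                 # we estimate the slope t^T beta
    a = rng.normal(size=n)                   # generic weights, so a^T X != t^T

    bound = sigma ** 2 * t @ np.linalg.inv(X.T @ X) @ t   # constant MSE of t^T beta_hat
    d = a @ X - t                                         # nonzero "bias direction"
    for scale in [0.0, 1.0, 10.0, 100.0]:
        beta0 = scale * d / np.linalg.norm(d)
        mse_tilde = (d @ beta0) ** 2 + sigma ** 2 * (a @ a)
        print(scale, round(mse_tilde, 2), "vs bounded", round(bound, 2))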


If we characterize the BLUE as a minimax estimator, we are using a consistent and unified principle. It is based on the concept of the MSE alone, not on a mixture between the concepts of unbiasedness and the MSE. This explains why the mathematical theory of the least squares estimator is so rich.

On the other hand, a minimax strategy is not a good estimation strategy. Nature is not the adversary of the researcher; it does not maliciously choose β in such a way that the researcher will be misled. This explains why the least squares principle, despite the beauty of its mathematical theory, does not give terribly good estimators (in fact, they are inadmissible, see the Section about the Stein rule below).

β̂ is therefore simultaneously the solution to two very different minimization problems. We will refer to it as the OLS estimate if we refer to its property of minimizing the sum of squared errors, and as the BLUE estimator if we think of it as the best linear unbiased estimator.

Note that even if σ² were known, one could not get a better linear unbiased estimator of β.

24.3 Miscellaneous Properties of the BLUE

Problem 303


• a. 1 point. Instead of (18.2.22) one sometimes sees the formula

(24.3.1) β̂ = Σ (xt − x̄) yt / Σ (xt − x̄)²

for the slope parameter in the simple regression. Show that these formulas are mathematically equivalent.

Answer. Equivalence of (24.3.1) and (18.2.22) follows from Σ (xt − x̄) = 0 and therefore also ȳ Σ (xt − x̄) = 0. Alternative proof, using matrix notation and the matrix D defined in Problem 189: (18.2.22) is x⊤D⊤Dy
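A quick numerical check that the two slope formulas agree (the data are random illustration values; (24.3.1) is written here as reconstructed above):

    import numpy as np

    # Both formulas divide by sum((x - xbar)^2); the numerators differ only by
    # ybar * sum(x - xbar), which is zero.
    rng = np.random.default_rng(8)
    x = rng.normal(size=30)
    y = 1.5 + 0.7 * x + rng.normal(size=30)
    dx = x - x.mean()

    slope_a = dx @ y / (dx @ dx)                 # version (24.3.1) as reconstructed
    slope_b = dx @ (y - y.mean()) / (dx @ dx)    # version (18.2.22)
    print(slope_a, slope_b)                      # identical up to rounding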


• c. 2 points. Show that cov[β̂, ȳ] = 0.

Answer. This is a special case of Problem 301, but it can be easily shown here separately:

cov[β̂, ȳ] = cov[ Σs (xs − x̄) ys / Σt (xt − x̄)², (1/n) Σj yj ] = (1 / (n Σt (xt − x̄)²)) Σs Σj (xs − x̄) cov[ys, yj] = (σ² / (n Σt (xt − x̄)²)) Σs (xs − x̄) = 0.


• a. 1 point. Is β̃ an unbiased estimator of β? (Proof is required.)

Answer. First derive a nice expression for β̃ − β:

β̃ − β = Σ xi yi / Σ xi² − β Σ xi² / Σ xi² = Σ xi (yi − β xi) / Σ xi² = Σ xi εi / Σ xi²,

so that E[β̃ − β] = Σ xi E[εi] / Σ xi² = 0 since E[εi] = 0.



• b. 2 points. Derive the variance of β̃. (Show your work.)


Problem 305. We still assume (24.3.5) is the true model. Consider an alternative estimator:


Answer. One can argue it: β̂ is unbiased for model (18.2.15) whatever the value of α or β, therefore also when α = 0, i.e., when the model is (24.3.5). But here is the pedestrian way:

β̂ = Σ (xi − x̄) yi / Σ (xi − x̄)² = β Σ (xi − x̄) xi / Σ (xi − x̄)² + Σ (xi − x̄) εi / Σ (xi − x̄)² = β + Σ (xi − x̄) εi / Σ (xi − x̄)², since Σ (xi − x̄) xi = Σ (xi − x̄)².

E[β̂] = E[β] + E[Σ (xi − x̄) εi / Σ (xi − x̄)²] = β + Σ (xi − x̄) E[εi] / Σ (xi − x̄)² = β since E[εi] = 0 for all i, i.e., β̂ is unbiased.



• b. 2 points. Derive the variance of β̂ if (24.3.5) is the true model.


Answer. One can again argue it: since the formula for var[β̂] does not depend on what the true value of α is, it is the same formula:

(24.3.11) var[β̂] = σ² / Σ (xi − x̄)²



• c. 1 point. Still assuming (24.3.5) is the true model, would you prefer β̂ or the β̃ from Problem 304 as an estimator of β?

Answer. Since β̃ and β̂ are both unbiased estimators, if (24.3.5) is the true model, the preferred estimator is the one with the smaller variance. As I will show, var[β̃] ≤ var[β̂] and, therefore, β̃ is preferred to β̂. To show

(24.3.12) var[β̂] = σ² / Σ (xi − x̄)² ≥ σ² / Σ xi² = var[β̃]

one must show

(24.3.13) Σ (xi − x̄)² ≤ Σ xi²

which is a simple consequence of (12.1.1). Thus var[β̂] ≥ var[β̃]; the variances are equal only if x̄ = 0,
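A simulation sketch of this comparison under the no-intercept model (x, β, σ are made up; note x̄ ≠ 0, so the inequality is strict):

    import numpy as np

    # Under y_i = beta*x_i + eps_i, compare the simulated variances of
    # beta_tilde = sum(x*y)/sum(x^2) and beta_hat = sum((x-xbar)*y)/sum((x-xbar)^2).
    rng = np.random.default_rng(9)
    x = rng.normal(loc=1.0, size=25)
    beta, sigma, trials = 0.7, 1.0, 100_000

    eps = rng.normal(scale=sigma, size=(trials, x.size))
    y = beta * x + eps
    dx = x - x.mean()

    beta_tilde = y @ x / (x @ x)
    beta_hat = y @ dx / (dx @ dx)
    print(beta_tilde.var(), sigma ** 2 / (x @ x))        # simulated vs. sigma^2/sum(x^2)
    print(beta_hat.var(),   sigma ** 2 / (dx @ dx))      # simulated vs. sigma^2/sum((x-xbar)^2)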

is generally a biased estimator of β. Show that its bias is

E[β̃ − β] = α n x̄ / Σ xi²


Answer. In situations like this it is always worthwhile to get a nice simple expression for the sampling error:

β̃ − β = Σ xi yi / Σ xi² − β (24.3.15)
= α Σ xi / Σ xi² + β Σ xi² / Σ xi² + Σ xi εi / Σ xi² − β (substituting yi = α + β xi + εi) (24.3.17)
= α Σ xi / Σ xi² + Σ xi εi / Σ xi² (24.3.18)

Therefore

E[β̃ − β] = E[α Σ xi / Σ xi²] + E[Σ xi εi / Σ xi²] (24.3.19)
= α Σ xi / Σ xi² + Σ xi E[εi] / Σ xi² = α Σ xi / Σ xi² + 0 = α n x̄ / Σ xi² (24.3.21)
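A simulation sketch of this bias formula (α, β, σ, and x are illustration values):

    import numpy as np

    # When the true model has an intercept, the simulated bias of
    # beta_tilde = sum(x*y)/sum(x^2) should approach alpha*n*xbar/sum(x^2).
    rng = np.random.default_rng(10)
    x = rng.normal(loc=2.0, size=20)
    alpha, beta, sigma, trials = 1.5, 0.7, 1.0, 100_000

    y = alpha + beta * x + rng.normal(scale=sigma, size=(trials, x.size))
    beta_tilde = y @ x / (x @ x)

    print(beta_tilde.mean() - beta)                      # simulated bias
    print(alpha * x.size * x.mean() / (x @ x))           # alpha*n*xbar/sum(x^2)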


• b. 2 points. Compute var[β̃]. Is it greater or smaller than the variance of the OLS estimator β̂?

Answer.

var[β̃] = (1 / Σ xi²)² var[Σ xi yi] (24.3.24)
= (1 / Σ xi²)² Σ xi² var[yi] (24.3.25)
= σ² (1 / Σ xi²)² Σ xi², since all yi are uncorrelated and have equal variance σ² (24.3.26)
= σ² / Σ xi² (24.3.27)

This variance is smaller or equal because Σ xi² ≥ Σ (xi − x̄)².


• c. 5 points. Show that the MSE of β̃ is smaller than that of the OLS estimator if and only if the unknown true parameters α and σ² satisfy the inequality

α² < σ² (1/n + x̄² / Σ (xi − x̄)²)

Answer. The MSE of β̃ is its variance plus its squared bias:

MSE[β̃; β] = σ² / Σ xi² + (α n x̄ / Σ xi²)², while MSE[β̂; β] = σ² / Σ (xi − x̄)².

Hence MSE[β̃; β] < MSE[β̂; β] if and only if

(α n x̄ / Σ xi²)² < σ² / Σ (xi − x̄)² − σ² / Σ xi² = σ² (Σ xi² − Σ (xi − x̄)²) / (Σ (xi − x̄)² Σ xi²) = σ² n x̄² / (Σ (xi − x̄)² Σ xi²),

which (assuming x̄ ≠ 0) simplifies to α² n / Σ xi² < σ² / Σ (xi − x̄)², i.e., using Σ xi² = Σ (xi − x̄)² + n x̄²,

α² < σ² (1/n + x̄² / Σ (xi − x̄)²).
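A numerical sketch of the comparison, checking that the sign of MSE[β̃] − MSE[β̂] flips at the boundary of the inequality as reconstructed above (x and σ are made up):

    import numpy as np

    # The boundary value alpha_star solves alpha^2 = sigma^2*(1/n + xbar^2/sum((x-xbar)^2)).
    rng = np.random.default_rng(11)
    x = rng.normal(loc=1.0, size=15)
    n, sigma = x.size, 1.0
    xbar = x.mean()
    dx2, sx2 = ((x - xbar) ** 2).sum(), (x ** 2).sum()

    alpha_star = sigma * np.sqrt(1 / n + xbar ** 2 / dx2)
    for alpha in [0.5 * alpha_star, alpha_star, 2.0 * alpha_star]:
        mse_tilde = sigma ** 2 / sx2 + (alpha * n * xbar / sx2) ** 2
        mse_hat = sigma ** 2 / dx2
        print(round(alpha, 3), round(mse_tilde - mse_hat, 6))   # negative, ~0, positive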


If α = 0 it has an F-distribution with 1 and n − 2 degrees of freedom. If α ≠ 0 it has what is called a noncentral distribution, and the only thing we needed to know so far was that it was likely to assume larger values than with α = 0. This is why a small value of that statistic supported the hypothesis that α = 0. But in the present case we are not testing whether α = 0 but whether the constrained MSE is better than the unconstrained. This is the case if the above inequality holds, the limiting case being that it is an equality. If it is an equality, then the above statistic has an F distribution with noncentrality parameter 1/2. (Here all we need to know is that if z ∼ N(µ, 1) then z² ∼ χ² with noncentrality parameter µ²/2. A noncentral F has a noncentral χ² in the numerator and a central one in the denominator.) The testing principle is therefore: compare the observed value with the upper α point of an F distribution with noncentrality parameter 1/2. This gives higher critical values than testing for α = 0; i.e., one may reject that α = 0 but not reject that the MSE of the constrained estimator is larger. This is as it should be. Compare [Gre97, 8.5.1 pp. 405–408] on

From the Gauss-Markov theorem follows that for every nonrandom matrix R, the BLUE of φ = Rβ is φ̂ = Rβ̂. Furthermore, the best linear unbiased predictor (BLUP) of ε = y − Xβ is the vector of residuals ε̂ = y − Xβ̂.

Problem 307. Let ε̃ = Ay be a linear predictor of the disturbance vector ε in the model y = Xβ + ε with ε ∼ (o, σ²I).

• a. 2 points. Show that ε̃ is unbiased, i.e., E[ε̃ − ε] = o, regardless of the value of β, if and only if A satisfies AX = O.
