Class Notes in Statistics and Econometrics, Part 7


Estimation Principles and Classification of Estimators

13.1 Asymptotic or Large-Sample Properties of Estimators

We will discuss asymptotic properties first, because the idea of estimation is to get more certainty by increasing the sample size.

Strictly speaking, asymptotic properties do not refer to individual estimators but to sequences of estimators, one for each sample size n. And strictly speaking, if one alters the first 10 estimators or the first million estimators and leaves the others unchanged, one still gets a sequence with the same asymptotic properties. The results that follow should therefore be used with caution. The asymptotic properties may say very little about the concrete estimator at hand.

The most basic asymptotic property is (weak) consistency. An estimator t_n (where n is the sample size) of the parameter θ is consistent iff for every ε > 0, Pr[|t_n − θ| ≥ ε] → 0 as n → ∞.

Consistency is not always attainable: if additional data no longer give information, as when estimating the initial state of a time series, or in prediction, no consistent estimator exists; and if there is no identification, the parameter value can at best be confined to an interval.

The following is an important property of consistent estimators:

Slutsky theorem: If t is a consistent estimator for θ, and the function g is continuous at the true value of θ, then g(t) is consistent for g(θ).

For the proof of the Slutsky theorem remember the definition of a continuous function: g is continuous at θ iff for all ε > 0 there exists a δ > 0 with the property that for all θ′ with |θ′ − θ| < δ follows |g(θ′) − g(θ)| < ε. To prove consistency of g(t) we have to show that for all ε > 0, Pr[|g(t) − g(θ)| ≥ ε] → 0. Choose for the given ε a δ as above; then |g(t) − g(θ)| ≥ ε implies |t − θ| ≥ δ, because all those values of t with |t − θ| < δ lead to a g(t) with |g(t) − g(θ)| < ε. This logical implication means that Pr[|g(t) − g(θ)| ≥ ε] ≤ Pr[|t − θ| ≥ δ], and the right-hand side converges to zero since t is consistent.
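This argument is easy to check by simulation. The following Python sketch is our own illustration, not part of the notes; the choices of N(θ, 1) data, θ = 2, g = exp, and ε = 0.1 are arbitrary. Both exceedance probabilities shrink toward zero as n grows, as consistency and the Slutsky theorem predict.

import numpy as np

rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.1, 2000
for n in [10, 100, 1000, 10000]:
    t = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)   # consistent estimator of theta
    p_t = np.mean(np.abs(t - theta) >= eps)                   # Pr[|t_n - theta| >= eps]
    p_g = np.mean(np.abs(np.exp(t) - np.exp(theta)) >= eps)   # Pr[|g(t_n) - g(theta)| >= eps]
    print(f"n={n:6d}  Pr[|t-theta|>=eps]={p_t:.3f}  Pr[|exp(t)-exp(theta)|>=eps]={p_g:.3f}")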

Here are the details: Most consistent estimators we will encounter are asymptotically normal, i.e., the "shape" of their distribution function converges towards the normal distribution, as we had it for the sample mean in the central limit theorem. In order to be able to use this asymptotic distribution for significance tests and confidence intervals, however, one needs more than asymptotic normality (and many textbooks are not aware of this): one needs the convergence to normality to be uniform in compact intervals [Rao73, pp. 346–351]. Such estimators are called consistent uniformly asymptotically normal estimators (CUAN estimators).

If one limits oneself to CUAN estimators it can be shown that there are asymptotically "best" CUAN estimators. Since the distribution is asymptotically normal, there is no problem to define what it means to be asymptotically best: those estimators are asymptotically best whose asymptotic MSE = asymptotic variance is smallest. CUAN estimators whose MSE is asymptotically no larger than that of any other CUAN estimator are called asymptotically efficient. Rao has shown that for CUAN estimators the lower bound for this asymptotic variance is the asymptotic limit of the Cramer-Rao lower bound (CRLB). (More about the CRLB below.) Maximum likelihood estimators are therefore usually efficient CUAN estimators. In this sense one can think of maximum likelihood estimators as something like asymptotically best consistent estimators; compare a statement to this effect in [Ame94, p. 144]. And one can think of asymptotically efficient CUAN estimators as estimators which are in large samples as good as maximum likelihood estimators.

All these are large sample properties. Among the asymptotically efficient estimators there are still wide differences regarding the small sample properties. Asymptotic efficiency should therefore again be considered a minimum requirement: there must be very good reasons not to be working with an asymptotically efficient estimator.

Problem 195. Can you think of situations in which an estimator is acceptable which is not asymptotically efficient?

Answer. If robustness matters then the median may be preferable to the mean, although it is not asymptotically efficient.

13.2 Small Sample Properties

In order to judge how good an estimator is for small samples, one has two dilemmas: (1) there are many different criteria for an estimator to be "good"; (2) even if one has decided on one criterion, a given estimator may be good for some values of the unknown parameters and not so good for others.

If x and y are two estimators of the parameter θ, then each of the following conditions can be interpreted to mean that x is better than y:

(13.2.1)  Pr[|x − θ| ≤ |y − θ|] = 1

E[g(|x − θ|)] ≤ E[g(|y − θ|)]  for every continuous and nondecreasing function g

(13.2.4)  Pr[|x − θ| > ε] ≤ Pr[|y − θ| > ε]  for every ε

(13.2.5)  E[(x − θ)²] ≤ E[(y − θ)²]

(13.2.6)  Pr[|x − θ| < |y − θ|] ≥ Pr[|x − θ| > |y − θ|]

This list is from [Ame94, pp. 118–122]. But we will simply use the MSE.

Therefore we are left with dilemma (2). There is no single estimator that has uniformly the smallest MSE, in the sense that its MSE is better than the MSE of any other estimator whatever the value of the parameter. To see this, simply think of the following estimator t of θ: t = 10; i.e., whatever the outcome of the experiments, t always takes the value 10. This estimator has zero MSE when θ happens to be 10, but is a bad estimator when θ is far away from 10. If an estimator existed which had uniformly best MSE, it would have to be better than all the constant estimators, i.e., have zero MSE whatever the value of the parameter, and this is only possible if the parameter itself is observed.
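As a small illustration (ours, not from the notes), the following Python snippet compares the constant estimator t = 10 with the sample mean of n = 25 draws from N(θ, 1) at a few values of θ: the constant wins only when θ happens to be 10.

import numpy as np

rng = np.random.default_rng(1)
n, reps = 25, 5000
for theta in [8.0, 10.0, 12.0]:
    ybar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    mse_mean = np.mean((ybar - theta) ** 2)      # about 1/n, whatever theta is
    mse_const = (10.0 - theta) ** 2              # zero at theta = 10, large elsewhere
    print(f"theta={theta:5.1f}  MSE[sample mean]={mse_mean:.4f}  MSE[t=10]={mse_const:.1f}")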

Although the MSE criterion cannot be used to pick one best estimator, it can be used to rule out estimators which are unnecessarily bad in the sense that other estimators exist which are never worse but sometimes better in terms of MSE whatever the true parameter values. Estimators which are dominated in this sense are called inadmissible.

But how can one choose between two admissible estimators? [Ame94, p. 124] gives two reasonable strategies. One is to integrate the MSE out over a distribution of the likely values of the parameter. This is in the spirit of the Bayesians, although Bayesians would still do it differently. The other strategy is to choose a minimax strategy. Amemiya seems to consider this an alright strategy, but it is really too defensive. Here is a third strategy, which is often used but less well founded theoretically: since there are no estimators which have minimum MSE among all estimators, one often looks for estimators which have minimum MSE among all estimators with a certain property. And the "certain property" which is most often used is unbiasedness. The MSE of an unbiased estimator is its variance; and an estimator which has minimum variance in the class of all unbiased estimators is called "efficient."

The class of unbiased estimators has a high-sounding name, and the results related with Cramer-Rao and Least Squares seem to confirm that it is an important class of estimators. However I will argue in these class notes that unbiasedness itself is not a desirable property.

13.3 Comparison Unbiasedness Consistency

Let us compare consistency with unbiasedness. If the estimator is unbiased, then its expected value for any sample size, whether large or small, is equal to the true parameter value. By the law of large numbers this can be translated into a statement about large samples: the mean of many independent replications of the estimate, even if each replication only uses a small number of observations, gives the true parameter value. Unbiasedness says therefore something about the small sample properties of the estimator, while consistency does not.

The following thought experiment may clarify the difference between unbiasedness and consistency. Imagine you are conducting an experiment which gives you every ten seconds an independent measurement, i.e., a measurement whose value is not influenced by the outcome of previous measurements. Imagine further that the experimental setup is connected to a computer which estimates certain parameters of that experiment, re-calculating its estimate every time twenty new observations have become available, and which displays the current values of the estimate on a screen. And assume that the estimation procedure used by the computer is consistent, but biased for any finite number of observations.

Consistency means: after a sufficiently long time, the digits of the parameter estimate displayed by the computer will be correct. That the estimator is biased means: if the computer were to use every batch of 20 observations to form a new estimate of the parameter, without utilizing prior observations, and then would use the average of all these independent estimates as its updated estimate, it would end up displaying a wrong parameter value on the screen.

A biased estimator gives, even in the limit, an incorrect result as long as one's updating procedure is simply taking the average of all previous estimates. If an estimator is biased but consistent, then a better updating method is available, which will end up at the correct parameter value. A biased estimator therefore is not necessarily one which gives incorrect information about the parameter value; but it is one which one cannot update by simply taking averages. But there is no reason to limit oneself to such a crude method of updating. Obviously the question whether the estimate is biased is of little relevance, as long as it is consistent. The moral of the story is: if one looks for desirable estimators, by no means should one restrict one's search to unbiased estimators! The high-sounding name "unbiased" for the technical property E[t] = θ has created a lot of confusion.

Besides having no advantages, the category of unbiasedness even has some inconvenient properties: in some cases, in which consistent estimators exist, there are no unbiased estimators. And if an estimator t is an unbiased estimate for the parameter θ, then the estimator g(t) is usually no longer an unbiased estimator for g(θ). It depends on the way a certain quantity is measured whether the estimator is unbiased or not. However consistency carries over.

Unbiasedness is not the only possible criterion which ensures that the values of the estimator are centered over the value it estimates. Here is another plausible definition:

Definition 13.3.1. An estimator θ̂ of the scalar θ is called median unbiased for all θ ∈ Θ iff

(13.3.1)  Pr[θ̂ < θ] = Pr[θ̂ > θ] = 1/2.

This concept is always applicable, even for estimators whose expected value does not exist.

Problem 196. 6 points (Not eligible for in-class exams). The purpose of the following problem is to show how restrictive the requirement of unbiasedness is. Sometimes no unbiased estimators exist, and sometimes, as in the example here, unbiasedness leads to absurd estimators. Assume the random variable x has the geometric distribution with parameter p, where 0 ≤ p ≤ 1. In other words, it can only assume the integer values 1, 2, 3, …, with probabilities

(13.3.2)  Pr[x = r] = (1 − p)^{r−1} p.

Show that the unique unbiased estimator of p on the basis of one observation of x is the random variable f(x) defined by f(x) = 1 if x = 1 and 0 otherwise. Hint: Use the mathematical fact that a function φ(q) that can be expressed as a power series φ(q) = Σ_{j=0}^∞ a_j q^j, and which takes the value φ(q) = 1 for all q in some interval of nonzero length, is the power series with a_0 = 1 and a_j = 0 for j ≠ 0. (You will need the hint at the end of your answer, don't try to start with the hint!)

Answer. Unbiasedness means that E[f(x)] = Σ_{r=1}^∞ f(r)(1 − p)^{r−1} p = p for all p in the unit interval, therefore Σ_{r=1}^∞ f(r)(1 − p)^{r−1} = 1. This is a power series in q = 1 − p, which must be identically equal to 1 for all values of q between 0 and 1. An application of the hint shows that the constant term in this power series, corresponding to the value r − 1 = 0, must be = 1, and all other f(r) = 0. Here is an older formulation: an application of the hint with q = 1 − p, j = r − 1, and a_j = f(j + 1) gives f(1) = 1 and all other f(r) = 0. This estimator is absurd since it always lies on the boundary of the range of possible values of p.
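A quick numerical check (our own; numpy's geometric generator uses the same support 1, 2, 3, … as (13.3.2)) confirms that f(x) = 1 if x = 1 and 0 otherwise has expected value p, while only ever reporting the absurd values 0 or 1.

import numpy as np

rng = np.random.default_rng(2)
for p in [0.2, 0.5, 0.8]:
    x = rng.geometric(p, size=200_000)      # Pr[x = r] = (1 - p)**(r - 1) * p, r = 1, 2, 3, ...
    f = (x == 1).astype(float)              # the unique unbiased estimator of p
    print(f"p={p}:  E[f(x)] is approximately {f.mean():.4f}; f only takes the values 0 and 1")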

Problem 197. As in Question 61, you make two independent trials of a Bernoulli experiment with success probability θ, and you observe t, the number of successes.

• a. Give an unbiased estimator of θ based on t (i.e., which is a function of t).

• b. Give an unbiased estimator of θ².

• c. Show that there is no unbiased estimator of θ³.

Hint: Since t can only take the three values 0, 1, and 2, any estimator u which is a function of t is determined by the values it takes when t is 0, 1, or 2; call them u_0, u_1, and u_2. Express E[u] as a function of u_0, u_1, and u_2.

Answer. E[u] = u_0 (1 − θ)² + 2u_1 θ(1 − θ) + u_2 θ² = u_0 + (2u_1 − 2u_0)θ + (u_0 − 2u_1 + u_2)θ². This is always a second degree polynomial in θ, therefore whatever is not a second degree polynomial in θ cannot be the expected value of any function of t. For E[u] = θ we need u_0 = 0, 2u_1 − 2u_0 = 2u_1 = 1, therefore u_1 = 0.5, and u_0 − 2u_1 + u_2 = −1 + u_2 = 0, i.e., u_2 = 1. This is, in other words, u = t/2. For E[u] = θ² we need u_0 = 0, 2u_1 − 2u_0 = 2u_1 = 0, therefore u_1 = 0, and u_0 − 2u_1 + u_2 = u_2 = 1. This is, in other words, u = t(t − 1)/2. From this equation one also sees that θ³ and higher powers, or things like 1/θ, cannot be the expected values of any estimators.

• d. Compute the moment generating function of t.

Answer.

(13.3.3)  E[e^{λt}] = e^0 · (1 − θ)² + e^λ · 2θ(1 − θ) + e^{2λ} · θ² = (1 − θ + θ e^λ)²
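The answers to parts a, b and d can be verified exactly by summing over the three possible values of t. The following short Python check is ours, not part of the notes; θ = 0.3 and λ = 0.7 are arbitrary.

import numpy as np

theta, lam = 0.3, 0.7
probs = np.array([(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2])   # Pr[t = 0, 1, 2]
t = np.array([0.0, 1.0, 2.0])
print(np.dot(probs, t / 2), "should equal", theta)                     # E[t/2] = theta
print(np.dot(probs, t * (t - 1) / 2), "should equal", theta ** 2)      # E[t(t-1)/2] = theta^2
print(np.dot(probs, np.exp(lam * t)), "should equal",
      (1 - theta + theta * np.exp(lam)) ** 2)                          # MGF (13.3.3)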



Problem 198. This is [KS79, Question 17.11 on p. 34], originally [Fis, p. 700].

• a. 1 point. Assume t and u are two unbiased estimators of the same unknown scalar nonrandom parameter θ. t and u have finite variances and satisfy var[u − t] ≠ 0. Show that a linear combination of t and u, i.e., an estimator of θ which can be written in the form αt + βu, is unbiased if and only if α = 1 − β. In other words, any unbiased estimator which is a linear combination of t and u can be written in the form

(13.3.4)  t + β(u − t).

• b. 2 points. By solving the first order condition show that the unbiased linear combination of t and u which has lowest MSE is

(13.3.5)  θ̂ = t − (cov[t, u − t] / var[u − t]) (u − t).

Hint: your arithmetic will be simplest if you start with (13.3.4).

• c. 1 point. If ρ² is the squared correlation coefficient between t and u − t, i.e.,

ρ² = (cov[t, u − t])² / (var[t] var[u − t]),

show that var[θ̂] = var[t](1 − ρ²).

• d. 1 point. Show that cov[t, u − t] ≠ 0 implies var[u − t] ≠ 0.

• e. 2 points. Use (13.3.5) to show that if t is the minimum MSE unbiased estimator of θ, and u another unbiased estimator of θ, then cov[t, u − t] = 0.

• f. 1 point. Use (13.3.5) to show also the opposite: if t is an unbiased estimator of θ with the property that cov[t, u − t] = 0 for every other unbiased estimator u of θ, then t has minimum MSE among all unbiased estimators of θ.
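A Monte Carlo sketch (our construction, not from [KS79]) of parts b and c: draw a pair (t, u) of correlated unbiased estimators, compute β from the first order condition, and compare the variance of t + β(u − t) at that β with other choices of β and with var[t](1 − ρ²).

import numpy as np

rng = np.random.default_rng(3)
theta = 5.0
cov = np.array([[1.0, 0.6], [0.6, 2.0]])        # assumed covariance matrix of (t, u)
t, u = rng.multivariate_normal([theta, theta], cov, size=200_000).T
d = u - t
beta_star = -np.cov(t, d)[0, 1] / np.var(d)     # first order condition, as in (13.3.5)
rho2 = np.cov(t, d)[0, 1] ** 2 / (np.var(t) * np.var(d))
print("variance at beta*      :", np.var(t + beta_star * d))
print("var[t](1 - rho^2)      :", np.var(t) * (1 - rho2))
for beta in [0.0, 0.5, -0.5]:                   # any other unbiased combination does worse
    print(f"variance at beta={beta:+.1f}  :", np.var(t + beta * d))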

There are estimators which are consistent but their bias does not converge to zero. For example, let

θ̂_n = θ with probability 1 − 1/n,  and  θ̂_n = n with probability 1/n.

Then Pr[|θ̂_n − θ| ≥ ε] ≤ 1/n → 0, i.e., the estimator is consistent, but E[θ̂_n] = θ(n − 1)/n + n·(1/n) → θ + 1 ≠ θ.

And of course there are estimators which are unbiased but not consistent: simply take the first observation x_1 as an estimator of E[x] and ignore all the other observations.
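Both halves of this remark can be seen in a short simulation (ours, not in the notes): the estimator that equals θ with probability 1 − 1/n and n with probability 1/n is consistent, yet its expectation approaches θ + 1.

import numpy as np

rng = np.random.default_rng(4)
theta, reps, eps = 3.0, 400_000, 0.01
for n in [10, 100, 1000]:
    hit = rng.random(reps) < 1.0 / n
    est = np.where(hit, float(n), theta)
    p_far = np.mean(np.abs(est - theta) >= eps)     # about 1/n, so it goes to zero
    print(f"n={n:5d}  Pr[|est-theta|>=eps]={p_far:.4f}  E[est]={est.mean():.3f}")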

13.4 The Cramer-Rao Lower Bound

Take a scalar random variable y with density function f_y. The entropy of y, if it exists, is H[y] = −E[log(f_y(y))]. This is the continuous equivalent of (3.11.2). The entropy is the measure of the amount of randomness in this variable. If there is little information and much noise in this variable, the entropy is high.

Now let y ↦ g(y) be the density function of a different random variable x. In other words, g is some function which satisfies g(y) ≥ 0 for all y, and ∫_{−∞}^{+∞} g(y) dy = 1. Equation (3.11.10) with v = g(y) and w = f_y(y) gives

(13.4.1)  f_y(y) − f_y(y) log f_y(y) ≤ g(y) − f_y(y) log g(y).

This holds for every value y, and integrating over y gives 1 − E[log f_y(y)] ≤ 1 − E[log g(y)], or

(13.4.2)  E[log f_y(y)] ≥ E[log g(y)].

This is an important extremal value property which distinguishes the density function f_y(y) of y from all other density functions: that density function g which maximizes E[log g(y)] is g = f_y, the true density function of y.

This optimality property lies at the basis of the Cramer-Rao inequality, and it is also the reason why maximum likelihood estimation is so good. The difference between the left and right hand side in (13.4.2) is called the Kullback-Leibler discrepancy between the random variables y and x (where x is a random variable whose density is g).
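The extremal property (13.4.2) is easy to see numerically. In the following sketch (ours; the true density N(0, 1) and the candidate densities N(m, 1) are arbitrary choices), E[log g(y)] is largest when the candidate mean m equals the true mean, and the shortfall for other m is the Kullback-Leibler discrepancy.

import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(0.0, 1.0, size=500_000)          # true density: N(0, 1)

def log_normal_density(x, mean):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (x - mean) ** 2

for m in [0.0, 0.5, 1.0, 2.0]:
    print(f"g = N({m}, 1):  E[log g(y)] is approximately {log_normal_density(y, m).mean():+.4f}")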

The Cramer-Rao inequality gives a lower bound for the MSE of an unbiased estimator of the parameter of a probability distribution (which has to satisfy certain regularity conditions). This allows one to determine whether a given unbiased estimator has an MSE as low as any other unbiased estimator (i.e., whether it is "efficient").

Problem 200. Assume the density function of y depends on a parameter θ, write it f_y(y; θ), and θ◦ is the true value of θ. In this problem we will compare the expected value of y and of functions of y with what would be their expected value if the true parameter value were not θ◦ but would take some other value θ. If the random variable t is a function of y, we write E_θ[t] for what would be the expected value of t if the true value of the parameter were θ instead of θ◦. Occasionally, we will use the subscript ◦ as in E◦ to indicate that we are dealing here with the usual case in which the expected value is taken with respect to the true parameter value θ◦. Instead of E◦ one usually simply writes E, since it is usually self-understood that one has to plug the right parameter values into the density function if one takes expected values. The subscript ◦ is necessary here only because in the present problem we sometimes take expected values with respect to the "wrong" parameter values. The same notational convention also applies to variances, covariances, and the MSE.

Throughout this problem we assume that the following regularity conditions hold: (a) the range of y is independent of θ, and (b) the derivative of the density function with respect to θ is a continuously differentiable function of θ. These regularity conditions ensure that one can differentiate under the integral sign, i.e., for all functions t(y),

∫ t(y) ∂f_y(y; θ)/∂θ dy = ∂/∂θ ∫ t(y) f_y(y; θ) dy.

• a. 1 point. The score is defined as the random variable

q(y; θ) = ∂/∂θ log f_y(y; θ).

In other words, we do three things to the density function: take its logarithm, then take the derivative of this logarithm with respect to the parameter, and then plug the random variable into it. This gives us a random variable which also depends on the nonrandom parameter θ. Show that the score can also be written as

q(y; θ) = (1 / f_y(y; θ)) · ∂f_y(y; θ)/∂θ.

Answer. This is the chain rule for differentiation: for any differentiable function g(θ), ∂/∂θ log g(θ) = (1/g(θ)) ∂g(θ)/∂θ.

• b. If the density function of y belongs to an exponential dispersion family, i.e., it can be written as

(13.4.8)  f_y(y; θ, ψ) = exp( (yθ − b(θ)) / a(ψ) + c(y, ψ) ),

then

(13.4.9)  ∂ log f_y(y; θ, ψ)/∂θ = (y − ∂b(θ)/∂θ) / a(ψ).

• c. 3 points. If f_y(y; θ◦) is the true density function of y, then we know from (13.4.2) that E◦[log f_y(y; θ◦)] ≥ E◦[log f_y(y; θ)] for all θ. This explains why the score is so important: it is the derivative of that function whose expected value is maximized if the true parameter is plugged into the density function. The first-order conditions in this situation read: the expected value of this derivative must be zero for the true parameter value. This is the next thing you are asked to show: if θ◦ is the true parameter value, show that E◦[q(y; θ◦)] = 0.

Answer. First write, for general θ,

E◦[q(y; θ)] = ∫ q(y; θ) f_y(y; θ◦) dy = ∫ (1/f_y(y; θ)) (∂f_y(y; θ)/∂θ) f_y(y; θ◦) dy.

For θ = θ◦ this simplifies:

(13.4.11)  E◦[q(y; θ◦)] = ∫ ∂f_y(y; θ)/∂θ |_{θ=θ◦} dy = ∂/∂θ ∫ f_y(y; θ) dy |_{θ=θ◦} = ∂/∂θ 1 = 0.

Here I am writing ∂f_y(y; θ)/∂θ |_{θ=θ◦} instead of the simpler notation ∂f_y(y; θ◦)/∂θ, in order to emphasize that one first has to take a derivative with respect to θ and then one plugs θ◦ into that derivative.
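A one-line Monte Carlo check (ours, not in the notes) of this first-order condition, with y ∼ N(µ°, 1) so that the score with respect to µ is simply y − µ: its average is near zero only when µ is the true value.

import numpy as np

rng = np.random.default_rng(6)
mu_true = 1.5
y = rng.normal(mu_true, 1.0, size=500_000)
for mu in [0.5, 1.0, 1.5, 2.0]:
    print(f"mu = {mu}:  average score {np.mean(y - mu):+.4f}")   # near 0 only at mu = 1.5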

• d. Show that, in the case of the exponential dispersion family,

E◦[y] = ∂b(θ)/∂θ |_{θ=θ◦}.

Answer. Follows from the fact that the score function of the exponential dispersion family, (13.4.9), has zero expected value.

• e. 5 points. If we differentiate the score, we obtain the Hessian

h(y; θ) = ∂²/(∂θ)² log f_y(y; θ).

From now on we will write the score function as q(θ) instead of q(y; θ); i.e., we will no longer make it explicit that q is a function of y but write it as a random variable which depends on the parameter θ. We also suppress the dependence of h on y; our notation h(θ) is short for h(y; θ). Since there is only one parameter in the density function, score and Hessian are scalars; but in the general case, the score is a vector and the Hessian a matrix. Show that, for the true parameter value θ◦, the negative of the expected value of the Hessian equals the variance of the score, i.e., the expected value of the square of the score:

(13.4.14)  −E◦[h(θ◦)] = E◦[q²(θ◦)] = var◦[q(θ◦)].

Answer. Write the score as q(y; θ) = (1/f_y(y; θ)) ∂f_y(y; θ)/∂θ and differentiate the rightmost expression one more time:

(13.4.17)  h(y; θ) = −q²(y; θ) + (1/f_y(y; θ)) ∂²f_y(y; θ)/∂θ².

Taking expectations we get

E◦[h(y; θ◦)] = −E◦[q²(y; θ◦)] + ∫ ∂²f_y(y; θ)/∂θ² |_{θ=θ◦} dy = −E◦[q²(y; θ◦)],

since, by the regularity conditions, ∫ ∂²f_y(y; θ)/∂θ² dy = ∂²/∂θ² ∫ f_y(y; θ) dy = ∂²/∂θ² 1 = 0.

Answer. Differentiation of (13.4.9) gives h(θ) = −(∂²b(θ)/∂θ²) · (1/a(ψ)). This is constant (it does not depend on y) and therefore equal to its own expected value. (13.4.14) says therefore

(∂²b(θ)/∂θ²)|_{θ=θ◦} · (1/a(ψ)) = E◦[q²(θ◦)] = (1/a(ψ)²) var◦[y],

so that var◦[y] = a(ψ) · (∂²b(θ)/∂θ²)|_{θ=θ◦}.
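The identity (13.4.14) can be checked by simulation. The sketch below (ours) uses a Poisson(λ) observation, for which the score with respect to λ is x/λ − 1 and the second derivative is −x/λ², so both −E°[h] and E°[q²] should equal 1/λ.

import numpy as np

rng = np.random.default_rng(7)
lam = 2.5
x = rng.poisson(lam, size=1_000_000).astype(float)
q = x / lam - 1.0        # score of log p(x; lam) = x log(lam) - log(x!) - lam
h = -x / lam ** 2        # second derivative with respect to lam
print("E[q^2]  approx", np.mean(q ** 2))
print("-E[h]   approx", -np.mean(h))
print("1/lam   =", 1.0 / lam)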

Problem 201.

• a. Use the results from Question 200 to derive the following strange and interesting result: for any random variable t which is a function of y, i.e., t = t(y), follows cov◦[q(θ◦), t] = ∂/∂θ E_θ[t] |_{θ=θ◦}.

Answer. If the θ in q(θ) is the right parameter value θ◦, one can simplify. Since E◦[q(θ◦)] = 0,

(13.4.23)  cov◦[q(θ◦), t] = E◦[q(θ◦) t] = ∫ ∂f_y(y; θ)/∂θ |_{θ=θ◦} t(y) dy

(13.4.24)  = ∂/∂θ ∫ f_y(y; θ) t(y) dy |_{θ=θ◦} = ∂/∂θ E_θ[t] |_{θ=θ◦},

where the last step uses the regularity conditions to differentiate under the integral sign.

Now let t be an unbiased estimator of θ; then E_θ[t] = θ and therefore cov◦[q(θ◦), t] = 1. Since E◦[q(θ◦)] = 0, we know var◦[q(θ◦)] = E◦[q²(θ◦)], and since t is unbiased, we know var◦[t] = MSE◦[t; θ◦]. Therefore the Cauchy-Schwarz inequality cov◦[q(θ◦), t]² ≤ var◦[q(θ◦)] var◦[t] reads

(13.4.26)  MSE◦[t; θ◦] ≥ 1 / E◦[q²(θ◦)].

This is the Cramer-Rao inequality. The variance of q(θ◦), var◦[q(θ◦)] = E◦[q²(θ◦)], is called the Fisher information, written I(θ◦); its inverse 1/I(θ◦) is a lower bound for the MSE of any unbiased estimator of θ.

Because of (13.4.14), the Cramer-Rao inequality can also be written in the form

(13.4.27)  MSE[t; θ◦] ≥ −1 / E◦[h(θ◦)].

(13.4.26) and (13.4.27) are usually written in the following form: Assume y has density function f_y(y; θ) which depends on the unknown parameter θ, and let t(y) be any unbiased estimator of θ. Then

var[t] ≥ 1 / E[(∂/∂θ log f_y(y; θ))²] = −1 / E[∂²/∂θ² log f_y(y; θ)].

If one has a whole vector of observations, then the Cramer-Rao inequality involves the joint density function:

(13.4.29)  var[t] ≥ 1 / E[(∂/∂θ log f_y(y; θ))²] = −1 / E[∂²/∂θ² log f_y(y; θ)], where now y = (y_1, …, y_n) and f_y is its joint density.

This inequality also holds if y is discrete and one uses its probability mass function instead of the density function. In small samples, this lower bound is not always attainable; in some cases there is no unbiased estimator with a variance as low as the Cramer-Rao lower bound.

Problem 202. 4 points. Assume n independent observations of a variable y ∼ N(µ, σ²) are available, where σ² is known. Show that the sample mean ȳ attains the Cramer-Rao lower bound for µ.

Answer. The density function of each y_i is

f_{y_i}(y) = (2πσ²)^{−1/2} exp( −(y − µ)² / (2σ²) ),

therefore the log likelihood of the whole sample is

(13.4.31)  ℓ(y; µ) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (y_i − µ)²

and its derivative with respect to µ is

(13.4.32)  ∂/∂µ ℓ(y; µ) = (1/σ²) Σ_{i=1}^n (y_i − µ).

In order to apply (13.4.29) you can either square this and take the expected value; alternatively one may take one more derivative of (13.4.32) to get

(13.4.34)  ∂²/∂µ² ℓ(y; µ) = −n/σ².

This is constant, therefore equal to its expected value. Therefore the Cramer-Rao lower bound says that var[ȳ] ≥ σ²/n. This holds with equality.
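A simulation sketch (ours) of this result: with σ² known, the Monte Carlo variance of ȳ over many replications is close to σ²/n.

import numpy as np

rng = np.random.default_rng(8)
mu, sigma2, n, reps = 4.0, 9.0, 20, 200_000
ybar = rng.normal(mu, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
print("simulated var[ybar] :", ybar.var())
print("Cramer-Rao bound    :", sigma2 / n)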

Problem 203. Assume y_i ∼ NID(0, σ²) (i.e., normally independently distributed) with unknown σ². The obvious estimate of σ² is s² = (1/n) Σ_{i=1}^n y_i².

• a. 2 points. Show that s² is an unbiased estimator of σ², is distributed ∼ (σ²/n) χ²_n, and has variance 2σ⁴/n. You are allowed to use the fact that a χ²_n has variance 2n, which is equation (5.9.5).

Answer. Write

(13.4.37)  y_i = σ z_i with z_i ∼ NID(0, 1), so that
(13.4.38)  y_i² = σ² z_i²,
(13.4.39)  Σ_{i=1}^n y_i² = σ² Σ_{i=1}^n z_i² ∼ σ² χ²_n,
(13.4.40)  s² = (1/n) Σ_{i=1}^n y_i² ∼ (σ²/n) χ²_n.

Therefore E[s²] = (σ²/n) · n = σ², i.e., s² is unbiased, and var[(1/n) Σ_{i=1}^n y_i²] = (σ⁴/n²) var[χ²_n] = (σ⁴/n²) · 2n = 2σ⁴/n.

• b. 4 points. Show that this variance is at the same time the Cramer-Rao lower bound.

Answer. The log density of a single observation is log f_y(y; σ²) = −(1/2) log(2πσ²) − y²/(2σ²), therefore

∂/∂σ² log f_y(y; σ²) = −1/(2σ²) + y²/(2σ⁴).

One can either square this and take expected values; alternatively, one can differentiate one more time:

(13.4.46)  ∂²/(∂σ²)² log f_y(y; σ²) = −y²/σ⁶ + 1/(2σ⁴).

Taking expected values and using E[y²] = σ² gives

(13.4.47)  E[∂²/(∂σ²)² log f_y(y; σ²)] = −1/σ⁴ + 1/(2σ⁴) = −1/(2σ⁴),

therefore the Cramer-Rao lower bound for n observations is −1/(n E[∂²/(∂σ²)² log f_y(y; σ²)]) = 2σ⁴/n, which is exactly the variance of s² computed in part a.
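Again a small simulation (ours, not in the notes) confirms both parts of Problem 203: s² is centered at σ² and its Monte Carlo variance is close to the bound 2σ⁴/n.

import numpy as np

rng = np.random.default_rng(9)
sigma2, n, reps = 2.0, 30, 200_000
y = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = (y ** 2).mean(axis=1)
print("E[s^2]   approx", s2.mean(), "(should be", sigma2, ")")
print("var[s^2] approx", s2.var())
print("2 sigma^4 / n  =", 2 * sigma2 ** 2 / n)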

Problem 204. 4 points. Assume x_1, …, x_n is a random sample of independent observations of a Poisson distribution with parameter λ, i.e., each of the x_i has probability mass function

p_x(x; λ) = (λ^x / x!) e^{−λ},  x = 0, 1, 2, …

Is it possible to find an unbiased estimator of λ with a smaller variance than that of the sample mean x̄?

Here is a formulation of the Cramer-Rao inequality for probability mass functions, as you need it for Question 204. Assume y_1, …, y_n are n independent observations of a random variable y whose probability mass function depends on the unknown parameter θ and satisfies certain regularity conditions. Write the univariate probability mass function of each of the y_i as p_y(y; θ) and let t be any unbiased estimator of θ. Then

var[t] ≥ 1 / (n E[(∂/∂θ log p_y(y; θ))²]) = −1 / (n E[∂²/∂θ² log p_y(y; θ)]).

Answer. The Cramer-Rao lower bound says no. The log of the probability mass function is

(13.4.50)  log p_x(x; λ) = x log λ − log x! − λ,

therefore ∂/∂λ log p_x(x; λ) = x/λ − 1 and ∂²/∂λ² log p_x(x; λ) = −x/λ². Since E[x] = λ, this gives E[∂²/∂λ² log p_x(x; λ)] = −1/λ. Therefore the Cramer-Rao lower bound is λ/n, which is the variance of the sample mean.

If the density function depends on more than one unknown parameter, i.e., if it has the form f_y(y; θ_1, …, θ_k), the Cramer-Rao inequality involves the following steps: (1) define ℓ(y; θ_1, …, θ_k) = log f_y(y; θ_1, …, θ_k), (2) form the following matrix, which is called the information matrix:

I = [ −n E[∂²ℓ/∂θ_1²]      ⋯   −n E[∂²ℓ/∂θ_1∂θ_k]
      ⋮                     ⋱   ⋮
      −n E[∂²ℓ/∂θ_k∂θ_1]   ⋯   −n E[∂²ℓ/∂θ_k²]  ],

and (3) form the matrix inverse I⁻¹. If the vector random variable t = (t_1, …, t_k)′ is an unbiased estimator of the parameter vector θ = (θ_1, …, θ_k)′, then the inverse of the information matrix I⁻¹ is a lower bound for the covariance matrix V[t] in the following sense: the difference matrix V[t] − I⁻¹ is always nonnegative definite.

From this follows in particular: if i^{ii} is the ith diagonal element of I⁻¹, then var[t_i] ≥ i^{ii}.
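As a concrete illustration of steps (1)–(3) (our example, not from the notes), take y_i ∼ N(µ, σ²) with both parameters unknown. The per-observation second derivatives of ℓ have expectations −1/σ², −1/(2σ⁴) and 0, so the information matrix and its inverse can be written down directly; the diagonal of I⁻¹ reproduces the familiar bounds σ²/n and 2σ⁴/n.

import numpy as np

sigma2, n = 2.0, 30
# information matrix for (mu, sigma^2), built from -n E[second derivatives of l]
I = n * np.array([[1.0 / sigma2, 0.0],
                  [0.0, 1.0 / (2.0 * sigma2 ** 2)]])
I_inv = np.linalg.inv(I)
print("diagonal of I^{-1}:", np.diag(I_inv))
print("sigma^2/n =", sigma2 / n, "  2 sigma^4/n =", 2 * sigma2 ** 2 / n)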


13.5 Best Linear Unbiased Without Distribution Assumptions

If the x_i are Normal with unknown expected value and variance, their sample mean has lowest MSE among all unbiased estimators of µ. If one does not assume Normality, then the sample mean has lowest MSE in the class of all linear unbiased estimators of µ. This is true not only for the sample mean but also for all least squares estimates. This result needs remarkably weak assumptions: nothing is assumed about the distribution of the x_i other than the existence of mean and variance. Problem 205 shows that in some situations one can even dispense with the independence of the observations.

Problem 205. 5 points. [Lar82, example 5.4.1 on p. 266] Let y_1 and y_2 be two random variables with the same mean µ and variance σ², but we do not assume that they are uncorrelated; their correlation coefficient is ρ, which can take any value |ρ| ≤ 1. Show that ȳ = (y_1 + y_2)/2 has lowest mean squared error among all linear unbiased estimators of µ, and compute its MSE. (An estimator µ̃ of µ is linear iff it can be written in the form µ̃ = α_1 y_1 + α_2 y_2 with some constant numbers α_1 and α_2.)

Answer. Unbiasedness requires α_1 + α_2 = 1, so write µ̃ = α y_1 + (1 − α) y_2. Its MSE is its variance,

(13.5.3)  var[µ̃] = α²σ² + (1 − α)²σ² + 2ρα(1 − α)σ².

Now sort by the powers of α:

(13.5.5)  var[µ̃]/σ² = 2α²(1 − ρ) − 2α(1 − ρ) + 1
(13.5.6)            = 2(α² − α)(1 − ρ) + 1.

This takes its minimum value where the derivative ∂/∂α (α² − α) = 2α − 1 = 0, i.e., at α = 1/2. For the MSE plug α_1 = α_2 = 1/2 into (13.5.3) to get (σ²/2)(1 + ρ).
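A Monte Carlo sketch (ours) of Problem 205 with σ² = 4 and ρ = 0.3: the simple average attains the MSE σ²(1 + ρ)/2, and other unbiased weights do worse.

import numpy as np

rng = np.random.default_rng(10)
mu, sigma2, rho = 1.0, 4.0, 0.3
cov = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
y1, y2 = rng.multivariate_normal([mu, mu], cov, size=300_000).T
for a in [0.5, 0.7, 0.2]:
    est = a * y1 + (1 - a) * y2
    print(f"alpha={a:.1f}  MSE approx {np.mean((est - mu) ** 2):.4f}")
print("sigma^2 (1 + rho)/2 =", sigma2 * (1 + rho) / 2)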

Problem 206. You have two unbiased measurements with errors of the same quantity µ (which may or may not be random). The first measurement y_1 has mean squared error E[(y_1 − µ)²] = σ², the other measurement y_2 has E[(y_2 − µ)²] = τ². The measurement errors y_1 − µ and y_2 − µ have zero expected values (i.e., the measurements are unbiased) and are independent of each other.

• a. 2 points. Show that the linear unbiased estimators of µ based on these two measurements are simply the weighted averages of these measurements, i.e., they can be written in the form µ̃ = αy_1 + (1 − α)y_2, and that the MSE of such an estimator is α²σ² + (1 − α)²τ². Note: we are using the word "estimator" here even if µ is random. An estimator or predictor µ̃ is unbiased if E[µ̃ − µ] = 0. Since we allow µ to be random, the proof in the class notes has to be modified.

Answer. The estimator µ̃ is linear (more precisely: affine) if it can be written in the form

(13.5.7)  µ̃ = α_1 y_1 + α_2 y_2 + γ.

The measurements themselves are unbiased, i.e., E[y_i − µ] = 0, therefore

(13.5.8)  E[µ̃ − µ] = (α_1 + α_2 − 1) E[µ] + γ = 0

for all possible values of E[µ]; therefore γ = 0 and α_2 = 1 − α_1. To simplify notation, we will from now on call α_1 = α, α_2 = 1 − α. Due to unbiasedness, the MSE is the variance of the estimation error:

(13.5.9)  var[µ̃ − µ] = α²σ² + (1 − α)²τ².

• b. Show that the Best (i.e., minimum MSE) linear unbiased estimator (BLUE) of µ based on these two measurements is

µ̂ = (τ²/(σ² + τ²)) y_1 + (σ²/(σ² + τ²)) y_2,

i.e., it is the weighted average of y_1 and y_2 where the weights are proportional to the inverses of the variances.

Answer. The variance (13.5.9) takes its minimum value where its derivative with respect to α is zero, i.e., where

(13.5.12)  ∂/∂α [α²σ² + (1 − α)²τ²] = 2ασ² − 2(1 − α)τ² = 0,

which gives α = τ²/(σ² + τ²) and hence the estimator above.
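A final numerical sketch (ours, with σ² = 1 and τ² = 3 chosen arbitrarily): the inverse-variance weight α = τ²/(σ² + τ²) gives the smallest MSE, equal to σ²τ²/(σ² + τ²), and any other weight does worse.

import numpy as np

rng = np.random.default_rng(11)
mu, sigma2, tau2, reps = 2.0, 1.0, 3.0, 400_000
y1 = mu + rng.normal(0.0, np.sqrt(sigma2), reps)
y2 = mu + rng.normal(0.0, np.sqrt(tau2), reps)
alpha_star = tau2 / (sigma2 + tau2)
for a in [alpha_star, 0.5, 0.9]:
    mse = np.mean((a * y1 + (1 - a) * y2 - mu) ** 2)
    print(f"alpha={a:.3f}  MSE approx {mse:.4f}  (theory {a**2 * sigma2 + (1 - a)**2 * tau2:.4f})")
print("sigma^2 tau^2/(sigma^2 + tau^2) =", sigma2 * tau2 / (sigma2 + tau2))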
