We will work in the usual regression model
(25.0.12)    y = Xβ + ε,
where y is a vector of n observations, X is nonstochastic with rank r < n, and the disturbance vector ε satisfies E[ε] = o and E[εε⊤] = σ²I. The nonstochastic vector
β and the scalar σ² > 0 are the parameters to be estimated. The usual estimator of σ² is
(25.0.13)    s² = (1/(n−r)) Σᵢ ε̂ᵢ²,
where M = I − X(X⊤X)⁻X⊤ and ε̂ = My. If X has full rank, then ε̂ = y − Xβ̂, where β̂ is the least squares estimator of β. Just as β̂ is the best (minimum mean square error) linear unbiased estimator of β, it has been shown in [Ati62], see also [Seb77, pp. 52–53], that under certain additional assumptions, s² is the best unbiased estimator of σ² which can be written in the form y⊤Ay with a nonnegative definite
A. A precise formulation of these additional assumptions will be given below; they are, for instance, satisfied if X = ι, the vector of ones, and the εᵢ are i.i.d. But they are also satisfied for arbitrary X if ε is normally distributed. (In this last case, s² is best in the larger class of all unbiased estimators.)
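For readers who want to see these quantities in computation, here is a small numerical sketch (not part of the original argument; the design matrix, parameter values, and NumPy-based implementation are merely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # made-up full-rank design
    beta, sigma2 = np.array([1.0, 2.0, -0.5]), 4.0
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

    # M = I - X (X'X)^- X'; pinv also covers a rank-deficient X
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    resid = M @ y                                 # epsilon-hat = M y
    r = np.linalg.matrix_rank(X)
    s2 = resid @ resid / (n - r)                  # the unbiased estimator (25.0.13)
    print(s2)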
This suggests an analogy between linear and quadratic estimation which is, however, by no means perfect. The results just cited pose the following puzzles:
• Why is s² not best nonnegative quadratic unbiased for an arbitrary X-matrix whenever the ε are i.i.d. with zero mean? What is the common logic behind
the two disparate alternatives, that either restrictions on X or restrictions
on the distribution of the εᵢ can make s² optimal?
• It comes as a surprise that, again under the assumption of normality, a very simple modification of s², namely, the Theil-Schweitzer estimator from [TS61]
(25.0.14)    σ̂² = (1/(n−r+2)) Σᵢ ε̂ᵢ²,
which is biased, has lower mean square error (MSE) than s² (a small simulation illustrating this follows the list below).
• It is unclear why it is necessary to require ex ante that A is nonnegative definite. Wouldn't estimators which can yield negative values for σ² be automatically inferior to nonnegative ones?
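The following simulation sketch illustrates puzzle (2); the sample size, design, and number of replications are arbitrary choices made only for this illustration, and normal disturbances are assumed as in [TS61]:

    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma2, reps = 15, 4.0, 20000
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    r = np.linalg.matrix_rank(X)
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T

    mse_s2 = mse_ts = 0.0
    for _ in range(reps):
        eps = rng.normal(scale=np.sqrt(sigma2), size=n)
        ssr = eps @ M @ eps                          # sum of squared residuals (does not depend on beta)
        mse_s2 += (ssr / (n - r) - sigma2) ** 2      # unbiased s^2, (25.0.13)
        mse_ts += (ssr / (n - r + 2) - sigma2) ** 2  # Theil-Schweitzer, (25.0.14)
    print(mse_s2 / reps, mse_ts / reps)              # the biased estimator has the smaller MSE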
We will show that these puzzles can be resolved if one replaces the requirement of unbiasedness by that of bounded MSE. (This is particularly satisfying since such a replacement is also called for in the case of linear estimators.) Then puzzle (2) disappears: the Theil-Schweitzer estimator is no longer an oddity but is the best bounded MSE quadratic estimator of σ² when the kurtosis is zero. And puzzle (3) disappears as well: nonnegativity is only necessary because unbiasedness alone does not imply bounded MSE. Under this approach it becomes evident that there are two important disanalogies between linear and quadratic estimation: whereas the
best bounded MSE linear estimator β̂ of β is (a) unbiased and (b) does not depend
on the nuisance parameter σ², the best quadratic bounded MSE estimator of σ²
is (a) biased and (b) depends on a fourth-order nuisance parameter, the kurtosis
of the disturbances. This, again, helps to dispel the false suggestiveness of puzzle (1). The main assumption is distributional. If the kurtosis is known, then the best nonnegative quadratic unbiased estimator exists. However, it is uninteresting, since the (biased) best bounded MSE quadratic estimator is better. The class of unbiased estimators becomes interesting only when the kurtosis is not known: for certain X-matrices, the best nonnegative quadratic unbiased estimator does not depend on the kurtosis.
However, even if the kurtosis is not known, this paper proposes to use as estimate of σ² the maximum value which one gets when one applies the best bounded mean squared error estimator for all possible values of the kurtosis.
25.1 Setting the Framework Straight
The assumption of unbiasedness has often been criticized. Despite its high-sounding name, there are no good reasons that one should confine one's search for good estimators to unbiased ones. Many good estimators are unbiased, but the property of unbiasedness has no bearing on how good an estimator is. In many cases unbiased estimators do not exist or are not desirable. It is indeed surprising that the
powerful building of least squares theory seems to rest on such a flimsy assumption.
It is usually not recognized that even in the linear case, the assumption of bounded MSE serves to unify the theory. Christensen's monograph [Chr87] treats,
as we do here in chapter 27, best linear prediction on the basis of known first and second moments in parallel with the regression model. Both models have much in common, but there is one result which seems to set them apart: best linear predictors exist in one, but only best linear unbiased predictors in the other [Chr87, p. 226]. If one considers bounded MSE to be one of the basic assumptions, this seeming irregularity is easily explained: if the first and second moments are known, then every linear predictor has bounded MSE, while in the regression model only unbiased linear estimators do.
One might still argue that no real harm is done with the assumption of unbiasedness, because in the linear case, the best bounded MSE estimators or predictors turn out to be unbiased. This last defense of unbiasedness falls if one goes from linear to quadratic estimation. We will show that the best bounded MSE quadratic estimator is biased.
As in the linear case, it is possible to derive these results without fully specifying the distributions involved. In order to compute the MSE of linear estimators, one needs to know the first and second moments of the disturbances, which is reflected in the usual assumption ε ∼ (o, σ²I). For the MSE of quadratic estimators, one also needs information about the third and fourth moments. We will therefore derive optimal quadratic estimators of σ² based on the following assumptions regarding the first four moments, which are satisfied whenever the εᵢ are independently identically distributed:
Assumption. A vector of n observations y = Xβ + ε is available, where the εᵢ satisfy: E[εᵢ] = 0; E[εᵢεⱼ] = σ² if i = j and 0 otherwise; E[εᵢεⱼεₖ] = γ₁σ³ if i = j = k and 0 otherwise; and E[εᵢεⱼεₖεₗ] = (γ₂ + 3)σ⁴ if all four indices coincide, = σ⁴ if they form two distinct pairs, and = 0 otherwise. Here γ₁ is the skewness and γ₂ the kurtosis of the disturbances; these moments necessarily satisfy
(25.1.5)    γ₂ + 2 ≥ 0.
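To fix the moment conventions just stated (an illustration only; the exponential example is my own choice), γ₁ and γ₂ can be computed for any disturbance distribution as the standardized third moment and the excess of the standardized fourth moment over 3, so that γ₂ = 0 in the normal case:

    import numpy as np

    rng = np.random.default_rng(7)
    eps = rng.exponential(size=1_000_000) - 1.0      # a skewed disturbance with mean zero

    sigma = eps.std()
    g1 = np.mean(eps**3) / sigma**3                  # skewness gamma_1 (about 2 here)
    g2 = np.mean(eps**4) / sigma**4 - 3.0            # kurtosis gamma_2 (about 6 here)
    print(g1, g2)                                    # gamma_2 + 2 >= 0 always holds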
We will look for quadratic estimators of σ² whose MSE is bounded, i.e., smaller than some finite bound b for all values of β. This bound b may depend on the known nonstochastic X and the distribution of ε, but not on β.
25.2 Derivation of the Best Bounded MSE Quadratic Estimator of the Variance
Theorem 25.2.1. If the estimator σ̃² of σ² in the regression model (25.0.12) is quadratic, i.e., if it has the form σ̃² = y⊤Ay with a symmetric A, then its mean square error E[(y⊤Ay − σ²)²] is bounded (with respect to β) if and only if AX = O.
Proof: Clearly, the condition AX = O is sufficient. It implies y⊤Ay = ε⊤Aε, which therefore only depends on the distribution of ε, not on the value of β. To show necessity, note that bounded MSE means both bounded variance and bounded squared bias. The variance depends on skewness and kurtosis; writing a for the vector of diagonal elements of A, it is
(25.2.1)    var[y⊤Ay] = 4σ²β⊤X⊤A²Xβ + 4σ³γ₁β⊤X⊤Aa + σ⁴γ₂a⊤a + 2σ⁴ tr(A²).
This formula can be found e.g. in [Seb77, pp. 14–16 and 52]. If AX ≠ O, then
a vector δ exists with δ⊤X⊤A²Xδ > 0; therefore, for the sequence β = jδ, the variance is a quadratic polynomial in j, which is unbounded as j → ∞.
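Formula (25.2.1) can also be checked by simulation; in the sketch below the design X, the matrix A, and the choice of a skewed disturbance distribution (a centered exponential, with γ₁ = 2 and γ₂ = 6) are arbitrary illustrations:

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma, reps = 8, 1.5, 200000
    X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
    beta = np.array([1.0, 3.0])
    A = np.diag(np.arange(1.0, n + 1))               # symmetric, with A X != O
    a = np.diag(A)
    g1, g2 = 2.0, 6.0                                # skewness and kurtosis of exp(1) - 1

    eps = sigma * (rng.exponential(size=(reps, n)) - 1.0)
    Y = eps + X @ beta
    vals = np.einsum('ri,ij,rj->r', Y, A, Y)         # y'Ay for each replication

    formula = (4 * sigma**2 * beta @ X.T @ A @ A @ X @ beta
               + 4 * sigma**3 * g1 * beta @ X.T @ A @ a
               + sigma**4 * (g2 * a @ a + 2 * np.trace(A @ A)))
    print(vals.var(), formula)                       # should agree up to simulation noise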
The following ingredients are needed for the best bounded MSE quadratic estimator of σ²:
Theorem 25.2.2. We will use the letter τ to denote the vector whose ith component is the square of the ith residual, τᵢ = ε̂ᵢ². Then
(25.2.2)    E[τ] = σ²m,
where m is the diagonal vector of M = I − X(X⊤X)⁻X⊤. Furthermore,
(25.2.3)    V[τ] = σ⁴(γ₂Q² + 2Q) and consequently E[ττ⊤] = σ⁴Ω, where Ω = γ₂Q² + 2Q + mm⊤;
Q is the matrix with qᵢⱼ = mᵢⱼ², i.e., its elements are the squares of the elements of
M, and γ₂ is the kurtosis.
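The quantities of theorem 25.2.2 are straightforward to compute; the sketch below (with an arbitrary design and an assumed kurtosis value) builds m, Q, and Ω and also checks the identity m = Qι used later in the proof:

    import numpy as np

    n = 10
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T

    m = np.diag(M)                                   # diagonal vector of M
    Q = M**2                                         # elementwise: q_ij = m_ij^2
    g2 = 1.0                                         # an assumed kurtosis
    Omega = g2 * Q @ Q + 2 * Q + np.outer(m, m)      # Omega = gamma2 Q^2 + 2Q + m m'
    print(np.allclose(Q @ np.ones(n), m))            # m = Q iota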
Here is a proof in tile notation, starting from (9.2.23). [The tile-notation diagrams, whose terms involve σ⁴ and γ₂σ⁴, are not reproduced here.]
Theorem 25.2.3. In the regression model (25.0.12), the best bounded MSE quadratic estimator of σ² is
(25.2.6)    σ̃² = λ⊤τ = Σᵢ λᵢ ε̂ᵢ²,
where λ = Ω⁻m or any other vector satisfying
(25.2.7)    Ωλ = m.
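A sketch of how one might compute this estimator in practice (the helper function name, data, and use of a pseudoinverse for Ω⁻ are my own illustrative choices); with γ₂ = 0 the result coincides with the Theil-Schweitzer estimate (25.0.14):

    import numpy as np

    def best_bounded_mse_sigma2(y, X, g2):
        """Best bounded MSE quadratic estimate lambda'tau with Omega lambda = m."""
        n = len(y)
        M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
        m, Q = np.diag(M), M**2
        Omega = g2 * Q @ Q + 2 * Q + np.outer(m, m)
        lam = np.linalg.pinv(Omega) @ m              # lambda = Omega^- m
        tau = (M @ y) ** 2                           # squared residuals
        return lam @ tau

    rng = np.random.default_rng(3)
    n = 12
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    r = np.linalg.matrix_rank(X)
    res2 = ((np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T) @ y) ** 2
    print(best_bounded_mse_sigma2(y, X, g2=0.0), res2.sum() / (n - r + 2))  # equal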
In order to prove theorem 25.2.3, one needs somewhat different matrix-algebraic tricks than those familiar from linear estimators. The problem at hand can be reduced to a minimum trace problem as defined in Rao [Rao73, pp. 65–66], and part of the following proof draws on a private communication of C. R. Rao regarding consistency of equation (1f.3.4) in [Rao73, p. 65].
Proof of theorem 25.2.3: Take an alternative estimator of the form σ̃² = y⊤Ay where A is symmetric with AX = O. Since the MSE is variance plus squared bias, it follows, using (25.2.1) and E[y⊤Ay] = σ² tr A + β⊤X⊤AXβ, that
(25.2.10)    MSE = E[(σ̃² − σ²)²] = σ⁴(tr A − 1)² + σ⁴γ₂a⊤a + 2σ⁴ tr(A²),
where a = diag A. We will first prove the following property of a: every vector u with Qu = o satisfies a⊤u = 0. Let U be the diagonal matrix with u in the diagonal, and note that Qu = o implies 0 = u⊤Qu. Writing V = MU gives vᵢⱼ = mᵢⱼuⱼ, therefore u⊤Qu = Σᵢ,ⱼ uᵢqᵢⱼuⱼ = Σᵢ,ⱼ vᵢⱼvⱼᵢ = tr(MUMU). Since M² = M, tr(MUMU) = 0 implies tr(MUMUM) = 0, and since M is nonnegative definite, it follows that MUMUM = O, and therefore already MUM = O. Since AX = O = X⊤A implies A = MAM, one can write a⊤u = tr(AU) = tr(MAMU) = tr(AMUM) = 0.
This property of a can also be formulated as: there exists a vector λ with a = Qλ. Let Λ be the diagonal matrix with λ in the diagonal, and write A = MΛM + D. Then DX = O and D has zeros in the diagonal, therefore tr(MΛMD) = tr(ΛMDM) = tr(ΛD) = 0, since ΛD still has zeros in the diagonal. Therefore tr(A²) = tr(MΛMΛ) + tr(D²) = λ⊤Qλ + tr(D²). Regarding Q observe that m = diag M can be written m = Qι, where ι is the vector of ones, therefore tr A = ι⊤Qλ = m⊤λ. Using all this in (25.2.10) gives
(1/σ⁴) MSE = (m⊤λ − 1)² + γ₂λ⊤Q²λ + 2λ⊤Qλ + 2 tr(D²).
Define Σ = (γ₂ + 2)Q² + 2(Q − Q²). It is the sum of two nonnegative definite matrices: γ₂ + 2 ≥ 0 by (25.1.5), and Q − Q² is nonnegative definite because λ⊤(Q − Q²)λ is the sum of the squares of the offdiagonal elements of MΛM. Therefore Σ is nonnegative definite and it follows that m = (Σ + mm⊤)(Σ + mm⊤)⁻m. (To see this, take any P with Σ = PP⊤ and apply the identity T = TT⊤(TT⊤)⁻T, proved e.g. in [Rao73, p. 26], to the partitioned matrix T = [P  m].)
Writing Ω = Σ + mm⊤, one therefore verifies
(25.2.12)    (1/σ⁴) MSE = (λ − Ω⁻m)⊤Ω(λ − Ω⁻m) − m⊤Ω⁻m + 1 + 2 tr(D²).
Clearly, this is minimized by D = O and any λ with Ω(λ − Ω⁻m) = o, which gives (25.2.7).
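As a quick numerical check of this last step (again purely illustrative, with an arbitrary X and γ₂), the quantity λ⊤Ωλ − 2m⊤λ + 1, which equals (1/σ⁴)·MSE when D = O, is indeed smallest at λ = Ω⁻m:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 10
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    m, Q = np.diag(M), M**2
    g2 = 1.0
    Omega = g2 * Q @ Q + 2 * Q + np.outer(m, m)
    lam_opt = np.linalg.pinv(Omega) @ m

    def rel_mse(lam):                                # (1/sigma^4) MSE for D = O
        return lam @ Omega @ lam - 2 * m @ lam + 1

    print(rel_mse(lam_opt), rel_mse(lam_opt + 0.01 * rng.normal(size=n)))  # optimum is smaller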
25.3 Unbiasedness Revisited
Unbiasedness of the estimator y⊤Ay is equivalent to the two mathematical conditions tr A = 1 and X⊤AX = O. This is not strong enough to ensure that the estimation error is a function of ε alone. In [Hsu38], the first treatment of best quadratic estimation of σ², P. L. Hsu therefore added the condition that the MSE be independent of the value of β.
But why should the data analyst be particularly interested in estimates whose MSE is independent of β? The research following up on Hsu tried to get rid of this assumption again. C. R. Rao, in [Rao52], replaced independence of the MSE by the assumption that A be nonnegative definite. We argue that this was unfortunate, for the following two reasons:
• Although one can often read that it is "natural" to require A to be nonnegative definite (see for instance [Ati62, p. 84]), we disagree. Of course, one should expect the best estimator to be nonnegative, but it is perplexing that one should have to assume it. We already noted this in puzzle (3) at the beginning.
• In the light of theorem 25.2.1, Hsu's additional condition is equivalent to the requirement of bounded MSE. It is therefore not as poorly motivated as
it was at first assumed to be. Barnard's article [Bar63], arguing that this assumption is even in the linear case more meaningful than unbiasedness, appeared eleven years after Rao's [Rao52]. If one wanted to improve on Hsu's result, one should therefore discard the condition of unbiasedness, not that of bounded MSE.
Even the mathematical proof based on unbiasedness and nonnegative definiteness suggests that the condition AX = O, i.e., bounded MSE, is the more fundamental assumption. Nonnegative definiteness of A is used only once, in order to get from the condition X⊤AX = O implied by unbiasedness to AX = O. Unbiasedness and a nonnegative definite A together happen to imply bounded MSE, but neither condition separately should be considered "natural" in the present framework.
The foregoing discussion seems to be academic, since the best bounded MSE estimator depends on γ₂, which is rarely known. But it does not depend on it very much. I have not yet researched it fully, but the estimate seems to be a concave function of γ₂ with a maximum somewhere. If one uses the estimate of σ² in order to assess the precision of some estimates, this maximum value may provide a conservative estimate which is still smaller than the unbiased estimate of σ². Here these notes are still incomplete; I would like to know more about this maximum value, and it seems this would be the estimator which one should recommend.
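The proposal can be tried out numerically: evaluate the estimate λ(γ₂)⊤τ on a grid of admissible kurtosis values and take the largest value. The grid's upper end and the data below are arbitrary (the kurtosis is unbounded above, so any finite grid is only a truncation of the idea):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 12
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    m, Q, tau = np.diag(M), M**2, (M @ y) ** 2

    def estimate(g2):                                # lambda(gamma2)' tau
        Omega = g2 * Q @ Q + 2 * Q + np.outer(m, m)
        return np.linalg.pinv(Omega) @ m @ tau

    g2_grid = np.linspace(-2.0, 20.0, 200)           # admissible values: gamma2 >= -2
    print(max(estimate(g2) for g2 in g2_grid))       # the proposed conservative estimate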
If the requirement of unbiasedness has any redeeming qualities, they come from an unexpected yet remarkable fact: in some special cases one does not need to know the kurtosis if one restricts oneself to unbiased estimators of σ². In order to rederive this (known) result in our framework, we will first give a formula for Hsu's estimator. We obtain it from estimator (25.2.6) by multiplying it with the appropriate constant which makes it unbiased.
Theorem 25.3.1. The best bounded MSE quadratic unbiased estimator of σ², which is at the same time the best nonnegative quadratic unbiased estimator of σ², is
(25.3.1)    σ̂̂² = θ⊤τ,
where θ is determined by conditions (25.3.2) and (25.3.3) (for instance one may use θ = (1/(m⊤λ)) λ). M, m, Ω, and λ are the same as in theorem 25.2.3. The MSE of this estimator is
(25.3.4)    E[(σ̂̂² − σ²)²] = σ⁴(1/(m⊤λ) − 1).
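Under the same illustrative assumptions as before (and reading (25.3.1) as the rescaled estimator θ⊤τ with θ = (1/(m⊤λ))λ), Hsu's estimator and its MSE (25.3.4) can be computed as follows; the data and the assumed kurtosis are made up:

    import numpy as np

    rng = np.random.default_rng(4)
    n, sigma2, g2 = 12, 4.0, 0.0                     # kurtosis assumed known
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2), size=n)

    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    m, Q, tau = np.diag(M), M**2, (M @ y) ** 2
    Omega = g2 * Q @ Q + 2 * Q + np.outer(m, m)
    lam = np.linalg.pinv(Omega) @ m                  # lambda from theorem 25.2.3

    theta = lam / (m @ lam)                          # rescaled so that E[theta'tau] = sigma^2
    print(theta @ tau)                               # the unbiased estimate
    print(sigma2**2 * (1.0 / (m @ lam) - 1.0))       # its MSE by (25.3.4), using the true sigma2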
We omit the proof, which is very similar to that of theorem 25.2.3. In the general case, estimator (25.3.1) depends on the kurtosis, just as estimator (25.2.6) does. But
if X is such that all diagonal elements of M are equal, a condition which Atiqullah
in [Ati62] called "quadratically balanced," then it does not! Since tr M = n − r, equality of the diagonal elements implies m = ((n − r)/n) ι. And since m = Qι, any vector proportional to ι satisfies (25.3.2), i.e., one can find solutions of (25.3.2) without knowing the kurtosis. (25.3.3) then gives θ = (1/(n − r)) ι, i.e., the resulting estimator is none other than the unbiased s² defined in (25.0.13).
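A numerical illustration of the quadratically balanced case (the one-way layout below, with equal group sizes, is just one example of such an X): the unbiased estimate does not change as the assumed kurtosis varies, and it coincides with s².

    import numpy as np

    rng = np.random.default_rng(5)
    g, c = 3, 4                                      # three groups of four observations
    n = g * c
    X = np.kron(np.eye(g), np.ones((c, 1)))          # one-way layout: diag(M) is constant
    y = np.repeat([1.0, 2.0, 3.0], c) + rng.normal(size=n)

    M = np.eye(n) - X @ np.linalg.pinv(X.T @ X) @ X.T
    m, Q, tau = np.diag(M), M**2, (M @ y) ** 2
    r = np.linalg.matrix_rank(X)

    for g2 in (-1.0, 0.0, 5.0):                      # different assumed kurtosis values
        Omega = g2 * Q @ Q + 2 * Q + np.outer(m, m)
        lam = np.linalg.pinv(Omega) @ m
        theta = lam / (m @ lam)
        print(theta @ tau)                           # identical for every gamma2
    print(tau.sum() / (n - r))                       # ... and equal to s^2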
The property of unbiasedness which makes it so popular in the classroom—it is easy to check—gains here objective relevance. For the best nonnegative quadratic unbiased estimator one needs to know Ω only up to a scalar factor, and in some special cases the unknown kurtosis merges into this arbitrary multiplicator.
25.4 Summary
If one replaces the requirement of unbiasedness by that of bounded MSE, one can not only unify some known results in linear estimation and prediction, but one also obtains a far-reaching analogy between linear estimation of β and quadratic estimation of σ². The most important dissimilarity is that, whereas one does not have to know the nuisance parameter σ² in order to write down the best linear bounded MSE estimator of β, the best quadratic bounded MSE estimator of σ² depends on an additional fourth-order nuisance parameter, namely, the kurtosis.
In situations in which the kurtosis is known, one should consider the best quadratic bounded MSE estimator (25.2.6) of σ² to be the quadratic analog of the least squares estimator β̂. It is a linear combination of the squared residuals, and if the kurtosis is zero, it specializes to the Theil-Schweitzer estimator (25.0.14). Regression computer