DESIRABLE AND NOT-SO-DESIRABLE ESTIMATORS


“The method of maximum likelihood is, by far, the most popular technique for deriving estimators” (Casella and Berger [1990, p. 289]). The proper starting point for the selection of the “best” method of estimation is with the objectives of our study: What is the purpose of our estimate?

If our estimate is θ* and the actual value of the unknown parameter is θ, what losses will we be subject to? It is difficult to understand the popularity of the method of maximum likelihood and other estimation procedures that do not take these losses into consideration.

The majority of losses will be monotone nondecreasing in nature; that is, the farther the estimate θ* lies from the true value θ, the larger our losses are likely to be. Typical forms of the loss function are the absolute deviation |θ* − θ|, the squared deviation (θ* − θ)², and the jump: no loss if |θ* − θ| < δ, and a large loss otherwise. Or the loss function may resemble the squared deviation but take the form of a step function increasing in discrete increments.
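
These three loss functions are easy to code directly. The sketch below is ours; the function names and the default threshold and penalty are arbitrary illustrations, not prescriptions:

```python
import numpy as np

def absolute_loss(estimate, true_value):
    """Absolute deviation |theta* - theta|."""
    return np.abs(estimate - true_value)

def squared_loss(estimate, true_value):
    """Squared deviation (theta* - theta)^2."""
    return (estimate - true_value) ** 2

def jump_loss(estimate, true_value, delta=0.5, penalty=1.0):
    """No loss within delta of the truth, a fixed penalty otherwise."""
    return np.where(np.abs(estimate - true_value) < delta, 0.0, penalty)
```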

Desirable estimators share the following properties: impartial, consistent, efficient, robust, and minimum loss.

Impartiality

Estimation methods should be impartial. Decisions should not depend on the accidental and quite irrelevant labeling of the samples. Nor should decisions depend on the units in which the measurements are made.

Suppose we have collected data from two samples with the object of estimating the difference in location of the two populations involved. Suppose further that the first sample includes the values a, b, c, d, and e, the second sample includes the values f, g, h, i, j, and k, and our estimate of the difference is θ*. If the samples are completely reversed (that is, if the first sample includes the values f, g, h, i, j, and k and the second sample the values a, b, c, d, and e), our estimation procedure should declare the difference to be −θ*.

The units we use in our observations should not affect the resulting estimates. We should be able to take a set of measurements in feet, convert to inches, make our estimate, convert back to feet, and get exactly the same result as if we’d worked in feet throughout. Similarly, where we locate the zero point of our scale should not affect the conclusions.

Finally, if our observations are independent of the time of day, the season, and the day on which they were recorded (facts that ought to be verified before proceeding further), then our estimators should be independent of the order in which the observations were collected.
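
These impartiality requirements can be checked mechanically. Here is a minimal sketch, assuming the estimate of interest is a simple difference of sample medians (our choice purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
first = rng.normal(10.0, 2.0, size=5)      # sample a, b, c, d, e (measured in feet, say)
second = rng.normal(12.0, 2.0, size=6)     # sample f, g, h, i, j, k

def location_difference(x, y):
    """A simple location-difference estimate: difference of sample medians."""
    return np.median(y) - np.median(x)

est = location_difference(first, second)

# Relabeling the samples should only flip the sign of the estimate.
assert np.isclose(location_difference(second, first), -est)

# Converting feet to inches, estimating, and converting back should agree
# with working in feet throughout.
est_inches = location_difference(first * 12.0, second * 12.0)
assert np.isclose(est_inches / 12.0, est)

# Shifting the zero point of the scale should not change a difference estimate.
assert np.isclose(location_difference(first + 100.0, second + 100.0), est)
```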

Consistency

Estimators should be consistent; that is, the larger the sample, the greater the probability that the resulting estimate will be close to the true population value.
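
A quick simulation conveys the idea. This sketch uses the sample mean of normal observations; the tolerance of 0.1 and the sample sizes are arbitrary choices of ours:

```python
import numpy as np

rng = np.random.default_rng(2)
true_mean = 5.0

# Fraction of simulated samples whose mean falls within 0.1 of the true value;
# the fraction approaches 1 as the sample size grows.
for n in (10, 100, 1000, 10000):
    means = rng.normal(true_mean, 1.0, size=(1000, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - true_mean) < 0.1))
```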

Efficient

One consistent estimator certainly is to be preferred to another if the first can provide the same degree of accuracy with fewer observations. To simplify comparisons, most statisticians focus on the asymptotic relative efficiency (ARE), defined as the limit, with increasing sample size, of the ratio of the number of observations required by each of two consistent statistical procedures to achieve the same degree of accuracy.

Robust

Estimators that are perfectly satisfactory for use with symmetric normally distributed populations may not be as desirable when the data come from nonsymmetric or heavy-tailed populations, or when there is a substantial risk of contamination with extreme values.

When estimating measures of central location, one way to create a more robust estimator is to trim the sample of its minimum and maximum values (the procedure used when judging ice-skating or gymnastics). As information is thrown away, trimmed estimators are less efficient.
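
A minimal sketch of trimming, assuming SciPy is available (the data values, with one suspiciously large observation, are made up for illustration):

```python
import numpy as np
from scipy import stats

data = np.array([2.1, 2.4, 2.5, 2.6, 2.7, 2.9, 9.8])    # 9.8 is the suspect value

plain_mean = data.mean()                                  # pulled upward by 9.8
trimmed = stats.trim_mean(data, proportiontocut=0.15)     # drops one value from each tail here
median = np.median(data)                                  # the most heavily trimmed estimate of all

print(plain_mean, trimmed, median)
```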

In many instances, LAD (least absolute deviation) estimators are more robust than their LS (least squares) counterparts.1 This finding is in line with our discussion of the F statistic in the preceding chapter.

Many semiparametric estimators are not only robust but also provide a high ARE with respect to their parametric counterparts.

As an example of a semiparametric estimator, suppose the {Xi} are independent, identically distributed (i.i.d.) observations with distribution Pr{Xi ≤ x} = F[x − Δ] and we want to estimate the location parameter Δ without having to specify the form of the distribution F. If F is normal and the loss function is proportional to the square of the estimation error, then the arithmetic mean is optimal for estimating Δ. Suppose, on the other hand, that F is symmetric but more likely to include very large or very small values than a normal distribution. Whether the loss function is proportional to the absolute value or the square of the estimation error, the median, a semiparametric estimator, is to be preferred. The median has an ARE relative to the mean that ranges from 0.64 (if the observations really do come from a normal distribution) to values well in excess of 1 for distributions with higher proportions of very large and very small values (Lehmann, 1998, p. 242). Still, if the unknown distribution is “almost” normal, the mean would be far preferable.
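
The mean-versus-median comparison is easy to reproduce by simulation. In this sketch the sample size, number of replications, and the use of a t distribution with 3 degrees of freedom as the heavy-tailed example are our own choices; the printed ratio of mean-squared errors approximates the ARE of the median relative to the mean:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 5000

def mse(estimates, truth=0.0):
    return np.mean((estimates - truth) ** 2)

# Normal data: the ratio is near 0.64, so the mean is the more efficient estimator.
normal = rng.normal(0.0, 1.0, size=(reps, n))
print("normal:", mse(normal.mean(axis=1)) / mse(np.median(normal, axis=1)))

# Heavy-tailed data: the ratio exceeds 1, so the median is the more efficient estimator.
heavy = rng.standard_t(df=3, size=(reps, n))
print("heavy-tailed:", mse(heavy.mean(axis=1)) / mse(np.median(heavy, axis=1)))
```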

If we are uncertain whether or not F is symmetric, then our best choice is the Hodges–Lehmann estimator, defined as the median of the pairwise averages:

$$\hat{\Delta} = \operatorname{median}_{i \le j} \frac{X_i + X_j}{2}.$$


1 See, for example, Yoo [2001].

Its ARE relative to the mean is 0.97 when F is a normal distribution (Lehmann, 1998, p. 246). With little to lose with respect to the mean if F is near normal, and much to gain if F is not, the Hodges–Lehmann estimator is recommended.
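
Computing the Hodges–Lehmann estimate takes only a few lines. A minimal sketch (the function name and the example data, which include one deliberate outlier, are ours):

```python
import numpy as np

def hodges_lehmann(x):
    """Median of the pairwise averages (x[i] + x[j]) / 2 over i <= j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))          # all pairs with i <= j, including i == j
    return np.median((x[i] + x[j]) / 2.0)

sample = np.array([1.1, 1.9, 2.4, 2.6, 3.0, 18.0])    # 18.0 plays the outlier
print(hodges_lehmann(sample), sample.mean(), np.median(sample))
```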

Suppose {Xi} and {Yj} are i.i.d. with distributions Pr{Xi ≤ x} = F[x] and Pr{Yj ≤ y} = F[y − Δ], and we want to estimate the shift parameter Δ without having to specify the form of the distribution F. For a normal distribution F, the optimal estimator with least-squares losses is the arithmetic average of the mn differences Yj − Xi, that is,

$$\hat{\Delta} = \frac{1}{mn}\sum_{i}\sum_{j}\left(Y_j - X_i\right) = \bar{Y} - \bar{X}.$$

Means are highly dependent on extreme values; a more robust estimator is given by

$$\hat{\Delta} = \operatorname{median}_{i,j}\left(Y_j - X_i\right).$$
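
Both two-sample estimates are straightforward to compute. A minimal sketch (the function name and simulated data are ours; the gross error planted in the second sample shows why the median of the pairwise differences is preferred when contamination is a risk):

```python
import numpy as np

def shift_estimates(x, y):
    """Mean of all mn pairwise differences (equal to ybar - xbar) and the
    more robust median of the pairwise differences."""
    diffs = np.subtract.outer(y, x)         # all differences y[j] - x[i]
    return diffs.mean(), np.median(diffs)

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=8)
y = rng.normal(1.5, 1.0, size=10)
y[0] = 40.0                                 # a gross error in the second sample
print(shift_estimates(x, y))                # the mean-based estimate is dragged upward; the median is not
```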

Minimum Loss

The value taken by an estimate, its accuracy (that is, the degree to which it comes close to the true value of the estimated parameter), and the associated losses will vary from sample to sample. A minimum loss estimator is one that minimizes the losses when the losses are averaged over the set of all possible samples. Thus its form depends upon all of the following: the loss function, the population from which the sample is drawn, and the population characteristic that is being estimated. An estimate that is optimal in one situation may only exacerbate losses in another.

Minimum loss estimators for the case of least-squares losses are well documented for a wide variety of situations. Linear regression with an LAD loss function is discussed in Chapter 9.

Mini–Max Estimators

It’s easy to envision situations in which we are less concerned with the average loss than with the maximum possible loss we may incur by using a particular estimation procedure. An estimate that minimizes the maximum possible loss is termed a mini–max estimator. Alas, few off-the-shelf mini–max solutions are available for practical cases, but see Pilz [1991] and Pinelis [1988].

Other Estimation Criteria

The expected value of an unbiased estimator is the population characteristic being estimated. Unbiased estimators whose variance shrinks as the sample size grows are also consistent estimators.


Minimum variance estimators provide relatively consistent results from sample to sample. While minimum variance is desirable, it may be of practical value only if the estimator is also unbiased. For example, the constant 6 is a minimum variance estimator (its variance is zero), but it offers few other advantages.

Plug-in estimators, in which one substitutes the sample statistic for the corresponding population statistic (the sample mean for the population mean, or the sample’s 20th percentile for the population’s 20th percentile), are consistent, but they are not always unbiased or minimum loss.
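
A minimal plug-in sketch (the lognormal population and the sample size of 200 are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(5)
population = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)
sample = rng.choice(population, size=200, replace=False)

# Plug-in estimation: substitute the sample statistic for the population statistic.
print("mean:           ", population.mean(), "estimated by", sample.mean())
print("20th percentile:", np.percentile(population, 20), "estimated by", np.percentile(sample, 20))
```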

Always choose an estimator that will minimize losses.

Myth of Maximum Likelihood

The popularity of the maximum likelihood estimator is hard to comprehend. This estimator may be completely unrelated to the loss function and has as its sole justification that it corresponds to the value of the parameter that makes the observations most probable, provided, that is, that they are drawn from a specific predetermined distribution. The observations might have resulted from a thousand other a priori possibilities.

A common and lamentable fallacy is that the maximum likelihood estimator has many desirable properties: that it is unbiased and minimizes the mean-squared error. But this is true only for the maximum likelihood estimator of the mean of a normal distribution.2

Statistics instructors would be well advised to avoid introducing maximum likelihood estimation and to focus instead on methods for obtaining minimum loss estimators for a wide variety of loss functions.
