BIASED ESTIMATION
G. G. JUDGE and M. E. BOCK*
University of Illinois and Purdue University
3 Some possibly biased alternatives
3.1 Exact non-sample information
3.2 Stochastic non-sample information
3.3 Inequality non-sample information
3.4 Parameter distribution information (prior)
3.5 Some remarks
4 Pre-test-variable selection estimators
4.1 Conventional equality pre-test estimator
4.2 Stochastic hypothesis pre-test estimator
4.3 Inequality hypothesis pre-test estimator
4.4 Bayesian pre-test estimators
4.5 Variable selection estimators
5 Conventional estimator inadmissibility and the Stein-rule alternatives
5.1 Estimation under squared error loss
5.2 Stein-like rules under weighted squared error loss
6 Some biased estimator alternatives for the stochastic regressor case
7 Biased estimation with nearly collinear data
7.1 A measure of “near” collinearity
Handbook of Econometrics, Volume I, Edited by Z. Griliches and M.D. Intriligator
© North-Holland Publishing Company, 1983
Despite its popularity, the statistical implications of remaining in the linear unbiased family of rules may in many cases be rather severe. One indication of the possibly questionable stature of the least squares rule occurred when Stein (1955) showed, under conditions normally fulfilled in practice, that there were other minimax estimators. Following Stein's result, James and Stein (1961) exhibited an estimator which under squared error loss dominates the least squares estimator and thus demonstrates its inadmissibility. This result means that the unbiased least squares rule may have an inferior mean square error when compared to other biased estimators.
Another trouble spot for the conventional least squares estimator arises in the case of a false statistical model. Just as few economic variables are free of measurement error and few economic relations are non-stochastic, few statistical models are correctly specified, and many of these specification errors imply a biased outcome when the least squares rule is used. For example, consider the problem of an investigator who has a single data set and wants to estimate the parameters of a linear model which are known to lie in a high-dimensional parameter space Θ₁. The researcher may suspect the relationship may be characterized by a lower-dimensional parameter space Θ₂ ⊂ Θ₁. Under this uncertainty, if the Θ₁-dimensional parameter space is estimated by least squares, the result, from the possibly overspecified model, will be unbiased but have large variance and thus may make a poor showing in terms of mean square error. Alternatively, the Θ₂-dimensional parameter space may incorrectly specify the statistical model, and thus if estimated by least squares will be biased, and this bias may or may not outweigh the reduction in variance if evaluated in a mean square error context.
Although uncertainty concerning the proper column dimension of the matrix of explanatory variables is the rule, in many cases prior information exists about the individual parameters and/or relationships among the unknown parameters. Ignoring this information and using only sample information and the least squares rule may lead to a loss of precision, while taking the information into account may lead to a more precise though biased estimator. Intuitively, it would seem that any estimation strategy which does not take account of existing non-sample information should lead to suboptimal rules.
Furthermore, since most economic data are passively generated and thus do not come from an experimental design situation where the investigator has a good degree of control, the data may be nearly collinear, and this means that approximate linear relations may hold among the columns of the explanatory variables that appear in the design matrix X. When this happens the least squares estimates are unstable: the X'X matrix is often nearly singular, and small changes in the observations may result in large changes in the estimates of the unknown coefficients. Ridge and minimax general ridge estimators have been suggested as alternatives to the least squares rule when handling data with these characteristics.
In the linear statistical model, when the errors are long tailed and the conventional normally distributed constant variance error specification is not appropriate, the least squares rule loses some of its inferential reach. Under this scenario it is necessary to consider biased alternatives which are conceptually different from, for example, the Stein and ridge approaches noted above. In this chapter we do no more than identify the problem, since it will be discussed in full elsewhere in this Handbook.
To cope with some of the problems noted above and to avoid the statistical consequences of remaining with the conventional estimator, researchers have proposed and evaluated a range of alternatives to least squares. Useful summaries of some of the results to date include papers by Dempster (1973), Mayer and Willke (1973), Gunst and Mason (1977), and Draper and Van Nostrand (1979).

In laying out the statistical implications of a range of biased alternatives to the least squares rule, the chapter is organized as follows. In Section 2 conventional linear statistical models, estimators, and a hypothesis testing framework are presented, and the sampling theory and Bayes bases for gauging estimator performance are specified. In Section 3 sampling theory and Bayes estimators
which permit sample information and various types of non-sample information to be jointly considered are specified and appropriately evaluated. In Section 4 testing frameworks are specified for evaluating the compatibility of the sample information and the various types of non-sample information, and the corresponding pre-test estimators are derived, compared, and evaluated. In Section 5 the inadmissibility of the least squares estimator is discussed and a range of Stein-rule estimators are considered for alternative loss functions and design matrices. In Section 6 alternatives to least squares are considered for the stochastic regressor case. In Section 7 the problem of nearly collinear data is discussed, and the ridge-type and general minimax estimators which have been suggested to cope with this age-old problem are compared and evaluated. Finally, in Section 8 some comments are made about the statistical implications of these biased alternatives for econometric theory and practice.
2 Conventional statistical models, estimators, tests, and measures of estimator performance
We are concerned with the sampling performance of a family of biased estimators for the following linear statistical model:

y = Xβ + e,   (2.1)
where y is a (T × 1) vector of observations, X is a known (T × K) design matrix of rank K, β is a (K × 1) fixed vector of unknown parameters, and e is a (T × 1) vector of unobservable random variables with mean vector zero and finite covariance matrix E[ee'] = σ²Ψ, with σ² unknown and Ψ a known symmetric positive definite matrix. We assume throughout that the random variables which comprise e are independently and identically distributed, i.e. E[ee'] = σ²I_T, or can be transformed to this specification since Ψ is known. In almost all cases we will assume e is a normal random vector.
2.1 Conventional estimators and tests
Given that y is generated by the linear statistical model (2.1), the least squares basis for estimating the unknown coefficients is given by the linear rule

b = (X'X)⁻¹X'y,   (2.2)

which is best linear unbiased. If it is assumed that e is multivariate normal, then (2.2) is the maximum likelihood estimator and is a minimax estimator no longer
limited to the class of linear estimators. Furthermore, if e is normal then b has minimum risk E[(b − β)'(b − β)] among the unbiased (not necessarily linear) estimators of β.
The assumption that y is a normally distributed vector implies that the random vector (b − β) is normally distributed with mean vector zero and covariance

E[(b − β)(b − β)'] = σ²(X'X)⁻¹.   (2.3)

Therefore, the quadratic form (b − β)'X'X(b − β)/σ² is distributed as a central chi-square random variable with K degrees of freedom.
A best quadratic unbiased estimator of the unknown scalar σ² is given by

σ̂² = (y − Xb)'(y − Xb)/(T − K) = y'(I_T − X(X'X)⁻¹X')y/(T − K) = y'My/(T − K),

where M is an idempotent matrix of rank (T − K). If we leave the class of unbiased quadratic estimators of σ², the minimum variance quadratic estimator, with smallest mean square error, is σ̃² = y'My/(T − K + 2). Since e is a normally distributed vector with mean vector zero and covariance σ²I_T, the quadratic form (T − K)σ̂²/σ² = y'My/σ² is distributed as a central chi-square random variable with (T − K) degrees of freedom.
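To fix ideas, here is a minimal sketch (Python with NumPy; the data and parameter values are hypothetical) that computes the least squares rule (2.2), the unbiased variance estimator σ̂², and the smaller mean square error alternative y'My/(T − K + 2) for a simulated sample.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 50, 4
X = rng.normal(size=(T, K))                   # design matrix of rank K
beta = np.array([1.0, -2.0, 0.5, 3.0])        # true (unknown) coefficients
sigma = 2.0
y = X @ beta + rng.normal(scale=sigma, size=T)

S = X.T @ X
b = np.linalg.solve(S, X.T @ y)               # b = (X'X)^{-1} X'y

M = np.eye(T) - X @ np.linalg.solve(S, X.T)   # idempotent matrix of rank T - K
sigma2_unbiased = y @ M @ y / (T - K)         # best quadratic unbiased estimator
sigma2_min_mse = y @ M @ y / (T - K + 2)      # smallest-MSE quadratic estimator
print(b, sigma2_unbiased, sigma2_min_mse)
```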
In many instances non-sample information or hypotheses exist in the form β = r, or δ = 0, where δ = β − r is a (K × 1) vector representing specification errors and r is a K-dimensional known vector. Given this formulation it is conventional to use likelihood ratio procedures to test the null hypothesis H₀: β = r against the alternative hypothesis H_A: β ≠ r, by using the test statistic

u = (b − r)'X'X(b − r)/(Kσ̂²).

If the hypotheses are correct and indeed r = β, the test statistic u is a central F random variable with K and (T − K) degrees of freedom, i.e. u ~ F₍K,T−K₎. If the linear hypotheses are incorrect, u is distributed as a non-central F random variable with K and (T − K) degrees of freedom and non-centrality parameter λ = δ'X'Xδ/2σ².

The traditional test procedure for H₀ against H_A is to reject the linear hypotheses H₀ if the value of the test statistic u is greater than some specified value c. The value of c is determined for a given significance level α by P[F₍K,T−K₎ ≥ c] = α.
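The test can be carried out numerically as below (a sketch with simulated data; the statistic u follows the form reconstructed above, and the critical value comes from SciPy's F distribution).

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)
T, K = 50, 4
X = rng.normal(size=(T, K))
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.normal(scale=2.0, size=T)

S = X.T @ X
b = np.linalg.solve(S, X.T @ y)
resid = y - X @ b
sigma2_hat = resid @ resid / (T - K)

r = np.zeros(K)                               # H0: beta = r
u = (b - r) @ S @ (b - r) / (K * sigma2_hat)  # central F_(K,T-K) under H0
alpha = 0.05
c = f_dist.ppf(1 - alpha, K, T - K)           # P[F_(K,T-K) >= c] = alpha
print(u, c, u >= c)                           # reject H0 when u >= c
```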
For much of the analysis it is convenient to work with the reparameterized model

y = XS^(−1/2)S^(1/2)β + e = Zθ + e,   (2.10a)

where S^(1/2) is a positive definite symmetric matrix with S^(1/2)S^(1/2) = S = X'X, θ = S^(1/2)β, Z = XS^(−1/2), and Z'Z = I_K. Under this reparameterization a best linear unbiased estimator of θ is θ̂ = Z'y, with covariance σ²I_K. Note also we may write (2.10a) as

z = Z'y = θ + Z'e,   (2.10b)

where z = Z'y has a K-variate normal distribution with mean vector θ and covariance σ²I_K. This formulation is equivalent to the K mean statistical model usually analyzed in the statistical literature. Although (2.10b) is a convenient form for analysis purposes, we will remain in this chapter with the linear statistical (regression) form since this is the one most commonly dealt with in econometrics. The common nature of the two problems should be realized in interpreting the results to be developed. Alternatively, consider the following canonical form:

y = XTT⁻¹β + e = Hα + e,   (2.12)
where H = XT, α = T⁻¹β, and T is a non-singular matrix chosen so that the columns of XT are orthogonal. One choice of T is an orthogonal matrix P whose columns are orthonormal characteristic vectors of X'X. Consequently, PP' = I and

y = XPP'β + e = Hα + e,

with H = XP and α = P'β. The columns of H are orthogonal since H'H = Λ, which is a diagonal matrix with elements λ₁ > λ₂ > ⋯ > λ_K that are the characteristic roots of X'X. The best linear unbiased estimator of α is α̂ = Λ⁻¹H'y, with covariance σ²Λ⁻¹. The variance of α̂ᵢ, i = 1, 2, …, K, is σ²/λᵢ.
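A short numerical check of the canonical form (our own sketch, with a random design): the characteristic vectors of X'X orthogonalize the design, and the estimate α̂ = Λ⁻¹H'y coincides with P'b.

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 50, 4
X = rng.normal(size=(T, K))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=2.0, size=T)

lam, P = np.linalg.eigh(X.T @ X)              # characteristic roots/vectors of X'X
order = np.argsort(lam)[::-1]                 # order lambda_1 > ... > lambda_K
lam, P = lam[order], P[:, order]

H = X @ P                                     # H'H = Lambda, a diagonal matrix
assert np.allclose(H.T @ H, np.diag(lam))

alpha_hat = (H.T @ y) / lam                   # alpha_hat = Lambda^{-1} H'y
b = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(alpha_hat, P.T @ b)        # the two estimates agree
```

The variance σ²/λᵢ of α̂ᵢ makes visible how a near-zero characteristic root (near collinearity, Section 7) inflates the variance of the corresponding coefficient.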
2.2 Measures of performance
Finally, let us consider the basis for gauging the performance of a range of alternative estimators. We can, as we did with the estimators considered above, require the property of unbiasedness, and in this context b is the only unbiased estimator of β based on sufficient statistics. But why the concept of unbiasedness? If the information from sample observations is to be used for decision purposes, why not make use of statistical decision theory, which is based on the analysis of losses due to incorrect decisions? This is in fact the approach we use in this chapter as a basis for comparing estimators as we go outside of traditional rules and enter the family of non-linear biased estimators.
Although there are many forms for representing the loss or risk functions, we will to a large extent be concerned with estimation alternatives under a squared error loss measure. However, the estimators we consider are in general robust under a range of loss functions.
Assume that y is a (T × 1) random vector. If δ(y) is some estimator of the K-dimensional parameter vector θ, then the weighted squared error or weighted quadratic loss function is

L(θ, δ(y)) = (δ(y) − θ)'Q(δ(y) − θ),   (2.13)

where Q is a known positive definite weight matrix. If Q = I_K under this criterion, the unbiased estimator with minimum risk is the unbiased estimator with minimum variance. If we make use of the condition that δ(y) be both linear in y and unbiased, this leads to the Gauss-Markov criterion, and the minimum risk or best linear unbiased estimator is δ(y) = b when E[y] = Xβ.
Reparameterizing the statistical model and transforming from one parameter space to another in many cases changes the measure of goodness used to judge performance. For example, if interest centers on statistical model (2.1) and sampling performance in the β space (the estimation problem), specifying an unweighted loss function in the θ space (2.10) results in a weighted loss function in the β space, i.e.

(θ̂ − θ)'(θ̂ − θ) = (S^(1/2)b − S^(1/2)β)'(S^(1/2)b − S^(1/2)β) = (b − β)'X'X(b − β).
Therefore, while the reparameterized model (2.10) is appropriate for analyzing the conditional mean forecasting problem of estimating Xβ by Xb, it is not appropriate for analyzing the performance of b as an estimate of β unless one is interested in the particular weight matrix X'X.
Alternatively, an unweighted squared error loss in the β space results in a weighted risk function in the θ space, i.e.

(b − β)'(b − β) = (θ̂ − θ)'(X'X)⁻¹(θ̂ − θ).
Finally, let us note for the canonical form (2.12) that the orthogonal transformation preserves the distance measure, i.e.

(α̂ − α)'(α̂ − α) = (P'b − P'β)'(P'b − P'β) = (b − β)'(b − β).
The minimum mean square error criterion is another basis we will use for comparing the sampling performance of estimators. This generalized mean square error or risk measure for some estimator β̂ of β may be defined as

MSE[β̂, β] = E[(β̂ − β)(β̂ − β)'].   (2.18)

Under this measure the diagonal elements are mean square errors, and the trace of (2.18) is the squared error risk when Q = I_K. In using the mean square error criterion, an estimator β̂ is equal or superior to another estimator β̃ if, for all β, the difference MSE[β̃, β] − MSE[β̂, β] is positive semi-definite.
Given a known prior distribution π for β and a loss function ρ(β, β̂), we may also consider a Bayes estimator, β̂_B, which minimizes for all β̂ the expected value of ρ(β, β̂), where the expectation is taken over β with respect to its known distribution π. The Bayes risk for β̂_B is

E_π[ρ(β, β̂_B)] = ∫ ρ(β, β̂_B) dπ(β).
In particular, for a weighted quadratic loss such as (2.13), the Bayes estimator is

β̂_B = E[β | y],   (2.21)

the mean of the conditional (posterior) distribution of β given the sample data.
3 Some possibly biased alternatives
Under the standard linear normal statistical model and a sampling theory framework, when only sample information is used, the least squares estimator has minimum variance among unbiased estimators. In the Bayesian framework for inference, if a non-informative (uniform) prior is used in conjunction with the sample information, the minimum posterior mean square error property is achieved via the least squares rule. One problem with least squares in either framework is that it does not take into account the often existing prior information or relationships among the coefficients. A Bayesian might even say that the non-informative prior which leads to least squares should be replaced by a proper distribution which reflects in a realistic way the existing non-sample information.

To mitigate the impact of ignoring this non-sample information, and to patch up their basis of estimation and inference so that it makes use of all of the information at hand, sampling theorists have developed procedures for combining sample and various types of non-sample information. When the non-sample information is added and certain of these rules are used, although we gain in precision, biased estimators result if the prior information specification is incorrect. In other cases biased estimators result even if the prior specification is correct. Thus, we are led, in comparing the estimators, to a bias-variance dichotomy for measuring performance, and some of the sampling theory estimators which make use of non-sample information show, for example, superior mean square error over much of the relevant parameter space. Alternatively, there are other conventionally used biased sampling theory alternatives for which this result does not hold. In the remainder of this section we review the sampling properties of these possibly biased estimators and evaluate their performance under a squared error loss measure.
3.1 Exact non-sample information
Let us assume that in addition to the sample information contained in (2.1) there also exists information about the K-dimensional unknown vector β in the form of J independent linear equality restrictions or hypotheses, where J ≤ K. This information may be specified as

Rβ = r,   (3.1)

where R is a (J × K) known matrix of rank J which expresses the structure of the outside information as it relates to the individual parameters or their linear combinations, and r is a (J × 1) vector of known elements. The restrictions may also be written as δ = 0, where δ = Rβ − r and the (J × 1) vector δ represents the specification errors in the prior information. Under this scenario the maximum likelihood estimator which includes this non-sample information is

b* = b + S⁻¹R'[RS⁻¹R']⁻¹(r − Rb),   (3.2)
and is the solution to the problem of minimizing the quadratic form (y − Xβ)'(y − Xβ) subject to Rβ = r, where S = X'X. Thus, b* is multinormally distributed, with mean

E[b*] = β − S⁻¹R'[RS⁻¹R']⁻¹(Rβ − r) = β − S⁻¹R'[RS⁻¹R']⁻¹δ,   (3.3)

covariance matrix

E[(b* − E[b*])(b* − E[b*])'] = σ²[S⁻¹ − C],   (3.4)

where C = S⁻¹R'[RS⁻¹R']⁻¹RS⁻¹, and mean square error or risk matrix

MSE[b*, β] = σ²[S⁻¹ − C] + S⁻¹R'[RS⁻¹R']⁻¹δδ'[RS⁻¹R']⁻¹RS⁻¹.   (3.5)
These results imply that if δ = Rβ − r = 0, then b* is best linear unbiased within the class of unbiased estimators which are linear functions of y and r. If δ ≠ 0, then b* is biased (3.3) and has mean square or quadratic risk (3.5) and (3.6). Under the general mean square error or risk criterion, in comparing the restricted and unrestricted maximum likelihood estimators b* and b, Toro-Vizcarrondo and Wallace (1968) show that E[(b − β)(b − β)'] − E[(b* − β)(b* − β)'] = Δ, where Δ is positive semi-definite, if and only if

δ'(RS⁻¹R')⁻¹δ/σ² ≤ 1,  or equivalently  λ = δ'(RS⁻¹R')⁻¹δ/2σ² ≤ 1/2.
Under the weighted quadratic risk criterion, b* has smaller risk than b, i.e. E[(b* − β)'Q(b* − β)] ≤ E[(b − β)'Q(b − β)], if and only if

δ'[RS⁻¹R']⁻¹RS⁻¹QS⁻¹R'[RS⁻¹R']⁻¹δ/2σ² ≤ tr(CQ)/2.

If the weight matrix under the quadratic risk criterion is X'X, i.e. the conditional mean forecasting problem, then the restricted maximum likelihood estimator has risk

E[(b* − β)'X'X(b* − β)] = σ²(K − J) + δ'[RS⁻¹R']⁻¹δ.
3.2 Stochastic non-sample information
Assume the following stochastic prior information exists about β:

r = Rβ + v,   (3.9)

where r and R are defined in conjunction with (3.1) and v is a (J × 1) unobservable, normally distributed random vector with mean δ and covariance σ²Ω, with Ω known. Following Theil and Goldberger (1961) and Theil (1963), we may combine the sample information (2.1) with the stochastic prior information (3.9) in the linear statistical model

[y', r']' = [X', R']'β + [e', v']',   (3.10)

and generalized least squares applied to (3.10) yields the stochastic restricted (mixed) estimator

b** = [S + R'Ω⁻¹R]⁻¹[X'y + R'Ω⁻¹r].   (3.11)
This means that (3.16) is positive semi-definite if and only if σ²(RS⁻¹R' + Ω) − δδ' is positive semi-definite. Consequently, under the generalized mean square error criterion the stochastic restricted estimator b** is superior to the least squares estimator b in the part of the parameter space where

δ'[RS⁻¹R' + Ω]⁻¹δ/σ² ≤ 1.
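A sketch of the Theil-Goldberger mixed estimator (3.11) with hypothetical stochastic restrictions follows; note that with prior covariance σ²Ω the unknown σ² cancels out of the formula.

```python
import numpy as np

rng = np.random.default_rng(4)
T, K, J = 60, 4, 2
X = rng.normal(size=(T, K))
beta = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 2.0
y = X @ beta + rng.normal(scale=sigma, size=T)

R = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])
Omega = np.eye(J)                             # v has covariance sigma^2 * Omega
r = R @ beta + rng.multivariate_normal(np.zeros(J), sigma**2 * Omega)

# generalized least squares on the stacked system (3.10); sigma^2 cancels
Oinv_R = np.linalg.solve(Omega, R)            # Omega^{-1} R
b_mixed = np.linalg.solve(X.T @ X + R.T @ Oinv_R,
                          X.T @ y + R.T @ np.linalg.solve(Omega, r))
print(b_mixed)
```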
3.3 Inequality non-sample information
Assume now the non-sample information about the unknown parameters exists in the form of linear inequality constraints, which may be represented as

Rβ ≥ r.
In order to give bias and risk evaluations we consider the orthonormal statistical model y = Zθ + e (2.10a) and the case where the information design matrix R has the form [I_J 0]. In fact, without loss of generality, for expository purposes we consider the problem of estimating the ith unknown parameter θᵢ when non-sample information exists in the form θ ≥ r, where r is a known scalar (the subscript i is dropped for convenience). Since for any sample of data either the maximum likelihood estimator θ̂ violates the constraint or it does not, the inequality restricted estimator may be expressed as

θ⁺ = I₍₋∞,r₎(θ̂)r + I₍r,∞₎(θ̂)θ̂
   = I₍₋∞,δ/σ₎(w)(θ + δ) + I₍δ/σ,∞₎(w)(θ + σw)
   = θ + I₍₋∞,δ/σ₎(w)δ + I₍δ/σ,∞₎(w)σw,   (3.20)

where I₍.₎(·) is an indicator function that takes the value 1 if its argument lies in the subscripted interval and zero otherwise, w = (θ̂ − θ)/σ is a standard normal random variable, and δ = r − θ.
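A Monte Carlo sketch of (3.20) for a single parameter (values hypothetical); the empirical mean of θ⁺ is compared with the analytic mean E[θ⁺] = θ + δΦ(δ/σ) + σφ(δ/σ) implied by (3.20) — our own evaluation, since the intermediate expressions are not reproduced above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
theta, sigma, r = 0.5, 1.0, 0.0               # constraint theta >= r holds (delta < 0)
delta = r - theta
n_rep = 200_000

theta_hat = rng.normal(theta, sigma, n_rep)   # maximum likelihood estimator
theta_plus = np.maximum(theta_hat, r)         # inequality restricted estimator (3.20)

analytic = theta + delta * norm.cdf(delta / sigma) + sigma * norm.pdf(delta / sigma)
print(theta_plus.mean(), analytic)            # the two should agree closely
```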
3.3.1 Bias

From (3.20), as δ → −∞, E[θ⁺] → θ; as δ → 0, E[θ⁺] → θ + σ/√(2π). If the direction of the inequality is incorrect, then as δ → ∞, E[θ⁺] → θ + δ = r, the mean of the restricted least squares estimator. The bias characteristics of the inequality estimator are presented in Figure 3.1.
3.3.2 Risk
The risk (mean square error) of the inequality estimator θ⁺ is

ρ(θ, θ⁺) = E[(θ⁺ − θ)²] = E[(θ̂ − θ)²] + E[I₍₋∞,δ/σ₎(w)δ²] − σ²E[I₍₋∞,δ/σ₎(w)w²].   (3.23)
Figure 3.1 The mean of the inequality restricted estimator θ⁺ as a function of δ/σ.
Using Corollary 3 from Judge and Yancey (1978), the risk function may be expressed, when δ < 0, as

ρ(θ, θ⁺) = σ² + (δ²/2)P(χ²₍₁₎ ≥ δ²/σ²) − (σ²/2)P(χ²₍₃₎ ≥ δ²/σ²),

and when δ ≥ 0, as

ρ(θ, θ⁺) = δ² − (δ²/2)P(χ²₍₁₎ ≥ δ²/σ²) + (σ²/2)P(χ²₍₃₎ ≥ δ²/σ²).

These results imply that if the direction of the inequality is correct and δ < 0, then (i) as δ → −∞, ρ(θ, θ⁺) → σ², and (ii) as δ → 0, ρ(θ, θ⁺) → σ²/2. Consequently, if δ < 0 the inequality estimator is minimax. If the direction of the inequality is not correct and δ → ∞, then ρ(θ, θ⁺) − δ² → 0, where δ² is the risk of the restricted least squares estimator. The risk characteristics of the inequality restricted estimator are presented in Figure 3.2.
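The risk expressions above (as reconstructed here from the stated limiting behavior) can be checked against simulation; the sketch below evaluates both sides for several values of δ.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
theta, sigma = 0.0, 1.0
n_rep = 400_000

for delta in (-2.0, -0.5, 0.0, 0.5, 2.0):     # delta = r - theta
    r = theta + delta
    theta_plus = np.maximum(rng.normal(theta, sigma, n_rep), r)
    mc_risk = np.mean((theta_plus - theta) ** 2)

    d2 = delta**2 / sigma**2
    if delta < 0:                             # direction of the inequality correct
        risk = sigma**2 + 0.5 * delta**2 * chi2.sf(d2, 1) - 0.5 * sigma**2 * chi2.sf(d2, 3)
    else:
        risk = delta**2 - 0.5 * delta**2 * chi2.sf(d2, 1) + 0.5 * sigma**2 * chi2.sf(d2, 3)
    print(delta, mc_risk, risk)               # Monte Carlo vs. analytic risk
```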
When the linear statistical model is orthogonal and the information design matrix R is diagonal, these results generalize directly to the K mean or K-dimensional linear model problem. The results also hold for a general linear statistical model and diagonal R under weighted squared error loss when X'X is the weight matrix.
Figure 3.2 Risks for the maximum likelihood θ̂, restricted θ*, inequality restricted θ⁺, and pre-test inequality restricted θ⁺⁺ estimators as a function of δ²/σ².
3.4 Parameter distribution information (prior)
The assumption that the parameter vector β is stochastic and that its distribution, π, called the prior distribution for β, is known, leads within the context of an appropriate loss function to the selection of a Bayes estimator β̂_B which may be biased. As noted in Section 1, the least squares estimator b may be regarded as the Bayes estimator with respect to a diffuse prior distribution and a quadratic loss function, and this particular Bayes estimator is unbiased.

Alternatively, consider the use of an informative proper prior, such as the specification of the natural conjugate prior distribution π for β, which is normal with mean β̄ and covariance σ²A⁻¹. In the case of quadratic loss, this formulation results in the following optimal point estimate, which minimizes posterior expected loss:

β̂_B = (X'X + A)⁻¹(X'y + Aβ̄).
Their principal result is given in the form of an inequality that involves the sample observations, the prior parameters, and the unknown parameters of the statistical model. It should be noted that under this measure of performance both estimators are admissible.
Zellner (1980) has proposed a prior for β which is of the same form as the natural conjugate prior, with A = g(X'X), where g is a positive constant. The posterior mean of the resulting Bayes estimator is

β̂_g = (X'X + gX'X)⁻¹(X'y + gX'Xβ̄) = (b + gβ̄)/(1 + g),   (3.27)

which approaches the least squares estimator b as g → 0.
Under a squared error loss measure Zellner notes that the Bayes estimator β̂_g has risk

ρ(β, β̂_g) = K₀σ²(1 + g²(β − β̄)'(β − β̄)/(K₀σ²))/(1 + g)²,   (3.28)

and average risk K₀σ²/(1 + g), given σ², where K₀ = tr(X'X)⁻¹. The risk of β̂_g will be superior to that of b if (1 + g²(β − β̄)'(β − β̄)/(K₀σ²))/(1 + g)² < 1. Furthermore, there may be a considerable reduction in average risk if one uses β̂_g instead of b. Therefore, the g prior may provide a useful Bayesian analysis of the linear statistical model when there is information about the β vector but little information about the prior covariance.
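A sketch of the g-prior calculation (the prior mean and g are hypothetical choices); under A = g(X'X) the posterior mean has the simple closed form used below, and the risks follow (3.28).

```python
import numpy as np

rng = np.random.default_rng(7)
T, K = 50, 4
X = rng.normal(size=(T, K))
beta = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 2.0
y = X @ beta + rng.normal(scale=sigma, size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
beta_bar = np.zeros(K)                        # hypothetical prior mean
g = 0.5

beta_g = (b + g * beta_bar) / (1 + g)         # posterior mean under A = g(X'X)

K0 = np.trace(np.linalg.inv(X.T @ X))
risk_b = K0 * sigma**2                        # squared error risk of b
risk_g = risk_b * (1 + g**2 * (beta - beta_bar) @ (beta - beta_bar)
                   / (K0 * sigma**2)) / (1 + g)**2     # eq. (3.28)
print(risk_b, risk_g)
```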
3.5 Some remarks
We have discussed in this section three sampling theory estimators that are biased if the non-sample information is incorrect. Both the exact and stochastic restricted estimators win in terms of precision but may lose, possibly heavily, in terms of bias. If we could be sure of the direction of the inequality non-sample information, and in economics there are many cases when this may be true, the inequality estimator, although biased, wins in terms of precision and mean squared error and thus has appealing sampling properties.
Rothenberg (1973) has studied inequality restrictions under very general assumptions regarding the form of the inequalities and has suggested an alternative class of estimators which are biased. In particular, he has shown that in the case of the linear regression model, if the true parameter vector β is known to be constrained to a convex, proper subset of the parameter space, then the restricted least squares estimator dominates the unconstrained least squares estimator under a mean squared prediction error loss criterion. However, if the sample is not normally distributed, the constrained estimator does not necessarily dominate its unconstrained counterpart under a generalized mean square error loss criterion.
4 F’re-test-variable selection estimators
The previous subsections are informative relative to the sampling performance of the equality, stochastic, and inequality restricted least squares estimators. One problem with these results is that the researcher is seldom certain about the correctness of the non-sample information and thus may have only a vague notion about δ or the δ'δ/2σ² specification error parameter space. Therefore, the results are of little help in choosing between the restricted and unrestricted estimators or, more to the point, choosing the biased estimator with minimum risk.

Since there may be reasons to doubt the compatibility of the sample and non-sample information, or uncertainty about the dimensions of the design matrix X or Z, some biased estimators result when the investigator performs a preliminary test of significance (chooses a criterion) and on the basis of the test (criterion) makes a choice between the unbiased estimator and a possibly biased one. To see the possible significance of these biased alternatives, let us consider the equality and inequality pre-test estimators and one or more conventional
variable selection procedures. For expository purposes we stay in the orthonormal linear statistical model world where Z'Z = I_K and continue to assume R = I_K.
4.1 Conventional equality pre-test estimator
Using likelihood ratio test procedures we may test the null hypothesis H₀: θ = r against the alternative hypothesis H_A: θ ≠ r by using the test statistic

u = (θ̂ − r)'(θ̂ − r)/(Kσ̂²),   (4.1)

which is distributed as a central F random variable with K and (T − K) degrees of freedom if the hypotheses (restrictions) are correct. Of course, if the restrictions are incorrect, E[θ̂ − r] = (θ − r) = δ ≠ 0, and u (4.1) is distributed as a non-central F with non-centrality parameter λ = (θ − r)'(θ − r)/2σ² = δ'δ/2σ². As a test mechanism the null hypothesis is rejected if u ≥ F₍K,T−K₎ = c, where c is determined for a given level of the test α by P[F₍K,T−K₎ ≥ c] = α.
This means that by accepting the null hypothesis we use the restricted least squares estimator θ* as our estimate of θ, and by rejecting the null hypothesis θ − r = δ = 0 we use the unrestricted least squares estimator θ̂. Thus the estimate that results is dependent upon a preliminary test of significance, and this means the estimator used by many applied workers is of the form

θ̂̂ = I₍₀,c₎(u)θ* + I₍c,∞₎(u)θ̂.   (4.2)

Alternatively the estimator may be written as

θ̂̂ = θ̂ − I₍₀,c₎(u)(θ̂ − θ*).   (4.3)

This specification means that in a repeated sampling context the data, the linear hypotheses, and the selected level of statistical significance all determine the combination of the two estimators that is chosen.
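A sketch of the pre-test rule (4.2) in the orthonormal model (hypothetical data; with R = I_K the restricted estimator is simply θ* = r):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(8)
T, K = 40, 5
Z, _ = np.linalg.qr(rng.normal(size=(T, K)))  # Z'Z = I_K
theta = np.array([0.3, -0.1, 0.0, 0.2, 0.0])
y = Z @ theta + rng.normal(size=T)

theta_hat = Z.T @ y                           # unrestricted least squares
r = np.zeros(K)                               # hypothesis theta = r
resid = y - Z @ theta_hat
sigma2_hat = resid @ resid / (T - K)

u = (theta_hat - r) @ (theta_hat - r) / (K * sigma2_hat)
c = f_dist.ppf(0.95, K, T - K)
theta_pretest = r if u < c else theta_hat     # eq. (4.2)
print(u, c, theta_pretest)
```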
4.1.1 Bias
From (4.3) the mean of the pre-test estimator is

E[θ̂̂] = θ − E[I₍₀,c₎(u)(θ̂ − θ*)],   (4.4)

which by Theorem 3.1 in Judge and Bock (1978, p. 71) may be expressed as

E[θ̂̂] = θ − δP[χ²₍K₊₂,λ₎/χ²₍T₋K₎ ≤ cK/(T − K)].   (4.5)
Consequently, if δ = 0, the pre-test estimator is unbiased. This fortunate outcome aside, the size of the bias is affected by the probability of a random variable with a non-central F distribution being less than a constant, which is determined by the level of the test, the number of hypotheses, and the degree of hypothesis error, δ or λ. Since the probability is always equal to or less than one, the bias of the pre-test estimator is equal to or less than the bias of the restricted estimator (3.2).
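The probability in (4.5) can be evaluated with SciPy's noncentral F distribution; note that SciPy's noncentrality parameter is δ'δ/σ², i.e. twice the λ used in the text. A sketch with hypothetical values:

```python
import numpy as np
from scipy.stats import ncf, f as f_dist

K, T = 5, 40
alpha = 0.05
c = f_dist.ppf(1 - alpha, K, T - K)

delta = np.array([0.5, 0.0, 0.0, 0.0, 0.0])   # hypothesis error theta - r
sigma = 1.0
nc = delta @ delta / sigma**2                 # SciPy noncentrality = 2 * lambda

# P[chi2_(K+2,lambda)/chi2_(T-K) <= cK/(T-K)] rewritten as a noncentral-F cdf
p = ncf.cdf(c * K / (K + 2), K + 2, T - K, nc)
bias = -delta * p                             # eq. (4.5): E[pre-test] - theta
print(bias)
```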
4.1.2 Sampling performance
Since this estimator is used in much applied work, let us turn to its sampling performance under the squared error loss criterion. The risk function may be written, using (4.3) and following Judge and Bock (1978, p. 70), as

ρ(θ, θ̂̂) = σ²K − σ²K l(2) + δ'δ[2l(2) − l(4)],

where l(i) = P[χ²₍K₊ᵢ,λ₎/χ²₍T₋K₎ ≤ cK/(T − K)]. This risk function has the following properties:
(1) If the restrictions are correct and δ = 0, the risk of the pre-test estimator is σ²K[1 − l(2)], where 1 > (1 − l(2)) > 0 for 0 < c < ∞. Therefore, the pre-test estimator has a smaller risk than the least squares estimator at the origin, and the decrease in risk depends on the level of significance α and correspondingly the critical value of the test c.
(2) As the hypothesis error δ, or λ, grows, the risk of the pre-test estimator increases, attains a maximum after exceeding the risk of the least squares estimator, and then monotonically decreases, approaching σ²K, the risk of the least squares estimator.

(3) As the hypothesis error θ − r = δ, and thus δ'δ/2σ², increases and approaches infinity, l(·) and δ'δ l(·) approach zero. Therefore, the risk of the pre-test estimator approaches σ²K, the risk of the unrestricted least squares estimator.

(4) The pre-test estimator risk function crosses the risk function of the least squares estimator in the δ'δ/2σ² parameter space within the bounds K/4 ≤ δ'δ/2σ² ≤ K/2.
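Properties (1)-(4) can be traced numerically with the risk expression as reconstructed above (our evaluation; values hypothetical):

```python
import numpy as np
from scipy.stats import ncf, f as f_dist

K, T = 5, 40
sigma = 1.0
c = f_dist.ppf(0.95, K, T - K)

def pretest_risk(dd):                         # dd = delta'delta
    l2 = ncf.cdf(c * K / (K + 2), K + 2, T - K, dd / sigma**2)   # l(2)
    l4 = ncf.cdf(c * K / (K + 4), K + 4, T - K, dd / sigma**2)   # l(4)
    return sigma**2 * K * (1 - l2) + dd * (2 * l2 - l4)

for dd in (0.0, 1.0, 3.0, 10.0, 100.0):
    lam = dd / (2 * sigma**2)
    print(lam, pretest_risk(dd))              # risk returns to sigma^2*K as lambda grows
```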
The sampling characteristics of the preliminary test estimator are summarized in Figure 4.1 for various levels of α or c.
These results mean that the pre-test estimator does well relative to the least squares estimator if the hypotheses are correctly specified. However, in the λ = δ'δ/2σ² parameter space representing the range of hypothesis errors, the pre-test estimator is inferior to the least squares estimator over an infinite range
Figure 4.1 Risk functions for the least squares, restricted least squares, and pre-test estimators.
of the parameter space. Also, there is a range of the parameter space where the pre-test estimator has risk that is inferior to (greater than) both the unrestricted and restricted least squares estimators. No one estimator in the θ̂, θ*, and θ̂̂ family dominates the other competitors. In addition, in applied problems we seldom know the hypothesis errors nor the location of the correct λ in the specification error parameter space. Consequently, the choice of the estimator and the optimum α level are unresolved problems.
4.2 Stochastic hypothesis pre-test estimator
Since the stochastic prior and sample information provide two separate estimates of Rβ, i.e. r and Rb, Theil (1963) has proposed that we test the compatibility of the two estimates by the test statistic

u₁ = (r − Rb)'[RS⁻¹R' + Ω]⁻¹(r − Rb)/σ²,   (4.8)

which, if σ² is known and δ = 0, has a central chi-square distribution with J degrees of freedom. If δ ≠ 0, then u₁ is distributed as a non-central chi-square with non-centrality parameter λ₁ = δ'(RS⁻¹R' + Ω)⁻¹δ/2σ².
If we use the above test statistic to test the compatibility of the prior and sample information, the estimator chosen depends upon a preliminary test of significance, and thereby produces the following pre-test estimator using (3.11):

b̂** = I₍₀,c₎(u₁)b** + I₍c,∞₎(u₁)b,   (4.9)

where the I₍.₎(u₁) are indicator functions and c is determined for a given level of α by ∫_c^∞ f(u₁)du₁ = α, where f(u₁) is the density function of u₁ under the assumption that δ = 0.
Since the stochastic linear statistical model can be reformulated in a restricted linear model framework, many of the statistical implications of pre-testing developed in Section 4.1 carry over for the stochastic restricted pre-test estimator (4.9). The mean of b̂** is

E[b̂**] = β + l(2)S⁻¹R'(RS⁻¹R' + Ω)⁻¹δ,   (4.10)

where l(2) < 1. Consequently, the bias is l(2)S⁻¹R'(RS⁻¹R' + Ω)⁻¹δ.
In terms of estimator comparisons, following Judge and Bock (1978), the stochastic restricted preliminary test estimator is better in a risk matrix or general mean square error sense than the unrestricted least squares estimator if the stochastic restriction error, in the form of the non-centrality parameter λ₁, is sufficiently small, where 1/4 might be considered the rough bound.
Alternatively, if the squared error loss criterion is used, then the equality of the risk of the stochastic restricted pre-test estimator and the least squares estimator occurs for a value of λ₁ = δ'[RS⁻¹R' + Ω]⁻¹δ/2σ² within bounds that depend on A = (RS⁻¹R' + Ω)⁻¹RS⁻¹R' and l(i) = P[χ²₍J₊ᵢ,λ₁₎ ≤ c], with 0 < l(2) < l(4) < 1, where d_s and d_L are the smallest and largest characteristic roots of A, respectively. Since the results for both criteria depend on the critical value c or the level of the test α, the risk or risk matrix approaches that of the stochastic restricted estimator as α → 0 and c → ∞. Conversely, as α → 1 and c → 0 the risk or risk matrix of the pre-test estimator approaches that of the least squares estimator. Finally, for α < 1 the risk or risk matrix approaches that of the least squares estimator as λ₁ → ∞. As before, the optimum level of the test is unresolved.
4.3 Inequality hypothesis pre-test estimator
In the context of (3.20) we continue to consider a single parameter and the following null and alternative hypotheses:

H₀: θ ≥ r  and  H_A: θ < r.

As a basis for checking the compatibility of the sample information and a linear inequality hypothesis for θ, when σ² is known, consider the test statistic

u₂ = (θ̂ − r)/σ,   (4.13)

which is distributed as a normal random variable with mean δ/σ and variance 1. If it is assumed δ = θ − r = 0, then u₂ is a standard normal random variable and the test structure may be formulated in terms of δ, with H₀: δ ≥ 0 and H_A: δ < 0. Using test statistic (4.13), we use the following test mechanism or decision rule:

(i) Reject the hypothesis H₀ if (θ̂ − r)/σ = u₂ < c₂ ≤ 0 and use the maximum likelihood estimator θ̂, where c₂ is the critical value of the test from the standard normal table.

(ii) Accept the hypothesis H₀ if u₂ = (θ̂ − r)/σ ≥ c₂ and use the inequality restricted estimator θ⁺.
By accepting the hypothesis H₀ we take θ⁺ as the estimate of θ, and by rejecting H₀ the maximum likelihood estimate θ̂ is used. Consequently, when a preliminary test of the inequality hypothesis is made and a decision is taken based on the data at hand, the following pre-test estimator results:

θ⁺⁺ = I₍₋∞,c₂₎(u₂)θ̂ + I₍c₂,∞₎(u₂)θ⁺.   (4.16)
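A Monte Carlo sketch of (4.16) for hypothetical values of θ, r, and c₂; the empirical mean illustrates the bias behavior discussed in Section 4.3.1.

```python
import numpy as np

rng = np.random.default_rng(9)
theta, sigma, r = -0.5, 1.0, 0.0              # theta < r, in agreement with H_A
c2 = -1.645                                   # standard normal critical value
n_rep = 300_000

theta_hat = rng.normal(theta, sigma, n_rep)
u2 = (theta_hat - r) / sigma
theta_plus = np.maximum(theta_hat, r)         # inequality restricted estimator
theta_pp = np.where(u2 < c2, theta_hat, theta_plus)   # eq. (4.16)

print(theta_pp.mean() - theta)                # empirical bias of theta++
```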
4.3.1 Mean of the inequality pre-test estimator
When −∞ < δ < 0, θ is less than r and thus in agreement with the hypothesis H_A. In this case, if −δ/σ > 0 and d = c₂ − δ/σ > 0, applying the Judge and Yancey (1977) corollaries 1 and 2 of Section 3.3 to eq. (4.16), the mean of the inequality restricted pre-test estimator is

E[θ⁺⁺] = θ + (σ/√(2π))[P(χ²₍₂₎ ≥ δ²/σ²) − P(χ²₍₂₎ ≥ d²)] + (δ/2)[P(χ²₍₁₎ ≥ δ²/σ²) − P(χ²₍₁₎ ≥ d²)].   (4.17a)
For any given critical value c₂ < 0, if θ − r = δ = 0, then E[θ⁺⁺] = θ + (σ/√(2π))[1 − P(χ²₍₂₎ ≥ c₂²)], so the pre-test estimator is biased at the origin. However, as δ → −∞, E[θ⁺⁺] → θ and the pre-test estimator approaches θ̂, since in the limit the maximum likelihood estimator θ̂ will always be used. Furthermore, if c₂ = 0 and δ = 0, then E[θ⁺⁺] = θ; if c₂ → −∞ and δ = 0, then E[θ⁺⁺] → θ + σ/√(2π), the mean of the inequality restricted estimator at the origin; and if c₂ → −∞ and δ → ∞, then E[θ⁺⁺] → θ.
When 0 < δ/σ and d < 0, the mean of the pre-test estimator is given by the corresponding expression (4.17b).