BIASED ESTIMATION
G. G. JUDGE and M. E. BOCK*
University of Illinois and Purdue University
3 Some possibly biased alternatives
3.1 Exact non-sample information
3.2 Stochastic non-sample information
3.3 Inequality non-sample information
3.4 Parameter distribution information (prior)
3.5 Some remarks
4 Pre-test-variable selection estimators
4.1 Conventional equality pre-test estimator
4.2 Stochastic hypothesis pre-test estimator
4.3 Inequality hypothesis pre-test estimator
4.4 Bayesian pre-test estimators
4.5 Variable selection estimators
5 Conventional estimator inadmissibility and the Stein-rule alternatives
5.1 Estimation under squared error loss
5.2 Stein-like rules under weighted squared error loss
6 Some biased estimator alternatives for the stochastic regressor case
7 Biased estimation with nearly collinear data
7.1 A measure of “near” collinearity
Handbook of Econometrics, Volume I, Edited by Z. Griliches and M.D. Intriligator
© North-Holland Publishing Company, 1983
Despite its popularity, the statistical implications of remaining in the linear unbiased family of rules may in many cases be rather severe. One indication of the possibly questionable stature of the least squares rule occurred when Stein (1955) showed, under conditions normally fulfilled in practice, that there were other minimax estimators. Following Stein's result, James and Stein (1961) exhibited an estimator which under squared error loss dominates the least squares estimator and thus demonstrates its inadmissibility. This result means that the unbiased least squares rule may have an inferior mean square error when compared to other biased estimators.
Another trouble spot for the conventional least squares estimator arises in the case of a false statistical model. Just as few economic variables are free of measurement error and few economic relations are non-stochastic, few statistical models are correctly specified, and many of these specification errors imply a biased outcome when the least squares rule is used. For example, consider the problem of an investigator who has a single data set and wants to estimate the parameters of a linear model which are known to lie in a high-dimensional parameter space Θ₁. The researcher may suspect the relationship may be characterized by a lower-dimensional parameter space Θ₂ ⊂ Θ₁. Under this uncertainty, if the Θ₁-dimensional parameter space is estimated by least squares, the result, from the possibly overspecified model, will be unbiased but have large variance and thus may make a poor showing in terms of mean square error. Alternatively, the Θ₂-dimensional parameter space may incorrectly specify the statistical model, and thus if estimated by least squares will be biased, and this bias may or may not outweigh the reduction in variance if evaluated in a mean square error context.
Although uncertainty concerning the proper column dimension of the matrix of explanatory variables is the rule, in many cases prior information exists about the individual parameters and/or relationships among the unknown parameters. Ignoring this information and using only sample information and the least squares rule may lead to a loss of precision, while taking the information into account may lead to a more precise though biased estimator. Intuitively, it would seem that any estimation strategy which does not take account of existing non-sample information should lead to suboptimal rules.
Furthermore, since most economic data are passively generated and thus do not come from an experimental design situation where the investigator has a good degree of control, the data may be nearly collinear, and this means that approximate linear relations may hold among the columns of the explanatory variables that appear in the design matrix X. When this happens the least squares estimates are unstable: the X'X matrix is often nearly singular, and small changes in the observations may result in large changes in the estimates of the unknown coefficients. Ridge and minimax general ridge estimators have been suggested as alternatives to the least squares rule when handling data with these characteristics.
In the linear statistical model, when the errors are long tailed and the conventional normally distributed constant variance error specification is not appropriate, the least squares rule loses some of its inferential reach. Under this scenario it is necessary to consider biased alternatives which are conceptually different from, for example, the Stein and ridge approaches noted above. In this chapter we do no more than identify the problem, since it will be discussed in full elsewhere in this Handbook.
To cope with some of the problems noted above and to avoid the statistical consequences of remaining with the conventional estimator, researchers have proposed and evaluated a range of alternatives to least squares. Useful summaries of some of the results to date include papers by Dempster (1973), Mayer and Willke (1973), Gunst and Mason (1977), and Draper and Van Nostrand (1979).

In laying out the statistical implications of a range of biased alternatives to the least squares rule, the chapter is organized as follows. In Section 2 conventional linear statistical models, estimators, and a hypothesis testing framework are presented, and the sampling theory and Bayes bases for gauging estimator performance are specified. In Section 3 sampling theory and Bayes estimators
which permit sample information and various types of non-sample information to be jointly considered are specified and appropriately evaluated. In Section 4 testing frameworks are specified for evaluating the compatibility of the sample information and the various types of non-sample information, and the corresponding pre-test estimators are derived, compared, and evaluated. In Section 5 the inadmissibility of the least squares estimator is discussed and a range of Stein-rule estimators are considered for alternative loss functions and design matrices. In Section 6 alternatives to least squares are considered for the stochastic regressor case. In Section 7 the problem of nearly collinear data is discussed, and the ridge-type and general minimax estimators which have been suggested to cope with this age-old problem are compared and evaluated. Finally, in Section 8 some comments are made about the statistical implications of these biased alternatives for econometric theory and practice.
2 Conventional statistical models, estimators, tests, and measures of estimator performance
We are concerned with the sampling performance of a family of biased estimators for the following linear statistical model:

y = Xβ + e,   (2.1)
where y is a (T × 1) vector of observations, X is a known (T × K) design matrix of rank K, β is a (K × 1) fixed vector of unknown parameters, and e is a (T × 1) vector of unobservable random variables with mean vector zero and finite covariance matrix E[ee'] = σ²Ψ, with σ² unknown and Ψ a known symmetric positive definite matrix. We assume throughout that the random variables which comprise e are independently and identically distributed, i.e. E[ee'] = σ²I_T, or can be transformed to this specification since Ψ is known. In almost all cases we will assume e is a normal random vector.
2.1 Conventional estimators and tests
Given that y is generated by the linear statistical model (2.1), the least squares basis for estimating the unknown coefficients is given by the linear rule

b = (X'X)⁻¹X'y,   (2.2)

which is best linear unbiased. If it is assumed that e is multivariate normal, then (2.2) is the maximum likelihood estimator and is a minimax estimator no longer
limited to the class of linear estimators. Furthermore, if e is normal then b has minimum risk E[(b − β)'(b − β)] among the unbiased (not necessarily linear) estimators of β.
The assumption that y is a normally distributed vector implies that the random vector (b − β) is normally distributed with mean vector zero and covariance

E[(b − β)(b − β)'] = σ²(X'X)⁻¹.   (2.3)

Therefore, the quadratic form (b − β)'X'X(b − β)/σ² is distributed as a central chi-square random variable with K degrees of freedom.
A best quadratic unbiased estimator of the unknown scalar σ² is given by

σ̂² = (y − Xb)'(y − Xb)/(T − K) = y'(I_T − X(X'X)⁻¹X')y/(T − K) = y'My/(T − K),

where M is an idempotent matrix of rank (T − K). If we leave the class of unbiased quadratic estimators of σ², the minimum variance quadratic estimator, with smallest mean square error, is σ̃² = y'My/(T − K + 2). Since e is a normally distributed vector with mean vector zero and covariance σ²I_T, the quadratic form (T − K)σ̂²/σ² = y'My/σ² is distributed as a central chi-square random variable with (T − K) degrees of freedom.
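To fix ideas, here is a minimal sketch (Python with NumPy; the data and parameter values are hypothetical) that computes the least squares rule (2.2), the unbiased variance estimator σ̂², and the smaller mean square error alternative y'My/(T − K + 2) for a simulated sample.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 50, 4
X = rng.normal(size=(T, K))                   # design matrix of rank K
beta = np.array([1.0, -2.0, 0.5, 3.0])        # true (unknown) coefficients
sigma = 2.0
y = X @ beta + rng.normal(scale=sigma, size=T)

S = X.T @ X
b = np.linalg.solve(S, X.T @ y)               # b = (X'X)^{-1} X'y

M = np.eye(T) - X @ np.linalg.solve(S, X.T)   # idempotent matrix of rank T - K
sigma2_unbiased = y @ M @ y / (T - K)         # best quadratic unbiased estimator
sigma2_min_mse = y @ M @ y / (T - K + 2)      # smallest-MSE quadratic estimator
print(b, sigma2_unbiased, sigma2_min_mse)
```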
In many instances non-sample information or hypotheses exist in the form β = r, or δ = 0, where δ = β − r is a (K × 1) vector representing specification errors and r is a K-dimensional known vector. Given this formulation it is conventional to use likelihood ratio procedures to test the null hypothesis H₀: β = r against the alternative hypothesis H_A: β ≠ r, by using the test statistic

u = (b − r)'X'X(b − r)/(Kσ̂²).

If the hypotheses are correct and indeed r = β, the test statistic u is a central F random variable with K and (T − K) degrees of freedom, i.e. u ~ F₍K,T−K₎. If the linear hypotheses are incorrect, u is distributed as a non-central F random variable with K and (T − K) degrees of freedom and non-centrality parameter λ = δ'X'Xδ/2σ².

The traditional test procedure for H₀ against H_A is to reject the linear hypotheses H₀ if the value of the test statistic u is greater than some specified value c. The value of c is determined for a given significance level α by P[F₍K,T−K₎ ≥ c] = α.
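The test can be carried out numerically as below (a sketch with simulated data; the statistic u follows the form reconstructed above, and the critical value comes from SciPy's F distribution).

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(1)
T, K = 50, 4
X = rng.normal(size=(T, K))
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.normal(scale=2.0, size=T)

S = X.T @ X
b = np.linalg.solve(S, X.T @ y)
resid = y - X @ b
sigma2_hat = resid @ resid / (T - K)

r = np.zeros(K)                               # H0: beta = r
u = (b - r) @ S @ (b - r) / (K * sigma2_hat)  # central F_(K,T-K) under H0
alpha = 0.05
c = f_dist.ppf(1 - alpha, K, T - K)           # P[F_(K,T-K) >= c] = alpha
print(u, c, u >= c)                           # reject H0 when u >= c
```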
For much of the analysis it is convenient to work with the reparameterized model

y = XS^(−1/2)S^(1/2)β + e = Zθ + e,   (2.10a)

where S^(1/2) is a positive definite symmetric matrix with S^(1/2)S^(1/2) = S = X'X, θ = S^(1/2)β, Z = XS^(−1/2), and Z'Z = I_K. Under this reparameterization a best linear unbiased estimator of θ is θ̂ = Z'y, with covariance σ²I_K. Note also we may write (2.10a) as

z = Z'y = θ + Z'e,   (2.10b)

where z = Z'y has a K-variate normal distribution with mean vector θ and covariance σ²I_K. This formulation is equivalent to the K mean statistical model usually analyzed in the statistical literature. Although (2.10b) is a convenient form for analysis purposes, we will remain in this chapter with the linear statistical (regression) form since this is the one most commonly dealt with in econometrics. The common nature of the two problems should be realized in interpreting the results to be developed. Alternatively, consider the following canonical form:

y = XTT⁻¹β + e = Hα + e,   (2.12)
where H = XT, α = T⁻¹β, and T is a non-singular matrix chosen so that the columns of XT are orthogonal. One choice of T is an orthogonal matrix P whose columns are orthonormal characteristic vectors of X'X. Consequently, PP' = I and

y = XPP'β + e = Hα + e,

with H = XP and α = P'β. The columns of H are orthogonal since H'H = Λ, which is a diagonal matrix with elements λ₁ > λ₂ > ⋯ > λ_K that are the characteristic roots of X'X. The best linear unbiased estimator of α is α̂ = Λ⁻¹H'y, with covariance σ²Λ⁻¹. The variance of α̂ᵢ, i = 1, 2, …, K, is σ²/λᵢ.
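A short numerical check of the canonical form (our own sketch, with a random design): the characteristic vectors of X'X orthogonalize the design, and the estimate α̂ = Λ⁻¹H'y coincides with P'b.

```python
import numpy as np

rng = np.random.default_rng(2)
T, K = 50, 4
X = rng.normal(size=(T, K))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=2.0, size=T)

lam, P = np.linalg.eigh(X.T @ X)              # characteristic roots/vectors of X'X
order = np.argsort(lam)[::-1]                 # order lambda_1 > ... > lambda_K
lam, P = lam[order], P[:, order]

H = X @ P                                     # H'H = Lambda, a diagonal matrix
assert np.allclose(H.T @ H, np.diag(lam))

alpha_hat = (H.T @ y) / lam                   # alpha_hat = Lambda^{-1} H'y
b = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(alpha_hat, P.T @ b)        # the two estimates agree
```

The variance σ²/λᵢ of α̂ᵢ makes visible how a near-zero characteristic root (near collinearity, Section 7) inflates the variance of the corresponding coefficient.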
2.2 Measures of performance
Finally, let us consider the basis for gauging the performance of a range of alternative estimators. We can, as we did with the estimators considered above, require the property of unbiasedness, and in this context b is the only unbiased estimator of β based on sufficient statistics. But why the concept of unbiasedness? If the information from sample observations is to be used for decision purposes, why not make use of statistical decision theory, which is based on the analysis of losses due to incorrect decisions? This is in fact the approach we use in this chapter as a basis for comparing estimators as we go outside of traditional rules and enter the family of non-linear biased estimators.
Although there are many forms for representing the loss or risk functions, we will to a large extent be concerned with estimation alternatives under a squared error loss measure. However, the estimators we consider are in general robust under a range of loss functions.
Assume that y is a (T × 1) random vector. If δ(y) is some estimator of the K-dimensional parameter vector θ, then the weighted squared error or weighted quadratic loss function is

L(θ, δ(y)) = (δ(y) − θ)'Q(δ(y) − θ),   (2.13)

where Q is a known positive definite weight matrix. If Q = I_K under this criterion, the unbiased estimator with minimum risk is the unbiased estimator with minimum variance. If we make use of the condition that δ(y) be both linear in y and unbiased, this leads to the Gauss-Markov criterion, and the minimum risk or best linear unbiased estimator is δ(y) = b when E[y] = Xβ.
Reparameterizing the statistical model and transforming from one parameter space to another in many cases changes the measure of goodness used to judge performance. For example, if interest centers on statistical model (2.1) and sampling performance in the β space (the estimation problem), specifying an unweighted loss function in the θ space (2.10) results in a weighted loss function in the β space, i.e.

(θ̂ − θ)'(θ̂ − θ) = (S^(1/2)b − S^(1/2)β)'(S^(1/2)b − S^(1/2)β) = (b − β)'X'X(b − β).
Therefore, while the reparameterized model (2.10) is appropriate for analyzing the conditional mean forecasting problem of estimating Xβ by Xb, it is not appropriate for analyzing the performance of b as an estimate of β unless one is interested in the particular weight matrix X'X.
Alternatively, an unweighted squared error loss in the β space results in a weighted risk function in the θ space, i.e.

(b − β)'(b − β) = (θ̂ − θ)'(X'X)⁻¹(θ̂ − θ).
Finally, let us note for the canonical form (2.12) that the orthogonal transformation preserves the distance measure, i.e.

(α̂ − α)'(α̂ − α) = (P'b − P'β)'(P'b − P'β) = (b − β)'(b − β).
The minimum mean square error criterion is another basis we will use for comparing the sampling performance of estimators. This generalized mean square error or risk measure for some estimator β̂ of β may be defined as

MSE[β̂, β] = E[(β̂ − β)(β̂ − β)'].   (2.18)

Under this measure the diagonal elements are mean square errors, and the trace of (2.18) is the squared error risk when Q = I_K. In using the mean square error criterion, an estimator β̂ is equal or superior to another estimator β̃ if, for all β, the difference MSE[β̃, β] − MSE[β̂, β] is positive semi-definite.
Given a known prior distribution π for β and a loss function ρ(β, β̂), we may also consider a Bayes estimator, β̂_B, which minimizes for all β̂ the expected value of ρ(β, β̂), where the expectation is taken over β with respect to its known distribution π. The Bayes risk for β̂_B is

E_π[ρ(β, β̂_B)] = ∫ ρ(β, β̂_B) dπ(β).
In particular, for a weighted quadratic loss such as (2.13), the Bayes estimator is

β̂_B = E[β | y],   (2.21)

the mean of the conditional (posterior) distribution of β given the sample data.
3 Some possibly biased alternatives
Under the standard linear normal statistical model and a sampling theory framework, when only sample information is used, the least squares estimator has minimum variance among unbiased estimators. In the Bayesian framework for inference, if a non-informative (uniform) prior is used in conjunction with the sample information, the minimum posterior mean square error property is achieved via the least squares rule. One problem with least squares in either framework is that it does not take into account the often existing prior information or relationships among the coefficients. A Bayesian might even say that the non-informative prior which leads to least squares should be replaced by a proper distribution which reflects in a realistic way the existing non-sample information.

To mitigate the impact of ignoring this non-sample information, and to patch up their basis of estimation and inference so that it makes use of all of the information at hand, sampling theorists have developed procedures for combining sample and various types of non-sample information. When the non-sample information is added and certain of these rules are used, although we gain in precision, biased estimators result if the prior information specification is incorrect. In other cases biased estimators result even if the prior specification is correct. Thus, we are led, in comparing the estimators, to a bias-variance dichotomy for measuring performance, and some of the sampling theory estimators which make use of non-sample information show, for example, superior mean square error over much of the relevant parameter space. Alternatively, there are other conventionally used biased sampling theory alternatives for which this result does not hold. In the remainder of this section we review the sampling properties of these possibly biased estimators and evaluate their performance under a squared error loss measure.
3.1 Exact non-sample information
Let us assume that in addition to the sample information contained in (2.1) there also exists information about the K-dimensional unknown vector β in the form of J independent linear equality restrictions or hypotheses, where J ≤ K. This information may be specified as

Rβ = r,   (3.1)

where R is a (J × K) known matrix of rank J which expresses the structure of the outside information as it relates to the individual parameters or their linear combinations, and r is a (J × 1) vector of known elements. The restrictions may also be written as δ = 0, where δ = Rβ − r and the (J × 1) vector δ represents the specification errors in the prior information. Under this scenario the maximum likelihood estimator which includes this non-sample information is

b* = b + S⁻¹R'[RS⁻¹R']⁻¹(r − Rb),   (3.2)
and is the solution to the problem of minimizing the quadratic form (y − Xβ)'(y − Xβ) subject to Rβ = r, where S = X'X. Thus, b* is multinormally distributed, with mean

E[b*] = β − S⁻¹R'[RS⁻¹R']⁻¹(Rβ − r) = β − S⁻¹R'[RS⁻¹R']⁻¹δ,   (3.3)

covariance matrix

E[(b* − E[b*])(b* − E[b*])'] = σ²[S⁻¹ − C],   (3.4)

where C = S⁻¹R'[RS⁻¹R']⁻¹RS⁻¹, and mean square error or risk matrix

MSE[b*, β] = σ²[S⁻¹ − C] + S⁻¹R'[RS⁻¹R']⁻¹δδ'[RS⁻¹R']⁻¹RS⁻¹.   (3.5)
These results imply that if δ = Rβ − r = 0, then b* is best linear unbiased within the class of unbiased estimators which are linear functions of y and r. If δ ≠ 0, then b* is biased (3.3) and has mean square or quadratic risk (3.5) and (3.6). Under the general mean square error or risk criterion, in comparing the restricted and unrestricted maximum likelihood estimators b* and b, Toro-Vizcarrondo and Wallace (1968) show that E[(b − β)(b − β)'] − E[(b* − β)(b* − β)'] = Δ, where Δ is positive semi-definite, if and only if

δ'(RS⁻¹R')⁻¹δ/σ² ≤ 1,  or equivalently  λ = δ'(RS⁻¹R')⁻¹δ/2σ² ≤ 1/2.
Under the weighted quadratic risk criterion, b* has smaller risk than b, i.e. E[(b* − β)'Q(b* − β)] ≤ E[(b − β)'Q(b − β)], if and only if

δ'[RS⁻¹R']⁻¹RS⁻¹QS⁻¹R'[RS⁻¹R']⁻¹δ/2σ² ≤ tr(CQ)/2.

If the weight matrix under the quadratic risk criterion is X'X, i.e. the conditional mean forecasting problem, then the restricted maximum likelihood estimator has risk

E[(b* − β)'X'X(b* − β)] = σ²(K − J) + δ'[RS⁻¹R']⁻¹δ.
3.2 Stochastic non-sample information
Assume the following stochastic prior information exists about β:

r = Rβ + v,   (3.9)

where r and R are defined in conjunction with (3.1) and v is a (J × 1) unobservable, normally distributed random vector with mean δ and covariance σ²Ω, with Ω known. Following Theil and Goldberger (1961) and Theil (1963), we may combine the sample information (2.1) with the stochastic prior information (3.9) in the linear statistical model

[y', r']' = [X', R']'β + [e', v']',   (3.10)

and generalized least squares applied to (3.10) yields the stochastic restricted (mixed) estimator

b** = [S + R'Ω⁻¹R]⁻¹[X'y + R'Ω⁻¹r].   (3.11)
This means that (3.16) is positive semi-definite if and only if σ²(RS⁻¹R' + Ω) − δδ' is positive semi-definite. Consequently, under the generalized mean square error criterion the stochastic restricted estimator b** is superior to the least squares estimator b in the part of the parameter space where

δ'[RS⁻¹R' + Ω]⁻¹δ/σ² ≤ 1.
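A sketch of the Theil-Goldberger mixed estimator (3.11) with hypothetical stochastic restrictions follows; note that with prior covariance σ²Ω the unknown σ² cancels out of the formula.

```python
import numpy as np

rng = np.random.default_rng(4)
T, K, J = 60, 4, 2
X = rng.normal(size=(T, K))
beta = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 2.0
y = X @ beta + rng.normal(scale=sigma, size=T)

R = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0]])
Omega = np.eye(J)                             # v has covariance sigma^2 * Omega
r = R @ beta + rng.multivariate_normal(np.zeros(J), sigma**2 * Omega)

# generalized least squares on the stacked system (3.10); sigma^2 cancels
Oinv_R = np.linalg.solve(Omega, R)            # Omega^{-1} R
b_mixed = np.linalg.solve(X.T @ X + R.T @ Oinv_R,
                          X.T @ y + R.T @ np.linalg.solve(Omega, r))
print(b_mixed)
```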
3.3 Inequality non-sample information
Assume now the non-sample information about the unknown parameters exists in the form of linear inequality constraints, which may be represented as

Rβ ≥ r.
In order to give bias and risk evaluations we consider the orthonormal statistical model y = Zθ + e (2.10a) and the case where the information design matrix R has the form [I_J 0]. In fact, without loss of generality, for expository purposes we consider the problem of estimating the ith unknown parameter θᵢ when non-sample information exists in the form θ ≥ r, where r is a known scalar (the subscript i is dropped for convenience). Since for any sample of data either the maximum likelihood estimator θ̂ violates the constraint or it does not, the inequality restricted estimator may be expressed as

θ⁺ = I₍₋∞,r₎(θ̂)r + I₍r,∞₎(θ̂)θ̂
   = I₍₋∞,δ/σ₎(w)(θ + δ) + I₍δ/σ,∞₎(w)(θ + σw)
   = θ + I₍₋∞,δ/σ₎(w)δ + I₍δ/σ,∞₎(w)σw,   (3.20)

where I₍.₎(·) is an indicator function that takes the value 1 if its argument lies in the subscripted interval and zero otherwise, w = (θ̂ − θ)/σ is a standard normal random variable, and δ = r − θ.
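A Monte Carlo sketch of (3.20) for a single parameter (values hypothetical); the empirical mean of θ⁺ is compared with the analytic mean E[θ⁺] = θ + δΦ(δ/σ) + σφ(δ/σ) implied by (3.20) — our own evaluation, since the intermediate expressions are not reproduced above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
theta, sigma, r = 0.5, 1.0, 0.0               # constraint theta >= r holds (delta < 0)
delta = r - theta
n_rep = 200_000

theta_hat = rng.normal(theta, sigma, n_rep)   # maximum likelihood estimator
theta_plus = np.maximum(theta_hat, r)         # inequality restricted estimator (3.20)

analytic = theta + delta * norm.cdf(delta / sigma) + sigma * norm.pdf(delta / sigma)
print(theta_plus.mean(), analytic)            # the two should agree closely
```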
3.3.1 Bias

From (3.20), as δ → −∞, E[θ⁺] → θ; as δ → 0, E[θ⁺] → θ + σ/√(2π). If the direction of the inequality is incorrect, then as δ → ∞, E[θ⁺] → θ + δ = r, the mean of the restricted least squares estimator. The bias characteristics of the inequality estimator are presented in Figure 3.1.
3.3.2 Risk
The risk (mean square error) of the inequality estimator θ⁺ is

ρ(θ, θ⁺) = E[(θ⁺ − θ)²] = E[(θ̂ − θ)²] + E[I₍₋∞,δ/σ₎(w)δ²] − σ²E[I₍₋∞,δ/σ₎(w)w²].   (3.23)
Figure 3.1 The mean of the inequality restricted estimator θ⁺ as a function of δ/σ.
Using Corollary 3 from Judge and Yancey (1978), the risk function may be expressed, when δ < 0, as

ρ(θ, θ⁺) = σ² + (δ²/2)P(χ²₍₁₎ ≥ δ²/σ²) − (σ²/2)P(χ²₍₃₎ ≥ δ²/σ²),

and when δ ≥ 0, as

ρ(θ, θ⁺) = δ² − (δ²/2)P(χ²₍₁₎ ≥ δ²/σ²) + (σ²/2)P(χ²₍₃₎ ≥ δ²/σ²).

These results imply that if the direction of the inequality is correct and δ < 0, then (i) as δ → −∞, ρ(θ, θ⁺) → σ², and (ii) as δ → 0, ρ(θ, θ⁺) → σ²/2. Consequently, if δ < 0 the inequality estimator is minimax. If the direction of the inequality is not correct and δ → ∞, then ρ(θ, θ⁺) − δ² → 0, where δ² is the risk of the restricted least squares estimator. The risk characteristics of the inequality restricted estimator are presented in Figure 3.2.
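The risk expressions above (as reconstructed here from the stated limiting behavior) can be checked against simulation; the sketch below evaluates both sides for several values of δ.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
theta, sigma = 0.0, 1.0
n_rep = 400_000

for delta in (-2.0, -0.5, 0.0, 0.5, 2.0):     # delta = r - theta
    r = theta + delta
    theta_plus = np.maximum(rng.normal(theta, sigma, n_rep), r)
    mc_risk = np.mean((theta_plus - theta) ** 2)

    d2 = delta**2 / sigma**2
    if delta < 0:                             # direction of the inequality correct
        risk = sigma**2 + 0.5 * delta**2 * chi2.sf(d2, 1) - 0.5 * sigma**2 * chi2.sf(d2, 3)
    else:
        risk = delta**2 - 0.5 * delta**2 * chi2.sf(d2, 1) + 0.5 * sigma**2 * chi2.sf(d2, 3)
    print(delta, mc_risk, risk)               # Monte Carlo vs. analytic risk
```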
When the linear statistical model is orthogonal and the information design matrix R is diagonal, these results generalize directly to the K mean or K-dimensional linear model problem. The results also hold for a general linear statistical model and diagonal R under weighted squared error loss when X'X is the weight matrix.
Figure 3.2 Risks for the maximum likelihood θ̂, restricted θ*, inequality restricted θ⁺, and pre-test inequality restricted θ⁺⁺ estimators as a function of δ²/σ².
3.4 Parameter distribution information (prior)
The assumption that the parameter vector β is stochastic and that its distribution, π, called the prior distribution for β, is known, leads within the context of an appropriate loss function to the selection of a Bayes estimator β̂_B which may be biased. As noted in Section 1, the least squares estimator b may be regarded as the Bayes estimator with respect to a diffuse prior distribution and a quadratic loss function, and this particular Bayes estimator is unbiased.

Alternatively, consider the use of an informative proper prior, such as the specification of the natural conjugate prior distribution π for β, which is normal with mean β̄ and covariance σ²A⁻¹. In the case of quadratic loss, this formulation results in the following optimal point estimate, which minimizes posterior expected loss:

β̂_B = (X'X + A)⁻¹(X'y + Aβ̄).
Their principal result is given in the form of an inequality that involves the sample observations, the prior parameters, and the unknown parameters of the statistical model. It should be noted that under this measure of performance both estimators are admissible.
Zellner (1980) has proposed a prior for β which is of the same form as the natural conjugate prior, with A = g(X'X), where g is a positive constant. The posterior mean of the resulting Bayes estimator is

β̂_g = (X'X + gX'X)⁻¹(X'y + gX'Xβ̄) = (b + gβ̄)/(1 + g),   (3.27)

which approaches the least squares estimator b as g → 0.
Under a squared error loss measure Zellner notes that the Bayes estimator β̂_g has risk

ρ(β, β̂_g) = K₀σ²(1 + g²(β − β̄)'(β − β̄)/(K₀σ²))/(1 + g)²,   (3.28)

and average risk K₀σ²/(1 + g), given σ², where K₀ = tr(X'X)⁻¹. The risk of β̂_g will be superior to that of b if (1 + g²(β − β̄)'(β − β̄)/(K₀σ²))/(1 + g)² < 1. Furthermore, there may be a considerable reduction in average risk if one uses β̂_g instead of b. Therefore, the g prior may provide a useful Bayesian analysis of the linear statistical model when there is information about the β vector but little information about the prior covariance.
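A sketch of the g-prior calculation (the prior mean and g are hypothetical choices); under A = g(X'X) the posterior mean has the simple closed form used below, and the risks follow (3.28).

```python
import numpy as np

rng = np.random.default_rng(7)
T, K = 50, 4
X = rng.normal(size=(T, K))
beta = np.array([1.0, -2.0, 0.5, 3.0])
sigma = 2.0
y = X @ beta + rng.normal(scale=sigma, size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
beta_bar = np.zeros(K)                        # hypothetical prior mean
g = 0.5

beta_g = (b + g * beta_bar) / (1 + g)         # posterior mean under A = g(X'X)

K0 = np.trace(np.linalg.inv(X.T @ X))
risk_b = K0 * sigma**2                        # squared error risk of b
risk_g = risk_b * (1 + g**2 * (beta - beta_bar) @ (beta - beta_bar)
                   / (K0 * sigma**2)) / (1 + g)**2     # eq. (3.28)
print(risk_b, risk_g)
```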
3.5 Some remarks
We have discussed in this section three sampling theory estimators that are biased if the non-sample information is incorrect. Both the exact and stochastic restricted estimators win in terms of precision but may lose, possibly heavily, in terms of bias. If we could be sure of the direction of the inequality non-sample information, and in economics there are many cases when this may be true, the inequality estimator, although biased, wins in terms of precision and mean squared error and thus has appealing sampling properties.
Rothenberg (1973) has studied inequality restrictions under very general assumptions regarding the form of the inequalities and has suggested an alternative class of estimators which are biased. In particular, he has shown that in the case of the linear regression model, if the true parameter vector β is known to be constrained to a convex, proper subset of the parameter space, then the restricted least squares estimator dominates the unconstrained least squares estimator under a mean squared prediction error loss criterion. However, if the sample is not normally distributed, the constrained estimator does not necessarily dominate its unconstrained counterpart under a generalized mean square error loss criterion.
4 F’re-test-variable selection estimators
The previous subsections are informative relative to the sampling performance of the equality, stochastic, and inequality restricted least squares estimators. One problem with these results is that the researcher is seldom certain about the correctness of the non-sample information and thus may have only a vague notion about δ or the δ'δ/2σ² specification error parameter space. Therefore, the results are of little help in choosing between the restricted and unrestricted estimators or, more to the point, choosing the biased estimator with minimum risk.

Since there may be reasons to doubt the compatibility of the sample and non-sample information, or uncertainty about the dimensions of the design matrix X or Z, some biased estimators result when the investigator performs a preliminary test of significance (chooses a criterion) and on the basis of the test (criterion) makes a choice between the unbiased estimator and a possibly biased one. To see the possible significance of these biased alternatives, let us consider the equality and inequality pre-test estimators and one or more conventional
variable selection procedures. For expository purposes we stay in the orthonormal linear statistical model world where Z'Z = I_K and continue to assume R = I_K.
4.1 Conventional equality pre-test estimator
Using likelihood ratio test procedures we may test the null hypothesis H₀: θ = r against the alternative hypothesis H_A: θ ≠ r by using the test statistic

u = (θ̂ − r)'(θ̂ − r)/(Kσ̂²),   (4.1)

which is distributed as a central F random variable with K and (T − K) degrees of freedom if the hypotheses (restrictions) are correct. Of course, if the restrictions are incorrect, E[θ̂ − r] = (θ − r) = δ ≠ 0, and u (4.1) is distributed as a non-central F with non-centrality parameter λ = (θ − r)'(θ − r)/2σ² = δ'δ/2σ². As a test mechanism the null hypothesis is rejected if u ≥ F₍K,T−K₎ = c, where c is determined for a given level of the test α by P[F₍K,T−K₎ ≥ c] = α.
This means that by accepting the null hypothesis we use the restricted least squares estimator θ* as our estimate of θ, and by rejecting the null hypothesis θ − r = δ = 0 we use the unrestricted least squares estimator θ̂. Thus the estimate that results is dependent upon a preliminary test of significance, and this means the estimator used by many applied workers is of the form

θ̂̂ = I₍₀,c₎(u)θ* + I₍c,∞₎(u)θ̂.   (4.2)

Alternatively the estimator may be written as

θ̂̂ = θ̂ − I₍₀,c₎(u)(θ̂ − θ*).   (4.3)

This specification means that in a repeated sampling context the data, the linear hypotheses, and the selected level of statistical significance all determine the combination of the two estimators that is chosen.
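A sketch of the pre-test rule (4.2) in the orthonormal model (hypothetical data; with R = I_K the restricted estimator is simply θ* = r):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(8)
T, K = 40, 5
Z, _ = np.linalg.qr(rng.normal(size=(T, K)))  # Z'Z = I_K
theta = np.array([0.3, -0.1, 0.0, 0.2, 0.0])
y = Z @ theta + rng.normal(size=T)

theta_hat = Z.T @ y                           # unrestricted least squares
r = np.zeros(K)                               # hypothesis theta = r
resid = y - Z @ theta_hat
sigma2_hat = resid @ resid / (T - K)

u = (theta_hat - r) @ (theta_hat - r) / (K * sigma2_hat)
c = f_dist.ppf(0.95, K, T - K)
theta_pretest = r if u < c else theta_hat     # eq. (4.2)
print(u, c, theta_pretest)
```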
4.1.1 Bias
From (4.3) the mean of the pre-test estimator is

E[θ̂̂] = θ − E[I₍₀,c₎(u)(θ̂ − θ*)],   (4.4)

which by Theorem 3.1 in Judge and Bock (1978, p. 71) may be expressed as

E[θ̂̂] = θ − δP[χ²₍K₊₂,λ₎/χ²₍T₋K₎ ≤ cK/(T − K)].   (4.5)
Consequently, if δ = 0, the pre-test estimator is unbiased. This fortunate outcome aside, the size of the bias is affected by the probability of a random variable with a non-central F distribution being less than a constant, which is determined by the level of the test, the number of hypotheses, and the degree of hypothesis error, δ or λ. Since the probability is always equal to or less than one, the bias of the pre-test estimator is equal to or less than the bias of the restricted estimator (3.2).
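The probability in (4.5) can be evaluated with SciPy's noncentral F distribution; note that SciPy's noncentrality parameter is δ'δ/σ², i.e. twice the λ used in the text. A sketch with hypothetical values:

```python
import numpy as np
from scipy.stats import ncf, f as f_dist

K, T = 5, 40
alpha = 0.05
c = f_dist.ppf(1 - alpha, K, T - K)

delta = np.array([0.5, 0.0, 0.0, 0.0, 0.0])   # hypothesis error theta - r
sigma = 1.0
nc = delta @ delta / sigma**2                 # SciPy noncentrality = 2 * lambda

# P[chi2_(K+2,lambda)/chi2_(T-K) <= cK/(T-K)] rewritten as a noncentral-F cdf
p = ncf.cdf(c * K / (K + 2), K + 2, T - K, nc)
bias = -delta * p                             # eq. (4.5): E[pre-test] - theta
print(bias)
```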
4.1.2 Sampling performance
Since this estimator is used in much applied work, let us turn to its sampling performance under the squared error loss criterion. The risk function may be written, using (4.3) and following Judge and Bock (1978, p. 70), as

ρ(θ, θ̂̂) = σ²K − σ²K l(2) + δ'δ[2l(2) − l(4)],

where l(i) = P[χ²₍K₊ᵢ,λ₎/χ²₍T₋K₎ ≤ cK/(T − K)]. This risk function has the following properties:
(1) If the restrictions are correct and δ = 0, the risk of the pre-test estimator is σ²K[1 − l(2)], where 1 > (1 − l(2)) > 0 for 0 < c < ∞. Therefore, the pre-test estimator has a smaller risk than the least squares estimator at the origin, and the decrease in risk depends on the level of significance α and correspondingly the critical value of the test c.
(2) As the hypothesis error δ, or λ, grows, the risk of the pre-test estimator increases, attains a maximum after exceeding the risk of the least squares estimator, and then monotonically decreases, approaching σ²K, the risk of the least squares estimator.

(3) As the hypothesis error θ − r = δ, and thus δ'δ/2σ², increases and approaches infinity, l(·) and δ'δ l(·) approach zero. Therefore, the risk of the pre-test estimator approaches σ²K, the risk of the unrestricted least squares estimator.

(4) The pre-test estimator risk function crosses the risk function of the least squares estimator in the δ'δ/2σ² parameter space within the bounds K/4 ≤ δ'δ/2σ² ≤ K/2.
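Properties (1)-(4) can be traced numerically with the risk expression as reconstructed above (our evaluation; values hypothetical):

```python
import numpy as np
from scipy.stats import ncf, f as f_dist

K, T = 5, 40
sigma = 1.0
c = f_dist.ppf(0.95, K, T - K)

def pretest_risk(dd):                         # dd = delta'delta
    l2 = ncf.cdf(c * K / (K + 2), K + 2, T - K, dd / sigma**2)   # l(2)
    l4 = ncf.cdf(c * K / (K + 4), K + 4, T - K, dd / sigma**2)   # l(4)
    return sigma**2 * K * (1 - l2) + dd * (2 * l2 - l4)

for dd in (0.0, 1.0, 3.0, 10.0, 100.0):
    lam = dd / (2 * sigma**2)
    print(lam, pretest_risk(dd))              # risk returns to sigma^2*K as lambda grows
```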
The sampling characteristics of the preliminary test estimator are summarized in Figure 4.1 for various levels of α or c.
These results mean that the pre-test estimator does well relative to the least squares estimator if the hypotheses are correctly specified. However, in the λ = δ'δ/2σ² parameter space representing the range of hypothesis errors, the pre-test estimator is inferior to the least squares estimator over an infinite range
Figure 4.1 Risk functions for the least squares, restricted least squares, and pre-test estimators.
of the parameter space. Also, there is a range of the parameter space where the pre-test estimator has risk that is inferior to (greater than) both the unrestricted and restricted least squares estimators. No one estimator in the θ̂, θ*, and θ̂̂ family dominates the other competitors. In addition, in applied problems we seldom know the hypothesis errors nor the location of the correct λ in the specification error parameter space. Consequently, the choice of the estimator and the optimum α level are unresolved problems.
4.2 Stochastic hypothesis pre-test estimator
Since the stochastic prior and sample information provide two separate estimates of Rβ, i.e. r and Rb, Theil (1963) has proposed that we test the compatibility of the two estimates by the test statistic

u₁ = (r − Rb)'[RS⁻¹R' + Ω]⁻¹(r − Rb)/σ²,   (4.8)

which, if σ² is known and δ = 0, has a central chi-square distribution with J degrees of freedom. If δ ≠ 0, then u₁ is distributed as a non-central chi-square with non-centrality parameter λ₁ = δ'(RS⁻¹R' + Ω)⁻¹δ/2σ².
If we use the above test statistic to test the compatibility of the prior and sample information, the estimator chosen depends upon a preliminary test of significance, and thereby produces the following pre-test estimator using (3.11):

b̂** = I₍₀,c₎(u₁)b** + I₍c,∞₎(u₁)b,   (4.9)

where the I₍.₎(u₁) are indicator functions and c is determined for a given level of α by ∫_c^∞ f(u₁)du₁ = α, where f(u₁) is the density function of u₁ under the assumption that δ = 0.
Since the stochastic linear statistical model can be reformulated in a restricted linear model framework, many of the statistical implications of pre-testing developed in Section 4.1 carry over for the stochastic restricted pre-test estimator (4.9). The mean of b̂** is

E[b̂**] = β + l(2)S⁻¹R'(RS⁻¹R' + Ω)⁻¹δ,   (4.10)

where l(2) < 1. Consequently, the bias is l(2)S⁻¹R'(RS⁻¹R' + Ω)⁻¹δ.
In terms of estimator comparisons, following Judge and Bock (1978), the stochastic restricted preliminary test estimator is better in a risk matrix or general mean square error sense than the unrestricted least squares estimator if the stochastic restriction error, in the form of the non-centrality parameter λ₁, is sufficiently small, where 1/4 might be considered the rough bound.
Alternatively, if the squared error loss criterion is used, then the equality of the risk of the stochastic restricted pre-test estimator and the least squares estimator occurs for a value of λ₁ = δ'[RS⁻¹R' + Ω]⁻¹δ/2σ² within bounds that depend on A = (RS⁻¹R' + Ω)⁻¹RS⁻¹R' and l(i) = P[χ²₍J₊ᵢ,λ₁₎ ≤ c], with 0 < l(2) < l(4) < 1, where d_s and d_L are the smallest and largest characteristic roots of A, respectively. Since the results for both criteria depend on the critical value c or the level of the test α, the risk or risk matrix approaches that of the stochastic restricted estimator as α → 0 and c → ∞. Conversely, as α → 1 and c → 0 the risk or risk matrix of the pre-test estimator approaches that of the least squares estimator. Finally, for α < 1 the risk or risk matrix approaches that of the least squares estimator as λ₁ → ∞. As before, the optimum level of the test is unresolved.
4.3 Inequality hypothesis pre-test estimator
In the context of (3.20) we continue to consider a single parameter and the following null and alternative hypotheses:

H₀: θ ≥ r  and  H_A: θ < r.

As a basis for checking the compatibility of the sample information and a linear inequality hypothesis for θ, when σ² is known, consider the test statistic

u₂ = (θ̂ − r)/σ,   (4.13)

which is distributed as a normal random variable with mean δ/σ and variance 1. If it is assumed δ = θ − r = 0, then u₂ is a standard normal random variable and the test structure may be formulated in terms of δ, with H₀: δ ≥ 0 and H_A: δ < 0. Using test statistic (4.13), we use the following test mechanism or decision rule:

(i) Reject the hypothesis H₀ if (θ̂ − r)/σ = u₂ < c₂ ≤ 0 and use the maximum likelihood estimator θ̂, where c₂ is the critical value of the test from the standard normal table.

(ii) Accept the hypothesis H₀ if u₂ = (θ̂ − r)/σ ≥ c₂ and use the inequality restricted estimator θ⁺.
By accepting the hypothesis H₀ we take θ⁺ as the estimate of θ, and by rejecting H₀ the maximum likelihood estimate θ̂ is used. Consequently, when a preliminary test of the inequality hypothesis is made and a decision is taken based on the data at hand, the following pre-test estimator results:

θ⁺⁺ = I₍₋∞,c₂₎(u₂)θ̂ + I₍c₂,∞₎(u₂)θ⁺.   (4.16)
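A Monte Carlo sketch of (4.16) for hypothetical values of θ, r, and c₂; the empirical mean illustrates the bias behavior discussed in Section 4.3.1.

```python
import numpy as np

rng = np.random.default_rng(9)
theta, sigma, r = -0.5, 1.0, 0.0              # theta < r, in agreement with H_A
c2 = -1.645                                   # standard normal critical value
n_rep = 300_000

theta_hat = rng.normal(theta, sigma, n_rep)
u2 = (theta_hat - r) / sigma
theta_plus = np.maximum(theta_hat, r)         # inequality restricted estimator
theta_pp = np.where(u2 < c2, theta_hat, theta_plus)   # eq. (4.16)

print(theta_pp.mean() - theta)                # empirical bias of theta++
```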
4.3.1 Mean of the inequality pre-test estimator
When −∞ < δ < 0, θ is less than r and thus in agreement with the hypothesis H_A. In this case, if −δ/σ > 0 and d = c₂ − δ/σ > 0, applying the Judge and Yancey (1977) corollaries 1 and 2 of Section 3.3 to eq. (4.16), the mean of the inequality restricted pre-test estimator is

E[θ⁺⁺] = θ + (σ/√(2π))[P(χ²₍₂₎ ≥ δ²/σ²) − P(χ²₍₂₎ ≥ d²)] + (δ/2)[P(χ²₍₁₎ ≥ δ²/σ²) − P(χ²₍₁₎ ≥ d²)].   (4.17a)
For any given critical value c₂ < 0, if θ − r = δ = 0, then E[θ⁺⁺] = θ + (σ/√(2π))[1 − P(χ²₍₂₎ ≥ c₂²)], so the pre-test estimator is biased at the origin. However, as δ → −∞, E[θ⁺⁺] → θ and the pre-test estimator approaches θ̂, since in the limit the maximum likelihood estimator θ̂ will always be used. Furthermore, if c₂ = 0 and δ = 0, then E[θ⁺⁺] = θ; if c₂ → −∞ and δ = 0, then E[θ⁺⁺] → θ + σ/√(2π), the mean of the inequality restricted estimator at the origin; and if c₂ → −∞ and δ → ∞, then E[θ⁺⁺] → θ.
When 0 < δ/σ and d < 0, the mean of the pre-test estimator is given by the corresponding expression (4.17b).