4.3 Mean independent linear models
4.4 Quantile independent monotone models
5 Estimation of general separable and response models
5.1 Closest-empirical-distribution estimation of separable models
5.2 Minimum-distance estimation of response models
*I am grateful for the comments of Rosa Matzkin and Jim Powell.
Handbook of Econometrics, Volume IV, Edited by R.F. Engle and D.L. McFadden
© 1994 Elsevier Science B.V. All rights reserved
Abstract
Suppose that one wants to estimate a parameter characterizing some feature of a specified population. One has some prior information about the population and a random sample of observations. A widely applicable approach is to estimate the parameter by a sample analog; that is, by a statistic having the same properties in the sample as the parameter does in the population. If there is no such statistic, then one may choose an estimate that, in some well-defined sense, makes the known properties of the population hold as closely as possible in the sample. These are analog estimation methods. This chapter surveys some uses of analog methods to estimate two classes of econometric models, the separable and the response models.
Familiar examples include use of the sample average to estimate the population mean and sample quantiles to estimate population quantiles. The classical method of moments (Pearson (1894)) is an analog approach, as is minimum chi-square estimation (Neyman (1949)). Maximum likelihood, least squares and least absolute deviations estimation are analog methods.
This chapter surveys some uses of analog methods to estimate econometric models. Section 2 presents the necessary preliminaries, defining the analogy principle, moment problems and the method of moments, and two classes of models, the separable and the response models. Sections 3 and 4 describe the variety of separable and response models that imply moment problems and may be estimated by the method of moments. Section 5 discusses two more general analog estimation approaches: closest empirical distribution estimation of separable models and minimum distance estimation of response models. Section 6 gives conclusions. The reader wishing a more thorough treatment of much of the material in this chapter should see Manski (1988).
The analogy principle is used here to estimate population parameters. Other chapters of this handbook exploit related ideas for other purposes. The chapter by Hall describes bootstrap methods, which apply the analogy principle to approximate the distribution of sample statistics. The chapter by Hajivassiliou and Ruud describes simulation methods, which use the analogy between an observed sample and a pseudo-sample from the same population, drawn at postulated parameter values.
Ch. 43: Analog Estimation of Econometric Models
2 Preliminaries
2.1 The analogy principle
Assume that a probability distribution $P$ on a sample space $Z$ characterizes a population. One observes a sample of $N$ independent realizations of a random variable $z$ distributed $P$. One knows that $P$ is a member of some family $\Pi$ of probability distributions on $Z$. One also knows that a parameter $b$ in a parameter space $B$ solves an equation

$$T(P, b) = 0, \tag{1}$$

where $T(\cdot,\cdot)$ is a given function mapping $\Pi \times B$ into some vector space $Y$. The problem is to combine the sample data with the knowledge that $b \in B$, $P \in \Pi$ and $T(P, b) = 0$ so as to estimate $b$.
Many econometric models imply that a parameter solves an extremum problem rather than an equation. We can use (1) to express extremum problems by saying that $b$ solves

$$b - \operatorname*{argmin}_{c \in B} W(P, c) = 0. \tag{2}$$

Here $W(\cdot,\cdot)$ is a given function mapping $\Pi \times B$ into the real line.
Let $P_N$ be the empirical distribution of the sample of $N$ draws from $P$. That is, $P_N$ is the multinomial probability distribution that places probability $1/N$ on each of the $N$ observations of $z$. The group of theorems collectively referred to as the laws of large numbers show that $P_N$ converges to $P$ in various senses as $N \to \infty$. This suggests that to estimate $b$ one might substitute the function $T(P_N, \cdot)$ for $T(P, \cdot)$ and use

$$B_N \equiv [c \in B: T(P_N, c) = 0]. \tag{3}$$

This defines the analog estimate when $P_N$ is a feasible value for $P$; that is, when $P_N \in \Pi$. In these cases $T(P_N, \cdot)$ is well-defined and has at least one zero in $B$, so $B_N$ is the (possibly set-valued) analog estimate of $b$.
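As a minimal sketch of equation (3), suppose (purely for illustration) that $T(P, c) = \int z\,dP - c$, so that $b$ is the population mean. The function names `empirical_distribution` and `T_mean` and the data are hypothetical and are not taken from the chapter. Exact rational arithmetic is used so that the zero of $T(P_N, \cdot)$ can be checked exactly.

```python
from fractions import Fraction

def empirical_distribution(sample):
    """Return P_N: probability 1/N on each observation (ties accumulate)."""
    n = len(sample)
    p_n = {}
    for z in sample:
        p_n[z] = p_n.get(z, Fraction(0)) + Fraction(1, n)
    return p_n

def T_mean(p, c):
    """Illustrative T(P, c) = integral of z dP minus c (an assumption)."""
    return sum(z * mass for z, mass in p.items()) - c

sample = [2, 3, 3, 7, 10]            # made-up data
p_n = empirical_distribution(sample)
b_n = T_mean(p_n, 0)                 # T is linear in c, so its zero is T(P_N, 0)
print(b_n, T_mean(p_n, b_n))
```

Here the analog estimate is single-valued because $T(P_N, \cdot)$ has exactly one zero; in other problems the set in (3) may contain several points.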
Equation (3) does not explain how to proceed when $P_N \notin \Pi$. We have so far defined $T(\cdot,\cdot)$ only on the space $\Pi \times B$ of feasible population distributions and parameter values. The function $T(P_N, \cdot)$ is as yet undefined for $P_N \notin \Pi$.
Let $\Phi$ denote the space of all multinomial distributions on $Z$. To define $T(P_N, \cdot)$ for every sample size and all sample realizations, it suffices to extend $T(\cdot,\cdot)$ from $\Pi \times B$ to the domain $(\Pi \cup \Phi) \times B$. Two approaches have proved useful in practice.
Mapping $P_N$ into $\Pi$. One approach is to map $P_N$ into $\Pi$. Select a function $\pi(\cdot): \Pi \cup \Phi \to \Pi$ which maps every member of $\Pi$ into itself. Now replace the equation $T(P, b) = 0$ with

$$T[\pi(P), b] = 0. \tag{4}$$

This substitution leaves the estimation problem unchanged as $T[\pi(Q), \cdot] = T(Q, \cdot)$ for all $Q \in \Pi$. Moreover, $\pi(P_N) \in \Pi$; so $T[\pi(P_N), \cdot]$ is defined and has a zero in $B$. The analogy principle applied to (4) yields the estimate

$$B_{N\pi} \equiv [c \in B: T[\pi(P_N), c] = 0]. \tag{5}$$

When $P_N \in \Pi$, this estimate is the same as the one defined in equation (3). When $P_N \notin \Pi$, the estimate (5) depends on the selected function $\pi(\cdot)$; hence we write $B_{N\pi}$ rather than $B_N$.
A prominent example of this approach is kernel estimation of Lebesgue density functions. Let $\Pi$ be the space of distributions having Lebesgue densities. The empirical distribution $P_N$ is multinomial and so is not in $\Pi$. But $P_N$ can be smoothed so as to yield a distribution that is in $\Pi$. In particular, the convolution of $P_N$ with any element of $\Pi$ is itself an element of $\Pi$. The density of the convolution is a kernel density estimate. See Manski (1988), Chapter 2.
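The smoothing step can be sketched as follows, assuming (as an illustrative choice, not the chapter's) that $P_N$ is convolved with a normal distribution with mean zero and standard deviation `bandwidth`. The function name `gaussian_kernel_density`, the data and the bandwidth are hypothetical.

```python
import math

def gaussian_kernel_density(sample, bandwidth):
    """Density of the convolution of P_N with N(0, bandwidth^2)."""
    n = len(sample)
    norm = bandwidth * math.sqrt(2.0 * math.pi)

    def density(t):
        total = 0.0
        for z in sample:
            s = (t - z) / bandwidth
            total += math.exp(-0.5 * s * s) / norm
        return total / n   # each observation carries probability 1/N

    return density

sample = [-1.0, 0.0, 0.5, 2.0]       # made-up data
f_hat = gaussian_kernel_density(sample, bandwidth=0.5)

# Riemann-sum check on [-10, 10] that the smoothed distribution has total mass near one.
grid_step = 0.01
mass = sum(f_hat(-10.0 + i * grid_step) for i in range(2001)) * grid_step
print(round(mass, 6))
```

The resulting `f_hat` is an average of normal densities centered at the observations, so it is a Lebesgue density even though $P_N$ itself has none.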
Direct extension. Sometimes there is a natural direct way to extend the domain of $T(\cdot,\cdot)$, so $T(P_N, \cdot)$ is well-defined. Whenever $T(P_N, \cdot)$ has a zero in $B$, equation (3) gives the analog estimate. If $P_N$ is not in $\Pi$, it may be that $T(P_N, c) \neq 0$ for all $c \in B$. Then the analogy principle suggests selection of an estimate that makes $T(P_N, \cdot)$ as close as possible to zero in some sense.
To put this idea into practice, select an origin-preserving function $r(\cdot)$ which maps values of $T(\cdot,\cdot)$ into the non-negative real half line. That is, let $r(\cdot): Y \to [0, \infty)$, with $T = 0 \Leftrightarrow r(T) = 0$. Now replace the equation $T(P, b) = 0$ with the extremum problem

$$\min_{c \in B} r[T(P, c)]. \tag{6}$$

This substitution leaves the estimation problem unchanged as $T(Q, c) = 0 \Leftrightarrow r[T(Q, c)] = 0$ for $(Q, c) \in \Pi \times B$. To estimate $b$, solve the sample analog of (6). Provided only that $r[T(P_N, \cdot)]$ attains its minimum on $B$, the analog estimate is

$$B_{Nr} \equiv \operatorname*{argmin}_{c \in B} r[T(P_N, c)]. \tag{7}$$
If $P_N \in \Pi$, this estimate is the same as the one defined in (3). If $P_N \notin \Pi$ but $T(P_N, \cdot)$ has a zero in $B$, the estimate remains as in (3). If $T(P_N, \cdot)$ is everywhere non-zero, the estimate depends on the selected function $r(\cdot)$; hence we write $B_{Nr}$ rather than $B_N$. Section 2.2 describes an extraordinarily useful application of this approach, the method of moments.
2.2 Moment problems
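Much of present-day econometrics is concerned with estimation of a parameter $b$ solving an equation of the form

$$\int g(z, b)\,dP = 0 \tag{8}$$

or an extremum problem of the form

$$b - \operatorname*{argmin}_{c \in B} \int h(z, c)\,dP = 0. \tag{9}$$

In (8), $g(\cdot,\cdot)$ is a given function mapping $Z \times B$ into a real vector space. In (9), $h(\cdot,\cdot)$ is a given function mapping $Z \times B$ into the real line. Numerous prominent examples of (8) and (9) will be given in Sections 3 and 4 respectively.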
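When $P_N \in \Pi$, application of the analogy principle to (8) and (9) yields the estimates

$$B_N \equiv \left[c \in B: \int g(z, c)\,dP_N = 0\right] \tag{10}$$

and

$$B_N \equiv \operatorname*{argmin}_{c \in B} \int h(z, c)\,dP_N. \tag{11}$$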
It remains only to consider the possibility that the estimates may not exist. In applications, $\int h(z, \cdot)\,dP_N$ generally has a minimum. On the other hand, $\int g(z, \cdot)\,dP_N$ often has no zero. In that case, one may select an origin-preserving transformation $r(\cdot)$ and replace (8) with the problem of minimizing $r[\int g(z, \cdot)\,dP]$, as was done in (6). Minimizing the sample analog yields

$$B_{Nr} \equiv \operatorname*{argmin}_{c \in B} r\left[\int g(z, c)\,dP_N\right]. \tag{12}$$
Estimation problems relating $b$ to $P$ by (8) or (9) are called moment problems. Estimates of the forms (10), (11), and (12) are method-of-moments estimates. Use of the term “moment” rather than the equally descriptive “expectation,” “mean,” or “integral” honors the early work of K. Pearson on the method of moments.
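A small sketch of an estimate of the form (12) follows. It assumes a hypothetical model in which $z$ is normal with mean $c$ and variance one, giving two moment functions for one parameter, and it takes $r(\cdot)$ to be the squared Euclidean norm. The grid search over $B = [0, 4]$, the function names and the data are illustrative assumptions only.

```python
def g(z, c):
    """Two illustrative moment functions for a hypothetical N(c, 1) model."""
    return (z - c, z * z - c * c - 1.0)

def sample_moments(sample, c):
    """Integral of g(z, c) with respect to the empirical distribution P_N."""
    n = len(sample)
    values = [g(z, c) for z in sample]
    return (sum(v[0] for v in values) / n, sum(v[1] for v in values) / n)

def r(t):
    """Origin-preserving map: squared Euclidean norm, zero only at t = 0."""
    return t[0] ** 2 + t[1] ** 2

sample = [0.8, 1.9, 2.4, 3.1, 1.3]             # made-up data
grid = [i / 1000.0 for i in range(4001)]       # parameter space B = [0, 4]
b_nr = min(grid, key=lambda c: r(sample_moments(sample, c)))
q_min = r(sample_moments(sample, b_nr))
print(b_nr, q_min)
```

In this sample the first moment is zero at $c = 1.9$ and the second near $c = 1.806$, so the sample moments have no common zero and the estimate depends on the chosen $r(\cdot)$, as the text describes.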
Consistency of method-of-moments estimates. Clearly, consistent estimation of $b$ requires that the asserted moment problem has a unique solution; that is, $b$ must be identified. If no solution exists, the estimation problem has been misspecified and $b$ is not defined. If there are multiple solutions, sample data cannot possibly distinguish between them. There is no general approach for determining the number of solutions to equation systems of the form (8) or to extremum problems of the form (9). One must proceed more or less case-by-case.
Given identification, method-of-moments estimates are consistent if the estimation problem is sufficiently regular. Rigorous treatments appear in such econometrics texts as Amemiya (1985), Gallant (1987) and Manski (1988). I provide here a heuristic explanation focusing on (12); case (11) involves no additional considerations.
We are concerned with the behavior of the function $r[\int g(z, \cdot)\,dP_N]$ as $N \to \infty$. The strong law of large numbers implies that for all $c \in B$, $\int g(z, c)\,dP_N \to \int g(z, c)\,dP$ as $N \to \infty$, almost surely. The convergence is uniform on $B$ if the parameter space is sufficiently small, the function $g(\cdot,\cdot)$ sufficiently smooth, and the distribution $P$ sufficiently well-behaved. (For example, it suffices for $B$ to be a compact finite-dimensional set, for $|g(z, \cdot)|$ to be bounded by an integrable function $D(z)$, and for $g(z, \cdot)$ to be continuous on $B$. See Manski (1988), Chapter 7.) If the convergence is uniform and $r(\cdot)$ is smooth, then as $N \to \infty$ the minima on $B$ of $r[\int g(z, \cdot)\,dP_N]$ tend to occur increasingly near the minima of $r[\int g(z, \cdot)\,dP]$. The unique minimum of $r[\int g(z, \cdot)\,dP]$ occurs at $b$. So the estimate $B_{Nr}$ converges to $b$.
Uniform convergence on $B$ of $\int g(z, \cdot)\,dP_N$ to $\int g(z, \cdot)\,dP$ is close to a necessary condition for consistency of method-of-moments estimates. If this condition is seriously violated, $\int g(z, \cdot)\,dP_N$ is not a good sample analog to $\int g(z, \cdot)\,dP$ and the estimation approach does not work. Beginning in the 1930s with the Glivenko-Cantelli Theorem, statisticians and econometricians have steadily broadened the range of specifications of $B$, $g(\cdot,\cdot)$ and $P$ for which uniform laws of large numbers have been shown to hold (e.g. Pollard (1984) and Andrews (1987)). Nevertheless, uniformity does break down in situations that are far from pathological. Perhaps the most important practical concern is the size of the parameter space. Given a specification for $g(\cdot,\cdot)$ and for $P$, uniformity becomes a more demanding property as $B$ becomes larger.
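The Glivenko-Cantelli case can be illustrated numerically. The sketch below assumes $g(z, c) = 1[z \le c]$ and $P$ uniform on $(0, 1)$, so that $\int g(z, c)\,dP_N$ is the empirical distribution function and $\int g(z, c)\,dP = c$. The seed, the sample sizes and the function name `sup_distance_uniform` are arbitrary illustrative choices.

```python
import random

def sup_distance_uniform(sample):
    """sup over c of |F_N(c) - c| when the true distribution is Uniform(0, 1)."""
    ordered = sorted(sample)
    n = len(ordered)
    worst = 0.0
    for i, z in enumerate(ordered, start=1):
        # F_N jumps from (i - 1)/n to i/n at z; check both sides of the jump.
        worst = max(worst, abs(i / n - z), abs(z - (i - 1) / n))
    return worst

rng = random.Random(0)               # fixed seed, arbitrary choice
distances = {}
for n in (100, 10000):
    draws = [rng.random() for _ in range(n)]
    distances[n] = sup_distance_uniform(draws)
print(distances)
```

The supremum is taken over the whole real line here, which is a large parameter space; uniformity holds in this case because the indicator functions form a particularly simple class.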
Sampling distributions. The exact sampling distributions of method-of-moments estimates are generally complicated. Hence the practice is to invoke local asymptotic approximations. If the parameter space is finite-dimensional and the estimation problem is sufficiently regular, a method-of-moments estimate $B_{Nr}$ converges at rate $1/\sqrt{N}$ and $\sqrt{N}(B_{Nr} - b)$ has a limiting normal distribution centered at zero. Alternative estimates of a given parameter may have limiting distributions with different variances. This fact suggests use of the variance of the limiting distribution as a criterion for measuring precision.
Comparison of the precision of alternative estimators has long engaged the attention of econometric theorists. An estimate is termed asymptotically efficient if the variance of the limiting normal distribution of $\sqrt{N}(B_{Nr} - b)$ is the smallest possible given the available prior information. Hansen (1982) and Chamberlain (1987) provide the central findings on the efficiency of method-of-moments estimates. For an exposition, see Manski (1988), Chapters 8 and 9.
Non-random sampling. In discussing moment problems and estimation problems more generally, I have assumed that the data are a random sample. It is important to understand that random sampling, albeit a useful simplifying idea, is not essential to the success of analog estimation. The essential requirement is that the sampling process be such that relevant features of the empirical distribution converge to corresponding population features.
For example, consider stationary time series problems. Here the data are observations at $N$ dates from a single realization of a stationary stochastic process whose marginal distribution is $P$. So we do not have a random sample from $P$. Nevertheless, dependent sampling versions of the laws of large numbers show that $P_N$ converges to $P$ in various senses as $N \to \infty$.
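A simulation sketch of the time series case, assuming a hypothetical stationary Gaussian AR(1) process with marginal mean `mu`; the process, its parameter values, the seed and the function name `ar1_path` are illustrative assumptions. The observations are dependent, yet the sample mean (the analog of the marginal mean) settles near `mu` in a long series.

```python
import random

def ar1_path(n, rho, mu, rng):
    """Simulate z_t = mu + rho * (z_{t-1} - mu) + e_t with N(0, 1) innovations."""
    # Draw the starting value from the stationary marginal distribution.
    sd0 = (1.0 / (1.0 - rho * rho)) ** 0.5
    z = mu + sd0 * rng.gauss(0.0, 1.0)
    path = []
    for _ in range(n):
        z = mu + rho * (z - mu) + rng.gauss(0.0, 1.0)
        path.append(z)
    return path

rng = random.Random(1)               # fixed seed, arbitrary choice
path = ar1_path(20000, rho=0.5, mu=2.0, rng=rng)
mean_n = sum(path) / len(path)       # sample analog of the marginal mean
print(mean_n)
```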
2.3 Econometric models
We have been discussing an estimation problem relating a parameter $b$ to a probability distribution $P$ generating realizations of an observable random variable $z$. Econometric models typically relate a parameter $b$ to realizations of the observable $z$ and of an unobservable random variable, say $u$. Analog estimation methods may be used to estimate $b$ if one can transform the econometric model into a representation relating $b$ to $P$ and to nuisance parameters.
Formally, suppose that a probability distribution $P_{zu}$ on a space $Z \times U$ characterizes a population. A random sample of $N$ realizations of a random variable $(z, u)$ distributed $P_{zu}$ is drawn and one observes the realizations of $z$ but not of $u$. One knows that $P_{zu}$ is a member of some family $\Pi_{zu}$ of probability distributions on $Z \times U$. One also knows that a parameter $b$ in a parameter space $B$ solves an equation

$$f(z, u, b) = 0, \tag{13}$$

where $f(\cdot,\cdot,\cdot)$ maps $Z \times U \times B$ into some vector space. Equation (13) is to be interpreted as saying that almost every realization $(\zeta, \eta)$ of $(z, u)$ satisfies the equation $f(\zeta, \eta, b) = 0$.
Equation (13) typically has no content in the absence of information on the probability distribution $P_{zu}$ generating $(z, u)$. A meaningful model combines (13) with some distributional knowledge. The practice has been to impose restrictions on the probability distribution of $u$ conditional on some function of $z$, say $x = x(z)$ taking values in a space $X$. Let $P_{u|x}$ denote this conditional distribution. Then a model is defined by equation (13) and by a restriction on the conditional distributions $(P_{u|\xi}, \xi \in X)$.
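Essentially all econometric research has specified $f$ to have one of two forms. A separable model makes the unobserved variable $u$ additively separable, so that

$$f(z, u, b) = u_0(z, b) - u = 0, \tag{14}$$

where $u_0(\cdot,\cdot)$ maps $Z \times B$ into $U$. A response model defines $z = (y, x)$, $Z = Y \times X$, and makes $f$ have the form

$$f(z, u, b) = y - y_0(x, u, b) = 0, \tag{15}$$

where $y_0(\cdot,\cdot,\cdot)$ maps $X \times U \times B$ into $Y$.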
3 Method-of-moments estimation of separable models
Separable models suppose that realizations of $(z, u)$ are related to the parameter $b$ through an equation

$$u_0(z, b) = u. \tag{16}$$

In the absence of information restricting the distribution of the unobserved $u$, this equation simply defines $u$ and conveys no information about $b$. In the presence of various distributional restrictions, (16) implies that $b$ and a nuisance parameter solve a type of moment equation known as an orthogonality condition, defined here.
Orthogonality conditions. Let $x = x(z)$ take values in a real vector space $X$. Let $\Gamma$ denote a space in which a nuisance parameter $\gamma$ lives. Let $e(\cdot,\cdot)$ be a function mapping $U \times \Gamma$ into a real vector space. Let $e(\cdot,\cdot)'$ denote the transpose of the column vector $e(\cdot,\cdot)$. The random vectors $x$ and $e(u, \gamma)$ are orthogonal if

$$\int x\,e(u, \gamma)'\,dP_{zu} = 0. \tag{17}$$
Equation (17) relates the observed random variable $x$ to the unobserved random variable $u$. Suppose that (16) holds. Then we can replace $u$ in (17) with $u_0(z, b)$, yielding

$$\int x\,e[u_0(z, b), \gamma]'\,dP = 0. \tag{18}$$
This orthogonality condition is a moment equation relating the parameters $(b, \gamma)$ to the distribution $P$ of the observable $z$.
It is not easy to motivate orthogonality conditions directly, but we can readily show that these conditions are implied by various more transparent distributional restrictions. The remainder of this section describes the leading cases.
3.1 Mean independence
Most authors incorporate the nuisance parameter $\gamma$ into the specification of $u_0(\cdot,\cdot)$ by giving that function a free intercept. This done, $u$ is declared to have mean zero and equation (19) is rewritten with $\gamma = 0$.
Mean independence implies zero covariance, but it is difficult to motivate zero covariance in the absence of mean independence. To see why, rewrite (19) as the iterated expectation

$$\int x \left[\int (u - \gamma)'\,dP_{u|x}\right] dP_x = 0. \tag{22}$$

[...] $u$ are unrelated in the sense of mean independence.
Mean independence implies orthogonality conditions beyond (19). Let $v(\cdot)$ be any function mapping $X$ into a real vector space. It follows from (16) and (21) that

$$\int v(x)\,[u_0(z, b) - \gamma]'\,dP = \int \left[\int v(x)\,(u - \gamma)'\,dP_{u|x}\right] dP_x = 0, \tag{23}$$

provided only that the integral in (23) exists. So the random variables $v(x)$ and $u_0(z, b)$ are uncorrelated. In other words, all functions of $x$ are instrumental variables.
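To make (23) concrete, the sketch below assumes a hypothetical linear separable model $u_0(z, b) = y - w b$ with scalar $b$, $z = (y, w, x)$, and the two instrument functions $v(x) = 1$ and $v(x) = x$. Setting the two sample moments to zero gives two linear equations in $(b, \gamma)$, solved in closed form. The data and the names `analog_iv` and `moment` are made up for illustration.

```python
def analog_iv(y, w, x):
    """Solve the sample analogs of (23) for v(x) = 1 and v(x) = x."""
    n = len(y)
    y_bar, w_bar, x_bar = sum(y) / n, sum(w) / n, sum(x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    b_n = s_xy / s_xw                  # from the v(x) = x equation
    gamma_n = y_bar - b_n * w_bar      # from the v(x) = 1 equation
    return b_n, gamma_n

def moment(y, w, x, b, gamma, v):
    """Sample moment: average of v(x) * [u0(z, b) - gamma]."""
    n = len(y)
    return sum(v(xi) * (yi - wi * b - gamma) for yi, wi, xi in zip(y, w, x)) / n

# Made-up data generated as y = 1 + 2*w + u with u = [1, -1, -1, 1].
x = [0.0, 1.0, 2.0, 3.0]
w = [1.0, 2.0, 2.0, 5.0]
y = [4.0, 4.0, 4.0, 12.0]
b_n, gamma_n = analog_iv(y, w, x)
print(b_n, gamma_n)
```

In these data $u$ is orthogonal to $1$ and $x$ but not to $w$, so the instrument-based estimate recovers the slope used to generate the data, whereas a regression of $y$ on $w$ would not.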
3.2 Median independence
The assertion that $u$ is mean independent of $x$ expresses a belief that $u$ has the same central tendency conditional on each realization of $x$. Median independence offers another way to express this belief. Median independence alone does not imply an orthogonality condition, but it does when the conditional distributions $P_{u|\xi}$, $\xi \in X$ are componentwise continuous.
Let $U$ be the real line; the vector case introduces no new considerations as we shall deal with $u$ componentwise. For each $\xi$ in $X$, let $m_\xi$ be the median of $u$ conditional on the event $[x = \xi]$. Let $\gamma$ be the unconditional median of $u$. We say that $u$ is median independent of $x$ if

$$m_\xi = \gamma, \quad \xi \in X. \tag{24}$$
It can be shown (see Manski (1988), Chapter 4) that if $P_{u|\xi}$, $\xi \in X$ are continuous probability distributions, their medians solve the conditional moment equations

$$\int \operatorname{sgn}(u - m_\xi)\,dP_{u|\xi} = 0, \quad \xi \in X. \tag{25}$$

So median independence and continuity together imply that

$$\int \operatorname{sgn}(u - \gamma)\,dP_{u|\xi} = 0, \quad \xi \in X. \tag{26}$$
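It follows from (16) and (26) that

$$\int v(x)\,\operatorname{sgn}[u_0(z, b) - \gamma]\,dP = \int \left[\int v(x)\,\operatorname{sgn}(u - \gamma)\,dP_{u|x}\right] dP_x = 0 \tag{27}$$

for any function $v(\cdot)$ on $X$ for which the integral in (27) exists.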
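The sign-based moment can be tried on a toy sample. The sketch assumes a hypothetical pure location model in which the only unknown is the median $\gamma$, and it restricts the search to the observed values, which is an illustrative simplification. Because $P_N$ is discrete, the sample moment need not have an exact zero, so the absolute value of the moment is minimized in the manner of (12).

```python
def sgn(t):
    """Sign function: -1, 0 or +1."""
    return (t > 0) - (t < 0)

def sign_moment(sample, c):
    """Integral of sgn(u - c) with respect to the empirical distribution."""
    return sum(sgn(u - c) for u in sample) / len(sample)

sample = [1, 4, 2, 9, 7]             # made-up data with an odd number of points
# Search over the observed values only (an illustrative simplification).
b_nr = min(sample, key=lambda c: abs(sign_moment(sample, c)))
print(b_nr, sign_moment(sample, b_nr))
```

With an odd number of distinct observations the minimizer is the sample median, which is the familiar analog estimate of a population median.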
3.3 Conditional symmetry
Mean and median independence both express a belief that the central tendency of $u$ does not vary with $x$. Yet they are different assertions. This fact may cause the applied researcher some discomfort. One often feels at ease saying that the central tendency of $u$ does not vary with $x$. But only occasionally can one pinpoint the mathematical sense in which the term “central tendency” should be interpreted.
The need for care in defining central tendency disappears if the conditional distributions $P_{u|\xi}$, $\xi \in X$ are componentwise symmetric with common center of symmetry. Let $U$ be the real line again and assume that for all realizations of $x$, the conditional distribution of $u$ is symmetric around some point $\gamma$. That is,

$$P_{u - \gamma|\xi} = P_{\gamma - u|\xi}, \quad \xi \in X. \tag{28}$$

Let $h(\cdot)$ be any odd function mapping the real line into a real vector space; that is, $h(\eta) = -h(-\eta)$ for $\eta$ in $R^1$. Conditional symmetry implies

$$\int h(u - \gamma)\,dP_{u|\xi} = 0, \quad \xi \in X. \tag{29}$$

It follows from (16) and (29) that

$$\int v(x)\,h[u_0(z, b) - \gamma]'\,dP = \int \left[\int v(x)\,h(u - \gamma)'\,dP_{u|x}\right] dP_x = 0 \tag{30}$$

for all $v(\cdot)$ and $h(\cdot)$ such that the integral in (30) exists. So all functions of $x$ are orthogonal to all odd functions of $u - \gamma$. The functions $h(u - \gamma) = u - \gamma$ and $h(u - \gamma) = \operatorname{sgn}(u - \gamma)$ are odd. Thus, the orthogonality conditions (23) and (27) that follow from mean and median independence are satisfied given conditional symmetry.
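A toy check of the statement that functions of $x$ are orthogonal to odd functions of $u - \gamma$ under conditional symmetry. The data set below is invented so that, for each value of $x$, the values of $u$ are exactly symmetric about $\gamma$; the particular choices of $v(\cdot)$ and $h(\cdot)$ are illustrative. The sums are $N$ times the corresponding sample moments.

```python
def sgn(t):
    """Sign function: -1, 0 or +1."""
    return (t > 0) - (t < 0)

gamma = 5
# (x, u) pairs: for each x the u values are symmetric about gamma (made-up data).
data = [(0, gamma - 1), (0, gamma + 1),
        (1, gamma - 3), (1, gamma), (1, gamma + 3)]

odd_functions = {"identity": lambda t: t, "sign": sgn, "cube": lambda t: t ** 3}
x_functions = {"one": lambda x: 1, "x": lambda x: x, "x_squared": lambda x: x * x}

# N times the sample moment of v(x) * h(u - gamma) for every pair (v, h).
moments = {
    (v_name, h_name): sum(v(x) * h(u - gamma) for x, u in data)
    for v_name, v in x_functions.items()
    for h_name, h in odd_functions.items()
}
# An even function of u - gamma need not be orthogonal to functions of x.
even_moment = sum(x * (u - gamma) ** 2 for x, u in data)
print(moments, even_moment)
```

The spread of $u$ differs across the two values of $x$ in these data, which shows that the orthogonality conditions from symmetry do not require equal conditional variances.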
3.4 Variance independence
One may believe that $u$ not only has the same central tendency for each realization of $x$ but also the same spread. The usual econometric practice has been to express spread by variance. In the presence of mean independence, variance independence (homoskedasticity) is the additional condition

$$\int (u - \gamma_1)(u - \gamma_1)'\,dP_{u|\xi} = \gamma_2, \quad \xi \in X. \tag{31}$$

Here $\gamma_1$ is the common mean of the distributions $P_{u|\xi}$, $\xi \in X$ and $\gamma_2$ is the common variance matrix.
Let $v(\cdot)$ be any real function on $X$. It follows from (16) and (31) that $(b, \gamma_1, \gamma_2)$ solves the orthogonality condition

$$\int v(x)\,\{[u_0(z, b) - \gamma_1][u_0(z, b) - \gamma_1]' - \gamma_2\}\,dP = 0. \tag{32}$$
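The sample analog of (32) can be evaluated on toy data. The sketch assumes scalar $u$ and takes as given the values of $u_0(z, b)$ at the true $b$; both data sets, the instrument functions and the name `variance_moment` are invented for illustration. In the homoskedastic data one pair $(\gamma_1, \gamma_2)$ sets the moments to zero for both instrument functions; in the heteroskedastic data no single $\gamma_2$ does.

```python
def variance_moment(data, g1, g2, v):
    """Sample analog of (32) for scalar u0: mean of v(x) * ((u0 - g1)^2 - g2)."""
    return sum(v(x) * ((u0 - g1) ** 2 - g2) for x, u0 in data) / len(data)

def one(x):
    return 1.0

def ident(x):
    return x

# Made-up pairs (x, u0(z, b)) evaluated at the true b; conditional means are zero.
homoskedastic = [(0.0, -1.0), (0.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
heteroskedastic = [(0.0, -1.0), (0.0, 1.0), (1.0, -2.0), (1.0, 2.0)]

homo = [variance_moment(homoskedastic, 0.0, 1.0, v) for v in (one, ident)]
# With v = 1 the moment is zero only at g2 = 2.5; with v = x only at g2 = 4.
hetero = [variance_moment(heteroskedastic, 0.0, 2.5, v) for v in (one, ident)]
print(homo, hetero)
```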
The assertion of variance independence imposes no restrictions on the variance matrix $\gamma_2$. In some applications, information about $\gamma_2$ is available. For example, it may be known that the components of $u$ are uncorrelated with one another, so $\gamma_2$ is a diagonal matrix. Such information may be expressed by appropriately restricting the space of possible values of $\gamma_2$.
Let $s(\cdot)$ map $U$ into a real vector space. Let $\gamma$ be the unconditional mean of $s(u)$.