Handbook of Econometrics, Vols. 1-5, Chapter 43


4.3 Mean independent linear models

4.4 Quantile independent monotone models

5 Estimation of general separable and response models

5.1 Closest-empirical-distribution estimation of separable models

5.2 Minimum-distance estimation of response models

*I am grateful for the comments of Rosa Matzkin and Jim Powell.

Handbook of Econometrics, Volume IV, Edited by R.F. Engle and D.L. McFadden


Abstract

Suppose that one wants to estimate a parameter characterizing some feature of a specified population. One has some prior information about the population and a random sample of observations. A widely applicable approach is to estimate the parameter by a sample analog; that is, by a statistic having the same properties in the sample as the parameter does in the population. If there is no such statistic, then one may choose an estimate that, in some well-defined sense, makes the known properties of the population hold as closely as possible in the sample. These are analog estimation methods. This chapter surveys some uses of analog methods to estimate two classes of econometric models, the separable and the response models.

Familiar examples include use of the sample average to estimate the population mean and sample quantiles to estimate population quantiles. The classical method of moments (Pearson (1894)) is an analog approach, as is minimum chi-square estimation (Neyman (1949)). Maximum likelihood, least squares and least absolute deviations estimation are analog methods.

This chapter surveys some uses of analog methods to estimate econometric models. Section 2 presents the necessary preliminaries, defining the analogy principle, moment problems and the method of moments, and two classes of models, the separable and the response models. Sections 3 and 4 describe the variety of separable and response models that imply moment problems and may be estimated by the method of moments. Section 5 discusses two more general analog estimation approaches: closest empirical distribution estimation of separable models and minimum distance estimation of response models. Section 6 gives conclusions. The reader wishing a more thorough treatment of much of the material in this chapter should see Manski (1988).

The analogy principle is used here to estimate population parameters. Other chapters of this handbook exploit related ideas for other purposes. The chapter by Hall describes bootstrap methods, which apply the analogy principle to approximate the distribution of sample statistics. The chapter by Hajivassiliou and Ruud describes simulation methods, which use the analogy between an observed sample and a pseudo-sample from the same population, drawn at postulated parameter values.

Ch. 43: Analog Estimation of Econometric Models

2 Preliminaries

2.1 The analogy principle

Assume that a probability distribution $P$ on a sample space $Z$ characterizes a population. One observes a sample of $N$ independent realizations of a random variable $z$ distributed $P$. One knows that $P$ is a member of some family $\Pi$ of probability distributions on $Z$. One also knows that a parameter $b$ in a parameter space $B$ solves an equation

$$T(P, b) = 0, \tag{1}$$

where $T(\cdot, \cdot)$ is a given function mapping $\Pi \times B$ into some vector space $Y$. The problem is to combine the sample data with the knowledge that $b \in B$, $P \in \Pi$ and $T(P, b) = 0$ so as to estimate $b$.

Many econometric models imply that a parameter solves an extremum problem rather than an equation. We can use (1) to express extremum problems by saying that $b$ solves

$$b - \operatorname*{argmin}_{c \in B} W(P, c) = 0. \tag{2}$$

Here $W(\cdot, \cdot)$ is a given function mapping $\Pi \times B$ into the real line.

Let $P_N$ be the empirical distribution of the sample of $N$ draws from $P$. That is, $P_N$ is the multinomial probability distribution that places probability $1/N$ on each of the $N$ observations of $z$. The group of theorems collectively referred to as the laws of large numbers show that $P_N$ converges to $P$ in various senses as $N \to \infty$.

This suggests that to estimate $b$ one might substitute the function $T(P_N, \cdot)$ for $T(P, \cdot)$ and use

$$B_N \equiv [c \in B : T(P_N, c) = 0]. \tag{3}$$

This defines the analog estimate when $P_N$ is a feasible value for $P$; that is, when $P_N \in \Pi$. In these cases $T(P_N, \cdot)$ is well-defined and has at least one zero in $B$, so $B_N$ is the (possibly set-valued) analog estimate of $b$.
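To make the substitution of the empirical distribution concrete, here is a minimal sketch under one assumed specification: $T(P, b) = \int (z - b)\,dP$, so that $b$ is the population mean. The function names and the four-point sample are hypothetical and chosen only for illustration.

```python
import numpy as np

def T_empirical(z, c):
    """T(P_N, c) = integral of (z - c) dP_N, where P_N puts mass 1/N on each draw."""
    weights = np.full(len(z), 1.0 / len(z))   # empirical (multinomial) weights
    return float(np.sum(weights * (z - c)))

def analog_mean(z):
    """For this assumed T, the unique zero of T(P_N, .) is the sample average."""
    return float(np.mean(z))

z = np.array([2.0, 4.0, 9.0, 1.0])   # hypothetical sample, N = 4
b_hat = analog_mean(z)               # sample average of the four draws
residual = T_empirical(z, b_hat)     # zero up to floating-point rounding
```

In this case the analog estimate is a single point; with other choices of $T$ the set of zeros may contain several values.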

Equation (3) does not explain how to proceed when $P_N \notin \Pi$. We have so far defined $T(\cdot, \cdot)$ only on the space $\Pi \times B$ of feasible population distributions and parameter values. The function $T(P_N, \cdot)$ is as yet undefined for $P_N \notin \Pi$.

Let $\Phi$ denote the space of all multinomial distributions on $Z$. To define $T(P_N, \cdot)$ for every sample size and all sample realizations, it suffices to extend $T(\cdot, \cdot)$ from $\Pi \times B$ to the domain $(\Pi \cup \Phi) \times B$. Two approaches have proved useful in practice.

Mapping $P_N$ into $\Pi$. One approach is to map $P_N$ into $\Pi$. Select a function $\pi(\cdot): \Pi \cup \Phi \to \Pi$ which maps every member of $\Pi$ into itself. Now replace the equation $T(P, b) = 0$ with

$$T[\pi(P), b] = 0. \tag{4}$$

This substitution leaves the estimation problem unchanged as $T[\pi(Q), \cdot] = T(Q, \cdot)$ for all $Q \in \Pi$. Moreover, $\pi(P_N) \in \Pi$; so $T[\pi(P_N), \cdot]$ is defined and has a zero in $B$. The analogy principle applied to (4) yields the estimate

$$B_{N\pi} \equiv [c \in B : T[\pi(P_N), c] = 0]. \tag{5}$$

When $P_N \in \Pi$, this estimate is the same as the one defined in equation (3). When $P_N \notin \Pi$, the estimate (5) depends on the selected function $\pi(\cdot)$; hence we write $B_{N\pi}$ rather than $B_N$.

A prominent example of this approach is kernel estimation of Lebesgue density functions. Let $\Pi$ be the space of distributions having Lebesgue densities. The empirical distribution $P_N$ is multinomial and so is not in $\Pi$. But $P_N$ can be smoothed so as to yield a distribution that is in $\Pi$. In particular, the convolution of $P_N$ with any element of $\Pi$ is itself an element of $\Pi$. The density of the convolution is a kernel density estimate. See Manski (1988), Chapter 2.
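The smoothing step can be sketched numerically. The example below assumes the smoothing distribution is normal with mean zero and standard deviation `bandwidth`; the Gaussian choice, the bandwidth value and the simulated sample are all illustrative assumptions rather than anything prescribed in the text.

```python
import numpy as np

def kernel_density(z, grid, bandwidth):
    """Density of the convolution of P_N with a N(0, bandwidth^2) distribution."""
    u = (grid[:, None] - z[None, :]) / bandwidth          # scaled distances, shape (G, N)
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # standard normal density
    return kernel.mean(axis=1) / bandwidth                # average over the N observations

rng = np.random.default_rng(0)
z = rng.normal(size=200)                      # hypothetical sample
grid = np.linspace(-8.0, 8.0, 1601)
density = kernel_density(z, grid, bandwidth=0.4)
mass = float(np.sum(density) * (grid[1] - grid[0]))   # Riemann sum of the density
```

The estimate is non-negative and integrates to approximately one, so the smoothed distribution has a Lebesgue density even though $P_N$ itself does not.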

Direct extension. Sometimes there is a natural direct way to extend the domain of $T(\cdot, \cdot)$, so $T(P_N, \cdot)$ is well-defined. Whenever $T(P_N, \cdot)$ has a zero in $B$, equation (3) gives the analog estimate. If $P_N$ is not in $\Pi$, it may be that $T(P_N, c) \neq 0$ for all $c \in B$. Then the analogy principle suggests selection of an estimate that makes $T(P_N, \cdot)$ as close as possible to zero in some sense.

To put this idea into practice, select an origin-preserving function $r(\cdot)$ which maps values of $T(\cdot, \cdot)$ into the non-negative real half line. That is, let $r(\cdot): Y \to [0, \infty)$, with $T = 0 \Leftrightarrow r(T) = 0$. Now replace the equation $T(P, b) = 0$ with the extremum problem

$$\min_{c \in B} r[T(P, c)]. \tag{6}$$

This substitution leaves the estimation problem unchanged as $T(Q, c) = 0 \Leftrightarrow r[T(Q, c)] = 0$ for $(Q, c) \in \Pi \times B$. To estimate $b$, solve the sample analog of (6). Provided only that $r[T(P_N, \cdot)]$ attains its minimum on $B$, the analog estimate is

$$B_{Nr} \equiv \operatorname*{argmin}_{c \in B} r[T(P_N, c)]. \tag{7}$$

If $P_N \in \Pi$, this estimate is the same as the one defined in (3). If $P_N \notin \Pi$ but $T(P_N, \cdot)$ has a zero in $B$, the estimate remains as in (3). If $T(P_N, \cdot)$ is everywhere non-zero, the estimate depends on the selected function $r(\cdot)$; hence we write $B_{Nr}$ rather than $B_N$. Section 2.2 describes an extraordinarily useful application of this approach, the method of moments.

2.2 Moment problems

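Much of present-day econometrics is concerned with estimation of a parameter $b$ solving an equation of the form

$$\int g(z, b)\, dP = 0 \tag{8}$$

or an extremum problem of the form

$$b - \operatorname*{argmin}_{c \in B} \int h(z, c)\, dP = 0. \tag{9}$$

In (8), $g(\cdot, \cdot)$ is a given function mapping $Z \times B$ into a real vector space. In (9), $h(\cdot, \cdot)$ is a given function mapping $Z \times B$ into the real line. Numerous prominent examples of (8) and (9) will be given in Sections 3 and 4 respectively.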

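When $P_N \in \Pi$, application of the analogy principle to (8) and (9) yields the estimates

$$B_N \equiv \left[c \in B : \int g(z, c)\, dP_N = 0\right] = \left[c \in B : \frac{1}{N}\sum_{i=1}^{N} g(z_i, c) = 0\right] \tag{10}$$

and

$$B_N \equiv \operatorname*{argmin}_{c \in B} \int h(z, c)\, dP_N = \operatorname*{argmin}_{c \in B} \frac{1}{N}\sum_{i=1}^{N} h(z_i, c). \tag{11}$$

It remains only to consider the possibility that the estimates may not exist. In applications, $\int h(z, \cdot)\, dP_N$ generally has a minimum. On the other hand, $\int g(z, \cdot)\, dP_N$ often has no zero. In that case, one may select an origin-preserving transformation $r(\cdot)$ and replace (8) with the problem of minimizing $r[\int g(z, \cdot)\, dP]$, as was done in (6). Minimizing the sample analog yields

$$B_{Nr} \equiv \operatorname*{argmin}_{c \in B} r\left[\int g(z, c)\, dP_N\right] = \operatorname*{argmin}_{c \in B} r\left[\frac{1}{N}\sum_{i=1}^{N} g(z_i, c)\right]. \tag{12}$$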
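A small numerical sketch may help fix ideas for the case in which the sample moment has no zero. It assumes a hypothetical exponential population with mean $b$, which implies two moment equations in one unknown, and it takes $r(\cdot)$ to be the squared Euclidean norm. The grid search is a stand-in for a proper optimizer, and all names are illustrative.

```python
import numpy as np

def g(z, c):
    """Assumed moment function for an exponential population with mean c:
    E[z] = c and E[z^2] = 2 c^2."""
    return np.stack([z - c, z**2 - 2.0 * c**2], axis=-1)

def mom_estimate(z, candidates):
    """argmin over c of r[(1/N) sum g(z_i, c)], with r the squared Euclidean norm."""
    objective = [np.sum(g(z, c).mean(axis=0) ** 2) for c in candidates]
    return float(candidates[int(np.argmin(objective))])

rng = np.random.default_rng(1)
z = rng.exponential(scale=2.0, size=5000)   # hypothetical sample, true b = 2
candidates = np.linspace(0.5, 4.0, 3501)    # grid standing in for the parameter space B
b_hat = mom_estimate(z, candidates)
```

With two equations and one parameter, the two sample moments are generally not zero at the same value of $c$, which is why a norm has to be minimized. A different choice of $r(\cdot)$ would generally give a different estimate in finite samples.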

Estimation problems relating $b$ to $P$ by (8) or (9) are called moment problems. Estimates of the forms (10), (11), and (12) are method-of-moments estimates. Use of the term “moment” rather than the equally descriptive “expectation,” “mean,” or “integral” honors the early work of K. Pearson on the method of moments.

Consistency of method-of-moments estimates. Clearly, consistent estimation of $b$ requires that the asserted moment problem has a unique solution; that is, $b$ must be identified. If no solution exists, the estimation problem has been misspecified and $b$ is not defined. If there are multiple solutions, sample data cannot possibly distinguish between them. There is no general approach for determining the number of solutions to equation systems of the form (8) or to extremum problems of the form (9). One must proceed more or less case-by-case.

Given identification, method-of-moments estimates are consistent if the estimation problem is sufficiently regular. Rigorous treatments appear in such econometrics texts as Amemiya (1985), Gallant (1987) and Manski (1988). I provide here a heuristic explanation focussing on (12); case (11) involves no additional considerations.

We are concerned with the behavior of the function $r[\int g(z, \cdot)\, dP_N]$ as $N \to \infty$. The strong law of large numbers implies that for all $c \in B$, $\int g(z, c)\, dP_N \to \int g(z, c)\, dP$ as $N \to \infty$, almost surely. The convergence is uniform on $B$ if the parameter space is sufficiently small, the function $g(\cdot, \cdot)$ sufficiently smooth, and the distribution $P$ sufficiently well-behaved. (For example, it suffices for $B$ to be a compact finite-dimensional set, for $\int |g(z, \cdot)|\, dP$ to be bounded by an integrable function $D(z)$, and for $g(z, \cdot)$ to be continuous on $B$. See Manski (1988), Chapter 7.) If the convergence is uniform and $r(\cdot)$ is smooth, then as $N \to \infty$ the minima on $B$ of $r[\int g(z, \cdot)\, dP_N]$ tend to occur increasingly near the minima of $r[\int g(z, \cdot)\, dP]$. The unique minimum of $r[\int g(z, \cdot)\, dP]$ occurs at $b$. So the estimate $B_{Nr}$ converges to $b$.

Uniform convergence on $B$ of $\int g(z, \cdot)\, dP_N$ to $\int g(z, \cdot)\, dP$ is close to a necessary condition for consistency of method-of-moments estimates. If this condition is seriously violated, $\int g(z, \cdot)\, dP_N$ is not a good sample analog to $\int g(z, \cdot)\, dP$ and the estimation approach does not work. Beginning in the 1930s with the Glivenko-Cantelli Theorem, statisticians and econometricians have steadily broadened the range of specifications of $B$, $g(\cdot, \cdot)$ and $P$ for which uniform laws of large numbers have been shown to hold (e.g., Pollard (1984) and Andrews (1987)). Nevertheless, uniformity does break down in situations that are far from pathological. Perhaps the most important practical concern is the size of the parameter space. Given a specification for $g(\cdot, \cdot)$ and for $P$, uniformity becomes a more demanding property as $B$ becomes larger.
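The Glivenko-Cantelli setting gives a simple way to see uniform convergence at work. The sketch below assumes $z$ is uniform on $[0, 1]$ and takes $g(z, c) = 1[z \le c]$, so the population moment is $c$ itself; the parameter space is approximated by a finite grid, and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

def sup_deviation(z, grid):
    """Largest gap over the grid between (1/N) sum 1[z_i <= c] and its population value c."""
    sorted_z = np.sort(z)
    empirical_cdf = np.searchsorted(sorted_z, grid, side="right") / len(z)
    return float(np.max(np.abs(empirical_cdf - grid)))

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 1001)   # discretized parameter space B = [0, 1]
deviations = {
    n: sup_deviation(rng.uniform(size=n), grid)   # hypothetical samples of growing size
    for n in (100, 10_000, 1_000_000)
}
```

The worst-case gap over the grid tends to shrink as the sample grows, which is the sense of uniformity that the consistency argument relies on.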

Sampling distributions. The exact sampling distributions of method-of-moments estimates are generally complicated. Hence the practice is to invoke local asymptotic approximations. If the parameter space is finite-dimensional and the estimation problem is sufficiently regular, a method-of-moments estimate $B_{Nr}$ converges at rate $1/\sqrt{N}$ and $\sqrt{N}(B_{Nr} - b)$ has a limiting normal distribution centered at zero. Alternative estimates of a given parameter may have limiting distributions with different variances. This fact suggests use of the variance of the limiting distribution as a criterion for measuring precision.

Comparison of the precision of alternative estimators has long engaged the attention of econometric theorists. An estimate is termed asymptotically efficient if the variance of the limiting normal distribution of $\sqrt{N}(B_{Nr} - b)$ is the smallest possible given the available prior information. Hansen (1982) and Chamberlain (1987) provide the central findings on the efficiency of method-of-moments estimates. For an exposition, see Manski (1988), Chapters 8 and 9.
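The limiting-distribution claim can be checked informally by simulation. The sketch below assumes the simplest moment problem, $\int (z - b)\, dP = 0$, with a hypothetical exponential population whose mean and variance both equal one; the sample size and number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
b = 1.0                        # population mean of the assumed exponential(1) distribution
n, replications = 500, 4000    # hypothetical sample size and Monte Carlo replications

samples = rng.exponential(scale=b, size=(replications, n))
estimates = samples.mean(axis=1)               # analog estimate in each replication
scaled_errors = np.sqrt(n) * (estimates - b)   # sqrt(N) times the estimation error

mc_mean = float(scaled_errors.mean())          # should be near 0
mc_variance = float(scaled_errors.var())       # should be near Var(z) = 1
```

Across replications, the scaled errors center near zero with variance near the population variance of $z$, as the normal approximation suggests for this case.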

Non-random sampling. In discussing moment problems and estimation problems more generally, I have assumed that the data are a random sample. It is important to understand that random sampling, albeit a useful simplifying idea, is not essential to the success of analog estimation. The essential requirement is that the sampling process be such that relevant features of the empirical distribution converge to corresponding population features.

For example, consider stationary time series problems. Here the data are observations at $N$ dates from a single realization of a stationary stochastic process whose marginal distribution is $P$. So we do not have a random sample from $P$. Nevertheless, dependent sampling versions of the laws of large numbers show that $P_N$ converges to $P$ in various senses as $N \to \infty$.
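As an illustration of dependent sampling, the sketch below simulates a stationary Gaussian AR(1) process, which is an assumed example and not one taken from the text. The coefficient `rho` and the sample length are hypothetical. The marginal distribution has mean zero and variance $1/(1 - \rho^2)$.

```python
import numpy as np

rng = np.random.default_rng(4)
rho, n = 0.5, 200_000                  # hypothetical AR(1) coefficient and sample size
marginal_var = 1.0 / (1.0 - rho**2)    # variance of the stationary marginal distribution P

shocks = rng.normal(size=n)
z = np.empty(n)
z[0] = rng.normal(scale=np.sqrt(marginal_var))   # start from the stationary distribution
for t in range(1, n):
    z[t] = rho * z[t - 1] + shocks[t]            # serially dependent draws

sample_mean = float(z.mean())   # analog of the population mean, 0
sample_var = float(z.var())     # analog of the population variance, 4/3
```

Although successive observations are correlated, the sample mean and variance settle near the corresponding features of the marginal distribution.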

2.3 Econometric models

We have been discussing an estimation problem relating a parameter $b$ to a probability distribution $P$ generating realizations of an observable random variable $z$. Econometric models typically relate a parameter $b$ to realizations of the observable $z$ and of an unobservable random variable, say $u$. Analog estimation methods may be used to estimate $b$ if one can transform the econometric model into a representation relating $b$ to $P$ and to nuisance parameters.

Formally, suppose that a probability distribution $P_{zu}$ on a space $Z \times U$ characterizes a population. A random sample of $N$ realizations of a random variable $(z, u)$ distributed $P_{zu}$ is drawn and one observes the realizations of $z$ but not of $u$. One knows that $P_{zu}$ is a member of some family $\Pi_{zu}$ of probability distributions on $Z \times U$. One also knows that a parameter $b$ in a parameter space $B$ solves an equation

$$f(z, u, b) = 0, \tag{13}$$

where $f(\cdot, \cdot, \cdot)$ maps $Z \times U \times B$ into some vector space. Equation (13) is to be interpreted as saying that almost every realization $(\zeta, \eta)$ of $(z, u)$ satisfies the equation $f(\zeta, \eta, b) = 0$.

Equation (13) typically has no content in the absence of information on the probability distribution $P_{zu}$ generating $(z, u)$. A meaningful model combines (13) with some distributional knowledge. The practice has been to impose restrictions on the probability distribution of $u$ conditional on some function of $z$, say $x = x(z)$ taking values in a space $X$. Let $P_{u|x}$ denote this conditional distribution. Then a model is defined by equation (13) and by a restriction on the conditional distributions $(P_{u|\xi}, \xi \in X)$.

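Essentially all econometric research has specified $f$ to have one of two forms. A separable model makes the unobserved variable $u$ additively separable, so that

$$f(z, u, b) = u_0(z, b) - u, \tag{14}$$

where $u_0(\cdot, \cdot)$ maps $Z \times B$ into $U$. A response model defines $z = (y, x)$, $Z = Y \times X$, and makes $f$ have the form

$$f(z, u, b) = y - y_0(x, u, b), \tag{15}$$

where $y_0(\cdot, \cdot, \cdot)$ maps $X \times U \times B$ into $Y$.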

3 Method-of-moments estimation of separable models

Separable models suppose that realizations of $(z, u)$ are related to the parameter $b$ through an equation

$$u_0(z, b) = u. \tag{16}$$

In the absence of information restricting the distribution of the unobserved $u$, this equation simply defines $u$ and conveys no information about $b$. In the presence of various distributional restrictions, (16) implies that $b$ and a nuisance parameter solve a type of moment equation known as an orthogonality condition, defined here.

Orthogonality conditions. Let $x = x(z)$ take values in a real vector space $X$. Let $\Gamma$ denote a space in which a nuisance parameter $\gamma$ lives. Let $e(\cdot, \cdot)$ be a function mapping $U \times \Gamma$ into a real vector space. Let $e(\cdot, \cdot)'$ denote the transpose of the column vector $e(\cdot, \cdot)$. The random vectors $x$ and $e(u, \gamma)$ are orthogonal if

$$\int x\, e(u, \gamma)'\, dP_{zu} = 0. \tag{17}$$

Equation (17) relates the observed random variable $x$ to the unobserved random variable $u$. Suppose that (16) holds. Then we can replace $u$ in (17) with $u_0(z, b)$, yielding

$$\int x\, e[u_0(z, b), \gamma]'\, dP = 0. \tag{18}$$

This orthogonality condition is a moment equation relating the parameters $(b, \gamma)$ to the distribution $P$ of the observable $z$.

It is not easy to motivate orthogonality conditions directly, but we can readily show that these conditions are implied by various more transparent distributional restrictions. The remainder of this section describes the leading cases.

Most authors incorporate the nuisance parameter $\gamma$ into the specification of $u_0(\cdot, \cdot)$ by giving that function a free intercept. This done, $u$ is declared to have mean zero and equation (19) is rewritten as

$$\int x\, u_0(z, b)'\, dP = 0. \tag{20}$$

Mean independence implies zero covariance, but it is difficult to motivate zero covariance in the absence of mean independence. To see why, rewrite (19) as the iterated expectation

$$\int x \left[\int (u - \gamma)'\, dP_{u|x}\right] dP_x = 0. \tag{22}$$

… u are unrelated in the sense of mean independence.

Mean independence implies orthogonality conditions beyond (19). Let $v(\cdot)$ be any function mapping $X$ into a real vector space. It follows from (16) and (21) that

$$\int v(x)\,[u_0(z, b) - \gamma]'\, dP = \int \left[\int v(x)\,(u - \gamma)'\, dP_{u|x}\right] dP_x = 0, \tag{23}$$

provided only that the integral in (23) exists. So the random variables $v(x)$ and $u_0(z, b)$ are uncorrelated. In other words, all functions of $x$ are instrumental variables.
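A brief sketch shows how such an orthogonality condition becomes an estimator. It assumes a hypothetical linear separable model with $z = (y, x)$ and $u_0(z, b) = y - xb$, takes the instrument function to be $v(x) = (1, x)$, and solves the sample analog for $(\gamma, b)$; this choice reproduces least squares with an intercept. The data-generating values are illustrative.

```python
import numpy as np

def analog_linear_estimate(y, x):
    """Solve (1/N) sum v(x_i) [y_i - x_i c - gamma] = 0 for (gamma, c), with v(x) = (1, x)."""
    v = np.column_stack([np.ones_like(x), x])   # instruments v(x_i), shape (N, 2)
    lhs = v.T @ v / len(y)                      # (1/N) sum v(x_i) (1, x_i)
    rhs = v.T @ y / len(y)                      # (1/N) sum v(x_i) y_i
    gamma_hat, c_hat = np.linalg.solve(lhs, rhs)
    return float(gamma_hat), float(c_hat)

rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
# Mean of u given x is 0.5 for every x; the spread of u is allowed to depend on x.
u = 0.5 + rng.normal(size=n) * (1.0 + 0.5 * np.abs(x))
y = 2.0 * x + u                                 # hypothetical truth: b = 2, gamma = 0.5
gamma_hat, b_hat = analog_linear_estimate(y, x)
```

Only mean independence is used here; the simulated disturbance is deliberately heteroskedastic, and the estimates still settle near the assumed parameter values.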

3.2 Median independence

The assertion that $u$ is mean independent of $x$ expresses a belief that $u$ has the same central tendency conditional on each realization of $x$. Median independence offers another way to express this belief. Median independence alone does not imply an orthogonality condition, but it does when the conditional distributions $P_{u|\xi}$, $\xi \in X$ are componentwise continuous.

Let $U$ be the real line; the vector case introduces no new considerations as we shall deal with $u$ componentwise. For each $\xi$ in $X$, let $m_\xi$ be the median of $u$ conditional on the event $[x = \xi]$. Let $\gamma$ be the unconditional median of $u$. We say that $u$ is median independent of $x$ if

$$m_\xi = \gamma, \quad \xi \in X. \tag{24}$$

It can be shown (see Manski (1988), Chapter 4) that if $P_{u|\xi}$, $\xi \in X$ are continuous probability distributions, their medians solve the conditional moment equations

$$\int \operatorname{sgn}(u - m_\xi)\, dP_{u|\xi} = 0, \quad \xi \in X. \tag{25}$$

So median independence and continuity together imply that

$$\int \operatorname{sgn}(u - \gamma)\, dP_{u|\xi} = 0, \quad \xi \in X. \tag{26}$$
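The sign condition has a familiar sample analog. The sketch below assumes the simplest setting, with no covariates and $u$ observed directly as $z$, and searches the sample points for the value that brings the average sign closest to zero. The Cauchy population is a hypothetical choice made to show that no finite mean is needed.

```python
import numpy as np

def sign_moment(z, c):
    """(1/N) sum sgn(z_i - c): sample analog of the integral of sgn(u - gamma)."""
    return float(np.mean(np.sign(z - c)))

def median_by_moments(z):
    """Search the sample points for the c that brings the sign moment closest to zero."""
    moments = [abs(sign_moment(z, c)) for c in z]
    return float(z[int(np.argmin(moments))])

rng = np.random.default_rng(6)
z = 3.0 + rng.standard_cauchy(size=1001)   # hypothetical sample; median 3, no finite mean
gamma_hat = median_by_moments(z)
```

With an odd number of distinct observations the sign moment is exactly zero at the sample median, so the search returns that point.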

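It follows from (16) and (26) that

$$\int v(x)\, \operatorname{sgn}[u_0(z, b) - \gamma]\, dP = \int \left[\int v(x)\, \operatorname{sgn}(u - \gamma)\, dP_{u|x}\right] dP_x = 0 \tag{27}$$

for any function $v(\cdot)$ on $X$ for which the integral in (27) exists.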

3.3 Conditional symmetry

Mean and median independence both express a belief that the central tendency of $u$ does not vary with $x$. Yet they are different assertions. This fact may cause the applied researcher some discomfort. One often feels at ease saying that the central tendency of $u$ does not vary with $x$. But only occasionally can one pinpoint the mathematical sense in which the term “central tendency” should be interpreted. The need for care in defining central tendency disappears if the conditional distributions $P_{u|\xi}$, $\xi \in X$ are componentwise symmetric with common center of symmetry. Let $U$ be the real line again and assume that for all realizations of $x$, the conditional distribution of $u$ is symmetric around some point $\gamma$. That is,

$$P_{u - \gamma|\xi} = P_{\gamma - u|\xi}, \quad \xi \in X. \tag{28}$$

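Let $h(\cdot)$ be any odd function mapping the real line into a real vector space; that is, $h(\eta) = -h(-\eta)$ for $\eta$ in $R^1$. Conditional symmetry implies

$$\int h(u - \gamma)\, dP_{u|\xi} = 0, \quad \xi \in X. \tag{29}$$

It follows from (16) and (29) that

$$\int v(x)\, h[u_0(z, b) - \gamma]'\, dP = \int \left[\int v(x)\, h(u - \gamma)'\, dP_{u|x}\right] dP_x = 0 \tag{30}$$

for all $v(\cdot)$ and $h(\cdot)$ such that the integral in (30) exists. So all functions of $x$ are orthogonal to all odd functions of $u - \gamma$. The functions $h(u - \gamma) = u - \gamma$ and $h(u - \gamma) = \operatorname{sgn}(u - \gamma)$ are odd. Thus, the orthogonality conditions (23) and (27) that follow from mean and median independence are satisfied given conditional symmetry.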
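A quick simulation illustrates that several odd functions give near-zero sample moments under conditional symmetry. The population below is hypothetical: $u$ is a scaled Student-t variate centered at an assumed $\gamma = 1.5$, with a spread that varies with $x$, and the instrument function is taken to be $v(x) = x$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x = rng.uniform(1.0, 2.0, size=n)
gamma = 1.5                                    # hypothetical common center of symmetry
u = gamma + x * rng.standard_t(df=5, size=n)   # symmetric around gamma given x; spread varies with x

odd_functions = {
    "identity": lambda t: t,
    "sign": np.sign,
    "cube_root": np.cbrt,
    "tanh": np.tanh,
}
# Sample analog of the integral of v(x) h(u - gamma) with v(x) = x, for each odd h.
sample_moments = {
    name: float(np.mean(x * h(u - gamma))) for name, h in odd_functions.items()
}
```

The identity and sign functions correspond to the mean and median cases; the other two are arbitrary odd functions included only to show that the same condition holds for them in this simulated population.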

3.4 Variance independence

One may believe that $u$ not only has the same central tendency for each realization of $x$ but also the same spread. The usual econometric practice has been to express spread by variance. In the presence of mean independence, variance independence (homoskedasticity) is the additional condition

$$\int (u - \gamma_1)(u - \gamma_1)'\, dP_{u|\xi} = \gamma_2, \quad \xi \in X. \tag{31}$$

Here $\gamma_1$ is the common mean of the distributions $P_{u|\xi}$, $\xi \in X$ and $\gamma_2$ is the common variance matrix.

Let $v(\cdot)$ be any real function on $X$. It follows from (16) and (31) that $(b, \gamma_1, \gamma_2)$ solves the orthogonality condition

$$\int v(x)\, \{[u_0(z, b) - \gamma_1][u_0(z, b) - \gamma_1]' - \gamma_2\}\, dP = 0. \tag{32}$$
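The variance condition can be examined numerically in the scalar case. The sketch below assumes $u$ is observed directly, takes $v(x) = (1, x)$, and compares a hypothetical homoskedastic population with a heteroskedastic one; all distributions and parameter values are illustrative.

```python
import numpy as np

def variance_moment(resid, x, gamma1, gamma2):
    """(1/N) sum v(x_i) {[u_i - gamma1]^2 - gamma2} with v(x) = (1, x), for scalar u."""
    v = np.column_stack([np.ones_like(x), x])
    return v.T @ ((resid - gamma1) ** 2 - gamma2) / len(x)

rng = np.random.default_rng(8)
n = 200_000
x = rng.normal(size=n)
u_homo = 1.0 + 2.0 * rng.normal(size=n)                        # mean 1, variance 4 for every x
u_hetero = 1.0 + 2.0 * np.exp(0.5 * x) * rng.normal(size=n)    # variance rises with x

m_homo = variance_moment(u_homo, x, gamma1=1.0, gamma2=4.0)
gamma2_hetero = float(np.var(u_hetero))                        # sample variance of u_hetero
m_hetero = variance_moment(u_hetero, x, gamma1=1.0, gamma2=gamma2_hetero)
```

Under homoskedasticity both components of the sample moment are close to zero. In the heteroskedastic population the component weighted by $x$ stays far from zero even when $\gamma_2$ is set to the overall sample variance, so no single $\gamma_2$ satisfies the condition for this $v(\cdot)$.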

The assertion of variance independence imposes no restrictions on the variance matrix $\gamma_2$. In some applications, information about $\gamma_2$ is available. For example, it may be known that the components of $u$ are uncorrelated with one another, so $\gamma_2$ is a diagonal matrix. Such information may be expressed by appropriately restricting the space of possible values of $\gamma_2$.

Let $s(\cdot)$ map $U$ into a real vector space. Let $\gamma$ be the unconditional mean of $s(u)$.
