Econometric Analysis of Cross Section and Panel Data, by Wooldridge

Chapter 13: Maximum Likelihood Methods


13.1 Introduction

This chapter contains a general treatment of maximum likelihood estimation (MLE) under random sampling. All the models we considered in Part I could be estimated without making full distributional assumptions about the endogenous variables conditional on the exogenous variables: maximum likelihood methods were not needed. Instead, we focused primarily on zero-covariance and zero-conditional-mean assumptions, and secondarily on assumptions about conditional variances and covariances. These assumptions were sufficient for obtaining consistent, asymptotically normal estimators, some of which were shown to be efficient within certain classes of estimators.

Some texts on advanced econometrics take maximum likelihood estimation as the unifying theme, and then most models are estimated by maximum likelihood. In addition to providing a unified approach to estimation, MLE has some desirable efficiency properties: it is generally the most efficient estimation procedure in the class of estimators that use information on the distribution of the endogenous variables given the exogenous variables. (We formalize the efficiency of MLE in Section 14.5.) So why not always use MLE?

As we saw in Part I, efficiency usually comes at the price of nonrobustness, and this is certainly the case for maximum likelihood. Maximum likelihood estimators are generally inconsistent if some part of the specified distribution is misspecified. As an example, consider from Section 9.5 a simultaneous equations model that is linear in its parameters but nonlinear in some endogenous variables. There, we discussed estimation by instrumental variables methods. We could estimate SEMs nonlinear in endogenous variables by maximum likelihood if we assumed independence between the structural errors and the exogenous variables and if we assumed a particular distribution for the structural errors, say, multivariate normal. The MLE would be asymptotically more efficient than the best GMM estimator, but failure of normality generally results in inconsistent estimators of all parameters.

As a second example, suppose we wish to estimate $E(y \mid x)$, where $y$ is bounded between zero and one. The logistic function, $\exp(x\beta)/[1 + \exp(x\beta)]$, is a reasonable model for $E(y \mid x)$, and, as we discussed in Section 12.2, nonlinear least squares provides consistent, $\sqrt{N}$-asymptotically normal estimators under weak regularity conditions. We can easily make inference robust to arbitrary heteroskedasticity in $\text{Var}(y \mid x)$. An alternative approach is to model the density of $y$ given $x$ (which, of course, implies a particular model for $E(y \mid x)$) and use maximum likelihood estimation. As we will see, the strength of MLE is that, under correct specification of the density, we would have the asymptotically efficient estimators, and we would be able to estimate any feature of the conditional distribution, such as $P(y = 1 \mid x)$. The drawback is that, except in special cases, if we have misspecified the density in any way, we will not be able to consistently estimate the conditional mean.

In most applications, specifying the distribution of the endogenous variables conditional on exogenous variables must have a component of arbitrariness, as economic theory rarely provides guidance. Our perspective is that, for robustness reasons, it is desirable to make as few assumptions as possible, at least until relaxing them becomes practically difficult. There are cases in which MLE turns out to be robust to failure of certain assumptions, but these must be examined on a case-by-case basis, a process that detracts from the unifying theme provided by the MLE approach. (One such example is nonlinear regression under a homoskedastic normal assumption; the MLE of the parameters $\beta_o$ is identical to the NLS estimator, and we know the latter is consistent and asymptotically normal quite generally. We will cover some other leading cases in Chapter 19.)

Maximum likelihood plays an important role in modern econometric analysis, for good reason. There are many problems for which it is indispensable. For example, in Chapters 15 and 16 we study various limited dependent variable models, and MLE plays a central role.

13.2 Preliminaries and Examples

Traditional maximum likelihood theory for independent, identically distributed observations $\{y_i \in \mathbb{R}^G : i = 1, 2, \ldots\}$ starts by specifying a family of densities for $y_i$. This is the framework used in introductory statistics courses, where $y_i$ is a scalar with a normal or Poisson distribution. But in almost all economic applications, we are interested in estimating parameters in conditional distributions. Therefore, we assume that each random draw is partitioned as $(x_i, y_i)$, where $x_i \in \mathbb{R}^K$ and $y_i \in \mathbb{R}^G$, and we are interested in estimating a model for the conditional distribution of $y_i$ given $x_i$. We are not interested in the distribution of $x_i$, so we will not specify a model for it. Consequently, the method of this chapter is properly called conditional maximum likelihood estimation (CMLE). By taking $x_i$ to be null we cover unconditional MLE as a special case.

An alternative to viewing $(x_i, y_i)$ as a random draw from the population is to treat the conditioning variables $x_i$ as nonrandom vectors that are set ahead of time and that appear in the unconditional distribution of $y_i$. (This is analogous to the fixed regressor assumption in classical regression analysis.) Then, the $y_i$ cannot be identically distributed, and this fact complicates the asymptotic analysis. More importantly, treating the $x_i$ as nonrandom is much too restrictive for all uses of maximum likelihood. In fact, later on we will cover methods where $x_i$ contains what are endogenous variables in a structural model, but where it is convenient to obtain the distribution of one set of endogenous variables conditional on another set. Once we know how to analyze the general CMLE case, applications follow fairly directly.

It is important to understand that the subsequent results apply any time we have random sampling in the cross section dimension. Thus, the general theory applies to system estimation, as in Chapters 7 and 9, provided we are willing to assume a distribution for $y_i$ given $x_i$. In addition, panel data settings with large cross sections and relatively small time periods are encompassed, since the appropriate asymptotic analysis is with the time dimension fixed and the cross section dimension tending to infinity.

In order to perform maximum likelihood analysis we need to specify, or derive from an underlying (structural) model, the density of $y_i$ given $x_i$. We assume this density is known up to a finite number of unknown parameters, with the result that we have a parametric model of a conditional density. The vector $y_i$ can be continuous or discrete, or it can have both discrete and continuous characteristics. In many of our applications, $y_i$ is a scalar, but this fact does not simplify the general treatment.

We will carry along two examples in this chapter to illustrate the general theory of conditional maximum likelihood. The first example is a binary response model, specifically the probit model. We postpone the uses and interpretation of binary response models until Chapter 15.

Example 13.1 (Probit): Suppose that the latent variable $y_i^*$ follows

$$y_i^* = x_i\theta + e_i \quad (13.1)$$

where $e_i$ is independent of $x_i$ (which is a $1 \times K$ vector with first element equal to unity for all $i$), $\theta$ is a $K \times 1$ vector of parameters, and $e_i \sim \text{Normal}(0, 1)$. Instead of observing $y_i^*$, we observe only the binary variable $y_i = 1[\,y_i^* > 0\,]$. We can easily obtain the distribution of $y_i$ given $x_i$:

$$P(y_i = 1 \mid x_i) = P(y_i^* > 0 \mid x_i) = P(x_i\theta + e_i > 0 \mid x_i) = P(e_i > -x_i\theta \mid x_i) = 1 - \Phi(-x_i\theta) = \Phi(x_i\theta) \quad (13.4)$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function (cdf). We have used Property CD.4 in the chapter appendix along with the symmetry of the normal distribution. Therefore,

$$P(y_i = 0 \mid x_i) = 1 - \Phi(x_i\theta) \quad (13.5)$$

We can combine equations (13.4) and (13.5) into the density of $y_i$ given $x_i$:

$$f(y \mid x_i) = [\Phi(x_i\theta)]^y\,[1 - \Phi(x_i\theta)]^{1-y}, \qquad y = 0, 1 \quad (13.6)$$

The fact that $f(y \mid x_i)$ is zero when $y \notin \{0, 1\}$ is obvious, so we will not be explicit about this in the future.

Our second example is useful when the variable to be explained takes on nonnegative integer values. Such a variable is called a count variable. We will discuss the use and interpretation of count data models in Chapter 19. For now, it suffices to note that a linear model for $E(y \mid x)$ when $y$ takes on nonnegative integer values is not ideal because it can lead to negative predicted values. Further, since $y$ can take on the value zero with positive probability, the transformation $\log(y)$ cannot be used to obtain a model with constant elasticities or constant semielasticities. A functional form well suited for $E(y \mid x)$ is $\exp(x\theta)$. We could estimate $\theta$ by using nonlinear least squares, but all of the standard distributions for count variables imply heteroskedasticity (see Chapter 19). Thus, we can hope to do better. A traditional approach to regression models with count data is to assume that $y_i$ given $x_i$ has a Poisson distribution.

Example 13.2 (Poisson Regression): Let $y_i$ be a nonnegative count variable; that is, $y_i$ can take on integer values $0, 1, 2, \ldots$. Denote the conditional mean of $y_i$ given the vector $x_i$ as $E(y_i \mid x_i) = m(x_i)$. A natural distribution for $y_i$ given $x_i$ is the Poisson distribution:

$$f(y \mid x_i) = \exp[-m(x_i)]\,\{m(x_i)\}^y / y!, \qquad y = 0, 1, 2, \ldots \quad (13.7)$$

(We use $y$ as the dummy argument in the density, not to be confused with the random variable $y_i$.) Once we choose a form for the conditional mean function, we have completely determined the distribution of $y_i$ given $x_i$. For example, from equation (13.7), $P(y = 0 \mid x) = \exp[-m(x)]$. An important feature of the Poisson distribution is that the variance equals the mean: $\text{Var}(y_i \mid x_i) = E(y_i \mid x_i) = m(x_i)$. The usual choice for $m(\cdot)$ is $m(x) = \exp(x\theta)$, where $\theta$ is $K \times 1$ and $x$ is $1 \times K$ with first element unity.
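To connect equation (13.7) to something executable, here is a minimal Python check (not from the text; the design and parameter values are hypothetical) that the formula with $m(x) = \exp(x\theta)$ matches scipy's Poisson pmf and that simulated draws have mean approximately equal to variance:

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import factorial

rng = np.random.default_rng(0)

x = np.array([1.0, 0.3])            # 1 x K row with first element unity
theta = np.array([0.2, 0.7])        # hypothetical parameter value
m = np.exp(x @ theta)               # conditional mean m(x) = exp(x theta)

yvals = np.arange(5)
pmf_manual = np.exp(-m) * m**yvals / factorial(yvals)   # equation (13.7)
assert np.allclose(pmf_manual, poisson.pmf(yvals, m))

draws = rng.poisson(m, size=1_000_000)
print(draws.mean(), draws.var())    # approximately equal: Var(y|x) = E(y|x)
```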

13.3 General Framework for Conditional MLE

Let $p_o(y \mid x)$ denote the conditional density of $y_i$ given $x_i = x$, where $y$ and $x$ are dummy arguments. We index this density by "o" to emphasize that it is the true density of $y_i$ given $x_i$, and not just one of many candidates. It will be useful to let $\mathcal{X} \subset \mathbb{R}^K$ denote the possible values for $x_i$ and $\mathcal{Y}$ denote the possible values of $y_i$; $\mathcal{X}$ and $\mathcal{Y}$ are called the supports of the random vectors $x_i$ and $y_i$, respectively.

For a general treatment, we assume that, for all $x \in \mathcal{X}$, $p_o(\cdot \mid x)$ is a density with respect to a $\sigma$-finite measure, denoted $\nu(dy)$. Defining a $\sigma$-finite measure would take us too far afield. We will say little more about the measure $\nu(dy)$ because it does not play a crucial role in applications. It suffices to know that $\nu(dy)$ can be chosen to allow $y_i$ to be discrete, continuous, or some mixture of the two. When $y_i$ is discrete, the measure $\nu(dy)$ simply turns all integrals into sums; when $y_i$ is purely continuous, we obtain the usual Riemann integrals. Even in more complicated cases, where, say, $y_i$ has both discrete and continuous characteristics, we can get by with tools from basic probability without ever explicitly defining $\nu(dy)$. For more on measures and general integrals, you are referred to Billingsley (1979) and Davidson (1994, Chapters 3 and 4).

In Chapter 12 we saw how nonlinear least squares can be motivated by the fact that $m_o(x) \equiv E(y \mid x)$ minimizes $E\{[\,y - m(x)\,]^2\}$ relative to all other functions $m(x)$ with $E\{[m(x)]^2\} < \infty$. Conditional maximum likelihood has a similar motivation. The result from probability that is crucial for applying the analogy principle is the conditional Kullback-Leibler information inequality. Although there are more general statements of this inequality, the following suffices for our purpose: for any nonnegative function $f(\cdot \mid x)$ such that

$$\int_{\mathcal{Y}} f(y \mid x)\, \nu(dy) \le 1, \qquad x \in \mathcal{X} \quad (13.8)$$

Property CD.1 in the chapter appendix implies that

$$K(f; x) \equiv \int_{\mathcal{Y}} \log[\,p_o(y \mid x)/f(y \mid x)\,]\, p_o(y \mid x)\, \nu(dy) \ge 0, \qquad x \in \mathcal{X} \quad (13.9)$$

We can apply inequality (13.9) to a parametric model for $p_o(\cdot \mid x)$,

$$f(y \mid x; \theta), \qquad \theta \in \Theta \subset \mathbb{R}^P \quad (13.10)$$

which we assume satisfies condition (13.8) for each $x \in \mathcal{X}$ and each $\theta \in \Theta$; if it does not, then $f(\cdot \mid x; \theta)$ does not integrate to unity (with respect to the measure $\nu$), and as a result it is a very poor candidate for $p_o(y \mid x)$. Model (13.10) is a correctly specified model of the conditional density, $p_o(\cdot \mid \cdot)$, if, for some $\theta_o \in \Theta$,

$$f(\cdot \mid x; \theta_o) = p_o(\cdot \mid x), \qquad x \in \mathcal{X} \quad (13.11)$$

As we discussed in Chapter 12, it is useful to use $\theta_o$ to distinguish the true value of the parameter from a generic element of $\Theta$. In particular examples, we will not bother making this distinction unless it is needed to make a point.

For each $x \in \mathcal{X}$, $K(f; x)$ can be written as $K(f; x) = E\{\log[\,p_o(y_i \mid x_i)\,] \mid x_i = x\} - E\{\log[\,f(y_i \mid x_i)\,] \mid x_i = x\}$. Therefore, if the parametric model is correctly specified, then $E\{\log[\,f(y_i \mid x_i; \theta_o)\,] \mid x_i\} \ge E\{\log[\,f(y_i \mid x_i; \theta)\,] \mid x_i\}$, or

$$E[l_i(\theta_o) \mid x_i] \ge E[l_i(\theta) \mid x_i], \qquad \theta \in \Theta \quad (13.12)$$

where

$$l_i(\theta) \equiv l(y_i, x_i; \theta) \equiv \log f(y_i \mid x_i; \theta) \quad (13.13)$$

is the conditional log likelihood for observation $i$. Note that $l_i(\theta)$ is a random function of $\theta$, since it depends on the random vector $(x_i, y_i)$. By taking the expected value of expression (13.12) and using iterated expectations, we see that $\theta_o$ solves

$$\max_{\theta \in \Theta} E[l_i(\theta)] \quad (13.14)$$

The analogy principle suggests estimating $\theta_o$ by solving the sample analogue:

$$\max_{\theta \in \Theta} N^{-1}\sum_{i=1}^N l_i(\theta) \quad (13.15)$$

A solution to problem (13.15), assuming that one exists, is the conditional maximum likelihood estimator (CMLE) of $\theta_o$, which we denote as $\hat{\theta}$. We will sometimes drop "conditional" when it is not needed for clarity.

The CMLE is clearly an M-estimator, since a maximization problem is easily turned into a minimization problem: in the notation of Chapter 12, take $w_i \equiv (x_i, y_i)$ and $q(w_i; \theta) \equiv -\log f(y_i \mid x_i; \theta)$. As long as we keep track of the minus sign in front of the log likelihood, we can apply the results in Chapter 12 directly.


The motivation for the conditional MLE as a solution to problem (13.15) may appear backward if you learned about maximum likelihood estimation in an introductory statistics course. In a traditional framework, we would treat the $x_i$ as constants appearing in the distribution of $y_i$, and we would define $\hat{\theta}$ as the solution to the analogous maximization problem with the $x_i$ treated as nonrandom; in that setting, arguments for the consistency of the estimator of $\theta_o$ are necessarily heuristic. By contrast, the analogy principle applies directly to problem (13.15), and we need not assume that the $x_i$ are fixed.

In our two examples, the conditional log likelihoods are fairly simple.

Example 13.1 (continued): In the probit example, the log likelihood for observation $i$ is

$$l_i(\theta) = y_i \log \Phi(x_i\theta) + (1 - y_i)\log[1 - \Phi(x_i\theta)]$$
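To make this concrete, here is a minimal Python sketch (not from the text; the simulated design and all names are illustrative) that computes the probit CMLE by minimizing the negative of this log likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated data from y* = x theta_o + e, y = 1[y* > 0]; design is hypothetical.
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # first element unity
theta_o = np.array([0.5, 1.0, -1.0])
y = (X @ theta_o + rng.normal(size=N) > 0).astype(float)

def neg_loglik(theta):
    # l_i(theta) = y_i log Phi(x_i theta) + (1 - y_i) log[1 - Phi(x_i theta)];
    # log[1 - Phi(z)] = log Phi(-z) by symmetry of the normal distribution.
    xb = X @ theta
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

theta_hat = minimize(neg_loglik, x0=np.zeros(K), method="BFGS").x  # the CMLE
print(theta_hat)
```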

Example 13.2 (continued): In the Poisson example, $l_i(\theta) = -\exp(x_i\theta) + y_i x_i\theta - \log(y_i!)$. Normally, we would drop the last term in defining $l_i(\theta)$ because it does not affect the maximization problem.
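A matching sketch for the Poisson CMLE, again on hypothetical simulated data and with the $\log(y_i!)$ term dropped:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated counts with conditional mean exp(x theta_o); design is hypothetical.
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
theta_o = np.array([0.2, 0.7])
y = rng.poisson(np.exp(X @ theta_o))

def neg_loglik(theta):
    # l_i(theta) = -exp(x_i theta) + y_i (x_i theta), with log(y_i!) dropped
    xb = X @ theta
    return -np.sum(-np.exp(xb) + y * xb)

theta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
print(theta_hat)
```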

13.4 Consistency of Conditional MLE

In this section we state a formal consistency result for the CMLE, which is a special case of the M-estimator consistency result, Theorem 12.2.

Theorem 13.1 (Consistency of CMLE): Let $\{(x_i, y_i) : i = 1, 2, \ldots\}$ be a random sample with $x_i \in \mathcal{X} \subset \mathbb{R}^K$, $y_i \in \mathcal{Y} \subset \mathbb{R}^G$. Let $\Theta \subset \mathbb{R}^P$ be the parameter set and denote the parametric model of the conditional density as $\{f(\cdot \mid x; \theta) : x \in \mathcal{X},\ \theta \in \Theta\}$. Assume that (a) $f(\cdot \mid x; \theta)$ is a true density with respect to the measure $\nu(dy)$ for all $x$ and $\theta$, so that condition (13.8) holds; (b) for some $\theta_o \in \Theta$, $p_o(\cdot \mid x) = f(\cdot \mid x; \theta_o)$, all $x \in \mathcal{X}$, and $\theta_o$ is the unique solution to problem (13.14); (c) $\Theta$ is a compact set; (d) for each $\theta \in \Theta$, $l(\cdot\,; \theta)$ is a Borel measurable function on $\mathcal{Y} \times \mathcal{X}$; (e) for each $(y, x) \in \mathcal{Y} \times \mathcal{X}$, $l(y, x; \cdot)$ is a continuous function on $\Theta$; and (f) $|l(w; \theta)| \le b(w)$, all $\theta \in \Theta$, and $E[b(w)] < \infty$. Then there exists a solution to problem (13.15), the CMLE $\hat{\theta}$, and $\text{plim}\ \hat{\theta} = \theta_o$.

As we discussed in Chapter 12, the measurability assumption in part (d) is purely technical and does not need to be checked in practice. Compactness of $\Theta$ can be relaxed, but doing so usually requires considerable work. The continuity assumption holds in most econometric applications, but there are cases where it fails, such as when estimating certain models of auctions; see Donald and Paarsch (1996). The moment assumption in part (f) typically restricts the distribution of $x_i$ in some way, but such restrictions are rarely a serious concern. For the most part, the key assumptions are that the parametric model is correctly specified, that $\theta_o$ is identified, and that the log-likelihood function is continuous in $\theta$.

For the probit and Poisson examples, the log likelihoods are clearly continuous in $\theta$. We can verify the moment condition (f) if we bound certain moments of $x_i$ and make the parameter space compact. But our primary concern is that densities are correctly specified. For example, in the probit case, the density for $y_i$ given $x_i$ will be incorrect if the latent error $e_i$ is not independent of $x_i$ and normally distributed, or if the latent variable model is not linear to begin with. For identification we must rule out perfect collinearity in $x_i$. The Poisson CMLE turns out to have desirable properties even if the Poisson distributional assumption does not hold, but we postpone a discussion of the robustness of the Poisson CMLE until Chapter 19.

13.5 Asymptotic Normality and Asymptotic Variance Estimation

Under the differentiability and moment assumptions that allow us to apply the theorems in Chapter 12, we can show that the MLE is generally asymptotically normal. Naturally, the computational methods discussed in Section 12.7, including concentrating parameters out of the log likelihood, apply directly.

13.5.1 Asymptotic Normality

We can derive the limiting distribution of the MLE by applying Theorem 12.3. We will have to assume the regularity conditions there; in particular, we assume that $\theta_o$ is in the interior of $\Theta$, and $l_i(\theta)$ is twice continuously differentiable on the interior of $\Theta$.

The score of the log likelihood for observation $i$ is simply the $P \times 1$ vector of partial derivatives of $l_i(\theta)$, the transpose of the $1 \times P$ gradient:

$$s_i(\theta) \equiv \nabla_\theta\, l_i(\theta)'$$

Example 13.1 (continued): For the probit example, differentiating the log likelihood gives the $K \times 1$ score

$$s_i(\theta) = \frac{x_i'\, \phi(x_i\theta)\,[\,y_i - \Phi(x_i\theta)\,]}{\Phi(x_i\theta)[1 - \Phi(x_i\theta)]} \quad (13.18)$$

where $\phi(\cdot)$ denotes the standard normal density.

Example 13.2 (continued): The score for the Poisson case, where $\theta$ is again $K \times 1$, is

$$s_i(\theta) = -\exp(x_i\theta)\, x_i' + y_i x_i' = x_i'[\,y_i - \exp(x_i\theta)\,] \quad (13.19)$$

In the vast majority of cases, the score of the log-likelihood function has an important zero conditional mean property:

$$E[s_i(\theta_o) \mid x_i] = 0 \quad (13.20)$$

In other words, when we evaluate the $P \times 1$ score at $\theta_o$ and take its expectation with respect to $f(\cdot \mid x_i; \theta_o)$, the expectation is zero. Under condition (13.20), $E[s_i(\theta_o)] = 0$, which was a key condition in deriving the asymptotic normality of the M-estimator. To see why condition (13.20) holds in general, write the conditional expectation of the score as

$$E_\theta[s_i(\theta) \mid x_i] = \int_{\mathcal{Y}} s(y, x_i; \theta)\, f(y \mid x_i; \theta)\, \nu(dy)$$

If integration and differentiation can be interchanged on $\text{int}(\Theta)$, that is, if

$$\nabla_\theta \left[\int_{\mathcal{Y}} f(y \mid x_i; \theta)\, \nu(dy)\right] = \int_{\mathcal{Y}} \nabla_\theta\, f(y \mid x_i; \theta)\, \nu(dy) \quad (13.21)$$

for all $x_i \in \mathcal{X}$, $\theta \in \text{int}(\Theta)$, then the right-hand side of equation (13.21) is zero, because the left-hand side is the gradient of a function identically equal to unity. Since $\nabla_\theta f(y \mid x_i; \theta) = s(y, x_i; \theta)'\, f(y \mid x_i; \theta)$, condition (13.20) follows.

Example 13.2 (continued): Define $u_i \equiv y_i - \exp(x_i\theta_o)$. Then $s_i(\theta_o) = x_i' u_i$, and so $E[s_i(\theta_o) \mid x_i] = 0$.
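The zero conditional mean property implies $E[s_i(\theta_o)] = 0$, which is easy to check by simulation; a sketch using the hypothetical Poisson design from earlier:

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson score at theta_o: s_i = x_i' u_i with u_i = y_i - exp(x_i theta_o).
# Its sample average should be near zero when the density is correct.
N = 200_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
theta_o = np.array([0.2, 0.7])
y = rng.poisson(np.exp(X @ theta_o))

scores = X * (y - np.exp(X @ theta_o))[:, None]  # row i is s_i'
print(scores.mean(axis=0))  # approximately [0, 0]
```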

Assuming that $l_i(\theta)$ is twice continuously differentiable on the interior of $\Theta$, let the Hessian for observation $i$ be the $P \times P$ matrix of second partial derivatives of $l_i(\theta)$:

$$H_i(\theta) \equiv \nabla^2_\theta\, l_i(\theta)$$

Define

$$A_o \equiv -E[H_i(\theta_o)] \quad (13.24)$$

which is generally a positive definite matrix when $\theta_o$ is identified. Under standard regularity conditions, the asymptotic normality of the CMLE follows from Theorem 12.3: $\sqrt{N}(\hat{\theta} - \theta_o) \overset{a}{\sim} \text{Normal}(0, A_o^{-1}B_o A_o^{-1})$, where $B_o \equiv \text{Var}[s_i(\theta_o)] \equiv E[s_i(\theta_o)\, s_i(\theta_o)']$.

It turns out that this general form of the asymptotic variance matrix is too complicated. We now show that $B_o = A_o$. We must assume enough smoothness such that the following interchange of integral and derivative is valid (see Newey and McFadden, 1994, Section 5.1):

$$\nabla_\theta \left[\int_{\mathcal{Y}} s_i(\theta)\, f(y \mid x_i; \theta)\, \nu(dy)\right] = \int_{\mathcal{Y}} \nabla_\theta\, [\,s_i(\theta)\, f(y \mid x_i; \theta)\,]\, \nu(dy) \quad (13.25)$$

Then, taking the derivative of the identity

$$\int_{\mathcal{Y}} s_i(\theta)\, f(y \mid x_i; \theta)\, \nu(dy) \equiv E_\theta[s_i(\theta) \mid x_i] = 0, \qquad \theta \in \text{int}(\Theta)$$

and using equation (13.25), gives, for all $\theta \in \text{int}(\Theta)$,

$$-E_\theta[H_i(\theta) \mid x_i] = \text{Var}_\theta[s_i(\theta) \mid x_i]$$

where the indexing by $\theta$ denotes expectation and variance when $f(\cdot \mid x_i; \theta)$ is the density of $y_i$ given $x_i$. When evaluated at $\theta = \theta_o$ we get a very important equality:

$$-E[H_i(\theta_o) \mid x_i] = E[s_i(\theta_o)\, s_i(\theta_o)' \mid x_i] \quad (13.26)$$

where the expectation and variance are with respect to the true conditional distribution of $y_i$ given $x_i$. Equation (13.26) is called the conditional information matrix equality (CIME). Taking the expectation of equation (13.26) (with respect to the distribution of $x_i$) and using the law of iterated expectations gives

$$-E[H_i(\theta_o)] = E[s_i(\theta_o)\, s_i(\theta_o)']$$

or $A_o = B_o$. This relationship is best thought of as the unconditional information matrix equality (UIME).
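The information matrix equality can be checked numerically. The sketch below (hypothetical probit design; the Hessian is approximated by central differences rather than derived analytically) compares an estimate of $A_o$ with an estimate of $B_o$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

N = 100_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
theta_o = np.array([0.5, -1.0])
y = (X @ theta_o + rng.normal(size=N) > 0).astype(float)

def avg_loglik(theta):
    xb = X @ theta
    return np.mean(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

# B_o estimate: average outer product of the probit scores at theta_o.
xb = X @ theta_o
lam = norm.pdf(xb) * (y - norm.cdf(xb)) / (norm.cdf(xb) * norm.cdf(-xb))
S = X * lam[:, None]
B = S.T @ S / N

# A_o estimate: minus the numerical Hessian of the average log likelihood.
def num_hessian(f, t, h=1e-5):
    P = t.size
    H = np.zeros((P, P))
    for a in range(P):
        for b in range(P):
            ea, eb = np.eye(P)[a] * h, np.eye(P)[b] * h
            H[a, b] = (f(t + ea + eb) - f(t + ea - eb)
                       - f(t - ea + eb) + f(t - ea - eb)) / (4 * h * h)
    return H

A = -num_hessian(avg_loglik, theta_o)
print(np.round(A, 3), np.round(B, 3), sep="\n")  # approximately equal
```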

Theorem 13.2 (Asymptotic Normality of CMLE): Let the conditions of Theorem 13.1 hold. In addition, assume that (a) $\theta_o \in \text{int}(\Theta)$; (b) for each $(y, x) \in \mathcal{Y} \times \mathcal{X}$, $l(y, x; \cdot)$ is twice continuously differentiable on $\text{int}(\Theta)$; (c) the interchanges of derivative and integral in equations (13.21) and (13.25) hold for all $\theta \in \text{int}(\Theta)$; (d) the elements of $\nabla^2_\theta\, l(y, x; \theta)$ are bounded in absolute value by a function $b(y, x)$ with finite expectation; and (e) $A_o$ defined by expression (13.24) is positive definite. Then

$$\sqrt{N}(\hat{\theta} - \theta_o) \overset{a}{\sim} \text{Normal}(0, A_o^{-1})$$

and so $\text{Avar}(\hat{\theta}) = A_o^{-1}/N$.

When the regularity conditions fail, the CMLE may not converge at the rate $\sqrt{N}$. Some progress has been made for specific models when the support of the distribution depends on unknown parameters; see, for example, Donald and Paarsch (1996).

13.5.2 Estimating the Asymptotic Variance

Estimating $\text{Avar}(\hat{\theta})$ requires estimating $A_o$. From the equalities derived previously, there are at least three possible estimators of $A_o$ in the CMLE context. In fact, under slight extensions of the regularity conditions in Theorem 13.2, each of the matrices

$$N^{-1}\sum_{i=1}^N -H_i(\hat{\theta}), \qquad N^{-1}\sum_{i=1}^N s_i(\hat{\theta})\, s_i(\hat{\theta})', \qquad N^{-1}\sum_{i=1}^N A(x_i, \hat{\theta})$$

converges in probability to $A_o$, where

$$A(x_i, \theta_o) \equiv -E[H(y_i, x_i, \theta_o) \mid x_i] \quad (13.31)$$

Thus, $\widehat{\text{Avar}}(\hat{\theta})$ can be taken to be any of the three matrices

$$\left[\sum_{i=1}^N -H_i(\hat{\theta})\right]^{-1}, \qquad \left[\sum_{i=1}^N s_i(\hat{\theta})\, s_i(\hat{\theta})'\right]^{-1}, \qquad \left[\sum_{i=1}^N A(x_i, \hat{\theta})\right]^{-1} \quad (13.32)$$

The estimator based on the Hessian requires second derivatives and is not guaranteed to be positive definite; if it is not positive definite, standard errors of some linear combinations of the parameters will not be well defined.

The second estimator in equation (13.32), based on the outer product of the score, is always positive definite (whenever the inverse exists). This simple estimator was proposed by Berndt, Hall, Hall, and Hausman (1974). Its primary drawback is that it can be poorly behaved in even moderate sample sizes, as we discussed in Section 12.6.2.

If the conditional expectation $A(x_i, \theta_o)$ is in closed form (as it is in some leading cases) or can be simulated, as discussed in Porter (1999), then the estimator based on $A(x_i, \hat{\theta})$ has some attractive features. First, it often depends only on first derivatives of a conditional mean or conditional variance function. Second, it is positive definite when it exists because of the conditional information matrix equality (13.26). Third, this estimator has been found to have significantly better finite sample properties than the outer product of the score estimator in some situations where $A(x_i, \theta_o)$ can be obtained in closed form.

Example 13.1 (continued): The Hessian for the probit log likelihood is a mess. Fortunately, $E[H_i(\theta_o) \mid x_i]$ has a fairly simple form. Taking the derivative of equation (13.18) and using the product rule, and noting that the terms multiplying $y_i - \Phi(x_i\theta_o)$ have zero conditional expectation, gives

$$A(x_i, \theta_o) = -E[H_i(\theta_o) \mid x_i] = \frac{\{\phi(x_i\theta_o)\}^2\, x_i' x_i}{\Phi(x_i\theta_o)[1 - \Phi(x_i\theta_o)]}$$

Example 13.2 (continued): Differentiating the Poisson score in equation (13.19) gives $H_i(\theta) = -\exp(x_i\theta)\, x_i' x_i$. In this example, the Hessian does not depend on $y_i$, so there is no distinction between $H_i(\theta_o)$ and $E[H_i(\theta_o) \mid x_i]$. The positive definite estimate of $\text{Avar}(\hat{\theta})$ is simply

$$\widehat{\text{Avar}}(\hat{\theta}) = \left[\sum_{i=1}^N \exp(x_i\hat{\theta})\, x_i' x_i\right]^{-1}$$
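To make the choices in equation (13.32) concrete, here is a sketch of two of the estimators for the probit case, the outer product of the score and the expected Hessian form; it assumes X, y, and theta_hat from the earlier probit sketch (the Hessian-based form would additionally require second derivatives):

```python
import numpy as np
from scipy.stats import norm

# Assumes X (N x K), y, and the probit CMLE theta_hat from the earlier sketch.
def probit_avar(X, y, theta_hat):
    xb = X @ theta_hat
    Phi, phi = norm.cdf(xb), norm.pdf(xb)

    # Outer product of the score (BHHH form).
    lam = phi * (y - Phi) / (Phi * (1 - Phi))
    S = X * lam[:, None]
    avar_opg = np.linalg.inv(S.T @ S)

    # Expected Hessian form: A(x_i, theta) = phi^2 x_i'x_i / [Phi (1 - Phi)].
    w = phi**2 / (Phi * (1 - Phi))
    avar_eh = np.linalg.inv((X * w[:, None]).T @ X)
    return avar_opg, avar_eh

# Standard errors: np.sqrt(np.diag(avar)) for either choice.
```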

13.6 Hypothesis Testing

The three tests covered in Chapter 12 are immediately applicable to the MLE case. Since the information matrix equality holds when the density is correctly specified, we need only consider the simplest forms of the test statistics. The Wald statistic is given in equation (12.63), and the conditions sufficient for it to have a limiting chi-square distribution are discussed in Section 12.6.1.

Define the log-likelihood function for the entire sample by $\mathcal{L}(\theta) \equiv \sum_{i=1}^N l_i(\theta)$. Let $\hat{\theta}$ be the unrestricted estimator, and let $\tilde{\theta}$ be the estimator with the $Q$ nonredundant constraints imposed. Then, under the regularity conditions discussed in Section 12.6.3, the likelihood ratio (LR) statistic,

$$LR \equiv 2[\,\mathcal{L}(\hat{\theta}) - \mathcal{L}(\tilde{\theta})\,]$$

is distributed asymptotically as $\chi^2_Q$ under $H_0$. As with the Wald statistic, we cannot use $LR$ as approximately $\chi^2_Q$ when $\theta_o$ is on the boundary of the parameter set. The LR statistic is very easy to compute once the restricted and unrestricted models have been estimated, and the LR statistic is invariant to reparameterizing the conditional density.
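As an illustration, a sketch of the LR statistic for the hypothetical Poisson fit above, testing the single restriction that the slope coefficient is zero (so $Q = 1$):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.stats import chi2

# Assumes X (N x 2, first column unity), y, and neg_loglik from the Poisson sketch.
L_ur = -minimize(neg_loglik, x0=np.zeros(2), method="BFGS").fun  # unrestricted

# Restricted model (slope = 0): optimize over the intercept only.
L_r = -minimize_scalar(lambda a: neg_loglik(np.array([a, 0.0]))).fun

LR = 2 * (L_ur - L_r)
p_value = chi2.sf(LR, df=1)  # Q = 1 restriction
print(LR, p_value)
```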

The score or LM test is based on the restricted estimation only. Let $s_i(\tilde{\theta})$ be the $P \times 1$ score of $l_i(\theta)$ evaluated at the restricted estimates $\tilde{\theta}$. That is, we compute the partial derivatives of $l_i(\theta)$ with respect to each of the $P$ parameters, but then we evaluate this vector of partials at the restricted estimates. Then, from Section 12.6.2 and the information matrix equality, the statistic

$$LM \equiv \left[\sum_{i=1}^N s_i(\tilde{\theta})\right]' \left[\sum_{i=1}^N \tilde{A}_i\right]^{-1} \left[\sum_{i=1}^N s_i(\tilde{\theta})\right]$$

is distributed asymptotically as $\chi^2_Q$ under $H_0$, where $\tilde{A}_i$ is any of the three estimates of $A_o$ in equation (13.32), evaluated at $\tilde{\theta}$; for the expected Hessian form, $\tilde{A}_i = A(x_i, \tilde{\theta})$, with $x_i$ containing any conditioning variables; see Problem 13.5. We have already used the expected Hessian form of the LM statistic for nonlinear regression in Section 12.6.2. We will use it in several applications in Part IV, including binary response models and Poisson regression models. In these examples, the statistic can be computed conveniently using auxiliary regressions based on weighted residuals.
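A sketch of the expected Hessian form of the LM statistic for the same hypothetical Poisson restriction (slope equal to zero); under the restriction, the CMLE of the intercept has the closed form log of the sample mean:

```python
import numpy as np
from scipy.stats import chi2

# Assumes X (N x 2), y from the Poisson sketch; H0: slope = 0.
# Restricted CMLE: exp(intercept) = mean(y), slope = 0.
theta_tilde = np.array([np.log(y.mean()), 0.0])
mu = np.exp(X @ theta_tilde)

s = X * (y - mu)[:, None]           # scores at the restricted estimate
A = (X * mu[:, None]).T @ X         # expected Hessian form: sum of exp(x theta) x'x

s_sum = s.sum(axis=0)
LM = s_sum @ np.linalg.solve(A, s_sum)
p_value = chi2.sf(LM, df=1)
```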

Because the unconditional information matrix equality holds, we know from Section 12.6.4 that the three classical statistics have the same limiting distribution under local alternatives. Therefore, either small-sample considerations, invariance, or computational issues must be used to choose among the statistics.

13.7 Specification Testing

Since MLE generally relies on its distributional assumptions, it is useful to have available a general class of specification tests that are simple to compute. One general approach is to nest the model of interest within a more general model (which may be much harder to estimate) and obtain the score test against the more general alternative. RESET in a linear model and its extension to exponential regression models in Section 12.6.2 are examples of this approach, albeit in a non-maximum-likelihood setting.

In the context of MLE, it makes sense to test moment conditions implied by the conditional density specification. Let $w_i = (x_i, y_i)$ and suppose that, when $f(\cdot \mid x; \theta)$ is correctly specified,

$$E[g(w_i; \theta_o)] = 0 \quad (13.37)$$

where $g(w; \theta)$ is a $Q \times 1$ vector. Any application implies innumerable choices for the function $g$. Since the MLE $\hat{\theta}$ sets the sum of the scores to zero, $g(w; \theta)$ cannot contain elements of $s(w; \theta)$. Generally, $g$ should be chosen to test features of a model that are of primary interest, such as first and second conditional moments, or various conditional probabilities.

A test of hypothesis (13.37) is based on how far the sample average of $g(w_i; \hat{\theta})$ is from zero. To derive the asymptotic distribution, note that

$$N^{-1/2}\sum_{i=1}^N g_i(\hat{\theta}) = N^{-1/2}\sum_{i=1}^N [\,g_i(\theta_o) - \Pi_o'\, s_i(\theta_o)\,] + o_p(1)$$

where

$$\Pi_o \equiv \{E[s_i(\theta_o)\, s_i(\theta_o)']\}^{-1}\{E[s_i(\theta_o)\, g_i(\theta_o)']\}$$

is the $P \times Q$ matrix of population regression coefficients from regressing $g_i(\theta_o)'$ on $s_i(\theta_o)'$. This expansion uses a mean-value expansion about $\theta_o$, algebra similar to that in Chapter 12, and the equality

$$E[\nabla_\theta\, g_i(\theta_o) \mid x_i] = -E[g_i(\theta_o)\, s_i(\theta_o)' \mid x_i] \quad (13.39)$$

To show equation (13.39), write

$$E_\theta[g_i(\theta) \mid x_i] = \int_{\mathcal{Y}} g(y, x_i; \theta)\, f(y \mid x_i; \theta)\, \nu(dy) = 0 \quad (13.40)$$

for all $\theta$. Now, if we take the derivative with respect to $\theta$ and assume that the integrals and derivative can be interchanged, equation (13.40) implies that

$$\int_{\mathcal{Y}} \nabla_\theta\, g(y, x_i; \theta)\, f(y \mid x_i; \theta)\, \nu(dy) + \int_{\mathcal{Y}} g(y, x_i; \theta)\, \nabla_\theta f(y \mid x_i; \theta)\, \nu(dy) = 0$$

or $E_\theta[\nabla_\theta\, g_i(\theta) \mid x_i] + E_\theta[g_i(\theta)\, s_i(\theta)' \mid x_i] = 0$, where we use the fact that $\nabla_\theta f(y \mid x; \theta) = s(y, x; \theta)'\, f(y \mid x; \theta)$. Plugging in $\theta = \theta_o$ and rearranging gives equation (13.39).

What we have shown is that the asymptotic variance of $N^{-1/2}\sum_{i=1}^N g_i(\hat{\theta})$ can be consistently estimated by $N^{-1}\sum_{i=1}^N (\hat{g}_i - \hat{\Pi}'\hat{s}_i)(\hat{g}_i - \hat{\Pi}'\hat{s}_i)'$, where $\hat{g}_i \equiv g_i(\hat{\theta})$, $\hat{s}_i \equiv s_i(\hat{\theta})$, and $\hat{\Pi}$ is the matrix of coefficients from regressing $\hat{g}_i'$ on $\hat{s}_i'$. When we construct the quadratic form, we get the Newey-Tauchen-White (NTW) statistic,

$$NTW = \left[\sum_{i=1}^N g_i(\hat{\theta})\right]' \left[\sum_{i=1}^N (\hat{g}_i - \hat{\Pi}'\hat{s}_i)(\hat{g}_i - \hat{\Pi}'\hat{s}_i)'\right]^{-1}\left[\sum_{i=1}^N g_i(\hat{\theta})\right] \quad (13.41)$$

This statistic was proposed independently by Newey (1985) and Tauchen (1985), and is an extension of White's (1982a) information matrix (IM) test statistic.

For computational purposes it is useful to note that equation (13.41) is identical to $N - SSR_0 = NR^2_u$ from the regression

$$1 \text{ on } \hat{s}_i',\ \hat{g}_i', \qquad i = 1, 2, \ldots, N \quad (13.42)$$

where $SSR_0$ is the usual sum of squared residuals and $R^2_u$ is the uncentered R-squared. Under the null that the density is correctly specified, $NTW$ is distributed asymptotically as $\chi^2_Q$, assuming that $g(w; \theta)$ contains $Q$ nonredundant moment conditions. Unfortunately, the outer product form of regression (13.42) means that the statistic can have poor finite sample properties. In particular applications, such as nonlinear least squares, binary response analysis, and Poisson regression, to name a few, it is best to use forms of test statistics based on the expected Hessian. We gave the regression-based test for NLS in equation (12.72), and we will see other examples in later chapters. For the information matrix test statistic, Davidson and MacKinnon (1992) have suggested an alternative form of the IM statistic that appears to have better finite sample properties.

Example 13.2 (continued): To test the specification of the conditional mean for Poisson regression, we might take $g(w; \theta) = \exp(x\theta)\, x'\,[\,y - \exp(x\theta)\,] = \exp(x\theta)\, s(w; \theta)$, where the score is given by equation (13.19). If $E(y \mid x) = \exp(x\theta_o)$, then $E[g(w; \theta_o) \mid x] = \exp(x\theta_o)\, E[s(w; \theta_o) \mid x] = 0$. To test the Poisson variance assumption, $\text{Var}(y \mid x) = E(y \mid x) = \exp(x\theta_o)$, $g$ can be of the form $g(w; \theta) = a(x; \theta)\{[\,y - \exp(x\theta)\,]^2 - \exp(x\theta)\}$, where $a(x; \theta)$ is a $Q \times 1$ vector. If the Poisson assumption is true, then $u = y - \exp(x\theta_o)$ has a zero conditional mean and $E(u^2 \mid x) = \text{Var}(y \mid x) = \exp(x\theta_o)$. It follows that $E[g(w; \theta_o) \mid x] = 0$.
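A sketch of the NTW statistic, computed through regression (13.42), for the variance-based choice of $g$ just described with the (hypothetical) weight $a(x; \theta) = 1$; it assumes X, y, and theta_hat from the earlier Poisson sketch:

```python
import numpy as np
from scipy.stats import chi2

# Tests the Poisson variance assumption with a(x, theta) = 1, so
# g_i = [y_i - exp(x_i theta)]^2 - exp(x_i theta)  (Q = 1).
mu = np.exp(X @ theta_hat)
s_hat = X * (y - mu)[:, None]          # scores at theta_hat, N x P
g_hat = ((y - mu)**2 - mu)[:, None]    # moment function, N x Q

# NTW = N - SSR_0 from regressing 1 on (s_hat, g_hat).
Z = np.hstack([s_hat, g_hat])
ones = np.ones(len(y))
b, *_ = np.linalg.lstsq(Z, ones, rcond=None)
ssr0 = np.sum((ones - Z @ b)**2)
ntw = len(y) - ssr0
p_value = chi2.sf(ntw, df=1)
```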

Example 13.2 contains examples of what are known as conditional moment tests. As the name suggests, the idea is to form orthogonality conditions based on some key conditional moments, usually the conditional mean or conditional variance, but sometimes conditional probabilities or higher order moments. The tests for nonlinear regression in Chapter 12 can be viewed as conditional moment tests, and we will see several other examples in Part IV. For reasons discussed earlier, we will avoid computing the tests using regression (13.42) whenever possible. See Newey (1985), Tauchen (1985), and Pagan and Vella (1989) for general treatments and applications of conditional moment tests. White's (1982a) information matrix test can often be viewed as a conditional moment test; see Hall (1987) for the linear regression model and White (1994) for a general treatment.

13.8 Partial Likelihood Methods for Panel Data and Cluster Samples

Up to this point we have assumed that the parametric model for the density of $y$ given $x$ is correctly specified. This assumption is fairly general because $x$ can contain any observable variable. The leading case occurs when $x$ contains variables we view as exogenous in a structural model. In other cases, $x$ will contain variables that are endogenous in a structural model, but putting them in the conditioning set and finding the new conditional density makes estimation of the structural parameters easier. For studying various panel data models, for estimation using cluster samples, and for various other applications, we need to relax the assumption that the full conditional density of $y$ given $x$ is correctly specified. In some examples, such a model is too complicated. Or, for robustness reasons, we do not wish to fully specify the density of $y$ given $x$.

13.8.1 Setup for Panel Data

For panel data applications we let $y$ denote a $T \times 1$ vector, with generic element $y_t$. Thus, $y_i$ is a $T \times 1$ random draw vector from the cross section, with $t$th element $y_{it}$. As always, we are thinking of $T$ small relative to the cross section sample size. With a slight notational change we can replace $y_{it}$ with, say, a $G$-vector for each $t$, an extension that allows us to cover general systems of equations with panel data.

For some vector $x_t$ containing any set of observable variables, let $D(y_t \mid x_t)$ denote the distribution of $y_t$ given $x_t$. The key assumption is that we have a correctly specified model for the density of $y_t$ given $x_t$; call it $f_t(y_t \mid x_t; \theta)$, $t = 1, 2, \ldots, T$. The vector $x_t$ can contain anything, including conditioning variables $z_t$, lags of these, and lagged values of $y$. The vector $\theta$ consists of all parameters appearing in $f_t$ for any $t$; some or all of these may appear in the density for every $t$, and some may appear only in the density for a single time period.

What distinguishes partial likelihood from maximum likelihood is that we do not assume that

$$\prod_{t=1}^T f_t(y_t \mid x_t; \theta) \quad (13.43)$$

is a conditional density of the $T \times 1$ vector $y$ given some set of conditioning variables.

We define the partial log likelihood for each observation $i$ as

$$l_i(\theta) \equiv \sum_{t=1}^T \log f_t(y_{it} \mid x_{it}; \theta) \quad (13.44)$$

which is the sum of the log likelihoods across $t$. What makes partial likelihood methods work is that $\theta_o$ maximizes the expected value of equation (13.44) provided we have the densities $f_t(y_t \mid x_t; \theta)$ correctly specified.

By the Kullback-Leibler information inequality, $\theta_o$ maximizes $E[\log f_t(y_{it} \mid x_{it}; \theta)]$ over $\Theta$ for each $t$, so $\theta_o$ also maximizes the sum of these over $t$. As usual, identification requires that $\theta_o$ be the unique maximizer of the expected value of equation (13.44). It is sufficient that $\theta_o$ uniquely maximizes $E[\log f_t(y_{it} \mid x_{it}; \theta)]$ for each $t$, but this assumption is not necessary.

The partial maximum likelihood estimator (PMLE) $\hat{\theta}$ solves

$$\max_{\theta \in \Theta} \sum_{i=1}^N \sum_{t=1}^T \log f_t(y_{it} \mid x_{it}; \theta)$$
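A sketch of pooled estimation of this objective for a hypothetical Poisson panel (note the product over $t$ is not assumed to be a joint density, so inference would ordinarily use standard errors robust to dependence across $t$ within unit $i$):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Hypothetical panel: N units, T periods, Poisson density for y_it given x_it.
N, T = 1000, 4
X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])  # stacked (i, t) rows
theta_o = np.array([0.1, 0.5])
y = rng.poisson(np.exp(X @ theta_o))

def neg_partial_loglik(theta):
    # Sum of log f_t(y_it | x_it; theta) over all i and t (log(y!) dropped).
    xb = X @ theta
    return -np.sum(-np.exp(xb) + y * xb)

theta_hat = minimize(neg_partial_loglik, x0=np.zeros(2), method="BFGS").x
print(theta_hat)
```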
