Econometric Analysis of Cross Section and Panel Data by Wooldridge - Chapter 16


16.1 Introduction and Motivation

In this chapter we cover a class of models traditionally called censored regression models. Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points. In order to apply these methods effectively, we must understand that the statistical model underlying censored regression analysis applies to problems that are conceptually very different.

For the most part, censored regression applications can be put into one of two categories. In the first case there is a variable with quantitative meaning, call it $y^*$, and we are interested in the population regression $E(y^* \mid x)$. If $y^*$ and $x$ were observed for everyone in the population, there would be nothing new: we could use standard regression methods (ordinary or nonlinear least squares). But a data problem arises because $y^*$ is censored above or below some value; that is, it is not observable for part of the population. An example is top coding in survey data. For example, assume that $y^*$ is family wealth, and, for a randomly drawn family, the actual value of wealth is recorded up to some threshold, say, \$200,000, but above that level only the fact that wealth was more than \$200,000 is recorded. Top coding is an example of data censoring, and is analogous to the data-coding problem we discussed in Section 15.10.2 in connection with interval regression.

Example 16.1 (Top Coding of Wealth): In the population of all families in the United States, let $wealth^*$ denote actual family wealth, measured in thousands of dollars. Suppose that $wealth^*$ follows the linear regression model $E(wealth^* \mid x) = x\beta$, where $x$ is a $1 \times K$ vector of conditioning variables. However, we observe $wealth^*$ only when $wealth^* \le 200$. When $wealth^*$ is greater than 200 we know that it is, but we do not know the actual value of wealth. Define observed wealth as

$$wealth = \min(wealth^*, 200)$$

The definition $wealth = 200$ when $wealth^* > 200$ is arbitrary, but it is useful for defining the statistical model that follows. To estimate $\beta$ we might assume that $wealth^*$ given $x$ has a homoskedastic normal distribution. In error form,

$$wealth^* = x\beta + u, \qquad u \mid x \sim \text{Normal}(0, \sigma^2)$$

This is a strong assumption about the conditional distribution of $wealth^*$, something we could avoid entirely if $wealth^*$ were not censored above 200. Under these assumptions we can write recorded wealth as

$$wealth = \min(200, x\beta + u) \tag{16.1}$$

Data censoring also arises in the analysis of duration models, a topic we treat in Chapter 20.
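To make the top-coding rule wealth = min(wealth*, 200) concrete, the following sketch simulates latent wealth from a linear model with normal errors and then applies the cap. The single covariate, the parameter values, and the function names are illustrative assumptions and do not come from the text.

```python
import random

CAP = 200.0  # top-coding threshold, in thousands of dollars (as in Example 16.1)


def top_code(wealth_star, cap=CAP):
    """Recorded wealth: the latent value below the cap, the cap itself above it."""
    return min(wealth_star, cap)


def simulate(n=10_000, beta0=120.0, beta1=40.0, sigma=60.0, seed=0):
    """Draw (x, wealth*, wealth) with wealth* = beta0 + beta1*x + u, u ~ Normal(0, sigma^2).

    beta0, beta1, and sigma are made-up values chosen only for illustration.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        wealth_star = beta0 + beta1 * x + rng.gauss(0.0, sigma)
        draws.append((x, wealth_star, top_code(wealth_star)))
    return draws


draws = simulate()
share_censored = sum(w_star > CAP for _, w_star, _ in draws) / len(draws)
mean_latent = sum(w_star for _, w_star, _ in draws) / len(draws)
mean_recorded = sum(w for _, _, w in draws) / len(draws)

print(f"share top coded:      {share_censored:.3f}")
print(f"mean latent wealth:   {mean_latent:.1f}")
print(f"mean recorded wealth: {mean_recorded:.1f}")
```

Because every top-coded family is recorded at the cap, the mean of recorded wealth falls below the mean of latent wealth, which is one way to see why treating the recorded values as if they were uncensored can mislead.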

A second kind of application of censored regression models appears more often in econometrics and, unfortunately, is where the label "censored regression" is least appropriate. To describe the situation, let $y$ be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: $y$ takes on the value zero with positive probability but is a continuous random variable over strictly positive values. There are many examples of variables that, at least approximately, have these features. Just a few examples include amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development. In each of these examples we can imagine economic agents solving an optimization problem, and for some agents the optimal choice will be the corner solution, $y = 0$. We will call this kind of response variable a corner solution outcome. For corner solution outcomes, it makes more sense to call the resulting model a corner solution model. Unfortunately, the name "censored regression model" appears to be firmly entrenched.

For corner solution applications, we must understand that the issue is not data observability: we are interested in features of the distribution of $y$ given $x$, such as $E(y \mid x)$ and $P(y = 0 \mid x)$. If we are interested only in the effect of the $x_j$ on the mean response, $E(y \mid x)$, it is natural to ask, Why not just assume $E(y \mid x) = x\beta$ and apply OLS on a random sample? Theoretically, the problem is that, when $y \ge 0$, $E(y \mid x)$ cannot be linear in $x$ unless the range of $x$ is fairly limited. A related weakness is that the model implies constant partial effects. Further, for the sample at hand, predicted values for $y$ can be negative for many combinations of $x$ and $\beta$. These are very similar to the shortcomings of the linear probability model for binary responses.

We have already seen functional forms that ensure that $E(y \mid x)$ is positive for all values of $x$ and parameters, the leading case being the exponential function, $E(y \mid x) = \exp(x\beta)$. [We cannot use $\log(y)$ as the dependent variable in a linear regression because $\log(0)$ is undefined.] We could then estimate $\beta$ using nonlinear least squares (NLS), as in Chapter 12. Using an exponential conditional mean function is a reasonable strategy to follow, as it ensures that predicted values are positive and that the parameters are easy to interpret. However, it also has limitations. First, if $y$ is a corner solution outcome, $\text{Var}(y \mid x)$ is probably heteroskedastic, and so NLS could be inefficient. While we may be able to partly solve this problem using weighted NLS, any model for the conditional variance would be arbitrary. Probably a more important criticism is that we would not be able to measure the effect of each $x_j$ on other features of the distribution of $y$ given $x$. Two that are commonly of interest are $P(y = 0 \mid x)$ and $E(y \mid x, y > 0)$. By definition, a model for $E(y \mid x)$ does not allow us to estimate other features of the distribution. If we make a full distributional assumption for $y$ given $x$, we can estimate any feature of the conditional distribution. In addition, we will obtain efficient estimates of quantities such as $E(y \mid x)$.

The following example shows how a simple economic model leads to an econometric model where $y$ can be zero with positive probability and where the conditional expectation $E(y \mid x)$ is not a linear function of parameters.

Example 16.2 (Charitable Contributions): Problem 15.1 shows how to derive a probit model from a utility maximization problem for charitable giving, using utility function $util_i(c, q) = c + a_i \log(1 + q)$, where $c$ is annual consumption, in dollars, and $q$ is annual charitable giving. The variable $a_i$ determines the marginal utility of giving for family $i$. Maximizing subject to the budget constraint $c_i + p_i q_i = m_i$ (where $m_i$ is family income and $p_i$ is the price of a dollar of charitable contributions) and the inequality constraint $c, q \ge 0$, the solution $q_i$ is easily shown to be $q_i = 0$ if $a_i/p_i \le 1$ and $q_i = a_i/p_i - 1$ if $a_i/p_i > 1$. We can write this relation as $1 + q_i = \max(1, a_i/p_i)$. If $a_i = \exp(z_i\gamma + u_i)$, where $u_i$ is an unobservable independent of $(z_i, p_i, m_i)$ and normally distributed, then charitable contributions are determined by the equation

$$\log(1 + q_i) = \max[0, z_i\gamma - \log(p_i) + u_i] \tag{16.2}$$

Comparing equations (16.2) and (16.1) shows that they have similar statistical structures. In equation (16.2) we are taking a maximum, and the lower threshold is zero, whereas in equation (16.1) we are taking a minimum with an upper threshold of 200. Each problem can be transformed into the same statistical model: for a randomly drawn observation $i$ from the population,

$$y_i^* = x_i\beta + u_i, \qquad u_i \mid x_i \sim \text{Normal}(0, \sigma^2) \tag{16.3}$$

$$y_i = \max(0, y_i^*) \tag{16.4}$$

These two equations define the standard censored Tobit model. The charitable contributions example immediately fits into the standard censored Tobit framework by defining $x_i = [z_i, \log(p_i)]$ and $y_i = \log(1 + q_i)$. This particular transformation of $q_i$ and the restriction that the coefficient on $\log(p_i)$ is $-1$ depend critically on the utility function used in the example. In practice, we would probably take $y_i = q_i$ and allow all parameters to be unrestricted.
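The corner solution in the charitable contributions example can be checked numerically. The sketch below computes the optimal gift q = max(0, a/p - 1) and confirms that log(1 + q) equals the max[0, zγ - log(p) + u] form for a few cases. The numerical values for zγ, p, and u and the function names are hypothetical, chosen only to cover both the corner and the interior solution.

```python
import math


def optimal_giving(a, p):
    """Corner-solution demand for giving: q = 0 if a/p <= 1, else q = a/p - 1."""
    return max(0.0, a / p - 1.0)


def log_one_plus_q(z_gamma, p, u):
    """The max form of the giving equation: max[0, z*gamma - log(p) + u]."""
    return max(0.0, z_gamma - math.log(p) + u)


# Hypothetical values of (z_i * gamma, p_i, u_i); not taken from the text.
cases = [(0.2, 1.0, -0.5), (0.2, 0.8, 0.3), (-0.1, 1.2, 1.0), (0.0, 1.0, 0.0)]

results = []
for z_gamma, p, u in cases:
    a = math.exp(z_gamma + u)  # a_i = exp(z_i * gamma + u_i)
    q = optimal_giving(a, p)
    lhs = math.log(1.0 + q)
    rhs = log_one_plus_q(z_gamma, p, u)
    results.append((q, lhs, rhs))
    print(f"a/p = {a / p:.3f}  q = {q:.3f}  log(1+q) = {lhs:.3f}  max[.] = {rhs:.3f}")
```

Families with a/p at or below one give nothing, and for everyone else log(1 + q) is linear in zγ, log(p), and u, which is exactly the censored structure described above.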

The wealth example can be cast as equations (16.3) and (16.4) after a simple transformation:

$$-(wealth_i - 200) = \max(0, -200 - x_i\beta - u_i) \cdot 0 + \max(0, 200 - x_i\beta - u_i)$$

that is, $-(wealth_i - 200) = \max(0, 200 - x_i\beta - u_i)$, and so the intercept changes, and all slope coefficients have the opposite sign from equation (16.1). For data-censoring problems, it is easier to study the censoring scheme directly, and many econometrics packages support various kinds of data censoring. Problem 16.3 asks you to consider general forms of data censoring, including the case when the censoring point can change with observation, in which case the model is often called the censored normal regression model. (This label properly emphasizes the data-censoring aspect.)

For the population, we write the standard censored Tobit model as

$$y^* = x\beta + u, \qquad u \mid x \sim \text{Normal}(0, \sigma^2) \tag{16.5}$$

$$y = \max(0, y^*) \tag{16.6}$$

where, except in rare cases, $x$ contains unity. As we saw from the two previous examples, different features of this model are of interest depending on the type of application. In examples with true data censoring, such as Example 16.1, the vector $\beta$ tells us everything we want to know because $E(y^* \mid x) = x\beta$ is of interest. For corner solution outcomes, such as Example 16.2, $\beta$ does not give the entire story. Usually, we are interested in $E(y \mid x)$ or $E(y \mid x, y > 0)$. These certainly depend on $\beta$, but in a nonlinear fashion.

For the statistical model (16.5) and (16.6) to make sense, the variable $y^*$ should have characteristics of a normal random variable. In data censoring cases this requirement means that the variable of interest $y^*$ should have a homoskedastic normal distribution. In some cases the logarithmic transformation can be used to make this assumption more plausible. Example 16.1 might be one such case if wealth is positive for all families. See also Problems 16.1 and 16.2.

In corner solution examples, the variable $y$ should be (roughly) continuous when $y > 0$. Thus the Tobit model is not appropriate for ordered responses, as in Section 15.10. Similarly, Tobit should not be applied to count variables, especially when the count variable takes on only a small number of values (such as number of patents awarded annually to a firm or the number of times someone is arrested during a year). Poisson regression models, a topic we cover in Chapter 19, are better suited for analyzing count data.

For corner solution outcomes, we must avoid placing too much emphasis on the latent variable $y^*$. Most of the time $y^*$ is an artificial construct, and we are not interested in $E(y^* \mid x)$. In Example 16.2 we derived the model for charitable contributions using utility maximization, and a latent variable never appeared. Viewing $y^*$ as something like "desired charitable contributions" can only sow confusion: the variable of interest, $y$, is observed charitable contributions.

16.2 Derivations of Expected Values

In corner solution applications such as the charitable contributions example, interest centers on probabilities or expectations involving $y$. Most of the time we focus on the expected values $E(y \mid x, y > 0)$ and $E(y \mid x)$.

Before deriving these expectations for the Tobit model, it is interesting to derive an inequality that bounds $E(y \mid x)$ from below. Since the function $g(z) \equiv \max(0, z)$ is convex, it follows from the conditional Jensen's inequality (see Appendix 2A) that $E(y \mid x) \ge \max[0, E(y^* \mid x)]$. This condition holds when $y^*$ has any distribution and for any form of $E(y^* \mid x)$. If $E(y^* \mid x) = x\beta$, then

$$E(y \mid x) \ge \max(0, x\beta) \tag{16.7}$$

which is always nonnegative. Equation (16.7) shows that $E(y \mid x)$ is bounded from below by the larger of zero and $x\beta$.

When $u$ is independent of $x$ and has a normal distribution, we can find an explicit expression for $E(y \mid x)$. We first derive $P(y > 0 \mid x)$ and $E(y \mid x, y > 0)$, which are of interest in their own right. Then, we use the law of iterated expectations to obtain $E(y \mid x)$:

$$E(y \mid x) = P(y = 0 \mid x) \cdot 0 + P(y > 0 \mid x) \cdot E(y \mid x, y > 0) = P(y > 0 \mid x) \cdot E(y \mid x, y > 0) \tag{16.8}$$

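Deriving $P(y > 0 \mid x)$ is easy. Define the binary variable $w = 1$ if $y > 0$, $w = 0$ if $y = 0$. Then $w$ follows a probit model:

$$P(w = 1 \mid x) = P(y^* > 0 \mid x) = P(u > -x\beta \mid x) = P(u/\sigma > -x\beta/\sigma) = \Phi(x\beta/\sigma) \tag{16.9}$$

One implication of equation (16.9) is that $\gamma \equiv \beta/\sigma$, but not $\beta$ and $\sigma$ separately, can be consistently estimated from a probit of $w$ on $x$.

To derive $E(y \mid x, y > 0)$, we need the following fact about the normal distribution: if $z \sim \text{Normal}(0, 1)$, then, for any constant $c$,

$$E(z \mid z > c) = \frac{\phi(c)}{1 - \Phi(c)}$$

where $\phi(\cdot)$ is the standard normal density function. {This is easily shown by noting that the density of $z$ given $z > c$ is $\phi(z)/[1 - \Phi(c)]$, $z > c$, and then integrating $z\phi(z)$ from $c$ to $\infty$.} Therefore, if $u \sim \text{Normal}(0, \sigma^2)$, then

$$E(u \mid u > c) = \sigma E\!\left(\frac{u}{\sigma} \,\Big|\, \frac{u}{\sigma} > \frac{c}{\sigma}\right) = \sigma \frac{\phi(c/\sigma)}{1 - \Phi(c/\sigma)}$$

Applying this result with $c = -x\beta$, and using $1 - \Phi(-x\beta/\sigma) = \Phi(x\beta/\sigma)$, gives

$$E(y \mid x, y > 0) = x\beta + E(u \mid u > -x\beta) = x\beta + \sigma \frac{\phi(x\beta/\sigma)}{\Phi(x\beta/\sigma)} \tag{16.10}$$

The right-hand side of equation (16.10) is positive for any values of $x$ and $\beta$; this statement must be true by equations (16.7) and (16.8).

For any $c$ the quantity $\lambda(c) \equiv \phi(c)/\Phi(c)$ is called the inverse Mills ratio. Thus, $E(y \mid x, y > 0)$ is the sum of $x\beta$ and $\sigma$ times the inverse Mills ratio evaluated at $x\beta/\sigma$.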

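If $x_j$ is a continuous explanatory variable that is not functionally related to the other regressors, then, because $d\lambda(c)/dc = -\lambda(c)[c + \lambda(c)]$,

$$\frac{\partial E(y \mid x, y > 0)}{\partial x_j} = \beta_j \{1 - \lambda(x\beta/\sigma)[x\beta/\sigma + \lambda(x\beta/\sigma)]\} \tag{16.11}$$

This equation shows that the partial effect of $x_j$ on $E(y \mid x, y > 0)$ is not entirely determined by $\beta_j$; there is an adjustment factor multiplying $\beta_j$, the term in braces, that depends on $x$ through the index $x\beta/\sigma$. We can use the fact that if $z \sim \text{Normal}(0, 1)$, then $\text{Var}(z \mid z > -c) = 1 - \lambda(c)[c + \lambda(c)]$ for any $c \in \mathbb{R}$, which implies that the adjustment factor in equation (16.11), call it $\theta(x\beta/\sigma) = \{1 - \lambda(x\beta/\sigma)[x\beta/\sigma + \lambda(x\beta/\sigma)]\}$, is strictly between zero and one. Therefore, the sign of $\beta_j$ is the same as the sign of the partial effect of $x_j$.

Other functional forms are easily handled. Suppose that $x_1 = \log(z_1)$ (and that this is the only place $z_1$ appears in $x$). Then

$$\frac{\partial E(y \mid x, y > 0)}{\partial z_1} = (\beta_1/z_1)\,\theta(x\beta/\sigma)$$

where $\beta_1$ now denotes the coefficient on $\log(z_1)$. Or, suppose that $x_1 = z_1$ and $x_2 = z_1^2$. Then

$$\frac{\partial E(y \mid x, y > 0)}{\partial z_1} = (\beta_1 + 2\beta_2 z_1)\,\theta(x\beta/\sigma)$$

where $\beta_1$ is the coefficient on $z_1$ and $\beta_2$ is the coefficient on $z_1^2$. Interaction terms are handled similarly. Generally, we compute the partial effect of $x\beta$ with respect to the variable of interest and multiply this by the factor $\theta(x\beta/\sigma)$.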
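The claim that the adjustment factor lies strictly between zero and one, and that it is the slope of the conditional mean with respect to the index, can be checked numerically. The sketch below uses only the Python standard library; the grid of index values and the function names are illustrative assumptions.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def inverse_mills(c):
    """lambda(c) = phi(c) / Phi(c)."""
    return STD.pdf(c) / STD.cdf(c)


def adjustment_factor(c):
    """theta(c) = 1 - lambda(c) * [c + lambda(c)], the term multiplying beta_j."""
    lam = inverse_mills(c)
    return 1.0 - lam * (c + lam)


def scaled_cond_mean(c):
    """E(y | x, y > 0) / sigma written as a function of c = x*beta/sigma."""
    return c + inverse_mills(c)


def central_difference(f, c, h=1e-5):
    """Two-sided numerical derivative of f at c."""
    return (f(c + h) - f(c - h)) / (2.0 * h)


grid = (-3.0, -1.0, 0.0, 1.0, 3.0)  # arbitrary index values for illustration
checks = []
for c in grid:
    theta = adjustment_factor(c)
    slope = central_difference(scaled_cond_mean, c)
    checks.append((theta, slope))
    print(f"c = {c:+.1f}  theta(c) = {theta:.4f}  numerical slope = {slope:.4f}")
```

The factor is small when the index is very negative (most of the population is at the corner) and approaches one as the index grows, so the partial effect on E(y | x, y > 0) approaches the coefficient itself.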

All of the usual economic quantities such as elasticities can be computed. The elasticity of $y$ with respect to $x_1$, conditional on $y > 0$, is

$$\frac{\partial E(y \mid x, y > 0)}{\partial x_1} \cdot \frac{x_1}{E(y \mid x, y > 0)}$$

and equations (16.11) and (16.10) can be used to find the elasticity when $x_1$ appears in levels form. If $z_1$ appears in logarithmic form, the elasticity is obtained simply as $\partial \log E(y \mid x, y > 0)/\partial \log(z_1)$.

If $x_1$ is a binary variable, the effect of interest is obtained as the difference between $E(y \mid x, y > 0)$ with $x_1 = 1$ and $x_1 = 0$. Other discrete variables (such as number of children) can be handled similarly.

We can also compute $E(y \mid x)$ from equation (16.8):

$$E(y \mid x) = P(y > 0 \mid x) \cdot E(y \mid x, y > 0) = \Phi(x\beta/\sigma)[x\beta + \sigma\lambda(x\beta/\sigma)] = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma) \tag{16.14}$$
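The three Tobit quantities discussed so far, P(y > 0 | x), E(y | x, y > 0), and E(y | x), are simple functions of the index xβ and σ. The following sketch codes them with the standard library and checks the decomposition in (16.8) and the lower bound in (16.7). The function names and the numerical values of xβ and σ are assumptions made for illustration.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def prob_positive(xb, sigma):
    """P(y > 0 | x) = Phi(x*beta / sigma)."""
    return STD.cdf(xb / sigma)


def cond_mean_positive(xb, sigma):
    """E(y | x, y > 0) = x*beta + sigma * lambda(x*beta / sigma)."""
    c = xb / sigma
    return xb + sigma * STD.pdf(c) / STD.cdf(c)


def uncond_mean(xb, sigma):
    """E(y | x) = Phi(c) * x*beta + sigma * phi(c), with c = x*beta / sigma."""
    c = xb / sigma
    return STD.cdf(c) * xb + sigma * STD.pdf(c)


sigma = 1.5                              # hypothetical error standard deviation
index_values = (-2.0, -0.5, 0.0, 1.0, 3.0)  # hypothetical values of x*beta

table = []
for xb in index_values:
    p = prob_positive(xb, sigma)
    m_pos = cond_mean_positive(xb, sigma)
    m_all = uncond_mean(xb, sigma)
    table.append((xb, p, m_pos, m_all))
    print(f"xb = {xb:+.1f}  P(y>0) = {p:.3f}  E(y|y>0) = {m_pos:.3f}  E(y) = {m_all:.3f}")
```

Even when xβ is negative, both expectations stay positive, and E(y | x) never falls below max(0, xβ).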

We can find the partial derivatives of $E(y \mid x)$ with respect to continuous $x_j$ using the chain rule. In examples where $y$ is some quantity chosen by individuals (labor supply, charitable contributions, life insurance), this derivative accounts for the fact that some people who start at $y = 0$ may switch to $y > 0$ when $x_j$ changes. Formally,

$$\frac{\partial E(y \mid x)}{\partial x_j} = \frac{\partial P(y > 0 \mid x)}{\partial x_j} \cdot E(y \mid x, y > 0) + P(y > 0 \mid x) \cdot \frac{\partial E(y \mid x, y > 0)}{\partial x_j} \tag{16.15}$$

Because $P(y > 0 \mid x) = \Phi(x\beta/\sigma)$, $\partial P(y > 0 \mid x)/\partial x_j = (\beta_j/\sigma)\phi(x\beta/\sigma)$. If we plug this along with equation (16.11) into equation (16.15), we get a remarkable simplification:

$$\frac{\partial E(y \mid x)}{\partial x_j} = \Phi(x\beta/\sigma)\beta_j \tag{16.16}$$

The estimated scale factor for a given $x$ is $\Phi(x\hat\beta/\hat\sigma)$. This scale factor has a very interesting interpretation: $\Phi(x\hat\beta/\hat\sigma) = \hat P(y > 0 \mid x)$; that is, $\Phi(x\hat\beta/\hat\sigma)$ is the estimated probability of observing a positive response given $x$. If $\Phi(x\hat\beta/\hat\sigma)$ is close to one, then it is unlikely we observe $y_i = 0$ when $x_i = x$, and the adjustment factor becomes unimportant. In practice, a single adjustment factor is obtained as $\Phi(\bar x\hat\beta/\hat\sigma)$, where $\bar x$ denotes the vector of mean values. If the estimated probability of a positive response is close to one at the sample means of the covariates, the adjustment factor can be ignored. In most interesting Tobit applications, $\Phi(\bar x\hat\beta/\hat\sigma)$ is notably less than unity.

For discrete variables or for large changes in continuous variables, we can compute the difference in $E(y \mid x)$ at different values of $x$. [Incidentally, equations (16.11) and (16.16) show that $\sigma$ is not a "nuisance parameter," as it is sometimes called in Tobit applications: $\sigma$ plays a crucial role in estimating the partial effects of interest in corner solution applications.]
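The simplification that the partial effect on E(y | x) equals the probability of a positive response times the coefficient can be verified by comparing it with a finite-difference derivative of the Tobit mean function. In the sketch below, the coefficient vector, σ, and the covariate means are hypothetical numbers, and the function names are illustrative.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def index(x, beta):
    """Linear index x*beta for a row vector x and coefficient list beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))


def tobit_mean(x, beta, sigma):
    """E(y | x) = Phi(c) * x*beta + sigma * phi(c), with c = x*beta / sigma."""
    xb = index(x, beta)
    c = xb / sigma
    return STD.cdf(c) * xb + sigma * STD.pdf(c)


def scale_factor(x, beta, sigma):
    """Phi(x*beta / sigma), the estimated probability of a positive response."""
    return STD.cdf(index(x, beta) / sigma)


def partial_effect(x, beta, sigma, j):
    """Partial effect of continuous x_j on E(y | x): scale factor times beta_j."""
    return scale_factor(x, beta, sigma) * beta[j]


# Hypothetical estimates and covariate means; the first element of x is the constant.
beta_hat = [0.5, 1.2, -0.8]
sigma_hat = 2.0
x_bar = [1.0, 0.3, 1.1]

j, h = 1, 1e-6
x_up = x_bar.copy()
x_dn = x_bar.copy()
x_up[j] += h
x_dn[j] -= h
numeric = (tobit_mean(x_up, beta_hat, sigma_hat) - tobit_mean(x_dn, beta_hat, sigma_hat)) / (2 * h)
analytic = partial_effect(x_bar, beta_hat, sigma_hat, j)

print(f"scale factor at the means: {scale_factor(x_bar, beta_hat, sigma_hat):.4f}")
print(f"analytic partial effect:   {analytic:.6f}")
print(f"numerical derivative:      {numeric:.6f}")
```

Evaluating the scale factor at the covariate means, as here, gives the single adjustment factor described above; evaluating it at other values of x shows how the effect varies across the population.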

Equations (16.9), (16.11), and (16.14) show that, for continuous variables $x_j$ and $x_h$, the relative partial effects on $P(y > 0 \mid x)$, $E(y \mid x, y > 0)$, and $E(y \mid x)$ are all equal to $\beta_j/\beta_h$ (assuming that $\beta_h \ne 0$). This fact can be a limitation of the Tobit model, something we take up further in Section 16.7.

By taking the log of equation (16.8) and differentiating, we see that the elasticity (or semielasticity) of $E(y \mid x)$ with respect to any $x_j$ is simply the sum of the elasticities (or semielasticities) of $\Phi(x\beta/\sigma)$ and $E(y \mid x, y > 0)$, each with respect to $x_j$.

only observations with uncensored durations. It would be convenient if OLS using only the uncensored observations were consistent for $\beta$, but such is not the case.

From equation (16.14) it is also pretty clear that regressing $y_i$ on $x_i$ using all of the data will not consistently estimate $\beta$: $E(y \mid x)$ is nonlinear in $x$, $\beta$, and $\sigma$, so it would be a fluke if a linear regression consistently estimated $\beta$.

There are some interesting theoretical results about how the slope coefficients in $\beta$ can be estimated up to scale using one of the two OLS regressions that we have discussed. Therefore, each OLS coefficient is inconsistent by the same multiplicative factor. This fact allows us, both in data-censoring applications and corner solution applications, to estimate the relative effects of any two explanatory variables. The assumptions made to derive such results are very restrictive, and they generally rule out discrete and other discontinuous regressors. [Multivariate normality of $(x, y^*)$ is sufficient.] The arguments, which rely on linear projections, are elegant (see, for example, Chung and Goldberger (1984)), but such results have questionable practical value.

The previous discussion does not mean a linear regression of $y_i$ on $x_i$ is uninformative. Remember that, whether or not the Tobit model holds, we can always write the linear projection of $y$ on $x$ as $L(y \mid x) = x\gamma$ for $\gamma = [E(x'x)]^{-1}E(x'y)$, under the mild restriction that all second moments are finite. It is possible that $\gamma_j$ approximates the effect of $x_j$ on $E(y \mid x)$ when $x$ is near its population mean. Similarly, a linear regression of $y_i$ on $x_i$, using only observations with $y_i > 0$, might approximate the partial effects on $E(y \mid x, y > 0)$ near the mean values of the $x_j$. Such issues have not been fully explored in corner solution applications of the Tobit model.

16.4 Estimation and Inference with Censored Tobit

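Let $\{(x_i, y_i): i = 1, 2, \ldots, N\}$ be a random sample following the censored Tobit model. To use maximum likelihood, we need to derive the density of $y_i$ given $x_i$. We have already shown that $f(0 \mid x_i) = P(y_i = 0 \mid x_i) = 1 - \Phi(x_i\beta/\sigma)$. Further, for $y > 0$, $P(y_i \le y \mid x_i) = P(y_i^* \le y \mid x_i)$, which implies that

$$f(y \mid x_i) = f^*(y \mid x_i), \qquad \text{all } y > 0$$

where $f^*(\cdot \mid x_i)$ denotes the density of $y_i^*$ given $x_i$. (We use $y$ as the dummy argument in the density.) By assumption, $y_i^* \mid x_i \sim \text{Normal}(x_i\beta, \sigma^2)$, so

$$f^*(y \mid x_i) = \frac{1}{\sigma}\phi[(y - x_i\beta)/\sigma], \qquad -\infty < y < \infty$$

(As in recent chapters, we will use $\beta$ and $\sigma^2$ to denote the true values as well as dummy arguments in the log-likelihood function and its derivatives.) We can write the density for $y_i$ given $x_i$ compactly using the indicator function $1[\cdot]$ as

$$f(y \mid x_i) = \{1 - \Phi(x_i\beta/\sigma)\}^{1[y = 0]}\{(1/\sigma)\phi[(y - x_i\beta)/\sigma]\}^{1[y > 0]} \tag{16.19}$$

where the density is zero for $y < 0$. Let $\theta \equiv (\beta', \sigma^2)'$ denote the $(K + 1) \times 1$ vector of parameters. The conditional log likelihood is

$$\ell_i(\theta) = 1[y_i = 0]\log[1 - \Phi(x_i\beta/\sigma)] + 1[y_i > 0]\{\log\phi[(y_i - x_i\beta)/\sigma] - \log(\sigma^2)/2\} \tag{16.20}$$

Apart from a constant that does not affect the maximization, equation (16.20) can be written as

$$1[y_i = 0]\log[1 - \Phi(x_i\beta/\sigma)] - 1[y_i > 0]\{(y_i - x_i\beta)^2/(2\sigma^2) + \log(\sigma^2)/2\}$$

$$a_i = -\sigma^{-2}\{x_i\gamma\phi_i - [\phi_i^2/(1 - \Phi_i)] - \Phi_i\}$$

$$b_i = \sigma^{-3}\{(x_i\gamma)^2\phi_i + \phi_i - [(x_i\gamma)\phi_i^2/(1 - \Phi_i)]\}/2$$

$$c_i = -\sigma^{-4}\{(x_i\gamma)^3\phi_i + (x_i\gamma)\phi_i - [(x_i\gamma)\phi_i^2/(1 - \Phi_i)] - 2\Phi_i\}/4$$

$\gamma = \beta/\sigma$, and $\phi_i$ and $\Phi_i$ are evaluated at $x_i\gamma$. This matrix is used in equation (13.32) to obtain the estimate of $\text{Avar}(\hat\theta)$. See Amemiya (1973) for details.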
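The Tobit log likelihood is straightforward to code directly from its two pieces: the probability of a zero and the normal density for a positive outcome. The following sketch, using only the standard library, evaluates the log likelihood on simulated data and confirms that it is higher at the data-generating parameters than at two perturbed values. The one-regressor design, the parameter values, and the function names are assumptions for illustration; a real application would pass this objective to a numerical optimizer.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def tobit_loglik_i(y, xb, sigma):
    """Log-likelihood contribution for one observation, given the index x_i*beta.

    Uses 1 - Phi(c) = Phi(-c) for the probability of a zero outcome.
    """
    if y == 0.0:
        return math.log(STD.cdf(-xb / sigma))
    return math.log(STD.pdf((y - xb) / sigma)) - 0.5 * math.log(sigma ** 2)


def tobit_loglik(data, beta, sigma):
    """Sum of contributions for a model with a constant and one regressor."""
    return sum(tobit_loglik_i(y, beta[0] + beta[1] * x, sigma) for x, y in data)


def simulate(n, beta, sigma, seed=0):
    """Draw (x, y) with y = max(0, beta0 + beta1*x + u), u ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y_star = beta[0] + beta[1] * x + rng.gauss(0.0, sigma)
        data.append((x, max(0.0, y_star)))
    return data


true_beta, true_sigma = (0.5, 1.0), 1.0  # hypothetical data-generating values
data = simulate(2_000, true_beta, true_sigma)

ll_true = tobit_loglik(data, true_beta, true_sigma)
ll_shifted = tobit_loglik(data, (1.5, 1.0), true_sigma)  # wrong intercept
ll_wide = tobit_loglik(data, true_beta, 2.0)             # wrong sigma

print(f"log likelihood at true values:     {ll_true:.1f}")
print(f"log likelihood, shifted intercept: {ll_shifted:.1f}")
print(f"log likelihood, doubled sigma:     {ll_wide:.1f}")
```

The comparison is only a sanity check on the objective function, not an estimator; it does show that σ enters the likelihood in both pieces, which is why it is estimated jointly with β.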

Testing is easily carried out in a standard MLE framework. Single exclusion restrictions are tested using asymptotic $t$ statistics once $\hat\beta_j$ and its asymptotic standard error have been obtained. Multiple exclusion restrictions are easily tested using the LR statistic, and some econometrics packages routinely compute the Wald statistic. If the unrestricted model has so many variables that computation becomes an issue, the LM statistic is an attractive alternative.

The Wald statistic is the easiest to compute for testing nonlinear restrictions on $\beta$, just as in binary response analysis, because the unrestricted model is just standard Tobit.

16.5 Reporting the Results

For data censoring applications, the quantities of interest are the $\hat\beta_j$ and their standard errors. (We might use these to compute elasticities, and so on.) We interpret the estimated model as if there were no data-censoring problem, because the population model is a linear conditional mean. The value of the log-likelihood function should be reported for any estimated model because of its role in obtaining likelihood ratio statistics. We can test for omitted variables, including nonlinear functions of already included variables, using either $t$ tests or LR tests. All of these rely on the homoskedastic normal assumption in the underlying population.

For corner solution applications, the same statistics can be reported, and, in addition, we should report estimated partial effects on $E(y \mid x, y > 0)$ and $E(y \mid x)$. The formulas for these are given in Section 16.2, where $\beta$ and $\sigma$ are replaced with their MLEs. Because these estimates depend on $x$, we must decide at what values of $x$ to report the partial effects or elasticities. As with probit, the average values of $x$ can be used, or, if some elements of $x$ are qualitative variables, we can assign them values of particular interest. For the important elements of $x$, the partial effects or elasticities can be estimated at a range of values, holding the other elements fixed. For example, if $x_1$ is price, then we can compute equation (16.11) or (16.16), or the corresponding elasticities, for low, medium, and high prices, while keeping all other elements fixed. If $x_1$ is a dummy variable, then we can obtain the difference in estimates with $x_1 = 1$ and $x_1 = 0$, holding all other elements of $x$ fixed. Standard errors of these estimates can be obtained by the delta method, although the calculations can be tedious.

Example 16.3 (Annual Hours Equation for Married Women): We use the Mroz (1987) data (MROZ.RAW) to estimate a reduced form annual hours equation for married women. The equation is a reduced form because we do not include hourly wage offer as an explanatory variable. The hourly wage offer is unlikely to be exogenous, and, just as importantly, we cannot observe it when $hours = 0$. We will show how to deal with both these issues in Chapter 17. For now, the explanatory variables are the same ones appearing in the labor force participation probit in Example 15.2.

Of the 753 women in the sample, 428 worked for a wage outside the home during the year; 325 of the women worked zero hours. For the women who worked positive hours, the range is fairly broad, ranging from 12 to 4,950. Thus, annual hours worked is a reasonable candidate for a Tobit model. We also estimate a linear model (using all 753 observations) by OLS. The results are in Table 16.1.

Table 16.1
OLS and Tobit Estimation of Annual Hours Worked
Dependent Variable: hours

Independent Variable    Linear (OLS) standard error    Tobit (MLE) estimate (standard error)
nwifeinc                (2.54)                         -8.81 (4.46)
educ                    (12.95)                        80.65 (21.58)
exper                   (9.96)                         131.56 (17.28)
exper^2                 (.325)                         -1.86 (0.54)
age                     (4.36)                         -54.41 (7.42)
kidslt6                 (58.85)                        -894.02 (111.88)
kidsge6                 (23.18)                        -16.22 (38.64)
constant                (270.78)                       965.31 (446.44)

Not surprisingly, the Tobit coefficient estimates are the same sign as the corresponding OLS estimates, and the statistical significance of the estimates is similar. (Possible exceptions are the coefficients on nwifeinc and kidsge6, but the $t$ statistics have similar magnitudes.) Second, though it is tempting to compare the magnitudes of the OLS estimates and the Tobit estimates, such comparisons are not very informative. We must not think that, because the Tobit coefficient on kidslt6 is roughly twice that of the OLS coefficient, the Tobit model somehow implies a much greater response of hours worked to young children.

We can multiply the Tobit estimates by the adjustment factors in equations (16.11) and (16.16), evaluated at the estimates and the mean values of the $x_j$ (but where we square the mean of exper rather than use the average of the $exper_i^2$), to obtain the partial effects on the conditional expectations. The factor in equation (16.11) is about .451. For example, conditional on hours being positive, a year of education (starting from the mean values of all variables) is estimated to increase expected hours by about $.451(80.65) \approx 36.4$ hours. Using the approximation for one more young child gives a fall in expected hours by about $(.451)(894.02) \approx 403.2$. Of course, this figure does not make sense for a woman working less than 403.2 hours. It would be better to estimate the expected values at two different values of kidslt6 and form the difference, rather than using the calculus approximation.

The factor in equation (16.16), again evaluated at the mean values of the $x_j$, is about .645. This result means that the estimated probability of a woman being in the workforce, at the mean values of the covariates, is about .645. Therefore, the magnitudes of the effects of each $x_j$ on expected hours (that is, when we account for people who initially do not work, as well as those who are initially working) are larger than when we condition on $hours > 0$. We can multiply the Tobit coefficients, at least those on roughly continuous explanatory variables, by .645 to make them roughly comparable to the OLS estimates in the first column. In most cases the estimated Tobit effect at the mean values is significantly above the corresponding OLS estimate. For example, the Tobit effect of one more year of education is about $.645(80.65) \approx 52.02$, which is well above the OLS estimate of 28.76.
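The scaled effects quoted in this example are simple products of a Tobit coefficient and an adjustment factor, and they can be reproduced directly. The sketch below uses only the numbers quoted in the text (the factors .451 and .645, the Tobit coefficients 80.65 and 894.02 in magnitude, and the OLS estimate 28.76); the variable names are illustrative.

```python
# Adjustment factors evaluated at the mean values of the covariates (from the text).
factor_cond = 0.451    # multiplies beta_j for the effect on E(hours | x, hours > 0)
factor_uncond = 0.645  # multiplies beta_j for the effect on E(hours | x)

# Tobit estimates quoted in the text.
tobit_educ = 80.65
tobit_kidslt6_magnitude = 894.02  # the text describes the effect as a fall in hours
ols_educ = 28.76

educ_effect_cond = factor_cond * tobit_educ
kidslt6_fall_cond = factor_cond * tobit_kidslt6_magnitude
educ_effect_uncond = factor_uncond * tobit_educ

print(f"educ, conditional on hours > 0:     {educ_effect_cond:.1f} hours")
print(f"kidslt6, conditional on hours > 0: -{kidslt6_fall_cond:.1f} hours")
print(f"educ, unconditional:                {educ_effect_uncond:.2f} hours")
print(f"ratio of Tobit effect to OLS, educ: {educ_effect_uncond / ols_educ:.2f}")
```

The last line makes the comparison with OLS explicit: the scaled Tobit effect of education at the means is a bit less than twice the OLS coefficient, far smaller than the raw ratio of the two coefficients would suggest.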

We have reported an R-squared for both the linear regression model and the Tobit model. The R-squared for OLS is the usual one. For Tobit, the R-squared is the square of the correlation coefficient between $y_i$ and $\hat y_i$, where $\hat y_i = \Phi(x_i\hat\beta/\hat\sigma)x_i\hat\beta + \hat\sigma\phi(x_i\hat\beta/\hat\sigma)$ is the estimate of $E(y \mid x = x_i)$. This statistic is motivated by the fact that the usual R-squared for OLS is equal to the squared correlation between the $y_i$ and the OLS fitted values.

Based on the R-squared measures, the Tobit conditional mean function fits the hours data somewhat better, although the difference is not overwhelming. However, we should remember that the Tobit estimates are not chosen to maximize an R-squared (they maximize the log-likelihood function), whereas the OLS estimates produce the highest R-squared given the linear functional form for the conditional mean.

When two additional variables, the local unemployment rate and a binary city indicator, are included, the log likelihood becomes about $-3{,}817.89$. The likelihood ratio statistic is about $2(3{,}819.09 - 3{,}817.89) = 2.40$. This is the outcome of a $\chi^2_2$ variate under $H_0$, and so the $p$-value is about .30. Therefore, these two variables are jointly insignificant.
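The likelihood ratio calculation for the two added variables can be reproduced in a few lines. The sketch below assumes the two log-likelihood values are negative, as they must be for these data, and uses the fact that a chi-square variate with two degrees of freedom has survival function exp(-x/2), so no statistics library is needed. The variable names are illustrative.

```python
import math

# Log-likelihood values from the text (minus signs assumed).
loglik_restricted = -3819.09    # model without the two additional variables
loglik_unrestricted = -3817.89  # model including them

# LR = 2 * (unrestricted log likelihood - restricted log likelihood).
lr_stat = 2.0 * (loglik_unrestricted - loglik_restricted)

# For a chi-square distribution with 2 degrees of freedom, P(X > x) = exp(-x / 2).
p_value = math.exp(-lr_stat / 2.0)

print(f"LR statistic: {lr_stat:.2f}")
print(f"p-value:      {p_value:.3f}")
```

The closed form for the survival function holds only for two degrees of freedom; with a different number of restrictions a chi-square routine from a statistics package would be needed.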

16.6 Specification Issues in Tobit Models

16.6.1 Neglected Heterogeneity

Suppose that we are initially interested in the model

$$y = \max(0, x\beta + \gamma q + u), \qquad u \mid x, q \sim \text{Normal}(0, \sigma^2) \tag{16.24}$$

where $q$ is an unobserved variable that is assumed to be independent of $x$ and has a $\text{Normal}(0, \tau^2)$ distribution. It follows immediately that

$$y = \max(0, x\beta + v), \qquad v \mid x \sim \text{Normal}(0, \sigma^2 + \gamma^2\tau^2) \tag{16.25}$$

Thus, $y$ conditional on $x$ follows a Tobit model, and Tobit of $y$ on $x$ consistently estimates $\beta$ and $\eta^2 \equiv \sigma^2 + \gamma^2\tau^2$. In data-censoring cases we are interested in $\beta$; $\gamma$ is of no use without observing $q$, and $\gamma$ cannot be estimated anyway. We have shown that heterogeneity independent of $x$ and normally distributed has no important consequences in data-censoring examples.
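A small simulation illustrates why normally distributed heterogeneity that is independent of x simply inflates the Tobit error variance. The sketch below draws q and u, forms the composite error v = γq + u, and compares its sample variance with σ² + γ²τ², and the share of zeros with the Tobit probability 1 - Φ(xβ/η). All parameter values, the fixed index xβ, and the function names are assumptions for illustration.

```python
import random
from statistics import NormalDist


def simulate_composite(n, xb, gamma, tau, sigma, seed=0):
    """Draw v = gamma*q + u and y = max(0, xb + v) at a fixed index value xb."""
    rng = random.Random(seed)
    v_draws, y_draws = [], []
    for _ in range(n):
        q = rng.gauss(0.0, tau)    # neglected heterogeneity, independent of x
        u = rng.gauss(0.0, sigma)  # structural error
        v = gamma * q + u
        v_draws.append(v)
        y_draws.append(max(0.0, xb + v))
    return v_draws, y_draws


# Hypothetical parameter values.
xb, gamma, tau, sigma = 0.5, 0.5, 2.0, 1.0
eta_sq = sigma ** 2 + gamma ** 2 * tau ** 2  # variance of the composite error

v_draws, y_draws = simulate_composite(50_000, xb, gamma, tau, sigma)

mean_v = sum(v_draws) / len(v_draws)
var_v = sum((v - mean_v) ** 2 for v in v_draws) / len(v_draws)
share_zero = sum(y == 0.0 for y in y_draws) / len(y_draws)
tobit_share_zero = 1.0 - NormalDist().cdf(xb / eta_sq ** 0.5)

print(f"eta^2 = {eta_sq:.3f}, sample variance of v = {var_v:.3f}")
print(f"share of zeros: simulated {share_zero:.4f}, Tobit formula {tobit_share_zero:.4f}")
```

The agreement between the simulated and implied share of zeros reflects the point in the text: y given x follows a Tobit model with variance η², so ignoring q changes the variance parameter but not the form of the model.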

Things are more complicated in corner solution examples because, at least initially, we are interested in $E(y \mid x, q)$ or $E(y \mid x, q, y > 0)$. As we discussed in Sections 2.2.5 and 15.7.1, we are often interested in the average partial effects (APEs), where, say, $E(y \mid x, q)$ is averaged over the population distribution of $q$, and then derivatives or differences with respect to elements of $x$ are obtained. From Section 2.2.5 we know that when the heterogeneity is independent of $x$, the APEs are obtained by finding $E(y \mid x)$ [or $E(y \mid x, y > 0)$]. Naturally, these conditional means come from the distribution of $y$ given $x$. Under the preceding assumptions, it is exactly this distribution that Tobit of $y$ on $x$ estimates. In other words, we estimate the desired quantities, the APEs, by simply ignoring the heterogeneity. This is the same conclusion we reached for the probit model in Section 15.7.1.

If $q$ is not normal, then these arguments do not carry over because $y$ given $x$ does not follow a Tobit model. But the flavor of the argument does. A more difficult issue arises when $q$ and $x$ are correlated, and we address this in the next subsection.

16.6.2 Endogenous Explanatory Variables

Suppose we now allow one of the variables in the Tobit model to be endogenous. The model is

$$y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + u_1) \tag{16.26}$$

$$y_2 = z\delta_2 + v_2 = z_1\delta_{21} + z_2\delta_{22} + v_2 \tag{16.27}$$

where $(u_1, v_2)$ are zero-mean normally distributed, independent of $z$. If $u_1$ and $v_2$ are correlated, then $y_2$ is endogenous. For identification we need the usual rank condition $\delta_{22} \ne 0$; $E(z'z)$ is assumed to have full rank, as always.

If equation (16.26) represents a data-censoring problem, we are interested, as always, in the parameters, $\delta_1$ and $\alpha_1$, as these are the parameters of interest in the uncensored population model. For corner solution outcomes, the quantities of interest are more subtle. However, when the endogeneity of $y_2$ is due to omitted variables or simultaneity, the parameters we need to estimate to obtain average partial effects are $\delta_1$, $\alpha_1$, and $\sigma_1^2 = \text{Var}(u_1)$. The reasoning is just as for the probit model in Section 15.7.2. Holding other factors fixed, the difference in $y_1$ when $y_2$ changes from $y_2$ to $y_2 + 1$ is

$$\max[0, z_1\delta_1 + \alpha_1(y_2 + 1) + u_1] - \max[0, z_1\delta_1 + \alpha_1 y_2 + u_1]$$

Averaging this expression across the distribution of $u_1$ gives differences in expectations that have the form (16.14), with $x = [z_1, (y_2 + 1)]$ in the first case, $x = (z_1, y_2)$ in the second, and $\sigma = \sigma_1$. Importantly, unlike in the data censoring case, we need to estimate $\sigma_1^2$ in order to estimate the partial effects of interest (the APEs).

Before estimating this model by maximum likelihood, a procedure that requires obtaining the distribution of $(y_1, y_2)$ given $z$, it is convenient to have a two-step procedure that also delivers a simple test for the endogeneity of $y_2$. Smith and Blundell (1986) propose a two-step procedure that is analogous to the Rivers-Vuong method (see Section 15.7.2) for binary response models. Under bivariate normality of $(u_1, v_2)$, we can write

$$u_1 = \theta_1 v_2 + e_1 \tag{16.28}$$

where $\theta_1 = \eta_1/\tau_2^2$, $\eta_1 = \text{Cov}(u_1, v_2)$, $\tau_2^2 = \text{Var}(v_2)$, and $e_1$ is independent of $v_2$ with a zero-mean normal distribution and variance, say, $\tau_1^2$. Further, because $(u_1, v_2)$ is independent of $z$, $e_1$ is independent of $(z, v_2)$. Now, plugging equation (16.28) into equation (16.26) gives

$$y_1 = \max(0, z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2 + e_1) \tag{16.29}$$

where $e_1 \mid z, v_2 \sim \text{Normal}(0, \tau_1^2)$. It follows that, if we knew $v_2$, we would just estimate $\delta_1$, $\alpha_1$, $\theta_1$, and $\tau_1^2$ by standard censored Tobit. We do not observe $v_2$ because it depends on the unknown vector $\delta_2$. However, we can easily estimate $\delta_2$ by OLS in a first stage. The Smith-Blundell procedure is as follows:

Procedure 16.1:
(a) Estimate the reduced form of $y_2$ by OLS; this step gives $\hat\delta_2$. Define the reduced-form OLS residuals as $\hat v_2 = y_2 - z\hat\delta_2$.
(b) Estimate a standard Tobit of $y_1$ on $z_1$, $y_2$, and $\hat v_2$. This step gives consistent estimators of $\delta_1$, $\alpha_1$, $\theta_1$, and $\tau_1^2$.

The usual $t$ statistic on $\hat v_2$ reported by Tobit provides a simple test of the null $H_0: \theta_1 = 0$, which says that $y_2$ is exogenous. Further, under $\theta_1 = 0$, $e_1 = u_1$, and so normality of $v_2$ plays no role: as a test for endogeneity of $y_2$, the Smith-Blundell approach is valid without any distributional assumptions on the reduced form of $y_2$.

Example 16.4 (Testing Exogeneity of Education in the Hours Equation): As an illustration, we test for endogeneity of educ in the reduced-form hours equation in Example 16.3. We assume that motheduc, fatheduc, and huseduc are exogenous in the hours equation, and so these are valid instruments for educ. We first obtain $\hat v_2$ as the OLS residuals from estimating the reduced form for educ. When $\hat v_2$ is added to the Tobit model in Example 16.3 (without unem and city), its coefficient is 39.88 with $t$ statistic $= .91$. Thus, there is little evidence that educ is endogenous in the equation. The test is valid under the null hypothesis that educ is exogenous even if educ does not have a conditional normal distribution.

When $\theta_1 \ne 0$, the second-stage Tobit standard errors and test statistics are not asymptotically valid because $\hat\delta_2$ has been used in place of $\delta_2$. Smith and Blundell (1986) contain formulas for correcting the asymptotic variances; these can be derived using the formulas for two-step M-estimators in Chapter 12. It is easily seen that joint normality of $(u_1, v_2)$ is not absolutely needed for the procedure to work. It suffices that $u_1$ conditional on $z$ and $v_2$ is distributed as $\text{Normal}(\theta_1 v_2, \tau_1^2)$. Still, this is a fairly restrictive assumption.
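The decomposition of the structural error into a part proportional to the reduced-form error and an independent remainder is what makes the two-step approach work, and it can be illustrated by simulation. The sketch below draws a bivariate normal pair (u1, v2), forms e1 = u1 - θ1·v2 with θ1 = Cov(u1, v2)/Var(v2), and checks that e1 is essentially uncorrelated with v2 and has variance σ1² - η1²/τ2². The variances, the correlation, and the function names are hypothetical choices; this is not an implementation of the full two-step estimator.

```python
import math
import random


def draw_errors(n, sigma1, tau2, rho, seed=0):
    """Draw (u1, v2) bivariate normal with sd sigma1 and tau2 and correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        n1 = rng.gauss(0.0, 1.0)
        n2 = rng.gauss(0.0, 1.0)
        v2 = tau2 * n1
        u1 = sigma1 * (rho * n1 + math.sqrt(1.0 - rho ** 2) * n2)
        pairs.append((u1, v2))
    return pairs


def sample_cov(a, b):
    """Sample covariance (dividing by n) of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


# Hypothetical population values.
sigma1, tau2, rho = 1.0, 1.5, 0.6
eta1 = rho * sigma1 * tau2                  # Cov(u1, v2)
theta1 = eta1 / tau2 ** 2                   # coefficient on v2
tau1_sq = sigma1 ** 2 - eta1 ** 2 / tau2 ** 2  # Var(e1)

pairs = draw_errors(20_000, sigma1, tau2, rho)
u1 = [p[0] for p in pairs]
v2 = [p[1] for p in pairs]
e1 = [u - theta1 * v for u, v in pairs]

cov_e1_v2 = sample_cov(e1, v2)
var_e1 = sample_cov(e1, e1)

print(f"theta1 = {theta1:.3f}, tau1^2 = {tau1_sq:.3f}")
print(f"sample Cov(e1, v2) = {cov_e1_v2:.4f}, sample Var(e1) = {var_e1:.4f}")
```

Under joint normality, zero correlation between e1 and v2 is equivalent to independence, which is why adding the first-stage residual as a regressor in the second-stage Tobit restores a standard Tobit structure.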

When $\theta_1 \ne 0$, the Smith-Blundell procedure does not allow us to estimate $\sigma_1^2$, which is needed to estimate average partial effects in corner solution outcomes. Nevertheless, we can obtain consistent estimates of the average partial effects by using methods similar to those in the probit case. Using the same reasoning in Section 15.7.2, the APEs are obtained by computing derivatives or differences of

$$E_{v_2}[m(z_1\delta_1 + \alpha_1 y_2 + \theta_1 v_2, \tau_1^2)] \tag{16.30}$$

where $m(z, \sigma^2) \equiv \Phi(z/\sigma)z + \sigma\phi(z/\sigma)$ and $E_{v_2}[\cdot]$ denotes expectation with respect to the distribution of $v_2$. Using the same argument as in Section 16.6.1, expression (16.30) can be written as $m(z_1\delta_1 + \alpha_1 y_2, \theta_1^2\tau_2^2 + \tau_1^2)$. Therefore, consistent estimators of the APEs are obtained by taking, with respect to elements of $(z_1, y_2)$, derivatives or differences of

$$m(z_1\hat\delta_1 + \hat\alpha_1 y_2, \hat\theta_1^2\hat\tau_2^2 + \hat\tau_1^2) \tag{16.31}$$

where all estimates except $\hat\tau_2^2$ come from step b of the Smith-Blundell procedure; $\hat\tau_2^2$ is simply the usual estimate of the error variance from the first-stage OLS regression. As in the case of probit, obtaining standard errors for the APEs based on expression (16.31) and the delta method would be quite complicated. An alternative procedure, where $m(z_1\hat\delta_1 + \hat\alpha_1 y_2 + \hat\theta_1\hat v_{i2}, \hat\tau_1^2)$ is averaged across $i$, is also consistent, but it does not exploit the normality of $v_2$.
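The step from the expectation over v2 to the closed form with the combined variance θ1²τ2² + τ1² can be checked by numerical integration. The sketch below codes m(z, σ²) = Φ(z/σ)z + σφ(z/σ), integrates it against the normal density of v2 with a trapezoid rule, and compares the result with the closed form. The parameter values, grid settings, and function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def m(z, var):
    """m(z, sigma^2) = Phi(z/sigma) * z + sigma * phi(z/sigma)."""
    s = math.sqrt(var)
    return STD.cdf(z / s) * z + s * STD.pdf(z / s)


def expected_m_over_v2(index, theta1, tau2_sq, tau1_sq, grid=4001, width=8.0):
    """Trapezoid-rule approximation to E[m(index + theta1*v2, tau1^2)], v2 ~ Normal(0, tau2^2)."""
    tau2 = math.sqrt(tau2_sq)
    density = NormalDist(0.0, tau2)
    lo = -width * tau2
    step = 2.0 * width * tau2 / (grid - 1)
    total = 0.0
    for k in range(grid):
        v = lo + k * step
        weight = 0.5 if k in (0, grid - 1) else 1.0  # half weight at the endpoints
        total += weight * m(index + theta1 * v, tau1_sq) * density.pdf(v)
    return total * step


# Hypothetical values for the index z1*delta1 + alpha1*y2 and the variance terms.
index, alpha1 = 0.3, 1.0
theta1, tau2_sq, tau1_sq = 0.4, 2.25, 0.64
combined_var = theta1 ** 2 * tau2_sq + tau1_sq

integrated = expected_m_over_v2(index, theta1, tau2_sq, tau1_sq)
closed_form = m(index, combined_var)

# APE of y2: derivative of m(index, combined_var) with respect to y2.
ape_y2 = alpha1 * STD.cdf(index / math.sqrt(combined_var))

print(f"numerical integral: {integrated:.6f}")
print(f"closed form:        {closed_form:.6f}")
print(f"APE of y2:          {ape_y2:.4f}")
```

The agreement reflects the same logic as the neglected heterogeneity case: averaging over a normal v2 turns θ1·v2 + e1 into a single normal error with the combined variance, so the APE of y2 is α1 times a normal probability evaluated with that variance.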

A full maximum likelihood approach avoids the two-step estimation problem. The joint distribution of $(y_1, y_2)$ given $z$ is most easily found by using

$$f(y_1, y_2 \mid z) = f(y_1 \mid y_2, z)\,f(y_2 \mid z) \tag{16.32}$$
