16.1 Introduction and Motivation
In this chapter we cover a class of models traditionally called censored regression models. Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points. In order to apply these methods effectively, we must understand that the statistical model underlying censored regression analysis applies to problems that are conceptually very different.
For the most part, censored regression applications can be put into one of two categories. In the first case there is a variable with quantitative meaning, call it y*, and we are interested in the population regression E(y*|x). If y* and x were observed for everyone in the population, there would be nothing new: we could use standard regression methods (ordinary or nonlinear least squares). But a data problem arises because y* is censored above or below some value; that is, it is not observable for part of the population. An example is top coding in survey data. For example, assume that y* is family wealth, and, for a randomly drawn family, the actual value of wealth is recorded up to some threshold, say, $200,000, but above that level only the fact that wealth was more than $200,000 is recorded. Top coding is an example of data censoring, and is analogous to the data-coding problem we discussed in Section 15.10.2 in connection with interval regression.

Example 16.1 (Top Coding of Wealth): In the population of all families in the United States, let wealth* denote actual family wealth, measured in thousands of dollars. Suppose that wealth* follows the linear regression model E(wealth*|x) = xβ,
where x is a 1 × K vector of conditioning variables. However, we observe wealth* only when wealth* ≤ 200. When wealth* is greater than 200, we know that it is, but we do not know the actual value of wealth*. Define observed wealth as

wealth = min(wealth*, 200)    (16.1)

The definition wealth = 200 when wealth* > 200 is arbitrary, but it is useful for defining the statistical model that follows. To estimate β we might assume that wealth* given x has a homoskedastic normal distribution. In error form,

wealth* = xβ + u,  u | x ~ Normal(0, σ²)
This is a strong assumption about the conditional distribution of wealth*, something we could avoid entirely if wealth* were not censored above 200. Under these assumptions we can write recorded wealth as

wealth = min(200, xβ + u)

Data censoring also arises in the analysis of duration models, a topic we treat in Chapter 20.
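As an aside, the top-coding scheme of Example 16.1 is easy to simulate. The following Python sketch draws wealth* from a linear model and records wealth = min(wealth*, 200); the parameter values are illustrative assumptions, not estimates from the text.

```python
import random

random.seed(0)

# Illustrative (made-up) parameters for wealth* = b0 + b1*x + u,
# with wealth measured in thousands of dollars.
b0, b1, sigma = 100.0, 40.0, 60.0
n = 10_000

x = [random.gauss(0.0, 1.0) for _ in range(n)]
u = [random.gauss(0.0, sigma) for _ in range(n)]
wealth_star = [b0 + b1 * xi + ui for xi, ui in zip(x, u)]

# Recorded wealth: top coded at 200, as in equation (16.1)
wealth = [min(w, 200.0) for w in wealth_star]

frac_censored = sum(w == 200.0 for w in wealth) / n
```

Note that, for censored families, only the fact that wealth* exceeded 200 survives in the recorded data.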
A second kind of application of censored regression models appears more often in econometrics and, unfortunately, is where the label "censored regression" is least appropriate. To describe the situation, let y be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: y takes on the value zero with positive probability but is a continuous random variable over strictly positive values. There are many examples of variables that, at least approximately, have these features. Just a few examples include the amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development. In each of these examples we can imagine economic agents solving an optimization problem, and for some agents the optimal choice will be the corner solution, y = 0.

We will call this kind of response variable a corner solution outcome. For corner solution outcomes, it makes more sense to call the resulting model a corner solution model. Unfortunately, the name "censored regression model" appears to be firmly entrenched.

For corner solution applications, we must understand that the issue is not data observability: we are interested in features of the distribution of y given x, such as
E(y|x) and P(y = 0|x). If we are interested only in the effect of the x_j on the mean response, E(y|x), it is natural to ask, Why not just assume E(y|x) = xβ and apply OLS on a random sample? Theoretically, the problem is that, when y ≥ 0, E(y|x) cannot be linear in x unless the range of x is fairly limited. A related weakness is that the model implies constant partial effects. Further, for the sample at hand, predicted values for y can be negative for many combinations of x and β̂. These are very similar to the shortcomings of the linear probability model for binary responses.
We have already seen functional forms that ensure that E(y|x) is positive for all values of x and parameters, the leading case being the exponential function, E(y|x) = exp(xβ). [We cannot use log(y) as the dependent variable in a linear regression because log(0) is undefined.] We could then estimate β using nonlinear least squares (NLS), as in Chapter 12. Using an exponential conditional mean function is a reasonable strategy to follow, as it ensures that predicted values are positive and that the parameters are easy to interpret. However, it also has limitations. First, if y is a corner solution outcome, Var(y|x) is probably heteroskedastic, and so NLS could be inefficient. While we may be able to partly solve this problem using weighted NLS, any model for the conditional variance would be arbitrary. Probably a more important criticism is that we would not be able to measure the effect of each x_j on other features of the distribution of y given x. Two that are commonly of interest are P(y = 0|x) and E(y|x, y > 0). By definition, a model for E(y|x) does not allow us to estimate other features of the distribution. If we make a full distributional assumption for y given x, we can estimate any feature of the conditional distribution. In addition, we will obtain efficient estimates of quantities such as E(y|x). The following example shows how a simple economic model leads to an econometric model where y can be zero with positive probability and where the conditional expectation E(y|x) is not a linear function of parameters.
Example 16.2 (Charitable Contributions): Problem 15.1 shows how to derive a probit model from a utility maximization problem for charitable giving, using the utility function util_i(c, q) = c + a_i log(1 + q), where c is annual consumption, in dollars, and q is annual charitable giving. The variable a_i determines the marginal utility of giving for family i. Maximizing subject to the budget constraint c_i + p_i q_i = m_i (where m_i is family income and p_i is the price of a dollar of charitable contributions) and the inequality constraints c, q ≥ 0, the solution q_i is easily shown to be q_i = 0 if a_i/p_i ≤ 1 and q_i = a_i/p_i − 1 if a_i/p_i > 1. We can write this relation as 1 + q_i = max(1, a_i/p_i).

If a_i = exp(z_iγ + u_i), where u_i is an unobservable independent of (z_i, p_i, m_i) and normally distributed, then charitable contributions are determined by the equation

log(1 + q_i) = max[0, z_iγ − log(p_i) + u_i]    (16.2)

Comparing equations (16.2) and (16.1) shows that they have similar statistical structures. In equation (16.2) we are taking a maximum, and the lower threshold is zero, whereas in equation (16.1) we are taking a minimum with an upper threshold of 200. Each problem can be transformed into the same statistical model: for a randomly drawn observation i from the population,

y_i* = x_iβ + u_i,  u_i | x_i ~ Normal(0, σ²)    (16.3)

y_i = max(0, y_i*)    (16.4)
The charitable contributions example immediately fits into the standard censored Tobit framework by defining x_i = [z_i, log(p_i)] and y_i = log(1 + q_i). This particular transformation of q_i and the restriction that the coefficient on log(p_i) is −1 depend critically on the utility function used in the example. In practice, we would probably take y_i = q_i and allow all parameters to be unrestricted.

The wealth example can be cast as equations (16.3) and (16.4) after a simple transformation:

−(wealth_i − 200) = max(0, 200 − x_iβ − u_i)

and so the intercept changes, and all slope coefficients have the opposite sign from equation (16.1). For data-censoring problems, it is easier to study the censoring scheme directly, and many econometrics packages support various kinds of data censoring. Problem 16.3 asks you to consider general forms of data censoring, including the case when the censoring point can change with observation, in which case the model is often called the censored normal regression model. (This label properly emphasizes the data-censoring aspect.)
For the population, we write the standard censored Tobit model as

y* = xβ + u,  u | x ~ Normal(0, σ²)    (16.5)

y = max(0, y*)    (16.6)

where, except in rare cases, x contains unity. As we saw from the two previous examples, different features of this model are of interest depending on the type of application. In examples with true data censoring, such as Example 16.1, the vector β tells us everything we want to know because E(y*|x) = xβ is of interest. For corner solution outcomes, such as Example 16.2, β does not give the entire story. Usually, we are interested in E(y|x) or E(y|x, y > 0). These certainly depend on β, but in a nonlinear fashion.
For the statistical model (16.5) and (16.6) to make sense, the variable y* should have characteristics of a normal random variable. In data-censoring cases this requirement means that the variable of interest y* should have a homoskedastic normal distribution. In some cases the logarithmic transformation can be used to make this assumption more plausible. Example 16.1 might be one such case if wealth is positive for all families. See also Problems 16.1 and 16.2.

In corner solution examples, the variable y should be (roughly) continuous when y > 0. Thus the Tobit model is not appropriate for ordered responses, as in Section 15.10. Similarly, Tobit should not be applied to count variables, especially when the count variable takes on only a small number of values (such as number of patents awarded annually to a firm or the number of times someone is arrested during a year). Poisson regression models, a topic we cover in Chapter 19, are better suited for analyzing count data.
For corner solution outcomes, we must avoid placing too much emphasis on the latent variable y*. Most of the time y* is an artificial construct, and we are not interested in E(y*|x). In Example 16.2 we derived the model for charitable contributions using utility maximization, and a latent variable never appeared. Viewing y* as something like "desired charitable contributions" can only sow confusion: the variable of interest, y, is observed charitable contributions.
16.2 Derivations of Expected Values
In corner solution applications such as the charitable contributions example, interest centers on probabilities or expectations involving y. Most of the time we focus on the expected values E(y|x, y > 0) and E(y|x).
Before deriving these expectations for the Tobit model, it is interesting to derive an inequality that bounds E(y|x) from below. Since the function g(z) ≡ max(0, z) is convex, it follows from the conditional Jensen's inequality (see Appendix 2A) that E(y|x) ≥ max[0, E(y*|x)]. This condition holds when y* has any distribution and for any form of E(y*|x). If E(y*|x) = xβ, then

E(y|x) ≥ max(0, xβ)    (16.7)

which is always nonnegative. Equation (16.7) shows that E(y|x) is bounded from below by the larger of zero and xβ.
When u is independent of x and has a normal distribution, we can find an explicit expression for E(y|x). We first derive P(y > 0|x) and E(y|x, y > 0), which are of interest in their own right. Then we use the law of iterated expectations to obtain E(y|x):

E(y|x) = P(y = 0|x)·0 + P(y > 0|x)·E(y|x, y > 0) = P(y > 0|x)·E(y|x, y > 0)    (16.8)
Deriving P(y > 0|x) is easy. Define the binary variable w = 1 if y > 0, w = 0 if y = 0. Then w follows a probit model:

P(w = 1|x) = P(y* > 0|x) = P(u > −xβ|x) = 1 − Φ(−xβ/σ) = Φ(xβ/σ)    (16.9)

One implication of equation (16.9) is that γ ≡ β/σ, but not β and σ separately, can be consistently estimated from a probit of w on x.
To derive E(y|x, y > 0), we need the following fact about the normal distribution: if z ~ Normal(0, 1), then, for any constant c,

E(z | z > c) = φ(c)/[1 − Φ(c)]

where φ(·) is the standard normal density function. {This is easily shown by noting that the density of z given z > c is φ(z)/[1 − Φ(c)], z > c, and then integrating zφ(z) from c to ∞.} Therefore, if u ~ Normal(0, σ²), then E(u | u > c) = σφ(c/σ)/[1 − Φ(c/σ)]. Applying this result with c = −xβ gives

E(y | x, y > 0) = xβ + E(u | u > −xβ) = xβ + σφ(xβ/σ)/Φ(xβ/σ)    (16.10)

where we use 1 − Φ(−xβ/σ) = Φ(xβ/σ) and φ(−xβ/σ) = φ(xβ/σ). In particular, E(y | x, y > 0) > xβ, as must be true by equations (16.7) and (16.8).
For any c the quantity λ(c) ≡ φ(c)/Φ(c) is called the inverse Mills ratio. Thus, E(y|x, y > 0) is the sum of xβ and σ times the inverse Mills ratio evaluated at xβ/σ.
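For concreteness, the inverse Mills ratio and the conditional mean in equation (16.10) can be coded directly with the standard library; Φ is computed from the error function.

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def trunc_mean(c):
    # E(z | z > c) = phi(c) / [1 - Phi(c)] for z ~ Normal(0, 1)
    return phi(c) / (1.0 - Phi(c))

def inv_mills(c):
    # inverse Mills ratio: lambda(c) = phi(c) / Phi(c)
    return phi(c) / Phi(c)

def cond_mean_positive(xb, sigma):
    # E(y | x, y > 0) = xb + sigma * lambda(xb / sigma), equation (16.10)
    return xb + sigma * inv_mills(xb / sigma)
```

For example, trunc_mean(0.0) returns 2φ(0) ≈ .7979, the mean of a standard normal truncated to the positive half line.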
If x_j is a continuous explanatory variable, then

∂E(y|x, y > 0)/∂x_j = β_j{1 − λ(xβ/σ)[xβ/σ + λ(xβ/σ)]}    (16.11)

This equation shows that the partial effect of x_j on E(y|x, y > 0) is not entirely determined by β_j; there is an adjustment factor multiplying β_j, the term in {·}, that depends on x through the index xβ/σ. We can use the fact that if z ~ Normal(0, 1), then Var(z | z > c) = 1 − λ(c)[c + λ(c)] for any c ∈ ℝ, which implies that the adjustment factor in equation (16.11), call it θ(xβ/σ) ≡ {1 − λ(xβ/σ)[xβ/σ + λ(xβ/σ)]}, is strictly between zero and one. Therefore, the sign of β_j is the same as the sign of the partial effect of x_j.

Other functional forms are easily handled. Suppose that x_1 = log(z_1) (and that this is the only place z_1 appears in x). Then

∂E(y|x, y > 0)/∂z_1 = (β_1/z_1)θ(xβ/σ)

where β_1 now denotes the coefficient on log(z_1). Or, suppose that x_1 = z_1 and x_2 = z_1². Then

∂E(y|x, y > 0)/∂z_1 = (β_1 + 2β_2z_1)θ(xβ/σ)

where β_1 is the coefficient on z_1 and β_2 is the coefficient on z_1². Interaction terms are handled similarly. Generally, we compute the partial derivative of xβ with respect to the variable of interest and multiply this by the factor θ(xβ/σ).
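A short Python sketch of the adjustment factor θ(c) = 1 − λ(c)[c + λ(c)] and the resulting partial effect on E(y|x, y > 0):

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_mills(c):
    # inverse Mills ratio lambda(c)
    return phi(c) / Phi(c)

def theta(c):
    # adjustment factor in equation (16.11); strictly between 0 and 1
    lam = inv_mills(c)
    return 1.0 - lam * (c + lam)

def partial_effect_positive(beta_j, xb, sigma):
    # d E(y | x, y > 0) / d x_j = beta_j * theta(xb / sigma)
    return beta_j * theta(xb / sigma)
```

Because θ(·) is strictly positive, the partial effect always has the same sign as β_j, as noted above.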
All of the usual economic quantities such as elasticities can be computed. The elasticity of y with respect to x_1, conditional on y > 0, is

[∂E(y|x, y > 0)/∂x_1]·[x_1/E(y|x, y > 0)]

and equations (16.11) and (16.10) can be used to find the elasticity when x_1 appears in levels form. If z_1 appears in logarithmic form, the elasticity is obtained simply as ∂log E(y|x, y > 0)/∂log(z_1).

If x_1 is a binary variable, the effect of interest is obtained as the difference between E(y|x, y > 0) with x_1 = 1 and x_1 = 0. Other discrete variables (such as number of children) can be handled similarly.
We can also compute E(y|x) from equation (16.8):

E(y|x) = P(y > 0|x)·E(y|x, y > 0) = Φ(xβ/σ)[xβ + σλ(xβ/σ)] = Φ(xβ/σ)xβ + σφ(xβ/σ)    (16.14)
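Equation (16.14) is easy to verify numerically: simulate y = max(0, xβ + u) for a fixed index and compare the sample mean with the formula. The values xβ = 1 and σ = 2 below are arbitrary illustrations.

```python
import math
import random

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_uncond_mean(xb, sigma):
    # E(y | x) = Phi(xb/sigma)*xb + sigma*phi(xb/sigma), equation (16.14)
    z = xb / sigma
    return Phi(z) * xb + sigma * phi(z)

# Monte Carlo check at illustrative values xb = 1, sigma = 2
random.seed(1)
xb, sigma = 1.0, 2.0
draws = [max(0.0, xb + random.gauss(0.0, sigma)) for _ in range(200_000)]
mc_mean = sum(draws) / len(draws)
```

The simulated mean also illustrates the Jensen bound in equation (16.7): E(y|x) exceeds max(0, xβ).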
We can find the partial derivatives of E(y|x) with respect to continuous x_j using the chain rule. In examples where y is some quantity chosen by individuals (labor supply, charitable contributions, life insurance), this derivative accounts for the fact that some people who start at y = 0 may switch to y > 0 when x_j changes. Formally,

∂E(y|x)/∂x_j = [∂P(y > 0|x)/∂x_j]·E(y|x, y > 0) + P(y > 0|x)·[∂E(y|x, y > 0)/∂x_j]    (16.15)

Because P(y > 0|x) = Φ(xβ/σ), ∂P(y > 0|x)/∂x_j = (β_j/σ)φ(xβ/σ). If we plug this along with equation (16.11) into equation (16.15), we get a remarkable simplification:

∂E(y|x)/∂x_j = Φ(xβ/σ)β_j    (16.16)

The estimated scale factor for a given x is Φ(xβ̂/σ̂). This scale factor has a very interesting interpretation: Φ(xβ̂/σ̂) = P̂(y > 0|x); that is, Φ(xβ̂/σ̂) is the estimated probability of observing a positive response given x. If Φ(xβ̂/σ̂) is close to one, then it is unlikely we observe y_i = 0 when x_i = x, and the adjustment factor becomes unimportant. In practice, a single adjustment factor is obtained as Φ(x̄β̂/σ̂), where x̄ denotes the vector of mean values. If the estimated probability of a positive response is close to one at the sample means of the covariates, the adjustment factor can be ignored. In most interesting Tobit applications, Φ(x̄β̂/σ̂) is notably less than unity. For discrete variables or for large changes in continuous variables, we can compute the difference in E(y|x) at different values of x. [Incidentally, equations (16.11) and (16.16) show that σ is not a "nuisance parameter," as it is sometimes called in Tobit applications: σ plays a crucial role in estimating the partial effects of interest in corner solution applications.]
Equations (16.9), (16.11), and (16.14) show that, for continuous variables x_j and x_h, the relative partial effects on P(y > 0|x), E(y|x, y > 0), and E(y|x) are all equal to β_j/β_h (assuming that β_h ≠ 0). This fact can be a limitation of the Tobit model, something we take up further in Section 16.7.

By taking the log of equation (16.8) and differentiating, we see that the elasticity (or semielasticity) of E(y|x) with respect to any x_j is simply the sum of the elasticities (or semielasticities) of Φ(xβ/σ) and E(y|x, y > 0), each with respect to x_j.
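The equal-relative-effects property is transparent in code: each of the three partial effects is β_j times a common positive factor, so any ratio of coefficients equals the corresponding ratio of partial effects. The coefficient and index values below are arbitrary illustrations.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_mills(c):
    return phi(c) / Phi(c)

# Partial effects of x_j on the three quantities of interest:
def pe_prob(beta_j, xb, sigma):      # on P(y > 0 | x), from (16.9)
    return (beta_j / sigma) * phi(xb / sigma)

def pe_cond(beta_j, xb, sigma):      # on E(y | x, y > 0), from (16.11)
    c = xb / sigma
    lam = inv_mills(c)
    return beta_j * (1.0 - lam * (c + lam))

def pe_uncond(beta_j, xb, sigma):    # on E(y | x), from (16.16)
    return Phi(xb / sigma) * beta_j

# Illustrative (made-up) values
bj, bh, xb, sigma = 1.5, -0.5, 0.8, 2.0
ratios = [pe_prob(bj, xb, sigma) / pe_prob(bh, xb, sigma),
          pe_cond(bj, xb, sigma) / pe_cond(bh, xb, sigma),
          pe_uncond(bj, xb, sigma) / pe_uncond(bh, xb, sigma)]
```

All three ratios equal β_j/β_h = −3, regardless of x, which is the restriction noted in the text.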
only observations with uncensored durations. It would be convenient if OLS using only the uncensored observations were consistent for β, but such is not the case. From equation (16.14) it is also pretty clear that regressing y_i on x_i using all of the data will not consistently estimate β: E(y|x) is nonlinear in x, β, and σ, so it would be a fluke if a linear regression consistently estimated β.
There are some interesting theoretical results about how the slope coefficients in β can be estimated up to scale using one of the two OLS regressions that we have discussed. Therefore, each OLS coefficient is inconsistent by the same multiplicative factor. This fact allows us, both in data-censoring applications and corner solution applications, to estimate the relative effects of any two explanatory variables. The assumptions made to derive such results are very restrictive, and they generally rule out discrete and other discontinuous regressors. [Multivariate normality of (x, y*) is sufficient.] The arguments, which rely on linear projections, are elegant (see, for example, Chung and Goldberger, 1984), but such results have questionable practical value.

The previous discussion does not mean a linear regression of y_i on x_i is uninformative. Remember that, whether or not the Tobit model holds, we can always write the linear projection of y on x as L(y|x) = xγ for γ = [E(x′x)]⁻¹E(x′y), under the mild restriction that all second moments are finite. It is possible that γ_j approximates the effect of x_j on E(y|x) when x is near its population mean. Similarly, a linear regression of y_i on x_i, using only observations with y_i > 0, might approximate the partial effects on E(y|x, y > 0) near the mean values of the x_j. Such issues have not been fully explored in corner solution applications of the Tobit model.

16.4 Estimation and Inference with Censored Tobit
Let {(x_i, y_i): i = 1, 2, …, N} be a random sample following the censored Tobit model. To use maximum likelihood, we need to derive the density of y_i given x_i. We have already shown that f(0|x_i) = P(y_i = 0|x_i) = 1 − Φ(x_iβ/σ). Further, for y > 0, P(y_i ≤ y|x_i) = P(y_i* ≤ y|x_i), which implies that

f(y|x_i) = f*(y|x_i), all y > 0

where f*(·|x_i) denotes the density of y_i* given x_i. (We use y as the dummy argument in the density.) By assumption, y_i* | x_i ~ Normal(x_iβ, σ²), so

f*(y|x_i) = (1/σ)φ[(y − x_iβ)/σ],  −∞ < y < ∞
Trang 10(As in recent chapters, we will use b and s2 to denote the true values as well asdummy arguments in the log-likelihood function and its derivatives.) We can writethe density for yigiven xi compactly using the indicator function 1½ as
fðy j xiÞ ¼ f1 Fðxib=sÞg1½ y¼0fð1=sÞf½ðy xibÞ=sg1½ y>0 ð16:19Þ
where the density is zero for y < 0 Let y 1 ð b0;s2Þ0denote theðK þ 1Þ 1 vector ofparameters The conditional log likelihood is
liðyÞ ¼ 1½ yi¼ 0 log½1 Fðxib=sÞ þ 1½ yi>0flog f½ð yi xibÞ=s logðs2Þ=2g
ð16:20ÞApart from a constant that does not a¤ect the maximization, equation (16.20) can bewritten as
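The log likelihood in equation (16.20) translates directly into code (Python, standard library only):

```python
import math

def phi(z):
    # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    # standard normal cdf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik_i(y_i, xb_i, sigma):
    # l_i(theta) from equation (16.20):
    # 1[y=0]*log(1 - Phi(xb/sigma)) + 1[y>0]*{log phi((y - xb)/sigma) - log(sigma^2)/2}
    if y_i == 0.0:
        return math.log(1.0 - Phi(xb_i / sigma))
    return math.log(phi((y_i - xb_i) / sigma)) - math.log(sigma ** 2) / 2.0

def tobit_loglik(y, xb, sigma):
    # sample log likelihood: sum of the conditional log likelihoods
    return sum(tobit_loglik_i(yi, xbi, sigma) for yi, xbi in zip(y, xb))
```

In practice the MLE maximizes this function over β and σ² with a numerical optimizer; the sketch only evaluates it at given parameter values.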
The matrix used to estimate Avar(θ̂) involves the quantities

a_i = σ⁻²{x_iγ·φ_i − [φ_i²/(1 − Φ_i)] − Φ_i}

b_i = σ⁻³{(x_iγ)²φ_i + φ_i − [(x_iγ)φ_i²/(1 − Φ_i)]}/2

c_i = σ⁻⁴{(x_iγ)³φ_i + (x_iγ)φ_i − [(x_iγ)φ_i²/(1 − Φ_i)] − 2Φ_i}/4

where γ = β/σ, and φ_i and Φ_i are evaluated at x_iγ. This matrix is used in equation (13.32) to obtain the estimate of Avar(θ̂). See Amemiya (1973) for details.
Testing is easily carried out in a standard MLE framework. Single exclusion restrictions are tested using asymptotic t statistics once β̂_j and its asymptotic standard error have been obtained. Multiple exclusion restrictions are easily tested using the LR statistic, and some econometrics packages routinely compute the Wald statistic. If the unrestricted model has so many variables that computation becomes an issue, the LM statistic is an attractive alternative.

The Wald statistic is the easiest to compute for testing nonlinear restrictions on β, just as in binary response analysis, because the unrestricted model is just standard Tobit.
16.5 Reporting the Results
For data-censoring applications, the quantities of interest are the β̂_j and their standard errors. (We might use these to compute elasticities, and so on.) We interpret the estimated model as if there were no data-censoring problem, because the population model is a linear conditional mean. The value of the log-likelihood function should be reported for any estimated model because of its role in obtaining likelihood ratio statistics. We can test for omitted variables, including nonlinear functions of already included variables, using either t tests or LR tests. All of these rely on the homoskedastic normal assumption in the underlying population.

For corner solution applications, the same statistics can be reported, and, in addition, we should report estimated partial effects on E(y|x, y > 0) and E(y|x). The formulas for these are given in Section 16.2, where β and σ are replaced with their MLEs. Because these estimates depend on x, we must decide at what values of x to report the partial effects or elasticities. As with probit, the average values of x can be used, or, if some elements of x are qualitative variables, we can assign them values of particular interest. For the important elements of x, the partial effects or elasticities can be estimated at a range of values, holding the other elements fixed. For example, if x_1 is price, then we can compute equation (16.11) or (16.16), or the corresponding elasticities, for low, medium, and high prices, while keeping all other elements fixed. If x_1 is a dummy variable, then we can obtain the difference in estimates with x_1 = 1 and x_1 = 0, holding all other elements of x fixed. Standard errors of these estimates can be obtained by the delta method, although the calculations can be tedious.

Example 16.3 (Annual Hours Equation for Married Women): We use the Mroz (1987) data (MROZ.RAW) to estimate a reduced-form annual hours equation for married women. The equation is a reduced form because we do not include the hourly wage offer as an explanatory variable. The hourly wage offer is unlikely to be exogenous, and, just as importantly, we cannot observe it when hours = 0. We will show how to deal with both these issues in Chapter 17. For now, the explanatory variables are the same ones appearing in the labor force participation probit in Example 15.2.

Of the 753 women in the sample, 428 worked for a wage outside the home during the year; 325 of the women worked zero hours. For the women who worked positive hours, the range is fairly broad, ranging from 12 to 4,950. Thus, annual hours worked is a reasonable candidate for a Tobit model. We also estimate a linear model (using all 753 observations) by OLS. The results are in Table 16.1.
Not surprisingly, the Tobit coefficient estimates have the same signs as the corresponding OLS estimates, and the statistical significance of the estimates is similar. (Possible exceptions are the coefficients on nwifeinc and kidsge6, but the t statistics have similar magnitudes.) Second, though it is tempting to compare the magnitudes of the OLS and Tobit estimates, such comparisons are not very informative. We must not think that, because the Tobit coefficient on kidslt6 is roughly twice that of the OLS coefficient, the Tobit model somehow implies a much greater response of hours worked to young children.

We can multiply the Tobit estimates by the adjustment factors in equations (16.11) and (16.16), evaluated at the estimates and the mean values of the x_j (but where we square the mean of exper rather than use the average of the exper_i²), to obtain the partial effects on the conditional expectations. The factor in equation (16.11) is about .451. For example, conditional on hours being positive, a year of education (starting from the mean values of all variables) is estimated to increase expected hours by about
Table 16.1
OLS and Tobit Estimation of Annual Hours Worked
Dependent Variable: hours

Variable    OLS              Tobit
nwifeinc    —      (2.54)    −8.81    (4.46)
educ        28.76  (12.95)   80.65    (21.58)
exper       —      (9.96)    131.56   (17.28)
exper²      —      (.325)    −1.86    (0.54)
age         —      (4.36)    −54.41   (7.42)
kidslt6     —      (58.85)   −894.02  (111.88)
kidsge6     —      (23.18)   −16.22   (38.64)
constant    —      (270.78)  965.31   (446.44)

(Standard errors in parentheses. Dashes mark OLS coefficients not recoverable from the source; the OLS educ coefficient is given in the text.)
.451(80.65) ≈ 36.4 hours. Using the calculus approximation for one more young child gives a fall in expected hours of about (.451)(894.02) ≈ 403.2. Of course, this figure does not make sense for a woman working less than 403.2 hours. It would be better to estimate the expected values at two different values of kidslt6 and form the difference, rather than using the calculus approximation.
The factor in equation (16.16), again evaluated at the mean values of the x_j, is about .645. This result means that the estimated probability of a woman being in the workforce, at the mean values of the covariates, is about .645. Therefore, the magnitudes of the effects of each x_j on expected hours, that is, when we account for people who initially do not work as well as those who are initially working, are larger than when we condition on hours > 0. We can multiply the Tobit coefficients, at least those on roughly continuous explanatory variables, by .645 to make them roughly comparable to the OLS estimates in the first column. In most cases the estimated Tobit effects at the mean values are significantly above the corresponding OLS estimates. For example, the Tobit effect of one more year of education is about .645(80.65) ≈ 52.02, which is well above the OLS estimate of 28.76.
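The partial-effect calculations in this example are simple arithmetic on the reported Tobit coefficients and the two adjustment factors:

```python
# Tobit coefficients from Table 16.1 and the adjustment factors
# .451 (equation (16.11)) and .645 (equation (16.16)) reported in the text
educ_tobit = 80.65
kidslt6_tobit = 894.02

effect_educ_cond = 0.451 * educ_tobit      # on E(hours | x, hours > 0)
effect_kids_cond = 0.451 * kidslt6_tobit   # magnitude of the kidslt6 effect
effect_educ_uncond = 0.645 * educ_tobit    # on E(hours | x)
```

These reproduce the figures in the text: roughly 36.4, 403.2, and 52.02 hours, respectively.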
We have reported an R-squared for both the linear regression model and the Tobit model. The R-squared for OLS is the usual one. For Tobit, the R-squared is the square of the correlation coefficient between y_i and ŷ_i, where ŷ_i = Φ(x_iβ̂/σ̂)x_iβ̂ + σ̂φ(x_iβ̂/σ̂) is the estimate of E(y|x = x_i). This statistic is motivated by the fact that the usual R-squared for OLS is equal to the squared correlation between the y_i and the OLS fitted values.

Based on the R-squared measures, the Tobit conditional mean function fits the hours data somewhat better, although the difference is not overwhelming. However, we should remember that the Tobit estimates are not chosen to maximize an R-squared (they maximize the log-likelihood function), whereas the OLS estimates produce the highest R-squared given the linear functional form for the conditional mean.
When two additional variables, the local unemployment rate and a binary city indicator, are included, the log likelihood becomes about −3,817.89. The likelihood ratio statistic is about 2(3,819.09 − 3,817.89) = 2.40. This is the outcome of a χ²₂ variate under H₀, and so the p-value is about .30. Therefore, these two variables are jointly insignificant.
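The LR computation is immediate, using the fact that the survival function of a χ² variate with 2 degrees of freedom has the closed form P(X > x) = exp(−x/2):

```python
import math

# Log likelihoods from the text: restricted model and the model with
# the local unemployment rate and city indicator added
ll_restricted = -3819.09
ll_unrestricted = -3817.89

lr = 2.0 * (ll_unrestricted - ll_restricted)

# p-value from the chi-square distribution with 2 df
p_value = math.exp(-lr / 2.0)
```

This gives LR = 2.40 and a p-value of about .30, matching the text.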
16.6 Specification Issues in Tobit Models
16.6.1 Neglected Heterogeneity
Suppose that we are initially interested in the model
y = max(0, xβ + γq + u),  u | x, q ~ Normal(0, σ²)    (16.24)

where q is an unobserved variable that is assumed to be independent of x and to have a Normal(0, τ²) distribution. It follows immediately that

y = max(0, xβ + v),  v | x ~ Normal(0, σ² + γ²τ²)    (16.25)

Thus, y conditional on x follows a Tobit model, and Tobit of y on x consistently estimates β and η² ≡ σ² + γ²τ². In data-censoring cases we are interested in β; γ is of no use without observing q, and γ cannot be estimated anyway. We have shown that heterogeneity that is independent of x and normally distributed has no important consequences in data-censoring examples.
Things are more complicated in corner solution examples because, at least initially, we are interested in E(y|x, q) or E(y|x, q, y > 0). As we discussed in Sections 2.2.5 and 15.7.1, we are often interested in the average partial effects (APEs), where, say, E(y|x, q) is averaged over the population distribution of q, and then derivatives or differences with respect to elements of x are obtained. From Section 2.2.5 we know that when the heterogeneity is independent of x, the APEs are obtained by finding E(y|x) [or E(y|x, y > 0)]. Naturally, these conditional means come from the distribution of y given x. Under the preceding assumptions, it is exactly this distribution that Tobit of y on x estimates. In other words, we estimate the desired quantities, the APEs, by simply ignoring the heterogeneity. This is the same conclusion we reached for the probit model in Section 15.7.1.

If q is not normal, then these arguments do not carry over because y given x does not follow a Tobit model. But the flavor of the argument does. A more difficult issue arises when q and x are correlated, and we address this in the next subsection.

16.6.2 Endogenous Explanatory Variables
Suppose we now allow one of the variables in the Tobit model to be endogenous. The model is

y_1 = max(0, z_1δ_1 + α_1y_2 + u_1)    (16.26)

y_2 = zδ_2 + v_2    (16.27)

where (u_1, v_2) are zero-mean normally distributed, independent of z. If u_1 and v_2 are correlated, then y_2 is endogenous. For identification we need the usual rank condition δ_22 ≠ 0, where δ_22 is the vector of reduced-form coefficients on the elements of z excluded from z_1; E(z′z) is assumed to have full rank, as always.
If equation (16.26) represents a data-censoring problem, we are interested, as always, in the parameters δ_1 and α_1, as these are the parameters of interest in the uncensored population model. For corner solution outcomes, the quantities of interest are more subtle. However, when the endogeneity of y_2 is due to omitted variables or simultaneity, the parameters we need to estimate to obtain average partial effects are δ_1, α_1, and σ_1² = Var(u_1). The reasoning is just as for the probit model in Section 15.7.2. Holding other factors fixed, the difference in y_1 when y_2 changes from y_2 to y_2 + 1 is

max[0, z_1δ_1 + α_1(y_2 + 1) + u_1] − max[0, z_1δ_1 + α_1y_2 + u_1]

Averaging this expression across the distribution of u_1 gives differences in expectations that have the form (16.14), with x = [z_1, (y_2 + 1)] in the first case, x = (z_1, y_2) in the second, and σ = σ_1. Importantly, unlike in the data-censoring case, we need to estimate σ_1² in order to estimate the partial effects of interest (the APEs).
Before estimating this model by maximum likelihood, a procedure that requires obtaining the distribution of (y_1, y_2) given z, it is convenient to have a two-step procedure that also delivers a simple test for the endogeneity of y_2. Smith and Blundell (1986) propose a two-step procedure that is analogous to the Rivers-Vuong method (see Section 15.7.2) for binary response models. Under bivariate normality of (u_1, v_2), we can write

u_1 = θ_1v_2 + e_1    (16.28)

where θ_1 = η_1/τ_2², η_1 = Cov(u_1, v_2), τ_2² = Var(v_2), and e_1 is independent of v_2 with a zero-mean normal distribution and variance, say, τ_1². Further, because (u_1, v_2) is independent of z, e_1 is independent of (z, v_2). Now, plugging equation (16.28) into equation (16.26) gives
y_1 = max(0, z_1δ_1 + α_1y_2 + θ_1v_2 + e_1)    (16.29)

where e_1 | z, v_2 ~ Normal(0, τ_1²). It follows that, if we knew v_2, we would just estimate δ_1, α_1, θ_1, and τ_1² by standard censored Tobit. We do not observe v_2 because it depends on the unknown vector δ_2. However, we can easily estimate δ_2 by OLS in a first stage. The Smith-Blundell procedure is as follows:
Procedure 16.1: (a) Estimate the reduced form of y_2 by OLS; this step gives δ̂_2. Define the reduced-form OLS residuals as v̂_2 = y_2 − zδ̂_2.

(b) Estimate a standard Tobit of y_1 on z_1, y_2, and v̂_2. This step gives consistent estimators of δ_1, α_1, θ_1, and τ_1².

The usual t statistic on v̂_2 reported by Tobit provides a simple test of the null H_0: θ_1 = 0, which says that y_2 is exogenous. Further, under θ_1 = 0, e_1 = u_1, and so normality of v_2 plays no role: as a test for endogeneity of y_2, the Smith-Blundell approach is valid without any distributional assumptions on the reduced form of y_2.

Example 16.4 (Testing Exogeneity of Education in the Hours Equation): As an illustration, we test for endogeneity of educ in the reduced-form hours equation in Example 16.3. We assume that motheduc, fatheduc, and huseduc are exogenous in the hours equation, and so these are valid instruments for educ. We first obtain v̂_2 as the OLS residuals from estimating the reduced form for educ. When v̂_2 is added to the Tobit model in Example 16.3 (without unem and city), its coefficient is 39.88 with t statistic = .91. Thus, there is little evidence that educ is endogenous in the equation. The test is valid under the null hypothesis that educ is exogenous even if educ does not have a conditional normal distribution.
When θ_1 ≠ 0, the second-stage Tobit standard errors and test statistics are not asymptotically valid because δ̂_2 has been used in place of δ_2. Smith and Blundell (1986) contain formulas for correcting the asymptotic variances; these can be derived using the formulas for two-step M-estimators in Chapter 12. It is easily seen that joint normality of (u_1, v_2) is not absolutely needed for the procedure to work. It suffices that u_1 conditional on z and v_2 is distributed as Normal(θ_1v_2, τ_1²). Still, this is a fairly restrictive assumption.
When θ_1 ≠ 0, the Smith-Blundell procedure does not allow us to estimate σ_1², which is needed to estimate average partial effects in corner solution outcomes. Nevertheless, we can obtain consistent estimates of the average partial effects by using methods similar to those in the probit case. Using the same reasoning as in Section 15.7.2, the APEs are obtained by computing derivatives or differences of

E_{v_2}[m(z_1δ_1 + α_1y_2 + θ_1v_2, τ_1²)]    (16.30)

where m(z, σ²) ≡ Φ(z/σ)z + σφ(z/σ) and E_{v_2}[·] denotes expectation with respect to the distribution of v_2. Using the same argument as in Section 16.6.1, expression (16.30) can be written as m(z_1δ_1 + α_1y_2, θ_1²τ_2² + τ_1²). Therefore, consistent estimators of the APEs are obtained by taking, with respect to elements of (z_1, y_2), derivatives or differences of

m(z_1δ̂_1 + α̂_1y_2, θ̂_1²τ̂_2² + τ̂_1²)    (16.31)

where all estimates except τ̂_2² come from step b of the Smith-Blundell procedure; τ̂_2² is simply the usual estimate of the error variance from the first-stage OLS regression. As in the case of probit, obtaining standard errors for the APEs based on expression (16.31) and the delta method would be quite complicated. An alternative procedure, where m(z_1δ̂_1 + α̂_1y_2 + θ̂_1v̂_{i2}, τ̂_1²) is averaged across i, is also consistent, but it does not exploit the normality of v_2.
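A sketch of the APE calculation based on expressions (16.30) and (16.31); the parameter values below are illustrative assumptions, and the Monte Carlo average over v_2 should agree with the closed form.

```python
import math
import random

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def m(z, sig2):
    # m(z, sigma^2) = Phi(z/sigma)*z + sigma*phi(z/sigma), as in (16.14)
    s = math.sqrt(sig2)
    return Phi(z / s) * z + s * phi(z / s)

# Illustrative (made-up) values for the index and the error parameters
index, theta1, tau1_sq, tau2_sq = 0.5, 0.8, 1.0, 1.5

# Expression (16.30): average m(index + theta1*v2, tau1^2) over v2 ~ Normal(0, tau2^2)
random.seed(2)
draws = [m(index + theta1 * random.gauss(0.0, math.sqrt(tau2_sq)), tau1_sq)
         for _ in range(200_000)]
ape_mc = sum(draws) / len(draws)

# Closed form used in (16.31): m(index, theta1^2*tau2^2 + tau1^2)
ape_closed = m(index, theta1 ** 2 * tau2_sq + tau1_sq)
```

The agreement reflects the fact that θ_1v_2 + e_1 is normal with variance θ_1²τ_2² + τ_1², which is exactly why expression (16.30) collapses to the closed form.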
A full maximum likelihood approach avoids the two-step estimation problem. The joint distribution of (y_1, y_2) given z is most easily found by using

f(y_1, y_2 | z) = f(y_1 | y_2, z) f(y_2 | z)    (16.32)