2.1 Regression function characterizations
2.2 Dummy endogenous variable models
3 Applications of the index function model
3.1 Models with the reservation wage property
3.2 Prototypical dummy endogenous variable models
3.3 Hours of work and labor supply
P Sloan Foundation. This paper has benefited greatly from comments generously given by Ricardo Barros, Mark Gritz, Joe Hotz, and Frank Howland.
Handbook of Econometrics, Volume III, Edited by Z. Griliches and M.D. Intriligator
© Elsevier Science Publishers BV, 1986
Ch. 32: Labor Econometrics, J. J. Heckman and T. E. MaCurdy
0 Introduction
In the past twenty years, the field of labor economics has been enriched by two developments: (a) the evolution of formal neoclassical models of the labor market and (b) the infusion of a variety of sources of microdata. This essay outlines the econometric framework developed by labor economists who have built theoretically motivated models to explain the new data.
The study of female labor supply stimulated early research in labor econometrics. In any microdata study of female labor supply, two facts are readily apparent: that many women do not work, and that wages are often not available for nonworking women. To account for the first fact in a theoretically coherent framework, it is necessary to model corner solutions (choices at the extensive margin) along with conventional interior solutions (choices at the intensive margin) and to develop an econometrics sufficiently rich to account for both types of choices by agents. Although there were precedents for the required type of econometric model in work in consumer theory by Tobin (1958) and his students [e.g. Rosett (1959)], it is fair to say that labor economists have substantially improved the original Tobin framework and have extended it in various important ways to accommodate a variety of models and types of data. To account for the second fact, that wages are missing in a nonrandom fashion for nonworking women, it is necessary to develop models for censored random variables. The research on censored regression models developed in labor economics had no precedent in econometrics and was largely neglected by statisticians. (See the essay by Griliches in this volume.)
The econometric framework developed for the analysis of female labor supply underlies more recent models of job search [Yoon (1981), Kiefer and Neumann (1979), Flinn and Heckman (1982)], occupational choice [Roy (1951), Tinbergen (1951), Siow (1984), Willis and Rosen (1979), Heckman and Sedlacek (1984)], job turnover [Mincer and Jovanovic (1981), Borjas and Rosen (1981), Flinn (1984)], migration [Robinson and Tomes (1982)], unionism [Lee (1978), Strauss and Schmidt (1976), Robinson and Tomes (1984)] and training evaluation [Heckman and Robb (1985)].
All of the recent models presented in labor econometrics are special cases of an index function model. The origins of this model can be traced to Karl Pearson's (1901) work on the mathematical theory of evolution. See D. J. Kevles (1985, p. 31) for one discussion of Pearson's work. In Pearson's framework, discrete and censored random variables are the manifestations of underlying continuous random variables subject to various sampling schemes. Discrete random variables are indicators of whether or not certain latent continuous variables lie above or below given thresholds. Censored random variables are direct observations on the underlying random variables given that certain selection criteria are met. Assuming that the underlying continuous random variables are normally distributed leads to the theory of biserial and tetrachoric correlation. [See Kendall and Stuart (1967, Vol. II) for a review of this theory.] Later work in mathematical psychology by Thurstone (1927) and Bock and Jones (1968) utilized the index function framework to produce mathematical models of choice among discrete alternatives and stimulated a considerable body of ongoing research in economics. [See McFadden's paper in Volume II for a survey of this work and Lord and Novick (1968) for an excellent discussion of index function models used in psychometrics.]
The index function model cast in terms of underlying continuous latent variables provides the empirical counterpart of many theoretical models in labor economics. For example, it is both natural and analytically convenient to formulate labor supply or job search models in terms of unobserved reservation wages which can often be plausibly modeled as continuous random variables. When reservation wages exceed market wages, people do not work. If the opposite occurs, people work and wages are observed. A variety of models that are special cases of the reservation wage framework will be presented below in Section 3.
The great virtue of research in labor econometrics is that the problems and the solutions in the field are the outgrowth of research on well-posed economic problems. In this area, the economic problems lead and the proposed statistical solutions follow in response to specific theoretical and empirical challenges. This imparts a vitality and originality to the field that is not found in many other branches of econometrics.
One format for presenting recent developments in labor econometrics is to chart the history of the subject, starting with the earliest models, and leading up to more recent developments. This is the strategy we have pursued in previous joint work [Heckman and MaCurdy (1981); Heckman, Killingsworth and MaCurdy (1981)]. The disadvantage of such a format is that basic statistical ideas become intertwined with specific economic models, and general econometric points are sometimes difficult to extract.
This paper follows another format. We first state the basic statistical and econometric principles. We then apply them in a series of worked examples. This format has obvious pedagogical advantages. At the same time, it artificially separates economic problems from econometric theory and does not convey the flow of research problems that stimulated the econometric models.
This paper is in three parts: Part 1 presents a general introduction to the index function framework; Part 2 presents methods for estimating index function models; and Part 3 makes the discussion concrete by presenting a series of models in labor economics that are special cases of the index function framework.
1 The index function model
1.1 Introduction
The critical assumption at the heart of index function models is that unobserved or partially observed continuous random variables generate observed discrete, censored, and truncated random variables. The goal of econometric analysis conducted for these models is to recover the parameters of the distributions of the underlying continuous random variables.
The notion that continuous latent variables generate observed discrete, censored and truncated random variables is natural in many contexts. For example, in the discrete choice literature surveyed by McFadden (1985), the difference between the utility of one option and the utility of another is often naturally interpreted as a continuous random variable, especially if, as is sometimes plausible, utility depends on continuously distributed characteristics. When the difference of utilities exceeds a threshold (zero in this example), the first option is selected. The underlying utilities of choices are never directly observed.
As another example, many models in labor economics are characterized by a "reservation wage" property. Unemployed persons continue to search until their reservation wage, a latent variable, is less than the offered wage. The difference between reservation wages and offered wages is a continuous random variable if some of the characteristics generating reservation wages are continuous random variables. The decision to stop searching is characterized by a continuous latent variable falling below a threshold (zero). Observed wages are censored random variables with the censoring rule characterized by a continuous random variable (the difference between reservation wages and market wages) crossing a threshold. Further examples of index functions generated by economic models are presented in Section 3.
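The reservation wage rule can be illustrated with a small simulation. The sketch below is not part of the original analysis: the wage equations, the shared unobservable `skill`, and all numerical values are hypothetical choices made only to show how a continuous latent index (reservation wage minus offered wage) crossing zero determines which wages are observed.

```python
import random
import statistics

def simulate_search(n=20000, seed=0):
    """Simulate offered and reservation wages; return (all offers, observed wages)."""
    rng = random.Random(seed)
    offers, observed = [], []
    for _ in range(n):
        skill = rng.gauss(0.0, 1.0)                    # unobservable shared by both wages (assumed)
        offer = 2.0 + skill + rng.gauss(0.0, 1.0)      # offered (market) wage
        reservation = 2.0 + 0.5 * skill + rng.gauss(0.0, 1.0)
        index = reservation - offer                    # continuous latent index
        offers.append(offer)
        if index < 0.0:                                # index below the threshold: job accepted
            observed.append(offer)                     # the wage is observed only in this case
    return offers, observed

offers, observed = simulate_search()
share_employed = len(observed) / len(offers)
mean_all = statistics.fmean(offers)
mean_observed = statistics.fmean(observed)
print(f"share with observed wages: {share_employed:.3f}")
print(f"mean of all offers: {mean_all:.3f}; mean of observed wages: {mean_observed:.3f}")
```

Under these assumed parameters the mean of observed wages exceeds the mean of all offers, because acceptance depends on the same unobservables that drive the offer.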
From the vantage point of context-free statistics, using continuous latent variables to generate discrete, censored or truncated random variables introduces unnecessary complications into the statistical analysis. Despite its ancient heritage, the index function approach is no longer widely used or advocated in the modern statistics literature. [See, e.g., Bishop, Fienberg and Holland (1975) or Haberman (1978), Volumes I and II.]¹ Given their disinterest in behavioral models, many statisticians prefer direct parameterizations of discrete data and censored data models that typically possess no behavioral interpretation. Some statisticians have argued that econometric models that incorporate behavioral theory are needlessly complicated. For this reason labor economics has been the locus of recent research activity on index function models.
¹ Such models are still widely used in the psychometric literature. See Lord and Novick (1968) or Bock and Jones (1968).
1.2 Some definitions and basic ideas
Index functions are defined as continuously distributed random variables. It is helpful to distinguish two types of index functions: those corresponding to continuous random variables that are not directly observed in a given context (Z) and those corresponding to continuous random variables that are partially observed (Y) in a sense to be made precise below. In the subsequent discussion, the set $\Omega$ represents the support (or the domain of definition) of (Y, Z); the set $\Theta$ denotes the support of Z, and $\Psi$ is the support of Y; $\Omega$ is the Cartesian product of $\Psi$ and $\Theta$.²
1.2.1 Quantal response models
We begin with the most elementary index function model. This model ignores the existence of Y and focuses on discrete variables whose outcomes register the occurrence of various states of the world. Let $\Theta_1$ be a nontrivial subset of $\Theta$. Although we do not directly observe Z, we know if $Z \in \Theta_1$. If this event occurs, we denote it by setting an indicator function $\delta_1$ equal to one. More formally,
$$\delta_1 = \begin{cases} 1 & \text{if } Z \in \Theta_1; \\ 0 & \text{otherwise.} \end{cases}$$
² $\Omega$, $\Theta$ and $\Psi$ and all partitions of these sets considered in this paper are assumed to be Borel sets.
sponds to the subspace of 2 defined by the inequalities
1.2.2 Models involving endogenous discrete and continuous random variables
We now consider a selection mechanism which records observations on Y only if (Y, Z) lies in some subspace of $\Omega$. More formally, we define the observed value of Y, denoted $Y^*$, by
$$Y^* = Y \quad \text{if } (Y, Z) \in \Omega_1. \qquad (1.2.4)$$
A special case of this selection mechanism produces truncated random variables. $Y^*$ is a truncated random variable if the event $(Y, Z) \in \Omega_1$ implies that Y must lie in a strict subset of its support $\Psi$. Thus $Y^*$ is observed only in certain ranges of values of Y. For example, negative income tax experiments sample only low income persons. Letting Y be income, $Y^*$ is only observed in data from such experiments if Y is below the cutoff point for inclusion of observations into the experiments.
³ Exogenous random variables are always observed and have a marginal density that shares no parameters in common with the conditional distribution of the endogenous variables given the
Observed values of Y produced by the general selection mechanism (1.2.4) without restrictions on the range of $Y^*$ are censored random variables. As an example of a censored random variable, consider the analysis of Cain and Watts (1973). Let Y be hours of work, $Z_1$ be wage rates, and $Z_2$ denote unearned income, where the $Z_i$ are assumed to be unobserved in this context. Negative income tax experiments observe Y only for low income people (i.e. people for whom $Z_1 Y + Z_2$ is sufficiently low). While sampled hours of work, $Y^*$, may take on all values assumed by Y, the density of $Y^*$ may differ greatly from the density of Y.
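A short simulation may help fix the distinction just drawn. The sketch below is illustrative only: the uniform distributions for hours, wages and unearned income and the income cutoff of 400 are hypothetical and are not taken from Cain and Watts (1973). It applies the rule "observe hours only if total income is below a cutoff" and compares sampled hours with population hours.

```python
import random
import statistics

def nit_sample(n=20000, cutoff=400.0, seed=1):
    """Return (population hours, sampled hours) under an income-based inclusion rule."""
    rng = random.Random(seed)
    hours, sampled = [], []
    for _ in range(n):
        y = rng.uniform(0.0, 60.0)      # Y: weekly hours of work (assumed distribution)
        z1 = rng.uniform(5.0, 25.0)     # Z1: wage rate, not observed by the analyst
        z2 = rng.uniform(0.0, 200.0)    # Z2: unearned income, not observed by the analyst
        hours.append(y)
        if z1 * y + z2 < cutoff:        # inclusion rule: total income is sufficiently low
            sampled.append(y)           # Y* = Y for included persons
    return hours, sampled

hours, sampled = nit_sample()
mean_hours = statistics.fmean(hours)
mean_sampled = statistics.fmean(sampled)
max_sampled = max(sampled)
print(f"population mean hours: {mean_hours:.1f}; sampled mean hours: {mean_sampled:.1f}")
print(f"largest sampled hours value: {max_sampled:.1f}")
```

In this design sampled hours still range over nearly the whole support of Y, since a person with long hours and a low wage can qualify, but their distribution is shifted toward low values.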
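A useful extension of the selection mechanism presented in eq. (1.2.4) is a multi-state model which defines observed values of Y for various states of the world indexed by i, i = 1, ..., I. For state i we define the observed value of Y as
$$Y_i^* = Y \quad \text{if } (Y, Z) \in \Omega_i, \qquad i = 1, \dots, I_1, \qquad (1.2.6)$$
where the $\Omega_i$'s are subsets of $\Omega$, and $I_1$ ($\le I$) is the number of states in which Y is observed. In the remaining states ($I - I_1$ in number), Y is not observed. We define an indicator variable $\delta_i$ by
$$\delta_i = \begin{cases} 1 & \text{if } (Y, Z) \in \Omega_i; \\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, \dots, I,$$
and adopt the convention that $Y_i^* = 0$ if $\delta_i = 0$. (1.2.7)
The variable $Y^* = \sum_{i=1}^{I_1} Y_i^*$ equals Y if it is observed (i.e. if $\delta_i = 1$ for some $i = 1, \dots, I_1$), and $Y^* = 0$ if any of the states $i = I_1 + 1, \dots, I$ occur. In other words, Y is observed when $\sum_{i=1}^{I_1} \delta_i = 1$.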
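The bookkeeping of the multi-state model can be sketched in a few lines. The three regions below (two states in which Y is observed and one in which it is not) are hypothetical and chosen only for illustration; the function name `observe` and its arguments are likewise assumptions of this sketch.

```python
def observe(y, z, regions, n_observed):
    """Return (deltas, y_star) for one draw (y, z).

    regions    : list of predicates, one per state i = 1..I, partitioning the support
    n_observed : I_1, the number of leading states in which Y is observed
    """
    deltas = [1 if region(y, z) else 0 for region in regions]
    if sum(deltas) != 1:
        raise ValueError("regions must be mutually exclusive and exhaustive")
    # Y_i* = Y if delta_i = 1 and 0 otherwise, for the first I_1 states
    y_i_star = [y * d for d in deltas[:n_observed]]
    return deltas, sum(y_i_star)

# Hypothetical three-state example with I = 3 and I_1 = 2.
regions = [
    lambda y, z: 0.0 <= z < 1.0,   # state 1: Y observed
    lambda y, z: z >= 1.0,         # state 2: Y observed
    lambda y, z: z < 0.0,          # state 3: Y not observed
]

print(observe(35.0, 0.4, regions, 2))    # state 1 occurs, so Y* = Y
print(observe(35.0, -0.2, regions, 2))   # state 3 occurs, so Y* = 0
```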
To obtain specifications of various density functions that are useful in the econometrics of labor supply, rationing and state contingent demand theory, let f(y, z) be the joint density of (Y, Z). Denote the conditional support of Z when $(Y, Z) \in \Omega_i$ as $\Theta_{i|y}$, which is defined so that, for any fixed $Y = y \in \Psi_i$, the set of admissible Y values in $\Omega_i$, the event $Z \in \Theta_{i|y}$ necessarily implies $\delta_i = 1$; the set $\Theta_{i|y}$ in general depends on Y = y. In this notation, the density of $Y_i^*$ conditional on $\delta_i = 1$ is
$$g_i(y_i^*) = \frac{\int_{\Theta_{i|y_i^*}} f(y_i^*, z)\,dz}{\Pr(\delta_i = 1)}, \qquad y_i^* \in \Psi_i, \qquad (1.2.8)$$
with
$$\Pr(\delta_i = 1) = \Pr((Y, Z) \in \Omega_i) = \int_{\Omega_i} f(y, z)\,dy\,dz,$$
where the notation $\int_{\Theta_{i|y}}$ and $\int_{\Omega_i}$ denotes integration over the sets $\Theta_{i|y}$ (given y) and $\Omega_i$, respectively.
The function $g_i(\cdot)$ is the conditional density of Y given that selection rule (1.2.6) is satisfied. As a consequence of convention (1.2.7) the distribution of $Y_i^*$ when $\delta_i = 0$ has point mass at $Y_i^* = 0$ (i.e. $\Pr(Y_i^* = 0 \mid \delta_i = 0) = 1$).
The joint density of $Y_i^*$ and $\delta_i$ is
$$g_i(y_i^*, \delta_i) = [g_i(y_i^*)\Pr(\delta_i = 1)]^{\delta_i}\,[J(y_i^*)\Pr(\delta_i = 0)]^{1-\delta_i}, \qquad i = 1, \dots, I_1, \qquad (1.2.9)$$
where $J(y_i^*) = 1$ if $y_i^* = 0$ and $J(y_i^*) = 0$ otherwise, where $\Pr(\delta_i = 0) = 1 - \Pr(\delta_i = 1)$, and where we adopt the convention that zero raised to the zero-th power equals one (i.e. when $\delta_i = 0$ and $y_i^* \notin \Psi_i$ so $g_i(y_i^*) = 0$, then $[g_i(y_i^*)\Pr(\delta_i = 1)]^0 = 1$).⁴
From (1.2.9) the conditional density of $Y^*$ given that state $i = 1, \dots, I_1$ occurs is
$$g(y^* \mid \delta_i = 1) = g_i(y^*), \qquad y^* \in \Psi_i.$$
$Y^*$ is defined to be degenerate at zero if one of the other states $i = I_1 + 1, \dots, I$ occurs. A compact expression for the conditional density of $Y^*$ is
$$g(y^* \mid \delta_1, \dots, \delta_I) = \prod_{i=1}^{I_1} [g_i(y^*)]^{\delta_i} \quad \text{if } \sum_{i=1}^{I_1} \delta_i = 1, \qquad (1.2.10)$$
with $Y^* = 0$ with probability one when $\delta_i = 1$ for some value of $i = I_1 + 1, \dots, I$.
⁴ We use the term "density" in the sense of the product measure $d[y_i + K_0(y_i)] \times d[K_0(z) + K_1(z)]$ on $R^1 \times [0, 1]$ where $dy$ is Lebesgue measure on $R^1$ and $K_a(z)$ is the probability distribution that assigns the point $a$ in $R^1$ unit mass.
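The joint density of $Y^*$ and $\delta_1, \dots, \delta_I$ is the product of the conditional density of $Y^*$ (1.2.10) and the joint probability of $\delta_1, \dots, \delta_I$; i.e.
$$g(y^*, \delta_1, \dots, \delta_I) = \prod_{i=1}^{I_1} [g_i(y^*)\Pr(\delta_i = 1)]^{\delta_i} \prod_{i=I_1+1}^{I} [J(y^*)\Pr(\delta_i = 1)]^{\delta_i}. \qquad (1.2.11)$$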
In some problems the particular state of the world in which an observation occurs is unknown (i.e. the $\delta_i$'s are not separately observed); it is only known that one of a subset of states has occurred. Given information on $Y^*$, one can determine whether or not one of the first $I_1$ states has occurred, since $Y^* \neq 0$ indicates $\delta_i = 1$ for some $i \le I_1$ and $Y^* = 0$ indicates $\delta_i = 1$ for some $i > I_1$, but it may not be possible to determine the particular i for which $\delta_i = 1$.
For example, suppose that when $Y^* \neq 0$, one only knows that either $\bar\delta_1 = \sum_{i=1}^{I_2} \delta_i = 1$ or $\bar\delta_2 = \sum_{i=I_2+1}^{I_1} \delta_i = 1$. Suppose further that when $Y^* = 0$, it is only known that $\bar\delta_3 = \sum_{i=I_1+1}^{I} \delta_i = 1$. The densities (1.2.10) and (1.2.11) cannot directly be used as a basis for inference in this situation (unless, of course, $I_2 = 1$, $I_1 = 2$, and $I = 3$).
The densities appropriate for analyzing data on $Y^*$ and the $\bar\delta_j$'s are obtained by conditioning on the available knowledge about states. The desired densities are derived by computing the expected value of (1.2.10) to eliminate the individual $\delta_i$'s that are not observed. In particular, the marginal density of $Y^*$ given $\bar\delta_1 = 1$ is given by the law of iterated expectations as
When $\bar\delta_3 = 1$, $Y^*$ is degenerate at zero. Thus the density of $Y^*$ conditional on the
⁵ These derivations use the fact that the sets $\Omega_i$ are mutually exclusive so $\Pr(\bar\delta_1 = 1) = \sum_{i=1}^{I_2} \Pr(\delta_i = 1)$ and $E(\delta_i \mid \bar\delta_1 = 1) = \Pr(\delta_i = 1 \mid \bar\delta_1 = 1) = \Pr(\delta_i = 1)/\Pr(\bar\delta_1 = 1)$, with completely analogous results hold-
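The averaging step can be written out explicitly. The display below is a reconstruction from (1.2.10) and the mutual exclusivity of the states, not a transcription of the original equation; it assumes the notation $\bar\delta_1$ for the indicator that one of the states $i = 1, \dots, I_2$ has occurred. Conditional on $\bar\delta_1 = 1$ exactly one $\delta_i$ with $i \le I_2$ equals one, so the product in (1.2.10) reduces to the single factor $g_i(y^*)$ for that state, and taking expectations over the unobserved state gives a probability-weighted mixture.

```latex
\begin{aligned}
g(y^* \mid \bar\delta_1 = 1)
  &= E\Big[\textstyle\prod_{i=1}^{I_1} [g_i(y^*)]^{\delta_i} \,\Big|\, \bar\delta_1 = 1\Big] \\
  &= \sum_{i=1}^{I_2} g_i(y^*)\,\Pr(\delta_i = 1 \mid \bar\delta_1 = 1) \\
  &= \sum_{i=1}^{I_2} g_i(y^*)\,\frac{\Pr(\delta_i = 1)}{\Pr(\bar\delta_1 = 1)},
  \qquad \Pr(\bar\delta_1 = 1) = \sum_{i=1}^{I_2} \Pr(\delta_i = 1).
\end{aligned}
```

The same argument applied to the states $i = I_2 + 1, \dots, I_1$ would give the density conditional on $\bar\delta_2 = 1$.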
Densities of the form (1.2.8)-(1.2.13) appear repeatedly in the models for the analysis of labor supply presented in Section 3.3.
All the densities in the preceding analysis can be modified to depend on exogenous variables X, as can the support of the selection region (i.e. $\Omega_i = \Omega_i(X)$). Writing $f(y, z \mid X)$ to denote the appropriate conditional density, only obvious notational modifications are required to introduce such variables.
1.3 Sampling plans
A variety of different sampling plans are used to collect the data available to labor economists. The econometric implications of data collected from such sampling plans have received a great deal of attention in the discrete choice and labor econometrics literatures. In this subsection we define the concepts of simple random samples, truncated random samples, censored random samples, stratified random samples, and choice based samples. To this end we let h(X) denote the population density of the exogenous variables X, so that the joint density of (Y, $\delta$, X) is
in place of the random variables
Next suppose that from a simple random sample, observations on (Y, $\delta$, X) are retained only if these random variables lie in some open subset of the support of (Y, $\delta$, X). More precisely, suppose that observations on (Y, $\delta$, X) are retained only if
$$(Y, \delta, X) \in A_1 \subset A, \qquad (1.3.3)$$
where A is the support of the random variables (Y, $\delta$, X).
In the classical statistical literature [see, e.g., Kendall and Stuart (1967, Vol. II)] no regressors are assumed to appear in the model. In this case, a sample is defined to be censored if the number of observations not in $A_1$ is recorded (so $\delta$ is known for all observations). If this information is not retained, the sample is truncated.
When regressors are present, there are several ways to extend these definitions allowing either $\delta$ or X to be recorded when $(Y, \delta, X) \notin A_1$. In this paper we adopt the following conventions. If information on ($\delta$, X) for all $(Y, \delta, X) \notin A_1$ is retained (but Y is not known), we call the sample censored. If information on ($\delta$, X) is not retained for $(Y, \delta, X) \notin A_1$, the sample is truncated. Note that in these definitions $A_1$ can consist of disconnected sets of A.
One operational difference between censored and truncated samples is that for censored samples it is possible to consistently estimate the population probability that $(Y, \delta, X) \in A_1$, whereas for truncated samples these probabilities cannot be consistently estimated as sample sizes become large. In neither sample is it possible to directly estimate the conditional distribution of (Y, $\delta$, X) given $(Y, \delta, X) \notin A_1$ using an empirical c.d.f. for this subsample.⁶
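The operational difference between the two sample types can be seen in a toy simulation. The sketch below is illustrative only: the data-generating process, the retention set $A_1 = \{Y > 0\}$, and the function names are hypothetical. In the censored sample the share of records in $A_1$ estimates the population probability; in the truncated sample every retained record lies in $A_1$, so the share carries no information.

```python
import random
from statistics import NormalDist

def draw_population(n=10000, seed=2):
    """Simple random sample of (Y, X) from an assumed linear model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.5 + x + rng.gauss(0.0, 1.0)
        rows.append((y, x))
    return rows

def censored_sample(rows):
    # (in_A1, X) is kept for every draw; Y is recorded only when the draw lies in A1
    return [(y if y > 0.0 else None, y > 0.0, x) for y, x in rows]

def truncated_sample(rows):
    # nothing at all is kept for draws outside A1
    return [(y, True, x) for y, x in rows if y > 0.0]

rows = draw_population()
cens = censored_sample(rows)
trunc = truncated_sample(rows)
share_cens = sum(1 for _, in_a1, _ in cens if in_a1) / len(cens)
share_trunc = sum(1 for _, in_a1, _ in trunc if in_a1) / len(trunc)
true_prob = 1.0 - NormalDist(0.5, 2.0 ** 0.5).cdf(0.0)   # Y ~ N(0.5, 2) in this design
print(f"true Pr(A1) = {true_prob:.3f}; censored-sample share = {share_cens:.3f}")
print(f"truncated-sample share = {share_trunc:.3f}")
```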
⁶ It is possible to estimate this conditional distribution using the subsample generated by the requirement that $(Y, \delta, X) \in A_1$ for certain specific functional form assumptions for F. Such forms for F are termed "recoverable" in the literature. See Heckman and Singer (1986) for further discussion of
In the special case in which the subset $A_1$ only restricts the support of X (exogenous truncated and censored samples), the econometric analysis can proceed conditional on X. In light of the assumed exogeneity of X, the only possible econometric problem is a loss in efficiency of proposed estimators.
Truncated and censored samples are special cases of the more general notion of a stratified sample. In place of the special sampling rule (1.3.3), in a general stratified sample, the rule for selecting independent observations is such that even in an infinite sample the probability that $(Y, \delta, X) \in A_i \subset A$ does not equal the population probability that $(Y, \delta, X) \in A_i$, where $\bigcup_{i=1}^{I} A_i = A$, and $A_i$ and $A_j$ are disjoint for all $i \neq j$. It is helpful to further distinguish between exogenously stratified and endogenously stratified samples.
In an exogenously stratified sample, selection occurs solely on the X in the sense that the sample distribution of X does not converge to the population distribution of X even as the sample size is increased. This may occur because data are systematically missing for X in certain regions of the support, or more generally because some subsets of the support of X are oversampled. However, conditional on X, the sample distribution of $(Y, \delta \mid X)$ converges to the population distribution. By virtue of the assumed exogeneity of X, such a sampling scheme creates no special econometric problems.
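The claim that exogenous stratification is harmless can be checked numerically. The sketch below is not from the original text: the linear model, the oversampling rule, and the cutoff used for the contrasting rule that selects on Y are all hypothetical. It compares the least squares slope in a sample that oversamples X > 0 with the slope in a sample selected on the outcome.

```python
import random

def slope(xs, ys):
    """OLS slope of y on x in a regression that includes an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def draw(n=20000, seed=4):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)   # true slope is 2 (assumed model)
        u = rng.random()                           # uniform draw used by the sampling rule
        data.append((x, y, u))
    return data

data = draw()
# Exogenous stratification: X > 0 always kept, X <= 0 kept with probability 0.2.
exo = [(x, y) for x, y, u in data if x > 0.0 or u < 0.2]
# For contrast, selection on the outcome: keep only observations with Y > 1.
endo = [(x, y) for x, y, u in data if y > 1.0]
slope_exo = slope(*zip(*exo))
slope_endo = slope(*zip(*endo))
print(f"slope, sample stratified on X: {slope_exo:.3f}")
print(f"slope, sample selected on Y:   {slope_endo:.3f}")
```

In this design the slope from the X-stratified sample stays close to 2, while the slope from the sample selected on Y is attenuated.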
In an endogenously stratified sample, selection occurs on (Y, $\delta$) (and also possibly on the X), and the sampling rule is such that the sample distribution of (Y, $\delta$) does not converge to the population distribution F(Y, $\delta$) (conditional or unconditional on X). This can occur because data are missing for certain values of Y or $\delta$ (or both), or because some subsets of the support of these random variables are oversampled. The special case of an endogenously stratified sample in which, conditional on (Y, $\delta$), the population density of X characterizes the data, i.e.
⁷ Strictly speaking, the choice based sampling literature focuses on a model in which Y is integrated
exogenous in such samples, and its distribution is informative on the structural parameters of the model.
Truncated and censored samples are special cases of a general stratified sample. A truncated sample is produced from a general stratified sample for which the sampling weight for the event $(Y, \delta, X) \notin A_1$ is identically zero. In a censored sample, the sampling weight for the event $(Y, \delta, X) \notin A_1$ is the same as the population probability of the event.
Note that in a truncated sample, observed Y may or may not be a truncated random variable. For example, if $A_1$ only restricts $\delta$, and $\delta$ does not restrict the support of Y, observed Y is a censored random variable. On the other hand, if $A_1$ restricts the support of Y, observed Y is a truncated random variable. Similarly, in a censored sample, Y may or may not be censored. For example, if $A_1$ is defined only by a restriction on values that $\delta$ can assume, and $\delta$ does not restrict the support of Y, observed Y is censored. If $A_1$ is defined by a restriction on the support of Y, observed Y is truncated even though the sample is censored. An unfortunate and sometimes confusing nomenclature thus appears in the literature. The concepts of censored and truncated random variables are to be carefully distinguished from the concepts of censored and truncated random samples.
Truncated and censored sample selection rule (1.3.3) is essentially identical to the selection rule (1.2.6) (augmented to include X in the manner suggested at the end of subsection 1.2). Thus the econometric analysis of models generated by rules such as (1.2.6) can be applied without modification to the analysis of models estimated on truncated and censored samples. The same can be said of the econometric analysis of models fit on all stratified samples for which the sampling rule can be expressed as some restriction on the support of (Y, Z, $\delta$, X). In the recent research in labor econometrics, all of the sample selection rules considered can be written in this form, and an analysis based on samples generated by (augmented) versions of (1.2.6) captures the essence of the recent literature.⁸
2 Estimation
The conventional approach to estimating the parameters of index function models postulates specific functional forms for $f(y, z)$ or $f(y, z \mid X)$ and estimates the parameters of these densities by the method of maximum likelihood or by the method of moments. Pearson (1901) invoked a normality assumption in his original work on index function models and this assumption is still often used in recent work in labor econometrics. The normality assumption has come under attack in the recent literature because when implications of it have been subject to empirical test they have often been rejected.
⁸ We note, however, that it is possible to construct examples of stratified sample selection rules that cannot be cast in this format. For example, selection rules that weight various strata in different (nonzero) proportions than the population proportions cannot be cast in the form of selection rule
It is essential to separate conceptual ideas that are valid for any index function model from results special to the normal model. Most of the conceptual framework underlying the normal index model is valid in a general nonnormal setting. In this section we focus on general ideas and refer the reader to specific papers in the literature where relevant details of normal models are presented.
For two reasons we do not discuss estimation of index function models by the method of maximum likelihood. First, once the appropriate densities are derived, there is little to say about the method beyond what already appears in the literature. [See Amemiya (1985).] We devote attention to the derivation of the appropriate densities in Section 3. Second, it is our experience that the conditions required to secure identification of an index function model are more easily understood when stated in a regression or method of moments framework. Discussions of identifiability that appeal to the nonsingularity of an information matrix have no intuitive appeal and often degenerate into empty tautologies. For these reasons we focus attention on regression and method of moments procedures.
2.1 Regression function characterizations
We begin by presenting a regression function characterization of the econometric problems encountered in the analysis of data collected from truncated, censored and stratified samples and models with truncated and censored random variables. We start with a simple two equation linear regression specification for the underlying index functions and derive the conditional expectations of the observed counterparts of the index variables. More elaborate models are then developed. We next present several procedures for estimating the parameters of the regression specifications.
2.1.1 A prototypical regression specification
A special case of the index function framework set out in Section 1 writes Y and Z as scalar random variables which are assumed to be linear functions of a common set of exogenous variables X and unobservables U and V respectively.⁹
$$Y = X\beta + U, \qquad (2.1.1)$$
$$Z = X\gamma + V. \qquad (2.1.2)$$
⁹ By exogenous variables we mean that X is observed and is distributed independently of (U, V) and that the parameters of the distribution of X are not functions of the parameters ($\beta$, $\gamma$) or the
$Z \in \Theta_1$ and state 0 is observed if $Z \notin \Theta_1$. We later generalize the analysis to consider inclusion rules that depend explicitly on Y and we also consider multi-state models.
The joint density of (U, V), denoted by $f(u, v)$, depends on parameters $\psi$ and may depend on the exogenous variables X. Since elements of $\beta$, $\gamma$, and $\psi$ may be zero, there is no loss of generality in assuming that a common X vector enters (2.1.1), (2.1.2) and the density of (U, V).
As in Section 1, we define the indicator function
$$\delta = \begin{cases} 1 & \text{if } Z \in \Theta_1; \\ 0 & \text{otherwise.} \end{cases}$$
In a censored regression model in which Y is observed only if $\delta = 1$, we define $Y^* = Y$ if $\delta = 1$ and use the convention that $Y^* = 0$ if $\delta = 0$. In shorthand notation
$$Y^* = \delta Y.$$
The conditional expectation of Y given $\delta = 1$ and X is
$$E(Y \mid \delta = 1, X) = X\beta + M, \qquad (2.1.3)$$
where
$$M = M(X\gamma, \psi) = E(U \mid \delta = 1, X)$$
is the conditional expectation of U given X and that $Z \in \Theta_1$. If the disturbance U is independent of V, M = 0. If the disturbances are not independent, M is in general a nontrivial function of X and the parameters of the model ($\gamma$, $\psi$). Note that since $Y^* = \delta Y$, by the law of iterated expectations
$$E(Y^* \mid X) = E(Y^* \mid \delta = 0, X)\Pr(\delta = 0 \mid X) + E(Y^* \mid \delta = 1, X)\Pr(\delta = 1 \mid X)$$
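The decomposition simplifies under the convention that $Y^* = 0$ when $\delta = 0$. The following step is our reconstruction from that convention and from (2.1.3), not a transcription of the original display: the first term vanishes because $Y^*$ is identically zero in state 0, and $Y^* = Y$ in state 1.

```latex
\begin{aligned}
E(Y^* \mid X)
  &= 0 \cdot \Pr(\delta = 0 \mid X) + E(Y \mid \delta = 1, X)\,\Pr(\delta = 1 \mid X) \\
  &= (X\beta + M)\,\Pr(\delta = 1 \mid X).
\end{aligned}
```

The regression of $Y^*$ on X in the full sample therefore depends on X through $X\beta$, through M, and through the selection probability.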
Applying the analysis of Section 1, the conditional distribution of U given X and $Z \in \Theta_1$ is
For example, consider a variable $X_j$ that appears in both equations (so the jth coefficients of $\beta$ and $\gamma$ are nonzero). A regression of Y on X fit on samples restricted to satisfy $\delta = 1$ that does not include M as a regressor produces coefficients that do not converge to $\beta$. Letting "^" denote the OLS coefficient,
$$\operatorname{plim} \hat\beta_j = \beta_j + L_{MX_j},$$
where $L_{MX_j}$ is the probability limit of the coefficient of $X_j$ in a projection of M on X.¹⁰ Note that if a variable $X_k$ that does not appear in (2.1.1) is introduced into a least squares equation that omits M, the least squares coefficient converges to
$$\operatorname{plim} \hat\beta_k = L_{MX_k},$$
so $X_k$ may proxy M.
¹⁰ It is not the case that $L_{MX_j} = (\partial M/\partial X_j)$, although the approximation may be very close. See
The essential feature of both examples is that in samples selected so that $\delta = 1$, X is no longer exogenous with respect to the disturbance term $U^*$ ($= \delta U$) although it is defined to be exogenous with respect to U. The distribution of $U^*$ depends on X (see the expression for M below (2.1.3)). As X is varied, the mean of the distribution of $U^*$ is changed. Estimated regression coefficients combine the desired ceteris paribus effect of X on Y (holding $U^*$ fixed) with the effect of changes in X on the mean of $U^*$.
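The omitted-variable character of the problem can be illustrated by simulation. The sketch below is only an illustration under strong assumptions that are not imposed in the text: (U, V) are jointly normal, the selection set is Z >= 0, the selection coefficients are treated as known, and all parameter values and function names are hypothetical. Under these assumptions M equals `tau` times the normal truncated mean phi(c)/Phi(c) evaluated at the selection index c.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def inverse_mills(c):
    """E(V | V >= -c) for standard normal V, i.e. phi(c) / Phi(c)."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)

def ols(columns, y):
    """Least squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    k, n = len(columns), len(y)
    a = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns]
         + [sum(ci[t] * y[t] for t in range(n))] for ci in columns]
    for p in range(k):
        pivot = a[p][p]
        a[p] = [v / pivot for v in a[p]]
        for r in range(k):
            if r != p:
                factor = a[r][p]
                a[r] = [vr - factor * vp for vr, vp in zip(a[r], a[p])]
    return [a[i][k] for i in range(k)]

def simulate(n=40000, tau=0.8, seed=3):
    """Return X, Y and M for the selected subsample (delta = 1)."""
    rng = random.Random(seed)
    xs, ys, ms = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        u = tau * v + 0.6 * rng.gauss(0.0, 1.0)   # so that E(U | V) = tau * V
        z = x + v                                  # selection index, gamma = (0, 1)
        if z >= 0.0:                               # delta = 1
            xs.append(x)
            ys.append(1.0 + 1.0 * x + u)           # outcome equation, beta = (1, 1)
            ms.append(inverse_mills(x))            # E(V | delta = 1, X)
    return xs, ys, ms

xs, ys, ms = simulate()
const = [1.0] * len(xs)
short = ols([const, xs], ys)        # regression that omits M
long_ = ols([const, xs, ms], ys)    # regression that includes the selection term
print(f"slope omitting M:  {short[1]:.3f}")
print(f"slope including M: {long_[1]:.3f}; coefficient on selection term: {long_[2]:.3f}")
```

In this design the selection term falls as X rises, so the regression that omits it understates the slope; including the term brings the slope back toward its true value of 1.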
Characterizing a sample as a subsample from a larger random sample generated by having $Z \in \Theta_1$ encompasses two distinct ideas that are sometimes confused in the literature. The first idea is that of self-selection. For example, in a simple model of labor supply an individual chooses either to work or not to work. An index function Z representing the difference between the utility of working and of not working can be used to characterize this decision. From an initial random sample, a sample of workers is not random since $Z \ge 0$ for each worker. The second idea is a more general concept, that of sample selection, which includes the first idea as a special case. From a simple random sample, some rule is used to generate the sample used in an empirical analysis. These rules may or may not be the consequences of choices made by the individuals being studied.
Econometric solutions to the general sample selection bias problem and the self-selection bias problem are identical. Both the early work on female labor supply and the later analysis of "experimental data" generated from stratified samples sought to eliminate the effects of sample selection bias on estimated structural labor supply and earnings functions.
It has been our experience that many statisticians and some econometricians find these ideas quite alien From the context-free view of mathematical statistics,
it seems odd to define a sample of workers as a selected sample if the object of the empirical analysis is to estimate hours of work equations “After all,” the argument is sometimes made, “nonworkers give us no information about the determinants of working hours.”
This view ignores the fact that meaningful behavioral theories postulate a common decision process used by all agents (e.g utility maximization) In neoclassical labor supply theory all agents are assumed to possess preference orderings over goods and leisure Some agents choose not to work, but non- workers still possess well-defined preference functions Equations like (2.1.1) are defined for all agents in the population and it is the estimation of the parameters
of the population distribution of preferences that is the goal of structural econo-
metric analysis Estimating functions on samples selected on the basis of choices biases the estimates of the parameters of the distribution of population prefer- ences unless explicit account is taken of the sample selection rule in the estima- tion procedure.”
“Many statisticians implicitly adopt the extreme view that nonworkers come from a different population than workers and that there is no commonality of decision processes and/or parameter values in the two populations In some contexts (e.g in a single cross section) these two views are
2.1.2 Specification for selection corrections
In order to make the preceding theory empirically operational it is necessary to know M (up to a vector of estimable parameters). One way to acquire this information is to postulate a specific functional form for it directly. Doing so makes clear that conventional regression corrections for sample selection bias depend critically on assumptions about the correct functional form of the underlying regression eq. (2.1.1) and the functional form of M.
The second and more commonly utilized approach used to generate M postulates specific functional forms for the density of (U, V) and derives the conditional expectation of U given $\delta$ and X. Since in practice this density is usually unknown, it is not obvious that this route for selecting M is any less ad hoc than the first.
One commonly utilized assumption postulates a linear regression relationship for the conditional expectation of U given V:
Equation (2.1.8) implies that the selection term M can be written as
$$M = E(U \mid \delta = 1, X) = \tau E(V \mid \delta = 1, X). \qquad (2.1.9)$$
Knowledge of the marginal distribution of V determines the functional form of the selection bias term.
Letting $f_v(v)$ denote the marginal density of V, it follows from the analysis of Section 1 that
as {aV: V+(Xy*)/aE@,} using any u > 0 where y * = uy. The normalization for E(V*) that we adopt depends on the particular distribution under consideration.
Numerous choices for $f_v(v)$ have been advanced in the literature, yielding a wide variety of functional forms for (2.1.12). Table 1 presents various specifications of $f_v(v)$ and the implied specifications for $E(V \mid \delta = 1, X) = E(V \mid V > -X\gamma, X)$ proposed in work by Heckman (1976b, 1979), Goldberger (1983), Lee (1982), and Olson (1980). Substituting the formulae for the truncated means presented in the third column of the table into relation (2.1.4) produces an array of useful expressions for the sample selection term M. All of the functions appearing in these formulae, including the gamma, the incomplete gamma, and the distribution functions, are available on most computers.
Inserting any of these expressions for M into eqs. (2.1.3) or (2.1.4) yields an explicit specification for the regression relation associated with Y (or Y*) given the selection rule generating the data. In order to generate (2.1.9) one requires a formula for the probability that $\delta = 1$ given X to complete the specification for E(Y*). Formula (2.1.13) gives the specification of this probability in terms of the cumulative distribution function of V.
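As a concrete illustration of one truncated-mean formula, the following sketch evaluates the selection term for a single familiar case. It assumes that V is standard normal and that the linear conditional expectation takes the form $E(U \mid V) = \tau V$, so that $M = \tau\,\phi(X\gamma)/\Phi(X\gamma)$. The function names and the argument `index` (standing for $X\gamma$) are illustrative choices and are not taken from Table 1.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean(index):
    """E(V | V > -index) for standard normal V; index stands for X*gamma."""
    return normal_pdf(index) / normal_cdf(index)


def selection_term(index, tau):
    """M = tau * E(V | delta = 1, X) under the assumptions stated above."""
    return tau * truncated_mean(index)
```

The truncated mean falls toward zero as the index grows, which reflects the fact that selection becomes nearly certain and the selected sample approaches a random sample.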
In place of the linear conditional expectation (2.1.8), Lee (1982) suggests a more general nonlinear conditional expectation of U given V. Drawing on well-known results in the statistics literature, Lee suggests application of Edgeworth-type expansions. For the bivariate Gram-Charlier series expansion, the conditional expectation of U given V and exogenous X is
$$E(U \mid V, X) = \rho V + \frac{B(V)}{A(V)}$$
$$E(U \mid V) = V\rho + (V^2 - 1)\mu_{12}/2 + (V^3 - 3V)(\mu_{13} - 3\rho)/6. \qquad (2.1.16)$$
For this specification, (2.1.15) reduces to
(2.1.17)
where $\phi(\cdot)$ and $\Phi(\cdot)$ are, respectively, the density function and the cumulative distribution function associated with a standard normal distribution, and $\tau_1$, $\tau_2$, and $\tau_3$ are parameters.12
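The role of the standard normal density and distribution function can be seen by taking the truncated mean of each term of (2.1.16). The sketch below assumes V is standard normal and uses the standard truncated-normal moment results $E(V^2 - 1 \mid V > -c) = -c\,\phi(c)/\Phi(c)$ and $E(V^3 - 3V \mid V > -c) = (c^2 - 1)\,\phi(c)/\Phi(c)$, with $c = X\gamma$. It is not a transcription of (2.1.17); the function names are hypothetical, and a midpoint-rule integral is included only as a numerical cross-check.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gram_charlier_selection_term(index, rho, mu12, mu13):
    """E(U | V > -index) when E(U | V) follows (2.1.16) and V is N(0, 1)."""
    lam = normal_pdf(index) / normal_cdf(index)   # E(V | V > -index)
    h2 = -index * lam                             # E(V^2 - 1 | V > -index)
    h3 = (index * index - 1.0) * lam              # E(V^3 - 3V | V > -index)
    return rho * lam + mu12 * h2 / 2.0 + (mu13 - 3.0 * rho) * h3 / 6.0


def numeric_check(index, rho, mu12, mu13, steps=20000, upper=10.0):
    """Midpoint-rule evaluation of the same truncated mean, for comparison."""
    lower = -index
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        v = lower + (k + 0.5) * width
        cond_mean = (rho * v + (v * v - 1.0) * mu12 / 2.0
                     + (v ** 3 - 3.0 * v) * (mu13 - 3.0 * rho) / 6.0)
        total += cond_mean * normal_pdf(v) * width
    return total / normal_cdf(index)
```

When `mu12 = 0` and `mu13 = 3 * rho` the higher-order terms vanish and the expression collapses to the linear case, `rho * phi(c) / Phi(c)`.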
12 The requirement that V is normally distributed is not as restrictive as it may first appear. In particular, suppose that the distribution of V, $F_v(\cdot)$, is not normal. Defining $J(\cdot)$ as the transformation $\Phi^{-1} \circ F_v$, the random variable J(V) is normally distributed with mean zero and a variance equal to one. Define a new unobserved dependent variable $Z_J$ by the equation
An obvious generalization of (2.1.8) or (2.1.16) assumes that
Cosslett (1984) presents a more robust procedure that can be cast in the format of eq. (2.1.19). With his methods it is possible to consistently estimate the distribution of V, the functions $m_k$, the parameters $\tau_k$, and K, the number of terms in the expansion. In independent work Gallant and Nychka (1984) present a more robust procedure for correcting models for sample selection bias assuming that the joint density of (U, V) is twice continuously differentiable. Their analysis does not require specifications like (2.1.8), (2.1.14) or (2.1.18) or prior specification of the distribution of V.
2.1.3 Multi-state generalizations
Among many possible generalizations of the preceding analysis, one of the most empirically fruitful considers the situation in which the dependent variable Y is generated by a different linear equation for each state of the world. This model includes the “switching regression” model of Quandt (1958, 1972). The occurrence of a particular state of the world results from Z falling into one of the mutually exclusive and exhaustive subsets of $\Theta$, $\Theta_i$, $i = 0, \ldots, I$. The event $Z \in \Theta_i$ signals the occurrence of the ith state of the world. We also suppose that Y is observed in states $i = 1, \ldots, I$ and is not observed in state i = 0. In state i > 0, the equation for Y is
where the $U_i$'s are error terms with $E(U_i) = 0$. Define $U = (U_1, \ldots, U_I)$, and let
determining Z. The value of the discrete dependent variable
For the first case, the regression
i=l
In the second case considered here not all states of the world are observed by the econometrician. It often happens that it is known if Y is observed, and the value of Y is known if it is observed, but it is not known which of a number of possible states has occurred. In such a case, one might observe whether $\delta_0 = 1$ or $\delta_0 = 0$ (i.e., whether $\sum_{i=1}^{I} \delta_i = 0$ or $\sum_{i=1}^{I} \delta_i = 1$), but not individual values of the $\delta_i$'s for $i = 1, \ldots, I$. Examples of such situations are given in our discussion of labor supply presented in Section 3.3.
To determine the appropriate regression equation for Y in this second case, it is necessary to compute the expected value of Y given by (2.1.22) conditional on $\delta_0 = 0$ and X. This expectation is
i=l
where $P_i = \mathrm{Prob}(Z \in \Theta_i \mid X)$.13 Relation (2.1.26) is the regression of Y on X for the case in which Y is observed but the particular state occupied by an observation is not observed.
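The logic of this regression can be sketched numerically. By the law of iterated expectations, the mean of Y given that it is observed is a probability-weighted average of the state-specific conditional means, with weights $P_i / \sum_{j=1}^{I} P_j$. The function below is a minimal illustration under that reading; its name and arguments are hypothetical, and the state-specific means (each of which would include its own selection term) are taken as given.

```python
def regression_unobserved_state(state_probs, state_means):
    """E(Y | delta_0 = 0, X) as a weighted average over states 1, ..., I.

    state_probs[i-1] plays the role of P_i = Prob(Z in Theta_i | X);
    state_means[i-1] plays the role of E(Y | delta_i = 1, X).
    """
    if len(state_probs) != len(state_means):
        raise ValueError("one conditional mean is needed per state")
    prob_observed = sum(state_probs)      # Prob(delta_0 = 0 | X) = 1 - P_0
    if prob_observed <= 0.0:
        raise ValueError("Y must be observed with positive probability")
    weighted = sum(p * m for p, m in zip(state_probs, state_means))
    return weighted / prob_observed
```

For example, with two observed states occurring with probabilities 0.2 and 0.6 and conditional means 1.0 and 3.0, the regression value is (0.2 + 1.8) / 0.8.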
Using (2.1.22), and recalling that $Y^* = Y(1 - \delta_0)$ is a censored random variable, the regression of Y* on X is
2.1.4 Generalization of the regression framework
Extensions of the basic framework presented above provide a rich structure for analyzing a wide variety of problems in labor econometrics. We briefly consider three useful generalizations.
The first relaxes the linearity assumption maintained in the specification of the equations determining the dependent variables Y and Z. In eqs. (2.1.1) and (2.1.2) substitute $h_Y(X, \beta)$ for $X\beta$ and $h_Z(X, \gamma)$ for $X\gamma$, where $h_Y(\cdot, \cdot)$ and $h_Z(\cdot, \cdot)$ are known nonlinear functions of exogenous variables and parameters. Modifying the preceding analysis and formulae to accommodate this change in specification only requires replacing the quantities $X\beta$ and $X\gamma$ everywhere by the functions $h_Y$ and $h_Z$. A completely analogous modification of the multi-state model introduces nonlinear specifications for the conditional expectation of Y in the various states.
13 In order to obtain (2.1.26) we use the fact that the $\Theta_i$'s are nonintersecting sets so that
,~,+,=l,X) =Prob( &=l~&,=l,X)
=Prob
(
A second generalization extends the preceding framework of Sections 2.1.1-2.1.3 by interpreting Y, Z and the errors U and V as vectors. This extension enables the analyst to consider a multiplicity of behavioral functions as well as a broad range of sampling rules. No conceptual problems are raised by this generalization, but severe computational problems must be faced. Now the sets $\Theta_i$ are multidimensional. Tallis (1961) derives the conditional means relevant for the linear multivariate normal model, but it remains a challenge to find other multivariate specifications that yield tractable analytical results. Moreover, work on estimating the multivariate normal model has just begun [e.g., see Catsiapis and Robinson (1982)]. A current area of research is the development of computationally tractable specifications for the means of the disturbance vector U conditional on the occurrence of alternative states of the world.
A third generalization allows the sample selection rule to depend directly on realized values of Y. For this case, the sets $\Theta_i$ are replaced by the sets $\Omega_i$, where $(Y, Z) \in \Omega_i$ designates the occupation of state i. The integrals in the preceding formulae are now defined over the $\Omega_i$. In place of the expression for the selection term M in (2.1.7), use the more general formula
where
$$P_1 = \iint_{\Omega_1} f_{uv}(y - X\beta,\; z - X\gamma)\, dz\, dy$$
is the probability that $\delta_1 = 1$ given X. This formula specializes to the expression (2.1.7) for M when $\Omega_1 = \{(Y, Z): -\infty \le Y \le \infty \text{ and } Z \in \Theta_1\}$, i.e., when Z alone determines whether state 1 occurs.
2.1.5 Methods for estimating the regression specifications
We next consider estimating the regression specifications associated with the elementary two-state model (2.1.1) and (2.1.2). This simple specification is by far the most widely used model encountered in the literature. Estimation procedures available for this two-state model can be directly generalized to more complicated models.
For the two-state model, expression (2.1.3) implies that the regression equation for Y conditional on X and $\delta = 1$ is given by
and $\theta' = (\beta', \tau', \gamma', \psi')$, eq. (2.1.29) can be written as
Since the disturbance e has a zero mean conditional on X and $\delta = 1$ and is distributed independently across the observations in the truncated sample, under standard conditions [see Amemiya (1985)] nonlinear least squares estimators of the parameters of this equation are both consistent and asymptotically normally distributed.
In general, the disturbance e is heteroscedastic, and the functional form of the heteroscedasticity is unknown unless the joint density $f_{uv}$ is specified. As a consequence, when calculating the large-sample covariance matrix of $\hat\theta$, it is necessary to use methods proposed by Eicker (1963, 1967) and White (1981) to consistently estimate this covariance matrix in the presence of arbitrary heteroscedasticity. The literature demonstrates that the estimator $\hat\theta$ is approximately normally distributed in large samples with the true value $\theta$ as its mean and a variance-covariance matrix given by $H^{-1} R H^{-1}$ with
(2.1.32)
where N is the size of the truncated sample, $\partial g_n/\partial\theta\,|_{\hat\theta}$ denotes the gradient vector of g for the nth observation evaluated at $\hat\theta$, and $\hat e_n$ symbolizes the least squares residual for observation n. Thus
For censored samples, two regression methods are available for estimating the parameters $\beta$, $\tau$, $\gamma$, and $\psi$. First, one can apply the nonlinear least squares procedure just described to estimate regression eq. (2.1.30). In particular, reinterpreting the function g as $g(X, \theta) = [X\beta + m(X\gamma, \psi)\tau](1 - F_v(-X\gamma; \psi))$, it is straightforward to write eq. (2.1.30) in the form of an equation analogous to (2.1.31) with Y* and $\varepsilon$ replacing Y and e. Since the disturbance $\varepsilon$ has a zero mean conditional on X and is distributed independently across the observations making up the censored sample, under standard regularity conditions nonlinear least squares applied to this equation yields a consistent estimator $\hat\theta$ with a large-sample normal distribution. To account for potential heteroscedasticity, compute the asymptotic variance-covariance matrix of $\hat\theta$ using the formula in (2.1.33) with the matrices H and R calculated by summing over the N* observations of the censored sample.
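A heteroscedasticity-robust covariance matrix of the form $H^{-1} R H^{-1}$ can be computed from the per-observation gradients and residuals. The sketch below assumes the usual Eicker-White construction, $H = \sum_n g_n g_n'$ and $R = \sum_n \hat e_n^2\, g_n g_n'$, where $g_n$ is the gradient of g for observation n evaluated at $\hat\theta$; this is an assumption about the definitions of H and R, which are not reproduced in the surrounding text. All function names are illustrative, and the small pure-Python matrix helpers are intended only for low-dimensional examples.

```python
def outer_sum(gradients, weights):
    """Sum over n of weights[n] * g_n g_n' for gradient vectors g_n."""
    k = len(gradients[0])
    total = [[0.0] * k for _ in range(k)]
    for g, w in zip(gradients, weights):
        for i in range(k):
            for j in range(k):
                total[i][j] += w * g[i] * g[j]
    return total


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sandwich_covariance(gradients, residuals):
    """H^{-1} R H^{-1} under the assumed definitions of H and R."""
    h = outer_sum(gradients, [1.0] * len(gradients))
    r = outer_sum(gradients, [e * e for e in residuals])
    h_inv = invert(h)
    return matmul(matmul(h_inv, r), h_inv)
```

For a linear regression the gradient $g_n$ is simply the regressor vector, so the same routine yields the familiar robust covariance matrix for least squares.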
A second type of regression procedure can be implemented on censored samples. A two-step procedure can be applied to estimate the equation for Y given by (2.1.29). In the first step, obtain consistent estimates of the parameters $\gamma$ and $\psi$ from a discrete choice analysis which estimates the parameters of $P_1$. From these estimates it is possible to consistently estimate m (or the variables in the vector m). More specifically, define $\theta_2' = (\gamma', \psi')$ as a parameter vector which uniquely determines m as a function of X. The log likelihood function for the independently distributed discrete variables $\delta_n$ given $X_n$, $n = 1, \ldots, N^*$, is
$$\sum_{n=1}^{N^*} \left[ \delta_n \ln\bigl(1 - F_v(-X_n\gamma; \psi)\bigr) + (1 - \delta_n) \ln\bigl(F_v(-X_n\gamma; \psi)\bigr) \right]. \qquad (2.1.34)$$
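The log likelihood (2.1.34) is straightforward to evaluate once a distribution for V is chosen. The following sketch assumes that $F_v$ is the standard normal distribution function, so that there are no additional parameters $\psi$ and the first step is an ordinary probit. The function and argument names are hypothetical, and a numerical optimizer (not shown) would be needed to maximize the function over $\gamma$.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(gamma, rows, deltas):
    """Value of (2.1.34) when F_v is the standard normal c.d.f.

    rows[n] is the regressor vector X_n; deltas[n] is 1 if Y is observed
    for observation n and 0 otherwise.
    """
    total = 0.0
    for x, d in zip(rows, deltas):
        index = sum(g * xi for g, xi in zip(gamma, x))
        f_neg = normal_cdf(-index)   # F_v(-X_n gamma) = Prob(delta_n = 0 | X_n)
        total += d * math.log(1.0 - f_neg) + (1 - d) * math.log(f_neg)
    return total
```

At $\gamma = 0$ every observation contributes $\ln 0.5$, which gives a convenient sanity check on any implementation.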
Under general conditions [see Amemiya (1985) for one statement of these conditions], maximum likelihood estimators of $\gamma$ and $\psi$ are consistent, and with maximum likelihood estimates $\hat\theta_2$ one can construct $\hat m_n = m(X_n\hat\gamma, \hat\psi)$ for each observation. In step two of the proposed estimation procedure, replace the unobserved variable m in regression eq. (2.1.29) by its constructed counterpart $\hat m$ and apply linear least squares to the resulting equation using only data from the subsample in which Y and X are observed. Provided that the model is identified, the second step produces estimators for the parameters $\theta_1' = (\beta', \tau')$ that are both consistent and asymptotically normally distributed.
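The second step can be sketched as follows, taking the first-step estimate of $\gamma$ as given. The sketch assumes V is standard normal, so that $\hat m_n = \phi(Z_n\hat\gamma)/\Phi(Z_n\hat\gamma)$, and it allows the selection regressors (`z_rows`, a hypothetical name) to differ from those in the outcome equation (`x_rows`). It produces point estimates only; the least squares standard errors it would imply are not valid without a correction for heteroscedasticity and for the estimated regressor.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def second_step_ols(x_rows, z_rows, ys, deltas, gamma_hat):
    """Regress Y on (X, m_hat) using only observations with delta = 1.

    Returns the coefficients on X followed by the coefficient on m_hat.
    """
    design, target = [], []
    for x, z, y, d in zip(x_rows, z_rows, ys, deltas):
        if d == 1:
            index = sum(g * zi for g, zi in zip(gamma_hat, z))
            m_hat = normal_pdf(index) / normal_cdf(index)
            design.append(list(x) + [m_hat])
            target.append(y)
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(k)]
    return solve(xtx, xty)
```

Including at least one variable in the selection equation that is excluded from the outcome equation is a common way to avoid relying on the nonlinearity of $\hat m$ alone for identification.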
When calculating the appropriate large-sample covariance matrix for the least squares estimator $\hat\theta_1$, one must account for the fact that in general the disturbances of the regression equation are heteroscedastic and that the variables $\hat m$ are estimated quantities. A consistent estimator for the covariance matrix which accounts for both of these features is given by
where $Q_4$ is the covariance matrix for $\hat\theta_2$ estimated by maximum likelihood [minus the inverse of the Hessian matrix of (2.1.34)], and the matrices $Q_1$, $Q_2$, and $Q_3$ are defined by
14 To derive the expression for the matrix C given by (2.1.35) we use the following result. Let $L_n = L(\delta_n, X_n)$ denote the nth observation on the gradient of the likelihood function (2.1.34) with respect to $\theta_2$, with this gradient viewed as a function of the data and the true value of $\theta_2$; and let $w_{0n}$ and $e_{0n}$ be $w_n$ and $e_n$ evaluated at the true parameter values. Then $E(w_{0n} e_{0n} L_n' \mid \delta_n = 1, X_n) =$
The large-sample distribution for the two-step estimator is thus
2.2 Dummy endogenous variable models
One specialization of the general model presented in Section 2.1 is of special importance in labor economics. The multi-state equation system (2.1.20)-(2.1.22) is at the heart of a variety of models of the impact of unions, training, occupational choice, schooling, the choice of region of residence and the choice of industry on wages. These models have attracted considerable attention in the recent literature.
This section considers certain aspects of model formulation for this class of models. Simple consistent estimators are presented for an empirically interesting subclass of these models. These estimators require fewer assumptions than are required for distribution-dependent maximum likelihood methods or for the sample selection bias corrections (M functions) discussed in Section 2.1.
In order to focus on essential ideas, we consider a two-equation, two-state model with a single scalar dummy right-hand side variable that can assume two values. Y is assumed to be observed in both states so that we also abstract from censoring. Generalization of this model to the vector case is performed in Heckman (1976a, 1978, Appendix), Schmidt (1981), and Lee (1981).
2.2.1 Specification of a two-equation system
Two versions of the dummy endogenous variable model are commonly confused in the literature: fixed coefficient models and random coefficient models. These specifications should be carefully distinguished because different assumptions are required to consistently estimate the parameters of these two distinct models. The fixed coefficient model requires fewer assumptions.
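In the fixed coefficient model
$$Y = X\beta + \delta\alpha + U, \qquad (2.2.1)$$
$$Z = X\gamma + V,$$
where
$$\delta = \begin{cases} 1 & \text{if } Z \ge 0, \\ 0 & \text{otherwise}, \end{cases} \qquad (2.2.2)$$
U and V are mean zero random disturbances, and X is exogenous with respect to U. Simultaneous equation bias is present in (2.2.1) when U is correlated with $\delta$.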
In the random coefficient model the effect of $\delta$ on Y (holding U fixed) varies in the population. In place of (2.2.1) we write
where $\varepsilon$ is a mean zero error term.15 Equation (2.2.2) is unchanged except now V may be correlated with $\varepsilon$ as well as U. The response to $\delta = 1$ differs in the population, with successively sampled observations assumed to be random draws from a common distribution for $(U, \varepsilon, V)$. In this model X is assumed to be exogenous with respect to $(U, \varepsilon)$. Regrouping terms, specification (2.2.3) may be rewritten as
Unless $\delta$ is uncorrelated with $\varepsilon$ (which occurs in some interesting economic models; see Section 3.2) the expectation of the composite error term $U + \varepsilon\delta$ in (2.2.4) is nonzero because $E(\varepsilon\delta) \ne 0$. This aspect of the random coefficient model makes its econometric analysis fundamentally different from the econometric analysis of the fixed coefficient model. Simultaneous equations bias is present in the random coefficient model if the composite error term in (2.2.4) is correlated with $\delta$.
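The size of the nonzero mean can be made explicit in a simple special case. The sketch below assumes that V is standard normal, that $\delta = 1$ when $V \ge -X\gamma$, and that $E(\varepsilon \mid V) = \sigma_{\varepsilon v} V$ (as holds under joint normality); under these assumptions $E(\varepsilon\delta) = \sigma_{\varepsilon v}\,\phi(X\gamma)$. The function names are hypothetical, and the midpoint-rule integral is included only as a numerical cross-check of the closed form.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def mean_eps_delta(index, cov_eps_v):
    """Closed form for E(eps * delta); index stands for X*gamma."""
    return cov_eps_v * normal_pdf(index)


def mean_eps_delta_numeric(index, cov_eps_v, steps=20000, upper=10.0):
    """Midpoint-rule integral of E(eps | V = v) * phi(v) over v >= -index."""
    lower = -index
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        v = lower + (k + 0.5) * width
        total += cov_eps_v * v * normal_pdf(v) * width
    return total
```

The expectation is zero only when $\varepsilon$ and V are uncorrelated, which is the case singled out in the text as making $\delta$ uncorrelated with $\varepsilon$.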
Both the random coefficient model and the fixed coefficient model are special cases of the multi-state “switching” model presented in Section 2.1.3. Rewriting random coefficient specification (2.2.3) as
this equation is of the form of multi-state eq. (2.1.22). The equivalence of (2.2.5) and (2.1.22) follows directly from specializing the multi-state framework so that: (i) $\delta_0 \equiv 0$ (so that there is no censoring and Y = Y*); (ii) I = 2 (which along with (i) implies that there are two states); (iii) $\delta = 1$ indicates the occurrence of state 1 and the events $\delta_1 = 1$ and $\delta_2 = 0$ (with $1 - \delta = 1$ indicating the realization of state 2); and (iv) $X\beta_2 = X\beta$, $U_2 = U$, $X\beta_1 = X\beta + \alpha$, and $U_1 = U + \varepsilon$. In this notation eq. (2.2.3) may be written as