This chapter continues our treatment of linear, unobserved effects panel data models. We first cover estimation of models where the strict exogeneity Assumption FE.1 fails but sequential moment conditions hold. A simple approach to consistent estimation involves differencing combined with instrumental variables methods. We also cover models with individual slopes, where unobservables can interact with explanatory variables, and models where some of the explanatory variables are assumed to be orthogonal to the unobserved effect while others are not.
The final section in this chapter briefly covers some non-panel-data settings where unobserved effects models and panel data estimation methods can be used.
11.1 Unobserved Effects Models without the Strict Exogeneity Assumption
11.1.1 Models under Sequential Moment Restrictions
In Chapter 10 all the estimation methods we studied assumed that the explanatory variables were strictly exogenous (conditional on an unobserved effect in the case of fixed effects and first differencing). As we saw in the examples in Section 10.2.3, strict exogeneity rules out certain kinds of feedback from $y_{it}$ to future values of $x_{it}$. Generally, random effects, fixed effects, and first differencing are inconsistent if an explanatory variable in some time period is correlated with $u_{it}$. While the size of the inconsistency might be small (something we will investigate further), in other cases it can be substantial. Therefore, we should have general ways of obtaining consistent estimators as $N \to \infty$ with $T$ fixed when the explanatory variables are not strictly exogenous.
The model of interest can still be written as
$$y_{it} = x_{it}\beta + c_i + u_{it}, \qquad t = 1, 2, \ldots, T \tag{11.1}$$
but, in addition to allowing $c_i$ and $x_{it}$ to be arbitrarily correlated, we now allow $u_{it}$ to be correlated with future values of the explanatory variables, $(x_{i,t+1}, x_{i,t+2}, \ldots, x_{iT})$. We saw in Example 10.3 that $u_{it}$ and $x_{i,t+1}$ must be correlated because $x_{i,t+1} = y_{it}$. Nevertheless, there are many models, including the AR(1) model, for which it is reasonable to assume that $u_{it}$ is uncorrelated with current and past values of $x_{it}$. Following Chamberlain (1992b), we introduce sequential moment restrictions:
$$E(u_{it} \mid x_{it}, x_{i,t-1}, \ldots, x_{i1}, c_i) = 0, \qquad t = 1, 2, \ldots, T \tag{11.2}$$
When assumption (11.2) holds, we will say that the $x_{it}$ are sequentially exogenous conditional on the unobserved effect.
Given model (11.1), assumption (11.2) is equivalent to
$$E(y_{it} \mid x_{it}, x_{i,t-1}, \ldots, x_{i1}, c_i) = E(y_{it} \mid x_{it}, c_i) = x_{it}\beta + c_i \tag{11.3}$$
which makes it clear what sequential exogeneity implies about the explanatory variables: after $x_{it}$ and $c_i$ have been controlled for, no past values of $x_{it}$ affect the expected value of $y_{it}$. This condition is more natural than the strict exogeneity assumption, which requires conditioning on future values of $x_{it}$ as well.
Example 11.1 (Dynamic Unobserved Effects Model): An AR(1) model with additional explanatory variables is
$$y_{it} = z_{it}\gamma + \rho_1 y_{i,t-1} + c_i + u_{it} \tag{11.4}$$
and so $x_{it} \equiv (z_{it}, y_{i,t-1})$. Therefore, $(x_{it}, x_{i,t-1}, \ldots, x_{i1}) = (z_{it}, y_{i,t-1}, z_{i,t-1}, \ldots, z_{i1}, y_{i0})$, and the sequential exogeneity assumption (11.3) requires
$$E(y_{it} \mid z_{it}, y_{i,t-1}, z_{i,t-1}, \ldots, z_{i1}, y_{i0}, c_i) = E(y_{it} \mid z_{it}, y_{i,t-1}, c_i) = z_{it}\gamma + \rho_1 y_{i,t-1} + c_i \tag{11.5}$$
In this example, assumption (11.5) is an example of dynamic completeness conditional on $c_i$; we covered the unconditional version of dynamic completeness in Section 7.8.2. It means that one lag of $y_{it}$ is sufficient to capture the dynamics in the conditional expectation; neither further lags of $y_{it}$ nor lags of $z_{it}$ are important once $(z_{it}, y_{i,t-1}, c_i)$ have been controlled for. In general, if $x_{it}$ contains $y_{i,t-1}$, then assumption (11.3) implies dynamic completeness conditional on $c_i$.
Assumption (11.3) does not require that $z_{i,t+1}, \ldots, z_{iT}$ be uncorrelated with $u_{it}$, so that feedback is allowed from $y_{it}$ to $(z_{i,t+1}, \ldots, z_{iT})$. If we think that $z_{is}$ is uncorrelated with $u_{it}$ for all $s$, then additional orthogonality conditions can be used. Finally, we do not need to restrict the value of $\rho_1$ in any way because we are doing fixed-$T$ asymptotics; the arguments from Section 7.8.3 are also valid here.
Example 11.2 (Static Model with Feedback): Consider a static panel data model
$$y_{it} = z_{it}\gamma + \delta w_{it} + c_i + u_{it} \tag{11.6}$$
where $z_{it}$ is strictly exogenous and $w_{it}$ is sequentially exogenous:
$$E(u_{it} \mid z_i, w_{it}, w_{i,t-1}, \ldots, w_{i1}, c_i) = 0 \tag{11.7}$$
However, $w_{it}$ is influenced by past $y_{it}$, as in this case:
$$w_{it} = z_{it}\xi + \rho_1 y_{i,t-1} + \psi c_i + r_{it} \tag{11.8}$$
For example, let $y_{it}$ be per capita condom sales in city $i$ during year $t$, and let $w_{it}$ be the HIV infection rate for year $t$. Model (11.6) can be used to test whether condom usage is influenced by the spread of HIV. The unobserved effect $c_i$ contains city-specific unobserved factors that can affect sexual conduct, as well as the incidence of HIV. Equation (11.8) is one way of capturing the fact that the spread of HIV is influenced by past condom usage. Generally, if $E(r_{i,t+1}u_{it}) = 0$, it is easy to show that $E(w_{i,t+1}u_{it}) = \rho_1 E(y_{it}u_{it}) = \rho_1 E(u_{it}^2) > 0$ under equations (11.7) and (11.8), and so strict exogeneity fails unless $\rho_1 = 0$.
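The covariance stated at the end of Example 11.2 can be checked in two short steps. The sketch below assumes the static model takes the form $y_{it} = z_{it}\gamma + \delta w_{it} + c_i + u_{it}$ and uses only assumption (11.7), equation (11.8), and $E(r_{i,t+1}u_{it}) = 0$.

```latex
\begin{align*}
E(w_{i,t+1}u_{it})
  &= E(z_{i,t+1}u_{it})\xi + \rho_1 E(y_{it}u_{it}) + \psi E(c_i u_{it}) + E(r_{i,t+1}u_{it}) \\
  &= \rho_1 E(y_{it}u_{it})
  && \text{since, by (11.7), } u_{it} \text{ is uncorrelated with } z_i \text{ and } c_i, \\
E(y_{it}u_{it})
  &= E(z_{it}u_{it})\gamma + \delta E(w_{it}u_{it}) + E(c_i u_{it}) + E(u_{it}^2)
   = E(u_{it}^2)
  && \text{since, by (11.7), } u_{it} \text{ is also uncorrelated with } w_{it}.
\end{align*}
```

The product $\rho_1 E(u_{it}^2)$ is nonzero whenever $\rho_1 \neq 0$ (and positive when $\rho_1 > 0$), which is all that is needed for strict exogeneity to fail.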
Lagging variables that are thought to violate strict exogeneity can mitigate, but usually does not solve, the problem. Suppose we use $w_{i,t-1}$ in place of $w_{it}$ in equation (11.6) because we think $w_{it}$ might be correlated with $u_{it}$. For example, let $y_{it}$ be the percentage of flights canceled by airline $i$ during year $t$, and let $w_{i,t-1}$ be airline profits during the previous year. In this case $x_{i,t+1} = (z_{i,t+1}, w_{it})$, and so $x_{i,t+1}$ is correlated with $u_{it}$; this fact results in failure of strict exogeneity. In the airline example this issue may be important: poor airline performance this year (as measured by canceled flights) can affect profits in subsequent years. Nevertheless, the sequential exogeneity condition (11.2) is reasonable.
Keane and Runkle (1992) argue that panel data models for testing rational expectations using individual-level data generally do not satisfy the strict exogeneity requirement. But they do satisfy sequential exogeneity: in fact, in the conditioning set in assumption (11.2), we can include all variables observed at time $t - 1$.
What happens if we apply the standard fixed effects estimator when the strict exogeneity assumption fails? Generally,
$$\operatorname{plim}(\hat{\beta}_{FE}) = \beta + \left[ T^{-1} \sum_{t=1}^{T} E(\ddot{x}_{it}'\ddot{x}_{it}) \right]^{-1} \left[ T^{-1} \sum_{t=1}^{T} E(\ddot{x}_{it}'u_{it}) \right]$$
where $\ddot{x}_{it} = x_{it} - \bar{x}_i$, as in Chapter 10 ($i$ is a random draw from the cross section). Now, under sequential exogeneity, $E(\ddot{x}_{it}'u_{it}) = E[(x_{it} - \bar{x}_i)'u_{it}] = -E(\bar{x}_i'u_{it})$ because $E(x_{it}'u_{it}) = 0$, and so $T^{-1}\sum_{t=1}^{T} E(\ddot{x}_{it}'u_{it}) = -T^{-1}\sum_{t=1}^{T} E(\bar{x}_i'u_{it}) = -E(\bar{x}_i'\bar{u}_i)$. We can bound the size of the inconsistency as a function of $T$ if we assume that the time series process is appropriately stable and weakly dependent. Under such assumptions, $T^{-1}\sum_{t=1}^{T} E(\ddot{x}_{it}'\ddot{x}_{it})$ is bounded. Further, $\operatorname{Var}(\bar{x}_i)$ and $\operatorname{Var}(\bar{u}_i)$ are of order $T^{-1}$. By the Cauchy-Schwarz inequality (for example, Davidson, 1994, Chapter 9), $|E(\bar{x}_{ij}\bar{u}_i)| \le [\operatorname{Var}(\bar{x}_{ij})\operatorname{Var}(\bar{u}_i)]^{1/2} = O(T^{-1})$. Therefore, under bounded moments and weak dependence assumptions, the inconsistency from using fixed effects when the strict exogeneity assumption fails is of order $T^{-1}$. With large $T$ the bias may be minimal. See Hamilton (1994) and Wooldridge (1994) for general discussions of weak dependence for time series processes.
Hsiao (1986, Section 4.2) works out the inconsistency in the FE estimator for the AR(1) model. The key stability condition sufficient for the bias to be of order $T^{-1}$ is $|\rho_1| < 1$. However, for $\rho_1$ close to unity, the bias in the FE estimator can be sizable, even with fairly large $T$. Generally, if the process $\{x_{it}\}$ has very persistent elements (which is often the case in panel data sets), the FE estimator can have substantial bias.
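The $O(T^{-1})$ behavior of the FE inconsistency can be illustrated with a small Monte Carlo exercise. The following is only a sketch under assumptions that are not in the text: a pure AR(1) model with $\rho_1 = 0.5$, standard normal $c_i$ and $u_{it}$, and a burn-in period so that the process is roughly stationary. The function names are hypothetical.

```python
import numpy as np

def simulate_ar1_panel(n, t_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho*y_{i,t-1} + c_i + u_it.

    Returns an (n, t_periods + 1) array whose first column is y_i0.
    The normal errors and the burn-in length are illustrative assumptions.
    """
    c = rng.normal(size=n)
    y = c / (1.0 - rho) + rng.normal(size=n)  # start near the unit-specific mean
    out = np.empty((n, t_periods + 1))
    for s in range(burn_in + t_periods + 1):
        y = rho * y + c + rng.normal(size=n)
        if s >= burn_in:
            out[:, s - burn_in] = y
    return out

def fe_estimate(y):
    """Within (fixed effects) estimate of rho from an array of levels."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    y_lag_dm = y_lag - y_lag.mean(axis=1, keepdims=True)  # time-demean regressor
    y_cur_dm = y_cur - y_cur.mean(axis=1, keepdims=True)  # time-demean outcome
    return float((y_lag_dm * y_cur_dm).sum() / (y_lag_dm ** 2).sum())

rng = np.random.default_rng(0)
rho = 0.5
# Estimated FE bias for a short, a moderate, and a long panel (N = 5000).
bias_by_T = {
    T: fe_estimate(simulate_ar1_panel(5000, T, rho, rng)) - rho
    for T in (3, 10, 40)
}
print(bias_by_T)
```

In this design the FE estimate is biased toward zero for every $T$, and the bias shrinks as $T$ grows, consistent with the order-$T^{-1}$ result.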
If our choice were between fixed effects and first differencing, we would tend to prefer fixed effects because, when $T > 2$, FE can have less bias as $N \to \infty$. To see this point, write
$$\operatorname{plim}(\hat{\beta}_{FD}) = \beta + \left[ T^{-1} \sum_{t=1}^{T} E(\Delta x_{it}'\Delta x_{it}) \right]^{-1} \left[ T^{-1} \sum_{t=1}^{T} E(\Delta x_{it}'\Delta u_{it}) \right] \tag{11.9}$$
If $\{x_{it}\}$ is weakly dependent, so is $\{\Delta x_{it}\}$, and so the first average in equation (11.9) is bounded as a function of $T$. (In fact, under stationarity, this average does not depend on $T$.) Under assumption (11.2), we have
$$E(\Delta x_{it}'\Delta u_{it}) = E(x_{it}'u_{it}) + E(x_{i,t-1}'u_{i,t-1}) - E(x_{i,t-1}'u_{it}) - E(x_{it}'u_{i,t-1}) = -E(x_{it}'u_{i,t-1})$$
which is generally different from zero. Under stationarity, $E(x_{it}'u_{i,t-1})$ does not depend on $t$, and so the second average in equation (11.9) is constant. This result shows not only that the FD estimator is inconsistent, but also that its inconsistency does not depend on $T$. As we showed previously, the time demeaning underlying FE results in its bias being on the order of $T^{-1}$. But we should caution that this analysis assumes that the original series, $\{(x_{it}, y_{it}): t = 1, \ldots, T\}$, is weakly dependent. Without this assumption, the inconsistency in the FE estimator cannot be shown to be of order $T^{-1}$.
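To see how large the constant FD inconsistency can be, consider as an illustration the pure AR(1) model $y_{it} = \rho_1 y_{i,t-1} + c_i + u_{it}$, so that $x_{it} = y_{i,t-1}$. The sketch assumes, beyond what the text states, that $|\rho_1| < 1$, that the process is stationary, and that $u_{it}$ is serially uncorrelated with constant variance $\sigma_u^2$.

```latex
\begin{align*}
E(\Delta x_{it}\,\Delta u_{it}) &= -E(x_{it}u_{i,t-1}) = -E(y_{i,t-1}u_{i,t-1}) = -\sigma_u^2, \\
E[(\Delta x_{it})^2] &= \operatorname{Var}(\Delta y_{i,t-1}) = \frac{2\sigma_u^2}{1+\rho_1}, \\
\operatorname{plim}(\hat{\rho}_{1,FD}) &= \rho_1 - \frac{\sigma_u^2}{2\sigma_u^2/(1+\rho_1)}
  = \rho_1 - \frac{1+\rho_1}{2} = \frac{\rho_1 - 1}{2}.
\end{align*}
```

For example, with $\rho_1 = .5$ the probability limit of the FD estimator under these assumptions is $-.25$, whatever the value of $T$.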
If we make certain assumptions, we do not have to settle for estimators that are inconsistent with fixed $T$. A general approach to estimating equation (11.1) under assumption (11.2) is to use a transformation to remove $c_i$, but then search for instrumental variables. The FE transformation can be used provided that strictly exogenous instruments are available (see Problem 11.9). For models under sequential exogeneity assumptions, first differencing is more attractive.
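First differencing equation (11.1) gives
$$\Delta y_{it} = \Delta x_{it}\beta + \Delta u_{it}, \qquad t = 2, \ldots, T \tag{11.10}$$
Under assumption (11.2), $E(x_{is}'u_{it}) = 0$ for $s \le t$, which implies $E(x_{is}'\Delta u_{it}) = 0$ for $s \le t - 1$, so at time $t$ we can use $x_{i,t-1}^{o}$ as potential instruments for $\Delta x_{it}$, where
$$x_{it}^{o} \equiv (x_{i1}, x_{i2}, \ldots, x_{it}) \tag{11.13}$$
A simple choice uses $\Delta x_{i,t-1}$ as the instruments for $\Delta x_{it}$ in a pooled 2SLS procedure; with $K$ instruments for the $K$ elements of $\Delta x_{it}$, the equation is just identified.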
Rather than use changes in lagged $x_{it}$ as instruments, we can use lagged levels of $x_{it}$. For example, choosing $(x_{i,t-1}, x_{i,t-2})$ as instruments at time $t$ is no less efficient than the procedure that uses $\Delta x_{i,t-1}$, as the latter is a linear combination of the former. It also gives $K$ overidentifying restrictions that can be used to test assumption (11.2). (There will be fewer than $K$ if $x_{it}$ contains time dummies.)
When $T = 2$, $\beta$ may be poorly identified. The equation is $\Delta y_{i2} = \Delta x_{i2}\beta + \Delta u_{i2}$, and, under assumption (11.2), $x_{i1}$ is uncorrelated with $\Delta u_{i2}$. This is a cross section equation that can be estimated by 2SLS using $x_{i1}$ as instruments for $\Delta x_{i2}$. The estimator in this case may have a large asymptotic variance because the correlations between $x_{i1}$, the levels of the explanatory variables, and the differences $\Delta x_{i2} = x_{i2} - x_{i1}$ are often small. Of course, whether the correlation is sufficient to yield small enough standard errors depends on the application.
Even with large $T$, the available IVs may be poor in the sense that they are not highly correlated with $\Delta x_{it}$. As an example, consider the AR(1) model (11.4) without $z_{it}$: $y_{it} = \rho_1 y_{i,t-1} + c_i + u_{it}$, $E(u_{it} \mid y_{i,t-1}, \ldots, y_{i0}, c_i) = 0$, $t = 1, 2, \ldots, T$. Differencing to eliminate $c_i$ gives $\Delta y_{it} = \rho_1 \Delta y_{i,t-1} + \Delta u_{it}$, $t \ge 2$. At time $t$, all elements of $(y_{i,t-2}, \ldots, y_{i0})$ are IV candidates because $\Delta u_{it}$ is uncorrelated with $y_{i,t-h}$, $h \ge 2$. Anderson and Hsiao (1982) suggested pooled IV with instruments $y_{i,t-2}$ or $\Delta y_{i,t-2}$, whereas Arellano and Bond (1991) proposed using the entire set of instruments in a GMM procedure. Now, suppose that $\rho_1 = 1$ and, in fact, there is no unobserved effect. Then $\Delta y_{i,t-1}$ is uncorrelated with any variable dated at time $t - 2$ or earlier, and so the elements of $(y_{i,t-2}, \ldots, y_{i0})$ cannot be used as IVs for $\Delta y_{i,t-1}$. What this conclusion shows is that we cannot use IV methods to test $H_0\colon \rho_1 = 1$ in the absence of an unobserved effect.
Even if $\rho_1 < 1$, IVs from $(y_{i,t-2}, \ldots, y_{i0})$ tend to be weak if $\rho_1$ is close to one. Recently, Arellano and Bover (1995) and Ahn and Schmidt (1995) suggested additional orthogonality conditions that improve the efficiency of the GMM estimator, but these are nonlinear in the parameters. (In Chapter 14 we will see how to use these kinds of moment restrictions.) Blundell and Bond (1998) obtained additional linear moment restrictions in the levels equation $y_{it} = \rho_1 y_{i,t-1} + v_{it}$, $v_{it} = c_i + u_{it}$. The additional restrictions are based on $y_{i0}$ being drawn from a steady-state distribution, and they are especially helpful in improving the efficiency of GMM for $\rho_1$ close to one. (Actually, the Blundell-Bond orthogonality conditions are valid under weaker assumptions.) See also Hahn (1999). Of course, when $\rho_1 = 1$, it makes no sense to assume that there is a steady-state distribution. In Chapter 13 we cover conditional maximum likelihood methods that can be applied to the AR(1) model.
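The pooled IV idea attributed above to Anderson and Hsiao (1982) can be sketched on simulated data. This is a minimal illustration, not their implementation: the data-generating process (standard normal $c_i$ and $u_{it}$, $\rho_1 = 0.5$, a burn-in period) and all function names are assumptions made for the example. The instrument for $\Delta y_{i,t-1}$ is the level $y_{i,t-2}$.

```python
import numpy as np

def simulate_ar1_panel(n, t_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho*y_{i,t-1} + c_i + u_it; columns are y_i0, ..., y_iT."""
    c = rng.normal(size=n)
    y = c / (1.0 - rho) + rng.normal(size=n)
    out = np.empty((n, t_periods + 1))
    for s in range(burn_in + t_periods + 1):
        y = rho * y + c + rng.normal(size=n)
        if s >= burn_in:
            out[:, s - burn_in] = y
    return out

def fd_ols(y):
    """OLS on the differenced equation (inconsistent under sequential exogeneity)."""
    dy = np.diff(y, axis=1)
    dep, reg = dy[:, 1:], dy[:, :-1]
    return float((reg * dep).sum() / (reg ** 2).sum())

def fd_iv_lagged_level(y):
    """Pooled IV on Delta y_it = rho * Delta y_{i,t-1} + Delta u_it, t = 2..T,
    using the level y_{i,t-2} as the single instrument."""
    dy = np.diff(y, axis=1)   # column k holds y_{k+1} - y_k
    dep = dy[:, 1:]           # Delta y_it for t = 2..T
    reg = dy[:, :-1]          # Delta y_{i,t-1}
    inst = y[:, :-2]          # y_{i,t-2}
    return float((inst * dep).sum() / (inst * reg).sum())

rng = np.random.default_rng(42)
y = simulate_ar1_panel(n=20000, t_periods=6, rho=0.5, rng=rng)
ols_est = fd_ols(y)
iv_est = fd_iv_lagged_level(y)
print(f"FD-OLS: {ols_est:.3f}   FD-IV: {iv_est:.3f}   truth: 0.500")
```

Repeating the exercise with `rho` close to one (for instance 0.95) gives a rough sense of the weak-instrument problem discussed above, because the sample covariance between $y_{i,t-2}$ and $\Delta y_{i,t-1}$ then becomes small.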
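A general feature of pooled 2SLS procedures where the dimension of the IVs is constant across $t$ is that they do not use all the instruments available in each time period; therefore, they cannot be expected to be efficient. The optimal procedure is to use expression (11.13) as the instruments at time $t$ in a GMM procedure. Write the system of equations as
$$\Delta y_i = \Delta X_i \beta + \Delta u_i \tag{11.15}$$
using the matrix of instruments
$$Z_i = \begin{pmatrix} x_{i1}^{o} & 0 & \cdots & 0 \\ 0 & x_{i2}^{o} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & x_{i,T-1}^{o} \end{pmatrix} \tag{11.16}$$
where $x_{it}^{o}$ is defined in expression (11.13). Note that $Z_i$ has $T - 1$ rows to correspond with the $T - 1$ time periods in the system (11.15). Since each row contains different instruments, different instruments are used for different time periods.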
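The structure of $Z_i$ may be easier to see in code. The sketch below builds the instrument matrix for a single cross section unit, assuming that the row for period $t$ (for $t = 2, \ldots, T$) contains the lagged levels $x_{i1}, \ldots, x_{i,t-1}$ and zeros elsewhere. The function name and the array layout are hypothetical.

```python
import numpy as np

def build_instrument_matrix(x_i):
    """Build a block-diagonal instrument matrix for one unit.

    x_i : (T, K) array of levels; row 0 is period 1.
    Returns an array with T - 1 rows and K*T*(T-1)/2 columns. The row for
    period t = 2, ..., T holds (x_i1, ..., x_{i,t-1}) in its own block.
    """
    T, K = x_i.shape
    n_cols = K * T * (T - 1) // 2
    Z = np.zeros((T - 1, n_cols))
    col = 0
    for n_lags in range(1, T):            # number of lagged levels available
        block = x_i[:n_lags].ravel()      # (x_i1, ..., x_{i,n_lags})
        Z[n_lags - 1, col:col + block.size] = block
        col += block.size
    return Z

# Small example with T = 4 periods and K = 2 regressors.
x_example = np.arange(1.0, 9.0).reshape(4, 2)
Z_example = build_instrument_matrix(x_example)
print(Z_example.shape)
print(Z_example)
```

The column count grows with the square of $T$, which is one way to see why the full instrument set can become unwieldy in long panels.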
Efficient estimation of $\beta$ now proceeds in the GMM framework from Chapter 8 with instruments (11.16). Without further assumptions, the unrestricted weighting matrix should be used. In most applications there is a reasonable set of assumptions under which
$$E(Z_i'e_ie_i'Z_i) = E(Z_i'\Omega Z_i) \tag{11.17}$$
where $e_i \equiv \Delta u_i$ and $\Omega \equiv E(e_ie_i')$. Recall from Chapter 8 that assumption (11.17) is the assumption under which the GMM 3SLS estimator is the asymptotically efficient GMM estimator (see Assumption SIV.5). The full GMM analysis is not much more difficult. The traditional form of 3SLS estimator that first transforms the instruments should not be used because it is not consistent under assumption (11.2).
As a practical matter, the column dimension of $Z_i$ can be very large, making GMM estimation difficult. In addition, GMM estimators (including 2SLS and 3SLS) using many overidentifying restrictions are known to have poor finite sample properties (see, for example, Tauchen, 1986; Altonji and Segal, 1996; and Ziliak, 1997). In practice, it may be better to use a couple of lags rather than lags back to $t = 1$.
Example 11.3 (Testing for Persistence in County Crime Rates): We use the data in CORNWELL.RAW to test for state dependence in county crime rates, after allowing for unobserved county effects. Thus, the model is equation (11.4) with $y_{it} \equiv \log(crmrte_{it})$ but without any other explanatory variables. As instruments for $\Delta y_{i,t-1}$, we use $(y_{i,t-2}, y_{i,t-3})$. Further, so that we do not have to worry about correcting the standard error for possible serial correlation in $\Delta u_{it}$, we use just the 1986–1987 differenced equation. The $F$ statistic for joint significance of $y_{i,t-2}, y_{i,t-3}$ in the reduced form for $\Delta y_{i,t-1}$ yields $p$-value $= .023$, although the $R$-squared is only $.083$. The 2SLS estimates of the first-differenced equation are
$$\Delta\widehat{\log(crmrte)} = \underset{(.040)}{.065} + \underset{(.497)}{.212}\,\Delta\log(crmrte)_{-1}, \qquad N = 90$$
so that we cannot reject $H_0\colon \rho_1 = 0$ ($t = .427$).
11.1.2 Models with Strictly and Sequentially Exogenous Explanatory Variables
Estimating models with both strictly exogenous and sequentially exogenous variables is not difficult. For $t = 1, 2, \ldots, T$, suppose that
$$y_{it} = z_{it}\gamma + w_{it}\delta + c_i + u_{it} \tag{11.18}$$
Assume that $z_{is}$ is uncorrelated with $u_{it}$ for all $s$ and $t$, but that $u_{it}$ is uncorrelated with $w_{is}$ only for $s \le t$; sufficient is $E(u_{it} \mid z_i, w_{it}, w_{i,t-1}, \ldots, w_{i1}) = 0$. This model covers many cases of interest, including when $w_{it}$ contains a lagged dependent variable.
After first differencing we have
$$\Delta y_{it} = \Delta z_{it}\gamma + \Delta w_{it}\delta + \Delta u_{it} \tag{11.19}$$
and the instruments available at time $t$ are $(z_i, w_{i,t-1}, \ldots, w_{i1})$. In practice, so that there are not so many overidentifying restrictions, we might replace $z_i$ with $\Delta z_{it}$ and choose something like $(\Delta z_{it}, w_{i,t-1}, w_{i,t-2})$ as the instruments at time $t$. Or, $z_{it}$ and a couple of lags of $z_{it}$ can be used. In the AR(1) model (11.4), this approach would mean something like $(z_{it}, z_{i,t-1}, z_{i,t-2}, y_{i,t-2}, y_{i,t-3})$. We can even use leads of $z_{it}$, such as $z_{i,t+1}$, when $z_{it}$ is strictly exogenous. Such choices are amenable to a pooled 2SLS
procedure to estimate $\gamma$ and $\delta$. Of course, whether or not the usual 2SLS standard errors are valid depends on serial correlation and variance properties of $\Delta u_{it}$. Nevertheless, assuming that the changes in the errors are (conditionally) homoskedastic and serially uncorrelated is a reasonable start.
Example 11.4 (Effects of Enterprise Zones): Papke (1994) uses several different panel data models to determine the effect of enterprise zone designation on economic outcomes for 22 communities in Indiana. One model she uses is
$$y_{it} = \theta_t + \rho_1 y_{i,t-1} + \delta_1 ez_{it} + c_i + u_{it} \tag{11.20}$$
where $y_{it}$ is the log of unemployment claims. The coefficient of interest is on the binary indicator $ez_{it}$, which is unity if community $i$ in year $t$ was designated as an enterprise zone. The model holds for the years 1981 to 1988, with $y_{i0}$ corresponding to 1980, the first year of data. Differencing gives
$$\Delta y_{it} = \xi_t + \rho_1 \Delta y_{i,t-1} + \delta_1 \Delta ez_{it} + \Delta u_{it} \tag{11.21}$$
The differenced equation has new time intercepts, but as we are not particularly interested in these, we just include year dummies in equation (11.21).
Papke estimates equation (11.21) by 2SLS, using $\Delta y_{i,t-2}$ as an instrument for $\Delta y_{i,t-1}$; because of the lags used, equation (11.21) can be estimated for six years of data. The enterprise zone indicator is assumed to be strictly exogenous in equation (11.20), and so $\Delta ez_{it}$ acts as its own instrument. Strict exogeneity of $ez_{it}$ is valid because, over the years in question, each community was a zone in every year following initial designation: future zone designation did not depend on past performance. The estimated equation in first differences is
$$\Delta\widehat{\log(uclms)} = \hat{\xi}_t + \underset{(.288)}{.165}\,\Delta\log(uclms)_{-1} - \underset{(.106)}{.219}\,\Delta ez$$
where the intercept and year dummies are suppressed for brevity. Based on the usual pooled 2SLS standard errors, $\hat{\rho}_1$ is not significant (or practically very large), while $\hat{\delta}_1$ is economically large and statistically significant at the 5 percent level.
If the $u_{it}$ in equation (11.20) are serially uncorrelated, then, as we saw in Chapter 10, $\Delta u_{it}$ must be serially correlated. Papke found no important differences when the standard error for $\hat{\delta}_1$ was adjusted for serial correlation and heteroskedasticity.
In the pure AR(1) model, using lags of $y_{it}$ as an instrument for $\Delta y_{i,t-1}$ means that we are assuming the AR(1) model captures all of the dynamics. If further lags of $y_{it}$ are added to the structural model, then we must go back even further to obtain instruments. If strictly exogenous variables appear in the model along with $y_{i,t-1}$, such as in equation (11.4), then lags of $z_{it}$ are good candidates as instruments for $\Delta y_{i,t-1}$. Much of the time inclusion of $y_{i,t-1}$ (or additional lags) in a model with other explanatory variables is intended to simply control for another source of omitted variables bias; Example 11.4 falls into this class.
Things are even trickier in finite distributed lag models. Consider the patents-R&D model of Example 10.2: after first differencing, we have [...]
This approach identifies the parameters under the assumptions made, but it is problematic. What if we have the distributed lag dynamics wrong, so that six lags, rather than five, belong in the structural model? Then choosing additional lags of $RD_{it}$ as instruments fails. If $\Delta RD_{it}$ is sufficiently correlated with the elements of $z_{is}$ for some $s$, then using all of $z_i$ as instruments can help. Generally, some exogenous factors either in $z_{it}$ or from outside the structural equation are needed for a convincing analysis.
11.1.3 Models with Contemporaneous Correlation between Some Explanatory Variables and the Idiosyncratic Error
Consider again model (11.18), where $z_{it}$ is strictly exogenous in the sense that
$$E(z_{is}'u_{it}) = 0, \qquad s, t = 1, \ldots, T \tag{11.23}$$
but where we allow $w_{it}$ to be contemporaneously correlated with $u_{it}$. This correlation can be due to any of the three problems that we studied earlier: omission of an important time-varying explanatory variable, measurement error in some elements of $w_{it}$, or simultaneity between $y_{it}$ and one or more elements of $w_{it}$. We assume that equation (11.18) is the equation of interest. In a simultaneous equations model with panel data, equation (11.18) represents a single equation. A system approach is also possible. See, for example, Baltagi (1981); Cornwell, Schmidt, and Wyhowski (1992); and Kinal and Lahiri (1993).
Example 11.5 (Effects of Smoking on Earnings): A panel data model to examine the effects of cigarette smoking on earnings is
$$\log(wage_{it}) = z_{it}\gamma + \delta_1 cigs_{it} + c_i + u_{it} \tag{11.24}$$
(For an empirical analysis, see Levine, Gustafson, and Velenchik, 1997.) As always, we would like to know the causal effect of smoking on hourly wage. For concreteness, assume $cigs_{it}$ is measured as average packs per day. This equation has a causal interpretation: holding fixed the factors in $z_{it}$ and $c_i$, what is the effect of an exogenous change in cigarette smoking on wages? Thus equation (11.24) is a structural equation.
The presence of the individual heterogeneity, $c_i$, in equation (11.24) recognizes that cigarette smoking might be correlated with individual characteristics that also affect wage. An additional problem is that $cigs_{it}$ might also be correlated with $u_{it}$, something we have not allowed so far. In this example the correlation could be from a variety of sources, but simultaneity is one possibility: if cigarettes are a normal good, then, as income increases (holding everything else fixed), cigarette consumption increases. Therefore, we might add another equation to equation (11.24) that reflects that $cigs_{it}$ may depend on income, which clearly depends on wage. If equation (11.24) is of interest, we do not need to add equations explicitly, but we must find some instrumental variables.
To get an estimable model, we must first deal with the presence of $c_i$, since it might be correlated with $z_{it}$ as well as $cigs_{it}$. In the general model (11.18), either the FE or FD transformations can be used to eliminate $c_i$ before addressing the correlation between $w_{it}$ and $u_{it}$. If we first difference, as in equation (11.19), we can use the entire vector $z_i$ as valid instruments in equation (11.19) because $z_{it}$ is strictly exogenous. Neither $w_{it}$ nor $w_{i,t-1}$ is valid as instruments at time $t$, but it could be that $w_{i,t-2}$ is valid, provided we assume that $u_{it}$ is uncorrelated with $w_{is}$ for $s < t$. This assumption means that $w_{it}$ has only a contemporaneous effect on $y_{it}$, something that is likely to be false in Example 11.5. [If smoking affects wages, the effects are likely to be determined by prior smoking behavior as well as current smoking behavior. If we include a measure of past smoking behavior in equation (11.24), then this must act as its own instrument in a differenced equation, and so using $cigs_{is}$ for $s < t$ as IVs becomes untenable.]
Another thought is to use lagged values of $y_{it}$ as instruments, but this approach effectively rules out serial correlation in $u_{it}$. In the wage equation (11.24), it would mean that lagged wage does not predict current wage, once $c_i$ and the other variables are controlled for. If this assumption is false, using lags of $y_{it}$ is not a valid way of identifying the parameters.
If $z_i$ is the only valid set of instruments for equation (11.18), the analysis probably will not be convincing: it relies on $\Delta w_{it}$ being correlated with some linear combination of $z_i$ other than $\Delta z_{it}$. Such partial correlation is likely to be small, resulting in poor IV estimators; see Problem 11.2.
Perhaps the most convincing possibility for obtaining additional instruments is to follow the standard SEM approach from Chapter 9: use exclusion restrictions in the structural equations. For example, we can hope to find exogenous variables that do not appear in equation (11.24) but that do affect cigarette smoking. The local price of cigarettes (or level of cigarette taxes) is one possibility. Such variables can usually be considered strictly exogenous, unless we think people change their residence based on the price of cigarettes.
If we difference equation (11.24) we get
$$\Delta\log(wage_{it}) = \Delta z_{it}\gamma + \delta_1 \Delta cigs_{it} + \Delta u_{it} \tag{11.25}$$
Now, for each $t$, we can study identification of this equation just as in the cross sectional case: we must first make sure the order condition holds, and then argue (or test) that the rank condition holds. Equation (11.25) can be estimated using a pooled 2SLS analysis, where corrections to standard errors and test statistics for heteroskedasticity or serial correlation might be warranted. With a large cross section, a GMM system procedure that exploits general heteroskedasticity and serial correlation in $\Delta u_{it}$ can be used instead.
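A pooled 2SLS estimate of a differenced equation with an excluded instrument can be sketched as follows. Everything in the simulation is an assumption made for illustration: the endogenous regressor `w` plays the role of $cigs_{it}$, the variable `price` stands in for an excluded, strictly exogenous instrument such as a local cigarette price, and the coefficient values are arbitrary. Standard errors are omitted; in practice they would need the robustness corrections mentioned above.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS coefficients: project X on the instruments Z, then regress y on the fit."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(1)
n, T, delta = 4000, 4, -0.5                  # hypothetical sizes and true effect

c = rng.normal(size=(n, 1))                  # unobserved effect c_i
price = rng.normal(size=(n, T))              # excluded, strictly exogenous instrument
u = rng.normal(size=(n, T))                  # idiosyncratic error u_it
# w is correlated with both c_i and u_it, so OLS in levels or differences fails.
w = -0.8 * price + c + 0.7 * u + rng.normal(size=(n, T))
y = delta * w + c + u

# First difference to remove c_i, then pool all (i, t) pairs.
dy = np.diff(y, axis=1).ravel()
dw = np.diff(w, axis=1).ravel()
dp = np.diff(price, axis=1).ravel()
const = np.ones_like(dw)

X = np.column_stack([const, dw])             # regressors in the differenced equation
Z = np.column_stack([const, dp])             # instruments: constant and differenced price

ols_delta = np.linalg.lstsq(X, dy, rcond=None)[0][1]
iv_delta = two_sls(dy, X, Z)[1]
print(f"FD-OLS: {ols_delta:.3f}   FD-2SLS: {iv_delta:.3f}   truth: {delta}")
```

In this design differencing removes $c_i$, but the FD-OLS estimate remains far from the true value because $\Delta w_{it}$ is still correlated with $\Delta u_{it}$; only the IV step addresses that second source of endogeneity.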
Example 11.6 (Effects of Prison Population on Crime Rates): In order to estimate the causal effect of prison population increases on crime rates at the state level, Levitt (1996) uses instances of prison overcrowding litigation as instruments for the growth in prison population. The equation Levitt estimates is in first differences. We can write an underlying unobserved effects model as
$$\log(crime_{it}) = \theta_t + \beta_1 \log(prison_{it}) + x_{it}\gamma + c_i + u_{it} \tag{11.26}$$
where $\theta_t$ denotes different time intercepts and $crime$ and $prison$ are measured per 100,000 people. (The prison population variable is measured on the last day of the previous year.) The vector $x_{it}$ contains other controls listed in Levitt, including measures of police per capita, income per capita, unemployment rate, and race, metropolitan, and age distribution proportions.
Differencing equation (11.26) gives the equation estimated by Levitt:
$$\Delta\log(crime_{it}) = \xi_t + \beta_1 \Delta\log(prison_{it}) + \Delta x_{it}\gamma + \Delta u_{it} \tag{11.27}$$
Simultaneity between crime rates and prison population, or, more precisely, in the growth rates, makes OLS estimation of equation (11.27) generally inconsistent. Using the violent crime rate and a subset of the data from Levitt (in PRISON.RAW, for the years 1980 to 1993, for $51 \cdot 14 = 714$ total observations), the OLS estimate of $\beta_1$ is $-.181$ (se $= .048$). We also estimate the equation by 2SLS, where the instruments for $\Delta\log(prison)$ are two binary variables, one for whether a final decision was reached on overcrowding litigation in the current year and one for whether a final decision was reached in the previous two years. The 2SLS estimate of $\beta_1$ is $-1.032$ (se $= .370$). Therefore, the 2SLS estimated effect is much larger; not surprisingly, it is much less precise, too. Levitt (1996) found similar results when using a longer time period and more instruments.
A different approach to estimating SEMs with panel data is to use the fixed effects transformation and then to apply an IV technique such as pooled 2SLS. A simple procedure is to estimate the time-demeaned equation (10.46) by pooled 2SLS, where the instruments are also time demeaned. This is equivalent to using 2SLS in the dummy variable formulation, where the unit-specific dummy variables act as their own instruments. See Problem 11.9 for a careful analysis of this approach. Foster and Rosenzweig (1995) use the within transformation along with IV to estimate household-level profit functions for adoption of high-yielding seed varieties in rural India. Ayres and Levitt (1998) apply 2SLS to a time-demeaned equation to estimate the effect of Lojack electronic theft prevention devices on city car-theft rates.
The FE transformation precludes the use of lagged values of $w_{it}$ among the instruments, for essentially the same reasons discussed for models with sequentially exogenous explanatory variables: $u_{it}$ will be correlated with the time-demeaned instruments. Therefore, if we make assumptions on the dynamics in the model that ensure that $u_{it}$ is uncorrelated with $w_{is}$, $s < t$, differencing is preferred in order to use the extra instruments.
Differencing or time demeaning followed by some sort of IV procedure is useful when $u_{it}$ contains an important, time-varying omitted variable that is correlated with $w_{it}$. The same considerations for choosing instruments in the simultaneity context are relevant in the omitted variables case as well. In some cases, $w_{is}$, $s < t - 1$, can be used as instruments at time $t$ in a first-differenced equation (11.18); in other cases, we might not want identification to hinge on using lagged explanatory variables as IVs. For example, suppose that we wish to study the effects of per student spending on test scores, using three years of data, say 1980, 1985, and 1990. A structural model at the school level is
$$avgscore_{it} = \theta_t + z_{it}\gamma + \delta_1 spending_{it} + c_i + u_{it} \tag{11.28}$$
where $z_{it}$ contains other school and student characteristics. In addition to worrying about the school fixed effect $c_i$, $u_{it}$ contains average family income for school $i$ at time $t$ (unless we are able to collect data on income); average family income is likely to be correlated with $spending_{it}$. After differencing away $c_i$, we need an instrument for $\Delta spending_{it}$. One possibility is to use exogenous changes in property taxes that arose because of an unexpected change in the tax laws. [Such changes occurred in California in 1978 (Proposition 13) and in Michigan in 1994 (Proposal A).] Using lagged spending changes as IVs is probably not a good idea, as spending might affect test scores with a lag.
The third form of endogeneity, measurement error, can also be solved by eliminating $c_i$ and finding appropriate IVs. Measurement error in panel data was studied by Solon (1985) and Griliches and Hausman (1986). It is widely believed in econometrics that the differencing and FE transformations exacerbate measurement error bias (even though they eliminate heterogeneity bias). However, it is important to know that this conclusion rests on the classical errors-in-variables model under strict exogeneity, as well as on other assumptions.
To illustrate, consider a model with a single explanatory variable,
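$$y_{it} = \beta x_{it}^{*} + c_i + u_{it} \tag{11.29}$$
under the strict exogeneity assumption
$$E(u_{it} \mid x_i^{*}, x_i, c_i) = 0, \qquad t = 1, 2, \ldots, T \tag{11.30}$$
where $x_{it}$ denotes the observed measure of the unobservable $x_{it}^{*}$. Condition (11.30) embodies the standard redundancy condition (that $x_{it}$ does not matter once $x_{it}^{*}$ is controlled for) in addition to strict exogeneity of the unmeasured and measured regressors. Denote the measurement error as $r_{it} = x_{it} - x_{it}^{*}$. Assuming that $r_{it}$ is uncorrelated with $x_{it}^{*}$ (the key CEV assumption) and that variances and covariances are all constant across $t$, it is easily shown that, as $N \to \infty$, the plim of the pooled OLS estimator is
$$\beta + \frac{\operatorname{Cov}(x_{it}, c_i + u_{it} - \beta r_{it})}{\operatorname{Var}(x_{it})} = \beta + \frac{\operatorname{Cov}(x_{it}, c_i) - \beta\sigma_r^2}{\operatorname{Var}(x_{it})} \tag{11.31}$$
where $\sigma_r^2 = \operatorname{Var}(r_{it}) = \operatorname{Cov}(x_{it}, r_{it})$. If $x_{it}$ and $c_i$ are positively correlated and $\beta > 0$, the two sources of bias tend to cancel each other out.
Now assume that $r_{is}$ is uncorrelated with $x_{it}^{*}$ for all $t$ and $s$, and for simplicity suppose that $T = 2$. If we first difference to remove $c_i$ before performing OLS we obtain
$$\operatorname{plim}(\hat{\beta}_{FD}) = \beta - \beta\,\frac{\operatorname{Cov}(\Delta x_{it}, \Delta r_{it})}{\operatorname{Var}(\Delta x_{it})} = \beta\left[1 - \frac{\sigma_r^2(1 - \rho_r)}{\sigma_{x^*}^2(1 - \rho_{x^*}) + \sigma_r^2(1 - \rho_r)}\right] \tag{11.32}$$
where $\rho_{x^*} = \operatorname{Corr}(x_{it}^{*}, x_{i,t-1}^{*})$ and $\rho_r = \operatorname{Corr}(r_{it}, r_{i,t-1})$, where we have used the fact that $\operatorname{Cov}(r_{it}, r_{i,t-1}) = \sigma_r^2\rho_r$ and, similarly, $\operatorname{Cov}(x_{it}^{*}, x_{i,t-1}^{*}) = \sigma_{x^*}^2\rho_{x^*}$. Equation (11.32) shows that, in addition to the ratio $\sigma_r^2/\sigma_{x^*}^2$, the ratio $(1 - \rho_r)/(1 - \rho_{x^*})$ is also relevant: the attenuation from differencing is more severe the more persistent $x_{it}^{*}$ is relative to the measurement error.
Of course, we can never know whether the bias in equation (11.31) is larger than that in equation (11.32), or vice versa. Also, both expressions are based on the CEV assumptions, and then some. If there is little correlation between $\Delta x_{it}$ and $\Delta r_{it}$, the measurement error bias from first differencing may be small, but the small correlation is offset by the fact that differencing can considerably reduce the variation in the explanatory variables.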
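To get a feel for the magnitudes, the two probability limits can be evaluated for a few parameter values. The sketch below assumes the classical errors-in-variables setup with $T = 2$: the pooled OLS limit is $\beta + [\operatorname{Cov}(x_{it}, c_i) - \beta\sigma_r^2]/\operatorname{Var}(x_{it})$, and the first-difference limit is $\beta\{1 - \sigma_r^2(1-\rho_r)/[\sigma_{x^*}^2(1-\rho_{x^*}) + \sigma_r^2(1-\rho_r)]\}$. The function names and the numerical inputs are hypothetical.

```python
def pooled_ols_plim(beta, cov_xc, var_xstar, var_r):
    """Large-N limit of pooled OLS under CEV; Var(x) = Var(x*) + Var(r)."""
    var_x = var_xstar + var_r
    return beta + (cov_xc - beta * var_r) / var_x

def fd_plim(beta, var_xstar, var_r, rho_xstar, rho_r):
    """Large-N limit of first-difference OLS under CEV with T = 2."""
    signal = var_xstar * (1.0 - rho_xstar)   # half the variance of the change in x*
    noise = var_r * (1.0 - rho_r)            # half the variance of the change in r
    return beta * (1.0 - noise / (signal + noise))

beta = 1.0
# No heterogeneity bias (Cov(x, c) = 0): levels attenuation only.
levels_limit = pooled_ols_plim(beta, cov_xc=0.0, var_xstar=1.0, var_r=0.25)
# Persistent true regressor, serially uncorrelated measurement error.
fd_persistent = fd_plim(beta, var_xstar=1.0, var_r=0.25, rho_xstar=0.8, rho_r=0.0)
# Same setup, but the measurement error is as persistent as the true regressor.
fd_equal_persistence = fd_plim(beta, var_xstar=1.0, var_r=0.25, rho_xstar=0.8, rho_r=0.8)

print(levels_limit, fd_persistent, fd_equal_persistence)
```

With these hypothetical inputs, differencing worsens the attenuation when $x_{it}^{*}$ is persistent and the measurement error is not, while equal persistence in the two leaves the attenuation the same as in levels.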
Consistent estimation in the presence of measurement error is possible under certain assumptions. Consider the more general model
$$y_{it} = z_{it}\gamma + \delta w_{it}^{*} + c_i + u_{it}, \qquad t = 1, 2, \ldots, T \tag{11.33}$$
where $w_{it}^{*}$ is measured with error. Write $r_{it} = w_{it} - w_{it}^{*}$, and assume strict exogeneity along with redundancy of $w_{it}$:
$$E(u_{it} \mid z_i, w_i^{*}, w_i, c_i) = 0, \qquad t = 1, 2, \ldots, T \tag{11.34}$$
Replacing $w_{it}^{*}$ with $w_{it}$ and first differencing gives
$$\Delta y_{it} = \Delta z_{it}\gamma + \delta\Delta w_{it} + \Delta u_{it} - \delta\Delta r_{it} \tag{11.35}$$
The standard CEV assumption in the current context can be stated as
$$E(r_{it} \mid z_i, w_i^{*}, c_i) = 0, \qquad t = 1, 2, \ldots, T \tag{11.36}$$
which implies that $r_{it}$ is uncorrelated with $z_{is}$, $w_{is}^{*}$ for all $t$ and $s$. (As always in the context of linear models, assuming zero correlation is sufficient for consistency, but not for usual standard errors and test statistics to be valid.) Under assumption (11.36) (and other measurement error assumptions), $\Delta r_{it}$ is correlated with $\Delta w_{it}$. To apply an IV method to equation (11.35), we need at least one instrument for $\Delta w_{it}$. As in the omitted variables and simultaneity contexts, we may have additional variables outside the model that can be used as instruments. Analogous to the cross section case (as in Chapter 5), one possibility is to use another measure on $w_{it}^{*}$, say $h_{it}$. If the measurement error in $h_{it}$ is orthogonal to the measurement error in $w_{is}$, all $t$ and $s$, then $\Delta h_{it}$ is a natural instrument for $\Delta w_{it}$ in equation (11.35). Of course, we can use many more instruments in equation (11.35), as any linear combination of $z_i$ and $h_i$ is uncorrelated with the composite error under the given assumptions.
Alternatively, a vector of variables $h_{it}$ may exist that are known to be redundant in equation (11.33), strictly exogenous, and uncorrelated with $r_{is}$ for all $s$. If $\Delta h_{it}$ is correlated with $\Delta w_{it}$, then an IV procedure, such as pooled 2SLS, is easy to apply. It may be that applying something like pooled 2SLS to equation (11.35) results in asymptotically valid statistics; this imposes serial independence and homoskedasticity assumptions on $\Delta u_{it}$. Generally, however, it is a good idea to use standard errors and test statistics robust to arbitrary serial correlation and heteroskedasticity, or to use a full GMM approach that efficiently accounts for these. An alternative is to use the FE transformation, as explained in Problem 11.9. Ziliak, Wilson, and Stone (1999) find that, for a model explaining cyclicality of real wages, the FD and FE estimates are different in important ways. The differences largely disappear when IV methods are used to account for measurement error in the local unemployment rate.
So far, the solutions to measurement error in the context of panel data have assumed nothing about the serial correlation in $r_{it}$. Suppose that, in addition to assumption (11.34), we assume that the measurement error is serially uncorrelated:
$$ \mathrm{E}(r_{it} r_{is}) = 0, \qquad s \neq t \tag{11.37} $$
Assumption (11.37) opens up a solution to the measurement error problem with panel data that is not available with a single cross section or independently pooled cross sections. Under assumption (11.36), $r_{it}$ is uncorrelated with $w^*_{is}$ for all $t$ and $s$. Thus, if we assume that the measurement error $r_{it}$ is serially uncorrelated, then $r_{it}$ is uncorrelated with $w_{is}$ for all $t \neq s$. Since, by the strict exogeneity assumption, $\Delta u_{it}$ is uncorrelated with all leads and lags of $z_{it}$ and $w_{it}$, we have instruments readily available. For example, $w_{i,t-2}$ and $w_{i,t-3}$ are valid as instruments for $\Delta w_{it}$ in equation (11.35); so is $w_{i,t+1}$. Again, pooled 2SLS or some other IV procedure can be used once the list of instruments is specified for each time period. However, it is important to remember that this approach requires the $r_{it}$ to be serially uncorrelated, in addition to the other CEV assumptions.
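To see how lagged values of the mismeasured variable can serve as instruments, consider the following sketch. It rests on assumptions of our own choosing: no $z_{it}$, $\delta = 1$, a stationary AR(1) component in $w^*_{it}$ with coefficient 0.5 (some persistence in $w^*_{it}$ is needed for $w_{i,t-2}$ to be correlated with $\Delta w_{it}$), and serially uncorrelated measurement error. The function name `fd_iv_with_lagged_w` is hypothetical.

```python
import numpy as np

def fd_iv_with_lagged_w(N=50000, T=4, delta=1.0, rho=0.5, seed=1):
    """FD-OLS versus FD-IV using w_{i,t-2} as the instrument for dw_it.

    Illustrative design (an assumption, not from the text):
    w*_it = c_i + a_it with a_it a stationary AR(1), and r_it i.i.d. over t.
    """
    rng = np.random.default_rng(seed)
    c = rng.normal(size=N)
    a = np.empty((N, T))
    a[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - rho**2), size=N)  # stationary start
    for t in range(1, T):
        a[:, t] = rho * a[:, t - 1] + rng.normal(size=N)
    w_star = c[:, None] + a
    y = delta * w_star + c[:, None] + rng.normal(size=(N, T))
    w = w_star + rng.normal(size=(N, T))      # serially uncorrelated r_it
    # Keep differences for t = 3, ..., T so that w_{i,t-2} exists.
    dy = np.diff(y, axis=1)[:, 1:].ravel()
    dw = np.diff(w, axis=1)[:, 1:].ravel()
    z = w[:, :-2].ravel()                     # instrument w_{i,t-2}, aligned with dw_it
    ols = (dw @ dy) / (dw @ dw)
    iv = (z @ dy) / (z @ dw)
    return ols, iv

ols_hat, iv_hat = fd_iv_with_lagged_w()
print(f"FD-OLS: {ols_hat:.3f}, FD-IV with w_(t-2): {iv_hat:.3f}")
```

If the measurement error were serially correlated, $w_{i,t-2}$ would be correlated with $\Delta r_{it}$ and the IV estimate in this sketch would no longer be consistent, which is the caveat stressed above.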
The methods just covered for solving measurement error problems all assume strict exogeneity of all explanatory variables. Naturally, things get harder when measurement error is combined with models with only sequentially exogenous explanatory variables. Nevertheless, differencing away the unobserved effect and then selecting instruments, based on the maintained assumptions, generally works in models with a variety of problems.
11.1.4 Summary of Models without Strictly Exogenous Explanatory Variables
Before leaving this section, it is useful to summarize the general approach we have taken to estimate models that do not satisfy strict exogeneity: first, a transformation is used to eliminate the unobserved effect; next, instruments are chosen for the endogenous variables in the transformed equation. In the previous subsections we have stated various assumptions, but we have not catalogued them as in Chapter 10, largely because there are so many variants. For example, in Section 11.1.3 we saw that different assumptions lead to different sets of instruments. The importance of carefully stating assumptions, such as (11.2), (11.34), (11.36), and (11.37), cannot be overstated.
First differencing, which allows for more general violations of strict exogeneity than the within transformation, has an additional benefit: it is easy to test the first-differenced equation for serial correlation after pooled 2SLS estimation. The test suggested in Problem 8.10 is immediately applicable with the change in notation that all variables are in first differences. Arellano and Bond (1991) propose tests for serial correlation in the original errors, $\{u_{it}: t = 1, \ldots, T\}$; the tests are based on GMM estimation. When the original model has a lagged dependent variable, it makes more sense to test for serial correlation in $\{u_{it}\}$: models with lagged dependent variables are usually taken to have errors that are serially uncorrelated, in which case the first-differenced errors must be serially correlated. As Arellano and Bond point out, serial correlation in $\{u_{it}\}$ generally invalidates using lags of $y_{it}$ as IVs in the first-differenced equation. Of course, one might ask why we would be interested in $\rho_1$ in model (11.4) if $\{u_{it}\}$ is generally serially correlated.
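The statement that serially uncorrelated $\{u_{it}\}$ implies serially correlated first differences can be checked numerically. The sketch below regresses $\Delta u_{it}$ on $\Delta u_{i,t-1}$ using simulated true errors; this is an illustration of the underlying correlation, not the test from Problem 8.10 or the Arellano and Bond statistics, and in practice one would work with residuals rather than the unobserved errors. The function name `fd_ar1_coef` is hypothetical. With homoskedastic, serially uncorrelated $u_{it}$ the population slope is $-0.5$; if $u_{it}$ is a random walk, the first differences are serially uncorrelated and the slope is zero.

```python
import numpy as np

def fd_ar1_coef(u):
    """Slope from a pooled regression of du_it on du_{i,t-1} (no intercept).

    u is an N x T array of errors; rows are cross section units.
    """
    du = np.diff(u, axis=1)
    e_t, e_lag = du[:, 1:].ravel(), du[:, :-1].ravel()
    return (e_lag @ e_t) / (e_lag @ e_lag)

rng = np.random.default_rng(2)
shocks = rng.normal(size=(100_000, 5))
coef_white_noise = fd_ar1_coef(shocks)                     # u_it serially uncorrelated
coef_random_walk = fd_ar1_coef(np.cumsum(shocks, axis=1))  # u_it a random walk
print(f"white noise u: {coef_white_noise:.3f}, random walk u: {coef_random_walk:.3f}")
```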
11.2 Models with Individual-Specific Slopes
The unobserved effects models we have studied up to this point all have an additive unobserved effect that has the same partial effect on $y_{it}$ in all time periods. This assumption may be too strong for some applications. We now turn to models that allow for individual-specific slopes.
11.2.1 A Random Trend Model
Consider the following extension of the standard unobserved effects model:
$$ y_{it} = c_i + g_i t + x_{it}\beta + u_{it}, \qquad t = 1, 2, \ldots, T \tag{11.38} $$
This is sometimes called a random trend model, as each individual, firm, city, and so on is allowed to have its own time trend. The individual-specific trend is an additional source of heterogeneity. If $y_{it}$ is the natural log of a variable, as is often the case in economic studies, then $g_i$ is (roughly) the average growth rate over a period (holding the explanatory variables fixed). Then equation (11.38) is referred to as a random growth model; see, for example, Heckman and Hotz (1989).
In many applications of equation (11.38) we want to allow $(c_i, g_i)$ to be arbitrarily correlated with $x_{it}$. (Unfortunately, allowing this correlation makes the name "random trend model" conflict with our previous usage of random versus fixed effects.) For example, if one element of $x_{it}$ is an indicator of program participation, equation (11.38) allows program participation to depend on individual-specific trends (or growth rates) in addition to the level effect, $c_i$. We proceed without imposing restrictions on correlations among $(c_i, g_i, x_{it})$, so that our analysis is of the fixed effects variety. A random effects approach is also possible, but it is more cumbersome; see Problem 11.5.
For the random trend model, the strict exogeneity assumption on the explanatory variables is
$$ \mathrm{E}(u_{it} \mid x_{i1}, \ldots, x_{iT}, c_i, g_i) = 0, \qquad t = 1, 2, \ldots, T \tag{11.39} $$
One approach to estimating $\beta$ is to difference away $c_i$:
$$ \Delta y_{it} = g_i + \Delta x_{it}\beta + \Delta u_{it}, \qquad t = 2, 3, \ldots, T \tag{11.41} $$
where we have used the fact that $g_i t - g_i(t - 1) = g_i$. Now equation (11.41) is just the standard unobserved effects model we studied in Chapter 10. The key strict exogeneity assumption, $\mathrm{E}(\Delta u_{it} \mid g_i, \Delta x_{i2}, \ldots, \Delta x_{iT}) = 0$, $t = 2, 3, \ldots, T$, holds under assumption (11.39). Therefore, we can apply fixed effects or first-differencing methods to equation (11.41) in order to estimate $\beta$.
In differencing the equation to eliminate $c_i$ we lose one time period, so that equation (11.41) applies to $T - 1$ time periods. To apply FE or FD methods to equation (11.41) we must have $T - 1 \geq 2$, or $T \geq 3$. In other words, $\beta$ can be estimated consistently in the random trend model only if $T \geq 3$.
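The two-step idea, first differencing to remove $c_i$ and then applying the within transformation to equation (11.41) to remove $g_i$, can be sketched as follows. The design is an assumption made only for illustration: a scalar $x_{it}$ that is correlated with both $c_i$ and $g_i$, independent standard normal errors, $\beta = 1$, and $T = 5$. The function name `random_trend_estimates` is hypothetical. Pooled OLS on the first differences ignores $g_i$ and is inconsistent in this design (its probability limit is $\beta + 1/3$ here), while FE applied to the differenced equation is consistent.

```python
import numpy as np

def random_trend_estimates(N=20000, T=5, beta=1.0, seed=3):
    """Pooled OLS on first differences versus FE on first differences.

    Illustrative design (an assumption, not from the text):
    x_it = c_i + g_i * t + noise, so x_it is correlated with (c_i, g_i).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + 1)
    c = rng.normal(size=(N, 1))               # level effect c_i
    g = rng.normal(size=(N, 1))               # individual-specific trend g_i
    x = c + g * t + rng.normal(size=(N, T))
    y = c + g * t + beta * x + rng.normal(size=(N, T))   # model (11.38)
    dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)      # equation (11.41)
    # Pooled OLS on the differences: g_i is left in the error term.
    fd_pooled = (dx.ravel() @ dy.ravel()) / (dx.ravel() @ dx.ravel())
    # Within transformation on the differenced data sweeps out g_i.
    dy_w = dy - dy.mean(axis=1, keepdims=True)
    dx_w = dx - dx.mean(axis=1, keepdims=True)
    fe_on_fd = (dx_w.ravel() @ dy_w.ravel()) / (dx_w.ravel() @ dx_w.ravel())
    return fd_pooled, fe_on_fd

fd_hat, fe_hat = random_trend_estimates()
print(f"pooled OLS on FD: {fd_hat:.3f}, FE on FD: {fe_hat:.3f}")
```

With $T = 2$ the differenced data contain a single period per unit, the within transformation would zero out every observation, and the second estimator would be undefined, which is the $T \geq 3$ requirement in numerical form.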
Whether we prefer FE or FD estimation of equation (11.41) depends on the properties of $\{\Delta u_{it}: t = 2, 3, \ldots, T\}$. As we argued in Section 10.6, in some cases it is reasonable to assume that the first difference of $\{u_{it}\}$ is serially uncorrelated, in which case the FE method applied to equation (11.41) is attractive. If we make the assumption that the $u_{it}$ are serially uncorrelated and homoskedastic (conditional on $x_i$, $c_i$, $g_i$), then FE applied to equation (11.41) is still consistent and asymptotically normal, but not efficient. The next subsection covers that case explicitly.
Example 11.7 (Random Growth Model for Analyzing Enterprise Zones): Papke (1994) estimates a random growth model to examine the effects of enterprise zones on unemployment claims:
Friedberg (1998) provides an example, using state-level panel data on divorce rates and divorce laws, that shows how important it can be to allow for state-specific trends. Without state-specific trends, she finds no effect of unilateral divorce laws on divorce rates; with state-specific trends, the estimated effect is large and statistically significant. The estimation method Friedberg uses is the one we discuss in the next subsection.
In using the random trend or random growth model for program evaluation, it may make sense to allow the trend or growth rate to depend on program participation: in addition to shifting the level of $y$, program participation may also affect the rate of change. In addition to $prog_{it}$, we would include $prog_{it} \cdot t$ in the model:
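If $\{u_{it}\}$ contains substantial serial correlation (more than a random walk), then differencing equation (11.41) might be more attractive. Denote the second difference of $y_{it}$ by $\Delta^2 y_{it} \equiv \Delta y_{it} - \Delta y_{i,t-1}$, with similar expressions for $\Delta^2 x_{it}$ and $\Delta^2 u_{it}$. Differencing equation (11.41) eliminates $g_i$ and gives
$$ \Delta^2 y_{it} = \Delta^2 x_{it}\beta + \Delta^2 u_{it}, \qquad t = 3, \ldots, T $$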
When $T = 3$, second differencing is the same as first differencing and then applying fixed effects. Second differencing results in a single cross section on the second-differenced data, so that if the second-difference error is homoskedastic conditional on $x_i$, the standard OLS analysis on the cross section of second differences is appropriate. Hoxby (1996) uses this method to estimate the effect of teachers' unions on education production using three years of census data.
If $x_{it}$ contains a time trend, then $\Delta x_{it}$ contains the same constant for $t = 2, 3, \ldots, T$, which then gets swept away in the FE or FD transformation applied to equation (11.41). Therefore, $x_{it}$ cannot have time-constant variables or variables that have exact linear time trends for all cross section units.
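Returning to the $T = 3$ case, the equivalence between second differencing and first differencing followed by fixed effects is an algebraic identity and can be verified directly. The following sketch uses simulated data with a scalar $x_{it}$ and no intercept or time dummies in either regression (these choices, and the function name `second_diff_vs_fd_fe`, are assumptions made for illustration). With two differenced periods per unit, demeaning $\Delta y_{i2}$ and $\Delta y_{i3}$ within $i$ produces $\mp\Delta^2 y_{i3}/2$, so the two slope estimates coincide up to floating-point error.

```python
import numpy as np

def second_diff_vs_fd_fe(N=500, beta=1.0, seed=4):
    """Compare OLS on second differences with FE on first differences, T = 3."""
    rng = np.random.default_rng(seed)
    T = 3
    t = np.arange(1, T + 1)
    c = rng.normal(size=(N, 1))
    g = rng.normal(size=(N, 1))
    x = c + g * t + rng.normal(size=(N, T))
    y = c + g * t + beta * x + rng.normal(size=(N, T))
    # OLS on the single cross section of second differences.
    d2y = np.diff(y, n=2, axis=1).ravel()
    d2x = np.diff(x, n=2, axis=1).ravel()
    b_second_diff = (d2x @ d2y) / (d2x @ d2x)
    # First difference, then the within (FE) transformation over t = 2, 3.
    dy, dx = np.diff(y, axis=1), np.diff(x, axis=1)
    dy_w = (dy - dy.mean(axis=1, keepdims=True)).ravel()
    dx_w = (dx - dx.mean(axis=1, keepdims=True)).ravel()
    b_fd_then_fe = (dx_w @ dy_w) / (dx_w @ dx_w)
    return b_second_diff, b_fd_then_fe

b_sd, b_fdfe = second_diff_vs_fd_fe()
print(np.isclose(b_sd, b_fdfe))
```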
11.2.2 General Models with Individual-Specific Slopes
We now consider a more general model with interactions between time-varying explanatory variables and some unobservable, time-constant variables:
$$ y_{it} = z_{it}a_i + x_{it}\beta + u_{it}, \qquad t = 1, 2, \ldots, T \tag{11.43} $$
where $z_{it}$ is $1 \times J$, $a_i$ is $J \times 1$, $x_{it}$ is $1 \times K$, and $\beta$ is $K \times 1$. The standard unobserved effects model is a special case with $z_{it} \equiv 1$; the random trend model is a special case with $z_{it} = z_t = (1, t)$.
Equation (11.43) allows some time-constant unobserved heterogeneity, contained in the vector $a_i$, to interact with some of the observable explanatory variables. For example, suppose that $prog_{it}$ is a program participation indicator and $y_{it}$ is an outcome variable. The model
In the general model, we initially focus on estimating $\beta$ and then turn to estimation of $\alpha = \mathrm{E}(a_i)$, which is the vector of average partial effects for the covariates $z_{it}$. The strict exogeneity assumption is the natural extension of assumption (11.39):
Assumption FE.1′: $\mathrm{E}(u_{it} \mid z_i, x_i, a_i) = 0$, $t = 1, 2, \ldots, T$.
Along with equation (11.43), Assumption FE.1′ is equivalent to
$$ \mathrm{E}(y_{it} \mid z_{i1}, \ldots, z_{iT}, x_{i1}, \ldots, x_{iT}, a_i) = \mathrm{E}(y_{it} \mid z_{it}, x_{it}, a_i) = z_{it}a_i + x_{it}\beta $$
which says that, once $z_{it}$, $x_{it}$, and $a_i$ have been controlled for, $(z_{is}, x_{is})$ for $s \neq t$ do not help to explain $y_{it}$.
Define $Z_i$ as the $T \times J$ matrix with $t$th row $z_{it}$, and similarly for the $T \times K$ matrix $X_i$. Then equation (11.43) can be written as