Estimating Econometric Models with Fixed Effects


William Greene*

Department of Economics, Stern School of Business,

New York University, April, 2001

Abstract

The application of nonlinear fixed effects models in econometrics has often been avoided for two reasons, one methodological, one practical. The methodological question centers on an incidental parameters problem that raises questions about the statistical properties of the estimator. The practical one relates to the difficulty of estimating nonlinear models with possibly thousands of coefficients. This note will demonstrate that the second is, in fact, a nonissue, and that in a very large number of models of interest to practitioners, estimation of the fixed effects model is quite feasible even in panels with huge numbers of groups. The models are fully parametric, and all parameters of interest are estimable.

Keywords: Panel data, fixed effects, computation.

JEL classification: C1, C4

* 44 West 4th St., New York, NY 10012, USA, Telephone: 001-212-998-0876; fax: 01-212-995-4218; e-mail: wgreene@stern.nyu.edu, URL www.stern.nyu.edu/~wgreene. This paper has benefited from discussions with George Jakubson (who suggested one of the main results in this paper), Martin Spiess, and Scott Thompson and from seminar groups at The University of Texas, University of Illinois, and New York University. Any remaining errors are my own.


1 Introduction

The fixed effects model is a useful specification for accommodating individual heterogeneity in panel data. But, it has been problematic for two reasons. In most cases, the estimator is inconsistent owing to the incidental parameters problem. How serious this problem is in practical terms remains to be established - there is only a very small amount of received evidence - but the theoretical result is unambiguous. A second problem is purely practical. With current technology, the computation of the model parameters and appropriate standard errors, with all its nuisance parameters, appears to be impractical. This note focuses on the second of these, and shows that in a large number of interesting cases, the difficulty is only apparent. We will focus on a single result, computation of the estimator, and rely on some well known algebraic results to establish it. No formal statistical results are derived here or suggested with Monte Carlo results, as in general, the results are already known. The one statistical question noted above is left for further research.

The paper proceeds as follows. In Section 2, the general modeling framework is presented, departing from the linear model to more complicated specifications. The formal results for the estimator and computational procedures for obtaining appropriate standard errors are presented in Section 3. Section 4 suggests two possible new applications. Conclusions are drawn in Section 5.

2 Models with Fixed Effects

The linear regression model with fixed effects is

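$$y_{it} = \beta'\mathbf{x}_{it} + \alpha_i + \delta_t + \varepsilon_{it}, \quad t = 1,\ldots,T(i), \quad i = 1,\ldots,N,$$

$$E[\varepsilon_{it} \mid \mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT(i)}] = 0,$$

$$\mathrm{Var}[\varepsilon_{it} \mid \mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT(i)}] = \sigma^2.$$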

We have assumed the strictly exogenous regressors case in the conditional moments [see Wooldridge (1995)]. We have not assumed equal sized groups in the panel. The vector $\beta$ is a set of parameters of primary interest; $\alpha_i$ is the group specific heterogeneity. We have included time specific effects, but they are only tangential in what follows. Since the number of periods is usually fairly small, these can usually be accommodated simply by adding a set of time specific dummy variables to the model. Our interest here is in the case in which N is too large to do likewise for the group effects. For example, in analyzing census based data sets, N might number in the tens of thousands. The analysis of two way models, both fixed and random effects, has been well worked out in the linear case. [See, e.g., Baltagi (1995) and Baltagi, et al. (2001).] A full extension to the nonlinear models considered in this paper remains for further research.

The parameters of the linear model with fixed individual effects can be estimated by the 'least squares dummy variable' (LSDV) or 'within groups' estimator, which we denote $b_{LSDV}$. This is computed by least squares regression of $y_{it}^* = (y_{it} - \bar{y}_{i.})$ on the same transformation of $\mathbf{x}_{it}$, where the averages are group specific means. The individual specific dummy variable coefficients can be estimated using group specific averages of residuals. [See, e.g., Greene (2000, Chapter 14).] The slope parameters can also be estimated using simple first differences. Under the assumptions, $b_{LSDV}$ is a consistent estimator of $\beta$. However, the individual effects, $\alpha_i$, are each estimated with the T(i) group specific observations. Since T(i) might be small, and is, moreover, fixed, the estimator, $a_{i,LSDV}$, is inconsistent. But, the inconsistency of $a_{i,LSDV}$ is not transmitted to $b_{LSDV}$ because $\bar{y}_{i.}$ is a sufficient statistic. The LSDV estimator $b_{LSDV}$ is not a function of $a_{i,LSDV}$. There are a few nonlinear models in which a like result appears.
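The within computation can be illustrated with a minimal sketch. It assumes a single regressor and plain Python lists; the function name `within_estimator` and the toy data are illustrative and not part of the original paper. The slope is obtained from group mean deviations, and each fixed effect is recovered afterwards as the group average of the residuals, so no dummy variables are ever created.

```python
from collections import defaultdict


def within_estimator(groups, x, y):
    """LSDV / within-groups slope for one regressor; handles unequal T(i)."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # group -> [sum x, sum y, T(i)]
    for g, xv, yv in zip(groups, x, y):
        tot = totals[g]
        tot[0] += xv
        tot[1] += yv
        tot[2] += 1
    means = {g: (sx / n, sy / n) for g, (sx, sy, n) in totals.items()}

    # Regress (y_it - ybar_i.) on (x_it - xbar_i.); the effects are swept out.
    sxx = sxy = 0.0
    for g, xv, yv in zip(groups, x, y):
        xbar, ybar = means[g]
        sxx += (xv - xbar) ** 2
        sxy += (xv - xbar) * (yv - ybar)
    b = sxy / sxx

    # a_i is the group average of the residuals y_it - b * x_it.
    a = {g: ybar - b * xbar for g, (xbar, ybar) in means.items()}
    return b, a


# Noise-free toy panel with T(0) = 3 and T(1) = 2, slope 2, effects 5 and -1.
groups = [0, 0, 0, 1, 1]
x = [1.0, 2.0, 4.0, 0.0, 3.0]
true_alpha = {0: 5.0, 1: -1.0}
y = [2.0 * xv + true_alpha[g] for g, xv in zip(groups, x)]
b, a = within_estimator(groups, x, y)
```

Because the toy data contain no disturbance, the sketch recovers the slope and both effects up to rounding error; with noisy data and small T(i), only the slope would be estimated consistently.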

We will define a nonlinear model by the density for an observed random variable, y it,

$$f(y_{it} \mid \mathbf{x}_{i1}, \mathbf{x}_{i2}, \ldots, \mathbf{x}_{iT(i)}) = g(y_{it}, \beta'\mathbf{x}_{it} + \alpha_i, \theta)$$

where $\theta$ is a vector of ancillary parameters such as a scale parameter, an overdispersion parameter in the Poisson model or the threshold parameters in an ordered probit model. We have narrowed our focus to linear index function models. For the present, we also rule out dynamic effects; $y_{i,t-1}$ does not appear on the right hand side of the equation. [See, e.g., Arellano and Bond (1991), Arellano and Bover (1995), Ahn and Schmidt (1995), Orme (1999), Heckman and MaCurdy (1980).] However, it does appear that extension of the fixed effects model to dynamic models may well be practical. This, and multiple equation models, such as VARs, are left for later extensions. [See Holtz-Eakin (1988) and Holtz-Eakin, Newey and Rosen (1988, 1989).] Lastly, note that only the current data appear directly in the density for the current $y_{it}$. We will also be limiting attention to parametric approaches to modeling. The density is assumed to be fully defined. This makes maximum likelihood the estimator of choice.

The likelihood function for a sample of N observations is

$$L = \prod_{i=1}^{N} \prod_{t=1}^{T(i)} g(y_{it}, \beta'\mathbf{x}_{it} + \alpha_i, \theta).$$

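The likelihood equations,

$$\frac{\partial \log L}{\partial \beta} = \mathbf{0}, \qquad \frac{\partial \log L}{\partial \alpha_i} = 0, \; i = 1,\ldots,N, \qquad \frac{\partial \log L}{\partial \theta} = \mathbf{0},$$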

do not have explicit solutions for the parameter estimates in terms of the data and must, therefore, be solved iteratively. In principle, maximization can proceed simply by creating and including a complete set of dummy variables in the model. But, at some point, this approach becomes unusable with current technology. We are interested in a method that would accommodate a panel with, say, 50,000 groups, which would mandate estimating a total of 50,000 + $K_\beta$ + $K_\theta$ parameters. What makes this impractical is a second derivatives matrix (or some approximation to it) with 50,000 rows and columns. But, that consideration is misleading, a proposition we will return to presently.
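The storage arithmetic behind this remark can be sketched as follows. The value K = 10 and the 8-byte double precision figure are illustrative assumptions, not numbers from the paper. The second function counts only a K x K block, N border vectors of length K and N diagonal scalars, which is the structure the Hessian is shown to have in Section 3.

```python
def dense_hessian_bytes(n_groups, k, bytes_per_float=8):
    """Bytes needed to hold the full (K+N) x (K+N) second derivatives matrix."""
    dim = n_groups + k
    return dim * dim * bytes_per_float


def partitioned_bytes(n_groups, k, bytes_per_float=8):
    """Bytes for a K x K block, N border K-vectors and N diagonal scalars."""
    return (k * k + n_groups * k + n_groups) * bytes_per_float


N, K = 50_000, 10  # K = 10 is an illustrative assumption
dense = dense_hessian_bytes(N, K)
compact = partitioned_bytes(N, K)
print(f"dense: {dense / 1e9:.1f} GB, partitioned: {compact / 1e6:.1f} MB")
# prints "dense: 20.0 GB, partitioned: 4.4 MB"
```

Even this compact count overstates what is needed, since the border vectors can be accumulated group by group rather than stored.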

The proliferation of parameters is a practical shortcoming of the fixed effects model. The 'incidental parameters problem' is a methodological issue. If $\beta$ and $\theta$ were known, then the solution for $\alpha_i$ would be based on only the T(i) observations for group i (see below for an application). This implies that the asymptotic variance for $a_i$ is O[1/T(i)] and, since T(i) is fixed, $a_i$ is inconsistent. In fact, $\beta$ is not known; in general in nonlinear settings, the estimator will be a function of the estimator of $\alpha_i$, $a_{i,ML}$. Therefore, $b_{ML}$, the MLE of $\beta$, is a function of a random variable which does not converge to a constant as $N \to \infty$, so neither does $b_{ML}$. There is a small sample bias as well. The example is unrealistic, but Hsiao (1993, 1996) shows that in a binary logit model with a single regressor that is a dummy variable and a panel in which T(i) = 2 for all groups, the small sample bias is +100%. No general results exist for the small sample bias in more realistic settings. Heckman (1981) found in a Monte Carlo study of a probit model that the bias of the slope estimator in a fixed effects model was toward zero and on the order of 10% when T(i) = 8 and N = 100. On this basis, it is often noted that in samples at least this large, the small sample bias is probably not too severe. In many microeconometric applications, T(i) is considerably larger than this, so for practical purposes, there is good cause for optimism.

3 Computation of the Fixed Effects Estimator

In the linear case, regression using group mean deviations sweeps out the fixed effects. The slope estimator is not a function of the fixed effects, which implies that it (unlike the estimator of the fixed effect) is consistent. There are a few analogous cases of nonlinear models that have been identified in the literature. Among them are the binomial logit model,

$$g(y_{it}, \beta'\mathbf{x}_{it} + \alpha_i) = \Lambda[(2y_{it} - 1)(\beta'\mathbf{x}_{it} + \alpha_i)]$$

where $\Lambda(.)$ is the cdf for the logistic distribution [see Chamberlain (1980)]. In this case, $\sum_t y_{it}$ is a sufficient statistic, and estimation in terms of the conditional density provides a consistent estimator of $\beta$. [See Greene (2000) for discussion.] Three other models which have this property are the Poisson and negative binomial regressions [see Hausman, Hall, and Griliches (1984)] and the exponential regression model

$$g(y_{it}, \beta'\mathbf{x}_{it} + \alpha_i) = (1/\lambda_{it})\exp(-y_{it}/\lambda_{it}), \quad \lambda_{it} = \exp(\beta'\mathbf{x}_{it} + \alpha_i), \quad y_{it} \ge 0.$$

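[See Munkin and Trivedi (2000) and Greene (2001).] In these models, there is a solution to the likelihood equation for $\beta$ that is not a function of $\alpha_i$. Consider the Poisson regression model with fixed effects - the result for the exponential model is essentially the same - for which

$$\log g(y_{it}, \beta'\mathbf{x}_{it} + \alpha_i) = -\lambda_{it} + y_{it}\log\lambda_{it} - \log y_{it}!$$

where $\lambda_{it} = \exp(\beta'\mathbf{x}_{it} + \alpha_i) = \exp(\alpha_i)\exp(\beta'\mathbf{x}_{it})$. Then,

$$\log L = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \left[-\exp(\alpha_i)\exp(\beta'\mathbf{x}_{it}) + y_{it}(\beta'\mathbf{x}_{it} + \alpha_i) - \log y_{it}!\right].$$

The likelihood equation for $\alpha_i$, $\partial \log L/\partial \alpha_i = 0$, implies a solution

$$\exp(\alpha_i) = \frac{\sum_{t=1}^{T(i)} y_{it}}{\sum_{t=1}^{T(i)} \exp(\beta'\mathbf{x}_{it})}.$$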

Thus, the maximum likelihood estimator of $\beta$ is not a function of $\alpha_i$. There are other models with loglinear conditional mean functions; however, these are too few and specialized to serve as the benchmark case for a modeling framework. In the vast majority of cases of interest to practitioners, including those based on transformations of normally distributed variables such as the probit and tobit models, this method will be unusable.
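The closed form solution for the Poisson fixed effect can be checked numerically. The following is a minimal sketch under stated assumptions: the function names, the trial value of the slope vector and the three observations for one group are all hypothetical. It computes the concentrated value of $\alpha_i$ for a given $\beta$ and evaluates the score with respect to $\alpha_i$ at that value.

```python
import math


def concentrated_alpha(beta, x_rows, y):
    """alpha_i solving d log L / d alpha_i = 0 for one group, given beta."""
    total_y = sum(y)
    if total_y == 0:
        raise ValueError("alpha_i is not identified when y_it = 0 for all t")
    total_exp = sum(math.exp(sum(b * v for b, v in zip(beta, row)))
                    for row in x_rows)
    return math.log(total_y / total_exp)


def alpha_score(alpha, beta, x_rows, y):
    """d log L / d alpha_i = sum_t (y_it - exp(alpha_i + beta'x_it))."""
    return sum(yv - math.exp(alpha + sum(b * v for b, v in zip(beta, row)))
               for row, yv in zip(x_rows, y))


beta = [0.3, -0.2]                               # trial value, illustrative only
x_rows = [[1.0, 0.5], [0.0, 2.0], [1.5, 1.0]]    # T(i) = 3 observations
y = [2, 0, 3]
alpha_i = concentrated_alpha(beta, x_rows, y)
score = alpha_score(alpha_i, beta, x_rows, y)    # zero up to rounding error
```

Substituting this solution back into the log likelihood leaves a function of the slope parameters alone, which is why the fixed effects never have to be estimated jointly in this model.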

Heckman and MaCurdy (1980) suggested a 'zig-zag' sort of approach to maximization of the log likelihood function, dummy variable coefficients and all. Consider the probit model. For a known set of fixed effect coefficients, $\alpha = (\alpha_1, \ldots, \alpha_N)'$, estimation of $\beta$ is straightforward. The log likelihood conditioned on these values (denoted $a_i$) would be

$$\log L \mid a_1, \ldots, a_N = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \log \Phi[(2y_{it} - 1)(\beta'\mathbf{x}_{it} + a_i)].$$


This can be treated as a cross section estimation problem since, with known $\alpha$, there is no connection between observations even within a group. With a given estimate of $\beta$ (denoted $b$), the conditional log likelihood function for each $\alpha_i$ is

$$\log L_i \mid b = \sum_{t=1}^{T(i)} \log \Phi[(2y_{it} - 1)(z_{it} + \alpha_i)]$$

where $z_{it} = b'\mathbf{x}_{it}$ is now a known function. Maximizing this function is straightforward (if tedious, since it must be done for each i). Heckman and MaCurdy suggested iterating back and forth between these two estimators until convergence is achieved. There is no guarantee that this back and forth procedure will converge to the true maximum of the log likelihood function because the Hessian is not block diagonal. Whether either estimator is even consistent in the dimension of N (that is, of $\beta$) depends on the initial estimator being consistent, and it is unclear how one should obtain that consistent initial estimator. There is no maximum likelihood estimator for $\alpha_i$ for any group in which the dependent variable is all 1s or all 0s - the likelihood equation for $\log L_i$ has no solution if there is no within group variation in $y_{it}$. This feature of the model carries over to the tobit and binomial logit models, as the authors noted. In the Poisson and negative binomial models, any group which has $y_{it} = 0$ for all t contributes a 0 to the log likelihood function, so its group specific effect is not identified either. Finally, irrespective of its probability limit, the estimated covariance matrix for the estimator of $\beta$ will be too small, again because the Hessian is not block diagonal. The estimator at the $\beta$ step does not obtain the correct submatrix of the information matrix.

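Many of the models we have studied involve an ancillary parameter vector, $\theta$. No generality is gained by treating $\theta$ separately from $\beta$, so at this point, we will simply group them in the single parameter vector $\gamma = [\beta', \theta']'$. Denote the gradient of the log likelihood by

$$\mathbf{g}_\gamma = \frac{\partial \log L}{\partial \gamma} = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \frac{\partial \log g(y_{it}, \gamma, \mathbf{x}_{it}, \alpha_i)}{\partial \gamma} \quad \text{(a } K_\gamma \times 1 \text{ vector)}$$

$$g_{\alpha i} = \frac{\partial \log L}{\partial \alpha_i} = \sum_{t=1}^{T(i)} \frac{\partial \log g(y_{it}, \gamma, \mathbf{x}_{it}, \alpha_i)}{\partial \alpha_i} \quad \text{(a scalar)}$$

$$\mathbf{g}_\alpha = [g_{\alpha 1}, \ldots, g_{\alpha N}]' \quad \text{(an } N \times 1 \text{ vector)}$$

$$\mathbf{g} = [\mathbf{g}_\gamma', \mathbf{g}_\alpha']' \quad \text{(a } (K_\gamma + N) \times 1 \text{ vector)}.$$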

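The full $(K_\gamma + N) \times (K_\gamma + N)$ Hessian is

$$\mathbf{H} = \begin{bmatrix} \mathbf{H}_{\gamma\gamma} & \mathbf{h}_{\gamma 1} & \mathbf{h}_{\gamma 2} & \cdots & \mathbf{h}_{\gamma N} \\ \mathbf{h}_{\gamma 1}' & h_{11} & 0 & \cdots & 0 \\ \mathbf{h}_{\gamma 2}' & 0 & h_{22} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mathbf{h}_{\gamma N}' & 0 & 0 & \cdots & h_{NN} \end{bmatrix}$$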

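where

$$\mathbf{H}_{\gamma\gamma} = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \frac{\partial^2 \log g(y_{it}, \gamma, \mathbf{x}_{it}, \alpha_i)}{\partial \gamma\, \partial \gamma'} \quad \text{(a } K_\gamma \times K_\gamma \text{ matrix)}$$

$$\mathbf{h}_{\gamma i} = \sum_{t=1}^{T(i)} \frac{\partial^2 \log g(y_{it}, \gamma, \mathbf{x}_{it}, \alpha_i)}{\partial \gamma\, \partial \alpha_i} \quad \text{(} N \; K_\gamma \times 1 \text{ vectors)}$$

$$h_{ii} = \sum_{t=1}^{T(i)} \frac{\partial^2 \log g(y_{it}, \gamma, \mathbf{x}_{it}, \alpha_i)}{\partial \alpha_i^2} \quad \text{(} N \text{ scalars)}.$$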

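Newton's method of maximizing the log likelihood produces the iteration

$$\begin{pmatrix} \hat{\gamma} \\ \hat{\alpha} \end{pmatrix}_{k} = \begin{pmatrix} \hat{\gamma} \\ \hat{\alpha} \end{pmatrix}_{k-1} - \mathbf{H}_{k-1}^{-1}\,\mathbf{g}_{k-1} = \begin{pmatrix} \hat{\gamma} \\ \hat{\alpha} \end{pmatrix}_{k-1} + \begin{pmatrix} \Delta_\gamma \\ \Delta_\alpha \end{pmatrix}$$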

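where subscript 'k' indicates the updated value and 'k-1' indicates a computation at the current value. Let $\mathbf{H}^{\gamma\gamma}$ denote the upper left $K_\gamma \times K_\gamma$ submatrix of $\mathbf{H}^{-1}$ and define the $N \times N$ matrix $\mathbf{H}^{\alpha\alpha}$ and the $K_\gamma \times N$ matrix $\mathbf{H}^{\gamma\alpha}$ likewise. Isolating $\hat{\gamma}$, then, we have the iteration

$$\hat{\gamma}_k = \hat{\gamma}_{k-1} - [\mathbf{H}^{\gamma\gamma}\mathbf{g}_\gamma + \mathbf{H}^{\gamma\alpha}\mathbf{g}_\alpha]_{k-1} = \hat{\gamma}_{k-1} + \Delta_\gamma.$$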

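Using the partitioned inverse formula [e.g., Greene (2000, equation 2-74)], we have

$$\mathbf{H}^{\gamma\gamma} = [\mathbf{H}_{\gamma\gamma} - \mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}\mathbf{H}_{\alpha\gamma}]^{-1}.$$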

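The fact that $\mathbf{H}_{\alpha\alpha}$ is diagonal makes this computation simple. Collecting the terms,

$$\mathbf{H}^{\gamma\gamma} = \left[\mathbf{H}_{\gamma\gamma} - \sum_{i=1}^{N} \frac{1}{h_{ii}}\,\mathbf{h}_{\gamma i}\mathbf{h}_{\gamma i}'\right]^{-1}.$$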

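Thus, the upper left part of the inverse of the Hessian can be computed by summation of vectors and matrices of order $K_\gamma$. We also require $\mathbf{H}^{\gamma\alpha}$. Once again using the partitioned inverse formula, this would be

$$\mathbf{H}^{\gamma\alpha} = -\mathbf{H}^{\gamma\gamma}\,\mathbf{H}_{\gamma\alpha}\,\mathbf{H}_{\alpha\alpha}^{-1}.$$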

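As before, the diagonality of $\mathbf{H}_{\alpha\alpha}$ makes this straightforward. Combining terms, we find

$$\Delta_\gamma = -\mathbf{H}^{\gamma\gamma}\left(\mathbf{g}_\gamma - \mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}\mathbf{g}_\alpha\right) = -\left[\mathbf{H}_{\gamma\gamma} - \sum_{i=1}^{N} \frac{1}{h_{ii}}\,\mathbf{h}_{\gamma i}\mathbf{h}_{\gamma i}'\right]_{k-1}^{-1} \left(\mathbf{g}_\gamma - \sum_{i=1}^{N} \frac{g_{\alpha i}}{h_{ii}}\,\mathbf{h}_{\gamma i}\right)_{k-1}.$$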


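Turning now to the update for $\alpha$, we use the same results for the partitioned matrices. Thus,

$$\Delta_\alpha = -[\mathbf{H}^{\alpha\gamma}\mathbf{g}_\gamma + \mathbf{H}^{\alpha\alpha}\mathbf{g}_\alpha]_{k-1}.$$

Using Greene's (2-74) once again, we have

$$\mathbf{H}^{\alpha\alpha} = \mathbf{H}_{\alpha\alpha}^{-1}\left(\mathbf{I} + \mathbf{H}_{\alpha\gamma}\mathbf{H}^{\gamma\gamma}\mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}\right)$$

$$\mathbf{H}^{\alpha\gamma} = -\mathbf{H}^{\alpha\alpha}\mathbf{H}_{\alpha\gamma}\mathbf{H}_{\gamma\gamma}^{-1} = -\mathbf{H}_{\alpha\alpha}^{-1}\mathbf{H}_{\alpha\gamma}\mathbf{H}^{\gamma\gamma}.$$

Therefore,

$$\Delta_\alpha = -\mathbf{H}_{\alpha\alpha}^{-1}\left(\mathbf{I} + \mathbf{H}_{\alpha\gamma}\mathbf{H}^{\gamma\gamma}\mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}\right)\mathbf{g}_\alpha + \mathbf{H}_{\alpha\alpha}^{-1}\left(\mathbf{I} + \mathbf{H}_{\alpha\gamma}\mathbf{H}^{\gamma\gamma}\mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}\right)\mathbf{H}_{\alpha\gamma}\mathbf{H}_{\gamma\gamma}^{-1}\mathbf{g}_\gamma$$

$$= -\mathbf{H}_{\alpha\alpha}^{-1}\left(\mathbf{g}_\alpha + \mathbf{H}_{\alpha\gamma}\Delta_\gamma\right).$$

Since $\mathbf{H}_{\alpha\alpha}$ is diagonal,

$$\Delta_{\alpha i} = -\frac{1}{h_{ii}}\left(g_{\alpha i} + \mathbf{h}_{\gamma i}'\Delta_\gamma\right).$$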

Neither update vector requires storage or inversion of a $(K_\gamma + N) \times (K_\gamma + N)$ matrix; each is a function of sums of scalars and $K_\gamma \times 1$ vectors of first derivatives and mixed second derivatives.¹ The practical implication is that calculation of fixed effects models is a computation only of order $K_\gamma$. Storage requirements for $\alpha$ and $\Delta_\alpha$ are linear in N, not quadratic. Even for huge panels of tens of thousands of units, this is well within the capacity of even modest desktop computers. In experiments, we have found this method effective for probit models with 10,000 effects. (An analyst using this procedure for a tobit model reported success with nearly 15,000 coefficients.) (The amount of computation is not particularly large either, though with the current vintage of 2+ GFLOP processors, computation time for econometric estimation problems is usually not an issue.)
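The two update formulas can be sketched in a few lines and checked against a brute force solution of the full Newton system. This is an illustrative sketch, not the author's implementation: the function names, the small pure-Python linear solver (used only to keep the example free of external libraries) and the numerical values are all assumptions.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def newton_step(g_gamma, g_alpha, H_gg, h_gi, h_ii):
    """Return (delta_gamma, delta_alpha) using only K x K algebra.

    h_gi[i] is the K-vector of mixed derivatives for group i and
    h_ii[i] is the scalar second derivative with respect to alpha_i.
    """
    K, N = len(g_gamma), len(g_alpha)
    A = [row[:] for row in H_gg]   # becomes H_gg - sum_i h_gi h_gi' / h_ii
    r = list(g_gamma)              # becomes g_gamma - sum_i (g_ai / h_ii) h_gi
    for i in range(N):
        for p in range(K):
            r[p] -= g_alpha[i] / h_ii[i] * h_gi[i][p]
            for q in range(K):
                A[p][q] -= h_gi[i][p] * h_gi[i][q] / h_ii[i]
    d_gamma = solve(A, [-v for v in r])
    d_alpha = [-(g_alpha[i] + sum(h * d for h, d in zip(h_gi[i], d_gamma)))
               / h_ii[i] for i in range(N)]
    return d_gamma, d_alpha


# Hypothetical derivatives for K = 2 slopes and N = 3 groups.
g_gamma, g_alpha = [1.0, -2.0], [0.5, -1.0, 0.25]
H_gg = [[-4.0, 1.0], [1.0, -3.0]]
h_gi = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
h_ii = [-2.0, -1.5, -3.0]
d_gamma, d_alpha = newton_step(g_gamma, g_alpha, H_gg, h_gi, h_ii)

# Brute force check: assemble the full (K+N) x (K+N) Hessian, solve H d = -g.
K, N = 2, 3
full = [H_gg[p] + [h_gi[i][p] for i in range(N)] for p in range(K)]
full += [h_gi[i] + [h_ii[i] if j == i else 0.0 for j in range(N)]
         for i in range(N)]
brute = solve(full, [-v for v in g_gamma + g_alpha])
```

In the sketch the border vectors are held in a list for clarity; in a large panel they would be accumulated one group at a time, so that only the K x K matrix and a few vectors are ever in memory.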

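The estimator of the asymptotic covariance matrix for the MLE of $\gamma$ is $-\mathbf{H}^{\gamma\gamma}$, the upper left submatrix of $-\mathbf{H}^{-1}$. This is the inverse of a sum of $K_\gamma \times K_\gamma$ matrices, and will be of the form of a moment matrix which is easily computed (see the application below). Thus, the asymptotic covariance matrix for the estimated coefficient vector is easily obtained in spite of the size of the problem. The asymptotic covariance matrix of $\mathbf{a}$ is

$$-\left(\mathbf{H}_{\alpha\alpha} - \mathbf{H}_{\alpha\gamma}\mathbf{H}_{\gamma\gamma}^{-1}\mathbf{H}_{\gamma\alpha}\right)^{-1} = -\mathbf{H}_{\alpha\alpha}^{-1} - \mathbf{H}_{\alpha\alpha}^{-1}\mathbf{H}_{\alpha\gamma}\left\{\mathbf{H}_{\gamma\gamma} - \mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}\mathbf{H}_{\alpha\gamma}\right\}^{-1}\mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1}.$$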

¹ The iteration for the slope estimator is suggested in the context of the binary logit model by Chamberlain (1980, page 227). A formal derivation of $\Delta_\gamma$ and $\Delta_\alpha$ was presented by George Jakubson of Cornell University in an undated memo, "Fixed Effects (Maximum Likelihood) in Nonlinear Models."

It is (presumably) not possible to store the asymptotic covariance matrix for the fixed effects estimators (unless there are relatively few of them). But, by expanding the summations where needed and exploiting the diagonality of $\mathbf{H}_{\alpha\alpha}$, we find that the individual terms are

$$\mathrm{Asy.Cov}[a_i, a_j] = -\left[\frac{\mathbf{1}(i = j)}{h_{ii}} + \frac{1}{h_{ii}}\,\mathbf{h}_{\gamma i}'\,\mathbf{H}^{\gamma\gamma}\,\mathbf{h}_{\gamma j}\,\frac{1}{h_{jj}}\right].$$

Once again, the only matrix to be inverted is $K_\gamma \times K_\gamma$, not $N \times N$ (and it is already in hand), so this can be computed by summation. It involves only $K_\gamma \times 1$ vectors and repeated use of the same $K_\gamma \times K_\gamma$ inverse matrix. Likewise, the asymptotic covariance matrix of the slopes and the constant terms can be arranged in a computationally feasible format:

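$$\mathrm{Asy.Cov}[\mathbf{c}, \mathbf{a}'] = -\mathrm{Asy.Var}[\mathbf{c}]\,\mathbf{H}_{\gamma\alpha}\mathbf{H}_{\alpha\alpha}^{-1},$$

where $\mathbf{c}$ denotes the estimator of $\gamma$. This involves $N \times N$ and $K_\gamma \times N$ matrices, but it simplifies to

$$\mathrm{Asy.Cov}[\mathbf{c}, a_i] = -\mathrm{Asy.Var}[\mathbf{c}]\,\frac{\mathbf{h}_{\gamma i}}{h_{ii}}.$$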
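The covariance expressions can be verified against the definition of the asymptotic covariance matrix as the inverse of the negative Hessian. The sketch below is restricted to a single slope parameter (K = 1), so that the only inversion is a scalar reciprocal. That restriction, the function name and the numerical values are assumptions made for illustration.

```python
def fe_covariances(H_gg, h_g, h_ii):
    """Asymptotic (co)variances for K = 1: scalar H_gg, lists h_g[i], h_ii[i]."""
    N = len(h_ii)
    # Asy.Var[c] = -H^{gg} = -[H_gg - sum_i h_gi^2 / h_ii]^{-1}
    var_c = -1.0 / (H_gg - sum(h * h / d for h, d in zip(h_g, h_ii)))
    # Asy.Cov[c, a_i] = -Asy.Var[c] * h_gi / h_ii
    cov_ca = [-var_c * h / d for h, d in zip(h_g, h_ii)]
    # Asy.Cov[a_i, a_j] = -[1(i=j)/h_ii + h_gi * H^{gg} * h_gj / (h_ii * h_jj)]
    cov_aa = [[-((1.0 / h_ii[i] if i == j else 0.0)
                 + h_g[i] * (-var_c) * h_g[j] / (h_ii[i] * h_ii[j]))
               for j in range(N)] for i in range(N)]
    return var_c, cov_ca, cov_aa


# Hypothetical second derivatives for one slope and N = 3 groups.
H_gg, h_g, h_ii = -4.0, [0.5, 0.1, -0.4], [-2.0, -1.5, -3.0]
var_c, cov_ca, cov_aa = fe_covariances(H_gg, h_g, h_ii)

# Check against the definition: (-H) times the assembled covariance matrix is I.
N = len(h_ii)
neg_H = [[-H_gg] + [-h for h in h_g]]
neg_H += [[-h_g[i]] + [-h_ii[i] if j == i else 0.0 for j in range(N)]
          for i in range(N)]
V = [[var_c] + cov_ca] + [[cov_ca[i]] + cov_aa[i] for i in range(N)]
product = [[sum(neg_H[r][k] * V[k][c] for k in range(N + 1))
            for c in range(N + 1)] for r in range(N + 1)]
```

Each element of the covariance matrix is obtained from scalars and the one inverse already in hand, so any single variance or covariance can be computed on demand without forming the N x N block.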

4 Applications

To illustrate the preceding, we examine two applications, the binomial probit (and logit) model(s) and a sample selection model. (With trivial modification, the first of these will extend to many other models, as shown below.)²

4.1 Binary Choice and Simple Index Function Models

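For a binomial probit model with dependent variable $z_{it}$,

$$g(z_{it}, \beta'\mathbf{x}_{it} + \alpha_i) = \Phi[(2z_{it} - 1)(\beta'\mathbf{x}_{it} + \alpha_i)] = \Phi(q_{it}r_{it}) = \Phi(a_{it})$$

where $q_{it} = 2z_{it} - 1$, $r_{it} = \beta'\mathbf{x}_{it} + \alpha_i$ and $a_{it} = q_{it}r_{it}$, and

$$\log L = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \log \Phi[q_{it}(\beta'\mathbf{x}_{it} + \alpha_i)].$$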

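Let $q_{it}\lambda_{it}$ and $\delta_{it}$ denote the first and second derivatives of $\log g(z_{it}, \beta'\mathbf{x}_{it} + \alpha_i)$ with respect to $(\beta'\mathbf{x}_{it} + \alpha_i)$, where

$$\lambda_{it} = \frac{\phi(a_{it})}{\Phi(a_{it})}, \qquad \delta_{it} = -a_{it}\lambda_{it} - \lambda_{it}^2, \qquad -1 < \delta_{it} < 0.$$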

² We assume in the following that none of the groups have $y_{it}$ always equal to 1 or 0. In practice, one would have to determine this as part of the estimation process. It should be noted for the practitioner that this condition is not trivially obvious during estimation. The usual criteria for convergence, such as small $\Delta$, will appear to be met while the associated $\alpha_i$ is still finite even in the presence of degenerate groups.

The derivatives of the log likelihood for the probit model are

$$g_{\alpha i} = \sum_{t=1}^{T(i)} q_{it}\lambda_{it},$$

$$\mathbf{g}_\gamma = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} q_{it}\lambda_{it}\mathbf{x}_{it},$$

$$h_{ii} = \sum_{t=1}^{T(i)} \delta_{it},$$

$$\mathbf{h}_{\gamma i} = \sum_{t=1}^{T(i)} \delta_{it}\mathbf{x}_{it},$$

$$\mathbf{H}_{\gamma\gamma} = \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \delta_{it}\mathbf{x}_{it}\mathbf{x}_{it}'.$$

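For convenience, let

$$\delta_{i.} = \sum_{t=1}^{T(i)} \delta_{it}$$

and

$$\tilde{\mathbf{x}}_i = \mathbf{h}_{\gamma i}/h_{ii} = \sum_{t=1}^{T(i)} \delta_{it}\mathbf{x}_{it} \Big/ \sum_{t=1}^{T(i)} \delta_{it}.$$

Note that $\tilde{\mathbf{x}}_i$ is a weighted within group mean of the regressor vectors.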

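The update vectors and computation of the slope and group effect estimates follow the template given earlier. After a bit of manipulation, we find the asymptotic covariance matrix for the slope parameters is

$$\mathrm{Asy.Var}[b_{MLE}] = -\mathbf{H}^{\gamma\gamma} = \left[-\sum_{i=1}^{N}\sum_{t=1}^{T(i)} \delta_{it}(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)'\right]^{-1}.$$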
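The manipulation behind this expression is short. As a sketch, using only the probit derivatives and the definitions $\delta_{i.} = \sum_t \delta_{it}$ and $\tilde{\mathbf{x}}_i = \mathbf{h}_{\gamma i}/h_{ii}$ (so that $\mathbf{h}_{\gamma i} = \delta_{i.}\tilde{\mathbf{x}}_i$ and $h_{ii} = \delta_{i.}$), the bracketed matrix in the general formula for $\mathbf{H}^{\gamma\gamma}$ collapses to a weighted within groups moment matrix:

```latex
\begin{align*}
\mathbf{H}_{\gamma\gamma} - \sum_{i=1}^{N} \frac{1}{h_{ii}}\,\mathbf{h}_{\gamma i}\mathbf{h}_{\gamma i}'
  &= \sum_{i=1}^{N}\left[\sum_{t=1}^{T(i)} \delta_{it}\,\mathbf{x}_{it}\mathbf{x}_{it}'
     - \delta_{i.}\,\tilde{\mathbf{x}}_i\tilde{\mathbf{x}}_i'\right] \\
  &= \sum_{i=1}^{N}\sum_{t=1}^{T(i)} \delta_{it}\,
     (\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)'.
\end{align*}
```

The second equality holds because $\sum_t \delta_{it}(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i) = \mathbf{0}$, so the cross product terms cancel exactly as they do for an ordinary within groups sum of squares.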



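The resemblance to the 'within' moment matrix from the analysis of variance context is notable and convenient. Inserting the parts and collecting terms produces

$$\Delta_\gamma = \left[-\sum_{i=1}^{N}\sum_{t=1}^{T(i)} \delta_{it}(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)'\right]^{-1} \left[\sum_{i=1}^{N}\sum_{t=1}^{T(i)} q_{it}\lambda_{it}(\mathbf{x}_{it} - \tilde{\mathbf{x}}_i)\right]$$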

and

$$\Delta_{\alpha i} = -\left[\sum_{t=1}^{T(i)} q_{it}\lambda_{it} \Big/ \delta_{i.} + \Delta_\gamma'\tilde{\mathbf{x}}_i\right].$$

Denote the matrix in the preceding as

$$\mathbf{V} = -\mathbf{H}^{\gamma\gamma} = \mathrm{Asy.Var}[b_{MLE}].$$

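Then,

$$\mathrm{Asy.Cov}[a_i, a_j] = -\frac{\mathbf{1}(i = j)}{\delta_{i.}} + \tilde{\mathbf{x}}_i'\mathbf{V}\tilde{\mathbf{x}}_j = -\frac{\mathbf{1}(i = j)}{\delta_{i.}} + s_{ij}.$$

Finally,

$$\mathrm{Asy.Cov}[b_{MLE}, a_i] = -\mathbf{V}\tilde{\mathbf{x}}_i.$$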

Each of these involves a moderate amount of computation, but can easily be obtained with existing software and, most important for our purposes, involves computations that are linear in N and K. We note as well that the preceding extends directly to any other simple index function model, such as the binomial logit model [change derivatives $\lambda_{it}$ to $(1 - \Lambda_{it})$ and $\delta_{it}$ to $-\Lambda_{it}(1 - \Lambda_{it})$, where $\Lambda_{it}$ is the logit CDF evaluated at $a_{it}$] and the Poisson regression model [replace $q_{it}\lambda_{it}$ with $(y_{it} - m_{it})$ and $\delta_{it}$ with $-m_{it}$, where $m_{it} = \exp(\beta'\mathbf{x}_{it} + \alpha_i)$]. Extensions to models that involve ancillary parameters, such as the tobit model, are a bit more complicated, but not excessively so.
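The probit computations above can be assembled into a complete, if minimal, estimation loop. The sketch below is an illustration under stated assumptions rather than the author's code: it uses a single regressor (so every K x K quantity is a scalar), plain Newton steps with no step size control, a hypothetical function name `fit_fe_probit`, and a small made-up panel in which every group has within group variation in $z_{it}$.

```python
import math


def phi(a):
    """Standard normal density."""
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def Phi(a):
    """Standard normal cdf, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))


def fit_fe_probit(panel, tol=1e-10, max_iter=50):
    """panel: list of groups, each a list of (x_it, z_it) with z_it in {0, 1}.

    Returns (b, a, V): slope, list of fixed effects, and Asy.Var[b] evaluated
    at the last iterate before the final update.
    """
    b, a = 0.0, [0.0] * len(panel)
    V = float("nan")
    for _ in range(max_iter):
        s_dxx = s_qlx = 0.0          # weighted 'within' moment and score sums
        x_tilde, g_a, d_dot = [], [], []
        for i, grp in enumerate(panel):
            rows = []
            for x, z in grp:
                q = 2 * z - 1
                a_it = q * (b * x + a[i])
                lam = phi(a_it) / Phi(a_it)
                rows.append((x, q * lam, -a_it * lam - lam * lam))
            d_i = sum(d for _, _, d in rows)                 # delta_i.
            x_i = sum(d * x for x, _, d in rows) / d_i       # weighted mean
            s_dxx += sum(d * (x - x_i) ** 2 for x, _, d in rows)
            s_qlx += sum(ql * (x - x_i) for x, ql, _ in rows)
            x_tilde.append(x_i)
            g_a.append(sum(ql for _, ql, _ in rows))
            d_dot.append(d_i)
        V = -1.0 / s_dxx                                     # Asy.Var[b], K = 1
        d_b = V * s_qlx                                      # slope update
        d_a = [-(g / d + x_i * d_b) for g, d, x_i in zip(g_a, d_dot, x_tilde)]
        b += d_b
        a = [ai + dai for ai, dai in zip(a, d_a)]
        if max(abs(d_b), max(abs(v) for v in d_a)) < tol:
            break
    return b, a, V


# Made-up panel: 4 groups, T(i) = 4, each group has both outcomes present.
xs = [-1.5, -0.5, 0.5, 1.5]
patterns = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1]]
panel = [list(zip(xs, zs)) for zs in patterns]
b, a, V = fit_fe_probit(panel)
```

Each pass touches every observation once and keeps only a handful of scalars per group, which is the sense in which the work is linear in N. A production routine would also screen out groups with no variation in the dependent variable before iterating.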

The preceding provides the estimator and asymptotic variances for all estimated parameters in the model. For inference purposes, note that the unconditional log likelihood function is computed. Thus, a test for homogeneity is straightforward using the likelihood ratio test. Finally, one would normally want to compute marginal effects for the estimated probit model. The conditional mean in the model is

$$E[z_{it} \mid \mathbf{x}_{it}] = \Phi(\beta'\mathbf{x}_{it} + \alpha_i)$$

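so the slopes in the model are

$$\frac{\partial E[z_{it} \mid \mathbf{x}_{it}]}{\partial \mathbf{x}_{it}} = \phi(\beta'\mathbf{x}_{it} + \alpha_i)\,\beta.$$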

In many applications, marginal effects are computed at the means of the data. The heterogeneity in the fixed effects presents a complication. Using the sample mean of the fixed effects estimators, $\bar{a}$, the estimator would be

$$\frac{\partial \hat{E}[z_{it} \mid \bar{\mathbf{x}}]}{\partial \bar{\mathbf{x}}} = \phi(b'\bar{\mathbf{x}} + \bar{a})\,b.$$
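As a small sketch of this calculation, the function below evaluates the marginal effects at the regressor means using the average of the estimated fixed effects. The function name and all numerical values are hypothetical and serve only to show the arithmetic.

```python
import math


def marginal_effects_at_means(b, a, x_bar):
    """phi(b'x_bar + a_bar) * b, with a_bar the mean of the estimated effects."""
    a_bar = sum(a) / len(a)
    index = sum(bk * xk for bk, xk in zip(b, x_bar)) + a_bar
    density = math.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)
    return [density * bk for bk in b]


# Hypothetical estimates, for illustration only.
b = [0.8, -0.4]
a = [-0.3, 0.1, 0.5]
x_bar = [0.25, 1.0]
effects = marginal_effects_at_means(b, a, x_bar)
```

Since every effect is the same density value times the corresponding coefficient, the marginal effects keep the signs and relative magnitudes of the slope estimates.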

In order to compute the appropriate asymptotic standard errors for these estimates, we need the asymptotic covariance matrix for the estimated parameters. The asymptotic covariance matrix for the slope estimator is already in hand, so what remains is $\mathrm{Asy.Cov}[b, \bar{a}]$ and $\mathrm{Asy.Var}[\bar{a}]$. For the former,

$$\mathrm{Asy.Cov}[b, \bar{a}] = -\frac{1}{N}\sum_{i=1}^{N} \mathbf{V}\tilde{\mathbf{x}}_i$$

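while, by summation, we obtain

$$\mathrm{Asy.Var}[\bar{a}] = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left[-\frac{\mathbf{1}(i = j)}{\delta_{i.}} + s_{ij}\right].$$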

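These would be assembled in a $(K+1) \times (K+1)$ matrix, say $\mathbf{V}^*$. The asymptotic covariance matrix for the estimated marginal effects would be

$$\mathrm{Asy.Var}\left[\phi(b'\bar{\mathbf{x}} + \bar{a})\,b\right] = \mathbf{G}\mathbf{V}^*\mathbf{G}'$$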
