7.1 Introduction
This chapter begins our analysis of linear systems of equations. The first method of estimation we cover is system ordinary least squares, which is a direct extension of OLS for single equations. In some important special cases the system OLS estimator turns out to have a straightforward interpretation in terms of single-equation OLS estimators. But the method is applicable to very general linear systems of equations.
We then turn to a generalized least squares (GLS) analysis. Under certain assumptions, GLS, or its operationalized version, feasible GLS, will turn out to be asymptotically more efficient than system OLS. However, we emphasize in this chapter that the efficiency of GLS comes at a price: it requires stronger assumptions than system OLS in order to be consistent. This is a practically important point that is often overlooked in traditional treatments of linear systems, particularly those which assume that explanatory variables are nonrandom.

As with our single-equation analysis, we assume that a random sample is available from the population. Usually the unit of observation is obvious, such as a worker, a household, a firm, or a city. For example, if we collect consumption data on various commodities for a sample of families, the unit of observation is the family (not a commodity).
The framework of this chapter is general enough to apply to panel data models. Because the asymptotic analysis is done as the cross section dimension tends to infinity, the results are explicitly for the case where the cross section dimension is large relative to the time series dimension. (For example, we may have observations on N firms over the same T time periods for each firm. Then, we assume we have a random sample of firms that have data in each of the T years.) The panel data model covered here, while having many useful applications, does not fully exploit the replicability over time. In Chapters 10 and 11 we explicitly consider panel data models that contain time-invariant, unobserved effects in the error term.
7.2 Some Examples
We begin with two examples of systems of equations. These examples are fairly general, and we will see later that variants of them can also be cast as a general linear system of equations.

Example 7.1 (Seemingly Unrelated Regressions): The population model is a set of G linear equations,

y_g = x_g β_g + u_g,  g = 1, 2, ..., G    (7.1)
where x_g is 1 × K_g and β_g is K_g × 1, g = 1, 2, ..., G. In many applications x_g is the same for all g (in which case the β_g necessarily have the same dimension), but the general model allows the elements and the dimension of x_g to vary across equations. Remember, the system (7.1) represents a generic person, firm, city, or whatever from the population. The system (7.1) is often called Zellner's (1962) seemingly unrelated regressions (SUR) model (for cross section data in this case). The name comes from the fact that, since each equation in the system (7.1) has its own vector β_g, it appears that the equations are unrelated. Nevertheless, correlation across the errors in different equations can provide links that can be exploited in estimation; we will see this point later.
As a specific example, the system (7.1) might represent a set of demand functions for the population of families in a country:

housing = β_10 + β_11 houseprc + β_12 foodprc + β_13 clothprc + β_14 income + ... + u_1

with analogous equations for food and clothing (errors u_2 and u_3). In this example, G = 3 and x_g (a 1 × 7 vector) is the same for g = 1, 2, 3.
When we need to write the equations for a particular random draw from the population, y_g, x_g, and u_g will also contain an i subscript: equation g becomes y_ig = x_ig β_g + u_ig. For the purposes of stating assumptions, it does not matter whether or not we include the i subscript. The system (7.1) has the advantage of being less cluttered while focusing attention on the population, as is appropriate for applications. But for derivations we will often need to indicate the equation for a generic cross section unit i.
When we study the asymptotic properties of various estimators of the β_g, the asymptotics is done with G fixed and N tending to infinity. In the household demand example, we are interested in a set of three demand functions, and the unit of observation is the family. Therefore, inference is done as the number of families in the sample tends to infinity.
The assumptions that we make about how the unobservables u_g are related to the explanatory variables (x_1, x_2, ..., x_G) are crucial for determining which estimators of the β_g have acceptable properties. Often, when system (7.1) represents a structural model (without omitted variables, errors-in-variables, or simultaneity), we can assume that

E(u_g | x_1, x_2, ..., x_G) = 0,  g = 1, ..., G    (7.2)

One important implication of assumption (7.2) is that u_g is uncorrelated with the explanatory variables in all equations, as well as all functions of these explanatory variables. When system (7.1) is a system of equations derived from economic theory, assumption (7.2) is often very natural. For example, in the set of demand functions that we have presented, x_g ≡ x is the same for all g, and so assumption (7.2) is the same as E(u_g | x_g) = E(u_g | x) = 0.
If assumption (7.2) is maintained, and if the x_g are not the same across g, then any explanatory variables excluded from equation g are assumed to have no effect on expected y_g once x_g has been controlled for. That is,

E(y_g | x_1, x_2, ..., x_G) = E(y_g | x_g) = x_g β_g,  g = 1, 2, ..., G    (7.3)

There are examples of SUR systems where assumption (7.3) is too strong, but standard SUR analysis either explicitly or implicitly makes this assumption.

Our next example involves panel data.
Example 7.2 (Panel Data Model): Suppose that for each cross section unit we observe data on the same set of variables for T time periods. Let x_t be a 1 × K vector for t = 1, 2, ..., T, and let β be a K × 1 vector. The model in the population is

y_t = x_t β + u_t,  t = 1, 2, ..., T    (7.4)

where y_t is a scalar. For example, a simple equation to explain annual family saving over a five-year span is
sav_t = β_0 + β_1 inc_t + β_2 age_t + β_3 educ_t + u_t,  t = 1, 2, ..., 5
where inc_t is annual income, educ_t is years of education of the household head, and age_t is age of the household head. This is an example of a linear panel data model. It is a static model because all explanatory variables are dated contemporaneously with sav_t.
The panel data setup is conceptually very different from the SUR example. In Example 7.1, each equation explains a different dependent variable for the same cross section unit. Here we only have one dependent variable we are trying to explain, sav, but we observe sav, and the explanatory variables, over a five-year period. (Therefore, the label "system of equations" is really a misnomer for panel data applications. At this point, we are using the phrase to denote more than one equation in any context.) As we will see in the next section, the statistical properties of estimators in SUR and panel data models can be analyzed within the same structure. When we need to indicate that an equation is for a particular cross section unit i during a particular time period t, we write y_it = x_it β + u_it. We will omit the i subscript whenever its omission does not cause confusion.
What kinds of exogeneity assumptions do we use for panel data analysis? One possibility is to assume that u_t and x_t are orthogonal in the conditional mean sense:

E(u_t | x_t) = 0,  t = 1, 2, ..., T    (7.5)
We call this contemporaneous exogeneity of x_t because it only restricts the relationship between the disturbance and explanatory variables in the same time period. It is very important to distinguish assumption (7.5) from the stronger assumption

E(u_t | x_1, x_2, ..., x_T) = 0,  t = 1, ..., T    (7.6)

which, combined with model (7.4), is identical to E(y_t | x_1, x_2, ..., x_T) = E(y_t | x_t). Assumption (7.5) places no restrictions on the relationship between x_s and u_t for s ≠ t, while assumption (7.6) implies that each u_t is uncorrelated with the explanatory variables in all time periods. When assumption (7.6) holds, we say that the explanatory variables {x_1, x_2, ..., x_t, ..., x_T} are strictly exogenous.
To illustrate the difference between assumptions (7.5) and (7.6), let x_t ≡ (1, y_{t−1}). Then assumption (7.5) holds if E(y_t | y_{t−1}, y_{t−2}, ..., y_0) = β_0 + β_1 y_{t−1}, which imposes first-order dynamics in the conditional mean. However, assumption (7.6) must fail since x_{t+1} = (1, y_t), and therefore E(u_t | x_1, x_2, ..., x_T) = E(u_t | y_0, y_1, ..., y_{T−1}) = u_t for t = 1, 2, ..., T − 1 (because u_t = y_t − β_0 − β_1 y_{t−1}).
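The failure of strict exogeneity in the lagged-dependent-variable case is easy to see in a small simulation. The sketch below (a hypothetical setup of our own, not from the text) draws many independent two-period histories from the AR(1) model: the period-1 error is uncorrelated with its own regressor y_0, but it is strongly correlated with the future regressor y_1.

```python
import numpy as np

# Simulate y_1 = b0 + b1*y_0 + u_1, where x_1 = (1, y_0) and x_2 = (1, y_1).
# Contemporaneous exogeneity holds by construction (u_1 independent of y_0),
# but strict exogeneity fails: u_1 enters y_1 directly, so Cov(u_1, y_1) = Var(u_1).
rng = np.random.default_rng(0)
b0, b1 = 1.0, 0.5
N = 100_000                     # many independent draws of a short time series
y0 = rng.normal(size=N)
u1 = rng.normal(size=N)         # independent of y_0
y1 = b0 + b1 * y0 + u1          # y_1 is the regressor in the period-2 equation

cov_u1_y0 = np.cov(u1, y0)[0, 1]   # near 0: assumption (7.5) holds
cov_u1_y1 = np.cov(u1, y1)[0, 1]   # near Var(u_1) = 1: assumption (7.6) fails
print(cov_u1_y0, cov_u1_y1)
```

With 100,000 draws the two sample covariances are close to their population values of 0 and 1, illustrating why lagged dependent variables rule out strict exogeneity.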
Assumption (7.6) can fail even if x_t does not contain a lagged dependent variable. Consider a model relating poverty rates to welfare spending per capita, at the city level. A finite distributed lag (FDL) model is

poverty_t = θ_t + δ_0 welfare_t + δ_1 welfare_{t−1} + δ_2 welfare_{t−2} + u_t    (7.7)

where we assume a two-year effect. The parameter θ_t simply denotes a different aggregate time effect in each year. It is reasonable to think that welfare spending reacts to lagged poverty rates. An equation that captures this feedback is

welfare_t = η_t + ρ_1 poverty_{t−1} + r_t    (7.8)

Even if equation (7.7) contains enough lags of welfare spending, assumption (7.6) would be violated if ρ_1 ≠ 0 in equation (7.8), because welfare_{t+1} depends on u_t and x_{t+1} includes welfare_{t+1}.
How we go about consistently estimating β depends crucially on whether we maintain assumption (7.5) or the stronger assumption (7.6). Assuming that the x_it are fixed in repeated samples is effectively the same as making assumption (7.6).
7.3 System OLS Estimation of a Multivariate Linear System
7.3.1 Preliminaries
We now analyze a general multivariate model that contains the examples in Section 7.2, and many others, as special cases. Assume that we have independent, identically distributed cross section observations {(X_i, y_i): i = 1, 2, ..., N}, where X_i is a G × K matrix and y_i is a G × 1 vector. Thus, y_i contains the dependent variables for all G equations (or time periods, in the panel data case). The matrix X_i contains the explanatory variables appearing anywhere in the system. For notational clarity we include the i subscript for stating the general model and the assumptions.
The multivariate linear model for a random draw from the population can be expressed as

y_i = X_i β + u_i    (7.9)

where β is the K × 1 parameter vector of interest and u_i is a G × 1 vector of unobservables. Equation (7.9) explains the G variables y_i1, ..., y_iG in terms of X_i and the unobservables u_i. Because of the random sampling assumption, we can state all assumptions in terms of a generic observation; in examples, we will often omit the i subscript.

Before stating any assumptions, we show how the two examples introduced in Section 7.2 fit into this framework.
Example 7.1 (SUR, continued): The SUR model (7.1) can be expressed as in equation (7.9) by defining y_i = (y_i1, y_i2, ..., y_iG)', u_i = (u_i1, u_i2, ..., u_iG)', and X_i as the block diagonal matrix

X_i = | x_i1  0    ...  0    |
      | 0     x_i2 ...  0    |
      | ...             ...  |
      | 0     0    ...  x_iG |    (7.10)

with β ≡ (β_1', β_2', ..., β_G')'. Note that the dimension of X_i is G × (K_1 + K_2 + ... + K_G), so we define K ≡ K_1 + ... + K_G.
Example 7.2 (Panel Data, continued): The panel data model (7.4) can be expressed as in equation (7.9) by choosing X_i to be the T × K matrix X_i = (x_i1', x_i2', ..., x_iT')'.

7.3.2 Asymptotic Properties of System OLS

Given the model in equation (7.9), we can state the key orthogonality condition for consistent estimation of β by system ordinary least squares (SOLS).
ASSUMPTION SOLS.1: E(X_i'u_i) = 0.
Assumption SOLS.1 appears similar to the orthogonality condition for OLS analysis of single equations. What it implies differs across examples because of the multiple-equation nature of equation (7.9). For most applications, X_i has a sufficient number of elements equal to unity so that Assumption SOLS.1 implies that E(u_i) = 0, and we assume zero mean for the sake of discussion.
It is informative to see what Assumption SOLS.1 entails in the previous examples.

Example 7.1 (SUR, continued): In the SUR case, X_i'u_i = (x_i1 u_i1, ..., x_iG u_iG)', and so Assumption SOLS.1 holds if and only if

E(x_ig' u_ig) = 0,  g = 1, 2, ..., G    (7.11)

Example 7.2 (Panel Data, continued): For the panel data setup, X_i'u_i = Σ_{t=1}^T x_it' u_it, so Assumption SOLS.1 holds, in particular, if

E(x_it' u_it) = 0,  t = 1, 2, ..., T    (7.12)
Like assumption (7.5), assumption (7.12) allows x_is and u_it to be correlated when s ≠ t; in fact, assumption (7.12) is weaker than assumption (7.5). Therefore, Assumption SOLS.1 does not impose strict exogeneity in panel data contexts.
Assumption SOLS.1 is the weakest assumption we can impose in a regression framework to get consistent estimators of β. As the previous examples show, Assumption SOLS.1 allows some elements of X_i to be correlated with elements of u_i. Much stronger is the zero conditional mean assumption

E(u_i | X_i) = 0    (7.13)

which implies, among other things, that every element of X_i and every element of u_i are uncorrelated. [Of course, assumption (7.13) is not as strong as assuming that u_i and X_i are actually independent.] Even though assumption (7.13) is stronger than Assumption SOLS.1, it is, nevertheless, reasonable in some applications.
Under Assumption SOLS.1 the vector β satisfies

E[X_i'(y_i − X_i β)] = 0    (7.14)

or E(X_i'X_i)β = E(X_i'y_i). For each i, X_i'y_i is a K × 1 random vector and X_i'X_i is a K × K symmetric, positive semidefinite random matrix. Therefore, E(X_i'X_i) is always a K × K symmetric, positive semidefinite nonrandom matrix (the expectation here is defined over the population distribution of X_i). To be able to estimate β we need to assume that it is the only K × 1 vector that satisfies assumption (7.14).
ASSUMPTION SOLS.2: A ≡ E(X_i'X_i) is nonsingular (has rank K).
Under Assumptions SOLS.1 and SOLS.2 we can write β as

β = [E(X_i'X_i)]^{-1} E(X_i'y_i)    (7.15)

which shows that Assumptions SOLS.1 and SOLS.2 identify the vector β. The analogy principle suggests that we estimate β by the sample analogue of assumption (7.15). Define the system ordinary least squares (SOLS) estimator of β as

β̂ = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'y_i)    (7.16)

For computing β̂ using matrix language programming, it is sometimes useful to write β̂ = (X'X)^{-1}X'Y, where X ≡ (X_1', X_2', ..., X_N')' is the NG × K matrix of stacked X_i and Y ≡ (y_1', y_2', ..., y_N')' is the NG × 1 vector of stacked observations on the y_i. For asymptotic derivations, equation (7.16) is much more convenient. In fact, the consistency of β̂ can be read off of equation (7.16) by taking probability limits. We summarize with a theorem:
THEOREM 7.1 (Consistency of System OLS): Under Assumptions SOLS.1 and SOLS.2, β̂ →p β.
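Formula (7.16) is a one-line computation once the data are organized as a list of (X_i, y_i) pairs. The following numpy sketch (our own minimal setup; the function name `sols` and the simulated data are hypothetical, not from the text) computes the SOLS estimator both from the sum-of-moments form (7.16) and, equivalently, from the stacked regression of Y on X.

```python
import numpy as np

def sols(X_list, y_list):
    """System OLS, equation (7.16): beta_hat = (sum X_i'X_i)^{-1} (sum X_i'y_i)."""
    K = X_list[0].shape[1]
    XtX = np.zeros((K, K))
    Xty = np.zeros(K)
    for Xi, yi in zip(X_list, y_list):
        XtX += Xi.T @ Xi
        Xty += Xi.T @ yi
    return np.linalg.solve(XtX, Xty)

# Quick check on simulated data satisfying Assumptions SOLS.1 and SOLS.2.
rng = np.random.default_rng(1)
N, G, K = 5000, 3, 4
beta = np.array([1.0, -2.0, 0.5, 3.0])
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
y_list = [Xi @ beta + rng.normal(size=G) for Xi in X_list]
beta_hat = sols(X_list, y_list)
print(beta_hat)   # close to beta for large N
```

The same answer comes from stacking: `np.vstack(X_list)` gives the NG × K matrix X and the OLS fit of the stacked Y on X reproduces β̂ exactly.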
It is useful to see what the system OLS estimator looks like for the SUR and panel data examples.
Example 7.1 (SUR, continued): For the SUR model, Σ_{i=1}^N X_i'X_i is block diagonal with gth block Σ_{i=1}^N x_ig'x_ig, and

Σ_{i=1}^N X_i'y_i = ( Σ_{i=1}^N x_i1'y_i1 )
                    ( Σ_{i=1}^N x_i2'y_i2 )
                    (        ...          )
                    ( Σ_{i=1}^N x_iG'y_iG )

Straightforward inversion of a block diagonal matrix shows that the OLS estimator from equation (7.16) can be written as β̂ = (β̂_1', β̂_2', ..., β̂_G')', where each β̂_g is just the single-equation OLS estimator from the gth equation. In other words, system OLS estimation of a SUR model (without restrictions on the parameter vectors β_g) is equivalent to OLS equation by equation. Assumption SOLS.2 is easily seen to hold if E(x_ig'x_ig) is nonsingular for all g.
Example 7.2 (Panel Data, continued): In the panel data case,

Σ_{i=1}^N X_i'X_i = Σ_{i=1}^N Σ_{t=1}^T x_it'x_it  and  Σ_{i=1}^N X_i'y_i = Σ_{i=1}^N Σ_{t=1}^T x_it'y_it

so the system OLS estimator is obtained from the OLS regression of y_it on x_it pooled across all i and t; this is the pooled OLS estimator. Assumption SOLS.2 requires rank E(Σ_{t=1}^T x_it'x_it) = K.
In the general system (7.9), the system OLS estimator does not necessarily have an interpretation as OLS equation by equation or as pooled OLS. As we will see in Section 7.7 for the SUR setup, sometimes we want to impose cross equation restrictions on the β_g, in which case the system OLS estimator has no simple interpretation.

While OLS is consistent under Assumptions SOLS.1 and SOLS.2, it is not necessarily unbiased. Assumption (7.13), and the finite sample assumption rank(X'X) = K, do ensure unbiasedness of OLS conditional on X. [This conclusion follows because, under independent sampling, E(u_i | X_1, X_2, ..., X_N) = E(u_i | X_i) = 0 under assumption (7.13).] We focus on the weaker Assumption SOLS.1 because assumption (7.13) is often violated in economic applications, something we will see especially in our panel data analysis.
For inference, we need to find the asymptotic variance of the OLS estimator under essentially the same two assumptions; technically, the following derivation requires the elements of X_i'u_i u_i'X_i to have finite expected absolute value. From (7.16) and (7.9) write

√N(β̂ − β) = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i)    (7.17)

Because E(X_i'u_i) = 0 under Assumption SOLS.1, the central limit theorem implies that

N^{-1/2} Σ_{i=1}^N X_i'u_i →d Normal(0, B)    (7.18)

where

B ≡ E(X_i'u_i u_i'X_i)    (7.19)

Since N^{-1} Σ_{i=1}^N X_i'X_i →p A ≡ E(X_i'X_i), we can write

√N(β̂ − β) = A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i) + o_p(1)    (7.20)

an expression for √N(β̂ − β) that is a nonrandom linear combination of a partial sum that satisfies the CLT. Equations (7.18) and (7.20) and the asymptotic equivalence lemma imply

√N(β̂ − β) →d Normal(0, A^{-1}BA^{-1})    (7.21)

We summarize with a theorem.
THEOREM 7.2 (Asymptotic Normality of SOLS): Under Assumptions SOLS.1 and SOLS.2, equation (7.21) holds.

The asymptotic variance of β̂ is Avar(β̂) = A^{-1}BA^{-1}/N. Replacing A and B with consistent estimators based on the SOLS residuals û_i ≡ y_i − X_i β̂ gives

V̂ = (Σ_{i=1}^N X_i'X_i)^{-1} (Σ_{i=1}^N X_i'û_i û_i'X_i) (Σ_{i=1}^N X_i'X_i)^{-1}    (7.26)

The square roots of the diagonal elements of matrix (7.26) are the robust asymptotic standard errors of the elements of β̂; they can be used to construct asymptotic t statistics for testing hypotheses such as H_0: β_j = 0. Sometimes the t statistics are treated as being distributed as t_{NG−K}, which is asymptotically valid because NG − K should be large.
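The sandwich formula (7.26) is straightforward to code. The sketch below (our own hypothetical simulated-data setup) computes the SOLS estimate together with the fully robust variance matrix and standard errors; note the errors are deliberately heteroskedastic and cross-equation correlated, which (7.26) accommodates.

```python
import numpy as np

def sols_robust(X_list, y_list):
    """System OLS with the fully robust variance estimator of equation (7.26)."""
    K = X_list[0].shape[1]
    XtX = np.zeros((K, K))
    Xty = np.zeros(K)
    for Xi, yi in zip(X_list, y_list):
        XtX += Xi.T @ Xi
        Xty += Xi.T @ yi
    beta_hat = np.linalg.solve(XtX, Xty)

    # Middle matrix: sum_i X_i' u_hat_i u_hat_i' X_i, with SOLS residuals u_hat_i.
    M = np.zeros((K, K))
    for Xi, yi in zip(X_list, y_list):
        ui = yi - Xi @ beta_hat
        s = Xi.T @ ui                # K-vector
        M += np.outer(s, s)
    XtX_inv = np.linalg.inv(XtX)
    V = XtX_inv @ M @ XtX_inv        # estimate of Avar(beta_hat)
    se = np.sqrt(np.diag(V))         # robust asymptotic standard errors
    return beta_hat, V, se

rng = np.random.default_rng(3)
N, G, K = 2000, 3, 2
beta = np.array([1.0, -1.0])
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
# Heteroskedastic errors whose scale depends on X_i: (7.26) remains valid.
y_list = [Xi @ beta + (1 + np.abs(Xi[:, 0])) * rng.normal(size=G) for Xi in X_list]
beta_hat, V, se = sols_robust(X_list, y_list)
print(beta_hat, se)
```

No second-moment structure on u_i is imposed anywhere, matching the robustness discussion that follows.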
The estimator in matrix (7.26) is another example of a robust variance matrix estimator because it is valid without any second-moment assumptions on the errors u_i (except, as usual, that the second moments are well defined). In a multivariate setting it is important to know what this robustness allows. First, the G × G unconditional variance matrix, Ω ≡ E(u_i u_i'), is entirely unrestricted. This fact allows cross equation correlation in an SUR system as well as different error variances in each equation. In panel data models, an unrestricted Ω allows for arbitrary serial correlation and time-varying variances in the disturbances. A second kind of robustness is that the conditional variance matrix, Var(u_i | X_i), can depend on X_i in an arbitrary, unknown fashion. The generality afforded by formula (7.26) is possible because of the N → ∞ asymptotics.

In special cases it is useful to impose more structure on the conditional and unconditional variance matrix of u_i in order to simplify estimation of the asymptotic variance. We will cover an important case in Section 7.5.2. Essentially, the key restriction will be that the conditional and unconditional variances of u_i are the same. There are also some special assumptions that greatly simplify the analysis of the pooled OLS estimator for panel data; see Section 7.8.

7.3.3 Testing Multiple Hypotheses
Testing multiple hypotheses in a very robust manner is easy once V̂ in matrix (7.26) has been obtained. The robust Wald statistic for testing H_0: Rβ = r, where R is Q × K with rank Q and r is Q × 1, has its usual form, W = (Rβ̂ − r)'(RV̂R')^{-1}(Rβ̂ − r). Under H_0, W ~a χ²_Q. In the SUR case this is the easiest and most robust way of testing cross equation restrictions on the parameters in different equations using system OLS. In the panel data setting, the robust Wald test provides a way of testing multiple hypotheses about β without assuming homoskedasticity or serial independence of the errors.
7.4 Consistency and Asymptotic Normality of Generalized Least Squares
7.4.1 Consistency
System OLS is consistent under fairly weak assumptions, and we have seen how to perform robust inference using OLS. If we strengthen Assumption SOLS.1 and add assumptions on the conditional variance matrix of u_i, we can do better using a generalized least squares procedure. As we will see, GLS is not usually feasible because it requires knowing the variance matrix of the errors up to a multiplicative constant. Nevertheless, deriving the consistency and asymptotic distribution of the GLS estimator is worthwhile because it turns out that the feasible GLS estimator is asymptotically equivalent to GLS.
We start with the model (7.9), but consistency of GLS generally requires a stronger assumption than Assumption SOLS.1. We replace Assumption SOLS.1 with the assumption that each element of u_i is uncorrelated with each element of X_i. We can state this succinctly using the Kronecker product:

ASSUMPTION SGLS.1: E(X_i ⊗ u_i) = 0.

Typically, at least one element of X_i is unity, so in practice Assumption SGLS.1 implies that E(u_i) = 0. We will assume u_i has a zero mean for our discussion but not in proving any results.

Assumption SGLS.1 plays a crucial role in establishing consistency of the GLS estimator, so it is important to recognize that it puts more restrictions on the explanatory variables than does Assumption SOLS.1. In other words, when we allow the explanatory variables to be random, GLS requires a stronger assumption than system OLS in order to be consistent. Sufficient for Assumption SGLS.1, but not necessary, is the zero conditional mean assumption (7.13). This conclusion follows from a standard iterated expectations argument.
For GLS estimation of multivariate equations with i.i.d. observations, the variance matrix of u_i plays a key role. Define the G × G symmetric, positive semidefinite matrix

Ω ≡ E(u_i u_i')    (7.27)

As mentioned in Section 7.3.2, we call Ω the unconditional variance matrix of u_i. [In the rare case that E(u_i) ≠ 0, Ω is not the variance matrix of u_i, but it is always the appropriate matrix for GLS estimation.] It is important to remember that expression (7.27) is definitional: because we are using random sampling, the unconditional variance matrix is necessarily the same for all i.
In place of Assumption SOLS.2, we assume that a weighted version of the expected outer product of X_i is nonsingular.

ASSUMPTION SGLS.2: Ω is positive definite and E(X_i'Ω^{-1}X_i) is nonsingular.

For the general treatment we assume that Ω is positive definite, rather than just positive semidefinite. In applications where the dependent variables across equations satisfy an adding up constraint, such as expenditure shares summing to unity, an equation must be dropped to ensure that Ω is nonsingular, a topic we return to in Section 7.7.3. As a practical matter, Assumption SGLS.2 is not very restrictive. The assumption that the K × K matrix E(X_i'Ω^{-1}X_i) has rank K is the analogue of Assumption SOLS.2.
The usual motivation for the GLS estimator is to transform a system of equations where the error has nonscalar variance-covariance matrix into a system where the error vector has a scalar variance-covariance matrix. We obtain this by multiplying equation (7.9) by Ω^{-1/2}:

Ω^{-1/2} y_i = (Ω^{-1/2} X_i)β + Ω^{-1/2} u_i    (7.28)

The transformed error Ω^{-1/2}u_i has variance matrix I_G, so system OLS applied to equation (7.28) produces the GLS estimator; call this estimator β*. Then

β* = (Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω^{-1}y_i)    (7.29)

We can write β* using full matrix notation as β* = [X'(I_N ⊗ Ω^{-1})X]^{-1}[X'(I_N ⊗ Ω^{-1})Y], where X and Y are the data matrices defined in Section 7.3.2 and I_N is the N × N identity matrix. But for establishing the asymptotic properties of β*, it is most convenient to work with equation (7.29).
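Formula (7.29) is a weighted version of (7.16). The sketch below (a hypothetical setup of our own, treating Ω as known, which is exactly what makes this estimator infeasible in practice) implements it and checks that with Ω = I_G it collapses to system OLS.

```python
import numpy as np

def gls(X_list, y_list, Omega):
    """Infeasible GLS, equation (7.29); requires the G x G matrix Omega = E(u_i u_i')."""
    Oinv = np.linalg.inv(Omega)
    K = X_list[0].shape[1]
    A = np.zeros((K, K))
    b = np.zeros(K)
    for Xi, yi in zip(X_list, y_list):
        A += Xi.T @ Oinv @ Xi
        b += Xi.T @ Oinv @ yi
    return np.linalg.solve(A, b)

rng = np.random.default_rng(4)
N, G, K = 1000, 2, 3
beta = np.array([1.0, 0.0, -1.0])
Omega = np.array([[1.0, 0.5], [0.5, 2.0]])
L = np.linalg.cholesky(Omega)      # draws L @ z have variance matrix Omega
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
y_list = [Xi @ beta + L @ rng.normal(size=G) for Xi in X_list]
beta_gls = gls(X_list, y_list, Omega)
print(beta_gls)   # consistent for beta under SGLS.1 and SGLS.2
```

Calling `gls(X_list, y_list, np.eye(G))` reproduces the system OLS estimate, which makes the "weighted SOLS" interpretation concrete.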
We can establish consistency of β* under Assumptions SGLS.1 and SGLS.2 by writing

β* = β + (N^{-1} Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'Ω^{-1}u_i)    (7.30)

Now we must show that plim N^{-1} Σ_{i=1}^N X_i'Ω^{-1}u_i = 0. By the WLLN, it is sufficient that E(X_i'Ω^{-1}u_i) = 0. This is where Assumption SGLS.1 comes in. We can argue this point informally because Ω^{-1}X_i is a linear combination of X_i, and since each element of X_i is uncorrelated with each element of u_i, any linear combination of X_i is uncorrelated with u_i. We can also show this directly using the algebra of Kronecker products and vectorization. For conformable matrices D, E, and F, recall that vec(DEF) = (F' ⊗ D) vec(E), where vec(C) is the vectorization of the matrix C. [That is, vec(C) is the column vector obtained by stacking the columns of C from first to last; see Theil (1983).] Therefore, under Assumption SGLS.1,

vec E(X_i'Ω^{-1}u_i) = E[(u_i' ⊗ X_i') vec(Ω^{-1})] = E[(u_i ⊗ X_i)'] vec(Ω^{-1}) = 0

where we have also used the fact that the expectation and vec operators can be interchanged. We can now read the consistency of the GLS estimator off of equation (7.30). We do not state this conclusion as a theorem because the GLS estimator itself is rarely available.
The proof of consistency that we have sketched fails if we only make Assumption SOLS.1: E(X_i'u_i) = 0 does not imply E(X_i'Ω^{-1}u_i) = 0, except when Ω and X_i have special structures. If Assumption SOLS.1 holds but Assumption SGLS.1 fails, the transformation in equation (7.28) generally induces correlation between the transformed regressors Ω^{-1/2}X_i and the transformed errors Ω^{-1/2}u_i. This can be an important point, especially for certain panel data applications. If we are willing to make the zero conditional mean assumption (7.13), β* can be shown to be unbiased conditional on X.
7.4.2 Asymptotic Normality
We now sketch the asymptotic normality of the GLS estimator under Assumptions SGLS.1 and SGLS.2 and some weak moment conditions. The first step is familiar:

√N(β* − β) = (N^{-1} Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i)

Because E(X_i'Ω^{-1}u_i) = 0 under Assumption SGLS.1, the CLT implies that N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i →d Normal(0, B), where

A ≡ E(X_i'Ω^{-1}X_i)    (7.31)

and

B ≡ E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i)    (7.33)

Since N^{-1} Σ_{i=1}^N X_i'Ω^{-1}X_i →p A, it follows that

√N(β* − β) →d Normal(0, A^{-1}BA^{-1})

7.5 Feasible GLS
7.5.1 Asymptotic Properties
Obtaining the GLS estimator β* requires knowing Ω up to scale. That is, we must be able to write Ω = σ²C, where C is a known G × G positive definite matrix and σ² is allowed to be an unknown constant. Sometimes C is known (one case is C = I_G), but much more often it is unknown. Therefore, we now turn to the analysis of feasible GLS (FGLS) estimation.
In FGLS estimation we replace the unknown matrix Ω with a consistent estimator. Because the estimator of Ω appears highly nonlinearly in the expression for the FGLS estimator, deriving finite sample properties of FGLS is generally difficult. [However, under essentially assumption (7.13) and some additional assumptions, including symmetry of the distribution of u_i, Kakwani (1967) showed that the distribution of the FGLS estimator is symmetric about β, a property which means that the FGLS estimator is unbiased if its expected value exists; see also Schmidt (1976, Section 2.5).] The asymptotic properties of the FGLS estimator are easily established as N → ∞ because, as we will show, its first-order asymptotic properties are identical to those of the GLS estimator under Assumptions SGLS.1 and SGLS.2. It is for this purpose that we spent some time on GLS. After establishing the asymptotic equivalence, we can easily obtain the limiting distribution of the FGLS estimator. Of course, GLS is trivially a special case of FGLS, where there is no first-stage estimation error.
We assume we have a consistent estimator, Ω̂, of Ω:

plim Ω̂ = Ω    (7.36)

When Ω has no restricted structure, a natural two-step procedure is available. First, obtain the system OLS estimator of β, which we denote β̌ in this section to avoid confusion. We already showed that β̌ is consistent for β under Assumptions SOLS.1 and SOLS.2, and therefore under Assumptions SGLS.1 and SOLS.2. (In what follows, we assume that Assumptions SOLS.2 and SGLS.2 both hold.) By the WLLN, plim(N^{-1} Σ_{i=1}^N u_i u_i') = Ω, which suggests the estimator

Ω̂ ≡ N^{-1} Σ_{i=1}^N ǔ_i ǔ_i'    (7.37)

where ǔ_i ≡ y_i − X_i β̌ are the SOLS residuals. We can show that this estimator is consistent for Ω under Assumptions SGLS.1 and SOLS.2 and standard moment conditions. First, write

ǔ_i = u_i − X_i(β̌ − β)    (7.38)

so that

ǔ_i ǔ_i' = u_i u_i' − u_i(β̌ − β)'X_i' − X_i(β̌ − β)u_i' + X_i(β̌ − β)(β̌ − β)'X_i'    (7.39)

Therefore, it suffices to show that the averages of the last three terms converge in probability to zero. Write the average of the vec of the second term on the right-hand side of equation (7.39) as [N^{-1} Σ_{i=1}^N (X_i ⊗ u_i)](β̌ − β), which is o_p(1) because plim(β̌ − β) = 0 and N^{-1} Σ_{i=1}^N (X_i ⊗ u_i) →p 0. The third term is the transpose of the second. For the last term in equation (7.39), note that the average of its vec can be written as [N^{-1} Σ_{i=1}^N (X_i ⊗ X_i)] vec[(β̌ − β)(β̌ − β)'], which is also o_p(1).

When the structure of Ω is restricted by the model, a different estimator of Ω is often used that exploits these restrictions. As with Ω̂ in equation (7.37), such estimators typically use the system OLS residuals in some fashion and lead to consistent estimators assuming the structure of Ω is correctly specified. The advantage of equation (7.37) is that it is consistent for Ω quite generally. However, if N is not very large relative to G, equation (7.37) can have poor finite sample properties.
Given Ω̂, the feasible GLS (FGLS) estimator of β is

β̂ = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω̂^{-1}y_i)    (7.42)

We have already shown that the (infeasible) GLS estimator is consistent under Assumptions SGLS.1 and SGLS.2. Because Ω̂ converges to Ω, it is not surprising that FGLS is also consistent. Rather than show this result separately, we verify the stronger result that FGLS has the same limiting distribution as GLS.
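Putting the pieces together, two-step FGLS is: system OLS, then Ω̂ by (7.37), then the weighted estimator (7.42). A minimal sketch, under our own simulated-data assumptions (not production code):

```python
import numpy as np

def fgls(X_list, y_list):
    """Two-step FGLS: (i) SOLS for residuals, (ii) Omega_hat via (7.37),
    (iii) weighted estimator of equation (7.42)."""
    N = len(X_list)

    # Step 1: system OLS.
    XtX = sum(Xi.T @ Xi for Xi in X_list)
    Xty = sum(Xi.T @ yi for Xi, yi in zip(X_list, y_list))
    beta_check = np.linalg.solve(XtX, Xty)

    # Step 2: Omega_hat from the SOLS residuals.
    resid = [yi - Xi @ beta_check for Xi, yi in zip(X_list, y_list)]
    Omega_hat = sum(np.outer(u, u) for u in resid) / N

    # Step 3: FGLS with the estimated weight matrix.
    Oinv = np.linalg.inv(Omega_hat)
    A = sum(Xi.T @ Oinv @ Xi for Xi in X_list)
    b = sum(Xi.T @ Oinv @ yi for Xi, yi in zip(X_list, y_list))
    return np.linalg.solve(A, b), Omega_hat

rng = np.random.default_rng(6)
N, G, K = 3000, 2, 2
beta = np.array([1.0, 3.0])
L = np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 1.0]]))
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
y_list = [Xi @ beta + L @ rng.normal(size=G) for Xi in X_list]
beta_fgls, Omega_hat = fgls(X_list, y_list)
print(beta_fgls)
```

Because Ω̂ is consistent, this estimator shares the first-order asymptotic behavior of infeasible GLS, which is exactly the equivalence result established next.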
The limiting distribution of FGLS is obtained by writing

√N(β̂ − β) = (N^{-1} Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'Ω̂^{-1}u_i)    (7.43)

Now, because Ω̂ →p Ω, N^{-1} Σ_{i=1}^N X_i'Ω̂^{-1}X_i →p A and

N^{-1/2} Σ_{i=1}^N X_i'Ω̂^{-1}u_i = N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i + o_p(1)

so that

√N(β̂ − β) = A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i) + o_p(1)    (7.44)

The first term in equation (7.44) is just √N(β* − β), up to o_p(1), where β* is the GLS estimator. We can write equation (7.44) as

√N(β̂ − β*) = o_p(1)    (7.45)

which shows that β̂ and β* are √N-equivalent. This statement is much stronger than simply saying that β* and β̂ are both consistent for β. There are many estimators, such as system OLS, that are consistent for β but are not √N-equivalent to β*.

The asymptotic equivalence of β̂ and β* has practically important consequences. The most important of these is that, for performing asymptotic inference about β using β̂, we do not have to worry that Ω̂ is an estimator of Ω. Of course, whether the asymptotic approximation gives a reasonable approximation to the actual distribution of β̂ is difficult to tell. With large N, the approximation is usually pretty good. But if N is small relative to G, ignoring estimation of Ω in performing inference about β can be misleading.
We summarize the limiting distribution of FGLS with a theorem.

THEOREM 7.3 (Asymptotic Normality of FGLS): Under Assumptions SGLS.1 and SGLS.2,

√N(β̂ − β) →d Normal(0, A^{-1}BA^{-1})    (7.46)

where A is defined in equation (7.31) and B is defined in equation (7.33).

In the FGLS context a consistent estimator of A is Â ≡ N^{-1} Σ_{i=1}^N X_i'Ω̂^{-1}X_i, and a fully robust estimator of Avar(β̂) is

V̂ = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω̂^{-1}û_i û_i'Ω̂^{-1}X_i) (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1}    (7.49)

where û_i ≡ y_i − X_i β̂ are the FGLS residuals. This is the extension of the White (1980b) heteroskedasticity-robust asymptotic variance estimator to the case of systems of equations; see also White (1984). This estimator is valid under Assumptions SGLS.1 and SGLS.2; that is, it is completely robust.
7.5.2 Asymptotic Variance of FGLS under a Standard Assumption
Under the assumptions so far, FGLS really has nothing to offer over SOLS. In addition to being computationally more difficult, FGLS is less robust than SOLS. So why is FGLS used? The answer is that, under an additional assumption, FGLS is asymptotically more efficient than SOLS (and other estimators). First, we state the weakest condition that simplifies estimation of the asymptotic variance for FGLS. For reasons to be seen shortly, we call this a system homoskedasticity assumption.

ASSUMPTION SGLS.3: E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i) = E(X_i'Ω^{-1}X_i), where Ω ≡ E(u_i u_i').

Another way to state this assumption is B = A, which, from expression (7.46), simplifies the asymptotic variance. As stated, Assumption SGLS.3 is somewhat difficult to interpret. When G = 1, it reduces to Assumption OLS.3. When Ω is diagonal and X_i has either the SUR or panel data structure, Assumption SGLS.3 implies a kind of conditional homoskedasticity in each equation (or time period). Generally, Assumption SGLS.3 puts restrictions on the conditional variances and covariances of elements of u_i. A sufficient (though certainly not necessary) condition for Assumption SGLS.3 is easier to interpret:

E(u_i u_i' | X_i) = Ω    (7.50)
If E(u_i | X_i) = 0, then assumption (7.50) is the same as assuming Var(u_i | X_i) = Var(u_i) = Ω, which means that each variance and each covariance of elements involving u_i must be constant conditional on all of X_i. This is a very natural way of stating a system homoskedasticity assumption, but it is sometimes too strong.

When G = 2, Ω contains three distinct elements, σ_1² = E(u_i1²), σ_2² = E(u_i2²), and σ_12 = E(u_i1 u_i2). These elements are not restricted by the assumptions we have made. (The inequality |σ_12| < σ_1 σ_2 must always hold for Ω to be a nonsingular covariance matrix.) However, assumption (7.50) requires E(u_i1² | X_i) = σ_1², E(u_i2² | X_i) = σ_2², and E(u_i1 u_i2 | X_i) = σ_12: the conditional variances and covariance must not depend on X_i.

That assumption (7.50) implies Assumption SGLS.3 is a consequence of iterated expectations:

E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i) = E[E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i | X_i)]
  = E[X_i'Ω^{-1}E(u_i u_i' | X_i)Ω^{-1}X_i]
  = E(X_i'Ω^{-1}ΩΩ^{-1}X_i) = E(X_i'Ω^{-1}X_i)

While assumption (7.50) is easier to interpret, we use Assumption SGLS.3 for stating the next theorem because there are cases, including some dynamic panel data models, where Assumption SGLS.3 holds but assumption (7.50) does not.
THEOREM 7.4 (Usual Variance Matrix for FGLS): Under Assumptions SGLS.1-SGLS.3, the asymptotic variance of the FGLS estimator is Avar(β̂) = A^{-1}/N ≡ [E(X_i'Ω^{-1}X_i)]^{-1}/N.

We obtain an estimator of Avar(β̂) by using our consistent estimator of A:

V̂ = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1}    (7.51)
Assumption (7.50) also has important efficiency implications. One consequence of Problem 7.2 is that, under Assumptions SGLS.1, SOLS.2, SGLS.2, and (7.50), the FGLS estimator is more efficient than the system OLS estimator. We can actually say much more: FGLS is more efficient than any other estimator that uses the orthogonality conditions E(X_i ⊗ u_i) = 0. This conclusion will follow as a special case of Theorem 8.4 in Chapter 8, where we define the class of competing estimators. If we replace Assumption SGLS.1 with the zero conditional mean assumption (7.13), then an even stronger efficiency result holds for FGLS, something we treat in Section 8.6.
7.6 Testing Using FGLS
Asymptotic standard errors are obtained in the usual fashion from the asymptotic variance estimates. We can use the nonrobust version in equation (7.51) or, even better, the robust version in equation (7.49), to construct t statistics and confidence intervals. Testing multiple restrictions is fairly easy using the Wald test, which always has the same general form. The important consideration lies in choosing the asymptotic variance estimate, V̂. Standard Wald statistics use equation (7.51), and this approach produces limiting chi-square statistics under the homoskedasticity assumption SGLS.3. Completely robust Wald statistics are obtained by choosing V̂ as in equation (7.49).
If Assumption SGLS.3 holds under H_0, we can define a statistic based on the weighted sums of squared residuals. To obtain the statistic, we estimate the model with and without the restrictions imposed on β, where the same estimator of Ω, usually based on the unrestricted SOLS residuals, is used in obtaining the restricted and unrestricted FGLS estimators. Let ũ_i denote the residuals from constrained FGLS (with Q restrictions imposed on β̃) using variance matrix Ω̂. It can be shown that, under H_0 and Assumptions SGLS.1-SGLS.3,