7.1 Introduction
This chapter begins our analysis of linear systems of equations. The first method of estimation we cover is system ordinary least squares, which is a direct extension of OLS for single equations. In some important special cases the system OLS estimator turns out to have a straightforward interpretation in terms of single-equation OLS estimators. But the method is applicable to very general linear systems of equations.
We then turn to a generalized least squares (GLS) analysis. Under certain assumptions, GLS, or its operationalized version, feasible GLS, will turn out to be asymptotically more efficient than system OLS. However, we emphasize in this chapter that the efficiency of GLS comes at a price: it requires stronger assumptions than system OLS in order to be consistent. This is a practically important point that is often overlooked in traditional treatments of linear systems, particularly those which assume that explanatory variables are nonrandom.

As with our single-equation analysis, we assume that a random sample is available from the population. Usually the unit of observation is obvious, such as a worker, a household, a firm, or a city. For example, if we collect consumption data on various commodities for a sample of families, the unit of observation is the family (not a commodity).
The framework of this chapter is general enough to apply to panel data models. Because the asymptotic analysis is done as the cross section dimension tends to infinity, the results are explicitly for the case where the cross section dimension is large relative to the time series dimension. (For example, we may have observations on N firms over the same T time periods for each firm. Then, we assume we have a random sample of firms that have data in each of the T years.) The panel data model covered here, while having many useful applications, does not fully exploit the replicability over time. In Chapters 10 and 11 we explicitly consider panel data models that contain time-invariant, unobserved effects in the error term.
7.2 Some Examples
We begin with two examples of systems of equations. These examples are fairly general, and we will see later that variants of them can also be cast as a general linear system of equations.

Example 7.1 (Seemingly Unrelated Regressions): The population model is a set of G linear equations,

y_g = x_g β_g + u_g,  g = 1, 2, ..., G    (7.1)
where x_g is 1 × K_g and β_g is K_g × 1, g = 1, 2, ..., G. In many applications x_g is the same for all g (in which case the β_g necessarily have the same dimension), but the general model allows the elements and the dimension of x_g to vary across equations. Remember, the system (7.1) represents a generic person, firm, city, or whatever from the population. The system (7.1) is often called Zellner's (1962) seemingly unrelated regressions (SUR) model (for cross section data in this case). The name comes from the fact that, since each equation in the system (7.1) has its own vector β_g, it appears that the equations are unrelated. Nevertheless, correlation across the errors in different equations can provide links that can be exploited in estimation; we will see this point later.
As a specific example, the system (7.1) might represent a set of demand functions for the population of families in a country:

housing = β_10 + β_11 houseprc + β_12 foodprc + β_13 clothprc + β_14 income + ... + u_1

with analogous equations for food and clothing (errors u_2 and u_3). In this example, G = 3 and x_g (a 1 × 7 vector) is the same for g = 1, 2, 3.
When we need to write the equations for a particular random draw from the population, y_g, x_g, and u_g will also contain an i subscript: equation g becomes y_ig = x_ig β_g + u_ig. For the purposes of stating assumptions, it does not matter whether or not we include the i subscript. The system (7.1) has the advantage of being less cluttered while focusing attention on the population, as is appropriate for applications. But for derivations we will often need to indicate the equation for a generic cross section unit i.
When we study the asymptotic properties of various estimators of the β_g, the asymptotics is done with G fixed and N tending to infinity. In the household demand example, we are interested in a set of three demand functions, and the unit of observation is the family. Therefore, inference is done as the number of families in the sample tends to infinity.
The assumptions that we make about how the unobservables u_g are related to the explanatory variables (x_1, x_2, ..., x_G) are crucial for determining which estimators of the β_g have acceptable properties. Often, when system (7.1) represents a structural model (without omitted variables, errors-in-variables, or simultaneity), we can assume that

E(u_g | x_1, x_2, ..., x_G) = 0,  g = 1, ..., G    (7.2)

One important implication of assumption (7.2) is that u_g is uncorrelated with the explanatory variables in all equations, as well as all functions of these explanatory variables. When system (7.1) is a system of equations derived from economic theory, assumption (7.2) is often very natural. For example, in the set of demand functions that we have presented, x_g ≡ x is the same for all g, and so assumption (7.2) is the same as E(u_g | x_g) = E(u_g | x) = 0.
If assumption (7.2) is maintained, and if the x_g are not the same across g, then any explanatory variables excluded from equation g are assumed to have no effect on expected y_g once x_g has been controlled for. That is,

E(y_g | x_1, x_2, ..., x_G) = E(y_g | x_g) = x_g β_g,  g = 1, 2, ..., G    (7.3)

There are examples of SUR systems where assumption (7.3) is too strong, but standard SUR analysis either explicitly or implicitly makes this assumption.

Our next example involves panel data.
Example 7.2 (Panel Data Model): Suppose that for each cross section unit we observe data on the same set of variables for T time periods. Let x_t be a 1 × K vector for t = 1, 2, ..., T, and let β be a K × 1 vector. The model in the population is

y_t = x_t β + u_t,  t = 1, 2, ..., T    (7.4)

where y_t is a scalar. For example, a simple equation to explain annual family saving over a five-year span is
sav_t = β_0 + β_1 inc_t + β_2 age_t + β_3 educ_t + u_t,  t = 1, 2, ..., 5
where inc_t is annual income, educ_t is years of education of the household head, and age_t is age of the household head. This is an example of a linear panel data model. It is a static model because all explanatory variables are dated contemporaneously with sav_t.
The panel data setup is conceptually very different from the SUR example. In Example 7.1, each equation explains a different dependent variable for the same cross section unit. Here we only have one dependent variable we are trying to explain, sav, but we observe sav, and the explanatory variables, over a five-year period. (Therefore, the label "system of equations" is really a misnomer for panel data applications. At this point, we are using the phrase to denote more than one equation in any context.) As we will see in the next section, the statistical properties of estimators in SUR and panel data models can be analyzed within the same structure. When we need to indicate that an equation is for a particular cross section unit i during a particular time period t, we write y_it = x_it β + u_it. We will omit the i subscript whenever its omission does not cause confusion.
What kinds of exogeneity assumptions do we use for panel data analysis? One possibility is to assume that u_t and x_t are orthogonal in the conditional mean sense:

E(u_t | x_t) = 0,  t = 1, 2, ..., T    (7.5)
We call this contemporaneous exogeneity of x_t because it only restricts the relationship between the disturbance and explanatory variables in the same time period. It is very important to distinguish assumption (7.5) from the stronger assumption

E(u_t | x_1, x_2, ..., x_T) = 0,  t = 1, ..., T    (7.6)

which, combined with model (7.4), is identical to E(y_t | x_1, x_2, ..., x_T) = E(y_t | x_t). Assumption (7.5) places no restrictions on the relationship between x_s and u_t for s ≠ t, while assumption (7.6) implies that each u_t is uncorrelated with the explanatory variables in all time periods. When assumption (7.6) holds, we say that the explanatory variables {x_1, x_2, ..., x_t, ..., x_T} are strictly exogenous.
To illustrate the difference between assumptions (7.5) and (7.6), let x_t ≡ (1, y_{t−1}). Then assumption (7.5) holds if E(y_t | y_{t−1}, y_{t−2}, ..., y_0) = β_0 + β_1 y_{t−1}, which imposes first-order dynamics in the conditional mean. However, assumption (7.6) must fail since x_{t+1} = (1, y_t), and therefore E(u_t | x_1, x_2, ..., x_T) = E(u_t | y_0, y_1, ..., y_{T−1}) = u_t for t = 1, 2, ..., T − 1 (because u_t = y_t − β_0 − β_1 y_{t−1}).
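The failure of strict exogeneity in the lagged-dependent-variable case is easy to see in a small simulation. The sketch below (a hypothetical setup of our own, not from the text) draws many independent two-period histories from the AR(1) model: the period-1 error is uncorrelated with its own regressor y_0, but it is strongly correlated with the future regressor y_1.

```python
import numpy as np

# Simulate y_1 = b0 + b1*y_0 + u_1, where x_1 = (1, y_0) and x_2 = (1, y_1).
# Contemporaneous exogeneity holds by construction (u_1 independent of y_0),
# but strict exogeneity fails: u_1 enters y_1 directly, so Cov(u_1, y_1) = Var(u_1).
rng = np.random.default_rng(0)
b0, b1 = 1.0, 0.5
N = 100_000                     # many independent draws of a short time series
y0 = rng.normal(size=N)
u1 = rng.normal(size=N)         # independent of y_0
y1 = b0 + b1 * y0 + u1          # y_1 is the regressor in the period-2 equation

cov_u1_y0 = np.cov(u1, y0)[0, 1]   # near 0: assumption (7.5) holds
cov_u1_y1 = np.cov(u1, y1)[0, 1]   # near Var(u_1) = 1: assumption (7.6) fails
print(cov_u1_y0, cov_u1_y1)
```

With 100,000 draws the two sample covariances are close to their population values of 0 and 1, illustrating why lagged dependent variables rule out strict exogeneity.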
Assumption (7.6) can fail even if x_t does not contain a lagged dependent variable. Consider a model relating poverty rates to welfare spending per capita, at the city level. A finite distributed lag (FDL) model is

poverty_t = θ_t + δ_0 welfare_t + δ_1 welfare_{t−1} + δ_2 welfare_{t−2} + u_t    (7.7)

where we assume a two-year effect. The parameter θ_t simply denotes a different aggregate time effect in each year. It is reasonable to think that welfare spending reacts to lagged poverty rates. An equation that captures this feedback is

welfare_t = η_t + ρ_1 poverty_{t−1} + r_t    (7.8)

Even if equation (7.7) contains enough lags of welfare spending, assumption (7.6) would be violated if ρ_1 ≠ 0 in equation (7.8), because welfare_{t+1} depends on u_t and x_{t+1} includes welfare_{t+1}.
How we go about consistently estimating β depends crucially on whether we maintain assumption (7.5) or the stronger assumption (7.6). Assuming that the x_it are fixed in repeated samples is effectively the same as making assumption (7.6).
7.3 System OLS Estimation of a Multivariate Linear System
7.3.1 Preliminaries
We now analyze a general multivariate model that contains the examples in Section 7.2, and many others, as special cases. Assume that we have independent, identically distributed cross section observations {(X_i, y_i): i = 1, 2, ..., N}, where X_i is a G × K matrix and y_i is a G × 1 vector. Thus, y_i contains the dependent variables for all G equations (or time periods, in the panel data case). The matrix X_i contains the explanatory variables appearing anywhere in the system. For notational clarity we include the i subscript for stating the general model and the assumptions.
The multivariate linear model for a random draw from the population can be expressed as

y_i = X_i β + u_i    (7.9)

where β is the K × 1 parameter vector of interest and u_i is a G × 1 vector of unobservables. Equation (7.9) explains the G variables y_i1, ..., y_iG in terms of X_i and the unobservables u_i. Because of the random sampling assumption, we can state all assumptions in terms of a generic observation; in examples, we will often omit the i subscript.

Before stating any assumptions, we show how the two examples introduced in Section 7.2 fit into this framework.
Example 7.1 (SUR, continued): The SUR model (7.1) can be expressed as in equation (7.9) by defining y_i = (y_i1, y_i2, ..., y_iG)', u_i = (u_i1, u_i2, ..., u_iG)', and X_i as the block diagonal matrix

X_i = | x_i1  0    ...  0    |
      | 0     x_i2 ...  0    |
      | ...             ...  |
      | 0     0    ...  x_iG |    (7.10)

with β ≡ (β_1', β_2', ..., β_G')'. Note that the dimension of X_i is G × (K_1 + K_2 + ... + K_G), so we define K ≡ K_1 + ... + K_G.
Example 7.2 (Panel Data, continued): The panel data model (7.4) can be expressed as in equation (7.9) by choosing X_i to be the T × K matrix X_i = (x_i1', x_i2', ..., x_iT')'.

7.3.2 Asymptotic Properties of System OLS

Given the model in equation (7.9), we can state the key orthogonality condition for consistent estimation of β by system ordinary least squares (SOLS).
ASSUMPTION SOLS.1: E(X_i'u_i) = 0.
Assumption SOLS.1 appears similar to the orthogonality condition for OLS analysis of single equations. What it implies differs across examples because of the multiple-equation nature of equation (7.9). For most applications, X_i has a sufficient number of elements equal to unity so that Assumption SOLS.1 implies that E(u_i) = 0, and we assume zero mean for the sake of discussion.
It is informative to see what Assumption SOLS.1 entails in the previous examples.

Example 7.1 (SUR, continued): In the SUR case, X_i'u_i = (x_i1 u_i1, ..., x_iG u_iG)', and so Assumption SOLS.1 holds if and only if

E(x_ig' u_ig) = 0,  g = 1, 2, ..., G    (7.11)

Example 7.2 (Panel Data, continued): For the panel data setup, X_i'u_i = Σ_{t=1}^T x_it' u_it, so Assumption SOLS.1 holds, in particular, if

E(x_it' u_it) = 0,  t = 1, 2, ..., T    (7.12)
Like assumption (7.5), assumption (7.12) allows x_is and u_it to be correlated when s ≠ t; in fact, assumption (7.12) is weaker than assumption (7.5). Therefore, Assumption SOLS.1 does not impose strict exogeneity in panel data contexts.
Assumption SOLS.1 is the weakest assumption we can impose in a regression framework to get consistent estimators of β. As the previous examples show, Assumption SOLS.1 allows some elements of X_i to be correlated with elements of u_i. Much stronger is the zero conditional mean assumption

E(u_i | X_i) = 0    (7.13)

which implies, among other things, that every element of X_i and every element of u_i are uncorrelated. [Of course, assumption (7.13) is not as strong as assuming that u_i and X_i are actually independent.] Even though assumption (7.13) is stronger than Assumption SOLS.1, it is, nevertheless, reasonable in some applications.
Under Assumption SOLS.1 the vector β satisfies

E[X_i'(y_i − X_i β)] = 0    (7.14)

or E(X_i'X_i)β = E(X_i'y_i). For each i, X_i'y_i is a K × 1 random vector and X_i'X_i is a K × K symmetric, positive semidefinite random matrix. Therefore, E(X_i'X_i) is always a K × K symmetric, positive semidefinite nonrandom matrix (the expectation here is defined over the population distribution of X_i). To be able to estimate β we need to assume that it is the only K × 1 vector that satisfies assumption (7.14).
ASSUMPTION SOLS.2: A ≡ E(X_i'X_i) is nonsingular (has rank K).
Under Assumptions SOLS.1 and SOLS.2 we can write β as

β = [E(X_i'X_i)]^{-1} E(X_i'y_i)    (7.15)

which shows that Assumptions SOLS.1 and SOLS.2 identify the vector β. The analogy principle suggests that we estimate β by the sample analogue of assumption (7.15). Define the system ordinary least squares (SOLS) estimator of β as

β̂ = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'y_i)    (7.16)

For computing β̂ using matrix language programming, it is sometimes useful to write β̂ = (X'X)^{-1}X'Y, where X ≡ (X_1', X_2', ..., X_N')' is the NG × K matrix of stacked X_i and Y ≡ (y_1', y_2', ..., y_N')' is the NG × 1 vector of stacked observations on the y_i. For asymptotic derivations, equation (7.16) is much more convenient. In fact, the consistency of β̂ can be read off of equation (7.16) by taking probability limits. We summarize with a theorem:
THEOREM 7.1 (Consistency of System OLS): Under Assumptions SOLS.1 and SOLS.2, β̂ →p β.
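Formula (7.16) is a one-line computation once the data are organized as a list of (X_i, y_i) pairs. The following numpy sketch (our own minimal setup; the function name `sols` and the simulated data are hypothetical, not from the text) computes the SOLS estimator both from the sum-of-moments form (7.16) and, equivalently, from the stacked regression of Y on X.

```python
import numpy as np

def sols(X_list, y_list):
    """System OLS, equation (7.16): beta_hat = (sum X_i'X_i)^{-1} (sum X_i'y_i)."""
    K = X_list[0].shape[1]
    XtX = np.zeros((K, K))
    Xty = np.zeros(K)
    for Xi, yi in zip(X_list, y_list):
        XtX += Xi.T @ Xi
        Xty += Xi.T @ yi
    return np.linalg.solve(XtX, Xty)

# Quick check on simulated data satisfying Assumptions SOLS.1 and SOLS.2.
rng = np.random.default_rng(1)
N, G, K = 5000, 3, 4
beta = np.array([1.0, -2.0, 0.5, 3.0])
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
y_list = [Xi @ beta + rng.normal(size=G) for Xi in X_list]
beta_hat = sols(X_list, y_list)
print(beta_hat)   # close to beta for large N
```

The same answer comes from stacking: `np.vstack(X_list)` gives the NG × K matrix X and the OLS fit of the stacked Y on X reproduces β̂ exactly.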
It is useful to see what the system OLS estimator looks like for the SUR and panel data examples.
Example 7.1 (SUR, continued): For the SUR model, Σ_{i=1}^N X_i'X_i is block diagonal with gth block Σ_{i=1}^N x_ig'x_ig, and

Σ_{i=1}^N X_i'y_i = ( Σ_{i=1}^N x_i1'y_i1 )
                    ( Σ_{i=1}^N x_i2'y_i2 )
                    (        ...          )
                    ( Σ_{i=1}^N x_iG'y_iG )

Straightforward inversion of a block diagonal matrix shows that the OLS estimator from equation (7.16) can be written as β̂ = (β̂_1', β̂_2', ..., β̂_G')', where each β̂_g is just the single-equation OLS estimator from the gth equation. In other words, system OLS estimation of a SUR model (without restrictions on the parameter vectors β_g) is equivalent to OLS equation by equation. Assumption SOLS.2 is easily seen to hold if E(x_ig'x_ig) is nonsingular for all g.
Example 7.2 (Panel Data, continued): In the panel data case,

Σ_{i=1}^N X_i'X_i = Σ_{i=1}^N Σ_{t=1}^T x_it'x_it  and  Σ_{i=1}^N X_i'y_i = Σ_{i=1}^N Σ_{t=1}^T x_it'y_it

so the system OLS estimator is obtained from the OLS regression of y_it on x_it pooled across all i and t; this is the pooled OLS estimator. Assumption SOLS.2 requires rank E(Σ_{t=1}^T x_it'x_it) = K.
In the general system (7.9), the system OLS estimator does not necessarily have an interpretation as OLS equation by equation or as pooled OLS. As we will see in Section 7.7 for the SUR setup, sometimes we want to impose cross equation restrictions on the β_g, in which case the system OLS estimator has no simple interpretation.

While OLS is consistent under Assumptions SOLS.1 and SOLS.2, it is not necessarily unbiased. Assumption (7.13), and the finite sample assumption rank(X'X) = K, do ensure unbiasedness of OLS conditional on X. [This conclusion follows because, under independent sampling, E(u_i | X_1, X_2, ..., X_N) = E(u_i | X_i) = 0 under assumption (7.13).] We focus on the weaker Assumption SOLS.1 because assumption (7.13) is often violated in economic applications, something we will see especially in our panel data analysis.
For inference, we need to find the asymptotic variance of the OLS estimator under essentially the same two assumptions; technically, the following derivation requires the elements of X_i'u_i u_i'X_i to have finite expected absolute value. From (7.16) and (7.9) write

√N(β̂ − β) = (N^{-1} Σ_{i=1}^N X_i'X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i)    (7.17)

Because E(X_i'u_i) = 0 under Assumption SOLS.1, the central limit theorem implies that

N^{-1/2} Σ_{i=1}^N X_i'u_i →d Normal(0, B)    (7.18)

where

B ≡ E(X_i'u_i u_i'X_i)    (7.19)

Since N^{-1} Σ_{i=1}^N X_i'X_i →p A ≡ E(X_i'X_i), we can write

√N(β̂ − β) = A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'u_i) + o_p(1)    (7.20)

an expression for √N(β̂ − β) that is a nonrandom linear combination of a partial sum that satisfies the CLT. Equations (7.18) and (7.20) and the asymptotic equivalence lemma imply

√N(β̂ − β) →d Normal(0, A^{-1}BA^{-1})    (7.21)

We summarize with a theorem.
THEOREM 7.2 (Asymptotic Normality of SOLS): Under Assumptions SOLS.1 and SOLS.2, equation (7.21) holds.

The asymptotic variance of β̂ is Avar(β̂) = A^{-1}BA^{-1}/N. Replacing A and B with consistent estimators based on the SOLS residuals û_i ≡ y_i − X_i β̂ gives

V̂ = (Σ_{i=1}^N X_i'X_i)^{-1} (Σ_{i=1}^N X_i'û_i û_i'X_i) (Σ_{i=1}^N X_i'X_i)^{-1}    (7.26)

The square roots of the diagonal elements of matrix (7.26) are the robust asymptotic standard errors of the elements of β̂; they can be used to construct asymptotic t statistics for testing hypotheses such as H_0: β_j = 0. Sometimes the t statistics are treated as being distributed as t_{NG−K}, which is asymptotically valid because NG − K should be large.
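The sandwich formula (7.26) is straightforward to code. The sketch below (our own hypothetical simulated-data setup) computes the SOLS estimate together with the fully robust variance matrix and standard errors; note the errors are deliberately heteroskedastic and cross-equation correlated, which (7.26) accommodates.

```python
import numpy as np

def sols_robust(X_list, y_list):
    """System OLS with the fully robust variance estimator of equation (7.26)."""
    K = X_list[0].shape[1]
    XtX = np.zeros((K, K))
    Xty = np.zeros(K)
    for Xi, yi in zip(X_list, y_list):
        XtX += Xi.T @ Xi
        Xty += Xi.T @ yi
    beta_hat = np.linalg.solve(XtX, Xty)

    # Middle matrix: sum_i X_i' u_hat_i u_hat_i' X_i, with SOLS residuals u_hat_i.
    M = np.zeros((K, K))
    for Xi, yi in zip(X_list, y_list):
        ui = yi - Xi @ beta_hat
        s = Xi.T @ ui                # K-vector
        M += np.outer(s, s)
    XtX_inv = np.linalg.inv(XtX)
    V = XtX_inv @ M @ XtX_inv        # estimate of Avar(beta_hat)
    se = np.sqrt(np.diag(V))         # robust asymptotic standard errors
    return beta_hat, V, se

rng = np.random.default_rng(3)
N, G, K = 2000, 3, 2
beta = np.array([1.0, -1.0])
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
# Heteroskedastic errors whose scale depends on X_i: (7.26) remains valid.
y_list = [Xi @ beta + (1 + np.abs(Xi[:, 0])) * rng.normal(size=G) for Xi in X_list]
beta_hat, V, se = sols_robust(X_list, y_list)
print(beta_hat, se)
```

No second-moment structure on u_i is imposed anywhere, matching the robustness discussion that follows.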
The estimator in matrix (7.26) is another example of a robust variance matrix estimator because it is valid without any second-moment assumptions on the errors u_i (except, as usual, that the second moments are well defined). In a multivariate setting it is important to know what this robustness allows. First, the G × G unconditional variance matrix, Ω ≡ E(u_i u_i'), is entirely unrestricted. This fact allows cross equation correlation in an SUR system as well as different error variances in each equation. In panel data models, an unrestricted Ω allows for arbitrary serial correlation and time-varying variances in the disturbances. A second kind of robustness is that the conditional variance matrix, Var(u_i | X_i), can depend on X_i in an arbitrary, unknown fashion. The generality afforded by formula (7.26) is possible because of the N → ∞ asymptotics.

In special cases it is useful to impose more structure on the conditional and unconditional variance matrix of u_i in order to simplify estimation of the asymptotic variance. We will cover an important case in Section 7.5.2. Essentially, the key restriction will be that the conditional and unconditional variances of u_i are the same. There are also some special assumptions that greatly simplify the analysis of the pooled OLS estimator for panel data; see Section 7.8.

7.3.3 Testing Multiple Hypotheses
Testing multiple hypotheses in a very robust manner is easy once V̂ in matrix (7.26) has been obtained. The robust Wald statistic for testing H_0: Rβ = r, where R is Q × K with rank Q and r is Q × 1, has its usual form, W = (Rβ̂ − r)'(RV̂R')^{-1}(Rβ̂ − r). Under H_0, W ~a χ²_Q. In the SUR case this is the easiest and most robust way of testing cross equation restrictions on the parameters in different equations using system OLS. In the panel data setting, the robust Wald test provides a way of testing multiple hypotheses about β without assuming homoskedasticity or serial independence of the errors.
7.4 Consistency and Asymptotic Normality of Generalized Least Squares
7.4.1 Consistency
System OLS is consistent under fairly weak assumptions, and we have seen how to perform robust inference using OLS. If we strengthen Assumption SOLS.1 and add assumptions on the conditional variance matrix of u_i, we can do better using a generalized least squares procedure. As we will see, GLS is not usually feasible because it requires knowing the variance matrix of the errors up to a multiplicative constant. Nevertheless, deriving the consistency and asymptotic distribution of the GLS estimator is worthwhile because it turns out that the feasible GLS estimator is asymptotically equivalent to GLS.
We start with the model (7.9), but consistency of GLS generally requires a stronger assumption than Assumption SOLS.1. We replace Assumption SOLS.1 with the assumption that each element of u_i is uncorrelated with each element of X_i. We can state this succinctly using the Kronecker product:

ASSUMPTION SGLS.1: E(X_i ⊗ u_i) = 0.

Typically, at least one element of X_i is unity, so in practice Assumption SGLS.1 implies that E(u_i) = 0. We will assume u_i has a zero mean for our discussion but not in proving any results.

Assumption SGLS.1 plays a crucial role in establishing consistency of the GLS estimator, so it is important to recognize that it puts more restrictions on the explanatory variables than does Assumption SOLS.1. In other words, when we allow the explanatory variables to be random, GLS requires a stronger assumption than system OLS in order to be consistent. Sufficient for Assumption SGLS.1, but not necessary, is the zero conditional mean assumption (7.13). This conclusion follows from a standard iterated expectations argument.
For GLS estimation of multivariate equations with i.i.d. observations, the variance matrix of u_i plays a key role. Define the G × G symmetric, positive semidefinite matrix

Ω ≡ E(u_i u_i')    (7.27)

As mentioned in Section 7.3.2, we call Ω the unconditional variance matrix of u_i. [In the rare case that E(u_i) ≠ 0, Ω is not the variance matrix of u_i, but it is always the appropriate matrix for GLS estimation.] It is important to remember that expression (7.27) is definitional: because we are using random sampling, the unconditional variance matrix is necessarily the same for all i.
In place of Assumption SOLS.2, we assume that a weighted version of the expected outer product of X_i is nonsingular.

ASSUMPTION SGLS.2: Ω is positive definite and E(X_i'Ω^{-1}X_i) is nonsingular.

For the general treatment we assume that Ω is positive definite, rather than just positive semidefinite. In applications where the dependent variables across equations satisfy an adding up constraint, such as expenditure shares summing to unity, an equation must be dropped to ensure that Ω is nonsingular, a topic we return to in Section 7.7.3. As a practical matter, Assumption SGLS.2 is not very restrictive. The assumption that the K × K matrix E(X_i'Ω^{-1}X_i) has rank K is the analogue of Assumption SOLS.2.
The usual motivation for the GLS estimator is to transform a system of equations where the error has nonscalar variance-covariance matrix into a system where the error vector has a scalar variance-covariance matrix. We obtain this by multiplying equation (7.9) by Ω^{-1/2}:

Ω^{-1/2} y_i = (Ω^{-1/2} X_i)β + Ω^{-1/2} u_i    (7.28)

The transformed error Ω^{-1/2}u_i has variance matrix I_G, so system OLS applied to equation (7.28) produces the GLS estimator; call this estimator β*. Then

β* = (Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω^{-1}y_i)    (7.29)

We can write β* using full matrix notation as β* = [X'(I_N ⊗ Ω^{-1})X]^{-1}[X'(I_N ⊗ Ω^{-1})Y], where X and Y are the data matrices defined in Section 7.3.2 and I_N is the N × N identity matrix. But for establishing the asymptotic properties of β*, it is most convenient to work with equation (7.29).
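Formula (7.29) is a weighted version of (7.16). The sketch below (a hypothetical setup of our own, treating Ω as known, which is exactly what makes this estimator infeasible in practice) implements it and checks that with Ω = I_G it collapses to system OLS.

```python
import numpy as np

def gls(X_list, y_list, Omega):
    """Infeasible GLS, equation (7.29); requires the G x G matrix Omega = E(u_i u_i')."""
    Oinv = np.linalg.inv(Omega)
    K = X_list[0].shape[1]
    A = np.zeros((K, K))
    b = np.zeros(K)
    for Xi, yi in zip(X_list, y_list):
        A += Xi.T @ Oinv @ Xi
        b += Xi.T @ Oinv @ yi
    return np.linalg.solve(A, b)

rng = np.random.default_rng(4)
N, G, K = 1000, 2, 3
beta = np.array([1.0, 0.0, -1.0])
Omega = np.array([[1.0, 0.5], [0.5, 2.0]])
L = np.linalg.cholesky(Omega)      # draws L @ z have variance matrix Omega
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
y_list = [Xi @ beta + L @ rng.normal(size=G) for Xi in X_list]
beta_gls = gls(X_list, y_list, Omega)
print(beta_gls)   # consistent for beta under SGLS.1 and SGLS.2
```

Calling `gls(X_list, y_list, np.eye(G))` reproduces the system OLS estimate, which makes the "weighted SOLS" interpretation concrete.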
We can establish consistency of β* under Assumptions SGLS.1 and SGLS.2 by writing

β* = β + (N^{-1} Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (N^{-1} Σ_{i=1}^N X_i'Ω^{-1}u_i)    (7.30)

Now we must show that plim N^{-1} Σ_{i=1}^N X_i'Ω^{-1}u_i = 0. By the WLLN, it is sufficient that E(X_i'Ω^{-1}u_i) = 0. This is where Assumption SGLS.1 comes in. We can argue this point informally because Ω^{-1}X_i is a linear combination of X_i, and since each element of X_i is uncorrelated with each element of u_i, any linear combination of X_i is uncorrelated with u_i. We can also show this directly using the algebra of Kronecker products and vectorization. For conformable matrices D, E, and F, recall that vec(DEF) = (F' ⊗ D) vec(E), where vec(C) is the vectorization of the matrix C. [That is, vec(C) is the column vector obtained by stacking the columns of C from first to last; see Theil (1983).] Therefore, under Assumption SGLS.1,

vec E(X_i'Ω^{-1}u_i) = E[(u_i' ⊗ X_i') vec(Ω^{-1})] = E[(u_i ⊗ X_i)'] vec(Ω^{-1}) = 0

where we have also used the fact that the expectation and vec operators can be interchanged. We can now read the consistency of the GLS estimator off of equation (7.30). We do not state this conclusion as a theorem because the GLS estimator itself is rarely available.
The proof of consistency that we have sketched fails if we only make Assumption SOLS.1: E(X_i'u_i) = 0 does not imply E(X_i'Ω^{-1}u_i) = 0, except when Ω and X_i have special structures. If Assumption SOLS.1 holds but Assumption SGLS.1 fails, the transformation in equation (7.28) generally induces correlation between the transformed regressors Ω^{-1/2}X_i and the transformed errors Ω^{-1/2}u_i. This can be an important point, especially for certain panel data applications. If we are willing to make the zero conditional mean assumption (7.13), β* can be shown to be unbiased conditional on X.
7.4.2 Asymptotic Normality
We now sketch the asymptotic normality of the GLS estimator under Assumptions SGLS.1 and SGLS.2 and some weak moment conditions. The first step is familiar:

√N(β* − β) = (N^{-1} Σ_{i=1}^N X_i'Ω^{-1}X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i)

Because E(X_i'Ω^{-1}u_i) = 0 under Assumption SGLS.1, the CLT implies that N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i →d Normal(0, B), where

A ≡ E(X_i'Ω^{-1}X_i)    (7.31)

and

B ≡ E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i)    (7.33)

Since N^{-1} Σ_{i=1}^N X_i'Ω^{-1}X_i →p A, it follows that

√N(β* − β) →d Normal(0, A^{-1}BA^{-1})

7.5 Feasible GLS
7.5.1 Asymptotic Properties
Obtaining the GLS estimator β* requires knowing Ω up to scale. That is, we must be able to write Ω = σ²C, where C is a known G × G positive definite matrix and σ² is allowed to be an unknown constant. Sometimes C is known (one case is C = I_G), but much more often it is unknown. Therefore, we now turn to the analysis of feasible GLS (FGLS) estimation.
In FGLS estimation we replace the unknown matrix Ω with a consistent estimator. Because the estimator of Ω appears highly nonlinearly in the expression for the FGLS estimator, deriving finite sample properties of FGLS is generally difficult. [However, under essentially assumption (7.13) and some additional assumptions, including symmetry of the distribution of u_i, Kakwani (1967) showed that the distribution of the FGLS estimator is symmetric about β, a property which means that the FGLS estimator is unbiased if its expected value exists; see also Schmidt (1976, Section 2.5).] The asymptotic properties of the FGLS estimator are easily established as N → ∞ because, as we will show, its first-order asymptotic properties are identical to those of the GLS estimator under Assumptions SGLS.1 and SGLS.2. It is for this purpose that we spent some time on GLS. After establishing the asymptotic equivalence, we can easily obtain the limiting distribution of the FGLS estimator. Of course, GLS is trivially a special case of FGLS, where there is no first-stage estimation error.
We assume we have a consistent estimator, Ω̂, of Ω:

plim Ω̂ = Ω    (7.36)

When Ω has no restricted structure, a natural two-step procedure is available. First, obtain the system OLS estimator of β, which we denote β̌ in this section to avoid confusion. We already showed that β̌ is consistent for β under Assumptions SOLS.1 and SOLS.2, and therefore under Assumptions SGLS.1 and SOLS.2. (In what follows, we assume that Assumptions SOLS.2 and SGLS.2 both hold.) By the WLLN, plim(N^{-1} Σ_{i=1}^N u_i u_i') = Ω, which suggests the estimator

Ω̂ ≡ N^{-1} Σ_{i=1}^N ǔ_i ǔ_i'    (7.37)

where ǔ_i ≡ y_i − X_i β̌ are the SOLS residuals. We can show that this estimator is consistent for Ω under Assumptions SGLS.1 and SOLS.2 and standard moment conditions. First, write

ǔ_i = u_i − X_i(β̌ − β)    (7.38)

so that

ǔ_i ǔ_i' = u_i u_i' − u_i(β̌ − β)'X_i' − X_i(β̌ − β)u_i' + X_i(β̌ − β)(β̌ − β)'X_i'    (7.39)

Therefore, it suffices to show that the averages of the last three terms converge in probability to zero. Write the average of the vec of the second term on the right-hand side of equation (7.39) as [N^{-1} Σ_{i=1}^N (X_i ⊗ u_i)](β̌ − β), which is o_p(1) because plim(β̌ − β) = 0 and N^{-1} Σ_{i=1}^N (X_i ⊗ u_i) →p 0. The third term is the transpose of the second. For the last term in equation (7.39), note that the average of its vec can be written as [N^{-1} Σ_{i=1}^N (X_i ⊗ X_i)] vec[(β̌ − β)(β̌ − β)'], which is also o_p(1).

When the structure of Ω is restricted by the model, a different estimator of Ω is often used that exploits these restrictions. As with Ω̂ in equation (7.37), such estimators typically use the system OLS residuals in some fashion and lead to consistent estimators assuming the structure of Ω is correctly specified. The advantage of equation (7.37) is that it is consistent for Ω quite generally. However, if N is not very large relative to G, equation (7.37) can have poor finite sample properties.
Given Ω̂, the feasible GLS (FGLS) estimator of β is

β̂ = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω̂^{-1}y_i)    (7.42)

We have already shown that the (infeasible) GLS estimator is consistent under Assumptions SGLS.1 and SGLS.2. Because Ω̂ converges to Ω, it is not surprising that FGLS is also consistent. Rather than show this result separately, we verify the stronger result that FGLS has the same limiting distribution as GLS.
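Putting the pieces together, two-step FGLS is: system OLS, then Ω̂ by (7.37), then the weighted estimator (7.42). A minimal sketch, under our own simulated-data assumptions (not production code):

```python
import numpy as np

def fgls(X_list, y_list):
    """Two-step FGLS: (i) SOLS for residuals, (ii) Omega_hat via (7.37),
    (iii) weighted estimator of equation (7.42)."""
    N = len(X_list)

    # Step 1: system OLS.
    XtX = sum(Xi.T @ Xi for Xi in X_list)
    Xty = sum(Xi.T @ yi for Xi, yi in zip(X_list, y_list))
    beta_check = np.linalg.solve(XtX, Xty)

    # Step 2: Omega_hat from the SOLS residuals.
    resid = [yi - Xi @ beta_check for Xi, yi in zip(X_list, y_list)]
    Omega_hat = sum(np.outer(u, u) for u in resid) / N

    # Step 3: FGLS with the estimated weight matrix.
    Oinv = np.linalg.inv(Omega_hat)
    A = sum(Xi.T @ Oinv @ Xi for Xi in X_list)
    b = sum(Xi.T @ Oinv @ yi for Xi, yi in zip(X_list, y_list))
    return np.linalg.solve(A, b), Omega_hat

rng = np.random.default_rng(6)
N, G, K = 3000, 2, 2
beta = np.array([1.0, 3.0])
L = np.linalg.cholesky(np.array([[1.0, 0.8], [0.8, 1.0]]))
X_list = [rng.normal(size=(G, K)) for _ in range(N)]
y_list = [Xi @ beta + L @ rng.normal(size=G) for Xi in X_list]
beta_fgls, Omega_hat = fgls(X_list, y_list)
print(beta_fgls)
```

Because Ω̂ is consistent, this estimator shares the first-order asymptotic behavior of infeasible GLS, which is exactly the equivalence result established next.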
The limiting distribution of FGLS is obtained by writing

√N(β̂ − β) = (N^{-1} Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1} (N^{-1/2} Σ_{i=1}^N X_i'Ω̂^{-1}u_i)    (7.43)

Now, because Ω̂ →p Ω, N^{-1} Σ_{i=1}^N X_i'Ω̂^{-1}X_i →p A and

N^{-1/2} Σ_{i=1}^N X_i'Ω̂^{-1}u_i = N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i + o_p(1)

so that

√N(β̂ − β) = A^{-1} (N^{-1/2} Σ_{i=1}^N X_i'Ω^{-1}u_i) + o_p(1)    (7.44)

The first term in equation (7.44) is just √N(β* − β), up to o_p(1), where β* is the GLS estimator. We can write equation (7.44) as

√N(β̂ − β*) = o_p(1)    (7.45)

which shows that β̂ and β* are √N-equivalent. This statement is much stronger than simply saying that β* and β̂ are both consistent for β. There are many estimators, such as system OLS, that are consistent for β but are not √N-equivalent to β*.

The asymptotic equivalence of β̂ and β* has practically important consequences. The most important of these is that, for performing asymptotic inference about β using β̂, we do not have to worry that Ω̂ is an estimator of Ω. Of course, whether the asymptotic approximation gives a reasonable approximation to the actual distribution of β̂ is difficult to tell. With large N, the approximation is usually pretty good. But if N is small relative to G, ignoring estimation of Ω in performing inference about β can be misleading.
We summarize the limiting distribution of FGLS with a theorem.

THEOREM 7.3 (Asymptotic Normality of FGLS): Under Assumptions SGLS.1 and SGLS.2,

√N(β̂ − β) →d Normal(0, A^{-1}BA^{-1})    (7.46)

where A is defined in equation (7.31) and B is defined in equation (7.33).

In the FGLS context a consistent estimator of A is Â ≡ N^{-1} Σ_{i=1}^N X_i'Ω̂^{-1}X_i, and a fully robust estimator of Avar(β̂) is

V̂ = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1} (Σ_{i=1}^N X_i'Ω̂^{-1}û_i û_i'Ω̂^{-1}X_i) (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1}    (7.49)

where û_i ≡ y_i − X_i β̂ are the FGLS residuals. This is the extension of the White (1980b) heteroskedasticity-robust asymptotic variance estimator to the case of systems of equations; see also White (1984). This estimator is valid under Assumptions SGLS.1 and SGLS.2; that is, it is completely robust.
7.5.2 Asymptotic Variance of FGLS under a Standard Assumption
Under the assumptions so far, FGLS really has nothing to offer over SOLS. In addition to being computationally more difficult, FGLS is less robust than SOLS. So why is FGLS used? The answer is that, under an additional assumption, FGLS is asymptotically more efficient than SOLS (and other estimators). First, we state the weakest condition that simplifies estimation of the asymptotic variance for FGLS. For reasons to be seen shortly, we call this a system homoskedasticity assumption.

ASSUMPTION SGLS.3: E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i) = E(X_i'Ω^{-1}X_i), where Ω ≡ E(u_i u_i').

Another way to state this assumption is B = A, which, from expression (7.46), simplifies the asymptotic variance. As stated, Assumption SGLS.3 is somewhat difficult to interpret. When G = 1, it reduces to Assumption OLS.3. When Ω is diagonal and X_i has either the SUR or panel data structure, Assumption SGLS.3 implies a kind of conditional homoskedasticity in each equation (or time period). Generally, Assumption SGLS.3 puts restrictions on the conditional variances and covariances of elements of u_i. A sufficient (though certainly not necessary) condition for Assumption SGLS.3 is easier to interpret:

E(u_i u_i' | X_i) = Ω    (7.50)
If E(u_i | X_i) = 0, then assumption (7.50) is the same as assuming Var(u_i | X_i) = Var(u_i) = Ω, which means that each variance and each covariance of elements involving u_i must be constant conditional on all of X_i. This is a very natural way of stating a system homoskedasticity assumption, but it is sometimes too strong.

When G = 2, Ω contains three distinct elements, σ_1² = E(u_i1²), σ_2² = E(u_i2²), and σ_12 = E(u_i1 u_i2). These elements are not restricted by the assumptions we have made. (The inequality |σ_12| < σ_1 σ_2 must always hold for Ω to be a nonsingular covariance matrix.) However, assumption (7.50) requires E(u_i1² | X_i) = σ_1², E(u_i2² | X_i) = σ_2², and E(u_i1 u_i2 | X_i) = σ_12: the conditional variances and covariance must not depend on X_i.

That assumption (7.50) implies Assumption SGLS.3 is a consequence of iterated expectations:

E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i) = E[E(X_i'Ω^{-1}u_i u_i'Ω^{-1}X_i | X_i)]
  = E[X_i'Ω^{-1}E(u_i u_i' | X_i)Ω^{-1}X_i]
  = E(X_i'Ω^{-1}ΩΩ^{-1}X_i) = E(X_i'Ω^{-1}X_i)

While assumption (7.50) is easier to interpret, we use Assumption SGLS.3 for stating the next theorem because there are cases, including some dynamic panel data models, where Assumption SGLS.3 holds but assumption (7.50) does not.
THEOREM 7.4 (Usual Variance Matrix for FGLS): Under Assumptions SGLS.1-SGLS.3, the asymptotic variance of the FGLS estimator is Avar(β̂) = A^{-1}/N ≡ [E(X_i'Ω^{-1}X_i)]^{-1}/N.

We obtain an estimator of Avar(β̂) by using our consistent estimator of A:

V̂ = (Σ_{i=1}^N X_i'Ω̂^{-1}X_i)^{-1}    (7.51)
Assumption (7.50) also has important efficiency implications. One consequence of Problem 7.2 is that, under Assumptions SGLS.1, SOLS.2, SGLS.2, and (7.50), the FGLS estimator is more efficient than the system OLS estimator. We can actually say much more: FGLS is more efficient than any other estimator that uses the orthogonality conditions E(X_i ⊗ u_i) = 0. This conclusion will follow as a special case of Theorem 8.4 in Chapter 8, where we define the class of competing estimators. If we replace Assumption SGLS.1 with the zero conditional mean assumption (7.13), then an even stronger efficiency result holds for FGLS, something we treat in Section 8.6.
7.6 Testing Using FGLS
Asymptotic standard errors are obtained in the usual fashion from the asymptotic variance estimates. We can use the nonrobust version in equation (7.51) or, even better, the robust version in equation (7.49), to construct t statistics and confidence intervals. Testing multiple restrictions is fairly easy using the Wald test, which always has the same general form. The important consideration lies in choosing the asymptotic variance estimate, V̂. Standard Wald statistics use equation (7.51), and this approach produces limiting chi-square statistics under the homoskedasticity assumption SGLS.3. Completely robust Wald statistics are obtained by choosing V̂ as in equation (7.49).
If Assumption SGLS.3 holds under H_0, we can define a statistic based on the weighted sums of squared residuals. To obtain the statistic, we estimate the model with and without the restrictions imposed on β, where the same estimator of Ω, usually based on the unrestricted SOLS residuals, is used in obtaining the restricted and unrestricted FGLS estimators. Let ũ_i denote the residuals from constrained FGLS (with Q restrictions imposed on β̃) using variance matrix Ω̂. It can be shown that, under H_0 and Assumptions SGLS.1-SGLS.3,