The Generalized Instrumental Variables Estimator

Excerpt from A Guide to Modern Econometrics, 5th edition (pages 177-185)

In Section 5.3 we considered the linear model where for each explanatory variable exactly one instrument is available, which could equal the variable itself if it were assumed exogenous. In this section we generalize this by allowing the use of an arbitrary number of instruments.

5.6.1 Multiple Endogenous Regressors with an Arbitrary Number of Instruments

Let us, in general, consider the following model

$$y_i = x_i'\beta + \varepsilon_i, \tag{5.54}$$

where $x_i$ is of dimension $K$. The OLS estimator is based upon the $K$ moment conditions

$$E\{\varepsilon_i x_i\} = E\{(y_i - x_i'\beta)x_i\} = 0.$$

More generally, let us assume that there are $R$ instruments available in the vector $z_i$, which may overlap with $x_i$. The relevant moment conditions are then given by the following $R$ restrictions

$$E\{\varepsilon_i z_i\} = E\{(y_i - x_i'\beta)z_i\} = 0. \tag{5.55}$$

If $R = K$, we are back in the previous situation and the instrumental variables estimator can be solved from the sample moment conditions

$$\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\hat{\beta}_{IV})z_i = 0$$

and we obtain

$$\hat{\beta}_{IV} = \left(\sum_{i=1}^{N} z_i x_i'\right)^{-1}\sum_{i=1}^{N} z_i y_i.$$

If the model is written in matrix notation

$$y = X\beta + \varepsilon$$

and the matrix $Z$ is the $N \times R$ matrix of values for the instruments, this instrumental variables estimator can also be written as

$$\hat{\beta}_{IV} = (Z'X)^{-1}Z'y. \tag{5.56}$$
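As a rough numerical illustration of (5.56), the following sketch simulates a small data set with one endogenous regressor and one instrument and computes the estimator with NumPy. The data-generating process, the parameter values and all variable names are assumptions made for this example only; they do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20_000

# Hypothetical data: x2 is endogenous because it shares the shock u with the error.
z2 = rng.normal(size=N)                  # instrument, independent of the error
u = rng.normal(size=N)
eps = u + rng.normal(size=N)             # structural error term
x2 = 0.8 * z2 + u + rng.normal(size=N)   # endogenous regressor
beta_true = np.array([1.0, 0.5])         # assumed true parameters

X = np.column_stack([np.ones(N), x2])    # N x K with K = 2
Z = np.column_stack([np.ones(N), z2])    # N x R with R = K = 2
y = X @ beta_true + eps

# (5.56): solve (Z'X) b = Z'y instead of forming the inverse explicitly.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # inconsistent here, for comparison

# With R = K the sample moments are exactly zero at beta_iv (up to rounding).
sample_moments = Z.T @ (y - X @ beta_iv) / N
print(beta_iv, beta_ols, sample_moments)
```

In this setup the OLS slope should be pushed away from the assumed value of 0.5 by the shared shock, while the IV slope should be close to it.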

If $R > K$ there are more instruments than regressors. In this case it is not possible to solve for an estimate of $\beta$ by replacing (5.55) with its sample counterpart. The reason for this is that there would be more equations than unknowns. Instead of dropping instruments (and losing efficiency), one therefore chooses $\beta$ in such a way that the $R$ sample moments

$$\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\beta)z_i$$

are as close as possible to zero. This is done by minimizing the following quadratic form

$$Q_N(\beta) = \left[\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\beta)z_i\right]' W_N \left[\frac{1}{N}\sum_{i=1}^{N}(y_i - x_i'\beta)z_i\right], \tag{5.57}$$

where $W_N$ is an $R \times R$ positive definite symmetric matrix. This matrix is a weighting matrix and tells us how much weight to attach to which (linear combinations of the) sample moments. In general it may depend upon the sample size $N$ because it may itself be an estimate. For the asymptotic properties of the resulting estimator for $\beta$, the probability limit of $W_N$, denoted by $W = \operatorname{plim} W_N$, is important. This matrix $W$ should be positive definite and symmetric. Using matrix notation for convenience, we can rewrite (5.57) as

$$Q_N(\beta) = \left[\frac{1}{N}Z'(y - X\beta)\right]' W_N \left[\frac{1}{N}Z'(y - X\beta)\right]. \tag{5.58}$$

Differentiating this with respect to $\beta$ (see Appendix A) gives the first-order conditions

$$-2X'ZW_NZ'y + 2X'ZW_NZ'X\hat{\beta}_{IV} = 0,$$

which in turn imply

$$X'ZW_NZ'y = X'ZW_NZ'X\hat{\beta}_{IV}. \tag{5.59}$$
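For readers who want the intermediate step, the differentiation can be sketched as follows. This is only an expansion of (5.58) using the symmetry of $W_N$; the common factor $1/N^2$ is dropped when the derivative is set to zero.

```latex
\begin{aligned}
Q_N(\beta) &= \frac{1}{N^2}\,(y - X\beta)'\,Z\,W_N\,Z'\,(y - X\beta), \\
\frac{\partial Q_N(\beta)}{\partial \beta}
  &= -\frac{2}{N^2}\,X'Z\,W_N\,Z'\,(y - X\beta), \\
0 &= -2\,X'Z\,W_N\,Z'y + 2\,X'Z\,W_N\,Z'X\,\hat{\beta}_{IV}.
\end{aligned}
```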


This is a system with $K$ equations and $K$ unknown elements in $\hat{\beta}_{IV}$, where $X'Z$ is of dimension $K \times R$ and $Z'y$ is $R \times 1$. Provided the matrix $X'Z$ is of rank $K$, the solution to (5.59) is

$$\hat{\beta}_{IV} = (X'ZW_NZ'X)^{-1}X'ZW_NZ'y, \tag{5.60}$$

which, in general, depends upon the weighting matrix $W_N$.

If $R = K$, the matrix $X'Z$ is square and (by assumption) invertible. This allows us to write

$$\begin{aligned}
\hat{\beta}_{IV} &= (Z'X)^{-1}W_N^{-1}(X'Z)^{-1}X'ZW_NZ'y \\
&= (Z'X)^{-1}Z'y,
\end{aligned}$$

which corresponds to (5.56), the weighting matrix being irrelevant. In this situation, the number of moment conditions is exactly equal to the number of parameters to be estimated. One can think of this as a situation where $\beta$ is ‘exactly identified’ because we have just enough information (i.e. moment conditions) to estimate $\beta$. An immediate consequence of this is that the minimum of (5.58) is zero, implying that all sample moments can be set to zero by choosing $\beta$ appropriately. That is, $Q_N(\hat{\beta}_{IV})$ is equal to zero. In this case $\hat{\beta}_{IV}$ does not depend upon $W_N$ and the same estimator is obtained regardless of the choice of weighting matrix.
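The role of the weighting matrix can be checked numerically. The sketch below solves the first-order conditions (5.59) for a given $W_N$ and compares two weighting matrices, first with $R = K$ and then with $R > K$. The helper names `giv` and `random_pd`, the simulated data and the parameter values are hypothetical and serve only as an illustration.

```python
import numpy as np

def giv(X, Z, y, W):
    """Solve the first-order conditions (5.59) for a given R x R weighting matrix W."""
    XZ = X.T @ Z                                   # K x R
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def random_pd(R, rng):
    """An arbitrary symmetric positive definite R x R matrix (illustration only)."""
    A = rng.normal(size=(R, R))
    return A @ A.T + R * np.eye(R)

rng = np.random.default_rng(1)
N = 2_000
z = rng.normal(size=(N, 2)) * np.array([1.0, 3.0])   # two hypothetical instruments
u = rng.normal(size=N)
x2 = z @ np.array([0.7, 0.4]) + u + rng.normal(size=N)
y = 1.0 + 0.5 * x2 + u + rng.normal(size=N)

X = np.column_stack([np.ones(N), x2])                # K = 2
Z_exact = np.column_stack([np.ones(N), z[:, 0]])     # R = 2: exactly identified
Z_over = np.column_stack([np.ones(N), z])            # R = 3: overidentified

# R = K: two different positive definite weighting matrices, same estimate.
b_identity = giv(X, Z_exact, y, np.eye(2))
b_random = giv(X, Z_exact, y, random_pd(2, rng))

# R > K: the estimate generally changes with the weighting matrix.
b_over_identity = giv(X, Z_over, y, np.eye(3))
b_over_optimal = giv(X, Z_over, y, np.linalg.inv(Z_over.T @ Z_over / N))
```

With $R = K$ the two estimates should agree up to rounding error, whereas with $R > K$ they differ, although both remain consistent under the stated assumptions.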

If $R < K$, the number of parameters to be estimated exceeds the number of moment conditions. In this case $\beta$ is ‘underidentified’ (not identified) because there is insufficient information (i.e. moment conditions) from which to estimate $\beta$ uniquely. Technically, this means that the inverse in (5.60) does not exist, and an infinite number of solutions satisfy the first-order conditions in (5.59). Unless we can come up with additional moment conditions, this identification problem is fatal in the sense that no consistent estimator for $\beta$ exists. Any estimator is necessarily inconsistent.

If $R > K$, the number of moment conditions exceeds the number of parameters to be estimated. As a result, $\beta$ is ‘overidentified’ because there is more information than is necessary to obtain a consistent estimate of $\beta$. In this case we have a range of estimators for $\beta$, corresponding to alternative choices for the weighting matrix $W_N$. As long as the weighting matrix is (asymptotically) positive definite, the resulting estimators are all consistent for $\beta$. The idea behind the consistency result is that we are minimizing a quadratic loss function in a set of sample moments that asymptotically converge to the corresponding population moments, which are equal to zero for the true parameter values. This is the basic principle behind the so-called method of moments, which will be discussed in more detail in Section 5.8.

Different weighting matrices $W_N$ lead to different consistent estimators with generally different asymptotic covariance matrices. This allows us to choose an optimal weighting matrix that leads to the most efficient instrumental variables estimator. It can be shown that the optimal weighting matrix is proportional to the inverse of the covariance matrix of the sample moments. Intuitively, this means that sample moments with a small variance, which consequently provide accurate information about the unknown parameters in $\beta$, get more weight in estimation than the sample moments with a large variance. Essentially, this is the same idea as the weighted least squares approach discussed in Chapter 4, albeit that the weights now reflect different sample moments rather than different observations.

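Of course the covariance matrix of the sample moments

$$\frac{1}{N}\sum_{i=1}^{N}\varepsilon_i z_i$$

depends upon the assumptions we make about $\varepsilon_i$ and $z_i$. If, as before, we assume that $\varepsilon_i$ is $IID(0, \sigma^2)$ and independent of $z_i$, the asymptotic covariance matrix of the sample moments is given by

$$\sigma^2\Sigma_{zz} = \sigma^2 \operatorname{plim} \frac{1}{N}\sum_{i=1}^{N} z_i z_i'.$$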

Consequently, an optimal weighting matrix is obtained as

$$W_N^{opt} = \left(\frac{1}{N}\sum_{i=1}^{N} z_i z_i'\right)^{-1} = \left(\frac{1}{N}Z'Z\right)^{-1},$$

and the resulting IV estimator is

$$\hat{\beta}_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y. \tag{5.61}$$

This is the expression that is found in most textbooks (see, e.g., Greene, 2012, Section 8.3). The estimator is sometimes referred to as the generalized instrumental variables estimator (GIVE). It is also known as the two-stage least squares or 2SLS estimator (see below). If $\varepsilon_i$ is heteroskedastic or exhibits autocorrelation, the optimal weighting matrix should be adjusted accordingly. How this is done follows from the general discussion in Section 5.8.

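The asymptotic distribution of $\hat{\beta}_{IV}$ is given by

$$\sqrt{N}(\hat{\beta}_{IV} - \beta) \to \mathcal{N}\left(0, \sigma^2(\Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{zx})^{-1}\right),$$

which is the same expression as given in Section 5.3. The only difference is in the dimensions of the matrices $\Sigma_{xz}$ and $\Sigma_{zz}$. An estimator for the covariance matrix is easily obtained by replacing the asymptotic limits with their small-sample counterparts. This gives

$$\hat{V}\{\hat{\beta}_{IV}\} = \hat{\sigma}^2(X'Z(Z'Z)^{-1}Z'X)^{-1}, \tag{5.62}$$

where the estimator for $\sigma^2$ is obtained from the IV residuals $\hat{\varepsilon}_i = y_i - x_i'\hat{\beta}_{IV}$ as

$$\hat{\sigma}^2 = \frac{1}{N-K}\sum_{i=1}^{N}\hat{\varepsilon}_i^2.$$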
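Formulas (5.61) and (5.62) translate almost line by line into matrix code. The following is a minimal sketch under the homoskedasticity assumption used above; the function name `give`, the simulated data and the parameter values are assumptions introduced for this illustration.

```python
import numpy as np

def give(X, Z, y):
    """GIVE / 2SLS estimate (5.61) and covariance estimate (5.62).

    Assumes homoskedastic, uncorrelated errors, as in the text.
    """
    N, K = X.shape
    ZZ_inv_ZX = np.linalg.solve(Z.T @ Z, Z.T @ X)    # (Z'Z)^{-1} Z'X, R x K
    ZZ_inv_Zy = np.linalg.solve(Z.T @ Z, Z.T @ y)    # (Z'Z)^{-1} Z'y, length R
    A = X.T @ Z @ ZZ_inv_ZX                          # X'Z (Z'Z)^{-1} Z'X, K x K
    beta = np.linalg.solve(A, X.T @ Z @ ZZ_inv_Zy)
    resid = y - X @ beta                             # IV residuals use the original X
    sigma2 = resid @ resid / (N - K)
    cov = sigma2 * np.linalg.inv(A)
    return beta, cov, sigma2

# Hypothetical overidentified example: one endogenous regressor, two instruments.
rng = np.random.default_rng(2)
N = 5_000
z = rng.normal(size=(N, 2))
u = rng.normal(size=N)
x2 = z @ np.array([0.7, 0.4]) + u + rng.normal(size=N)
y = 1.0 + 0.5 * x2 + u + rng.normal(size=N)

X = np.column_stack([np.ones(N), x2])
Z = np.column_stack([np.ones(N), z])

beta_iv, cov_iv, sigma2_hat = give(X, Z, y)
std_err = np.sqrt(np.diag(cov_iv))
print(beta_iv, std_err)
```

Solving linear systems with `np.linalg.solve` rather than inverting $Z'Z$ explicitly is a numerical design choice of this sketch, not something prescribed by the text.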

Starting from (5.61), it is also relatively easy to derive the asymptotic covariance matrix of $\hat{\beta}_{IV}$ in the case where the error terms are not homoskedastic. A heteroskedasticity-consistent covariance matrix can be estimated in a similar fashion as discussed in Subsection 4.3.4 (see Davidson and MacKinnon, 2004, Section 8.5).


5.6.2 Two-stage Least Squares and the Keynesian Model Again

The estimator in (5.61) is often used in the context of a simultaneous equations system and then has the name of the two-stage least squares (2SLS) estimator. Essentially, this interpretation says that the same estimator can be obtained in two steps, both of which can be estimated by least squares. In the first step the reduced form is estimated by OLS (i.e. a regression of each endogenous regressor upon all instruments). In the second step the original structural equations are estimated by OLS, while replacing all endogenous variables on the right-hand side with their predicted values from the reduced form equations.

To illustrate this, let the reduced form of the $k$th explanatory variable be given by (in vector notation)

$$x_k = Z\pi_k + v_k.$$

OLS in this equation produces predicted values $\hat{x}_k = Z(Z'Z)^{-1}Z'x_k$. If $x_k$ is a column in $Z$, we will automatically have $\hat{x}_k = x_k$. Consequently, the matrix of explanatory variables in the second step can be written as $\hat{X}$, which has the columns $\hat{x}_k$, $k = 1, \ldots, K$, where

$$\hat{X} = Z(Z'Z)^{-1}Z'X.$$

The OLS estimator in the second step is thus given by

$$\hat{\beta}_{IV} = (\hat{X}'\hat{X})^{-1}\hat{X}'y, \tag{5.63}$$

which can easily be shown to be identical to (5.61). The advantage of this approach is that the estimator can be computed using standard OLS software. In the second step, OLS is applied to the original model where all endogenous regressors are replaced by their predicted values on the basis of the instruments. It is a common mistake that the instruments themselves are included in the second stage. This is incorrect. One should include the fitted values from the reduced forms, which are linear combinations of all instruments.
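The equivalence of (5.63) and (5.61) can be sketched in a few lines. The symbol $P_Z$ for the projection matrix is notation introduced here for convenience; it is symmetric and idempotent, so that $P_Z'P_Z = P_Z$.

```latex
\begin{aligned}
P_Z &= Z(Z'Z)^{-1}Z', \qquad \hat{X} = P_Z X, \\
\hat{X}'\hat{X} &= X'P_Z'P_Z X = X'P_Z X = X'Z(Z'Z)^{-1}Z'X, \\
\hat{X}'y &= X'P_Z y = X'Z(Z'Z)^{-1}Z'y, \\
(\hat{X}'\hat{X})^{-1}\hat{X}'y &= (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y.
\end{aligned}
```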

While the two-stage approach reproduces the IV estimator, the second stage does not automatically provide the correct standard errors (see Maddala and Lahiri, 2009, Section 9.6, for details).
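Both points can be illustrated numerically: the two-step procedure reproduces the coefficients of (5.61), but the residual variance from the second-stage regression is not the one required in (5.62). The sketch below uses simulated data; all names and parameter values are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5_000
z = rng.normal(size=(N, 2))                    # hypothetical instruments
u = rng.normal(size=N)
x2 = z @ np.array([0.7, 0.4]) + u + rng.normal(size=N)
y = 1.0 + 0.5 * x2 + u + rng.normal(size=N)

X = np.column_stack([np.ones(N), x2])
Z = np.column_stack([np.ones(N), z])
K = X.shape[1]

# Step 1: OLS of every column of X on Z gives the fitted values X_hat.
Pi_hat = np.linalg.lstsq(Z, X, rcond=None)[0]  # R x K reduced-form coefficients
X_hat = Z @ Pi_hat

# Step 2: OLS of y on X_hat, as in (5.63).
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Direct evaluation of (5.61) for comparison.
A = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_give = np.linalg.solve(A, b)

# The second-stage OLS residuals use X_hat; the IV residuals in (5.62) use X.
s2_second_stage = np.sum((y - X_hat @ beta_2sls) ** 2) / (N - K)
s2_iv = np.sum((y - X @ beta_2sls) ** 2) / (N - K)
print(beta_2sls, beta_give, s2_second_stage, s2_iv)
```

Because the constant is a column of `Z`, its fitted value equals the constant itself, in line with the remark that columns of $Z$ are reproduced exactly in the first step.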

The use of $\hat{X}$ also allows us to write the generalized instrumental variables estimator in terms of the standard formula in (5.56) if we redefine our matrix of instruments. If we use the $K$ columns of $\hat{X}$ as instruments in the standard formula (5.56), we obtain

$$\hat{\beta}_{IV} = (\hat{X}'X)^{-1}\hat{X}'y,$$

which is identical to (5.61). It shows that one can also interpret $\hat{X}$ as the matrix of instruments (which is sometimes done).

To go back to our Keynesian model, let us now assume that the economy includes a government and a private sector, with private investment $z_{2t}$ and government expenditures $z_{3t}$, both of which are assumed exogenous. The definition equation now reads

$$x_{2t} = y_t + z_{2t} + z_{3t}.$$

This implies that both $z_{2t}$ and $z_{3t}$ are now valid instruments to use for income $x_{2t}$ in the consumption function. Although it is possible to define simple IV estimators similarly to (5.51) using either $z_{2t}$ or $z_{3t}$ as instrument, the most efficient estimator uses both instruments simultaneously. The generalized instrumental variables estimator is thus given by

$$\hat{\beta}_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y,$$

where the rows in $Z$ and $X$ are given by $z_t' = (1, z_{2t}, z_{3t})$ and $x_t' = (1, x_{2t})$, respectively.
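A small simulation may help to see how $Z$ and $X$ are built in this example. The sketch assumes a consumption function $y_t = \beta_1 + \beta_2 x_{2t} + \varepsilon_t$ with hypothetical parameter values and normally distributed exogenous variables; none of these numbers come from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5_000
beta1, beta2 = 2.0, 0.6                      # assumed consumption function parameters

z2 = 5.0 + rng.normal(size=T)                # private investment (exogenous)
z3 = 3.0 + rng.normal(size=T)                # government expenditures (exogenous)
eps = rng.normal(size=T)

# Solve the two structural equations for income and consumption:
#   y_t = beta1 + beta2 * x2_t + eps_t,   x2_t = y_t + z2_t + z3_t
x2 = (beta1 + z2 + z3 + eps) / (1.0 - beta2)   # income, correlated with eps
y = x2 - z2 - z3                               # consumption

X = np.column_stack([np.ones(T), x2])          # rows x_t' = (1, x2_t)
Z = np.column_stack([np.ones(T), z2, z3])      # rows z_t' = (1, z2_t, z3_t)

A = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b = X.T @ Z @ np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_give = np.linalg.solve(A, b)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_give, beta_ols)
```

In this setup income depends positively on the error term, so the OLS slope should overstate the assumed marginal propensity to consume, while the estimator using both instruments should be close to it.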

5.6.3 Specification Tests

The results on consistency and the asymptotic distribution of the generalized instrumental variables estimator are based on the assumption that the model is correctly specified. As the estimator is only based on the model’s moment conditions, it is required that the moment conditions be correct. It is therefore important to test whether the data are consistent with these moment conditions. In the ‘exactly identified’ case, $(1/N)\sum_i \hat{\varepsilon}_i z_i = 0$ by construction, regardless of whether or not the population moment conditions are true. Consequently, one cannot derive a useful test from the corresponding sample moments. Put differently, these $K = R$ identifying restrictions are not testable.

However, if $\beta$ is overidentified, it is clear that only $K$ (linear combinations) of the $R$ elements in $(1/N)\sum_i \hat{\varepsilon}_i z_i$ are set equal to zero. If the population moment conditions were true, one would expect the elements in the vector $(1/N)\sum_i \hat{\varepsilon}_i z_i$ all to be sufficiently close to zero (as they should converge to zero asymptotically). This provides a basis for a test of the model specification. It can be shown that (under (5.55)) the statistic (based on the GIV estimator with the optimal weighting matrix)

$$\xi = NQ_N(\hat{\beta}_{IV}) = \left(\sum_{i=1}^{N}\hat{\varepsilon}_i z_i\right)'\left(\hat{\sigma}^2\sum_{i=1}^{N} z_i z_i'\right)^{-1}\left(\sum_{i=1}^{N}\hat{\varepsilon}_i z_i\right) \tag{5.64}$$

has an asymptotic Chi-squared distribution with $R - K$ degrees of freedom. Note that the number of degrees of freedom equals the number of moment conditions minus the number of parameters to be estimated. This is the case because only $R - K$ of the sample moment conditions $(1/N)\sum_i \hat{\varepsilon}_i z_i$ are free on account of the $K$ restrictions imposed by the first-order conditions for $\hat{\beta}_{IV}$ in (5.59). A test based on (5.64) is usually referred to as an overidentifying restrictions test or Sargan test. A simple way to compute (5.64) is by taking $N$ times the $R^2$ of an auxiliary regression of IV residuals $\hat{\varepsilon}_i$ upon the full set of instruments $z_i$. If the test rejects, the specification of the model is rejected in the sense that the sample evidence is inconsistent with the joint validity of all $R$ moment conditions.
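The two ways of computing the statistic can be compared in a short sketch. It assumes simulated data with valid instruments and uses the divisor $N$ (rather than $N - K$) in the variance estimate, which is the choice under which the quadratic form in (5.64) and $N \cdot R^2$ coincide exactly; with $N - K$ the quadratic form is slightly smaller. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000
z = rng.normal(size=(N, 3))                       # three hypothetical valid instruments
u = rng.normal(size=N)
x2 = z @ np.array([0.6, 0.5, 0.4]) + u + rng.normal(size=N)
y = 1.0 + 0.5 * x2 + u + rng.normal(size=N)

X = np.column_stack([np.ones(N), x2])             # K = 2
Z = np.column_stack([np.ones(N), z])              # R = 4
K, R = X.shape[1], Z.shape[1]

# GIVE (5.61) and the IV residuals.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
e = y - X @ beta_iv

# (5.64) as a quadratic form, with sigma2 computed using divisor N (assumption).
sigma2 = e @ e / N
Ze = Z.T @ e
xi = Ze @ np.linalg.solve(sigma2 * (Z.T @ Z), Ze)

# The same statistic as N * R^2 from regressing the IV residuals on all instruments.
fitted = Z @ np.linalg.solve(Z.T @ Z, Ze)
r2 = 1.0 - np.sum((e - fitted) ** 2) / np.sum((e - e.mean()) ** 2)
xi_aux = N * r2

df = R - K
# For exactly 2 degrees of freedom the chi-squared tail probability is exp(-xi / 2).
p_value = np.exp(-xi / 2.0)
print(xi, xi_aux, df, p_value)
```

The closed-form p-value is valid only because $R - K = 2$ in this example; for other degrees of freedom a chi-squared table or a statistics library would be needed.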

Without additional information it is not possible to determine which of the moments are incorrect, that is, which of the instruments are invalid.[13] Roberts and Whited (2013) are therefore critical of the usefulness of this test because it assumes that a sufficient number of instruments are valid, yet which ones and why is left unspecified. Moreover, the test may lack power if many instruments are used that are uncorrelated with $\varepsilon_i$ but add little explanatory power to the reduced forms.

[13] Suppose a pub allows you to buy three beers but pay for only two. Can you tell which of the three beers is the free one?

If a subset of the instruments is known to satisfy the moment conditions, it is possible to test the validity of the remaining instruments or moments provided that the model is identified on the basis of the nonsuspect instruments. Assume that $R_1 \geq K$ moment conditions are nonsuspect and we want to test the validity of the remaining $R - R_1$ moment conditions. To compute the test statistic, estimate the model using all $R$ instruments and compute the overidentifying restrictions test statistic $\xi$. Next, estimate the model using only the $R_1$ nonsuspect instruments. Typically, this will lead to a lower value for the overidentifying restrictions test, $\xi_1$, say. The test statistic to test the suspect moment conditions is easily obtained as $\xi - \xi_1$, which, under the null hypothesis, has an approximate Chi-squared distribution with $R - R_1$ degrees of freedom. In the special case that $R_1 = K$, this test reduces to the overidentifying restrictions test in (5.64), and the test statistic is independent of the choice of the $R_1$ instruments that are said to be nonsuspect.
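The difference test can be sketched by computing the overidentifying restrictions statistic twice. The function `sargan`, the simulated data and the choice of which instruments count as nonsuspect are assumptions for illustration; the variance estimate uses divisor $N$, and because each statistic uses its own variance estimate the difference is not guaranteed to be positive in a finite sample.

```python
import numpy as np

def sargan(X, Z, y):
    """Overidentifying restrictions statistic (5.64) for instrument matrix Z."""
    N = len(y)
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    e = y - X @ beta
    Ze = Z.T @ e
    sigma2 = e @ e / N          # divisor N is an assumption; N - K is also used
    return Ze @ np.linalg.solve(sigma2 * (Z.T @ Z), Ze)

rng = np.random.default_rng(6)
N = 1_000
z = rng.normal(size=(N, 3))
u = rng.normal(size=N)
x2 = z @ np.array([0.6, 0.5, 0.4]) + u + rng.normal(size=N)
y = 1.0 + 0.5 * x2 + u + rng.normal(size=N)

X = np.column_stack([np.ones(N), x2])            # K = 2
Z_all = np.column_stack([np.ones(N), z])         # R = 4
Z_nonsuspect = Z_all[:, :3]                      # R1 = 3: constant and first two instruments
Z_just = Z_all[:, :2]                            # R1 = K = 2

xi = sargan(X, Z_all, y)
xi_1 = sargan(X, Z_nonsuspect, y)
diff_stat = xi - xi_1                            # compare with chi-squared, R - R1 = 1 df

# With R1 = K the restricted model is exactly identified, so its statistic is zero
# and the difference test reduces to the overall test.
xi_1_just = sargan(X, Z_just, y)
print(xi, xi_1, diff_stat, xi_1_just)
```

With one degree of freedom, `diff_stat` would be compared with the 5% critical value of 3.84.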

5.6.4 Weak Instruments

A problem with instrumental variables estimation that has received considerable attention recently is that of ‘weak instruments’. The problem is that the properties of the IV estimator can be very poor, and the estimator can be severely biased, if the instruments exhibit only weak correlation with the endogenous regressor(s). In these cases, the normal distribution provides a very poor approximation to the true distribution of the IV estimator, even if the sample size is large. As a result, the standard IV estimator is biased, its standard errors are misleading and hypothesis tests are unreliable. To illustrate the problem, let us consider the IV estimator for the case of a single regressor and a constant. If $\tilde{x}_i = x_i - \bar{x}$ denotes the regressor values in deviation from the sample mean, and similarly for $\tilde{y}_i$ and $\tilde{z}_i$, the IV estimator for $\beta_2$ can be written as (compare (5.51))

$$\hat{\beta}_{2,IV} = \frac{(1/N)\sum_{i=1}^{N}\tilde{z}_i\tilde{y}_i}{(1/N)\sum_{i=1}^{N}\tilde{z}_i\tilde{x}_i}.$$

If the instrument is valid (and under weak regularity conditions), the estimator is consistent and converges to

$$\beta_2 = \frac{\operatorname{cov}\{z_i, y_i\}}{\operatorname{cov}\{z_i, x_i\}}.$$
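The reason why validity matters for this limit can be made explicit with one substitution. The following sketch assumes the model $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$ for the single-regressor case considered here.

```latex
\begin{aligned}
\operatorname{cov}\{z_i, y_i\}
  &= \beta_2 \operatorname{cov}\{z_i, x_i\} + \operatorname{cov}\{z_i, \varepsilon_i\}, \\
\operatorname{plim} \hat{\beta}_{2,IV}
  &= \frac{\operatorname{cov}\{z_i, y_i\}}{\operatorname{cov}\{z_i, x_i\}}
   = \beta_2 + \frac{\operatorname{cov}\{z_i, \varepsilon_i\}}{\operatorname{cov}\{z_i, x_i\}}.
\end{aligned}
```

A valid instrument has $\operatorname{cov}\{z_i, \varepsilon_i\} = 0$, so the second term vanishes as long as $\operatorname{cov}\{z_i, x_i\} \neq 0$. When the denominator is close to zero, even a small correlation between the instrument and the error term is magnified.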

However, if the instrument is not correlated with the regressor, the denominator of this expression is zero. In this case, the IV estimator is inconsistent and the asymptotic distribution of $\hat{\beta}_{2,IV}$ deviates substantially from a normal distribution. The instrument is weak if there is some correlation between $z_i$ and $x_i$, but not enough to make the asymptotic normal distribution provide a good approximation in finite (potentially very large) samples.

For example, Bound, Jaeger and Baker (1995) show that part of the results of Angrist and Krueger (1991), who use quarter of birth to instrument for schooling in a wage equation, suffers from the weak instruments problem. Even with samples of more than 300 000 (!) individuals, the IV estimator appeared to be unreliable and misleading.

To figure out whether you have weak instruments, it is useful to examine the reduced-form regression and evaluate the explanatory power of the additional instruments that are not included in the equation of interest. Consider the linear model with one endogenous regressor

$$y_i = x_{1i}'\beta_1 + x_{2i}\beta_2 + \varepsilon_i,$$

where $E\{x_{1i}\varepsilon_i\} = 0$ and where additional instruments $z_{2i}$ (for $x_{2i}$) satisfy $E\{z_{2i}\varepsilon_i\} = 0$. The appropriate reduced form is given by

$$x_{2i} = x_{1i}'\pi_1 + z_{2i}'\pi_2 + v_i.$$

If $\pi_2 = 0$, the instruments in $z_{2i}$ are irrelevant and the IV estimator is inconsistent. If $\pi_2$ is ‘close to zero’, the instruments are weak. The value of the $F$-statistic for $\pi_2 = 0$ is a measure for the information content contained in the instruments. Staiger and Stock (1997) provide a theoretical analysis of the properties of the IV estimator and provide some guidelines about how large the $F$-statistic should be for the IV estimator to have good properties. As a simple rule-of-thumb, Stock and Watson (2007, Chapter 12) suggest that you do not have to worry about weak instruments if the $F$-statistic exceeds 10. The implicit null hypothesis here is not that $\pi_2 = 0$, but that the bias in the resulting IV estimator is ‘small’. Stock and Yogo (2005) show that critical values larger than 10 are appropriate when there are more than two instruments. In any case, it is a good practice to compute and present the $F$-statistic of the reduced form in empirical work.
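The first-stage $F$-statistic for $\pi_2 = 0$ can be computed from the restricted and unrestricted residual sums of squares of the reduced form. The sketch below assumes homoskedastic reduced-form errors; the function name `first_stage_F`, the simulated data and the two values of $\pi_2$ are hypothetical.

```python
import numpy as np

def first_stage_F(x2, X1, Z2):
    """F-statistic for pi_2 = 0 in the reduced form x2 = X1 pi_1 + Z2 pi_2 + v."""
    N = len(x2)
    W = np.column_stack([X1, Z2])                       # all first-stage regressors
    rss_u = np.sum((x2 - W @ np.linalg.lstsq(W, x2, rcond=None)[0]) ** 2)
    rss_r = np.sum((x2 - X1 @ np.linalg.lstsq(X1, x2, rcond=None)[0]) ** 2)
    q = Z2.shape[1]                                     # number of excluded instruments
    return ((rss_r - rss_u) / q) / (rss_u / (N - W.shape[1]))

rng = np.random.default_rng(7)
N = 1_000
X1 = np.ones((N, 1))                 # exogenous regressors: here only a constant
Z2 = rng.normal(size=(N, 1))         # one hypothetical additional instrument
v = rng.normal(size=N)

x2_strong = 0.50 * Z2[:, 0] + v      # pi_2 = 0.50
x2_weak = 0.02 * Z2[:, 0] + v        # pi_2 = 0.02, 'close to zero'

F_strong = first_stage_F(x2_strong, X1, Z2)
F_weak = first_stage_F(x2_weak, X1, Z2)
print(F_strong, F_weak)
```

Note that the unrestricted regression includes the exogenous regressors in `X1` as well as the instruments in `Z2`, so the statistic measures only the additional explanatory power of the excluded instruments.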

If the $F$-statistic for the significance of the instruments in the reduced form is too small, you should not put much confidence in the IV results. If you have many instruments available, it may be a good strategy to use the most relevant subset and drop the ‘weak’ ones. Donald and Newey (2001) propose a way to choose among many valid instruments by minimizing the (finite sample) mean square error of the estimator. Cameron and Trivedi (2005, Subsection 6.4.4) discuss leading alternative estimators that have received renewed interest given the poor finite-sample properties of the standard IV estimator with weak instruments. See also Stock, Wright and Yogo (2002), Hahn and Hausman (2003) and Stock and Yogo (2005) for more discussion. Hahn, Han and Moon (2011) show that the standard Hausman test of Subsection 5.3.1 is invalid in the case of weak instruments, and provide an alternative version that is valid even when the instruments are weak.

5.6.5 Implementing and Reporting Instrumental Variables Estimators

Clearly, using instrumental variables estimators rather than OLS is more involved than pressing another button in EViews or Stata, and writing a paper stating that you ‘addressed the endogeneity problem by using instrumental variables’, without further explanation or details, is not acceptable. A first step, recommended by Larcker and Rusticus (2010), is to describe the economic theories the research questions are based on. For example, the endogeneity problem could be due to an important control variable that is not available (a confounding variable), the regressor of interest could be the outcome of a choice that individuals or firms are making, partly based upon the costs and benefits of such a choice, the direction of causality could be unclear, or there may be good reason to suspect measurement errors. With a more detailed description of the endogeneity problem, its background and potential alternative theories, a researcher is better equipped to select an empirical approach, and readers are more able to evaluate whether the approach is appropriate. As stated by Roberts and Whited (2013), the only way to find a good instrument is to understand the economics of the question at hand.

An obvious requirement in an empirical study is to state explicitly what the instruments are. This sounds trivial, but it is often overlooked, left implicit or hidden in an appendix. There should also be a discussion of why these instruments are valid, most importantly why they would satisfy the exogeneity requirement. It is rarely the case that instruments are entirely convincing, in the sense that all potential reviewers and discussants would accept them, but that does not imply that one should not try to give convincing arguments. It is also advisable to anticipate the potential reasons why the instrument is not exogenous and demonstrate that these effects are either very small or controlled for by inclusion of other variables in the model (see Larcker and Rusticus, 2010).

Another recommendation is to also report the first-stage regression results, like those in Table 5.2, including some relevant statistics. This allows one to see which instruments are weak and which instruments are crucial in driving the results. Importantly, it should be clear from these results that the instruments are relevant. Check, for example, whether the $F$-test of the instrumental variables exceeds 10. If instruments are only weakly related to the endogenous regressor, instrumental variables estimates will be highly imprecise, or, even worse, suffer from a weak instruments problem. Make sure that the first-stage regression includes all exogenous regressors from the model as well as all instruments.

Third, it is advised to also report OLS results along with the IV ones. This provides a benchmark and allows comparison, for example, to see whether the difference between the results is consistent with the underlying theory and the hypothesized source of endogeneity. It is typically a bad idea to immediately jump to instrumental variables estimation without having looked at OLS results. Finding that OLS results and IV results are very similar does not necessarily indicate that there are no endogeneity concerns. It could also be that the IV approach is done inappropriately, for example, by using an instrument that is highly correlated with the endogenous regressor and is endogenous itself.

Finally, researchers should provide some robustness checks on the chosen instruments and report tests for appropriateness of the instrumental variables. For example, when relevant, the overidentifying restriction test should be reported, despite its limitations.
