Chapter 8 Instrumental Variables Estimation
8.1 Introduction
In Section 3.3, the ordinary least squares estimator β̂ was shown to be consistent under condition (3.10), according to which the expectation of the error term u_t associated with observation t is zero conditional on the regressors X_t for that same observation. As we saw in Section 4.5, this condition can also be expressed either by saying that the regressors X_t are predetermined or by saying that the error terms u_t are innovations. When condition (3.10) does not hold, the consistency proof of Section 3.3 is not applicable, and the OLS estimator will, in general, be biased and inconsistent.
It is not always reasonable to assume that the error terms are innovations. In fact, as we will see in the next section, there are commonly encountered situations in which the error terms are necessarily correlated with some of the regressors for the same observation. Even in these circumstances, however, it is usually possible, although not always easy, to define an information set Ω_t for each observation such that

E(u_t | Ω_t) = 0,  t = 1, …, n.  (8.01)

The method of instrumental variables, the subject of this chapter, is based on conditions of this sort. A more general class of MM estimators, of which both OLS and IV are special cases, will be the subject of Chapter 9.
8.2 Correlation Between Error Terms and Regressors
We now briefly discuss two common situations in which the error terms will be correlated with the regressors and will therefore not have mean zero conditional on them. The first one, usually referred to by the name errors in variables, occurs whenever the independent variables in a regression model are measured with error. The second situation, often simply referred to as simultaneity, occurs whenever two or more endogenous variables are jointly determined by a system of simultaneous equations.

Errors in Variables
For a variety of reasons, many economic variables are measured with error. For example, macroeconomic time series are often based, in large part, on surveys, and they must therefore suffer from sampling variability. Whenever there are measurement errors, the values economists observe inevitably differ, to a greater or lesser extent, from the true values that economic agents presumably act upon. As we will see, measurement errors in the dependent variable of a regression model are generally of no great consequence, unless they are very large. However, measurement errors in the independent variables cause the error terms to be correlated with the regressors that are measured with error, and this causes OLS to be inconsistent.
The problems caused by errors in variables can be seen quite clearly in the context of the simple linear regression model. Consider the model
y°_t = β_1 + β_2 x°_t + u°_t,  u°_t ∼ IID(0, σ²),  (8.02)

where the variables x°_t and y°_t are not actually observed. What we observe instead is

x_t = x°_t + v_1t,  y_t = y°_t + v_2t.  (8.03)

Here v_1t and v_2t are measurement errors which are assumed, perhaps not realistically in some cases, to be IID with variances ω²_1 and ω²_2, respectively, and to be independent of each other and of the u°_t.

If we suppose that the true DGP is a special case of (8.02) along with (8.03), we see from (8.03) that x°_t = x_t − v_1t and y°_t = y_t − v_2t. If we substitute these into (8.02), we find that

y_t = β_1 + β_2(x_t − v_1t) + u°_t + v_2t
    = β_1 + β_2 x_t + u°_t + v_2t − β_2 v_1t
    = β_1 + β_2 x_t + u_t,  (8.04)

where u_t ≡ u°_t + v_2t − β_2 v_1t, so that the error term has variance σ² + ω²_2 + β²_2 ω²_1. The measurement error in the dependent variable simply adds the term ω²_2 to this variance.
The measurement error in the independent variable also increases the variance of the error terms, but it has another, much more severe, consequence as well. Because x_t = x°_t + v_1t, and u_t depends on v_1t, u_t will be correlated with x_t whenever β_2 ≠ 0. In fact, since the random part of x_t is v_1t, we see that

E(u_t | x_t) = E(u_t | v_1t) = −β_2 v_1t,  (8.05)

because we assume that v_1t is independent of u°_t and v_2t. From (8.05), we can see, using the fact that E(u_t) = 0 unconditionally, that

Cov(x_t, u_t) = E(x_t u_t) = E(x_t E(u_t | x_t))
             = −E((x°_t + v_1t) β_2 v_1t) = −β_2 ω²_1.

This covariance is negative if β_2 > 0 and positive if β_2 < 0, and, since it does not depend on the sample size n, it will not go away as n becomes large. An exactly similar argument shows that the assumption that E(u_t | X_t) = 0 is false whenever any element of X_t is measured with error. In consequence, the OLS estimator will be biased and inconsistent.
Errors in variables are a potential problem whenever we try to estimate a consumption function, especially if we are using cross-section data. Many economic theories (for example, Friedman, 1957) suggest that household consumption will depend on "permanent" income or "life-cycle" income, but surveys of household behavior almost never measure this. Instead, they typically provide somewhat inaccurate estimates of current income. If we think of y_t as measured consumption, x°_t as permanent income, and x_t as estimated current income, then the above analysis applies directly to the consumption function. The marginal propensity to consume is β_2, which must be positive, causing the correlation between u_t and x_t to be negative. As readers are asked to show in Exercise 8.1, the probability limit of β̂_2 is less than the true value β_20. In consequence, the OLS estimator β̂_2 is biased downward, even asymptotically.
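The downward "attenuation" bias can be illustrated with a short simulation. The sketch below is not from the text: the parameter values, the variable names, and the omission of measurement error in the dependent variable are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen purely for illustration.
n = 100_000                      # large n, to approximate the probability limit
beta1, beta2 = 1.0, 0.8          # true intercept and slope in (8.02)
sigma_x, omega1 = 2.0, 1.0       # sd of the true regressor and of its measurement error

x_star = rng.normal(0.0, sigma_x, n)        # unobserved true regressor x°_t
u_star = rng.normal(0.0, 1.0, n)            # u°_t
x = x_star + rng.normal(0.0, omega1, n)     # observed x_t = x°_t + v_1t
y = beta1 + beta2 * x_star + u_star         # v_2t omitted: it only adds noise

# OLS slope of y on the mismeasured regressor
b2 = np.cov(x, y, ddof=0)[0, 1] / np.var(x)

# plim of the OLS slope is beta2 * sigma_x^2 / (sigma_x^2 + omega1^2) = 0.64 < 0.8
print(b2)
```

With these numbers the estimate settles near 0.64 rather than the true 0.8, and no increase in n removes the gap.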
Of course, if our objective is simply to estimate the relationship between the observed dependent variable y_t and the observed independent variable x_t, there is nothing wrong with using ordinary least squares to estimate equation (8.04). In that case, u_t would simply be defined as the difference between y_t and its expectation conditional on x_t. But our analysis shows that the OLS estimators of β_1 and β_2 in equation (8.04) are not consistent for the corresponding parameters of equation (8.02). In most cases, it is parameters like these that we want to estimate on the basis of economic theory.
There is an extensive literature on ways to avoid the inconsistency caused by errors in variables. See, among many others, Hausman and Watson (1985), Leamer (1987), and Dagenais and Dagenais (1997). The simplest and most widely-used approach is just to use an instrumental variables estimator.
Simultaneous Equations
Economic theory often suggests that two or more endogenous variables are determined simultaneously. In this situation, as we will see shortly, all of the endogenous variables will necessarily be correlated with the error terms in all of the equations. This means that none of them may validly appear in the regression functions of models that are to be estimated by least squares.
A classic example, which well illustrates the econometric problems caused by simultaneity, is the determination of price and quantity for a commodity at the partial equilibrium of a competitive market. Suppose that q_t is quantity and p_t is price, both of which would often be in logarithms. A linear (or loglinear) model of demand and supply is

q_t = γ^d p_t + X_t^d β^d + u_t^d,  (8.06)
q_t = γ^s p_t + X_t^s β^s + u_t^s,  (8.07)

where X_t^d and X_t^s are row vectors of observations on exogenous or predetermined variables that appear in the demand and supply functions, β^d and β^s are corresponding vectors of parameters, γ^d and γ^s are scalar parameters, and u_t^d and u_t^s are the error terms in the demand and supply functions. Economic theory predicts that, in most cases, γ^d < 0 and γ^s > 0, which is equivalent to saying that the demand curve slopes downward and the supply curve slopes upward.
Equations (8.06) and (8.07) are a pair of linear simultaneous equations for the two unknowns p_t and q_t. For that reason, these equations constitute what is called a linear simultaneous equations model. In this case, there are two dependent variables, quantity and price. For estimation purposes, the key feature of the model is that quantity depends on price in both equations. Since there are two equations and two unknowns, it is straightforward to solve equations (8.06) and (8.07) for p_t and q_t. This is most easily done by rewriting them in matrix notation as

[ 1  −γ^d ] [ q_t ]   [ X_t^d β^d ]   [ u_t^d ]
[ 1  −γ^s ] [ p_t ] = [ X_t^s β^s ] + [ u_t^s ].  (8.08)

The solution to (8.08), which will exist whenever γ^d ≠ γ^s, so that the matrix on the left-hand side of (8.08) is nonsingular, is

[ q_t ]   [ 1  −γ^d ]⁻¹ ( [ X_t^d β^d ]   [ u_t^d ] )
[ p_t ] = [ 1  −γ^s ]    ( [ X_t^s β^s ] + [ u_t^s ] ).  (8.09)
It can be seen from this solution that p_t and q_t will depend on both u_t^d and u_t^s, and on every exogenous and predetermined variable that appears in either the demand function, the supply function, or both. Therefore, p_t, which appears on the right-hand side of equations (8.06) and (8.07), must be correlated with the error terms in both of those equations. If we rewrote one or both equations so that p_t was on the left-hand side and q_t was on the right-hand side, the problem would not go away, because q_t is also correlated with the error terms. If we wish to estimate the full system of equations, there are many options, some of which will be discussed in Chapter 12. If we simply want to estimate one equation out of such a system, the most popular approach is to use instrumental variables.
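A small simulation makes the simultaneity bias concrete. The system below is in the spirit of the demand-supply model, but every numerical value (slopes, the supply shifter, the error distributions) is an illustrative assumption, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demand-supply system; all numbers are illustrative.
n = 200_000
gamma_d, gamma_s, delta = -1.0, 1.0, 1.0   # demand slope, supply slope, supply shifter

u_d = rng.normal(size=n)                   # demand-equation error
u_s = rng.normal(size=n)                   # supply-equation error
w = rng.normal(size=n)                     # exogenous supply shifter

# Equilibrium price and quantity, obtained by solving the two equations
# jointly, in the same way as the matrix solution (8.09):
p = (u_d - u_s - delta * w) / (gamma_s - gamma_d)
q = gamma_d * p + u_d

# Price depends on u_d, so OLS on the demand equation does not recover gamma_d:
slope_ols = np.cov(p, q, ddof=0)[0, 1] / np.var(p)
print(slope_ols)   # converges to -1/3 here, not to gamma_d = -1
```

The positive covariance between p and the demand error pulls the OLS slope far from the true demand slope, and, as with errors in variables, the gap does not shrink as n grows.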
We have discussed two important situations in which the error terms will necessarily be correlated with some of the regressors, and the OLS estimator will consequently be inconsistent. This provides a strong motivation to employ estimators that do not suffer from this type of inconsistency. In the remainder of this chapter, we therefore discuss the method of instrumental variables. This method can be used whenever the error terms are correlated with one or more of the explanatory variables, regardless of how that correlation may have arisen.
8.3 Instrumental Variables Estimation
For most of this chapter, we will focus on the linear regression model

y = Xβ + u,  E(uu⊤) = σ²I,  (8.10)

where at least one of the explanatory variables in the n × k matrix X is assumed not to be predetermined with respect to the error terms. Suppose that, for each t = 1, …, n, condition (8.01) is satisfied for some suitable information set Ω_t, and that we can form an n × k matrix W with typical row W_t such that all its elements belong to Ω_t. The k variables given by the k columns of W are called instrumental variables, or simply instruments. Later, we will allow for the possibility that the number of instruments may exceed the number of regressors.

Instrumental variables may be either exogenous or predetermined, and, for a reason that will be explained later, they should always include any columns of X that are exogenous or predetermined. Finding suitable instruments may be quite easy in some cases, but it can be extremely difficult in others. Many empirical controversies in economics are essentially disputes about whether or not certain variables constitute valid instruments.
The Simple IV Estimator
For the linear model (8.10), the moment conditions (6.10) simplify to

W⊤(y − Xβ) = 0.  (8.11)

Since there are k equations and k unknowns, we can solve equations (8.11) directly to obtain the simple IV estimator

β̂_IV = (W⊤X)⁻¹W⊤y.  (8.12)

The instruments are assumed to satisfy the predeterminedness condition

E(u_t | W_t) = 0,  (8.13)

and the asymptotic identification condition that

S_{W⊤X} ≡ plim_{n→∞} (1/n) W⊤X  (8.14)

exists and is a nonsingular matrix. Substituting y = Xβ_0 + u into (8.12) yields

β̂_IV = β_0 + (W⊤X)⁻¹W⊤u.  (8.15)

Given the assumption (8.14) of asymptotic identification, it is clear that β̂_IV is consistent if and only if

plim_{n→∞} (1/n) W⊤u = 0,  (8.16)

which is precisely the condition (6.16) that was used in the consistency proof in Section 6.2. We usually refer to this condition by saying that the error terms are asymptotically uncorrelated with the instruments. Condition (8.16) follows from condition (8.13) by the law of large numbers, but it may hold even if condition (8.13) does not. The weaker condition (8.16) is what is required for the consistency of the IV estimator.
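A sketch of the simple IV estimator in action, using the errors-in-variables setup of Section 8.2: a second, independently mismeasured copy of the regressor is assumed available and serves as the instrument. All numerical values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative errors-in-variables DGP; parameter values are assumptions.
n = 100_000
beta = np.array([1.0, 0.8])                 # true (beta1, beta2)

x_star = rng.normal(0.0, 2.0, n)            # unobserved true regressor
x = x_star + rng.normal(size=n)             # mismeasured regressor
w2 = x_star + rng.normal(size=n)            # second measurement: a valid instrument
y = beta[0] + beta[1] * x_star + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])        # regressors (constant plus x)
W = np.column_stack([np.ones(n), w2])       # instruments (constant is its own)

# Simple IV estimator (8.12): (W'X)^{-1} W'y
beta_iv = np.linalg.solve(W.T @ X, W.T @ y)
# OLS for comparison: attenuated toward zero
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_iv[1], beta_ols[1])
```

The IV slope settles near the true 0.8 while the OLS slope settles near the attenuated plim 0.64, because the instrument's measurement error is uncorrelated with the error term in the equation being estimated.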
Efficiency Considerations
If the model (8.10) is correctly specified with true parameter vector β_0 and true error variance σ²_0, the results of Section 6.2 show that the asymptotic covariance matrix of n^{1/2}(β̂_IV − β_0) is given by (6.25) or (6.26):

σ²_0 plim_{n→∞} (n⁻¹X⊤P_W X)⁻¹ = σ²_0 (S_{X⊤W} S⁻¹_{W⊤W} S_{W⊤X})⁻¹,  (8.17)

where S_{W⊤W} ≡ plim n⁻¹W⊤W. If we have some choice over what instruments to use in the matrix W, it makes sense to choose them so as to minimize the above asymptotic covariance matrix.
First of all, notice that, since (8.17) depends on W only through the orthogonal projection matrix P_W, all that matters is the space S(W) spanned by the instrumental variables. In fact, as readers are asked to show in Exercise 8.2, the estimator β̂_IV itself depends on W only through P_W. This fact is closely related to the result that, for ordinary least squares, fitted values and residuals depend only on the space S(X) spanned by the regressors.
Suppose first that we are at liberty to choose for instruments any variables at all that satisfy the predeterminedness condition (8.13). Then, under reasonable and plausible conditions, we can characterize the optimal instruments for IV estimation of the model (8.10). By this, we mean the instruments that minimize the asymptotic covariance matrix (8.17), in the usual sense that any other choice of instruments leads to an asymptotic covariance matrix that differs from the optimal one by a positive semidefinite matrix.
In order to determine the optimal instruments, we must know the data-generating process. In the context of a simultaneous equations model, a single equation like (8.10), even if we know the values of the parameters, cannot be a complete description of the DGP, because at least some of the variables in the matrix X are endogenous. For the DGP to be fully specified, we must know how all the endogenous variables are generated. For the demand-supply model given by equations (8.06) and (8.07), both of those equations are needed to specify the DGP. For a more complicated simultaneous equations model with g endogenous variables, we would need g equations. For the simple errors-in-variables model discussed in Section 8.2, we need equations (8.03) as well as equation (8.02) in order to specify the DGP fully.
Quite generally, we can suppose that the explanatory variables in (8.10) satisfy the relation

X = X̄ + V,  E(V_t | Ω_t) = 0,  (8.18)

where the t-th row of X̄ is X̄_t = E(X_t | Ω_t), and X_t is the t-th row of X. Thus equation (8.18) can be interpreted as saying that X̄_t is the expectation of X_t conditional on the information set Ω_t. It turns out that the n × k matrix X̄ provides the optimal instruments for (8.10). Of course, in practice, this matrix is never observed, and we will need to replace X̄ by something that estimates it consistently.
To see that X̄ provides the optimal matrix of instruments, it is, as usual, easier to reason in terms of precision matrices rather than covariance matrices. For any valid choice of instruments, the precision matrix corresponding to (8.17) is 1/σ²_0 times

plim_{n→∞} (1/n) X⊤W(W⊤W)⁻¹W⊤X.  (8.19)

Since X = X̄ + V, we find that

plim_{n→∞} (1/n) X⊤W = plim_{n→∞} (1/n) (X̄ + V)⊤W = plim_{n→∞} (1/n) X̄⊤W = lim_{n→∞} (1/n) E(X̄⊤W).  (8.20)

The second equality holds because E(V⊤W) = O, since, by the construction in (8.18), V_t has mean zero conditional on W_t. The last equality is just a LLN in reverse. Similarly, we find that plim n⁻¹W⊤X = plim n⁻¹W⊤X̄. Thus (8.19) becomes

plim_{n→∞} (1/n) X̄⊤P_W X̄.  (8.21)

If we make the choice W = X̄, then (8.21) reduces to plim n⁻¹X̄⊤X̄. The difference between this and (8.21) is just plim n⁻¹X̄⊤M_W X̄, which is a positive semidefinite matrix. This shows that X̄ is indeed the optimal choice of instrumental variables by the criterion of asymptotic variance.
We mentioned earlier that all the explanatory variables in (8.10) that are exogenous or predetermined should be included in the matrix W of instrumental variables. It is now clear why this is so. If we denote by Z the submatrix of X containing the exogenous or predetermined variables, then Z̄ = Z, because the row Z_t is already contained in Ω_t. Thus Z is a submatrix of the matrix X̄ of optimal instruments. As such, it should always be a submatrix of the matrix of instruments W used for estimation, even if W is not actually equal to X̄.
The Generalized IV Estimator
In practice, the information set Ω_t is very frequently specified by providing a list of l instrumental variables that suggest themselves for various reasons. Therefore, we now drop the assumption that the number of instruments is equal to the number of parameters and let W denote an n × l matrix of instruments. Often, l is greater than k, the number of regressors in the model (8.10). In this case, the model is said to be overidentified, because, in general, there is more than one way to formulate moment conditions like (8.11) using the available instruments. If l = k, the model (8.10) is said to be just identified or exactly identified, because there is only one way to formulate the moment conditions. If l < k, it is said to be underidentified, because there are fewer moment conditions than parameters to be estimated, and equations (8.11) will therefore have no unique solution.
If any instruments at all are available, it is normally possible to generate an arbitrarily large collection of them, because any deterministic function of the l components of the t-th row W_t of W can be used as the t-th component of a new instrument.¹ If (8.10) is underidentified, some such procedure is necessary if we wish to obtain consistent estimates of all the elements of β. Alternatively, we would have to impose at least k − l restrictions on β so as to reduce the number of independent parameters that must be estimated to no more than the number of instruments.
For models that are just identified or overidentified, it is often desirable to limit the set of potential instruments to deterministic linear functions of the instruments in W, rather than allowing arbitrary deterministic functions. We will see shortly that this is not only reasonable but optimal for linear simultaneous equation models. This means that the IV estimator is unique for a just identified model, because there is only one k-dimensional linear space S(W) that can be spanned by the k = l instruments, and, as we saw earlier, the IV estimator for a given model depends only on the space spanned by the instruments.
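This invariance is easy to verify numerically. In the just identified sketch below (simulated data, all values illustrative assumptions), replacing W by WA for a nonsingular A changes the instruments but not the space S(W), and the simple IV estimate is unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)

# A just identified example with k = l = 2; the DGP is purely illustrative.
n = 500
W = rng.normal(size=(n, 2))                                # instruments
X = W @ rng.normal(size=(2, 2)) + rng.normal(size=(n, 2))  # correlated regressors
y = X @ np.array([1.0, -2.0]) + rng.normal(size=n)

def simple_iv(W, X, y):
    """Simple IV estimator: (W'X)^{-1} W'y."""
    return np.linalg.solve(W.T @ X, W.T @ y)

A = np.array([[2.0, 1.0], [0.0, 3.0]])      # a nonsingular change of basis
b1 = simple_iv(W, X, y)
b2 = simple_iv(W @ A, X, y)
print(np.allclose(b1, b2))
```

Algebraically, (WA)⊤X = A⊤W⊤X and (WA)⊤y = A⊤W⊤y, so the nonsingular factor A⊤ cancels when the moment conditions are solved.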
We can always treat an overidentified model as if it were just identified by choosing exactly k linear combinations of the l columns of W. The challenge is to choose these linear combinations optimally. Formally, we seek an l × k matrix J such that the n × k matrix WJ is a valid instrument matrix and such that the use of J minimizes the asymptotic covariance matrix of the estimator in the class of IV estimators obtained using an n × k instrument matrix of the form WJ* with arbitrary l × k matrix J*.

There are three requirements that the matrix J must satisfy. The first of these is that it should have full column rank of k. Otherwise, the space spanned by the columns of WJ would have rank less than k, and the model would be underidentified. The second requirement is that J should be at least asymptotically deterministic. If not, it is possible that condition (8.16) applied to WJ could fail to hold. The last requirement is that J be chosen to minimize the asymptotic covariance matrix of the resulting IV estimator, and we now explain how this may be achieved.
If the explanatory variables X satisfy (8.18), then it follows from (8.17) and (8.20) that the asymptotic covariance matrix of the IV estimator computed using WJ as instrument matrix is

σ²_0 plim_{n→∞} (n⁻¹X̄⊤P_{WJ} X̄)⁻¹.  (8.22)

The t-th row X̄_t of X̄ belongs to Ω_t by construction, and so each element of X̄_t is a deterministic function of the elements of W_t. However, the deterministic functions are not necessarily linear with respect to W_t. Thus, in general, it is impossible to find a matrix J such that X̄ = WJ, as would be needed for WJ to constitute a set of truly optimal instruments. A natural second-best solution is to project X̄ orthogonally on to the space S(W). This yields the matrix of instruments

WJ = P_W X̄ = W(W⊤W)⁻¹W⊤X̄,  (8.23)

which implies that

J = (W⊤W)⁻¹W⊤X̄.  (8.24)

We now show that these instruments are indeed optimal under the constraint that the instruments should be linear in W_t.

¹ This procedure would not work if, for example, all of the original instruments were binary variables.
By substituting P_W X̄ for WJ in (8.22), the asymptotic covariance matrix becomes proportional to the inverse of

X̄⊤P_{P_W X̄} X̄ = X̄⊤P_W X̄(X̄⊤P_W X̄)⁻¹X̄⊤P_W X̄ = X̄⊤P_W X̄,  (8.25)

so that, with this choice of instruments, the precision matrix is proportional to X̄⊤P_W X̄. For the estimator with WJ as instruments, the precision matrix is proportional to X̄⊤P_{WJ} X̄. The difference between the two precision matrices is therefore proportional to

X̄⊤(P_W − P_{WJ}) X̄.  (8.26)

The k-dimensional subspace S(WJ), which is the image of the orthogonal projection P_{WJ}, is a subspace of the l-dimensional space S(W), which is the image of P_W. Thus, by the result in Exercise 2.16, the difference P_W − P_{WJ} is itself an orthogonal projection matrix. This implies that the difference (8.26) is a positive semidefinite matrix, and so we can conclude that (8.23) is indeed the optimal choice of instruments of the form WJ.
At this point, we come up against the same difficulty as that encountered at the end of Section 6.2, namely, that the optimal instrument choice is infeasible, because we do not know X̄. But notice that, from the definition (8.24) of the matrix J, the instruments WJ = P_W X̄ depend on X̄ only through W⊤X̄, and, by (8.20), plim n⁻¹W⊤X̄ = plim n⁻¹W⊤X. This suggests, correctly, that we can use P_W X instead of P_W X̄ without changing the asymptotic properties of the estimator.

If we use P_W X as the matrix of instrumental variables, the moment conditions (8.11) that define the estimator become

X⊤P_W(y − Xβ) = 0,  (8.28)

which can be solved to yield the generalized IV estimator, or GIV estimator,

β̂_IV = (X⊤P_W X)⁻¹X⊤P_W y,  (8.29)

which is sometimes just abbreviated as GIVE. The estimator (8.29) is indeed a generalization of the simple estimator (8.12), as readers are asked to verify in Exercise 8.3. For this reason, we will usually refer to the IV estimator without distinguishing the simple from the generalized case.
The generalized IV estimator (8.29) can also be obtained by minimizing the IV criterion function, which has many properties in common with the sum of squared residuals for models estimated by least squares. This function is defined as follows:

Q(β, y) = (y − Xβ)⊤P_W(y − Xβ).  (8.30)

Minimizing Q(β, y) with respect to β yields the estimator (8.29), as readers are asked to show in Exercise 8.4.
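The connection between the closed form (8.29) and the criterion function (8.30) can be checked numerically. In the sketch below, all data are simulated and every numerical choice is an illustrative assumption; the check confirms that perturbing the GIVE in any coordinate direction raises the criterion.

```python
import numpy as np

rng = np.random.default_rng(4)

# Overidentified illustrative setup: k = 2 regressors, l = 4 instruments.
n, k = 400, 2
W = rng.normal(size=(n, 4))
X = W @ rng.normal(size=(4, k)) + rng.normal(size=(n, k))
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

P_W = W @ np.linalg.solve(W.T @ W, W.T)       # orthogonal projection on to S(W)

def Q(beta):
    """IV criterion function (8.30)."""
    r = y - X @ beta
    return r @ P_W @ r

# Generalized IV estimator (8.29)
beta_give = np.linalg.solve(X.T @ P_W @ X, X.T @ P_W @ y)

# Q is a convex quadratic in beta, minimized at the GIVE:
for d in np.eye(k):
    assert Q(beta_give) <= Q(beta_give + 0.01 * d)
print(beta_give, Q(beta_give))
```

Since Q is quadratic with positive definite Hessian 2X⊤P_W X (generically), the first-order condition X⊤P_W(y − Xβ) = 0 characterizes the unique minimizer, which is exactly (8.29).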
Identifiability and Consistency of the IV Estimator
In Section 6.2, we defined in (6.12) a k-vector α(β) of deterministic functions as the probability limits of the functions used in the moment conditions that define an estimator, and we saw that the parameter vector β is asymptotically identified if two asymptotic identification conditions are satisfied. The first condition is that α(β_0) = 0, and the second is that α(β) ≠ 0 for all β ≠ β_0. The analogous vector of functions for the IV estimator is

α(β) ≡ plim_{n→∞} (1/n) X⊤P_W(y − Xβ).  (8.31)

This probability limit can be expressed in terms of the matrix S_{W⊤X}, which was defined in (8.14), and S_{W⊤W}, which was defined just after (8.17). For asymptotic identification, we assume that both these matrices exist and have full rank. This assumption is analogous to the assumption that 1/n times the matrix X⊤X has probability limit S_{X⊤X}, a matrix with full rank, which we originally made in Section 3.3 when we proved that the OLS estimator is consistent. If S_{W⊤W} does not have full rank, then at least one of the instruments is perfectly collinear with the others, asymptotically, and should therefore be dropped. If S_{W⊤X} does not have full rank, then the asymptotic version of the moment conditions (8.28) has fewer than k linearly independent equations, and these conditions therefore have no unique solution.
If β_0 is the true parameter vector, then y − Xβ_0 = u, and the right-hand side of (8.31) vanishes under the assumption (8.16) used to show the consistency of the simple IV estimator. Thus α(β_0) = 0, and the first condition for asymptotic identification is satisfied.
The second condition requires that α(β) ≠ 0 for all β ≠ β_0. It is easy to see from (8.31) that

α(β) = S_{X⊤W}(S_{W⊤W})⁻¹S_{W⊤X}(β_0 − β).

For this to be nonzero for all nonzero β_0 − β, it is necessary and sufficient that the matrix S_{X⊤W}(S_{W⊤W})⁻¹S_{W⊤X} should have full rank k. This will be the case if the matrices S_{W⊤W} and S_{W⊤X} both have full rank, as we have assumed. If l = k, the conditions on the two matrices S_{W⊤W} and S_{W⊤X} simplify, as we saw when considering the simple IV estimator, to the single condition (8.14). The condition that S_{X⊤W}(S_{W⊤W})⁻¹S_{W⊤X} has full rank can also be used to show that the probability limit of 1/n times the IV criterion function (8.30) has a unique global minimum at β = β_0, as readers are asked to show in Exercise 8.5.
The two asymptotic identification conditions are sufficient for consistency. Because we are dealing here with linear models, there is no need for a sophisticated proof of this fact; see Exercise 8.6. The key assumption is, of course, (8.16). If this assumption did not hold, because any of the instruments was asymptotically correlated with the error terms, the first of the asymptotic identification conditions would not hold either, and the IV estimator would not be consistent.
Asymptotic Distribution of the IV Estimator
Like every estimator that we have studied, the IV estimator is asymptotically normally distributed with an asymptotic covariance matrix that can be estimated consistently. The asymptotic covariance matrix for the simple IV estimator, expression (8.17), turns out to be valid for the generalized IV estimator as well. To see this, we replace W in (8.17) by the asymptotically optimal instruments P_W X. As in (8.25), we find that

X⊤P_{P_W X} X = X⊤P_W X(X⊤P_W X)⁻¹X⊤P_W X = X⊤P_W X,

from which it follows that (8.17) is unchanged if W is replaced by P_W X.

It can also be shown directly that (8.17) is the asymptotic covariance matrix of the generalized IV estimator. From (8.29), it follows that

n^{1/2}(β̂_IV − β_0) = (n⁻¹X⊤P_W X)⁻¹ n^{−1/2}X⊤P_W u.  (8.32)
Under reasonable assumptions, a central limit theorem can be applied to the expression n^{−1/2}W⊤u, which allows us to conclude that the asymptotic distribution of this expression is multivariate normal, with mean zero and covariance matrix

lim_{n→∞} (1/n) W⊤E(uu⊤)W = σ²_0 S_{W⊤W},  (8.33)

since we assume that E(uu⊤) = σ²_0 I. With this result, it can be shown quite simply that (8.17) is the asymptotic covariance matrix of β̂_IV; see Exercise 8.7.

In practice, since σ²_0 is unknown, we estimate the covariance matrix of β̂_IV by

Var̂(β̂_IV) = σ̂²(X⊤P_W X)⁻¹,  (8.34)

where σ̂² ≡ (1/n)(y − Xβ̂_IV)⊤(y − Xβ̂_IV) is 1/n times the sum of squared IV residuals. Nevertheless, many regression packages divide by n − k instead of by n.
The choice of instruments will usually affect the asymptotic covariance matrix of the IV estimator. If some or all of the columns of X̄ are not contained in the span S(W) of the instruments, an efficiency gain is potentially available if that span is made larger. Readers are asked in Exercise 8.8 to demonstrate formally that adding an extra instrument by appending a new column to W will, in general, reduce the asymptotic covariance matrix. Of course, it cannot be made smaller than the lower bound σ²_0(X̄⊤X̄)⁻¹, which is attained if the optimal instruments X̄ are available.

When all the regressors can validly be used as instruments, we have X̄ = X, and the efficient IV estimator coincides with the OLS estimator, as the Gauss-Markov Theorem predicts.

Two-Stage Least Squares
The IV estimator (8.29) is commonly known as the two-stage least squares, or 2SLS, estimator, because, before the days of good econometrics software packages, it was often calculated in two stages using OLS regressions. In the first stage, each column x_i, i = 1, …, k, of X is regressed on W, if necessary. If a regressor x_i is a valid instrument, it is already (or should be) one of the columns of W. In that case, since P_W x_i = x_i, no first-stage regression is needed, and we say that such a regressor serves as its own instrument. The fitted values from the first-stage regressions, plus the actual values of any regressors that serve as their own instruments, are collected to form the matrix P_W X. Then the second-stage regression, in which y is regressed on P_W X,

y = P_W Xβ + residuals,  (8.35)

is used to obtain the 2SLS estimates. Because P_W is an idempotent matrix, the OLS estimate of β from this second-stage regression is

β̂_2sls = (X⊤P_W X)⁻¹X⊤P_W y,

which is identical to (8.29), the generalized IV estimator β̂_IV.
If this two-stage procedure is used, some care must be taken when estimating the standard error of the regression and the covariance matrix of the parameter estimates. The OLS estimate of σ² from regression (8.35) is

(1/n)(y − P_W Xβ̂_IV)⊤(y − P_W Xβ̂_IV),  (8.36)

which is not a consistent estimator of σ², because the residuals in (8.36) are computed from the fitted values P_W X rather than from X itself. A well-designed 2SLS procedure would never use (8.36) to estimate σ². Instead, it would use

σ̂² = (1/n)(y − Xβ̂_IV)⊤(y − Xβ̂_IV),  (8.37)

or at least something that is asymptotically equivalent to it.
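The two-stage computation and the σ² pitfall can both be seen in a short sketch. The data-generating numbers below are illustrative assumptions only; the point is that the two stages reproduce the GIVE coefficients exactly, while the naive second-stage residual variance is far from the structural one defined in (8.37).

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative overidentified DGP: one endogenous regressor, two instruments.
n = 5_000
W = rng.normal(size=(n, 2))                       # instrument matrix
v = rng.normal(size=n)
x = W @ np.array([1.0, 1.0]) + v                  # endogenous regressor
u = 0.8 * v + rng.normal(size=n)                  # error correlated with x
y = 2.0 * x + u                                   # true beta = 2, true sigma^2 = 1.64
X = x[:, None]

# Stage 1: regress X on W to get fitted values P_W X.
X_hat = W @ np.linalg.solve(W.T @ W, W.T @ X)
# Stage 2: regress y on the fitted values.
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

# Same answer as the generalized IV formula (8.29), since X_hat'X_hat = X_hat'X:
beta_giv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

s2_naive = np.mean((y - X_hat @ beta_2sls) ** 2)  # second-stage residuals: inconsistent
s2_good = np.mean((y - X @ beta_2sls) ** 2)       # structural residuals, as in (8.37)
print(beta_2sls, s2_good, s2_naive)
```

With these numbers the coefficient estimates from the two computations agree to machine precision, while the naive variance estimate is several times larger than the consistent one, because the second-stage residuals contain the first-stage fitting error as well as the structural error.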
Two-stage least squares was invented by Theil (1953) and Basmann (1957) at a time when computers were very primitive. Consequently, despite the classic papers of Durbin (1954) and Sargan (1958) on instrumental variables estimation, the term "two-stage least squares" came to be very widely used in econometrics, even when the estimator is not actually computed in two stages. We prefer to think of two-stage least squares as simply a particular way to compute the generalized IV estimator, and we will use β̂_IV rather than β̂_2sls to denote that estimator.
8.4 Finite-Sample Properties of IV Estimators
Unfortunately, the finite-sample distributions of IV estimators are much more complicated than the asymptotic ones. Indeed, except in very special cases, these distributions are unknowable in practice. Although it is consistent, the IV estimator for just identified models has a distribution with such thick tails that its expectation does not even exist. With overidentified models, the expectation of the estimator exists, but it is in general different from the true parameter value, so that the estimator is biased, often very substantially so. In consequence, investigators can easily make serious errors of inference when interpreting IV estimates.
The biases in the OLS estimates of a model like (8.10) arise because the error terms are correlated with some of the regressors. The IV estimator solves this problem asymptotically, because the projections of the regressors on to S(W) are asymptotically uncorrelated with the error terms. However, there will always still be some correlation in finite samples, and this causes the IV estimator to be biased.
Systems of Equations
In order to understand the finite-sample properties of the IV estimator, we need to consider the model (8.10) as part of a system of equations. We therefore change notation somewhat and rewrite (8.10) as

y = Zβ_1 + Yβ_2 + u,  E(uu⊤) = σ²I,  (8.38)

where the matrix of regressors X has been partitioned into two parts, namely, an n × k_1 matrix of exogenous and predetermined variables, Z, and an n × k_2 matrix of endogenous variables, Y, and the vector β has been partitioned conformably into two subvectors β_1 and β_2. There are assumed to be l ≥ k instruments, of which k_1 are the columns of the matrix Z.
The model (8.38) is not fully specified, because it says nothing about how the matrix Y is generated. For each observation t, t = 1, …, n, the value y_t of the dependent variable and the values Y_t of the other endogenous variables are assumed to be determined by a set of linear simultaneous equations. The variables in the matrix Y are called current endogenous variables, because they are determined simultaneously, row by row, along with y. Suppose that all the exogenous and predetermined explanatory variables in the full set of simultaneous equations are included in the n × l instrument matrix W, of which the first k_1 columns are those of Z. Then, as can easily be seen by analogy with the explicit result (8.09) for the demand-supply model, we have for each endogenous variable y_i, i = 0, 1, …, k_2, that

y_i = Wπ_i + v_i,  E(v_i | W) = 0.  (8.39)

Here y_0 ≡ y, and the y_i, for i = 1, …, k_2, are the columns of Y. The π_i are l-vectors of unknown coefficients, and the v_i are n-vectors of error terms that are innovations with respect to the instruments.
Equations like (8.39), which have only exogenous and predetermined variables on the right-hand side, are called reduced form equations, in contrast with equations like (8.38), which are called structural equations. Writing a model as a set of reduced form equations emphasizes the fact that all the endogenous variables are generated by similar mechanisms. In general, the error terms for the various reduced form equations will display contemporaneous correlation: if v_ti denotes a typical element of the vector v_i, then, for observation t, the reduced form error terms v_ti will generally be correlated among themselves and correlated with the error term u_t of the structural equation.
In our simple example with one regressor and one instrument, the model is

y = xβ0 + σ_u u,   x = wπ0 + σ_v v,   (8.40)

analogously to (8.39). By explicitly writing σ_u and σ_v as the standard deviations of the error terms, we can define the vectors u and v to be multivariate standard normal, that is, distributed as N(0, I). There is contemporaneous correlation of u and v, so that E(u_t v_t) = ρ for some correlation coefficient ρ such that −1 < ρ < 1. The result of Exercise 4.4 shows that the expectation of u_t conditional on v_t is ρv_t, and so we can write u = ρv + u1, where u1 has mean zero conditional on v.
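This decomposition is easy to check numerically. The sketch below uses invented parameter values (ρ = 0.8 and the sample size are illustrative choices, not values from the text) to verify by simulation that u1 = u − ρv is uncorrelated with v:

```python
import numpy as np

# Simulation sketch of the decomposition u = rho*v + u1 with E(u1 | v) = 0.
# The correlation rho and the sample size are illustrative assumptions.
rng = np.random.default_rng(42)
rho, n = 0.8, 1_000_000

v = rng.standard_normal(n)
# Construct u so that (u_t, v_t) are standard normal with correlation rho
u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
u1 = u - rho * v  # the component of u with mean zero conditional on v

corr_uv = np.corrcoef(u, v)[0, 1]    # close to rho
corr_u1v = np.corrcoef(u1, v)[0, 1]  # close to zero
print(corr_uv, corr_u1v)
```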
In this simple, just identified, setup, the IV estimator of the parameter β is

β̂_IV = (w⊤x)^{−1} w⊤y = β0 + σ_u (w⊤x)^{−1} w⊤u.   (8.41)

This expression is clearly unchanged if the instrument w is multiplied by an arbitrary scalar, and so we can, without loss of generality, rescale w so that w⊤w = 1. Then, using the second equation in (8.40), we find that

β̂_IV − β0 = σ_u w⊤u / (σ_v (a + z)),

where we have made the definitions a ≡ π0/σ_v and z ≡ w⊤v. Let us now compute the expectation of this expression conditional on v. Since, by construction, E(u1 | v) = 0, we obtain

E(β̂_IV − β0 | v) = (ρσ_u/σ_v) z/(a + z).   (8.42)

Given our rescaling of w, it is easy to see that z ∼ N(0, 1).
If ρ = 0, the right-hand side of (8.42) vanishes, and so the unconditional expectation of β̂_IV − β0 vanishes as well. Therefore, in this special case, β̂_IV is unbiased. This is as expected, since, if ρ = 0, the regressor x is uncorrelated with the error vector u. If ρ ≠ 0, however, (8.42) is equal to a nonzero factor times the random variable z/(a + z). Unless a = 0, it turns out that this random variable has no expectation. To see this, we can try to calculate it:
E(z/(a + z)) = ∫_{−∞}^{∞} (x/(a + x)) φ(x) dx,   (8.43)

where, as usual, φ(·) is the density of the standard normal distribution. It is a fairly simple calculus exercise to show that the integral in (8.43) diverges in the neighborhood of x = −a.
If π0 = 0, then a = 0. In this rather odd case, x = σ_v v is just noise, as though it were an error term. Therefore, since z/(a + z) reduces to 1, the expectation exists, but it is not zero, and β̂_IV is therefore biased.
When a ≠ 0, which is the usual case, the IV estimator (8.41) is neither biased nor unbiased, because it has no expectation for any finite sample size n. This may seem to contradict the result according to which β̂_IV is asymptotically normal, since all the moments of the normal distribution exist. However, the fact that a sequence of random variables converges to a limiting random variable does not necessarily imply that the moments of the variables in the sequence converge to those of the limiting variable; see Davidson and MacKinnon (1993, Section 4.5). The estimator (8.41) is a case in point. Fortunately, this possible failure of the moments to converge does not extend to the CDFs of the random variables, which do indeed converge to that of the limit. Consequently, P values and the upper and lower limits of confidence intervals computed with the asymptotic distribution are legitimate approximations, in the sense that they become more and more accurate as the sample size increases.

A less simple calculation can be used to show that, in the overidentified case,
the first l − k moments of β̂_IV exist; see Kinal (1980). This is consistent with the result we have just obtained for an exactly identified model, where l − k = 0, and the IV estimator has no moments at all. When the mean of β̂_IV exists, it is almost never equal to β0. Readers will have a much clearer idea of the impact of the existence or nonexistence of moments, and of the bias of the IV estimator, if they work carefully through Exercises 8.10 to 8.13, in which they are asked to generate by simulation the EDFs of the estimator in different situations.
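In the same spirit as those exercises, here is a minimal Monte Carlo sketch of the just identified estimator (8.41) under the DGP (8.40). All parameter values are invented for illustration; the point is that the quantiles of the simulated estimates are well behaved even though extreme draws occur, reflecting the nonexistence of moments:

```python
import numpy as np

# Monte Carlo sketch of the just identified IV estimator under (8.40).
# Parameter values are illustrative assumptions, not taken from the text.
rng = np.random.default_rng(1)
n, reps = 50, 20_000
beta0, pi0, sigma_u, sigma_v, rho = 1.0, 0.5, 1.0, 1.0, 0.8

w = rng.standard_normal(n)  # one instrument, held fixed across replications
est = np.empty(reps)
for r in range(reps):
    v = rng.standard_normal(n)
    u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    x = pi0 * w + sigma_v * v
    y = beta0 * x + sigma_u * u
    est[r] = (w @ y) / (w @ x)  # beta_hat_IV = (w'x)^{-1} w'y

# Quantiles of the EDF are well behaved; extreme order statistics are not.
print(np.percentile(est, [25, 50, 75]))
print(np.abs(est - beta0).max())
```

Plotting the EDF of `est` against the asymptotic normal CDF illustrates the convergence of CDFs discussed above.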
The General Case
We now return to the general case, in which the structural equation (8.38) is being estimated, and the other endogenous variables are generated by the reduced form equations (8.39) for i = 1, . . . , k2, which correspond to the first-stage regressions for 2SLS. We can group the vectors of fitted values from these regressions into an n × k2 matrix P_W Y. The generalized IV estimator is then equivalent to a simple IV estimator that uses the instruments P_W X = [Z  P_W Y]. By grouping the l-vectors π_i, i = 1, . . . , k2, into an l × k2 matrix Π2 and the vectors of error terms v_i into an n × k2 matrix V2, we see that

P_W X = [Z  P_W Y] = [Z  P_W(WΠ2 + V2)]
      = [Z  WΠ2 + P_W V2] = WΠ + P_W V.   (8.44)
Here V is an n × k matrix of the form [O  V2], where the zero block has dimension n × k1, and Π is an l × k matrix, which can be written as Π = [Π1  Π2], where the l × k1 matrix Π1 is a k1 × k1 identity matrix sitting on top of an (l − k1) × k1 zero matrix. It is easily checked that these definitions make the last equality in (8.44) correct. Thus P_W X has two components: WΠ, which by assumption is uncorrelated with u, and P_W V, which will almost always be correlated with u.
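The equivalence between generalized IV and simple IV with the instruments P_W X can be verified numerically. The following sketch uses made-up data and illustrative dimensions:

```python
import numpy as np

# Numerical check (made-up data) that the generalized IV (2SLS) estimator
# (X'P_W X)^{-1} X'P_W y equals the simple IV estimator that uses the
# instrument matrix P_W X = [Z  P_W Y].
rng = np.random.default_rng(0)
n, k1, k2, l = 200, 2, 1, 4               # l > k1 + k2: overidentified

W = rng.standard_normal((n, l))           # instruments; first k1 columns are Z
Z = W[:, :k1]
V2 = rng.standard_normal((n, k2))
Y = W @ rng.standard_normal((l, k2)) + V2  # endogenous explanatory variables
u = 0.5 * V2[:, 0] + rng.standard_normal(n)  # error correlated with V2
X = np.hstack([Z, Y])
y = X @ np.ones(k1 + k2) + u              # true beta is a vector of ones

PW = W @ np.linalg.solve(W.T @ W, W.T)    # orthogonal projection onto S(W)
beta_giv = np.linalg.solve(X.T @ PW @ X, X.T @ PW @ y)

J = PW @ X                                # simple IV instruments [Z  P_W Y]
beta_simple = np.linalg.solve(J.T @ X, J.T @ y)

print(np.allclose(beta_giv, beta_simple))  # → True
```

Note that P_W Z = Z, since the columns of Z lie in the span of W, so `PW @ X` is exactly [Z  P_W Y].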
If we substitute the rightmost expression of (8.44) into (8.32), eliminating the factors of powers of n, which are unnecessary in the finite-sample context, we obtain

β̂_IV − β0 = ((WΠ + P_W V)⊤X)^{−1} (Π⊤W⊤u + V⊤P_W u).   (8.45)

If V = O, the supposedly endogenous variables Y are in fact exogenous or predetermined, and it can be checked (see Exercise 8.14) that, in this case, β̂_IV is just the OLS estimator for model (8.10).
If V is not zero, but is independent of u, then we see immediately that the expectation of (8.45) conditional on V is zero. This case is the analog of the case with ρ = 0 in (8.42). Note that we require the full independence of V and u for this to hold. If instead V were just predetermined with respect to u, the IV estimator would still have a finite-sample bias, for exactly the same reasons as those leading to finite-sample bias of the OLS estimator with predetermined but not exogenous explanatory variables.
When V and u are contemporaneously correlated, it can be shown that all the terms in (8.45) which involve V do not contribute asymptotically; see Exercise 8.15. Thus we can see that any discrepancy between the finite-sample and asymptotic distributions of β̂_IV − β0 must arise from the terms in (8.45) that involve V. In fact, in the absence of other features of the model that could give rise to finite-sample bias, such as lagged dependent variables, the poor finite-sample properties of the IV estimator arise solely from the contemporaneous correlation between P_W V and u. In particular, the second term in the second factor of (8.45) will generally have a nonzero mean, and this term can be a major source of bias when the correlation between u and some of the columns of V is high.
If the terms involving V in (8.45) are relatively small, the finite-sample distribution of the IV estimator is likely to be well approximated by its asymptotic distribution. However, if these terms are not small, the asymptotic approximation may be poor. Thus our analysis suggests that there are three situations in which the IV estimator is likely to have poor finite-sample properties.
• When l, the number of instruments, is large, W will be able to explain much of the variation in V; recall from Section 3.8 that adding additional regressors can never reduce the R2 of a regression. With large l, consequently, P_W V will be relatively large. When the number of instruments is extremely large relative to the sample size, the first-stage regressions may fit so well that P_W Y is very similar to Y. In this situation, the IV estimates may be almost as biased as the OLS ones.
• When at least some of the reduced-form regressions (8.39) fit poorly, in the sense that the R2 is small or the F statistic for all the slope coefficients to be zero is insignificant, the model is said to suffer from weak instruments. In this situation, even if P_W V is no larger than usual, it may nevertheless be large relative to WΠ. When the instruments are very weak, the finite-sample distribution of the IV estimator may be very far from its asymptotic distribution, even in samples with many thousands of observations. An example of this is furnished by the case in which a = 0 in (8.42) in our simple example with one regressor and one instrument. As we saw, the distribution of the estimator is quite different when a = 0 from what it is when a ≠ 0; the distribution when a ≈ 0 may well be similar to the distribution when a = 0.
• When the correlation between u and some of the columns of V is very high, V⊤P_W u will tend to be relatively large. Whether it will be large enough to cause serious problems for inference will depend on the sample size, the number of instruments, and how well the instruments explain the endogenous variables.
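The weak-instrument case can be made concrete by simulation. The sketch below (all parameter values invented) compares the median of the IV estimator in the one-regressor model (8.40) under a strong and a very weak instrument; as the reduced-form coefficient shrinks toward zero, the median moves away from β0 toward the probability limit of OLS:

```python
import numpy as np

# Monte Carlo sketch of the weak instruments problem in the simple model.
# Parameter values are illustrative assumptions, not taken from the text.
rng = np.random.default_rng(7)
n, reps, beta0, rho = 100, 5_000, 1.0, 0.9

def median_iv(pi0):
    """Median over replications of beta_hat_IV for a given reduced-form slope."""
    w = rng.standard_normal(n)
    est = np.empty(reps)
    for r in range(reps):
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x = pi0 * w + v
        y = beta0 * x + u
        est[r] = (w @ y) / (w @ x)
    return np.median(est)

strong = median_iv(1.0)   # strong instrument: median close to beta0
weak = median_iv(0.05)    # weak instrument: median pulled well away from beta0
print(strong, weak)
```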
It may seem that adding additional instruments will always increase the finite-sample bias of the IV estimator, and Exercise 8.13 illustrates a case in which it does. In that case, the additional instruments do not really belong in the reduced-form regressions. However, if the instruments truly belong in the reduced-form regressions, adding them will alleviate the weak instruments problem, and that can actually cause the bias to diminish.
Finite-sample inference in models estimated by instrumental variables is a subject of active research in econometrics. Relatively recent papers on this topic include Nelson and Startz (1990a, 1990b), Buse (1992), Bekker (1994), Bound, Jaeger, and Baker (1995), Dufour (1997), Staiger and Stock (1997), Wang and Zivot (1998), Zivot, Startz, and Nelson (1998), Angrist, Imbens, and Krueger (1999), Blomquist and Dahlberg (1999), Donald and Newey (2001), Hahn and Hausman (2002), Kleibergen (2002), and Stock, Wright, and Yogo (2002). There remain many unsolved problems.
8.5 Hypothesis Testing
Because the finite-sample distributions of IV estimators are almost never known, exact tests of hypotheses based on such estimators are almost never available. However, large-sample tests can be performed in a variety of ways. Since many of the methods of performing these tests are very similar to methods that we have already discussed in Chapters 4 and 6, there is no need to discuss them in detail.
Asymptotic t and Wald Statistics
When there is just one restriction, the easiest approach is simply to compute an asymptotic t test. For example, if we wish to test the hypothesis that β_i = β_i0, where β_i is one of the regression parameters, then a suitable test statistic is

t_{β_i} = (β̂_i − β_i0) / (Var̂(β̂_i))^{1/2},   (8.47)

where β̂_i is the IV estimate of β_i, and Var̂(β̂_i) is the i-th diagonal element of the estimated covariance matrix (8.34). This test statistic will not follow the Student's t distribution in finite samples, but it will be asymptotically distributed as N(0, 1) under the null hypothesis.
For testing restrictions on two or more parameters, the natural analog of (8.47) is a Wald statistic. Suppose that β is partitioned as [β1  β2], and we wish to test the hypothesis that β2 = β20. Then, as in (6.71), the appropriate Wald statistic is

W_{β2} = (β̂2 − β20)⊤ (Var̂(β̂2))^{−1} (β̂2 − β20),   (8.48)

where Var̂(β̂2) is the submatrix of (8.34) that corresponds to the vector β2. This Wald statistic can be thought of as a generalization of the asymptotic t statistic: when β2 is a scalar, the square root of (8.48) is (8.47).
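The relation between the two statistics can be checked numerically. The sketch below uses made-up data and assumes the covariance matrix (8.34) takes the familiar IV form σ̂²(X⊤P_W X)^{−1}:

```python
import numpy as np

# Sketch with made-up data: IV estimate, its estimated covariance matrix
# (assumed here to be sigma_hat^2 (X'P_W X)^{-1}), the asymptotic t
# statistic for one coefficient, and the corresponding one-restriction
# Wald statistic, which equals the squared t statistic.
rng = np.random.default_rng(3)
n, l = 500, 3
W = rng.standard_normal((n, l))
x2 = W @ rng.standard_normal(l) + rng.standard_normal(n)  # endogenous regressor
X = np.column_stack([W[:, 0], x2])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.standard_normal(n)

PW = W @ np.linalg.solve(W.T @ W, W.T)
XPWX = X.T @ PW @ X
beta_hat = np.linalg.solve(XPWX, X.T @ PW @ y)
resid = y - X @ beta_hat
sigma2 = resid @ resid / n
cov = sigma2 * np.linalg.inv(XPWX)

b0 = 2.0                                  # hypothesized value of beta_2
t_stat = (beta_hat[1] - b0) / np.sqrt(cov[1, 1])
wald = (beta_hat[1] - b0) ** 2 / cov[1, 1]
print(np.isclose(wald, t_stat**2))  # → True
```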
The IV Variant of the GNR
In many circumstances, the easiest way to obtain asymptotically valid test statistics for models estimated using instrumental variables is to use a variant of the Gauss-Newton regression. For the model (8.10), this variant, called the IVGNR, takes the form

y − Xβ = P_W Xb + residuals.   (8.49)

As with the usual GNR, the variables of the IVGNR must be evaluated at some prespecified value of β before the regression can be run, in the usual way, using ordinary least squares.
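As a sketch (with invented data), one can verify numerically a property that follows directly from the IV first-order conditions X⊤P_W(y − Xβ̂_IV) = 0: when the IVGNR is evaluated at β = β̂_IV, its OLS coefficient estimates are zero.

```python
import numpy as np

# Numerical check (made-up data): the IVGNR, estimated by OLS at
# beta = beta_hat_IV, yields coefficient estimates of zero, because the
# IV first-order conditions set X'P_W (y - X beta_hat_IV) = 0.
rng = np.random.default_rng(5)
n, l = 300, 3
W = rng.standard_normal((n, l))
x2 = W @ rng.standard_normal(l) + rng.standard_normal(n)
X = np.column_stack([W[:, 0], x2])
y = X @ np.array([1.0, -0.5]) + rng.standard_normal(n)

PW = W @ np.linalg.solve(W.T @ W, W.T)
PWX = PW @ X
beta_iv = np.linalg.solve(PWX.T @ X, PWX.T @ y)

# OLS of (y - X beta) on the regressors P_W X, evaluated at beta = beta_iv
b = np.linalg.solve(PWX.T @ PWX, PWX.T @ (y - X @ beta_iv))
print(np.allclose(b, 0.0))  # → True
```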
The IVGNR has the same properties relative to model (8.10) as the ordinary GNR has relative to linear and nonlinear regression models estimated by least