Econometric Theory and Methods, Russell Davidson - Chapter 8



Chapter 8

Instrumental Variables Estimation

8.1 Introduction

In Section 3.3, the ordinary least squares estimator β̂ was shown to be consistent under condition (3.10), according to which the expectation of the error term u_t associated with observation t is zero conditional on the regressors X_t for that same observation. As we saw in Section 4.5, this condition can also be expressed either by saying that the regressors X_t are predetermined or by saying that the error terms u_t are innovations. When condition (3.10) does not hold, the consistency proof of Section 3.3 is not applicable, and the OLS estimator will, in general, be biased and inconsistent.

It is not always reasonable to assume that the error terms are innovations. In fact, as we will see in the next section, there are commonly encountered situations in which the error terms are necessarily correlated with some of the regressors for the same observation. Even in these circumstances, however, it is usually possible, although not always easy, to define an information set Ω_t for each observation such that

    E(u_t | Ω_t) = 0.   (8.01)

A more general class of MM estimators, of which both OLS and IV are special cases, will be the subject of Chapter 9.

8.2 Correlation Between Error Terms and Regressors

We now briefly discuss two common situations in which the error terms will be correlated with the regressors and will therefore not have mean zero conditional on them. The first one, usually referred to by the name errors in variables, occurs whenever the independent variables in a regression model are measured with error. The second situation, often simply referred to as simultaneity, occurs whenever two or more endogenous variables are jointly determined by a system of simultaneous equations.

Errors in Variables

For a variety of reasons, many economic variables are measured with error. For example, macroeconomic time series are often based, in large part, on surveys, and they must therefore suffer from sampling variability. Whenever there are measurement errors, the values economists observe inevitably differ, to a greater or lesser extent, from the true values that economic agents presumably act upon. As we will see, measurement errors in the dependent variable of a regression model are generally of no great consequence, unless they are very large. However, measurement errors in the independent variables cause the error terms to be correlated with the regressors that are measured with error, and this causes OLS to be inconsistent.

The problems caused by errors in variables can be seen quite clearly in the context of the simple linear regression model. Consider the model

    y°_t = β_1 + β_2 x°_t + u°_t,   u°_t ~ IID(0, σ²),   (8.02)

where the variables x°_t and y°_t are not actually observed. Instead, we observe

    x_t = x°_t + v_1t,   y_t = y°_t + v_2t.   (8.03)

Here v_1t and v_2t are measurement errors which are assumed, perhaps not realistically in some cases, to be IID with variances ω_1² and ω_2², independently of each other and of u°_t.

If we suppose that the true DGP is a special case of (8.02) along with (8.03), we see from (8.03) that x°_t = x_t − v_1t and y°_t = y_t − v_2t. If we substitute these into (8.02), we find that

    y_t = β_1 + β_2 (x_t − v_1t) + u°_t + v_2t
        = β_1 + β_2 x_t + u_t,   (8.04)

where u_t ≡ u°_t + v_2t − β_2 v_1t.


The measurement error in the independent variable also increases the variance of the error terms, but it has another, much more severe, consequence as well.

Because x_t = x°_t + v_1t, and u_t depends on v_1t, u_t will be correlated with x_t whenever β_2 ≠ 0. In fact, since the random part of x_t is v_1t, we see that

    E(u_t | x_t) = E(u_t | v_1t) = −β_2 v_1t,   (8.05)

because we assume that v_1t is independent of u°_t and v_2t. From (8.05), we can see, using the fact that E(u_t) = 0 unconditionally, that

    Cov(x_t, u_t) = E(x_t u_t) = E(x_t E(u_t | x_t))
                  = −E((x°_t + v_1t) β_2 v_1t) = −β_2 ω_1².

This covariance is negative if β_2 > 0 and positive if β_2 < 0, and, since it does not depend on the sample size n, it will not go away as n becomes large. An exactly similar argument shows that the assumption that E(u_t | X_t) = 0 is false whenever any element of X_t is measured with error. In consequence, the OLS estimator will be biased and inconsistent.

Errors in variables are a potential problem whenever we try to estimate a consumption function, especially if we are using cross-section data. Many economic theories (for example, Friedman, 1957) suggest that household consumption will depend on "permanent" income or "life-cycle" income, but surveys of household behavior almost never measure this. Instead, they typically provide somewhat inaccurate estimates of current income. If we think of y_t as measured consumption, x°_t as permanent income, and x_t as estimated current income, then the above analysis applies directly to the consumption function. The marginal propensity to consume is β_2, which must be positive, causing the correlation between u_t and x_t to be negative. As readers are asked to show in Exercise 8.1, the probability limit of β̂_2 is less than the true value β_20. In consequence, the OLS estimator β̂_2 is biased downward, even asymptotically.
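The attenuation result lends itself to simulation. Here is a minimal Monte Carlo sketch (not from the text; all parameter values are invented) of the downward bias that the covariance calculation above predicts:

```python
# Illustrative Monte Carlo: OLS attenuation bias under errors in variables.
# Model (8.02)-(8.04); beta2, omega1_sq, and the sample size are made up.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
beta1, beta2 = 1.0, 0.8            # true parameters of (8.02)
omega1_sq = 0.5                    # variance of the measurement error v_1t

x_star = rng.normal(2.0, 1.0, n)               # true regressor x°_t, variance 1
u_star = rng.normal(0.0, 0.2, n)               # u°_t
v1 = rng.normal(0.0, np.sqrt(omega1_sq), n)    # measurement error in x

y = beta1 + beta2 * x_star + u_star            # y measured without error, for simplicity
x = x_star + v1                                # observed regressor, as in (8.03)

# OLS of y on the observed (mismeasured) regressor
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# plim of the OLS slope: beta2 * Var(x°) / (Var(x°) + omega1_sq) < beta2
plim_slope = beta2 * 1.0 / (1.0 + omega1_sq)
print(beta_hat[1], plim_slope)
```

The estimated slope settles near 0.53 rather than the true 0.8, and making n larger does not help, exactly as the fact that Cov(x_t, u_t) does not shrink with n implies.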

Of course, if our objective is simply to estimate the relationship between the observed dependent variable y_t and the observed independent variable x_t, there is nothing wrong with using ordinary least squares to estimate equation (8.04). In that case, u_t would simply be defined as the difference between y_t and its expectation conditional on x_t. But our analysis shows that the OLS estimators of β_1 and β_2 in equation (8.04) are not consistent for the corresponding parameters of equation (8.02). In most cases, it is parameters like these that we want to estimate on the basis of economic theory.

There is an extensive literature on ways to avoid the inconsistency caused by errors in variables. See, among many others, Hausman and Watson (1985), Leamer (1987), and Dagenais and Dagenais (1997). The simplest and most widely-used approach is just to use an instrumental variables estimator.

Simultaneous Equations

Economic theory often suggests that two or more endogenous variables are determined simultaneously. In this situation, as we will see shortly, all of the endogenous variables will necessarily be correlated with the error terms in all of the equations. This means that none of them may validly appear in the regression functions of models that are to be estimated by least squares.

A classic example, which well illustrates the econometric problems caused by simultaneity, is the determination of price and quantity for a commodity at the partial equilibrium of a competitive market. Suppose that q_t is quantity and p_t is price, both of which would often be in logarithms. A linear (or loglinear) model of demand and supply is

    q_t = X_t^d β_d + γ_d p_t + u_t^d,   (8.06)
    q_t = X_t^s β_s + γ_s p_t + u_t^s,   (8.07)

where X_t^d and X_t^s are row vectors of observations on exogenous or predetermined variables that appear, respectively, in the demand and supply functions, β_d and β_s are corresponding vectors of parameters, γ_d and γ_s are scalar parameters, and u_t^d and u_t^s are the error terms in the demand and supply functions. Economic theory predicts that, in most cases, γ_d < 0 and γ_s > 0, which is equivalent to saying that the demand curve slopes downward and the supply curve slopes upward.

Equations (8.06) and (8.07) are a pair of linear simultaneous equations for the two unknowns p_t and q_t. For that reason, these equations constitute what is called a linear simultaneous equations model. In this case, there are two dependent variables, quantity and price. For estimation purposes, the key feature of the model is that quantity depends on price in both equations. Since there are two equations and two unknowns, it is straightforward to solve equations (8.06) and (8.07) for p_t and q_t. This is most easily done by rewriting them in matrix notation as

    [ 1  −γ_d ] [ q_t ]   [ X_t^d β_d ]   [ u_t^d ]
    [ 1  −γ_s ] [ p_t ] = [ X_t^s β_s ] + [ u_t^s ].   (8.08)

The solution to (8.08), which will exist whenever γ_d ≠ γ_s, so that the matrix on the left-hand side of (8.08) is nonsingular, is

    [ q_t ]   [ 1  −γ_d ]^{-1} ( [ X_t^d β_d ]   [ u_t^d ] )
    [ p_t ] = [ 1  −γ_s ]      ( [ X_t^s β_s ] + [ u_t^s ] ).   (8.09)


It can be seen from this solution that p_t and q_t will depend on both u_t^d and u_t^s, and on every exogenous and predetermined variable that appears in either the demand function, the supply function, or both. Therefore, p_t, which appears on the right-hand side of equations (8.06) and (8.07), must be correlated with the error terms in both of those equations. If we rewrote one or both equations so that p_t was on the left-hand side and q_t was on the right-hand side, the problem would not go away, because q_t is also correlated with the error terms. If we wish to estimate the full system of equations, there are many options, some of which will be discussed in Chapter 12. If we simply want to estimate one equation out of such a system, the most popular approach is to use instrumental variables.

We have discussed two important situations in which the error terms will necessarily be correlated with some of the regressors, and the OLS estimator will consequently be inconsistent. This provides a strong motivation to employ estimators that do not suffer from this type of inconsistency. In the remainder of this chapter, we therefore discuss the method of instrumental variables. This method can be used whenever the error terms are correlated with one or more of the explanatory variables, regardless of how that correlation may have arisen.
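The simultaneity bias is easy to see numerically. The following sketch (illustrative, not from the text) simulates a demand-supply pair in which the supply equation contains one exogenous shifter, solves for the reduced form as in (8.09), and compares OLS with a simple IV estimate of the demand slope:

```python
# Illustrative simulation: simultaneity bias in a demand-supply system.
# gamma_d, gamma_s, and the exogenous supply shifter x_s are made-up values.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
gamma_d, gamma_s = -1.0, 0.5       # demand slopes down, supply slopes up
x_s = rng.normal(0.0, 1.0, n)      # exogenous supply shifter (e.g. an input cost)
u_d = rng.normal(0.0, 1.0, n)
u_s = rng.normal(0.0, 1.0, n)

# Reduced form, from solving q = gamma_d p + u_d and q = gamma_s p + x_s + u_s:
p = (x_s + u_s - u_d) / (gamma_d - gamma_s)
q = gamma_d * p + u_d

# OLS of q on p is inconsistent for gamma_d, because p depends on u_d
ols_slope = np.cov(p, q, ddof=0)[0, 1] / np.var(p)

# IV with the supply shifter as instrument recovers the demand slope
iv_slope = (x_s @ q) / (x_s @ p)
print(ols_slope, iv_slope)
```

With these parameter values the OLS slope converges to about −0.5 while the IV slope converges to the true γ_d = −1: the simultaneity bias does not shrink with n, but a valid instrument removes it.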

8.3 Instrumental Variables Estimation

For most of this chapter, we will focus on the linear regression model

    y = Xβ + u,   E(uu^⊤) = σ² I,   (8.10)

where at least one of the explanatory variables in the n × k matrix X is assumed not to be predetermined with respect to the error terms. Suppose that, for each t = 1, …, n, condition (8.01) is satisfied for some suitable information set Ω_t, and that we can form an n × k matrix W with typical row W_t such that all its elements belong to Ω_t. The k variables given by the k columns of W are called instrumental variables, or simply instruments. Later, we will allow for the possibility that the number of instruments may exceed the number of regressors.

Instrumental variables may be either exogenous or predetermined, and, for a reason that will be explained later, they should always include any columns of X that are exogenous or predetermined. Finding suitable instruments may be quite easy in some cases, but it can be extremely difficult in others. Many empirical controversies in economics are essentially disputes about whether or not certain variables constitute valid instruments.

The Simple IV Estimator

For the linear model (8.10), the moment conditions (6.10) simplify to

    W^⊤(y − Xβ) = 0.   (8.11)

Since there are k equations and k unknowns, we can solve equations (8.11) directly to obtain the simple IV estimator

    β̂_IV = (W^⊤X)^{-1} W^⊤ y.   (8.12)

Given the assumption (8.14) of asymptotic identification, it is clear that β̂_IV is consistent if and only if

    plim_{n→∞} (1/n) W^⊤ u = 0,   (8.16)

which is precisely the condition (6.16) that was used in the consistency proof in Section 6.2. We usually refer to this condition by saying that the error terms are asymptotically uncorrelated with the instruments. Condition (8.16) follows from condition (8.13) by the law of large numbers, but it may hold even if condition (8.13) does not. The weaker condition (8.16) is what is required for the consistency of the IV estimator.
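As a concrete sketch (not from the text; the data-generating numbers are invented), the simple IV estimator can be computed directly from (8.12):

```python
# Minimal sketch of the simple IV estimator (8.12): beta = (W'X)^{-1} W'y,
# for the just identified case with as many instruments as regressors.
import numpy as np

def simple_iv(y, X, W):
    """Solve the moment conditions W'(y - X beta) = 0, as in (8.11)."""
    # W'X must be nonsingular: the sample analogue of the identification
    # condition that S_{W'X} has full rank
    return np.linalg.solve(W.T @ X, W.T @ y)

# Quick check on simulated data with one endogenous regressor
rng = np.random.default_rng(1)
n = 50_000
w = rng.normal(size=n)                       # instrument, uncorrelated with the error
e = rng.normal(size=n)                       # structural error
x = 0.9 * w + 0.7 * e + rng.normal(size=n)   # regressor correlated with e
y = 2.0 + 1.5 * x + e                        # true beta = (2.0, 1.5)

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])         # the constant serves as its own instrument
beta_hat = simple_iv(y, X, W)
print(beta_hat)
```

OLS on the same data would be pulled away from 1.5 by the correlation between x and e; the IV estimate is centered on the true values.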


Efficiency Considerations

If the model (8.10) is correctly specified with true parameter vector β_0 and true error variance σ_0², the results of Section 6.2 show that the asymptotic covariance matrix of n^{1/2}(β̂_IV − β_0) is given by (6.25) or (6.26):

    σ_0² (S_{X⊤W} S_{W⊤W}^{-1} S_{W⊤X})^{-1},   (8.17)

where S_{W⊤W} ≡ plim n^{-1} W^⊤W. If we have some choice over what instruments to use in the matrix W, it makes sense to choose them so as to minimize the above asymptotic covariance matrix.

First of all, notice that, since (8.17) depends on W only through the orthogonal projection matrix P_W, all that matters is the space S(W) spanned by the instrumental variables. In fact, as readers are asked to show in Exercise 8.2, the estimator β̂_IV itself depends on W only through P_W. This fact is closely related to the result that, for ordinary least squares, fitted values and residuals depend only on the space S(X) spanned by the regressors.

Suppose first that we are at liberty to choose for instruments any variables at all that satisfy the predeterminedness condition (8.13). Then, under reasonable and plausible conditions, we can characterize the optimal instruments for IV estimation of the model (8.10). By this, we mean the instruments that minimize the asymptotic covariance matrix (8.17), in the usual sense that any other choice of instruments leads to an asymptotic covariance matrix that differs from the optimal one by a positive semidefinite matrix.

In order to determine the optimal instruments, we must know the data-generating process. In the context of a simultaneous equations model, a single equation like (8.10), even if we know the values of the parameters, cannot be a complete description of the DGP, because at least some of the variables in the matrix X are endogenous. For the DGP to be fully specified, we must know how all the endogenous variables are generated. For the demand-supply model given by equations (8.06) and (8.07), both of those equations are needed to specify the DGP. For a more complicated simultaneous equations model with g endogenous variables, we would need g equations. For the simple errors-in-variables model discussed in Section 8.2, we need equations (8.03) as well as equation (8.02) in order to specify the DGP fully.

Quite generally, we can suppose that the explanatory variables in (8.10) satisfy the relation

    X = X̄ + V,   E(V_t | Ω_t) = 0,   (8.18)

where the t-th row of X̄ is X̄_t = E(X_t | Ω_t), and X_t is the t-th row of X. Thus equation (8.18) can be interpreted as saying that X̄_t is the expectation of X_t conditional on the information set Ω_t. It turns out that the n × k matrix X̄ provides the optimal instruments for (8.10). Of course, in practice, this matrix is never observed, and we will need to replace X̄ by something that estimates it consistently.

To see that X̄ provides the optimal matrix of instruments, it is, as usual, easier to reason in terms of precision matrices rather than covariance matrices. For any valid choice of instruments, the precision matrix corresponding to (8.17) is

    (1/σ_0²) S_{X⊤W} S_{W⊤W}^{-1} S_{W⊤X} = (1/σ_0²) plim_{n→∞} n^{-1} X^⊤ P_W X.   (8.19)

Moreover,

    plim n^{-1} X^⊤W = plim n^{-1} (X̄ + V)^⊤W = plim n^{-1} X̄^⊤W = S_{X̄⊤W}.   (8.20)

The second equality holds because E(V^⊤W) = O, since, by the construction in (8.18), V_t has mean zero conditional on W_t. The last equality is just a LLN in reverse. Similarly, we find that plim n^{-1} W^⊤X = plim n^{-1} W^⊤X̄. Thus (8.19) becomes

    (1/σ_0²) plim_{n→∞} n^{-1} X̄^⊤ P_W X̄.   (8.21)

If we make the choice W = X̄, then (8.21) reduces to plim n^{-1} X̄^⊤X̄. The difference between this and (8.21) is just plim n^{-1} X̄^⊤ M_W X̄, which is a positive semidefinite matrix. This shows that X̄ is indeed the optimal choice of instrumental variables by the criterion of asymptotic variance.
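A small Monte Carlo sketch (invented numbers; X̄ is known here only because we simulate it) illustrates the optimality result: using X̄ itself as the instrument yields visibly lower variance than a noisier, but still valid, instrument:

```python
# Illustrative Monte Carlo: the optimal instruments Xbar = E(X | Omega)
# beat any other valid instrument in precision, in line with (8.21).
import numpy as np

rng = np.random.default_rng(7)
n, reps, beta0 = 500, 2000, 1.0
est_opt = np.empty(reps)
est_w = np.empty(reps)

for r in range(reps):
    xbar = rng.normal(size=n)                 # Xbar, observable in this simulation
    V = rng.normal(size=n)                    # mean zero given the info set, as in (8.18)
    x = xbar + V                              # regressor, endogenous through V
    u = 0.8 * V + 0.6 * rng.normal(size=n)    # error correlated with V, hence with x
    y = beta0 * x + u
    w = xbar + rng.normal(size=n)             # valid but noisier instrument
    est_opt[r] = (xbar @ y) / (xbar @ x)      # IV with the optimal instrument
    est_w[r] = (w @ y) / (w @ x)              # IV with the suboptimal instrument

print(est_opt.var(), est_w.var())
```

Both estimators are centered on β_0 = 1, but the variance with w is roughly twice that with X̄, matching the ratio of the two precision matrices in this design.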

We mentioned earlier that all the explanatory variables in (8.10) that are exogenous or predetermined should be included in the matrix W of instrumental variables. It is now clear why this is so. If we denote by Z the submatrix of X containing the exogenous or predetermined variables, then Z̄ = Z, because the row Z_t is already contained in Ω_t. Thus Z is a submatrix of the matrix X̄ of optimal instruments. As such, it should always be a submatrix of the matrix of instruments W used for estimation, even if W is not actually equal to X̄.

The Generalized IV Estimator

In practice, the information set Ω_t is very frequently specified by providing a list of l instrumental variables that suggest themselves for various reasons. Therefore, we now drop the assumption that the number of instruments is equal to the number of parameters and let W denote an n × l matrix of instruments. Often, l is greater than k, the number of regressors in the model (8.10). In this case, the model is said to be overidentified, because, in general, there is more than one way to formulate moment conditions like (8.11) using the available instruments. If l = k, the model (8.10) is said to be just identified or exactly identified, because there is only one way to formulate the moment conditions. If l < k, it is said to be underidentified, because there are fewer moment conditions than parameters to be estimated, and equations (8.11) will therefore have no unique solution.

If any instruments at all are available, it is normally possible to generate an arbitrarily large collection of them, because any deterministic function of the l components of the t-th row W_t of W can be used as the t-th component of a new instrument.¹ If (8.10) is underidentified, some such procedure is necessary if we wish to obtain consistent estimates of all the elements of β. Alternatively, we would have to impose at least k − l restrictions on β so as to reduce the number of independent parameters that must be estimated to no more than the number of instruments.

For models that are just identified or overidentified, it is often desirable to limit the set of potential instruments to deterministic linear functions of the instruments in W, rather than allowing arbitrary deterministic functions. We will see shortly that this is not only reasonable but optimal for linear simultaneous equations models. This means that the IV estimator is unique for a just identified model, because there is only one k-dimensional linear space S(W) that can be spanned by the k = l instruments, and, as we saw earlier, the IV estimator for a given model depends only on the space spanned by the instruments.
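This invariance is easy to verify numerically. The following quick sketch (with made-up data) checks that replacing W by WA, for an arbitrary nonsingular A, leaves the IV estimate untouched, since S(WA) = S(W):

```python
# Numerical check: the IV estimator depends on the instruments only through
# the span S(W), so W and WA (A nonsingular) give identical estimates.
import numpy as np

def giv(y, X, W):
    """Generalized IV estimator (X' P_W X)^{-1} X' P_W y."""
    PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
    return np.linalg.solve(X.T @ PW_X, PW_X.T @ y)

rng = np.random.default_rng(2)
n = 1_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)
x = w1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w1, w2])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])               # any nonsingular 3x3 matrix works
print(np.allclose(giv(y, X, W), giv(y, X, W @ A)))   # True
```

Because P_{WA} = P_W whenever A is nonsingular, the two calls agree to machine precision.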

We can always treat an overidentified model as if it were just identified by choosing exactly k linear combinations of the l columns of W. The challenge is to choose these linear combinations optimally. Formally, we seek an l × k matrix J such that the n × k matrix WJ is a valid instrument matrix and such that the use of J minimizes the asymptotic covariance matrix of the estimator in the class of IV estimators obtained using an n × k instrument matrix of the form WJ* with arbitrary l × k matrix J*.

There are three requirements that the matrix J must satisfy. The first of these is that it should have full column rank of k. Otherwise, the space spanned by the columns of WJ would have rank less than k, and the model would be underidentified. The second requirement is that J should be at least asymptotically deterministic. If not, it is possible that condition (8.16) applied to WJ could fail to hold. The last requirement is that J be chosen to minimize the asymptotic covariance matrix of the resulting IV estimator, and we now explain how this may be achieved.

If the explanatory variables X satisfy (8.18), then it follows from (8.17) and (8.20) that the asymptotic covariance matrix of the IV estimator computed using WJ as instrument matrix is

    σ_0² plim_{n→∞} (n^{-1} X̄^⊤ P_{WJ} X̄)^{-1}.   (8.22)

¹ This procedure would not work if, for example, all of the original instruments were binary variables.

The t-th row X̄_t of X̄ belongs to Ω_t by construction, and so each element of X̄_t is a deterministic function of the elements of W_t. However, the deterministic functions are not necessarily linear with respect to W_t. Thus, in general, it is impossible to find a matrix J such that X̄ = WJ, as would be needed for WJ to constitute a set of truly optimal instruments. A natural second-best solution is to project X̄ orthogonally on to the space S(W). This yields the matrix of instruments

    WJ = P_W X̄ = W (W^⊤W)^{-1} W^⊤ X̄,   (8.23)

which implies that

    J = (W^⊤W)^{-1} W^⊤ X̄.   (8.24)

We now show that these instruments are indeed optimal under the constraint that the instruments should be linear in W_t.

By substituting P_W X̄ for WJ in (8.22), and using the fact that

    X̄^⊤ P_{P_W X̄} X̄ = X̄^⊤ P_W X̄ (X̄^⊤ P_W X̄)^{-1} X̄^⊤ P_W X̄ = X̄^⊤ P_W X̄,   (8.25)

we see that the precision matrix for the estimator with instruments P_W X̄ is proportional to X̄^⊤ P_W X̄. For the estimator with WJ as instruments, the precision matrix is proportional to X̄^⊤ P_{WJ} X̄. The difference between the two precision matrices is therefore proportional to

    X̄^⊤ (P_W − P_{WJ}) X̄.   (8.26)

The k-dimensional subspace S(WJ), which is the image of the orthogonal projection P_{WJ}, is a subspace of the l-dimensional space S(W), which is the image of P_W. Thus, by the result in Exercise 2.16, the difference P_W − P_{WJ} is itself an orthogonal projection matrix. This implies that the difference (8.26) is a positive semidefinite matrix, and so we can conclude that (8.23) is indeed the optimal choice of instruments of the form WJ.

At this point, we come up against the same difficulty as that encountered at the end of Section 6.2, namely, that the optimal instrument choice is infeasible, because we do not know X̄. But notice that, from the definition (8.24) of the matrix J, we have that

    plim_{n→∞} J = plim_{n→∞} (n^{-1} W^⊤W)^{-1} (n^{-1} W^⊤X̄) = plim_{n→∞} (n^{-1} W^⊤W)^{-1} (n^{-1} W^⊤X),   (8.27)

by (8.20). This suggests, correctly, that we can use P_W X instead of P_W X̄ without changing the asymptotic properties of the estimator.

If we use P_W X as the matrix of instrumental variables, the moment conditions (8.11) that define the estimator become

    X^⊤ P_W (y − Xβ) = 0,   (8.28)

which can be solved to yield the generalized IV estimator, or GIV estimator,

    β̂_IV = (X^⊤ P_W X)^{-1} X^⊤ P_W y,   (8.29)

which is sometimes just abbreviated as GIVE. The estimator (8.29) is indeed a generalization of the simple estimator (8.12), as readers are asked to verify in Exercise 8.3. For this reason, we will usually refer to the IV estimator without distinguishing the simple from the generalized case.

The generalized IV estimator (8.29) can also be obtained by minimizing the IV criterion function, which has many properties in common with the sum of squared residuals for models estimated by least squares. This function is defined as follows:

    Q(β, y) = (y − Xβ)^⊤ P_W (y − Xβ).   (8.30)

Minimizing Q(β, y) with respect to β yields the estimator (8.29), as readers are asked to show in Exercise 8.4.
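As a sketch in code (not from the text; the data and instrument strengths are invented), the generalized IV estimator (8.29) for an overidentified model is one projection and one solve:

```python
# Minimal sketch of the generalized IV (2SLS) estimator (8.29) with l = 3
# instruments and k = 2 regressors, i.e. an overidentified model.
import numpy as np

def giv(y, X, W):
    """beta_IV = (X' P_W X)^{-1} X' P_W y, with P_W X from a first-stage fit."""
    PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)    # P_W X
    return np.linalg.solve(X.T @ PW_X, PW_X.T @ y)

rng = np.random.default_rng(3)
n = 20_000
w1, w2 = rng.normal(size=n), rng.normal(size=n)
e = rng.normal(size=n)
x = 0.6 * w1 + 0.4 * w2 + 0.5 * e + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 0.5 * x + e                                   # true beta = (1.0, 0.5)

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w1, w2])               # l = 3 > k = 2
beta_hat = giv(y, X, W)
print(beta_hat)
```

The same β̂ minimizes the criterion (8.30): perturbing any coordinate of beta_hat increases Q(β, y), which makes a handy numerical sanity check.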

Identifiability and Consistency of the IV Estimator

In Section 6.2, we defined in (6.12) a k-vector α(β) of deterministic functions as the probability limits of the functions used in the moment conditions that define an estimator, and we saw that the parameter vector β is asymptotically identified if two asymptotic identification conditions are satisfied. The first condition is that α(β_0) = 0, and the second is that α(β) ≠ 0 for all β ≠ β_0. The analogous vector of functions for the IV estimator is

    α(β) ≡ plim_{n→∞} (1/n) X^⊤ P_W (y − Xβ) = S_{X⊤W} (S_{W⊤W})^{-1} plim_{n→∞} (1/n) W^⊤ (y − Xβ),   (8.31)

where S_{X⊤W} is the transpose of S_{W⊤X}, which was defined in (8.14), and S_{W⊤W} was defined just after (8.17). For asymptotic identification, we assume that both these matrices exist and have full rank. This assumption is analogous to the assumption that 1/n times the matrix X^⊤X has probability limit S_{X⊤X}, a matrix with full rank, which we originally made in Section 3.3 when we proved that the OLS estimator is consistent. If S_{W⊤W} does not have full rank, then at least one of the instruments is perfectly collinear with the others, asymptotically, and should therefore be dropped. If S_{W⊤X} does not have full rank,


then the asymptotic version of the moment conditions (8.28) has fewer than k linearly independent equations, and these conditions therefore have no unique solution.

If β_0 is the true parameter vector, then y − Xβ_0 = u, and the right-hand side of (8.31) vanishes under the assumption (8.16) used to show the consistency of the simple IV estimator. Thus α(β_0) = 0, and the first condition for asymptotic identification is satisfied.

The second condition requires that α(β) ≠ 0 for all β ≠ β_0. It is easy to see from (8.31) that

    α(β) = S_{X⊤W} (S_{W⊤W})^{-1} S_{W⊤X} (β_0 − β).

For this to be nonzero for all nonzero β_0 − β, it is necessary and sufficient that the matrix S_{X⊤W} (S_{W⊤W})^{-1} S_{W⊤X} should have full rank k. This will be the case if the matrices S_{W⊤W} and S_{W⊤X} both have full rank, as we have assumed. If l = k, the conditions on the two matrices S_{W⊤W} and S_{W⊤X} simplify, as we saw when considering the simple IV estimator, to the single condition (8.14). The condition that S_{X⊤W} (S_{W⊤W})^{-1} S_{W⊤X} has full rank can also be used to show that the probability limit of 1/n times the IV criterion function (8.30) has a unique global minimum at β = β_0, as readers are asked to show in Exercise 8.5.

The two asymptotic identification conditions are sufficient for consistency. Because we are dealing here with linear models, there is no need for a sophisticated proof of this fact; see Exercise 8.6. The key assumption is, of course, (8.16). If this assumption did not hold, because any of the instruments was asymptotically correlated with the error terms, the first of the asymptotic identification conditions would not hold either, and the IV estimator would not be consistent.

Asymptotic Distribution of the IV Estimator

Like every estimator that we have studied, the IV estimator is asymptotically normally distributed with an asymptotic covariance matrix that can be estimated consistently. The asymptotic covariance matrix for the simple IV estimator, expression (8.17), turns out to be valid for the generalized IV estimator as well. To see this, we replace W in (8.17) by the asymptotically optimal instruments P_W X. As in (8.25), we find that

    X^⊤ P_{P_W X} X = X^⊤ P_W X (X^⊤ P_W X)^{-1} X^⊤ P_W X = X^⊤ P_W X,

from which it follows that (8.17) is unchanged if W is replaced by P_W X.

It can also be shown directly that (8.17) is the asymptotic covariance matrix of the generalized IV estimator. From (8.29), it follows that

    n^{1/2} (β̂_IV − β_0) = (n^{-1} X^⊤ P_W X)^{-1} n^{-1/2} X^⊤ P_W u.   (8.32)


Under reasonable assumptions, a central limit theorem can be applied to the expression n^{-1/2} W^⊤u, which allows us to conclude that the asymptotic distribution of this expression is multivariate normal, with mean zero and covariance matrix

    lim_{n→∞} (1/n) W^⊤ E(uu^⊤) W = σ_0² S_{W⊤W},   (8.33)

since we assume that E(uu^⊤) = σ_0² I. With this result, it can be shown quite simply that (8.17) is the asymptotic covariance matrix of β̂_IV; see Exercise 8.7.

In practice, since σ_0² is unknown, we use the estimate σ̂² ≡ (1/n)(y − Xβ̂_IV)^⊤(y − Xβ̂_IV), and estimate the covariance matrix of β̂_IV by

    σ̂² (X^⊤ P_W X)^{-1}.   (8.34)

Nevertheless, many regression packages divide by n − k instead of by n.
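In code, the covariance estimate is one line once β̂_IV is in hand. A sketch (invented data; the divide-by-n convention follows the text):

```python
# Sketch: estimated covariance matrix sigma_hat^2 (X' P_W X)^{-1} of the
# IV estimator, with sigma_hat^2 built from the IV residuals y - X beta_IV.
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
w = rng.normal(size=n)
e = rng.normal(size=n)                        # true error, variance 1
x = 0.8 * w + 0.6 * e + rng.normal(size=n)
y = 2.0 - 1.0 * x + e

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])

PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
beta_iv = np.linalg.solve(X.T @ PW_X, PW_X.T @ y)

resid = y - X @ beta_iv                       # IV residuals, not second-stage ones
sigma2_hat = resid @ resid / n                # divide by n; n - k is also common
cov_hat = sigma2_hat * np.linalg.inv(X.T @ PW_X)
se = np.sqrt(np.diag(cov_hat))
print(sigma2_hat, se)
```

With this many observations sigma2_hat is close to the true error variance of 1, and the standard errors shrink at the usual root-n rate.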

The choice of instruments will usually affect the asymptotic covariance matrix of the IV estimator. If some or all of the columns of X̄ are not contained in the span S(W) of the instruments, an efficiency gain is potentially available if that span is made larger. Readers are asked in Exercise 8.8 to demonstrate formally that adding an extra instrument by appending a new column to W will, in general, reduce the asymptotic covariance matrix. Of course, it cannot be made smaller than the lower bound σ_0² (X̄^⊤ X̄)^{-1}, which is attained if the optimal instruments X̄ are available.

When all the regressors can validly be used as instruments, we have X̄ = X, and the efficient IV estimator coincides with the OLS estimator, as the Gauss-Markov Theorem predicts.

Two-Stage Least Squares

The IV estimator (8.29) is commonly known as the two-stage least squares, or 2SLS, estimator, because, before the days of good econometrics software packages, it was often calculated in two stages using OLS regressions. In the first stage, each column x_i, i = 1, …, k, of X is regressed on W, if necessary. If a regressor x_i is a valid instrument, it is already (or should be) one of the columns of W. In that case, since P_W x_i = x_i, no first-stage regression is needed, and we say that such a regressor serves as its own instrument. The fitted values from the first-stage regressions, plus the actual values of any regressors that serve as their own instruments, are collected to form the matrix P_W X. Then the second-stage regression,

    y = P_W X β + residuals,   (8.35)

is used to obtain the 2SLS estimates. Because P_W is an idempotent matrix, the OLS estimate of β from this second-stage regression is

    β̂_2sls = (X^⊤ P_W X)^{-1} X^⊤ P_W y,

which is identical to (8.29), the generalized IV estimator β̂_IV.

If this two-stage procedure is used, some care must be taken when estimating the standard error of the regression and the covariance matrix of the parameter estimates. The OLS estimate of σ² from regression (8.35) is based on the second-stage residuals y − P_W X β̂_IV rather than on the IV residuals y − X β̂_IV, and it is therefore not consistent. A program that computes the IV estimator directly would never use such an estimate of σ². Instead, it would use the estimate based on the IV residuals, σ̂² = (1/n)(y − X β̂_IV)^⊤(y − X β̂_IV), (8.37), or at least something that is asymptotically equivalent to it.
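The pitfall is easy to reproduce. In this sketch (invented data, no intercept for brevity), the two-stage computation reproduces β̂_IV exactly, but the second-stage residuals give a badly inconsistent estimate of σ²:

```python
# Sketch: 2SLS computed literally in two OLS stages. The coefficient matches
# the direct IV formula, but sigma^2 must come from y - X beta, not y - P_W X beta.
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
w = rng.normal(size=n)
e = rng.normal(size=n)                        # true sigma^2 = 1
x = 1.0 * w + 0.7 * e + rng.normal(size=n)
y = 0.5 * x + e

X = x[:, None]
W = w[:, None]
Xhat = W @ np.linalg.solve(W.T @ W, W.T @ X)            # first stage: P_W X
beta_2sls = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)  # second stage: OLS on fitted values

wrong = y - Xhat @ beta_2sls                  # second-stage residuals
right = y - X @ beta_2sls                     # IV residuals
print((wrong @ wrong) / n, (right @ right) / n)
```

Only the second number is near the true σ² = 1; the first stays too large no matter how big n is, which is why IV software recomputes the residuals with the original X.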

Two-stage least squares was invented by Theil (1953) and Basmann (1957) at a time when computers were very primitive. Consequently, despite the classic papers of Durbin (1954) and Sargan (1958) on instrumental variables estimation, the term "two-stage least squares" came to be very widely used in econometrics, even when the estimator is not actually computed in two stages. We prefer to think of two-stage least squares as simply a particular way to compute the generalized IV estimator, and we will use β̂_IV rather than β̂_2sls to denote that estimator.

8.4 Finite-Sample Properties of IV Estimators

Unfortunately, the finite-sample distributions of IV estimators are much more complicated than the asymptotic ones. Indeed, except in very special cases, these distributions are unknowable in practice. Although it is consistent, the IV estimator for just identified models has a distribution with such thick tails that its expectation does not even exist. With overidentified models, the expectation of the estimator exists, but it is in general different from the true parameter value, so that the estimator is biased, often very substantially so. In consequence, investigators can easily make serious errors of inference when interpreting IV estimates.


The biases in the OLS estimates of a model like (8.10) arise because the error terms are correlated with some of the regressors. The IV estimator solves this problem asymptotically, because the projections of the regressors on to S(W) are asymptotically uncorrelated with the error terms. However, there will always still be some correlation in finite samples, and this causes the IV estimator to be biased.

Systems of Equations

In order to understand the finite-sample properties of the IV estimator, we need to consider the model (8.10) as part of a system of equations. We therefore change notation somewhat and rewrite (8.10) as

    y = Zβ_1 + Yβ_2 + u,   E(uu^⊤) = σ² I,   (8.38)

where the matrix of regressors X has been partitioned into two parts, namely, an n × k_1 matrix of exogenous and predetermined variables, Z, and an n × k_2 matrix of endogenous variables, Y, and the vector β has been partitioned conformably into two subvectors β_1 and β_2. There are assumed to be l ≥ k instruments, of which k_1 are the columns of the matrix Z.

The model (8.38) is not fully specified, because it says nothing about how the matrix Y is generated. For each observation t, t = 1, …, n, the value y_t of the dependent variable and the values Y_t of the other endogenous variables are assumed to be determined by a set of linear simultaneous equations. The variables in the matrix Y are called current endogenous variables, because they are determined simultaneously, row by row, along with y. Suppose that all the exogenous and predetermined explanatory variables in the full set of simultaneous equations are included in the n × l instrument matrix W, of which the first k_1 columns are those of Z. Then, as can easily be seen by analogy with the explicit result (8.09) for the demand-supply model, we have for each endogenous variable y_i, i = 0, 1, …, k_2, that

    y_i = Wπ_i + v_i,   E(v_i | W) = 0.   (8.39)

Here y_0 ≡ y, and the y_i, for i = 1, …, k_2, are the columns of Y. The π_i are l-vectors of unknown coefficients, and the v_i are n-vectors of error terms that are innovations with respect to the instruments.

Equations like (8.39), which have only exogenous and predetermined variables on the right-hand side, are called reduced form equations, in contrast with equations like (8.38), which are called structural equations. Writing a model as a set of reduced form equations emphasizes the fact that all the endogenous variables are generated by similar mechanisms. In general, the error terms for the various reduced form equations will display contemporaneous correlation: If v_ti denotes a typical element of the vector v_i, then, for observation t, the reduced form error terms v_ti will generally be correlated among themselves and correlated with the error term u_t of the structural equation.
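As a concrete illustration of this mechanism, the following sketch (all parameter values are invented for the example) simulates one endogenous variable from a reduced form like (8.39), with a reduced form error v that is contemporaneously correlated with the structural error u, and shows why OLS applied to the structural equation is then inconsistent:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
W = rng.standard_normal((n, 3))              # exogenous instruments
pi = np.array([0.8, 0.4, 0.0])               # invented reduced form coefficients
v = rng.standard_normal(n)                   # reduced form error
u = 0.6 * v + 0.8 * rng.standard_normal(n)   # structural error, correlated with v
Y1 = W @ pi + v                              # endogenous variable from the reduced form
y = 2.0 * Y1 + u                             # structural equation with true slope 2.0

# Y1 is correlated with u through v, so the OLS slope is biased upward:
print(np.mean(Y1 * u))                       # close to Cov(Y1, u) = 0.6, not 0
print((Y1 @ y) / (Y1 @ Y1))                  # OLS slope, well above 2.0
```

Here the OLS slope converges to 2 + Cov(Y1, u)/Var(Y1) = 2 + 0.6/1.8 ≈ 2.33 rather than to the true coefficient.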


To see how this bias arises, consider the simplest special case of (8.38), with a single endogenous regressor x, no exogenous regressors, and a single instrument w, so that the model is

y = xβ0 + σ_u u,  x = wπ0 + σ_v v,  (8.40)

analogously to (8.39). By explicitly writing σ_u and σ_v as the standard deviations of the error terms, we can define the vectors u and v to be multivariate standard normal, that is, distributed as N(0, I). There is contemporaneous correlation of u and v, so that we have E(u_t v_t) = ρ, for some correlation coefficient ρ such that −1 < ρ < 1. The result of Exercise 4.4 shows that the expectation of u_t conditional on v_t is ρv_t, and so we can write u = ρv + u1, where u1 has mean zero conditional on v.
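This decomposition is easy to verify numerically. In the sketch below (ρ = 0.8 is an invented value), u1 is scaled by (1 − ρ²)^{1/2} so that u remains standard normal:

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 200_000, 0.8

v = rng.standard_normal(n)
u1 = np.sqrt(1 - rho**2) * rng.standard_normal(n)  # mean zero given v, by construction
u = rho * v + u1                                   # so E(u_t v_t) = rho and Var(u_t) = 1

print(np.corrcoef(u, v)[0, 1])   # close to 0.8
print((v @ u) / (v @ v))         # OLS slope of u on v: estimates rho, since E(u | v) = rho v
```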

In this simple, just identified, setup, the IV estimator of the parameter β is

β̂_IV = (w⊤x)⁻¹w⊤y = β0 + σ_u(w⊤x)⁻¹w⊤u.  (8.41)

This expression is clearly unchanged if the instrument w is multiplied by an arbitrary scalar, and so we can, without loss of generality, rescale w so that w⊤w = 1. Then, using the second equation in (8.40), we find that

β̂_IV − β0 = σ_u w⊤u / (π0 + σ_v w⊤v).

Let us now compute the expectation of this expression conditional on v. Since, by construction, E(u1 | v) = 0, we obtain

E(β̂_IV − β0 | v) = (ρσ_u/σ_v) z/(a + z),  (8.42)

where we have made the definitions a ≡ π0/σ_v and z ≡ w⊤v. Given our rescaling of w, it is easy to see that z ∼ N(0, 1).

If ρ = 0, the right-hand side of (8.42) vanishes, and so the unconditional expectation of β̂_IV − β0 vanishes as well. Therefore, in this special case, β̂_IV is unbiased. This is as expected, since, if ρ = 0, the regressor x is uncorrelated with the error vector u. If ρ ≠ 0, however, (8.42) is equal to a nonzero factor times the random variable z/(a + z). Unless a = 0, it turns out that this random variable has no expectation. To see this, we can try to calculate it:

E(z/(a + z)) = ∫_{−∞}^{∞} (x/(a + x)) φ(x) dx,  (8.43)


8.4 Finite-Sample Properties of IV Estimators

where, as usual, φ(·) is the density of the standard normal distribution. It is a fairly simple calculus exercise to show that the integral in (8.43) diverges in the neighborhood of x = −a.

If π0 = 0, then a = 0. In this rather odd case, x = σ_v v is just noise, as though it were an error term. Therefore, since z/(a + z) reduces to 1, the expectation exists, but it is not zero, and β̂_IV is therefore biased.

When a ≠ 0, which is the usual case, the IV estimator (8.41) is neither biased nor unbiased, because it has no expectation for any finite sample size n. This may seem to contradict the result according to which β̂_IV is asymptotically normal, since all the moments of the normal distribution exist. However, the fact that a sequence of random variables converges to a limiting random variable does not necessarily imply that the moments of the variables in the sequence converge to those of the limiting variable; see Davidson and MacKinnon (1993, Section 4.5). The estimator (8.41) is a case in point. Fortunately, this possible failure of the moments to converge does not extend to the CDFs of the random variables, which do indeed converge to that of the limit. Consequently, P values and the upper and lower limits of confidence intervals computed with the asymptotic distribution are legitimate approximations, in the sense that they become more and more accurate as the sample size increases.

A less simple calculation can be used to show that, in the overidentified case, the first l − k moments of β̂_IV exist; see Kinal (1980). This is consistent with the result we have just obtained for an exactly identified model, where l − k = 0, and the IV estimator has no moments at all. When the mean of β̂_IV exists, it is almost never equal to β0. Readers will have a much clearer idea of the impact of the existence or nonexistence of moments, and of the bias of the IV estimator, if they work carefully through Exercises 8.10 to 8.13, in which they are asked to generate by simulation the EDFs of the estimator in different situations.
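In the spirit of those exercises, here is a minimal simulation (sample size and parameter values invented for illustration) of the just identified estimator (8.41), comparing a strong instrument (a well away from 0) with a nearly irrelevant one (a ≈ 0):

```python
import numpy as np

def iv_draws(pi0, n=50, reps=10_000, beta0=1.0, rho=0.8, seed=1):
    """Draws of the just-identified IV estimator for model (8.40)."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        w = rng.standard_normal(n)
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        x = pi0 * w + v              # reduced form, sigma_v = 1
        y = beta0 * x + u            # structural equation, sigma_u = 1
        est[r] = (w @ y) / (w @ x)   # beta_hat_IV = (w'x)^{-1} w'y
    return est

strong = iv_draws(pi0=1.0)    # informative instrument
weak = iv_draws(pi0=0.05)     # nearly irrelevant instrument

print(np.median(strong))                 # close to beta0 = 1
print(np.percentile(strong, [1, 99]))    # tight around beta0
print(np.percentile(weak, [1, 99]))      # far more dispersed
```

With the strong instrument the draws cluster tightly around β0; with the weak one the extreme quantiles explode, reflecting the nonexistence of moments.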

The General Case

We now return to the general case, in which the structural equation (8.38) is being estimated, and the other endogenous variables are generated by the reduced form equations (8.39) for i = 1, . . . , k2, which correspond to the first-stage regressions for 2SLS. We can group the vectors of fitted values from these regressions into an n × k2 matrix P_W Y. The generalized IV estimator is then equivalent to a simple IV estimator that uses the instruments P_W X = [Z  P_W Y]. By grouping the l-vectors π_i, i = 1, . . . , k2, into an l × k2 matrix Π2 and the vectors of error terms v_i into an n × k2 matrix V2, we see that

P_W X = [Z  P_W Y] = [Z  P_W (WΠ2 + V2)]
      = [Z  WΠ2 + P_W V2] = WΠ + P_W V.  (8.44)


Here V is an n × k matrix of the form [O  V2], where the zero block has dimension n × k1, and Π is an l × k matrix, which can be written as Π = [Π1  Π2], where the l × k1 matrix Π1 is a k1 × k1 identity matrix sitting on top of an (l − k1) × k1 zero matrix. It is easily checked that these definitions make the last equality in (8.44) correct. Thus P_W X has two components: WΠ, which by assumption is uncorrelated with u, and P_W V, which will almost always be correlated with u.
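The equivalence between generalized IV and 2SLS can be checked numerically. The sketch below simulates an overidentified system and assumes the standard generalized IV formula β̂_IV = (X⊤P_W X)⁻¹X⊤P_W y, which this excerpt refers to but does not display:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2, l = 200, 2, 1, 4          # l > k = k1 + k2: overidentified

Z = rng.standard_normal((n, k1))
W = np.column_stack([Z, rng.standard_normal((n, l - k1))])  # first k1 columns are Z
Pi2 = rng.standard_normal((l, k2))
V2 = rng.standard_normal((n, k2))
u = 0.7 * V2[:, 0] + rng.standard_normal(n)   # u correlated with V2
Y = W @ Pi2 + V2                              # reduced form (8.39) for Y
X = np.column_stack([Z, Y])
y = X @ np.array([1.0, -1.0, 0.5]) + u        # structural equation (8.38)

PW = W @ np.linalg.solve(W.T @ W, W.T)        # projection onto S(W)
beta_giv = np.linalg.solve(X.T @ PW @ X, X.T @ PW @ y)

# 2SLS second stage: OLS of y on P_W X = [Z  P_W Y], the first-stage fitted values.
beta_2sls = np.linalg.lstsq(PW @ X, y, rcond=None)[0]
print(np.max(np.abs(beta_giv - beta_2sls)))   # essentially zero
```

The two estimates agree to rounding error because OLS on P_W X yields ((P_W X)⊤P_W X)⁻¹(P_W X)⊤y = (X⊤P_W X)⁻¹X⊤P_W y, P_W being symmetric and idempotent.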

If we substitute the rightmost expression of (8.44) into (8.32), eliminating the factors of powers of n, which are unnecessary in the finite-sample context, we obtain

β̂_IV − β0 = ((WΠ + P_W V)⊤X)⁻¹(Π⊤W⊤u + V⊤P_W u).  (8.45)

If V = O, the supposedly endogenous variables Y are in fact exogenous or predetermined, and it can be checked (see Exercise 8.14) that, in this case, β̂_IV is just the OLS estimator for model (8.10).

If V is not zero, but is independent of u, then we see immediately that the expectation of (8.45) conditional on V is zero. This case is the analog of the case with ρ = 0 in (8.42). Note that we require the full independence of V and u for this to hold. If instead V were just predetermined with respect to u, the IV estimator would still have a finite-sample bias, for exactly the same reasons as those leading to finite-sample bias of the OLS estimator with predetermined but not exogenous explanatory variables.

When V and u are contemporaneously correlated, it can be shown that all the terms in (8.45) which involve V do not contribute asymptotically; see Exercise 8.15. Thus we can see that any discrepancy between the finite-sample and asymptotic distributions of β̂_IV − β0 must arise from the terms in (8.45) that involve V. In fact, in the absence of other features of the model that could give rise to finite-sample bias, such as lagged dependent variables, the poor finite-sample properties of the IV estimator arise solely from the contemporaneous correlation between P_W V and u. In particular, the second term in the second factor of (8.45) will generally have a nonzero mean, and this term can be a major source of bias when the correlation between u and some of the columns of V is high.

If the terms involving V in (8.45) are relatively small, the finite-sample distribution of the IV estimator is likely to be well approximated by its asymptotic distribution. However, if these terms are not small, the asymptotic approximation may be poor. Thus our analysis suggests that there are three situations in which the IV estimator is likely to have poor finite-sample properties.



• When l, the number of instruments, is large, W will be able to explain much of the variation in V; recall from Section 3.8 that adding additional regressors can never reduce the R² of a regression. With large l, consequently, P_W V will be relatively large. When the number of instruments is extremely large relative to the sample size, the first-stage regressions may fit so well that P_W Y is very similar to Y. In this situation, the IV estimates may be almost as biased as the OLS ones.

• When at least some of the reduced-form regressions (8.39) fit poorly, in the sense that the R² is small or the F statistic for all the slope coefficients to be zero is insignificant, the model is said to suffer from weak instruments. In this situation, even if P_W V is no larger than usual, it may nevertheless be large relative to WΠ. When the instruments are very weak, the finite-sample distribution of the IV estimator may be very far from its asymptotic distribution, even in samples with many thousands of observations. An example of this is furnished by the case in which a = 0 in (8.42) in our simple example with one regressor and one instrument. As we saw, the distribution of the estimator is quite different when a = 0 from what it is when a ≠ 0; the distribution when a ≈ 0 may well be similar to the distribution when a = 0.

• When the correlation between u and some of the columns of V is very high, V⊤P_W u will tend to be relatively large. Whether it will be large enough to cause serious problems for inference will depend on the sample size, the number of instruments, and how well the instruments explain the endogenous variables.

It may seem that adding additional instruments will always increase the finite-sample bias of the IV estimator, and Exercise 8.13 illustrates a case in which it does. In that case, the additional instruments do not really belong in the reduced-form regressions. However, if the instruments truly belong in the reduced-form regressions, adding them will alleviate the weak instruments problem, and that can actually cause the bias to diminish.
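A small Monte Carlo sketch of this point (all design choices invented): with one relevant instrument and l − 1 genuinely irrelevant ones, the median 2SLS estimate moves away from β0 = 1, toward the inconsistent OLS limit, as l grows:

```python
import numpy as np

def median_2sls(l, n=50, reps=5_000, beta0=1.0, rho=0.8, seed=3):
    """Median 2SLS estimate: 1 relevant instrument plus l - 1 irrelevant ones."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        W = rng.standard_normal((n, l))
        v = rng.standard_normal(n)
        u = rho * v + np.sqrt(1 - rho**2) * rng.standard_normal(n)
        x = W[:, 0] + v                  # only the first instrument is relevant
        y = beta0 * x + u
        xh = W @ np.linalg.lstsq(W, x, rcond=None)[0]   # first-stage fitted values
        est[r] = (xh @ y) / (xh @ x)     # 2SLS estimate of beta
    return np.median(est)

for l in (2, 10, 30):
    print(l, median_2sls(l))   # median bias grows with the number of irrelevant instruments
```

The median is used rather than the mean because, with l − k small, low-order moments of the estimator may not exist.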

Finite-sample inference in models estimated by instrumental variables is asubject of active research in econometrics Relatively recent papers on thistopic include Nelson and Startz (1990a, 1990b), Buse (1992), Bekker (1994),Bound, Jaeger, and Baker (1995), Dufour (1997), Staiger and Stock (1997),Wang and Zivot (1998), Zivot, Startz, and Nelson (1998), Angrist, Imbens,and Krueger (1999), Blomquist and Dahlberg (1999), Donald and Newey(2001), Hahn and Hausman (2002), Kleibergen (2002), and Stock, Wright,and Yogo (2002) There remain many unsolved problems


8.5 Hypothesis Testing

Because the finite-sample distributions of IV estimators are almost never known, exact tests of hypotheses based on such estimators are almost never available. However, large-sample tests can be performed in a variety of ways. Since many of the methods of performing these tests are very similar to methods that we have already discussed in Chapters 4 and 6, there is no need to discuss them in detail.

Asymptotic t and Wald Statistics

When there is just one restriction, the easiest approach is simply to compute an asymptotic t test. For example, if we wish to test the hypothesis that βi = βi0, where βi is one of the regression parameters, then a suitable test statistic is

t_βi = (β̂i − βi0) / (V̂ar(β̂i))^{1/2},  (8.47)

where β̂i is the IV estimate of βi, and V̂ar(β̂i) is the i th diagonal element of the estimated covariance matrix, (8.34). This test statistic will not follow the Student's t distribution in finite samples, but it will be asymptotically distributed as N(0, 1) under the null hypothesis.

For testing restrictions on two or more parameters, the natural analog of (8.47) is a Wald statistic. Suppose that β is partitioned as [β1  β2], and we wish to test the hypothesis that β2 = β20. Then, as in (6.71), the appropriate Wald statistic is

W_β2 = (β̂2 − β20)⊤(V̂ar(β̂2))⁻¹(β̂2 − β20),  (8.48)

where V̂ar(β̂2) is the submatrix of (8.34) that corresponds to the vector β2. This Wald statistic can be thought of as a generalization of the asymptotic t statistic: When β2 is a scalar, the square root of (8.48) is (8.47).
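A sketch of both tests on simulated data follows; it assumes the covariance estimator in (8.34) takes the usual IV form σ̂²(X⊤P_W X)⁻¹, since the formula itself is not reproduced in this excerpt:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
z = rng.standard_normal(n)                    # exogenous regressor, also an instrument
w = rng.standard_normal(n)                    # additional instrument
v = rng.standard_normal(n)
u = 0.6 * v + 0.8 * rng.standard_normal(n)    # structural error, Var = 1
x = 0.3 * z + 0.8 * w + v                     # endogenous regressor
X = np.column_stack([z, x])
beta_true = np.array([1.0, 0.5])
y = X @ beta_true + u

W = np.column_stack([z, w])
PW = W @ np.linalg.solve(W.T @ W, W.T)
XPX = X.T @ PW @ X
beta_iv = np.linalg.solve(XPX, X.T @ PW @ y)

resid = y - X @ beta_iv
cov = (resid @ resid / n) * np.linalg.inv(XPX)    # assumed form of (8.34)

t_stat = (beta_iv[1] - 0.5) / np.sqrt(cov[1, 1])  # asymptotic t for beta_2 = 0.5, as in (8.47)
d = beta_iv - beta_true
wald = d @ np.linalg.solve(cov, d)                # Wald test of both restrictions, as in (8.48)
print(t_stat, wald)   # roughly N(0,1) and chi-squared(2) draws under these true nulls
```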

The IV Variant of the GNR

In many circumstances, the easiest way to obtain asymptotically valid test statistics for models estimated using instrumental variables is to use a variant of the Gauss-Newton regression. For the model (8.10), this variant, called the IVGNR, takes the form

y − Xβ = P_W Xb + residuals.  (8.49)

As with the usual GNR, the variables of the IVGNR must be evaluated at some prespecified value of β before the regression can be run, in the usual way, using ordinary least squares.
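One basic property, analogous to that of the ordinary GNR, can be verified directly on simulated data: when the IVGNR is evaluated at β = β̂_IV itself, the artificial parameters b are estimated to be exactly zero, because OLS on (8.49) then reproduces the moment conditions X⊤P_W(y − Xβ̂_IV) = 0 that define the estimator:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 300
z = rng.standard_normal(n)
w = rng.standard_normal(n)
v = rng.standard_normal(n)
u = 0.5 * v + rng.standard_normal(n)
x = w + v                                   # endogenous regressor
X = np.column_stack([z, x])
y = X @ np.array([1.0, 2.0]) + u

W = np.column_stack([z, w])                 # instruments
PW = W @ np.linalg.solve(W.T @ W, W.T)
beta_iv = np.linalg.solve(X.T @ PW @ X, X.T @ PW @ y)

# IVGNR (8.49) evaluated at beta = beta_iv: regress y - X beta on P_W X.
b = np.linalg.lstsq(PW @ X, y - X @ beta_iv, rcond=None)[0]
print(np.max(np.abs(b)))   # zero up to rounding error
```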

The IVGNR has the same properties relative to model (8.10) as the ordinary GNR has relative to linear and nonlinear regression models estimated by least
