Dr Pham Thi Bich Ngoc
Hoa Sen University
ngoc.phamthibich@hoasen.edu.vn
Endogeneity refers to the fact that an independent variable (IV) included in the model is a choice variable (not exogenous)
Structure: E[u | X] ≠ 0, i.e. the regressor is correlated with the error term
Sources of endogeneity:
◦ Omitted variable bias
◦ Measurement error
◦ Simultaneity
Omitting a variable (X2) creates a bias only if:
1. X2 is an explanator of Y (so, when omitted, it becomes a component of the error term)
2. X2 is correlated with X1 (so that X2 creates a correlation between X1 and the error term).
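When X1 increases, there will also tend to be an increase in X2 (if the two are positively correlated). The estimate of the coefficient on X1 picks up the effect of X1 and the hidden effect of X2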
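The two conditions for omitted variable bias can be checked with a small simulation. This is only a sketch under assumed values: the coefficients `beta1` and `beta2`, the 0.5 link between X1 and X2, and the sample size are hypothetical and do not come from the slides.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(0)
n = 20000
beta1, beta2 = 1.0, 2.0  # hypothetical true effects of X1 and X2 on Y

x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.5 * v + random.gauss(0, 1) for v in x2]  # X1 is correlated with X2
y = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Simple regression of Y on X1 alone (X2 omitted, so it sits in the error term)
short_slope = cov(x1, y) / cov(x1, x1)

# Textbook prediction: beta1 + beta2 * Cov(X1, X2) / Var(X1)
predicted = beta1 + beta2 * cov(x1, x2) / cov(x1, x1)

print(f"true beta1      = {beta1}")
print(f"slope, X2 left out = {short_slope:.3f}")
print(f"predicted by the bias formula = {predicted:.3f}")
```

With these assumed values the slope on X1 should settle well above the true value of 1, because it absorbs part of the effect of X2. Setting either `beta2` or the 0.5 link to zero should make the bias vanish, matching the two conditions listed above.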
Measurement error also induces a correlation between our included explanator and the error term.
Instead of observing Xi, we observe Xi*, a mismeasured version of Xi
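A short derivation may help show where this correlation comes from. It assumes the classical errors-in-variables setup, which the slide does not spell out: the measurement error w_i (a symbol introduced here only for illustration) has mean zero and is uncorrelated with both the true X_i and the error term.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad X_i^{*} = X_i + w_i \\
Y_i &= \beta_0 + \beta_1 X_i^{*} + (\varepsilon_i - \beta_1 w_i) \\
\operatorname{Cov}\!\left(X_i^{*},\ \varepsilon_i - \beta_1 w_i\right) &= -\beta_1 \sigma_w^{2} \neq 0 \quad \text{when } \beta_1 \neq 0 \\
\operatorname{plim}\ \hat{\beta}_1 &= \beta_1 \, \frac{\sigma_X^{2}}{\sigma_X^{2} + \sigma_w^{2}}
\end{aligned}
```

Under these assumptions the OLS slope is pulled toward zero (attenuation bias), and the pull is stronger the noisier the measurement.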
An independent variable (IV) included in the model
is a choice variable, potentially affected by the
dependent variable (DV)
Examples:
◦ IV = Exports; DV = GDP
◦ IV = education; DV = income
Given: both X and Y are jointly determined
Because X and Y are determined simultaneously, X can adjust in response to shocks to Y (the error term u)
Thus X will be correlated with u
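The mechanism can be illustrated with a simulated two-equation system. The system below (Y = a·X + u and X = b·Y + v, with a = b = 0.5) is a hypothetical example chosen for illustration, not a model taken from the slides.

```python
import random

def cov(p, q):
    """Sample covariance of two equal-length lists."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    return sum((s - mp) * (t - mq) for s, t in zip(p, q)) / (len(p) - 1)

random.seed(2)
n = 20000
a, b = 0.5, 0.5  # hypothetical structural coefficients

u = [random.gauss(0, 1) for _ in range(n)]  # shock to Y
v = [random.gauss(0, 1) for _ in range(n)]  # shock to X

# Solve the two equations jointly: X responds to the shock u through Y
x = [(b * ui + vi) / (1 - a * b) for ui, vi in zip(u, v)]
y = [a * xi + ui for xi, ui in zip(x, u)]

cov_xu = cov(x, u)                  # nonzero: X is correlated with the error
ols_slope = cov(x, y) / cov(x, x)   # OLS estimate of a

print(f"Cov(X, u) = {cov_xu:.3f}")
print(f"true a = {a}, OLS slope = {ols_slope:.3f}")
```

In this sketch the OLS slope should land noticeably above the true `a`, since X and u move together. Setting `b = 0` removes the feedback from Y to X and should bring the OLS slope back to `a`.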
The classic example of simultaneous causality in
economics is supply and demand.
Both prices and quantities adjust until supply and demand are in equilibrium.
A shock to demand or supply causes BOTH prices and
quantities to move.
Thus, any attempt to estimate the relationship between
prices and quantities (say, to estimate a demand elasticity) suffers from SIMULTANEITY BIAS.
Econometricians have a frequent interest in estimating
elasticities resulting from such an equilibrium process
Simultaneity bias is a MAJOR problem.
Endogeneity bias
Suppose that Y2it is an endogenous explanatory variable:
◦ Y1it = a0 + a1 Y2it + a2 Xit + uit (1)
◦ Y2it = b0 + b1 Xit + b2 Zit + vit (2)
Equations (1) and (2) have a “triangular” structure
Given this triangular structure, the OLS estimate of a1 in equation (1) is unbiased only if vit is uncorrelated with uit
If vit is correlated with uit, then Y2it is correlated with uit, which means that the OLS estimate of a1 would be biased
To avoid this bias, we must estimate equation (1) using “instrumental variables” (IV) regression rather than OLS
Instrumental Variables (IV) estimation is used when your model has endogenous x’s
That is, whenever Cov(x,u) ≠ 0
Thus, IV can be used to address the problem
of omitted variable bias
Additionally, IV can be used to solve the
classic errors-in-variables problem
A Triangle Relationship:
Suppose that Y2it is an endogenous explanatory variable:
◦ Y1it = a0 + a1 Y2it + a2 Xit + uit (1)
◦ Y2it = b0 + b1 Xit + b2 Zit + vit (2)
Instrumental Variable:
Substituting eq (2) into eq (1):
◦ Y1it = a0 + a1 (b0 + b1 Xit + b2 Zit + vit) + a2 Xit + uit (3)
◦ All the explanatory variables (Xit and Zit) are exogenous
The basic idea underlying IV regression is to remove vit from the Y1it model so that our estimate of a1 is unbiased.
Instrumental Variable:
Note that vit is removed from the Y1it model if we use the predicted rather than the actual values of Y2it on the right-hand side
◦ Y1it = a0 + a1 (b0^ + b1^ Xit + b2^ Zit) + a2 Xit + uit (4)
The a1 estimate is biased in eq (3) but it is unbiased (more precisely, consistent in large samples) in eq (4) because the vit term has been removed.
Z: instrumental variable
Instrument Relevance: The instrument must be correlated with the endogenous variable (the instrumental variable Z must be strongly related to the endogenous variable)
Instrument Exogeneity: The instrumental variable Z must be independent of u
Is the variable Y2 endogenous or not?
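Two-stage least squares (2SLS):
- Stage 1: Estimate eq (2) by OLS and obtain the predicted values of Y2it
- Stage 2: Estimate eq (1) by OLS, using the predicted values of Y2it in place of the actual values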
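The two stages can be sketched by hand on simulated data. To keep the example short, this is a simplified version of equations (1) and (2) with the exogenous regressor Xit dropped (a2 = b1 = 0), and all numerical values are hypothetical. It is meant to show the logic, not to replace a proper IV routine.

```python
import random

def cov(p, q):
    """Sample covariance of two equal-length lists."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    return sum((s - mp) * (t - mq) for s, t in zip(p, q)) / (len(p) - 1)

def simple_ols(x, y):
    """Return (intercept, slope) from a regression of y on x with a constant."""
    slope = cov(x, y) / cov(x, x)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope

random.seed(1)
n = 20000
a1 = 1.0  # hypothetical true effect of Y2 on Y1

z = [random.gauss(0, 1) for _ in range(n)]        # instrument Z
v = [random.gauss(0, 1) for _ in range(n)]        # error in eq (2)
u = [0.8 * vi + random.gauss(0, 1) for vi in v]   # error in eq (1), correlated with v
y2 = [0.5 + 1.0 * zi + vi for zi, vi in zip(z, v)]       # simplified eq (2)
y1 = [2.0 + a1 * y2i + ui for y2i, ui in zip(y2, u)]     # simplified eq (1)

# Plain OLS of Y1 on Y2: biased because Y2 contains v, which is correlated with u
_, ols_a1 = simple_ols(y2, y1)

# Stage 1: regress Y2 on Z and keep the predicted values
b0_hat, b2_hat = simple_ols(z, y2)
y2_hat = [b0_hat + b2_hat * zi for zi in z]

# Stage 2: regress Y1 on the predicted values of Y2
_, tsls_a1 = simple_ols(y2_hat, y1)

print(f"true a1 = {a1}, OLS = {ols_a1:.3f}, 2SLS = {tsls_a1:.3f}")
```

The 2SLS estimate should sit close to the true `a1`, while OLS should be visibly too large. The point estimate from a manual second stage is the 2SLS estimate, but the standard errors printed by an ordinary second-stage regression would not be correct, which is one reason to use a dedicated IV command in practice.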
Using the ivregress command
◦ The models can be estimated using two-stage least squares (2SLS), limited-information maximum likelihood (LIML) or generalized method of moments (GMM)
ivreg2 depvar [varlist1] (varlist2=varlist_iv) [weight] [if exp] [in range] [, options]
The most up-to-date implementation of ivreg2 requires Stata version 11 or later.
varlist1 are the exogenous regressors or "included instruments"
varlist_iv are the exogenous variables excluded from the regression or "excluded instruments"
varlist2 are the endogenous regressors that are being "instrumented"
Used for panel data:
ssc install xtivreg2
xtivreg2 depvar [varlist1] (varlist2=varlist_iv)
IV estimation can be extended to the
multiple regression case
Call the model we are interested in
estimating the structural model
Our problem is that one or more of the
variables are endogenous
We need an instrument for each endogenous variable
If there is just one instrument for our
endogenous variable, we can’t test whether
the instrument is uncorrelated with the error
We say the model is just identified
If we have multiple instruments, it is possible
to test the overidentifying restrictions – to
see if some of the instruments are correlated with the error
In the instrumental variable regression, if we have multiple endogenous regressors x1, …, xk and multiple instruments z1, …, zm, the coefficients on the endogenous regressors are said to be:
Exactly identified if m = k
Overidentified if m > k
Underidentified if m < k (the coefficients cannot be identified)
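The counting rule above is easy to express as a small helper. The function name `order_condition` is made up for this illustration. Note that the count is only a necessary condition: the instruments must also be relevant for the coefficients to be identified.

```python
def order_condition(m: int, k: int) -> str:
    """Classify identification from the number of excluded instruments (m)
    and the number of endogenous regressors (k)."""
    if m < 0 or k < 1:
        raise ValueError("need m >= 0 instruments and k >= 1 endogenous regressors")
    if m == k:
        return "exactly identified"
    if m > k:
        return "overidentified"
    return "underidentified"

print(order_condition(1, 1))  # exactly identified
print(order_condition(2, 1))  # overidentified
print(order_condition(1, 2))  # underidentified
```

Only the overidentified case leaves spare instruments, so only there can the overidentifying restrictions be tested.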
The Sargan-Hansen test is a test of overidentifying restrictions.
The joint null hypothesis (H0) is that the instruments are valid instruments, i.e., uncorrelated with the error term, and that the excluded instruments are correctly excluded from the estimated equation.
Under the null, the test statistic is distributed as chi-squared in the number of (L-K) overidentifying restrictions.
A rejection casts doubt on the validity of the instruments.
If the p-value > 5%, the null is not rejected, so there is no evidence against the validity of the instruments
For the 2SLS estimator, the test statistic is Sargan's statistic, typically calculated as N*R-squared from a regression of the IV residuals on the full set of instruments.
Under the assumption of conditional homoskedasticity, Hansen's J statistic becomes Sargan's statistic.
The J statistic is consistent in the presence of heteroskedasticity and (for HAC-consistent estimation) autocorrelation; Sargan's statistic is consistent if the disturbance is homoskedastic and (for AC-consistent estimation) if it is also autocorrelated.
With robust, bw and/or cluster, Hansen's J statistic is reported.
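The N*R-squared calculation of Sargan's statistic can be sketched by hand for the simplest overidentified case: one endogenous regressor, two instruments and no other regressors. Everything in the simulation (the coefficients, the sample size, and the helper names `ols`, `fitted`, `sargan` and `simulate`) is an assumption made for illustration. ivreg2 reports this statistic itself, so the code is only meant to show what is being computed.

```python
import random

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.
    Each row is one observation and starts with 1.0 for the constant."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [val / pivot for val in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def fitted(rows, coefs):
    """Predicted values for each observation."""
    return [sum(c * v for c, v in zip(coefs, r)) for r in rows]

def sargan(y, x, z1, z2):
    """N * R-squared from regressing the 2SLS residuals on the instruments."""
    n = len(y)
    Z = [[1.0, a, b] for a, b in zip(z1, z2)]
    x_hat = fitted(Z, ols(Z, x))                          # stage 1
    b0, b1 = ols([[1.0, xh] for xh in x_hat], y)          # stage 2
    resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]   # IV residuals use the actual x
    e_hat = fitted(Z, ols(Z, resid))
    mean_r = sum(resid) / n
    ssr = sum((r - e) ** 2 for r, e in zip(resid, e_hat))
    sst = sum((r - mean_r) ** 2 for r in resid)
    return n * (1 - ssr / sst)

def simulate(direct_effect_of_z2, n=5000, seed=3):
    """Hypothetical data: z2 is a valid instrument only if its direct effect is 0."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    v = [rng.gauss(0, 1) for _ in range(n)]
    u = [0.8 * vi + rng.gauss(0, 1) + direct_effect_of_z2 * b for vi, b in zip(v, z2)]
    x = [a + b + vi for a, b, vi in zip(z1, z2, v)]
    y = [1.0 + 1.0 * xi + ui for xi, ui in zip(x, u)]
    return sargan(y, x, z1, z2)

stat_valid = simulate(0.0)     # both instruments valid
stat_invalid = simulate(0.5)   # z2 enters the error term, so it is invalid
print(f"Sargan statistic, valid instruments:   {stat_valid:.2f}")
print(f"Sargan statistic, one invalid instrument: {stat_invalid:.2f}")
```

With two instruments and one endogenous regressor there is one overidentifying restriction, so each statistic is compared with a chi-squared(1) distribution, whose 5% critical value is 3.84. The invalid-instrument case should produce a statistic far above that threshold.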
Endogeneity tests of one or more endogenous regressors can be implemented using the endog option.
Under the null hypothesis (H0) that the specified endogenous regressors can actually be treated as exogenous, the test statistic is distributed as chi-squared with degrees of freedom equal to the number of regressors tested.
Unlike the Durbin-Wu-Hausman tests reported by ivendog, the endog option of ivreg2 can report test statistics that are robust to various violations of conditional homoskedasticity.
If the p-value < 5%, H0 is rejected and the regressors tested should be treated as endogenous
The underidentification test is an LM test of whether the equation is identified, i.e., that the excluded instruments are "relevant", meaning correlated with the endogenous regressors.
The null hypothesis (H0) is that the equation is underidentified. A rejection of the null indicates that the matrix of reduced-form coefficients on the excluded instruments is full column rank, i.e., the model is identified.
If the p-value < 5%, H0 is rejected and the instruments are relevant
Dependent Variable: LWAGE = Log of wage
EXP =Work experience,
WKS =Weeks worked,
OCC =Occupation, 1 if blue collar,
IND =1 if manufacturing industry,
SOUTH =1 if resides in south,
SMSA =1 if resides in a city (SMSA),
LWAGE = β1 + β2 EXP + β3 EXPsq + β4 OCC + β5 SOUTH + β6 SMSA + β7 WKS + ε
Weeks worked (WKS) is believed to be endogenous
(1) Use 1 instrumental variable: the Marital Status dummy variable (MS)
(2) Use 2 instrumental variables: the Marital Status dummy variable (MS) and the dummy variable BLK
Please replicate for the second case!