MODEL SPECIFICATIONS
Dr Tu Thuy Anh, Faculty of International Economics
Model misspecification:
Omitting relevant variables
Including irrelevant variables
Function specification: Ramsey test
Structural change: the Chow test
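Consequences of Variable Misspecification
True model (simple): $Y = \beta_1 + \beta_2 X_2 + u$
True model (multiple): $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model (simple): $\hat{Y} = b_1 + b_2 X_2$
Fitted model (multiple): $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$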
To keep the analysis simple, we will assume that there are only two
possibilities. Either Y depends only on X2, or it depends on both X2 and X3.
OMISSION OF A RELEVANT VARIABLE
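If Y depends only on X2 and we fit the simple regression, we will not encounter any problems, provided that the regression model assumptions are valid.
Likewise we will not encounter any problems if Y depends on both X2 and X3 and we fit the multiple regression.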
Fitting the simple regression when the true model is simple, or the multiple regression when the true model is multiple: correct specification, no problems.
In this sequence we will examine the consequences of fitting a simple
regression when the true model is multiple. The omission of a relevant
explanatory variable causes the regression coefficients to be biased (in general) and the standard errors to be invalid.
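True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2$
$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2}$$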
The strength of the proxy effect depends on two factors: the strength of the
effect of X3 on Y, which is given by β3, and the ability of X2 to mimic X3.
The ability of X2 to mimic X3 is determined by the slope coefficient obtained
when X3 is regressed on X2, which is the ratio that multiplies β3 in the expression for E(b2).
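The proxy effect can be checked numerically. The sketch below is an illustration on simulated data, not part of the original slides: the sample size, the parameter values (beta1 = 1, beta2 = 2, beta3 = 3) and the 0.5 link between X2 and X3 are all assumptions chosen for the example. It fits the regression with and without X3 and compares the slope from the short regression with the slope from the long regression plus b3 times the slope of X3 on X2.

```python
import random

def csum(a, b):
    """Sum of cross-products of deviations from the sample means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

random.seed(0)
n = 500
beta1, beta2, beta3 = 1.0, 2.0, 3.0  # assumed true parameters (illustrative)

x2 = [random.gauss(0, 1) for _ in range(n)]
# X3 is correlated with X2, so X2 can partly mimic X3 (0.5 is an assumption)
x3 = [0.5 * v + random.gauss(0, 1) for v in x2]
y = [beta1 + beta2 * a + beta3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]

s22, s33, s23 = csum(x2, x2), csum(x3, x3), csum(x2, x3)
s2y, s3y = csum(x2, y), csum(x3, y)

# Slope when X3 is omitted (simple regression of Y on X2)
b2_short = s2y / s22

# Slopes when X3 is included (multiple regression of Y on X2 and X3)
det = s22 * s33 - s23 ** 2
b2_long = (s2y * s33 - s3y * s23) / det
b3_long = (s3y * s22 - s2y * s23) / det

# Slope from regressing X3 on X2: the ability of X2 to mimic X3
mimic = s23 / s22

print(f"b2 with X3 omitted:   {b2_short:.3f}")
print(f"b2 with X3 included:  {b2_long:.3f}")
print(f"b2_long + b3 * mimic: {b2_long + b3_long * mimic:.3f}")
```

Under these assumed values the slope with X3 omitted should come out roughly at 2 + 3 × 0.5 = 3.5 rather than near the true value of 2, and the first and third printed numbers agree because the decomposition holds exactly in any sample.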
If we estimate the restricted (misspecified) model
instead of the correct (unrestricted) one, then
the estimate of the parameter β2 will be biased,
and the bias will depend on:
The magnitude of the omitted parameter (β3)
The correlation between the included and the omitted variables (X2 and X3)
R2 is affected
will be affected more
Example: Teaching Ratings
Correlation Coefficients, using the observations 1 - 463
5% critical value (two-tailed) = 0,0911 for n = 463
minority age female onecredit beauty
Example: Teaching Ratings
Model 1: OLS, using observations 1-463
Dependent variable: course_eval
coefficient   std. error   t-ratio   p-value
Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     120,0996    S.E. of regression    0,513766
R-squared             0,155647    Adjusted R-squared    0,142657
F(7, 455)             11,98202    P-value(F)            4,67e-14
Log-likelihood       -344,5811    Akaike criterion      705,1623
Schwarz criterion     738,2641    Hannan-Quinn          718,1936
Example: Teaching Ratings
Model 3: OLS, using observations 1-463
Dependent variable: course_eval
coefficient   std. error   t-ratio   p-value
Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     126,6494    S.E. of regression    0,527010
R-squared             0,109599    Adjusted R-squared    0,097883
F(6, 456)             9,354823    P-value(F)            1,09e-09
Log-likelihood       -356,8740    Akaike criterion      727,7480
Schwarz criterion     756,7121    Hannan-Quinn          739,1504
INCLUSION OF AN IRRELEVANT VARIABLE
Including irrelevant variables: the effects are different from those of omitted variable misspecification. In this case the coefficients in general remain
unbiased, but they are inefficient. The standard errors remain valid (in general), but are needlessly large.
True model: $Y = \beta_1 + \beta_2 X_2 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$
True model rewritten: $Y = \beta_1 + \beta_2 X_2 + 0 \cdot X_3 + u$
Rewrite the true model adding X3 as an explanatory variable, with a
coefficient of 0. Now the true model and the fitted model coincide. Hence b2
will be an unbiased estimator of β2 and b3 will be an unbiased estimator of 0.
However, the variance of b2 will be larger than it would have been if the
correct simple regression had been run, because it includes the factor 1 / (1 – r²),
where r is the correlation between X2 and X3.
The standard errors remain valid, but they will tend to be larger than those obtained in a simple regression, reflecting the loss of efficiency.
$$\sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r_{X_2, X_3}^2}$$
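A small Monte Carlo experiment can illustrate the loss of efficiency. The sketch below is an assumption-laden illustration, not taken from the slides: the true model contains only X2 (beta1 = 1, beta2 = 2 are made-up values), X3 is an irrelevant variable built to be correlated with X2, and the regressors are held fixed across 2000 simulated samples. It estimates b2 with and without X3 and compares the ratio of the two sampling variances with the factor 1 / (1 – r²).

```python
import random
import statistics

def csum(a, b):
    """Sum of cross-products of deviations from the sample means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

random.seed(1)
n, reps = 100, 2000
beta1, beta2 = 1.0, 2.0  # assumed true parameters; the true coefficient of X3 is 0

x2 = [random.gauss(0, 1) for _ in range(n)]
# Irrelevant regressor, correlated with X2 by construction (illustrative values)
x3 = [0.8 * v + random.gauss(0, 0.6) for v in x2]

s22, s33, s23 = csum(x2, x2), csum(x3, x3), csum(x2, x3)
det = s22 * s33 - s23 ** 2
r2 = s23 ** 2 / (s22 * s33)  # squared sample correlation between X2 and X3

b2_simple, b2_multiple = [], []
for _ in range(reps):
    # Y depends on X2 only
    y = [beta1 + beta2 * v + random.gauss(0, 1) for v in x2]
    s2y, s3y = csum(x2, y), csum(x3, y)
    b2_simple.append(s2y / s22)                          # correct simple regression
    b2_multiple.append((s2y * s33 - s3y * s23) / det)    # X3 needlessly included

mean_simple = statistics.mean(b2_simple)
mean_multiple = statistics.mean(b2_multiple)
var_ratio = statistics.variance(b2_multiple) / statistics.variance(b2_simple)
inflation = 1 / (1 - r2)

print(f"mean b2, simple regression:   {mean_simple:.3f}")
print(f"mean b2, multiple regression: {mean_multiple:.3f}")
print(f"variance ratio (simulated):   {var_ratio:.2f}")
print(f"1 / (1 - r^2):                {inflation:.2f}")
```

Both averages should sit close to the assumed beta2, while the simulated variance ratio should be close to the inflation factor, which is the pattern the formula describes.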
Example: Teaching Ratings
Model 9: OLS, using observations 1-463
Dependent variable: course_eval
coefficient   std. error   t-ratio   p-value
Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     120,1052    S.E. of regression    0,513214
R-squared             0,155608    Adjusted R-squared    0,144497
F(6, 456)             14,00557    P-value(F)            1,19e-14
Log-likelihood       -344,5919    Akaike criterion      703,1838
Schwarz criterion     732,1479    Hannan-Quinn          714,5861
Example: Teaching Ratings
Model 1: OLS, using observations 1-463
Dependent variable: course_eval
coefficient   std. error   t-ratio   p-value
Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     120,0996    S.E. of regression    0,513766
R-squared             0,155647    Adjusted R-squared    0,142657
F(7, 455)             11,98202    P-value(F)            4,67e-14
Log-likelihood       -344,5811    Akaike criterion      705,1623
Schwarz criterion     738,2641    Hannan-Quinn          718,1936
Function specification
Why does the model have to be linear?
A badly specified functional form has consequences
as serious as the omitted variables problem: biased estimates.
Ramsey (1969) proposes a test with the
following hypotheses:
H0: The functional form is correct
Ha: The functional form is misspecified
Ramsey RESET test
Original model
We may suspect that some variables should
enter non-linearly (e.g. in quadratic or
cubic terms). To detect this, the RESET test
adds powers of the fitted values of the
dependent variable from the original
equation as regressors in an auxiliary regression.
Hypothesis H0: the coefficient on the added power term is 0, versus Ha: it is not 0
If more power terms are added, they can be tested jointly with an F test
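The procedure can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the computation behind the slides: the data are simulated with a quadratic relationship, the fitted model is linear, and both the squares and cubes of the fitted values are added, so the joint F test has 2 restrictions. It uses NumPy's `np.linalg.lstsq` for the two OLS fits; the helper `ols_ssr` is a name introduced here for the example.

```python
import numpy as np

def ols_ssr(X, y):
    """Return the sum of squared residuals and the fitted values from OLS of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    return float(resid @ resid), fitted

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0, 4, n)
# Assumed data-generating process: truly quadratic in x (illustrative values)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 1, n)

# Original model: y on a constant and x (linear functional form)
X0 = np.column_stack([np.ones(n), x])
ssr_restricted, fitted = ols_ssr(X0, y)

# Auxiliary regression: original regressors plus powers of the fitted values
X1 = np.column_stack([X0, fitted**2, fitted**3])
ssr_unrestricted, _ = ols_ssr(X1, y)

q = 2                      # number of added terms (restrictions under H0)
df = n - X1.shape[1]       # residual degrees of freedom of the auxiliary regression
F = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df)

print(f"RESET F({q}, {df}) = {F:.2f}")
```

The statistic is compared with the critical value of the F distribution with (q, df) degrees of freedom; a large value, as expected for this deliberately misspecified linear fit, leads to rejecting H0.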
Example: Teaching Ratings
RESET test for specification (squares and cubes)
Test statistic: F = 1,821350,
Structural change
What if the parameters are NOT constant over the
sample? This is structural change. Examples:
Male vs female
ASEAN vs non-ASEAN membership
HQ class vs regular class
If we do not control for structural changes, our
predictions will not be reliable, and our estimates will be inefficient, and biased for the
parameters we do not control for.
The Chow test
We assume that we may have two subsamples.
Compute the sum of squared residuals
for every model (SSRT for the whole sample, SSR1 and SSR2 for the two subsamples).
Finally we compute the test statistic from these three sums.
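The formula on the slide was not recoverable, so the sketch below assumes the common textbook form of the Chow statistic, F = [(SSRT - (SSR1 + SSR2)) / k] / [(SSR1 + SSR2) / (n - 2k)], where k is the number of parameters in each regression and n the total number of observations. The data are simulated with a deliberate break between the two subsamples; all numerical values and the helper `ssr_simple` are assumptions made for the illustration.

```python
import random

def ssr_simple(x, y):
    """Sum of squared residuals from OLS of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))

random.seed(2)
n1 = n2 = 100
xa = [random.uniform(0, 10) for _ in range(n1)]
xb = [random.uniform(0, 10) for _ in range(n2)]
# Assumed structural change: intercept and slope differ between subsamples
ya = [1.0 + 0.5 * v + random.gauss(0, 1) for v in xa]
yb = [3.0 + 1.5 * v + random.gauss(0, 1) for v in xb]

ssr_t = ssr_simple(xa + xb, ya + yb)  # pooled regression, whole sample
ssr_1 = ssr_simple(xa, ya)            # subsample 1
ssr_2 = ssr_simple(xb, yb)            # subsample 2

k = 2            # parameters per regression (constant and slope)
n = n1 + n2
chow_f = ((ssr_t - (ssr_1 + ssr_2)) / k) / ((ssr_1 + ssr_2) / (n - 2 * k))

print(f"Chow F({k}, {n - 2 * k}) = {chow_f:.2f}")
```

Under H0 (no structural change) the statistic follows an F distribution with (k, n - 2k) degrees of freedom, so a value well above the critical value, as expected with the break built into this example, points to parameters that differ across the subsamples.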
Example: Teaching Ratings
coefficient   std. error   t-ratio   p-value