Specification Error
When constructing any regression model, we are most interested in explaining which variables cause the dependent variable to change, and by how much. The choice of variables always depends on a combination of economic theory, basic human behavior, and past experience.
One of the assumptions of OLS is that the model is correctly specified. Specification error can arise in two ways:
a) Omitting relevant explanatory variables, or including irrelevant ones
b) Using an incorrect functional form
This lecture discusses which regressors should be included in, or excluded from, a particular model. In other words, we will consider the following cases:
a) A regression model that excludes some important explanatory variables
b) A regression model that includes some irrelevant regressors
1) Exclusion of relevant variables
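Suppose that we are interested in the following model:
\[ Y_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \beta_{K+1} X_{K+1,i} + \dots + \beta_{K+L} X_{K+L,i} + \varepsilon_i \]
The question is whether the set of L regressors $X_{K+1}, \dots, X_{K+L}$ are important variables that should be included in the model.
But suppose that, for some reason, we have to use the following model, which leaves those L regressors out:
\[ Y_i = \beta_1 + \beta_2 X_{2i} + \dots + \beta_K X_{Ki} + \varepsilon_i \]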
For illustration, we can use a model with only two explanatory variables, specified as follows:
True model: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$ (9.1)
Note: we assume that $X_2$ and $X_3$ are the two important regressors that explain the dependent variable Y; that is, we expect $\beta_3 \neq 0$. The model we actually estimate is:
Estimated model: $Y_i = \beta_1 + \beta_2 X_{2i} + \varepsilon_i$
This means we have excluded an important regressor, $X_{3i}$.
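Recall from Prof. Motahar's lecture how the coefficient of regressor $X_2$ is calculated in this model:
\[ \hat\beta_2 = \frac{\sum x_{2i} Y_i}{\sum x_{2i}^2} \]
where $x_{2i} = X_{2i} - \bar{X}_2$ is the deviation of $X_{2i}$ from its sample mean.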
Important consequences of excluding important explanatory variables
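a) The estimator is biased: $E(\hat\beta_2) \neq \beta_2$ in general, and $E(\hat\beta_2) = \beta_2$ if and only if COV($X_2$, $X_3$) = 0.
To calculate the mathematical expectation of this estimator, we substitute for $Y_i$ from the true model 9.1:
\[ E(\hat\beta_2) = E\left[ \frac{\sum x_{2i} (\beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i)}{\sum x_{2i}^2} \right] \]
\[ E(\hat\beta_2) = \beta_2 + \beta_3 \frac{\sum x_{2i} X_{3i}}{\sum x_{2i}^2}, \qquad \text{where } \sum x_{2i} X_{3i} = \sum x_{2i} x_{3i} \quad (9.5) \]
Expression 9.5 is easy to prove, and the numerator of its second term is proportional to the sample covariance COV($X_2$, $X_3$). Since $\beta_3 \neq 0$, the bias vanishes only when that covariance is zero.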
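The bias result can be checked numerically. The following is a minimal sketch rather than part of the original lecture: it holds the regressors fixed, redraws the error term many times, and compares the average short-regression slope with the value that the bias formula predicts. The sample size, coefficient values, error variance, and variable names are all illustrative assumptions.

```python
import numpy as np

# Illustrative assumptions: sample size, coefficients and error variance are made up.
rng = np.random.default_rng(0)
n, reps = 100, 5000
beta1, beta2, beta3 = 1.0, 2.0, -3.0

# Fixed regressors; X3 is constructed to be correlated with X2.
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)
xd2 = x2 - x2.mean()                      # deviations x_2i = X_2i - mean(X_2)

# Each row of Y is one sample generated from the true (two-regressor) model.
eps = rng.normal(size=(reps, n))
Y = beta1 + beta2 * x2 + beta3 * x3 + eps

# Slope of the regression that omits X3, for every sample:
# sum(x_2i * Y_i) / sum(x_2i^2)
slopes = (Y @ xd2) / np.sum(xd2 ** 2)

mean_slope = slopes.mean()
predicted = beta2 + beta3 * np.sum(xd2 * x3) / np.sum(xd2 ** 2)
print(f"average estimate: {mean_slope:.3f}")
print(f"beta2 + beta3 * sum(x2*X3)/sum(x2^2): {predicted:.3f}")
```

With correlated regressors the two printed numbers should be close to each other and clearly different from the true value of 2.0 used for `beta2`.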
b) $\hat\beta_2$ can no longer be interpreted as the direct (net) effect of $X_2$ on the dependent variable Y.
When a relevant variable is omitted, the estimated coefficient of the remaining explanatory variable does not measure its direct (net) effect on the dependent variable. We prove this as follows:
Recall from Prof. Motahar's lecture on simple regression that, for the model
\[ Y_i = \beta_1 + \beta_2 X_{2i} + \varepsilon_i \]
the slope estimator is
\[ \hat\beta_2 = \frac{\sum x_{2i} Y_i}{\sum x_{2i}^2} \]
So, if the simple regression is $X_{3i} = \beta_1 + \beta_{22} X_{2i} + \varepsilon_i$, the coefficient of $X_2$ is given by the same kind of expression, and its estimator is:
\[ \hat\beta_{22} = \frac{\sum x_{2i} X_{3i}}{\sum x_{2i}^2} \]
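This coefficient is the direct effect of $X_2$ on $X_3$.
Substituting the true model 9.1 for $Y_i$ in the estimator of the model that omits $X_3$ gives:
\[ \hat\beta_2 = \frac{\sum x_{2i} Y_i}{\sum x_{2i}^2} = \frac{\sum x_{2i} (\beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i)}{\sum x_{2i}^2} \]
\[ = \beta_1 \frac{\sum x_{2i}}{\sum x_{2i}^2} + \beta_2 \frac{\sum x_{2i} X_{2i}}{\sum x_{2i}^2} + \beta_3 \frac{\sum x_{2i} X_{3i}}{\sum x_{2i}^2} + \frac{\sum x_{2i} \varepsilon_i}{\sum x_{2i}^2} \]
Now notice that
\[ \sum_{i=1}^{n} x_{2i} = \sum_{i=1}^{n} (X_{2i} - \bar{X}_2) = 0 \]
and that $\sum_{i=1}^{n} x_{2i} X_{2i} = \sum_{i=1}^{n} x_{2i}^2$, so
\[ \frac{\sum_{i=1}^{n} x_{2i} X_{2i}}{\sum_{i=1}^{n} x_{2i}^2} = 1 \]
Thus,
\[ \hat\beta_2 = \beta_2 + \beta_3 \hat\beta_{22} + \frac{\sum x_{2i} \varepsilon_i}{\sum x_{2i}^2} \]
And we also have:
\[ \sum_{i=1}^{n} x_{2i} \varepsilon_i = n \, \mathrm{COV}(X_2, \varepsilon) \]
which is zero under the assumption that $X_2$ is uncorrelated with the error term. We therefore have:
\[ \hat\beta_2 = \beta_2 + \beta_3 \hat\beta_{22} \quad (9.10) \]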
Important meanings:
The gross effect of $X_2$ on Y (that is, $\hat\beta_2$ in the estimated model) equals the direct effect of $X_2$ on Y (that is, $\beta_2$ in the true model) plus the indirect effect of $X_2$ on Y through $X_3$ (that is, $\beta_3 \hat\beta_{22}$).
Thus, when $X_3$ is relevant but left out of the regression, the estimated coefficient $\hat\beta_2$ does not measure the direct (net) effect of $X_2$ on Y.
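The decomposition into direct and indirect effects has an exact in-sample counterpart: if the true coefficients are replaced by their OLS estimates from the full regression, the slope from the short regression equals the full-regression coefficient of X2 plus the full-regression coefficient of X3 times the auxiliary slope. The sketch below illustrates this on simulated data; the data-generating values and the helper name `ols` are assumptions made for the illustration, not part of the lecture.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for design matrix X (first column is the constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data; all numeric values here are illustrative assumptions.
rng = np.random.default_rng(1)
n = 300
x2 = rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x2, x3]), y)   # Y on X2 and X3
b_short = ols(np.column_stack([ones, x2]), y)      # Y on X2 only (X3 omitted)
b_aux = ols(np.column_stack([ones, x2]), x3)       # X3 on X2 (auxiliary regression)

gross = b_short[1]                  # coefficient of X2 when X3 is omitted
direct = b_long[1]                  # coefficient of X2 in the full regression
indirect = b_long[2] * b_aux[1]     # coefficient of X3 times the auxiliary slope

print(f"gross effect      : {gross:.4f}")
print(f"direct + indirect : {direct + indirect:.4f}")
```

The two printed values agree up to floating-point rounding, whatever the simulated data are.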
We can illustrate this graphically and with an example.
This regression shows that HOUSING is explained quite well by GNP and INT.RATE. If we temporarily assume that this is the true model, we can then regress HOUSING on GNP alone.
We can conclude that this model excludes an important explanatory variable, INT.RATE. (Observe how the coefficient of determination, the coefficient of GNP, and the standard error of the GNP estimator change.)
Next, run another regression: INT.RATE on GNP.
Based on these three regression results, let us now reconsider what we derived in 9.10.
c) The estimated variance of the coefficient estimator is biased, and thus hypothesis tests based on it are invalid.
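\[ \mathrm{VAR}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2} \quad (9.11) \]
but because $\beta_3 \neq 0$, and since we have assumed that $X_3$ is an important and relevant factor in explaining Y, then:
\[ \mathrm{VAR}(\hat\beta_2) = \frac{\sigma^2}{\sum x_{2i}^2 \, (1 - r_{23}^2)} \quad (9.12) \]
where $r_{23}$ is the sample correlation between $X_2$ and $X_3$. 9.11 is the variance in the estimated model and 9.12 is the variance when we assume $\beta_3 \neq 0$.
It is obvious that, whenever $r_{23} \neq 0$:
\[ \frac{\sigma^2}{\sum x_{2i}^2} \neq \frac{\sigma^2}{\sum x_{2i}^2 \, (1 - r_{23}^2)} \]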
Therefore, the standard error of the estimator $\hat\beta_2$ will be inaccurate (biased), and so will anything computed from it. As a result, any hypothesis test will be invalid. This is easy to see by looking at the regression results.
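To get a feel for the size of the gap between the two variance expressions, they can be evaluated side by side. This is a small sketch under assumed inputs: the function names and the numbers for the error variance, the sum of squared deviations, and the correlation are hypothetical.

```python
def var_simple(sigma2, sum_x2_sq):
    """Variance of the slope in the one-regressor model: sigma^2 / sum(x_2i^2)."""
    return sigma2 / sum_x2_sq

def var_two_regressor(sigma2, sum_x2_sq, r23):
    """Variance of the X2 coefficient in the two-regressor model:
    sigma^2 / (sum(x_2i^2) * (1 - r23^2))."""
    if not -1.0 < r23 < 1.0:
        raise ValueError("r23 must lie strictly between -1 and 1")
    return sigma2 / (sum_x2_sq * (1.0 - r23 ** 2))

# Hypothetical inputs chosen only for illustration.
sigma2, sum_x2_sq, r23 = 4.0, 100.0, 0.6
v_simple = var_simple(sigma2, sum_x2_sq)
v_two = var_two_regressor(sigma2, sum_x2_sq, r23)
ratio = v_two / v_simple            # equals 1 / (1 - r23^2)
print(v_simple, v_two, ratio)
```

The ratio depends only on the correlation between the two regressors, so the two expressions coincide only when that correlation is zero.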
To be cautious, we use the Wald test to compare the restricted model (the estimated model) with the unrestricted model (the true model), under the null hypothesis that $\beta_3 = 0$.
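A rough sketch of such a test is given below. It assumes the F form of the Wald test, which compares the residual sums of squares of the restricted and unrestricted regressions; whether the lecture uses exactly this form is an assumption. The helper names `rss` and `wald_f` and the simulated data are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def wald_f(X_restricted, X_unrestricted, y):
    """F statistic for the restrictions that turn the unrestricted model
    into the restricted one (assumed F form of the Wald test)."""
    n, k_u = X_unrestricted.shape
    q = k_u - X_restricted.shape[1]           # number of restrictions
    rss_r = rss(X_restricted, y)
    rss_u = rss(X_unrestricted, y)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k_u))

# Simulated data in which X3 really matters (beta3 = -1.5 is an assumption).
rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

ones = np.ones(n)
X_u = np.column_stack([ones, x2, x3])         # unrestricted: includes X3
X_r = np.column_stack([ones, x2])             # restricted: beta3 = 0 imposed
f_stat = wald_f(X_r, X_u, y)
print(f"F statistic for H0: beta3 = 0 -> {f_stat:.2f}")
```

The statistic would be compared with the critical value of the F distribution with q and n - k degrees of freedom, where k is the number of coefficients in the unrestricted model; a large value leads to rejecting the hypothesis that the omitted variable is irrelevant.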
2) Including irrelevant variables
To analyze this case, we return to the two-regressor model, only this time we assume that $X_3$ is unrelated to Y (that is, $\beta_3 = 0$). In other words, $X_3$ is irrelevant.
True model: $Y_i = \beta_1 + \beta_2 X_{2i} + \varepsilon_i$
Estimated model: $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$
The estimated model has the following properties:
a) The estimators of the other coefficients (all except that of $X_3$) are unbiased and consistent.
Again, we take the estimated coefficients and calculate their expectations:
\[ \hat\beta_2 = \frac{(\sum x_{2i} Y_i)(\sum x_{3i}^2) - (\sum x_{3i} Y_i)(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2} \]
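Then substitute the true model for $Y_i$ and do some manipulation:
\[ \hat\beta_2 = \beta_2 \, \frac{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2} + \frac{(\sum x_{2i} \varepsilon_i)(\sum x_{3i}^2) - (\sum x_{3i} \varepsilon_i)(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2} \quad (9.16) \]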
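Clearly, the first term is $\beta_2$ and the second term has zero expectation, so the estimator is unbiased.
Dividing the numerator and denominator of the second term of expression 9.16 by $n^2$, we find that it equals:
\[ \frac{(S_{\varepsilon x_2}/n)(S_{x_3 x_3}/n) - (S_{\varepsilon x_3}/n)(S_{x_2 x_3}/n)}{(S_{x_2 x_2}/n)(S_{x_3 x_3}/n) - (S_{x_2 x_3}/n)^2} \]
where $S_{ab} = \sum a_i b_i$. As n grows larger, $S_{\varepsilon x_2}/n$ and $S_{\varepsilon x_3}/n$ converge to COV($\varepsilon$, X) = 0, so the whole term converges to zero. Hence this estimator is consistent.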
Now consider the estimator of the coefficient of the variable that has been inappropriately included:
\[ \hat\beta_3 = \frac{(\sum x_{3i} Y_i)(\sum x_{2i}^2) - (\sum x_{2i} Y_i)(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2} \]
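Again, substitute the true model for $Y_i$ and do some manipulation. The terms in $\beta_2$ cancel, leaving:
\[ \hat\beta_3 = \frac{(\sum x_{3i} \varepsilon_i)(\sum x_{2i}^2) - (\sum x_{2i} \varepsilon_i)(\sum x_{2i} x_{3i})}{(\sum x_{2i}^2)(\sum x_{3i}^2) - (\sum x_{2i} x_{3i})^2} \]
The expectation of this estimator is zero, which is the true value of $\beta_3$.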
b) The variances of the estimators are higher than they would be without the irrelevant variables, so the estimators are inefficient: their variance is not the minimum attainable. See expression 9.14.
c) The estimated variances of the estimators are unbiased, so hypothesis testing is still valid.
In conclusion: when we include irrelevant variables, we still get unbiased estimators for all of the coefficients, but the cost is that their variances are larger than they would otherwise be.
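These properties can be illustrated with a small Monte Carlo experiment. The sketch below is an illustration under assumed values, not a result from the lecture: the regressors are held fixed, X3 is built to be correlated with X2 but plays no role in generating Y, and the error term is redrawn many times.

```python
import numpy as np

# Illustrative assumptions: sample size, number of replications, coefficients.
rng = np.random.default_rng(3)
n, reps = 100, 4000
beta1, beta2 = 1.0, 2.0

# Fixed regressors; X3 is correlated with X2 but does not enter the true model.
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + 0.6 * rng.normal(size=n)
ones = np.ones(n)
X_true = np.column_stack([ones, x2])          # correctly specified model
X_over = np.column_stack([ones, x2, x3])      # model with the irrelevant X3

# Each column of Y is one sample drawn from the true model.
eps = rng.normal(size=(n, reps))
Y = (beta1 + beta2 * x2)[:, None] + eps

# lstsq accepts a matrix right-hand side and fits every column at once.
B_true, *_ = np.linalg.lstsq(X_true, Y, rcond=None)   # shape (2, reps)
B_over, *_ = np.linalg.lstsq(X_over, Y, rcond=None)   # shape (3, reps)

mean_b2_over = B_over[1].mean()   # should be close to beta2 (unbiased)
mean_b3_over = B_over[2].mean()   # should be close to 0 (true beta3)
var_b2_true = B_true[1].var()
var_b2_over = B_over[1].var()     # larger than var_b2_true (inefficient)

print(f"mean of b2 with X3 included: {mean_b2_over:.3f}")
print(f"mean of b3:                  {mean_b3_over:.3f}")
print(f"variance of b2 without X3:   {var_b2_true:.5f}")
print(f"variance of b2 with X3:      {var_b2_over:.5f}")
```

The averages stay near the true values, while the spread of the X2 coefficient widens once the correlated but irrelevant regressor is added.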
As an example of including irrelevant variables in the equation, we can add two more regressors, population (POP) and unemployment (UNEMP), to the model.
Now examine the regression results, especially for the two new variables.
Since we assume that the two new variables are irrelevant, we apply the Wald test to them.
3) General-to-Simple Modeling Strategy
The results that we have just established suggest that the general-to-simple modeling strategy is superior to the simple-to-general strategy. The steps are as follows:
- Use economic theory, previous research, and experience to specify a general model (here, "general" means a model that includes all possibly relevant regressors).
- If any of the coefficients are statistically insignificant, omit the least significant one and re-estimate. Variables are eliminated one by one because eliminating a variable affects the remaining ones. If the first regression shows two insignificant variables and the less significant one is omitted, the significance of the remaining one may increase.
- Use the Wald test to compare the final model (the restricted model) against the initial general model (the unrestricted model).
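The elimination loop in the second step can be sketched in a few lines. This is a simplified illustration, not the lecture's procedure in full: it assumes a fixed critical value of 1.96 for the absolute t-statistic, never drops the constant, and leaves out the final Wald comparison against the general model. The function names, variable names, and simulated data are hypothetical.

```python
import numpy as np

def ols_t_stats(X, y):
    """Return OLS coefficients and their t-statistics."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - k)                  # estimated error variance
    cov = s2 * np.linalg.inv(X.T @ X)             # estimated covariance matrix
    return coef, coef / np.sqrt(np.diag(cov))

def general_to_simple(X, y, names, t_crit=1.96):
    """Drop the least significant regressor one at a time until every
    remaining regressor has |t| >= t_crit. Column 0 (the constant) is kept."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        _, t = ols_t_stats(X[:, keep], y)
        abs_t = np.abs(t[1:])                     # skip the constant
        j = int(np.argmin(abs_t))                 # least significant regressor
        if abs_t[j] >= t_crit:
            break                                 # everything left is significant
        del keep[j + 1]                           # re-estimate without it
    return [names[i] for i in keep]

# Simulated data: x2 and x3 matter, noise1 and noise2 do not (assumed setup).
rng = np.random.default_rng(4)
n = 300
x2, x3 = rng.normal(size=n), rng.normal(size=n)
noise1, noise2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3, noise1, noise2])
names = ["const", "x2", "x3", "noise1", "noise2"]
selected = general_to_simple(X, y, names)
print(selected)
```

Because each pass re-estimates the model after a single deletion, a variable that looked insignificant alongside a weaker one gets a chance to become significant before it is considered for removal.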
4) An Application of the Modeling Strategy