The linear regression model II — departures from the assumptions underlying the statistical GM
In the previous chapter we discussed the specification of the linear regression model as well as its statistical analysis based on the underlying eight standard assumptions. In the next three chapters several departures from [1]-[8] and their implications will be discussed. The discussion differs somewhat from the usual textbook treatment (see Judge et al. (1982)) because of the differences in emphasis in the specification of the model.
In Section 20.1 the implications of adopting E(y_t/σ(X_t)) instead of E(y_t/X_t = x_t) as the systematic component are discussed. Such a change gives rise to the stochastic regressors model which, as a statistical model, shares some features with the linear regression model, but the statistical inference related to the statistical parameters of interest θ is somewhat different. The statistical parameters of interest and their role in the context of the statistical GM is the subject of Section 20.2. In this section the so-called omitted variables bias problem is reinterpreted as a parameters of interest issue. In Section 20.3 the assumption of exogeneity is briefly considered. The cases where a priori exact linear and non-linear restrictions on θ exist are discussed in Section 20.4; estimation as well as testing when such information is available are considered. Section 20.5 considers the concept of the rank deficiency of X, known as collinearity, and its implications. The potentially more serious problem of 'near collinearity' is the subject of Section 20.6. Both problems of collinearity and near collinearity are interpreted as insufficient data information for the analysis of the parameters of interest. It is crucially important to emphasise at the outset that the discussion of the various departures from the assumptions underlying the statistical GM which follows assumes that the probability
and sampling models remain valid and unchanged. This assumption is needed because when the probability and/or the sampling model change, the whole statistical model requires respecification.
20.1 The stochastic linear regression model
The first assumption underlying the statistical GM is that the systematic component is defined in terms of the σ-field σ(X_t) generated by the random vector X_t = (X_1t, X_2t, ..., X_kt)'; by construction σ(X_t) ⊂ ℱ. The systematic component is defined to be

μ_t = E(y_t/σ(X_t)) = σ_12 Σ_22^{-1} X_t ≡ β'X_t  (20.6)

(see Chapter 15). Using (6) we can define the statistical GM:

y_t = β'X_t + u_t,  t ∈ T,  (20.7)
where the parameters of interest are θ = (β, σ²), β = Σ_22^{-1}σ_21, σ² = σ_11 − σ_12Σ_22^{-1}σ_21. The random vectors (X_1, X_2, ..., X_T)' ≡ 𝒳 are assumed to satisfy the rank condition rank(𝒳) = k for any observed value of 𝒳. The error term, defined by

u_t = y_t − E(y_t/σ(X_t)),

satisfies the following properties:

(i) E(u_t/σ(X_t)) = 0;
(ii) E(u_t μ_t/σ(X_t)) = 0;
(iii) E(u_t u_s) = σ² for t = s and 0 for t ≠ s;

these being direct consequences of the properties of the conditional expectation operator.
Given that X_t in the statistical GM is a random vector, intuition suggests that the probability model underlying (7) should come in the form of the joint distribution D(y_t, X_t; ψ). We need, however, a form of this distribution which involves the parameters of interest directly. Such a form is readily available using the equality
D(y_t, X_t; ψ) = D(y_t/X_t; ψ_1) · D(X_t; ψ_2),  (20.12)
with θ = (β, σ²) being a parametrisation of ψ_1. This suggests that the probability model underlying (7) should take the form
Φ = {D(y_t/X_t; θ) · D(X_t; ψ_2), θ = (β, σ²) ∈ ℝ^k × ℝ_+, t ∈ T},  (20.13)

where
(Z_1, Z_2, ..., Z_T)' is a random sample from D(Z_t; ψ), t = 1, 2, ..., T,
respectively, where as usual Z_t = (y_t, X_t')'.
If we collect all the above components together we can specify the stochastic linear regression model as follows:
The statistical GM: y_t = β'X_t + u_t, t ∈ T.
[1] μ_t = E(y_t/σ(X_t)) and u_t = y_t − E(y_t/σ(X_t)).
[2] θ = (β, σ²), β = Σ_22^{-1}σ_21 and σ² = σ_11 − σ_12Σ_22^{-1}σ_21 are the statistical parameters of interest.
The sampling model
[8] (Z_1, Z_2, ..., Z_T)' is a random sample from D(Z_t; ψ), t = 1, 2, ..., T.
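To make assumption [2] concrete, here is a minimal numerical sketch (the covariance values are hypothetical, chosen only for illustration) of the mapping from the joint covariance structure of Z_t to θ = (β, σ²):

```python
import numpy as np

# Hypothetical joint covariance of Z_t = (y_t, X_t')' with k = 2 regressors,
# partitioned as [[sigma_11, sigma_12], [sigma_21, Sigma_22]].
sigma_11 = 4.0                       # Var(y_t)
sigma_12 = np.array([1.5, 0.8])      # Cov(y_t, X_t)
Sigma_22 = np.array([[2.0, 0.5],
                     [0.5, 1.0]])    # Cov(X_t)

# Statistical parameters of interest theta = (beta, sigma^2) as in assumption [2]:
beta = np.linalg.solve(Sigma_22, sigma_12)   # beta = Sigma_22^{-1} sigma_21
sigma2 = sigma_11 - sigma_12 @ beta          # sigma^2 = sigma_11 - sigma_12 Sigma_22^{-1} sigma_21

print(beta)     # approx. [0.629, 0.486]
print(sigma2)   # approx. 2.669
```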
The assumption related to the weak exogeneity of X_t with respect to θ, for t = 1, 2, ..., T, shows clearly that the concept is related only to inference on θ and not to the distribution of the estimator of θ as such. As shown below, the distribution of the MLE of θ depends crucially on the marginal distribution of X_t. Hence, for prediction purposes this marginal distribution has a role to play, although for efficient estimation and testing on θ it is not needed. This shows clearly that the weak exogeneity concept is about statistical parameters of interest and not so much about distributions.
The probability and sampling models taken together imply that for y = (y_1, y_2, ..., y_T)' and 𝒳 = (X_1, X_2, ..., X_T)' the likelihood function is
L(θ; y, 𝒳) = ∏(t=1 to T) D(y_t/X_t; θ) · D(X_t; ψ_2).  (20.16)

The log likelihood takes the form

log L(θ; y, 𝒳) = const − (T/2) log σ² − (1/(2σ²))(y − 𝒳β)'(y − 𝒳β) + Σ(t=1 to T) log D(X_t; ψ_2),

and, given that the last term does not involve θ, it comes as no surprise to learn that maximisation with respect to θ yields

β* = (𝒳'𝒳)^{-1}𝒳'y,  (20.18)

σ*² = (1/T)(y − 𝒳β*)'(y − 𝒳β*) ≡ (1/T)u*'u*.  (20.19)
Taking expectations via the conditional expectation operator,

E(β*) = E[E(β*/σ(𝒳))] = β + E[(𝒳'𝒳)^{-1}𝒳' E(u/σ(𝒳))] = β.
This shows that β* is an unbiased estimator of β.
Using the same properties of the conditional expectation operator we can show that for the MLE σ*² of σ²:
E(σ*²) = E[E(σ*²/σ(𝒳))] = (1/T) E[E(u*'u*/σ(𝒳))]
  = (1/T) E[E(u'M_𝒳 u/σ(𝒳))],  where M_𝒳 = I_T − 𝒳(𝒳'𝒳)^{-1}𝒳',
  = (1/T) E[E(tr(M_𝒳 uu')/σ(𝒳))] = (1/T) E[tr(M_𝒳 E(uu'/σ(𝒳)))]
  = (σ²/T) E(tr M_𝒳) = (σ²/T) E[tr I_T − tr((𝒳'𝒳)^{-1}𝒳'𝒳)]
  = ((T − k)/T) σ²  for all observable values of 𝒳.  (20.24)

This implies that although σ*² is a biased estimator of σ², the estimator defined by

s² = (1/(T − k)) u*'u* = (T/(T − k)) σ*²

is unbiased.
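The factor (T − k)/T in (24) is easy to corroborate by simulation; in the sketch below (all numbers hypothetical) the regressors are redrawn in each replication, so the averages estimate expectations taken over the marginal distribution of 𝒳 as well:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, sigma2 = 20, 3, 1.0
beta = np.array([1.0, -0.5, 2.0])
reps, s2_mle, s2_unb = 20000, [], []

for _ in range(reps):
    X = rng.normal(size=(T, k))                # stochastic regressors, drawn afresh each time
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    y = X @ beta + u
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # MLE beta* = (X'X)^{-1} X'y
    rss = np.sum((y - X @ b) ** 2)
    s2_mle.append(rss / T)                     # sigma*^2, biased by factor (T - k)/T
    s2_unb.append(rss / (T - k))               # s^2, unbiased

print(np.mean(s2_mle), sigma2 * (T - k) / T)   # both close to 0.85
print(np.mean(s2_unb))                          # close to 1.0
```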
Trang 7Using the Lehmann—Scheffe theorem (see Chapter 12) we can show that
ty, He=lyy, 2X, Xy) is a minimal sufficient statistic and, as can be seen from (18) and (19), both estimators are functions of this statistic
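That both estimators depend on the data only through τ(y, 𝒳) can be seen directly; the sketch below (hypothetical data) computes β* and σ*² from (y'y, 𝒳'y, 𝒳'𝒳) alone:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 50, 2
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)

# The minimal sufficient statistic: (y'y, X'y, X'X).
yy, Xy, XX = y @ y, X.T @ y, X.T @ X

beta_star = np.linalg.solve(XX, Xy)    # beta* from X'X and X'y only
rss = yy - Xy @ beta_star              # u*'u* = y'y - y'X(X'X)^{-1}X'y
sigma2_star = rss / T                  # MLE of sigma^2

print(beta_star, sigma2_star)
```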
Although we were able to derive certain finite sample properties of the MLEs β* and σ*² without having their distribution, no testing or confidence regions are possible without it. For this reason we usually resort to asymptotic theory. Under the assumption
as a hybrid of the linear and stochastic linear regression models.
20.2 The statistical parameters of interest
The statistical parameters which define the statistical GM are said to be the statistical parameters of interest. In the case of the linear regression model these are β = Σ_22^{-1}σ_21 and σ² = σ_11 − σ_12Σ_22^{-1}σ_21. Estimation of these statistical parameters provides us with an estimated data generating mechanism assumed to have given rise to the observed data in question. The notion of the statistical parameters of interest is of paramount importance because the whole statistical analysis 'revolves' around these parameters. A cursory look at assumptions [1]-[8] defining the linear regression model reveals that all the assumptions are directly or indirectly related to the statistical parameters of interest θ. Assumption [1] defines the systematic and non-systematic components in terms of θ. The assumption of weak exogeneity [3] is defined relative to θ. Any a priori information is introduced into the statistical model via θ. Assumption [5], referring to the rank of X, is indirectly related to θ because the condition

rank(X'X) = k
is the sample equivalent to the condition

rank(Σ_22) = k,

required to ensure that Σ_22 is invertible and thus the statistical parameters of interest θ can be defined. Note that for T > k, rank(X) = rank(X'X). Assumptions [6] to [8] are directly related to θ in view of the fact that they are all defined in terms of D(y_t/X_t; θ).
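The claim rank(X) = rank(X'X) and the failure of the sample rank condition under exact collinearity are both easy to check numerically (hypothetical matrix):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 3.0, 4.0]])   # third column = first + second, so rank 2 < k = 3

print(np.linalg.matrix_rank(X))          # 2
print(np.linalg.matrix_rank(X.T @ X))    # 2, confirming rank(X) = rank(X'X)
```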
The statistical parameters of interest θ do not necessarily coincide with the theoretical parameters of interest, say ξ. The two sets of parameters, however, should be related in such a way as to ensure that ξ is uniquely defined in terms of θ; only then can the theoretical parameters of interest be given statistical meaning. In such a case ξ is said to be identifiable (see Chapter 25). Empirical econometric models represent reparametrised statistical GMs in terms of ξ; their statistical meaning is derived from θ and their theoretical meaning through ξ. As it stands, the statistical GM is specified so as to enable the modeller to test any such testable theoretical restrictions. That is, the statistical GM is not restricted to coincide with any theoretical model at the outset. Before any such restrictions are imposed we need to ensure that the estimated statistical GM is well defined statistically, i.e. that the underlying assumptions [1]-[8] are valid for the data in hand.
The statistical parametrisation θ depends crucially on the choice of Z_t and its underlying probabilistic structure as summarised in D(Z_t; ψ). Any changes in Z_t and/or D(Z_t; ψ) change θ as well as the statistical model in question. Hence, caution should be exercised in postulating arguments which depend on different parametrisations, especially when the parametrisations involved are not directly comparable. In order to illustrate this let us consider the so-called omitted variables bias problem. The textbook discussion of the omitted variables bias argument can be summarised as follows:
The true specification is

y = Xβ + Wγ + ε,  (20.33)

but the equation actually estimated is

y = Xβ + u.  (20.34)

Estimating β from (34) by OLS, β̂ = (X'X)^{-1}X'y, yields

E(β̂) = β + (X'X)^{-1}X'Wγ  and  E(s²) = σ² + (1/(T − k)) γ'W'M_X Wγ,

where M_X = I − X(X'X)^{-1}X'. That is, β̂ and s² suffer from omitted variables bias unless W'X = 0 and γ = 0, respectively; see Maddala (1977), Johnston (1984), Schmidt (1976), inter alia.
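The bias formula is easy to corroborate by simulation; in this sketch (all numbers hypothetical) the omitted w is correlated with x, so the regression of y on x alone centres on β + 0.6γ rather than β:

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 100, 5000
beta, gamma = 1.0, 0.8
estimates = []

for _ in range(reps):
    x = rng.normal(size=T)
    w = 0.6 * x + rng.normal(size=T)      # omitted variable correlated with x (W'X != 0)
    y = beta * x + gamma * w + rng.normal(size=T)
    estimates.append((x @ y) / (x @ x))   # OLS of y on x alone

# Centres on beta + gamma*Cov(x, w)/Var(x) = 1 + 0.8*0.6 = 1.48, not on beta = 1.0.
print(np.mean(estimates))
```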
From the textbook specification approach viewpoint, where the statistical model is derived by attaching an error term to the theoretical model, it is impossible to question the validity of the above argument. On the other hand, looking at it from the specification viewpoint proposed in Chapter 19 we can see a number of serious weaknesses in the argument. The most obvious weakness of the argument is that it depends on two statistical models with different parametrisations. In particular, β in (33) and (34) refers to very different parameters. If we denote the coefficient of X_t in (34) by β = Σ_22^{-1}σ_21, the corresponding coefficient in (33) takes the form

β_0 = Σ_22.3^{-1}σ_21 − Σ_22.3^{-1}Σ_23Σ_33^{-1}σ_31,

where Σ_22.3 = (Σ_22 − Σ_23Σ_33^{-1}Σ_32), Σ_33 = Cov(W_t), Σ_23 = Cov(X_t, W_t), σ_31 = Cov(W_t, y_t) (see Chapter 15). Moreover, the probability models underlying (34) and (33) are D(y_t/X_t; θ_1) and D(y_t/X_t, W_t; θ_2), respectively. Once this is realised we can see that the omitted variables bias (39) should be written in terms of the two underlying parametrisations, each based on different sample
information, ℱ_0 = σ(y_t, X_t, W_t, t = 1, 2, ..., T) and ℱ = σ(y_t, X_t, t = 1, 2, ..., T), respectively. This, however, does not imply that the omitted variables argument is useless; quite the opposite. In cases where the sample information is the same (ℱ_0 = ℱ) the argument can be very useful in deriving misspecification tests (see Chapters 21 and 22). For further discussion of this issue see Spanos (1985b).
The above argument illustrates the dangers of not specifying explicitly the underlying probability model and the statistical parameters of interest. By changing the underlying probability distribution and the parametrisation, the results on bias disappear. The two parametrisations are only comparable when they are both derivable from the joint distribution D(Z_1, ..., Z_T; ψ) using alternative 'reduction' arguments.
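To see that β and β_0 are two well-defined, different parameters of the same joint distribution, rather than a 'biased' and a 'true' value, one can compute both from a single joint covariance structure (all values hypothetical):

```python
# Hypothetical joint covariances of (y_t, X_t, W_t), scalar X_t and W_t for simplicity.
s11, s12, s13 = 4.0, 1.2, 1.0    # Var(y), Cov(y, X), Cov(y, W)
s22, s23, s33 = 2.0, 0.7, 1.5    # Var(X), Cov(X, W), Var(W)

beta = s12 / s22                             # coefficient of X_t conditioning on X_t alone
s22_3 = s22 - s23 * s23 / s33                # Sigma_{22.3}
beta_0 = (s12 - s23 * s13 / s33) / s22_3     # coefficient of X_t conditioning on (X_t, W_t)

print(beta, beta_0)   # approx. 0.600 and 0.438: different, equally well-defined parameters
```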
20.3 Weak exogeneity
When the random vector X_t is assumed to be weakly exogenous in the context of the linear regression model, this amounts to postulating that the stochastic structure of X_t, as specified by its marginal distribution D(X_t; ψ_2), is not relevant as far as the statistical inference on the parameters of interest θ = (β, σ²) is concerned. That is, although at the outset we postulate D(y_t, X_t; ψ), as far as the parameters of interest are concerned D(y_t/X_t; ψ_1) suffices; note that the decomposition

D(y_t, X_t; ψ) = D(y_t/X_t; ψ_1) · D(X_t; ψ_2)
is true for any joint distribution (see Chapter 5). If we want to test the exogeneity assumption we need to specify D(X_t; ψ_2) and consider it in relation to D(y_t/X_t; ψ_1) (see Wu (1973), Engle (1984), inter alia). These exogeneity tests usually test certain implications of the exogeneity assumption, and this can present various problems. The implications of exogeneity tested depend crucially on the other assumptions of the model as well as the appropriate specification of the statistical GM giving rise to x_t, t = 1, 2, ..., T; see Engle et al. (1983).
Exogeneity in this context will be treated as a non-directly testable assumption and no exogeneity tests will be considered. It will be argued in Chapter 21 that exogeneity assumptions can be tested indirectly by testing the assumptions [6]-[8]. The argument in a nutshell is that when inappropriate marginalisation and conditioning are used in defining the parameters of interest, the assumptions [6]-[8] are unlikely to be valid (see Engle et al. (1983), Richard (1982)). For example, a way to 'test' the weak exogeneity assumption indirectly is to test for departures from the normality of D(y_t, X_t; ψ) using the implied normality of D(y_t/X_t; θ) and the homoskedasticity of Var(y_t/X_t = x_t). For instance, in the case where D(y_t, X_t; ψ) is multivariate Student's t, the parameters ψ_1 and ψ_2 above are no longer variation free (see Section 21.4). Testing for departures from normality in the directions implied by D(y_t, X_t; ψ) being multivariate t can be viewed as an indirect test of the variation-free assumption underlying weak exogeneity.
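As an illustration of the direction in which such indirect tests look, the following sketch (hypothetical parameters) simulates a bivariate Student's t via the standard normal/chi-squared mixture; the conditional spread of y_t given X_t = x_t then varies with x_t, a departure from the homoskedasticity implied by joint normality:

```python
import numpy as np

rng = np.random.default_rng(3)
nu, n = 5, 200_000
# Bivariate Student's t with nu degrees of freedom via the normal/chi-squared mixture.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
s = np.sqrt(nu / rng.chisquare(nu, size=n))
y, x = z[:, 0] * s, z[:, 1] * s

# Under joint normality Var(y/x) would not depend on x; under the t it grows with |x|.
print(y[np.abs(x) < 0.5].var())   # smaller conditional spread near x = 0
print(y[np.abs(x) > 2.0].var())   # noticeably larger in the tails of x
```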
20.4 Restrictions on the statistical parameters of interest θ
The statistical inference results on the linear regression model derived in Chapter 19 are based on the assumption that no a priori information on θ = (β, σ²) is available. Such a priori information, when available, can take various forms, such as linear, non-linear, exact, inexact or stochastic. In this section only exact a priori information on β and its implications will be considered; a priori information on σ² is rather scarce.
(1) Linear a priori restrictions on β
Let us assume that a priori information in the form of m linear restrictions

Rβ = r  (20.47)
is also available at the outset, where R and r are m × k and m × 1 known matrices, rank(R) = m. Such restrictions imply that the parameter space where β takes values is no longer ℝ^k but some subset of it as determined by (47). These restrictions represent information relevant for the statistical analysis of the linear regression model and can be taken into consideration.
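For example (a hypothetical illustration), with k = 4 the two restrictions β_2 + β_3 = 1 and β_4 = 0 take the form (47) with m = 2,

R = [0 1 1 0; 0 0 0 1],  r = (1, 0)'.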
In the estimation of θ these restrictions can be taken into consideration by extending the log likelihood function to include them. This is achieved by defining the Lagrangian function

l(β, σ², μ; y, X) = const − (T/2) log σ² − (1/(2σ²))(y − Xβ)'(y − Xβ) − μ'(Rβ − r),  (20.48)

where μ represents an m × 1 vector of Lagrange multipliers. Optimisation of (48) with respect to β, σ² and μ gives rise to the first-order conditions:

∂l/∂β = (1/σ²) X'(y − Xβ) − R'μ = 0,
∂l/∂σ² = −(T/(2σ²)) + (1/(2σ⁴))(y − Xβ)'(y − Xβ) = 0,
∂l/∂μ = −(Rβ − r) = 0.

Solving these yields the constrained MLEs β̃ and σ̃² together with the estimated Lagrange multipliers μ̂.
Using (56) we can deduce that:
(i) When Rβ = r, E(β̃) = β and E(μ̂) = 0, i.e. β̃ and μ̂ are unbiased estimators of β and 0, respectively.
(ii) β̃ and μ̂ are fully efficient estimators of β and μ since their variances achieve the Cramér–Rao lower bounds, as can be verified directly using the extended information matrix (see exercises 1 and 2).
(iii) [Cov(β̃) − Cov(β̂)] ≤ 0, i.e. the covariance of the constrained MLE β̃ is always less than or equal to that of the unconstrained MLE β̂, irrespective of whether Rβ = r holds or not; but [MSE(β̃) − MSE(β̂)] ≥ 0, where MSE stands for mean square error (see Chapter 12).
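A sketch of constrained estimation (hypothetical data) using the well-known closed-form solution of the first-order conditions, β̃ = β̂ + (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(r − Rβ̂), with the hypothetical restrictions from the example above:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 100, 4
X = rng.normal(size=(T, k))
y = X @ np.array([0.3, 0.6, 0.4, 0.0]) + rng.normal(size=T)

R = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])   # beta_2 + beta_3 = 1 and beta_4 = 0
r = np.array([1.0, 0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y              # unconstrained MLE beta-hat
# Constrained MLE via the restricted least squares formula:
b_tilde = b_hat + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b_hat)

print(R @ b_tilde)   # equals r exactly, by construction
```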
The constrained MLE of σ² can be written in the form

σ̃² = (1/T)ũ'ũ = (1/T){û'û + (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r)},  (20.58)

where ũ = y − Xβ̃ are the constrained residuals.
But for s̃² = [1/(T + m − k)]ũ'ũ, E(s̃²) = σ² when Rβ = r, since then E(û'û) = (T − k)σ² and E[(Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r)] = mσ².
The F-test revisited
In Section 19.5 above we derived the F-test based on the test statistic

τ(y) = (1/(m s²)) (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r)  (20.64)
for the null hypothesis

H_0: Rβ = r  against  H_1: Rβ ≠ r,
using the intuitive argument that when H_0 is valid ‖Rβ̂ − r‖ must be close to zero. We can derive the same test using various other intuitive arguments similar to this, in relation to quantities like ‖β̂ − β̃‖ and ‖μ̂‖ being close to zero when H_0 is valid (see question 5). A more formal derivation of the F-test can be based on the likelihood ratio test procedure (see Chapter 14). The above null and alternative hypotheses in the language of Chapter 14 can be written as
H_0: θ ∈ Θ_0,  H_1: θ ∈ Θ_1 = Θ − Θ_0,

where

θ = (β, σ²),  Θ = {(β, σ²): β ∈ ℝ^k, σ² ∈ ℝ_+},
Θ_0 = {(β, σ²): β ∈ ℝ^k, Rβ = r, σ² ∈ ℝ_+}.
The likelihood ratio takes the form

λ(y) = max(θ∈Θ_0) L(θ; y) / max(θ∈Θ) L(θ; y) = L(θ̃; y)/L(θ̂; y)
     = [(2πσ̃²)^{-T/2} e^{-T/2}] / [(2πσ̂²)^{-T/2} e^{-T/2}] = (σ̃²/σ̂²)^{-T/2}.

The problem we have to face at this stage is to determine the distribution of λ(y) or some monotonic function of it. Using (58) we can write λ(y) in the form

λ(y) = {1 + (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r) / û'û}^{-T/2}.  (20.66)
Looking at (66) we can see that it is directly related to (64), whose distribution we know. Hence, λ(y) can be transformed into the F-test using the monotonic transformation

τ(y) = ((T − k)/m) [λ(y)^{-2/T} − 1].
This transformation provides us with an alternative way to calculate the value of the test statistic τ(y) using the estimates of the restricted and unrestricted MLEs of σ². An even simpler operational form of τ(y) can be specified using the equality (58). From this equality we can deduce that

τ(y) = ((RRSS − URSS)/URSS) ((T − k)/m),  (20.70)
where RRSS and URSS stand for the restricted and unrestricted residual sums of squares, respectively. This is a more convenient form because most computer packages report the RSS; instead of going through the calculations needed for (64), we estimate the regression equation with and without the restrictions and use the RSS in the two cases to calculate τ(y) as in (70).
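In practice (70) is a few lines of computation; the sketch below (hypothetical data; numpy and scipy assumed available) estimates with and without the restrictions and forms τ(y) together with its F(m, T − k) tail probability:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(4)
T, k, m = 100, 4, 2
X = rng.normal(size=(T, k))
y = X @ np.array([0.3, 0.6, 0.4, 0.0]) + rng.normal(size=T)
R = np.array([[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
r = np.array([1.0, 0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
b_tilde = b_hat + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b_hat)

URSS = np.sum((y - X @ b_hat) ** 2)    # unrestricted residual sum of squares
RRSS = np.sum((y - X @ b_tilde) ** 2)  # restricted residual sum of squares

tau = ((RRSS - URSS) / URSS) * ((T - k) / m)   # operational form (70)
print(tau, f.sf(tau, m, T - k))                # statistic and its F(m, T-k) tail area
```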
Example
Let us return to the money equation estimated in Chapter 19:

m_t = 2.896 + 0.690 y_t + 0.865 p_t − 0.055 i_t + û_t,  (20.71)
     (1.034)  (0.105)    (0.020)    (0.013)   (0.04)

R² = 0.995, R̄² = 0.995, s = 0.0393,
log L = 147.4, RSS = 0.11752, T = 80.
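As a consistency check, note that the Gaussian log likelihood evaluated at the MLEs depends on the data only through RSS and T, namely log L = −(T/2)[log 2π + log(RSS/T) + 1]; a two-line computation confirms the reported values (assuming k = 4 estimated coefficients):

```python
import math

T, RSS = 80, 0.11752
logL = -(T / 2) * (math.log(2 * math.pi) + math.log(RSS / T) + 1)
print(round(logL, 1))                       # 147.4, agreeing with the reported log L
print(round(math.sqrt(RSS / (T - 4)), 4))   # 0.0393 = s, with k = 4
```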