The linear regression model II — departures from the assumptions underlying the statistical GM
In the previous chapter we discussed the specification of the linear regression model as well as its statistical analysis based on the underlying eight standard assumptions. In the next three chapters several departures from [1]-[8] and their implications will be discussed. The discussion differs somewhat from the usual textbook treatment (see Judge et al. (1982)) because of the differences in emphasis in the specification of the model.
In Section 20.1 the implications of adopting E(y_t/σ(X_t)) instead of E(y_t/X_t = x_t) as the systematic component are discussed. Such a change gives rise to the stochastic regressors model which, as a statistical model, shares some features with the linear regression model, but the statistical inference related to the statistical parameters of interest θ is somewhat different. The statistical parameters of interest and their role in the context of the statistical GM is the subject of Section 20.2. In this section the so-called omitted variables bias problem is reinterpreted as a parameters of interest issue. In Section 20.3 the assumption of exogeneity is briefly considered. The cases where a priori exact linear and non-linear restrictions on θ exist are discussed in Section 20.4; estimation as well as testing when such information is available are considered. Section 20.5 considers the concept of the rank deficiency of X, known as collinearity, and its implications. The potentially more serious problem of 'near collinearity' is the subject of Section 20.6. Both problems of collinearity and near collinearity are interpreted as insufficient data information for the analysis of the parameters of interest. It is crucially important to emphasise at the outset that the discussion of the various departures from the assumptions underlying the statistical GM which follows assumes that the probability
and sampling models remain valid and unchanged. This assumption is needed because when the probability and/or the sampling model change, the whole statistical model requires respecification.
20.1 The stochastic linear regression model
The first assumption underlying the statistical GM is that the systematic component is defined in terms of the σ-field σ(X_t) generated by the random vector X_t = (X_1t, X_2t, ..., X_kt)'; by construction σ(X_t) ⊂ ℱ. The systematic component is defined to be

μ_t = E(y_t/σ(X_t)) = σ_12 Σ_22^{-1} X_t ≡ β'X_t  (20.6)

(see Chapter 15). Using (6) we can define the statistical GM:

y_t = β'X_t + u_t,  t ∈ T,  (20.7)
where the parameters of interest are θ = (β, σ²), β = Σ_22^{-1}σ_21, σ² = σ_11 − σ_12Σ_22^{-1}σ_21. The random vectors (X_1, X_2, ..., X_T)' ≡ 𝒳 are assumed to satisfy the rank condition rank(𝒳) = k for any observed value of 𝒳. The error term, defined by

u_t = y_t − E(y_t/σ(X_t)),

satisfies the following properties:

(i) E(u_t/σ(X_t)) = 0;
(ii) E(u_t μ_t/σ(X_t)) = 0;
(iii) E(u_t u_s) = σ² for t = s and 0 for t ≠ s;

these being direct consequences of the properties of the conditional expectation operator.
Given that X_t in the statistical GM is a random vector, intuition suggests that the probability model underlying (7) should come in the form of the joint distribution D(y_t, X_t; ψ). We need, however, a form of this distribution which involves the parameters of interest directly. Such a form is readily available using the equality
D(y_t, X_t; ψ) = D(y_t/X_t; ψ_1) · D(X_t; ψ_2),  (20.12)
with θ = (β, σ²) being a parametrisation of ψ_1. This suggests that the probability model underlying (7) should take the form
Φ = {D(y_t/X_t; θ) · D(X_t; ψ_2), θ = (β, σ²) ∈ ℝ^k × ℝ_+, t ∈ T},  (20.13)

where
(Z_1, Z_2, ..., Z_T)' is a random sample from D(Z_t; ψ), t = 1, 2, ..., T,
respectively, where as usual Z_t = (y_t, X_t')'.
If we collect all the above components together we can specify the stochastic linear regression model as follows:
The statistical GM: y_t = β'X_t + u_t, t ∈ T.
[1] μ_t = E(y_t/σ(X_t)) and u_t = y_t − E(y_t/σ(X_t)).
[2] θ = (β, σ²), β = Σ_22^{-1}σ_21 and σ² = σ_11 − σ_12Σ_22^{-1}σ_21 are the statistical parameters of interest.
The sampling model
[8] (Z_1, Z_2, ..., Z_T)' is a random sample from D(Z_t; ψ), t = 1, 2, ..., T.
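To make assumption [2] concrete, here is a minimal numerical sketch (the covariance values are hypothetical, chosen only for illustration) of the mapping from the joint covariance structure of Z_t to θ = (β, σ²):

```python
import numpy as np

# Hypothetical joint covariance of Z_t = (y_t, X_t')' with k = 2 regressors,
# partitioned as [[sigma_11, sigma_12], [sigma_21, Sigma_22]].
sigma_11 = 4.0                       # Var(y_t)
sigma_12 = np.array([1.5, 0.8])      # Cov(y_t, X_t)
Sigma_22 = np.array([[2.0, 0.5],
                     [0.5, 1.0]])    # Cov(X_t)

# Statistical parameters of interest theta = (beta, sigma^2) as in assumption [2]:
beta = np.linalg.solve(Sigma_22, sigma_12)   # beta = Sigma_22^{-1} sigma_21
sigma2 = sigma_11 - sigma_12 @ beta          # sigma^2 = sigma_11 - sigma_12 Sigma_22^{-1} sigma_21

print(beta)     # approx. [0.629, 0.486]
print(sigma2)   # approx. 2.669
```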
The assumption related to the weak exogeneity of X_t with respect to θ, for t = 1, 2, ..., T, shows clearly that the concept is related only to inference on θ and not to the distribution of the estimator of θ as such. As shown below, the distribution of the MLE of θ depends crucially on the marginal distribution of X_t. Hence, for prediction purposes this marginal distribution has a role to play, although for efficient estimation and testing on θ it is not needed. This shows clearly that the weak exogeneity concept is about statistical parameters of interest and not so much about distributions.
The probability and sampling models taken together imply that for y = (y_1, y_2, ..., y_T)' and 𝒳 = (X_1, X_2, ..., X_T)' the likelihood function is
L(θ; y, 𝒳) = ∏(t=1 to T) D(y_t/X_t; θ) · D(X_t; ψ_2).  (20.16)

The log likelihood takes the form

log L(θ; y, 𝒳) = const − (T/2) log σ² − (1/(2σ²))(y − 𝒳β)'(y − 𝒳β) + Σ(t=1 to T) log D(X_t; ψ_2),

and, given that the last term does not involve θ, it comes as no surprise to learn that maximisation with respect to θ yields

β* = (𝒳'𝒳)^{-1}𝒳'y,  (20.18)

σ*² = (1/T)(y − 𝒳β*)'(y − 𝒳β*) ≡ (1/T)u*'u*.  (20.19)
Taking expectations via the conditional expectation operator,

E(β*) = E[E(β*/σ(𝒳))] = β + E[(𝒳'𝒳)^{-1}𝒳' E(u/σ(𝒳))] = β.
This shows that β* is an unbiased estimator of β.
Using the same properties of the conditional expectation operator we can show that for the MLE σ*² of σ²:
E(σ*²) = E[E(σ*²/σ(𝒳))] = (1/T) E[E(u*'u*/σ(𝒳))]
  = (1/T) E[E(u'M_𝒳 u/σ(𝒳))],  where M_𝒳 = I_T − 𝒳(𝒳'𝒳)^{-1}𝒳',
  = (1/T) E[E(tr(M_𝒳 uu')/σ(𝒳))] = (1/T) E[tr(M_𝒳 E(uu'/σ(𝒳)))]
  = (σ²/T) E(tr M_𝒳) = (σ²/T) E[tr I_T − tr((𝒳'𝒳)^{-1}𝒳'𝒳)]
  = ((T − k)/T) σ²  for all observable values of 𝒳.  (20.24)

This implies that although σ*² is a biased estimator of σ², the estimator defined by

s² = (1/(T − k)) u*'u* = (T/(T − k)) σ*²

is unbiased.
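The factor (T − k)/T in (24) is easy to corroborate by simulation; in the sketch below (all numbers hypothetical) the regressors are redrawn in each replication, so the averages estimate expectations taken over the marginal distribution of 𝒳 as well:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k, sigma2 = 20, 3, 1.0
beta = np.array([1.0, -0.5, 2.0])
reps, s2_mle, s2_unb = 20000, [], []

for _ in range(reps):
    X = rng.normal(size=(T, k))                # stochastic regressors, drawn afresh each time
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    y = X @ beta + u
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # MLE beta* = (X'X)^{-1} X'y
    rss = np.sum((y - X @ b) ** 2)
    s2_mle.append(rss / T)                     # sigma*^2, biased by factor (T - k)/T
    s2_unb.append(rss / (T - k))               # s^2, unbiased

print(np.mean(s2_mle), sigma2 * (T - k) / T)   # both close to 0.85
print(np.mean(s2_unb))                          # close to 1.0
```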
Trang 7Using the Lehmann—Scheffe theorem (see Chapter 12) we can show that
ty, He=lyy, 2X, Xy) is a minimal sufficient statistic and, as can be seen from (18) and (19), both estimators are functions of this statistic
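That both estimators depend on the data only through τ(y, 𝒳) can be seen directly; the sketch below (hypothetical data) computes β* and σ*² from (y'y, 𝒳'y, 𝒳'𝒳) alone:

```python
import numpy as np

rng = np.random.default_rng(1)
T, k = 50, 2
X = rng.normal(size=(T, k))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=T)

# The minimal sufficient statistic: (y'y, X'y, X'X).
yy, Xy, XX = y @ y, X.T @ y, X.T @ X

beta_star = np.linalg.solve(XX, Xy)    # beta* from X'X and X'y only
rss = yy - Xy @ beta_star              # u*'u* = y'y - y'X(X'X)^{-1}X'y
sigma2_star = rss / T                  # MLE of sigma^2

print(beta_star, sigma2_star)
```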
Although we were able to derive certain finite sample properties of the MLEs β* and σ*² without having their distribution, no testing or confidence regions are possible without it. For this reason we usually resort to asymptotic theory. Under the assumption
as a hybrid of the linear and stochastic linear regression models.
20.2 The statistical parameters of interest
The statistical parameters which define the statistical GM are said to be the statistical parameters of interest. In the case of the linear regression model these are β = Σ_22^{-1}σ_21 and σ² = σ_11 − σ_12Σ_22^{-1}σ_21. Estimation of these statistical parameters provides us with an estimated data generating mechanism assumed to have given rise to the observed data in question. The notion of the statistical parameters of interest is of paramount importance because the whole statistical analysis 'revolves' around these parameters. A cursory look at assumptions [1]-[8] defining the linear regression model reveals that all the assumptions are directly or indirectly related to the statistical parameters of interest θ. Assumption [1] defines the systematic and non-systematic components in terms of θ. The assumption of weak exogeneity [3] is defined relative to θ. Any a priori information is introduced into the statistical model via θ. Assumption [5], referring to the rank of X, is indirectly related to θ because the condition

rank(X'X) = k
is the sample equivalent to the condition

rank(Σ_22) = k,

required to ensure that Σ_22 is invertible and thus the statistical parameters of interest θ can be defined. Note that for T > k, rank(X) = rank(X'X). Assumptions [6] to [8] are directly related to θ in view of the fact that they are all defined in terms of D(y_t/X_t; θ).
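The claim rank(X) = rank(X'X) and the failure of the sample rank condition under exact collinearity are both easy to check numerically (hypothetical matrix):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 3.0, 4.0]])   # third column = first + second, so rank 2 < k = 3

print(np.linalg.matrix_rank(X))          # 2
print(np.linalg.matrix_rank(X.T @ X))    # 2, confirming rank(X) = rank(X'X)
```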
The statistical parameters of interest θ do not necessarily coincide with the theoretical parameters of interest, say ξ. The two sets of parameters, however, should be related in such a way as to ensure that ξ is uniquely defined in terms of θ; only then can the theoretical parameters of interest be given statistical meaning. In such a case ξ is said to be identifiable (see Chapter 25). Empirical econometric models represent reparametrised statistical GMs in terms of ξ; their statistical meaning is derived from θ and their theoretical meaning through ξ. As it stands, the statistical GM is specified so as to enable the modeller to test any such testable theoretical restrictions. That is, the statistical GM is not restricted to coincide with any theoretical model at the outset. Before any such restrictions are imposed we need to ensure that the estimated statistical GM is well defined statistically, i.e. that the underlying assumptions [1]-[8] are valid for the data in hand.
The statistical parametrisation θ depends crucially on the choice of Z_t and its underlying probabilistic structure as summarised in D(Z_t; ψ). Any changes in Z_t and/or D(Z_t; ψ) change θ as well as the statistical model in question. Hence, caution should be exercised in postulating arguments which depend on different parametrisations, especially when the parametrisations involved are not directly comparable. In order to illustrate this let us consider the so-called omitted variables bias problem. The textbook discussion of the omitted variables bias argument can be summarised as follows:
The true specification is

y = Xβ + Wγ + ε,  (20.33)

but the equation actually estimated is

y = Xβ + u.  (20.34)

Estimating β from (34) by OLS, β̂ = (X'X)^{-1}X'y, yields

E(β̂) = β + (X'X)^{-1}X'Wγ  and  E(s²) = σ² + (1/(T − k)) γ'W'M_X Wγ,

where M_X = I − X(X'X)^{-1}X'. That is, β̂ and s² suffer from omitted variables bias unless W'X = 0 and γ = 0, respectively; see Maddala (1977), Johnston (1984), Schmidt (1976), inter alia.
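The bias formula is easy to corroborate by simulation; in this sketch (all numbers hypothetical) the omitted w is correlated with x, so the regression of y on x alone centres on β + 0.6γ rather than β:

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 100, 5000
beta, gamma = 1.0, 0.8
estimates = []

for _ in range(reps):
    x = rng.normal(size=T)
    w = 0.6 * x + rng.normal(size=T)      # omitted variable correlated with x (W'X != 0)
    y = beta * x + gamma * w + rng.normal(size=T)
    estimates.append((x @ y) / (x @ x))   # OLS of y on x alone

# Centres on beta + gamma*Cov(x, w)/Var(x) = 1 + 0.8*0.6 = 1.48, not on beta = 1.0.
print(np.mean(estimates))
```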
From the textbook specification approach viewpoint, where the statistical model is derived by attaching an error term to the theoretical model, it is impossible to question the validity of the above argument. On the other hand, looking at it from the specification viewpoint proposed in Chapter 19 we can see a number of serious weaknesses in the argument. The most obvious weakness of the argument is that it depends on two statistical models with different parametrisations. In particular, β in (33) and (34) refers to very different parameters. If we denote the coefficient of X_t in (34) by β = Σ_22^{-1}σ_21, the corresponding coefficient in (33) takes the form

β_0 = Σ_22.3^{-1}σ_21 − Σ_22.3^{-1}Σ_23Σ_33^{-1}σ_31,

where Σ_22.3 = (Σ_22 − Σ_23Σ_33^{-1}Σ_32), Σ_33 = Cov(W_t), Σ_23 = Cov(X_t, W_t), σ_31 = Cov(W_t, y_t) (see Chapter 15). Moreover, the probability models underlying (34) and (33) are D(y_t/X_t; θ_1) and D(y_t/X_t, W_t; θ_2), respectively. Once this is realised we can see that the omitted variables bias (39) should be written in terms of the two underlying parametrisations, each based on different sample
information, ℱ_0 = σ(y_t, X_t, W_t, t = 1, 2, ..., T) and ℱ = σ(y_t, X_t, t = 1, 2, ..., T), respectively. This, however, does not imply that the omitted variables argument is useless; quite the opposite. In cases where the sample information is the same (ℱ_0 = ℱ) the argument can be very useful in deriving misspecification tests (see Chapters 21 and 22). For further discussion of this issue see Spanos (1985b).
The above argument illustrates the dangers of not specifying explicitly the underlying probability model and the statistical parameters of interest. By changing the underlying probability distribution and the parametrisation, the results on bias disappear. The two parametrisations are only comparable when they are both derivable from the joint distribution D(Z_1, ..., Z_T; ψ) using alternative 'reduction' arguments.
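To see that β and β_0 are two well-defined, different parameters of the same joint distribution, rather than a 'biased' and a 'true' value, one can compute both from a single joint covariance structure (all values hypothetical):

```python
# Hypothetical joint covariances of (y_t, X_t, W_t), scalar X_t and W_t for simplicity.
s11, s12, s13 = 4.0, 1.2, 1.0    # Var(y), Cov(y, X), Cov(y, W)
s22, s23, s33 = 2.0, 0.7, 1.5    # Var(X), Cov(X, W), Var(W)

beta = s12 / s22                             # coefficient of X_t conditioning on X_t alone
s22_3 = s22 - s23 * s23 / s33                # Sigma_{22.3}
beta_0 = (s12 - s23 * s13 / s33) / s22_3     # coefficient of X_t conditioning on (X_t, W_t)

print(beta, beta_0)   # approx. 0.600 and 0.438: different, equally well-defined parameters
```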
20.3 Weak exogeneity
When the random vector X_t is assumed to be weakly exogenous in the context of the linear regression model, this amounts to postulating that the stochastic structure of X_t, as specified by its marginal distribution D(X_t; ψ_2), is not relevant as far as the statistical inference on the parameters of interest θ = (β, σ²) is concerned. That is, although at the outset we postulate D(y_t, X_t; ψ), as far as the parameters of interest are concerned D(y_t/X_t; ψ_1) suffices; note that the decomposition

D(y_t, X_t; ψ) = D(y_t/X_t; ψ_1) · D(X_t; ψ_2)
is true for any joint distribution (see Chapter 5). If we want to test the exogeneity assumption we need to specify D(X_t; ψ_2) and consider it in relation to D(y_t/X_t; ψ_1) (see Wu (1973), Engle (1984), inter alia). These exogeneity tests usually test certain implications of the exogeneity assumption, and this can present various problems. The implications of exogeneity tested depend crucially on the other assumptions of the model as well as the appropriate specification of the statistical GM giving rise to x_t, t = 1, 2, ..., T; see Engle et al. (1983).
Exogeneity in this context will be treated as a non-directly testable assumption and no exogeneity tests will be considered. It will be argued in Chapter 21 that exogeneity assumptions can be tested indirectly by testing the assumptions [6]-[8]. The argument in a nutshell is that when inappropriate marginalisation and conditioning are used in defining the parameters of interest, the assumptions [6]-[8] are unlikely to be valid (see Engle et al. (1983), Richard (1982)). For example, a way to 'test' the weak exogeneity assumption indirectly is to test for departures from the normality of D(y_t, X_t; ψ) using the implied normality of D(y_t/X_t; θ) and the homoskedasticity of Var(y_t/X_t = x_t). For instance, in the case where D(y_t, X_t; ψ) is multivariate Student's t, the parameters ψ_1 and ψ_2 above are no longer variation free (see Section 21.4). Testing for departures from normality in the directions implied by D(y_t, X_t; ψ) being multivariate t can be viewed as an indirect test of the variation-free assumption underlying weak exogeneity.
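As an illustration of the direction in which such indirect tests look, the following sketch (hypothetical parameters) simulates a bivariate Student's t via the standard normal/chi-squared mixture; the conditional spread of y_t given X_t = x_t then varies with x_t, a departure from the homoskedasticity implied by joint normality:

```python
import numpy as np

rng = np.random.default_rng(3)
nu, n = 5, 200_000
# Bivariate Student's t with nu degrees of freedom via the normal/chi-squared mixture.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
s = np.sqrt(nu / rng.chisquare(nu, size=n))
y, x = z[:, 0] * s, z[:, 1] * s

# Under joint normality Var(y/x) would not depend on x; under the t it grows with |x|.
print(y[np.abs(x) < 0.5].var())   # smaller conditional spread near x = 0
print(y[np.abs(x) > 2.0].var())   # noticeably larger in the tails of x
```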
20.4 Restrictions on the statistical parameters of interest θ
The statistical inference results on the linear regression model derived in Chapter 19 are based on the assumption that no a priori information on θ = (β, σ²) is available. Such a priori information, when available, can take various forms, such as linear, non-linear, exact, inexact or stochastic. In this section only exact a priori information on β and its implications will be considered; a priori information on σ² is rather scarce.
(1) Linear a priori restrictions on β
Let us assume that a priori information in the form of m linear restrictions

Rβ = r  (20.47)
is also available at the outset, where R and r are m × k and m × 1 known matrices, rank(R) = m. Such restrictions imply that the parameter space where β takes values is no longer ℝ^k but some subset of it as determined by (47). These restrictions represent information relevant for the statistical analysis of the linear regression model and can be taken into consideration.
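For example (a hypothetical illustration), with k = 4 the two restrictions β_2 + β_3 = 1 and β_4 = 0 take the form (47) with m = 2,

R = [0 1 1 0; 0 0 0 1],  r = (1, 0)'.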
In the estimation of θ these restrictions can be taken into consideration by extending the log likelihood function to include them. This is achieved by defining the Lagrangian function

l(β, σ², μ; y, X) = const − (T/2) log σ² − (1/(2σ²))(y − Xβ)'(y − Xβ) − μ'(Rβ − r),  (20.48)

where μ represents an m × 1 vector of Lagrange multipliers. Optimisation of (48) with respect to β, σ² and μ gives rise to the first-order conditions:

∂l/∂β = (1/σ²) X'(y − Xβ) − R'μ = 0,
∂l/∂σ² = −(T/(2σ²)) + (1/(2σ⁴))(y − Xβ)'(y − Xβ) = 0,
∂l/∂μ = −(Rβ − r) = 0.

Solving these yields the constrained MLEs β̃ and σ̃² together with the estimated Lagrange multipliers μ̂.
Using (56) we can deduce that:
(i) When Rβ = r, E(β̃) = β and E(μ̂) = 0, i.e. β̃ and μ̂ are unbiased estimators of β and 0, respectively.
(ii) β̃ and μ̂ are fully efficient estimators of β and μ since their variances achieve the Cramér–Rao lower bounds, as can be verified directly using the extended information matrix (see exercises 1 and 2).
(iii) [Cov(β̃) − Cov(β̂)] ≤ 0, i.e. the covariance of the constrained MLE β̃ is always less than or equal to that of the unconstrained MLE β̂, irrespective of whether Rβ = r holds or not; but [MSE(β̃) − MSE(β̂)] ≥ 0, where MSE stands for mean square error (see Chapter 12).
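A sketch of constrained estimation (hypothetical data) using the well-known closed-form solution of the first-order conditions, β̃ = β̂ + (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(r − Rβ̂), with the hypothetical restrictions from the example above:

```python
import numpy as np

rng = np.random.default_rng(4)
T, k = 100, 4
X = rng.normal(size=(T, k))
y = X @ np.array([0.3, 0.6, 0.4, 0.0]) + rng.normal(size=T)

R = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])   # beta_2 + beta_3 = 1 and beta_4 = 0
r = np.array([1.0, 0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y              # unconstrained MLE beta-hat
# Constrained MLE via the restricted least squares formula:
b_tilde = b_hat + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b_hat)

print(R @ b_tilde)   # equals r exactly, by construction
```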
The constrained MLE of σ² can be written in the form

σ̃² = (1/T)ũ'ũ = (1/T){û'û + (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r)},  (20.58)

where ũ = y − Xβ̃ are the constrained residuals.
But for s̃² = [1/(T + m − k)]ũ'ũ, E(s̃²) = σ² when Rβ = r, since then E(û'û) = (T − k)σ² and E[(Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r)] = mσ².
The F-test revisited
In Section 19.5 above we derived the F-test based on the test statistic

τ(y) = (1/(m s²)) (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r)  (20.64)
for the null hypothesis

H_0: Rβ = r  against  H_1: Rβ ≠ r,
using the intuitive argument that when H_0 is valid ‖Rβ̂ − r‖ must be close to zero. We can derive the same test using various other intuitive arguments similar to this, in relation to quantities like ‖β̂ − β̃‖ and ‖μ̂‖ being close to zero when H_0 is valid (see question 5). A more formal derivation of the F-test can be based on the likelihood ratio test procedure (see Chapter 14). The above null and alternative hypotheses in the language of Chapter 14 can be written as
H_0: θ ∈ Θ_0,  H_1: θ ∈ Θ_1 = Θ − Θ_0,

where

θ = (β, σ²),  Θ = {(β, σ²): β ∈ ℝ^k, σ² ∈ ℝ_+},
Θ_0 = {(β, σ²): β ∈ ℝ^k, Rβ = r, σ² ∈ ℝ_+}.
The likelihood ratio takes the form

λ(y) = max(θ∈Θ_0) L(θ; y) / max(θ∈Θ) L(θ; y) = L(θ̃; y)/L(θ̂; y)
     = [(2πσ̃²)^{-T/2} e^{-T/2}] / [(2πσ̂²)^{-T/2} e^{-T/2}] = (σ̃²/σ̂²)^{-T/2}.

The problem we have to face at this stage is to determine the distribution of λ(y) or some monotonic function of it. Using (58) we can write λ(y) in the form

λ(y) = {1 + (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r) / û'û}^{-T/2}.  (20.66)
Looking at (66) we can see that it is directly related to (64), whose distribution we know. Hence, λ(y) can be transformed into the F-test using the monotonic transformation

τ(y) = ((T − k)/m) [λ(y)^{-2/T} − 1].
This transformation provides us with an alternative way to calculate the value of the test statistic τ(y) using the estimates of the restricted and unrestricted MLEs of σ². An even simpler operational form of τ(y) can be specified using the equality (58). From this equality we can deduce that

τ(y) = ((RRSS − URSS)/URSS) ((T − k)/m),  (20.70)
where RRSS and URSS stand for the restricted and unrestricted residual sums of squares, respectively. This is a more convenient form because most computer packages report the RSS; instead of going through the calculations needed for (64), we estimate the regression equation with and without the restrictions and use the RSS in the two cases to calculate τ(y) as in (70).
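In practice (70) is a few lines of computation; the sketch below (hypothetical data; numpy and scipy assumed available) estimates with and without the restrictions and forms τ(y) together with its F(m, T − k) tail probability:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(4)
T, k, m = 100, 4, 2
X = rng.normal(size=(T, k))
y = X @ np.array([0.3, 0.6, 0.4, 0.0]) + rng.normal(size=T)
R = np.array([[0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
r = np.array([1.0, 0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
b_tilde = b_hat + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b_hat)

URSS = np.sum((y - X @ b_hat) ** 2)    # unrestricted residual sum of squares
RRSS = np.sum((y - X @ b_tilde) ** 2)  # restricted residual sum of squares

tau = ((RRSS - URSS) / URSS) * ((T - k) / m)   # operational form (70)
print(tau, f.sf(tau, m, T - k))                # statistic and its F(m, T-k) tail area
```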
Example
Let us return to the money equation estimated in Chapter 19:

m_t = 2.896 + 0.690 y_t + 0.865 p_t − 0.055 i_t + û_t,  (20.71)
     (1.034)  (0.105)    (0.020)    (0.013)   (0.04)

R² = 0.995, R̄² = 0.995, s = 0.0393,
log L = 147.4, RSS = 0.11752, T = 80.
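As a consistency check, note that the Gaussian log likelihood evaluated at the MLEs depends on the data only through RSS and T, namely log L = −(T/2)[log 2π + log(RSS/T) + 1]; a two-line computation confirms the reported values (assuming k = 4 estimated coefficients):

```python
import math

T, RSS = 80, 0.11752
logL = -(T / 2) * (math.log(2 * math.pi) + math.log(RSS / T) + 1)
print(round(logL, 1))                       # 147.4, agreeing with the reported log L
print(round(math.sqrt(RSS / (T - 4)), 4))   # 0.0393 = s, with k = 4
```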