THE MULTIVARIATE LINEAR REGRESSION MODEL
where y_t: m×1, B: k×m, x_t: k×1, u_t: m×1. The system (1) is effectively a system of m linear regression equations:
with B = (β₁, β₂, ..., β_m).
In direct analogy with the m= 1 case (see Chapter 19) the multivariate linear regression model will be derived from first principles based on the
joint distribution of the observable random variables involved, D(Z_t; ψ), where Z_t ≡ (y_t', X_t')': (m+k)×1. Assuming that Z_t is an IID normally distributed vector, i.e.
Moreover, by construction, u_t and y_t satisfy the following properties:

(i) E(u_t) = E[E(u_t | X_t = x_t)] = 0;
(ii) E(u_t u_s') = E[E(u_t u_s' | X_t = x_t)] = Ω for t = s and 0 for t ≠ s;
(iii) E(u_t μ_t') = E[E(u_t μ_t' | X_t = x_t)] = E[μ_t E(u_t' | X_t = x_t)] = 0, t ∈ T,

where Ω = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ (compare these with the results in Section 19.2).
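The decomposition above is easy to verify numerically. The following minimal numpy sketch (dimensions, covariance matrix and seed are hypothetical choices, not from the text) simulates a jointly normal Z_t and checks that u_t = y_t − B'x_t, with B = Σ₂₂⁻¹Σ₂₁, has mean zero, covariance Ω = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁, and is uncorrelated with x_t:

```python
# Minimal sketch: conditional-moment decomposition of a joint normal Z_t.
# All dimensions, the covariance matrix and the seed are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
m, k, T = 2, 3, 200_000

# An arbitrary positive definite covariance for Z_t = (y_t', x_t')'.
A = rng.standard_normal((m + k, m + k))
Sigma = A @ A.T + (m + k) * np.eye(m + k)
S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
S21, S22 = Sigma[m:, :m], Sigma[m:, m:]

B = np.linalg.solve(S22, S21)                  # B = S22^{-1} S21, k x m
Omega = S11 - S12 @ np.linalg.solve(S22, S21)  # Omega = S11 - S12 S22^{-1} S21

Z = rng.multivariate_normal(np.zeros(m + k), Sigma, size=T)
y, x = Z[:, :m], Z[:, m:]
u = y - x @ B                                  # u_t = y_t - B'x_t

print(np.round(u.mean(axis=0), 3))             # ~0: property (i)
print(np.round(np.cov(u.T) - Omega, 2))        # ~0: Cov(u_t) = Omega
print(np.round(u.T @ x / T, 3))                # ~0: u_t uncorrelated with x_t
```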
The similarity between the m= 1 case and the general case allows us to
consider several loose ends left in Chapter 19. The first is the use of the joint distribution D(Z_t; ψ) in defining the model instead of concentrating exclusively on D(y_t | X_t; ψ₁). The loss of generality in postulating the form of the joint distribution is more than compensated for by the additional insight provided. In practice it is often easier to 'judge' the plausibility of assumptions relating to the nature of D(Z_t; ψ) rather than D(y_t | X_t; ψ₁). Moreover, in misspecification analysis the relationship between the assumptions underlying the model and those underlying the random vector process {Z_t, t ∈ T} enhances our understanding of the nature of the possible departures. An interesting example of this is the relationship between the assumption that {Z_t, t ∈ T} is a
(1) normal (N);
(2) independent (I); and
(3) identically distributed (ID) process; and
[6] (i) D(y_t | X_t; ψ₁) is normal;
(ii) E(y_t | X_t = x_t) = B'x_t, linear in x_t;
(iii) Cov(y_t | X_t = x_t) = Ω, homoskedastic (free of x_t).
The question which naturally arises is whether (i)-(iii) imply (N) or not. The following lemma shows that if (i)-(iii) are supplemented by the assumption that X_t ~ N(0, Σ₂₂), det(Σ₂₂) ≠ 0, the reverse implication holds.
Lemma 24.1
Z_t ~ N(0, Σ) for t ∈ T if and only if

(i) X_t ~ N(0, Σ₂₂), det(Σ₂₂) ≠ 0;
where Y: T×m, X: T×k, B: k×m, U: T×m. The system in (1) can be viewed as the tth row of (6). The ith column, taking the form

y_i = Xβ_i + u_i,

represents all T observations on the ith regression in (2). In order to define the conditional distribution D(Y | X; ψ₁) we need the special notation of Kronecker products (see Appendix 2). Using this notation (6) can be written in the form
vec(Y) = (I_m ⊗ X) vec(B) + vec(U)   (24.9)
or
in an obvious notation
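Since the Kronecker/vec identity in (24.9) does much of the work in what follows, a quick numerical check may help; the sizes and seed below are hypothetical. Note that vec(·) stacks columns, which in numpy means order='F':

```python
# Check of (24.9): vec(Y) = (I_m kron X) vec(B) + vec(U).
import numpy as np

rng = np.random.default_rng(1)
T, k, m = 10, 3, 2
X = rng.standard_normal((T, k))
B = rng.standard_normal((k, m))
U = rng.standard_normal((T, m))
Y = X @ B + U

vec = lambda M: M.reshape(-1, order='F')   # column-stacking vec operator
print(np.allclose(vec(Y), np.kron(np.eye(m), X) @ vec(B) + vec(U)))  # True
```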
The multivariate linear regression (MLR) model is of considerable interest in econometrics because of its direct relationship with the simultaneous equations formulation to be considered in Chapter 25. In particular, the latter formulation can be viewed as a reparametrisation of the MLR model where the statistical parameters of interest θ = (B, Ω) do not coincide with the theoretical parameters of interest ξ. Instead, the two sets of parameters are related by some system of implicit equations of the form:
These equations can be interpreted as providing an alternative parametrisation for the statistical GM in terms of the theoretical parameters of interest. In view of this relationship between the two statistical models, a sound understanding of the MLR model will pave the way for the simultaneous equations formulation in Chapter 25.
24.2 Specification and estimation
In direct analogy to the linear regression model (m= 1) the multivariate linear regression model is specified as follows:
(I) Statistical GM: y_t = B'x_t + u_t, t ∈ T,

y_t: m×1, x_t: k×1, B: k×m.
[1] The systematic and non-systematic components are:
μ_t = E(y_t | X_t = x_t) = B'x_t,   u_t = y_t − E(y_t | X_t = x_t),

and by construction

E(u_t) = E[E(u_t | X_t = x_t)] = 0,
E(u_t μ_t') = E[E(u_t μ_t' | X_t = x_t)] = 0,   t ∈ T.
[2] The statistical parameters of interest are θ = (B, Ω), where B = Σ₂₂⁻¹Σ₂₁ and Ω = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁.
[3] X_t is assumed to be weakly exogenous with respect to θ.
[4] No a priori information on θ.
[5] Rank(X) = k, X ≡ (x₁, x₂, ..., x_T)': T×k, for T > k.
(III) Sampling model

[8] Y ≡ (y₁, y₂, ..., y_T)' is an independent sample sequentially drawn from D(y_t | X_t; θ), t = 1, 2, ..., T, and T ≥ m + k.
The above specification is almost identical to that of the m = 1 case considered in Chapter 19. The discussion of the assumptions in that chapter applies to [1]-[8] above with only minor modifications due to m > 1. The only real change brought about by m > 1 is the increase in the number of statistical parameters of interest to mk + ½m(m + 1). It should come as no surprise to learn that the similarities between the two statistical models extend to estimation, testing and prediction.
From assumptions [6] to [8] we can deduce that the likelihood function takes the form
L(θ; Y) = c(Y) ∏ₜ₌₁ᵀ D(y_t | X_t; θ),

and the log likelihood is

log L = const − (T/2) log(det Ω) − ½ ∑ₜ₌₁ᵀ (y_t − B'x_t)'Ω⁻¹(y_t − B'x_t)   (24.12)

= const − ½[T log(det Ω) + tr Ω⁻¹(Y − XB)'(Y − XB)]   (24.13)
(see exercise 1). The first-order conditions are

∂log L/∂B = X'(Y − XB)Ω⁻¹ = 0,   ∂log L/∂Ω⁻¹ = (T/2)Ω − ½(Y − XB)'(Y − XB) = 0.

These first-order conditions lead to the following MLE's:

B̂ = (X'X)⁻¹X'Y,   Ω̂ = (1/T)Û'Û,   Û ≡ Y − XB̂,
and μ̂_t ⊥ û_t, where μ̂_t = B̂'x_t are the fitted values. This orthogonality can be used to define a goodness-of-fit measure by extending R² = 1 − (û'û)(y'y)⁻¹ to
G = I − (Û'Û)(Y'Y)⁻¹ = (Y'Y − Û'Û)(Y'Y)⁻¹.   (24.20)

The matrix G varies between the identity matrix, when Û = 0, and zero, when Y = Û (the regressors explain none of Y). In order to reduce this matrix goodness-of-fit measure to a scalar we can use the trace or the determinant
where E(·) is relative to D(Y | X; θ).
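The estimators and the goodness-of-fit matrix G are straightforward to compute. The sketch below (data and seed hypothetical) forms B̂ = (X'X)⁻¹X'Y, the biased and unbiased variance estimators, and reduces G to scalars via the trace and the determinant, as suggested above:

```python
# MLEs and the matrix goodness-of-fit measure G of (24.20); data hypothetical.
import numpy as np

rng = np.random.default_rng(2)
T, k, m = 500, 4, 3
X = rng.standard_normal((T, k))
B_true = rng.standard_normal((k, m))
Y = X @ B_true + rng.standard_normal((T, m))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # MLE of B
U_hat = Y - X @ B_hat                            # residuals
Omega_hat = U_hat.T @ U_hat / T                  # MLE of Omega (biased)
Omega_tilde = U_hat.T @ U_hat / (T - k)          # unbiased estimator

YY = Y.T @ Y
G = (YY - U_hat.T @ U_hat) @ np.linalg.inv(YY)   # = I - (U'U)(Y'Y)^{-1}
print(np.trace(G) / m, np.linalg.det(G))         # scalar summaries of fit
```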
Finite sample properties of B̂ and Ω̂
From the fact that B̂ and Ω̂ are MLE's we can deduce that they enjoy the invariance property of such estimators (see Chapter 13) and that they are functions of the minimal sufficient statistics, if the latter exist. Using the Lehmann–Scheffé result (see Chapter 12) we can see that the ratio

D(Y | X; θ) / D(Y₀ | X; θ) = exp{−½ tr Ω⁻¹[Y'Y − Y₀'Y₀ − (Y − Y₀)'XB − B'X'(Y − Y₀)]}   (24.24)

is independent of θ if Y'Y = Y₀'Y₀ and Y'X = Y₀'X. This implies that

τ(Y) = (τ₁(Y), τ₂(Y)), where τ₁(Y) = Y'Y, τ₂(Y) = Y'X,

defines the set of minimal sufficient statistics and
and thus Ω̃ ≡ [1/(T − k)]Û'Û is an unbiased estimator of Ω. In view of (25)-(31) we can summarise the finite sample properties of the MLE's B̂ and Ω̂:
(3) B̂ is an unbiased estimator of B (i.e. E(B̂) = B) but Ω̂ is a biased estimator of Ω; Ω̃ = [1/(T − k)]Û'Û being unbiased.

(4) B̂ is a fully efficient estimator of B in view of the fact that Cov(B̂) = Ω ⊗ (X'X)⁻¹ and the information matrix of θ = (B, Ω) takes the form

(5) B̂ and Ω̂ are independent, in view of the orthogonality in (19).
Asymptotic properties of B̂ and Ω̂
Arguing again by analogy to the m = 1 case we can derive the asymptotic properties of the MLE's B̂ and Ω̂ of B and Ω, respectively.

(i) Consistency: B̂ →ᴾ B, Ω̂ →ᴾ Ω.

In view of the result (B̂ − B) ~ N(0, Ω ⊗ (X'X)_T⁻¹) we can deduce that if lim_{T→∞} (X'X)_T⁻¹ = 0 then Cov(B̂) → 0, and thus B̂ is a consistent estimator of B (see Chapters 12 and 19). Similarly, given that lim_{T→∞} E(Ω̂) = Ω and lim_{T→∞} Cov(Ω̂) = 0, Ω̂ →ᴾ Ω.
Note that the following statements are equivalent:
where λ_min(X'X)_T and λ_max(X'X)_T⁻¹ refer to the smallest and the largest eigenvalue of (X'X)_T and of its inverse, respectively; see Amemiya (1985).

(ii) Strong consistency: B̂ → B almost surely.
(iii) Asymptotic normality. From the theory of maximum likelihood estimation we know that under relatively mild conditions (see Chapter 13) the MLE θ̂ of θ satisfies √T(θ̂ − θ) ~ N(0, I_∞(θ)⁻¹). For this result to apply, however, we need the boundedness of I_∞(θ) = lim_{T→∞} (1/T)I_T(θ) as well as its non-singularity. In the present case the asymptotic information matrix is bounded and non-singular (full rank) if lim_{T→∞} (X'X)/T = Q_x < ∞ and Q_x is non-singular. Under this condition we can deduce that
√T(B̂ − B) ~ N(0, Ω ⊗ Q_x⁻¹)   (24.33)

and

√T(Ω̂ − Ω) ~ N(0, 2(Ω ⊗ Ω))   (24.34)

(see Rothenberg (1973)).
Note that if {(X'X)_T, T > k} is a sequence of k×k positive definite matrices such that (X'X)_{T+1} − (X'X)_T is positive semi-definite and c'(X'X)_T c → ∞ as T → ∞ for every c ≠ 0, then lim_{T→∞} (X'X)_T⁻¹ = 0.
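A quick Monte Carlo sketch (data-generating process and seed hypothetical) illustrates the consistency argument at work: with well-behaved regressors, ‖(X'X)_T⁻¹‖ shrinks towards zero and B̂ collapses onto B as T grows:

```python
# Consistency in action: (X'X)_T^{-1} -> 0 and B_hat -> B as T grows.
import numpy as np

rng = np.random.default_rng(3)
k, m = 3, 2
B = rng.standard_normal((k, m))

for T in (50, 500, 5000, 50_000):
    X = rng.standard_normal((T, k))
    Y = X @ B + rng.standard_normal((T, m))
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    print(f"T={T:6d}  ||(X'X)^-1||={np.linalg.norm(np.linalg.inv(X.T @ X)):.1e}"
          f"  ||B_hat-B||={np.linalg.norm(B_hat - B):.4f}")
```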
(iv) In view of (iii) we can deduce that B̂ and Ω̂ are both asymptotically unbiased and efficient.
24.3 A priori information
One particularly important departure from the assumptions underlying the multivariate linear regression model is the introduction of a priori restrictions related to θ. When such additional information is available, assumption [4] no longer applies and the results on estimation derived in Section 24.2 need to be modified. The importance of a priori information in the present context arises partly because it allows us to derive tests which can be usefully employed in misspecification testing, and partly because it will provide the link between the multivariate linear regression model and the simultaneous equations model to be considered in Chapter 25.

(1) Linear restrictions 'related' to X_t
The first form of restrictions to be considered is
where Bᵥ ≡ vec(B) = (β₁', β₂', ..., β_m')': mk×1, R: p×mk, r: p×1. This form of linear restrictions is more general than (35) as well as
is to 'solve' the system (35) for B and substitute the 'solution' into (40). In order to do that we define two arbitrary matrices D*: (k − p)×k, rank(D*) = k − p, and C*: (k − p)×m, and reformulate (35) into
Given that L² = L, P² = P and LP = 0 (i.e. they are orthogonal projections),
we can deduce that P takes the form
apart from the constant terms, say B₁, are zero. This can be expressed in the form (35) with

D₁ = (0, I, −I),   B = (B₁', B₂', B₃')',   C = 0,

and H₀ takes the form B₂ = B₃.
(2) Linear restrictions 'related' to y_t
The second form of restrictions to be considered is
where Γ₁: m×q (q < m) and A₁: k×q are known matrices with rank(Γ₁) = q. The restrictions in (50) represent linear between-equations restrictions, because the ith row of B contains the coefficients of the ith regressor in all m equations. Interpreted in the context of (35) these restrictions are directly related to the
y's. This implies that if we follow the procedure used for the restrictions in (38) we have to be much more careful, because the form of the underlying probability model might be affected. Richard (1979) shows how this procedure can give rise to the restricted MLE's of B and Ω. For expositional purposes we will adopt the Lagrange multiplier procedure. The Lagrangian function is
ℓ(B, Ω, M) = −(T/2) log(det Ω) − ½ tr Ω⁻¹(Y − XB)'(Y − XB) − tr M'(BΓ₁ − A₁).
This implies that the constrained MLE's of B and Ω are
B̃ = B̂ − (B̂Γ₁ − A₁)(Γ₁'Ω̃Γ₁)⁻¹Γ₁'Ω̃   (24.58)

Ω̃ = (1/T)Ũ'Ũ = Ω̂ + (1/T)(B̃ − B̂)'(X'X)(B̃ − B̂)   (24.59)

(see Richard (1979)). If we compare (58) with (48) we can see that the main difference is that Ω enters the MLE estimator of B, in view of the fact that the restrictions (50) affect the form of the probability model. It is interesting to note that postmultiplying (58) by Γ₁ yields (54). The above formulae, (58) and (59), will be of considerable value in Chapter 25.
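Since B̃ in (58) depends on Ω̃ and Ω̃ in (59) depends on B̃, the pair can be computed by a fixed-point iteration. The sketch below follows my reading of (58)-(59) as reconstructed above; the data, Γ₁ and seed are hypothetical, and the restriction imposed is that the two columns of B are equal, i.e. BΓ₁ = 0 with Γ₁ = (1, −1)':

```python
# Constrained MLEs under B Gamma1 = A1, iterating (58)-(59) to a fixed point.
import numpy as np

rng = np.random.default_rng(4)
T, k, m, q = 400, 3, 2, 1
X = rng.standard_normal((T, k))
Gamma1 = np.array([[1.0], [-1.0]])        # m x q: column 1 = column 2 of B
A1 = np.zeros((k, q))
b = rng.standard_normal((k, 1))
B_true = np.hstack([b, b])                # satisfies the restriction
Y = X @ B_true + rng.standard_normal((T, m))

XtX = X.T @ X
B_hat = np.linalg.solve(XtX, X.T @ Y)                 # unrestricted MLE
Omega_hat = (Y - X @ B_hat).T @ (Y - X @ B_hat) / T
Omega = Omega_hat.copy()

for _ in range(100):                                  # fixed-point iteration
    M = Gamma1.T @ Omega @ Gamma1                     # q x q
    B_t = B_hat - (B_hat @ Gamma1 - A1) @ np.linalg.solve(M, Gamma1.T @ Omega)
    Omega = Omega_hat + (B_t - B_hat).T @ XtX @ (B_t - B_hat) / T  # (59)

print(np.allclose(B_t @ Gamma1, A1))      # True: restriction holds exactly
```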
(3) Linear restrictions 'related' to both y_t and X_t
A natural way to proceed is to combine the linear restrictions (38) and (50)
where Y* = YΓ₁, B* = BΓ₁ and E = UΓ₁.
The linear restrictions in (60) in vector form can be written as
vec(D₁BΓ₁ + C) = (Γ₁' ⊗ D₁) vec(B) + vec(C) = 0   (24.66)
or
where Bᵥ = vec(B) and r = −vec(C). This suggests that an obvious way to generalise this is to substitute (Γ₁' ⊗ D₁) with a p×km matrix R and to formulate the restrictions in the form
excluded variables, and zeros everywhere else. Across-equations linear restrictions can be accommodated in the off-block-diagonal submatrices R_ij, i, j = 1, 2, ..., m, i ≠ j, of R, with R_ij referring to the restrictions between equations i and j.
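Constructing R and r as in (66)-(67) is mechanical with np.kron; the sketch below (arbitrary hypothetical matrices) verifies that R vec(B) = r reproduces D₁BΓ₁ + C = 0:

```python
# Stacked form of the restrictions: R = Gamma1' kron D1, r = -vec(C).
import numpy as np

rng = np.random.default_rng(5)
k, m, p, q = 4, 3, 2, 2
D1 = rng.standard_normal((p, k))
Gamma1 = rng.standard_normal((m, q))
B = rng.standard_normal((k, m))
C = -D1 @ B @ Gamma1               # choose C so that B satisfies D1 B Gamma1 + C = 0

vec = lambda M: M.reshape(-1, order='F')
R = np.kron(Gamma1.T, D1)          # (p*q) x (k*m)
r = -vec(C)
print(np.allclose(R @ vec(B), r))  # True
```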
Let us consider the derivation of the constrained MLE's of B and Ω under the linear restrictions (68). The most convenient form of the statistical GM for the sample period t = 1, 2, ..., T is not
B̃ᵥ = B̂ᵥ − [X*'(Ω⁻¹ ⊗ I_T)X*]⁻¹R'{R[X*'(Ω⁻¹ ⊗ I_T)X*]⁻¹R'}⁻¹(RB̂ᵥ − r),   (24.76)

and
If we compare these formulae with those in the m = 1 case (see Chapter 20) we can see that the only difference (when Ω is known) is the presence of Ω. This is because in the m > 1 case the restrictions RBᵥ = r affect the underlying probability model by restricting y_t. In the econometric literature the estimator (78) is known as the generalised least-squares (GLS) estimator. In practice Ω is unknown, and thus in order to 'solve' the conditions (73)-(75) we need to resort to iterative numerical optimisation (see Harvey (1981), Quandt (1983) inter alia).
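For known Ω the restricted GLS estimator of (76) can be written down directly; since X* = I_m ⊗ X here, X*'(Ω⁻¹ ⊗ I_T)X* reduces to Ω⁻¹ ⊗ X'X. The sketch below (hypothetical data; a single illustrative restriction β₁₁ = 0) imposes RBᵥ = r exactly:

```python
# Restricted GLS of (76) with Omega known.
import numpy as np

rng = np.random.default_rng(6)
T, k, m = 300, 3, 2
X = rng.standard_normal((T, k))
Omega = np.array([[1.0, 0.5], [0.5, 2.0]])
B = rng.standard_normal((k, m))
Y = X @ B + rng.multivariate_normal(np.zeros(m), Omega, size=T)

vec = lambda M: M.reshape(-1, order='F')
B_hat_v = vec(np.linalg.solve(X.T @ X, X.T @ Y))   # unrestricted MLE, vectorised

R = np.zeros((1, k * m)); R[0, 0] = 1.0            # single restriction beta_11 = 0
r = np.zeros(1)

V = np.linalg.inv(np.kron(np.linalg.inv(Omega), X.T @ X))  # [X*'(O^-1 x I)X*]^-1
B_tilde_v = B_hat_v - V @ R.T @ np.linalg.solve(R @ V @ R.T, R @ B_hat_v - r)
print(B_tilde_v[0])                                # ~0: restriction imposed
```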
The purpose of the next section is to consider two special cases of (68) where the restrictions can be substituted directly into a reformulated statistical GM. These are the cases of exclusion and across-equations linear homogeneous restrictions. In these two cases the constrained MLE of Bᵥ takes a form similar to (78).
24.4 The Zellner and Malinvaud formulations
In econometric modelling two special cases of the general linear restrictions are particularly useful. These are the exclusion and across-equations linear homogeneous restrictions. In order to illustrate these let us consider the two-equation case

y₁ₜ = β₁₁x₁ₜ + β₂₁x₂ₜ + β₃₁x₃ₜ + u₁ₜ,
y₂ₜ = β₁₂x₁ₜ + β₂₂x₂ₜ + β₃₂x₃ₜ + u₂ₜ,   t ∈ T.   (24.80)
(i) Exclusion restrictions: β₁₁ = 0, β₃₂ = 0;
(ii) Across-equations linear homogeneous restrictions: β₂₁ = β₁₂.
It turns out that in these two cases the restrictions can be accommodated directly into a reformulation of the statistical GM, and no constrained optimisation is necessary. The purpose of this section is to discuss the estimation of Bᵥ under these two forms of restrictions and to derive explicit formulae which will prove useful in Chapter 25.
Let us consider the exclusion restrictions first. The vectorised form of
where Xᵢ refers to the regressor data matrix for the ith equation and βᵢ* to the corresponding coefficient vector. In the case of the example in (80) with the
restrictions β₁₁ = 0, β₃₂ = 0, (84) takes the block-diagonal form

y₁ = X₁β₁* + u₁,   y₂ = X₂β₂* + u₂,

where X₁ = (x₂, x₃), X₂ = (x₁, x₂), β₁* = (β₂₁, β₃₁)' and β₂* = (β₁₂, β₂₂)'.
The formulation (84) is known as the seemingly unrelated regression equations (SURE) formulation, a term coined by Zellner (1962), because the m linear regression equations in (84) seem to be unrelated at first sight, but this turns out to be false. When different restrictions are placed on different equations the original statistical GM is affected and the various equations become interrelated. In particular, the covariance matrix Ω enters the estimator of
β*. As shown in the previous section, in the case where Ω is known the MLE of β* takes the form

β̂* = [X*'(Ω⁻¹ ⊗ I_T)X*]⁻¹X*'(Ω⁻¹ ⊗ I_T)y.   (24.87)
Otherwise the MLE has to be derived using some iterative numerical procedure. For this case Zellner (1962) suggested the two-step least-squares estimator, i.e. (87) with Ω replaced by Ω̂, where Ω̂ = (1/T)Û'Û, Û = Y − XB̂. It is not very difficult to see that this estimator can be viewed as an approximation to the MLE defined in the previous section by the first-order conditions (73)-(75), where only two iterations are performed: one to derive Ω̂, which is then substituted into (87).
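A compact sketch of the two-step estimator for the exclusion-restriction example of (80) follows; all data, coefficient values and the seed are hypothetical. Step one runs OLS equation by equation to estimate Ω; step two applies the feasible GLS formula of (87):

```python
# Zellner's two-step estimator for the SURE form of (84).
import numpy as np

rng = np.random.default_rng(7)
T = 500
x1, x2, x3 = rng.standard_normal((3, T))
Omega = np.array([[1.0, 0.7], [0.7, 1.5]])
u = rng.multivariate_normal(np.zeros(2), Omega, size=T)
y1 = 1.0 * x2 - 0.5 * x3 + u[:, 0]            # beta_11 = 0 imposed
y2 = 0.8 * x1 + 1.0 * x2 + u[:, 1]            # beta_32 = 0 imposed

X1 = np.column_stack([x2, x3])                # regressors of equation 1
X2 = np.column_stack([x1, x2])                # regressors of equation 2
y = np.concatenate([y1, y2])
Xs = np.zeros((2 * T, 4))                     # block-diagonal regressor matrix
Xs[:T, :2], Xs[T:, 2:] = X1, X2

# Step 1: equation-by-equation OLS, residual-based estimate of Omega.
b_ols = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)
U = (y - Xs @ b_ols).reshape(2, T).T          # T x 2 residual matrix
Omega_hat = U.T @ U / T

# Step 2: feasible GLS as in (87) with Omega_hat in place of Omega.
W = np.kron(np.linalg.inv(Omega_hat), np.eye(T))
b_sure = np.linalg.solve(Xs.T @ W @ Xs, Xs.T @ W @ y)
print(b_sure)                                 # ~ (1.0, -0.5, 0.8, 1.0)
```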
Zellner went on to show that if
of across-equations linear homogeneous restrictions such as β₂₁ = β₁₂ in the example (80). Such restrictions can be accommodated into the formulation (82) directly by redefining the regressor matrix as
β̂*(i) = [∑ₜ₌₁ᵀ Xₜ'Ω̂(i−1)⁻¹Xₜ]⁻¹ ∑ₜ₌₁ᵀ Xₜ'Ω̂(i−1)⁻¹yₜ,   i = 1, 2, ..., ℓ,   (24.96)

where ℓ refers to the number of iterations, which is either chosen a priori or determined by some convergence criterion such as

‖β̂*(i+1) − β̂*(i)‖ < ε for some ε > 0, e.g. ε = 0.001.   (24.97)
In the case where ℓ = 2 the estimator defined by (96) coincides with
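Operationally, (96)-(97) amount to the loop sketched below, continuing with the hypothetical y, Xs and T of the previous sketch (the helper name iterate_fgls is mine, not from the text). Stopping after a single update reproduces the two-step estimator:

```python
# Iterative feasible GLS with the convergence criterion (97).
import numpy as np

def iterate_fgls(y, Xs, T, m=2, eps=1e-3, max_iter=50):
    b = np.linalg.solve(Xs.T @ Xs, Xs.T @ y)         # OLS start
    for _ in range(max_iter):
        U = (y - Xs @ b).reshape(m, T).T             # residuals, T x m
        Omega = U.T @ U / T                          # updated covariance estimate
        W = np.kron(np.linalg.inv(Omega), np.eye(T))
        b_new = np.linalg.solve(Xs.T @ W @ Xs, Xs.T @ W @ y)
        if np.linalg.norm(b_new - b) < eps:          # criterion (97)
            return b_new
        b = b_new
    return b

print(iterate_fgls(y, Xs, T))                        # converged SURE estimates
```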