on the responses. Unfortunately, the name regression, culled from the title of the first paper on the subject by F. Galton [14], in no way reflects either the importance or breadth of application of this methodology.
In this chapter, we first discuss the multiple regression model for the prediction of a single response. This model is then generalized to handle the prediction of several dependent variables. Our treatment must be somewhat terse, as a vast literature exists on the subject. (If you are interested in pursuing regression analysis, see the following books, in ascending order of difficulty: Bowerman and O'Connell [5], Neter, Wasserman, Kutner, and Nachtsheim [17], Draper and Smith [12], Cook and Weisberg [9], Seber [20], and Goldberger [15].) Our abbreviated treatment highlights the regression assumptions and their consequences, alternative formulations of the regression model, and the general applicability of regression techniques to seemingly different situations.
7.2 THE CLASSICAL LINEAR REGRESSION MODEL
Let z_1, z_2, ..., z_r be r predictor variables thought to be related to a response variable Y. For example, with r = 4, we might have

 Y = current market value of home
and

 z_1 = square feet of living area
 z_2 = location (indicator for zone of city)
 z_3 = appraised value last year
 z_4 = quality of construction (price per square foot)

The classical linear regression model states that Y is composed of a mean, which depends in a continuous manner on the z_i's, and a random error ε, which accounts for measurement error and the effects of other variables not explicitly considered in the model. The values of the predictor variables recorded from the experiment or set by the investigator are treated as fixed. The error (and hence the response) is viewed as a random variable whose behavior is characterized by a set of distributional assumptions.
Specifically, the linear regression model with a single response takes the form

 Y = β_0 + β_1 z_1 + ... + β_r z_r + ε

 [Response] = [mean (depending on z_1, z_2, ..., z_r)] + [error]

The term "linear" refers to the fact that the mean is a linear function of the unknown parameters β_0, β_1, ..., β_r. The predictor variables may or may not enter the model as first-order terms.
With n independent observations on Y and the associated values of z_i, the complete model becomes

 Y_1 = β_0 + β_1 z_11 + β_2 z_12 + ... + β_r z_1r + ε_1
 Y_2 = β_0 + β_1 z_21 + β_2 z_22 + ... + β_r z_2r + ε_2
  ⋮
 Y_n = β_0 + β_1 z_n1 + β_2 z_n2 + ... + β_r z_nr + ε_n     (7-1)

where the error terms are assumed to have the following properties:

 1. E(ε_j) = 0;
 2. Var(ε_j) = σ² (constant); and
 3. Cov(ε_j, ε_k) = 0, j ≠ k.

In matrix notation, (7-1) becomes

    [Y_1]     [1  z_11  z_12 ... z_1r] [β_0]     [ε_1]
    [Y_2]  =  [1  z_21  z_22 ... z_2r] [β_1]  +  [ε_2]
    [ ⋮ ]     [⋮    ⋮     ⋮        ⋮ ] [ ⋮ ]     [ ⋮ ]
    [Y_n]     [1  z_n1  z_n2 ... z_nr] [β_r]     [ε_n]

or

 Y = Z β + ε

and the error assumptions become

 1. E(ε) = 0; and
 2. Cov(ε) = E(εε') = σ²I.     (7-2)
Note that a one in the first column of the design matrix Z is the multiplier of the constant term β_0. It is customary to introduce the artificial variable z_j0 = 1, so that

 Y_j = β_0 z_j0 + β_1 z_j1 + ... + β_r z_jr + ε_j,   j = 1, 2, ..., n

We now provide some examples of the linear regression model.
Example 7.1 (Fitting a straight-line regression model). Determine the linear regression model for fitting a straight line

 Mean response = E(Y) = β_0 + β_1 z_1

to a set of n = 5 observations on z_1 and Y.
Before the responses Y' = [Y_1, Y_2, ..., Y_5] are observed, the errors ε' = [ε_1, ε_2, ..., ε_5] are random, and we can write

 Y = Zβ + ε

where Y is 5 × 1, Z is the 5 × 2 design matrix whose jth row is [1, z_j1], β' = [β_0, β_1], and ε is 5 × 1. The data for this model are contained in the observed response vector y and the design matrix Z.
Example 7.2 (One-way ANOVA as a regression model). The regression framework also covers analysis-of-variance situations. Suppose observations are taken from three populations, and define the dummy predictor variables

 z_1 = 1 if the observation is from population 1, 0 otherwise
 z_2 = 1 if the observation is from population 2, 0 otherwise
 z_3 = 1 if the observation is from population 3, 0 otherwise

and β_0 = μ, β_1 = τ_1, β_2 = τ_2, β_3 = τ_3. Then

 Y_j = β_0 + β_1 z_j1 + β_2 z_j2 + β_3 z_j3 + ε_j,   j = 1, 2, ..., 8

where we arrange the observations from the three populations in sequence. Thus, we obtain the observed response vector and design matrix.
7.3 LEAST SQUARES ESTIMATION

One of the objectives of regression analysis is to develop an equation that will allow the investigator to predict the response for given values of the predictor variables. Thus, it is necessary to "fit" the model in (7-3) to the observed y_j corresponding to the known values 1, z_j1, ..., z_jr. That is, we must determine the values for the regression coefficients β and the error variance σ² consistent with the available data.
Let b be trial values for β. Consider the difference y_j - b_0 - b_1 z_j1 - ... - b_r z_jr between the observed response y_j and the value b_0 + b_1 z_j1 + ... + b_r z_jr that would be expected if b were the "true" parameter vector. Typically, the differences y_j - b_0 - b_1 z_j1 - ... - b_r z_jr will not be zero, because the response fluctuates (in a manner characterized by the error term assumptions) about its expected value. The method of least squares selects b so as to minimize the sum of the squares of the differences:

 S(b) = Σ_{j=1}^n (y_j - b_0 - b_1 z_j1 - ... - b_r z_jr)² = (y - Zb)'(y - Zb)     (7-4)

The coefficients b chosen by the least squares criterion are called least squares estimates of the regression parameters β. They will henceforth be denoted by β̂ to emphasize their role as estimates of β.
The coefficients β̂ are consistent with the data in the sense that they produce estimated (fitted) mean responses, β̂_0 + β̂_1 z_j1 + ... + β̂_r z_jr, the sum of whose squared differences from the observed y_j is as small as possible. The deviations

 ε̂_j = y_j - β̂_0 - β̂_1 z_j1 - ... - β̂_r z_jr,   j = 1, 2, ..., n     (7-5)

are called residuals. The vector of residuals ε̂ = y - Zβ̂ contains the information about the remaining unknown parameter σ². (See Result 7.2.)
Result 7.1. Let Z have full rank r + 1 ≤ n.¹ The least squares estimate of β in (7-3) is given by

 β̂ = (Z'Z)⁻¹Z'y

Let ŷ = Zβ̂ = Hy denote the fitted values of y, where H = Z(Z'Z)⁻¹Z' is called the "hat" matrix. Then the residuals

 ε̂ = y - ŷ = [I - Z(Z'Z)⁻¹Z']y = [I - H]y

satisfy Z'ε̂ = 0 and ŷ'ε̂ = 0. Also, the residual sum of squares is

 Σ_{j=1}^n (y_j - β̂_0 - β̂_1 z_j1 - ... - β̂_r z_jr)² = ε̂'ε̂ = y'[I - Z(Z'Z)⁻¹Z']y = y'y - y'Zβ̂

¹If Z is not of full rank, (Z'Z)⁻¹ is replaced by (Z'Z)⁻, a generalized inverse of Z'Z. (See Exercise 7.6.)
Proof. Let β̂ = (Z'Z)⁻¹Z'y as asserted. Then ε̂ = y - ŷ = y - Zβ̂ = [I - Z(Z'Z)⁻¹Z']y. The matrix [I - Z(Z'Z)⁻¹Z'] satisfies

 1. [I - Z(Z'Z)⁻¹Z']' = [I - Z(Z'Z)⁻¹Z']  (symmetric);
 2. [I - Z(Z'Z)⁻¹Z'][I - Z(Z'Z)⁻¹Z'] = [I - Z(Z'Z)⁻¹Z']  (idempotent);     (7-6)
 3. Z'[I - Z(Z'Z)⁻¹Z'] = Z' - Z' = 0.

Consequently, Z'ε̂ = Z'(y - ŷ) = Z'[I - Z(Z'Z)⁻¹Z']y = 0, so ŷ'ε̂ = β̂'Z'ε̂ = 0. Additionally, ε̂'ε̂ = y'[I - Z(Z'Z)⁻¹Z'][I - Z(Z'Z)⁻¹Z']y = y'[I - Z(Z'Z)⁻¹Z']y = y'y - y'Zβ̂. To verify the expression for β̂, we write

 y - Zb = y - Zβ̂ + Zβ̂ - Zb = y - Zβ̂ + Z(β̂ - b)

so

 S(b) = (y - Zb)'(y - Zb)
      = (y - Zβ̂)'(y - Zβ̂) + (β̂ - b)'Z'Z(β̂ - b) + 2(y - Zβ̂)'Z(β̂ - b)
      = (y - Zβ̂)'(y - Zβ̂) + (β̂ - b)'Z'Z(β̂ - b)

since (y - Zβ̂)'Z = ε̂'Z = 0'. The first term in S(b) does not depend on b, and the second is the squared length of Z(β̂ - b). Because Z has full rank, Z(β̂ - b) ≠ 0 if β̂ ≠ b, so the minimum sum of squares is unique and occurs for b = β̂ = (Z'Z)⁻¹Z'y. Note that (Z'Z)⁻¹ exists since Z'Z has rank r + 1 ≤ n. (If Z'Z is not of full rank, Z'Za = 0 for some a ≠ 0, but then a'Z'Za = 0 or Za = 0, which contradicts Z having full rank r + 1.)  •
Result 7.1 shows how the least squares estimates β̂ and the residuals ε̂ can be obtained from the design matrix Z and responses y by simple matrix operations.
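The matrix operations in Result 7.1 are simple to carry out directly. The following Python/numpy sketch (not part of the text; the data values are made up purely for illustration) computes β̂, the hat matrix, the fitted values, and the residuals for a straight-line model.

```python
import numpy as np

# Hypothetical illustrative data (not from the text)
z1 = np.array([0., 1., 2., 3., 4.])
y  = np.array([1., 3., 3., 6., 8.])

# Design matrix with a leading column of ones for the intercept beta_0
Z = np.column_stack([np.ones_like(z1), z1])

# Least squares estimate beta_hat = (Z'Z)^{-1} Z'y  (Result 7.1)
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# "Hat" matrix H = Z (Z'Z)^{-1} Z', fitted values, and residuals
H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
y_hat = Z @ beta_hat
eps_hat = y - y_hat

print(beta_hat)                        # [beta0_hat, beta1_hat]
print(eps_hat @ eps_hat)               # residual sum of squares
print(np.allclose(Z.T @ eps_hat, 0))   # residuals orthogonal to the columns of Z
```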
Example 7.3 (Calculating the least squares estimates, the residuals, and the residual sum of squares). Calculate the least squares estimates β̂, the residuals ε̂, and the residual sum of squares for a straight-line model

 Y_j = β_0 + β_1 z_j1 + ε_j

fit to the data.

According to Result 7.1, ŷ'ε̂ = 0, so the total response sum of squares y'y = Σ_{j=1}^n y_j² satisfies

 y'y = (ŷ + y - ŷ)'(ŷ + y - ŷ) = (ŷ + ε̂)'(ŷ + ε̂) = ŷ'ŷ + ε̂'ε̂     (7-7)
Since the first column of Z is 1, the condition Z'ε̂ = 0 includes the requirement 0 = 1'ε̂ = Σ_{j=1}^n ε̂_j = Σ_{j=1}^n y_j - Σ_{j=1}^n ŷ_j, so the mean of the fitted values equals ȳ. Subtracting nȳ² from both sides of (7-7), we obtain the decomposition of the sum of squares about the mean:

 Σ_{j=1}^n (y_j - ȳ)² = Σ_{j=1}^n (ŷ_j - ȳ)² + Σ_{j=1}^n ε̂_j²     (7-8)

The preceding sum of squares decomposition suggests that the quality of the model's fit can be measured by the coefficient of determination

 R² = 1 - Σ_{j=1}^n ε̂_j² / Σ_{j=1}^n (y_j - ȳ)² = Σ_{j=1}^n (ŷ_j - ȳ)² / Σ_{j=1}^n (y_j - ȳ)²     (7-9)

The quantity R² gives the proportion of the total variation in the y_j's "explained" by, or attributable to, the predictor variables z_1, z_2, ..., z_r. Here R² (or the multiple correlation coefficient R = +√R²) equals 1 if the fitted equation passes through all the data points, so that ε̂_j = 0 for all j. At the other extreme, R² is 0 if β̂_0 = ȳ and β̂_1 = β̂_2 = ... = β̂_r = 0. In this case, the predictor variables z_1, z_2, ..., z_r have no influence on the response.
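Continuing the illustrative numpy sketch above (again, not from the text), R² follows directly from the residuals and the total sum of squares about the mean.

```python
# Coefficient of determination (7-9), using y and eps_hat from the sketch above
SSE = eps_hat @ eps_hat                  # residual sum of squares
SST = np.sum((y - y.mean()) ** 2)        # total sum of squares about the mean
R2 = 1.0 - SSE / SST
print(R2)
```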
Geometry of Least Squares

A geometrical interpretation of the least squares technique highlights the nature of the concept. According to the classical linear regression model, the mean response vector E(Y) = Zβ is a linear combination of the columns of Z, so it lies in the plane (subspace) spanned by those columns. The least squares criterion chooses β̂ so that ŷ = Zβ̂ is the perpendicular projection of y on the plane consisting of all linear combinations of the columns of Z. The residual vector ε̂ = y - ŷ is perpendicular to that plane. This geometry holds even when Z is not of full rank.
When Z has full rank, the projection operation is expressed analytically as multiplication by the matrix Z(Z'Z)⁻¹Z'. To see this, we use the spectral decomposition (2-16) to write

 Z'Z = λ_1 e_1e_1' + λ_2 e_2e_2' + ... + λ_{r+1} e_{r+1}e_{r+1}'

where λ_1 ≥ λ_2 ≥ ... ≥ λ_{r+1} > 0 are the eigenvalues of Z'Z and e_1, e_2, ..., e_{r+1} are the corresponding eigenvectors. If Z is of full rank,

 (Z'Z)⁻¹ = (1/λ_1) e_1e_1' + (1/λ_2) e_2e_2' + ... + (1/λ_{r+1}) e_{r+1}e_{r+1}'

Consider q_i = λ_i^{-1/2} Z e_i, which is a linear combination of the columns of Z. Then q_i'q_k = λ_i^{-1/2} λ_k^{-1/2} e_i'Z'Z e_k = λ_i^{-1/2} λ_k^{-1/2} λ_k e_i'e_k = 0 if i ≠ k, or 1 if i = k. That is, the r + 1 vectors q_i are mutually perpendicular and have unit length. Their linear combinations span the space of all linear combinations of the columns of Z. Moreover,

 Z(Z'Z)⁻¹Z' = Σ_{i=1}^{r+1} λ_i⁻¹ Z e_ie_i' Z' = Σ_{i=1}^{r+1} q_iq_i'

According to Result 2A.2 and Definition 2A.12, the projection of y on a linear combination of {q_1, q_2, ..., q_{r+1}} is

 Σ_{i=1}^{r+1} (q_i'y) q_i = (Σ_{i=1}^{r+1} q_iq_i') y = Z(Z'Z)⁻¹Z'y = Zβ̂

Thus, multiplication by Z(Z'Z)⁻¹Z' projects a vector onto the space spanned by the columns of Z.² Similarly, [I - Z(Z'Z)⁻¹Z'] is the matrix for the projection of y on the plane perpendicular to the plane spanned by the columns of Z.

²If Z is not of full rank, we can use a generalized inverse (Z'Z)⁻ in place of (Z'Z)⁻¹. Then Z(Z'Z)⁻Z' has rank r_1 + 1 and generates the unique projection of y on the space spanned by the linearly independent columns of Z. This is true for any choice of the generalized inverse. (See [20].)
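The identity Z(Z'Z)⁻¹Z' = Σ q_iq_i' can be verified numerically. A minimal sketch (assuming the full-rank Z and hat matrix H from the earlier illustration; not code from the text):

```python
import numpy as np

# Spectral decomposition of Z'Z: eigenvalues lam, orthonormal eigenvectors in the columns of E
lam, E = np.linalg.eigh(Z.T @ Z)

# q_i = lam_i^{-1/2} Z e_i, assembled as the columns of Q (requires all lam_i > 0)
Q = (Z @ E) / np.sqrt(lam)

print(np.allclose(Q.T @ Q, np.eye(Q.shape[1])))   # the q_i are orthonormal
print(np.allclose(Q @ Q.T, H))                    # sum of q_i q_i' equals the projection (hat) matrix
```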
Sampling Properties of Classical Least Squares Estimators

The least squares estimator β̂ and the residuals ε̂ have the sampling properties detailed in the next result.

Result 7.2. Under the general linear regression model in (7-3), the least squares estimator β̂ = (Z'Z)⁻¹Z'Y has

 E(β̂) = β   and   Cov(β̂) = σ²(Z'Z)⁻¹

The residuals ε̂ have the properties

 E(ε̂) = 0   and   Cov(ε̂) = σ²[I - Z(Z'Z)⁻¹Z'] = σ²[I - H]

Also, E(ε̂'ε̂) = (n - r - 1)σ², so defining

 s² = ε̂'ε̂/(n - r - 1) = Y'[I - Z(Z'Z)⁻¹Z']Y/(n - r - 1)

we have E(s²) = σ². Moreover, β̂ and ε̂ are uncorrelated.
Proof. Before the response Y = Zβ + ε is observed, it is a random vector. Now,

 β̂ = (Z'Z)⁻¹Z'Y = (Z'Z)⁻¹Z'(Zβ + ε) = β + (Z'Z)⁻¹Z'ε
 ε̂ = [I - Z(Z'Z)⁻¹Z']Y = [I - Z(Z'Z)⁻¹Z'](Zβ + ε) = [I - Z(Z'Z)⁻¹Z']ε     (7-10)

since [I - Z(Z'Z)⁻¹Z']Z = Z - Z = 0. From (2-24) and (2-45),

 E(β̂) = β + (Z'Z)⁻¹Z'E(ε) = β
 Cov(β̂) = (Z'Z)⁻¹Z' Cov(ε) Z(Z'Z)⁻¹ = σ²(Z'Z)⁻¹Z'Z(Z'Z)⁻¹ = σ²(Z'Z)⁻¹
 E(ε̂) = [I - Z(Z'Z)⁻¹Z']E(ε) = 0
 Cov(ε̂) = [I - Z(Z'Z)⁻¹Z'] Cov(ε) [I - Z(Z'Z)⁻¹Z']' = σ²[I - Z(Z'Z)⁻¹Z']

where the last equality follows from (7-6). Also,

 Cov(β̂, ε̂) = E[(β̂ - β)ε̂'] = (Z'Z)⁻¹Z'E(εε')[I - Z(Z'Z)⁻¹Z'] = σ²(Z'Z)⁻¹Z'[I - Z(Z'Z)⁻¹Z'] = 0

because Z'[I - Z(Z'Z)⁻¹Z'] = 0. From (7-10), (7-6), and Result 4.9,

 ε̂'ε̂ = ε'[I - Z(Z'Z)⁻¹Z'][I - Z(Z'Z)⁻¹Z']ε = ε'[I - Z(Z'Z)⁻¹Z']ε
      = tr(ε'[I - Z(Z'Z)⁻¹Z']ε) = tr([I - Z(Z'Z)⁻¹Z']εε')

Now, for an arbitrary n × n random matrix W,

 E(tr(W)) = E(W_11 + W_22 + ... + W_nn) = E(W_11) + E(W_22) + ... + E(W_nn) = tr[E(W)]

Thus, using Result 2A.12, we obtain

 E(ε̂'ε̂) = tr([I - Z(Z'Z)⁻¹Z']E(εε')) = σ² tr[I - Z(Z'Z)⁻¹Z'] = σ²(n - tr[(Z'Z)⁻¹Z'Z]) = σ²(n - r - 1)  •
The least squares estimator β̂ possesses a minimum variance property that was first established by Gauss. The following result concerns "best" estimators of linear parametric functions of the form c'β = c_0β_0 + c_1β_1 + ... + c_rβ_r for any c.

Result 7.3 (Gauss'³ least squares theorem). Let Y = Zβ + ε, where E(ε) = 0, Cov(ε) = σ²I, and Z has full rank r + 1. For any c, the estimator

 c'β̂ = c_0β̂_0 + c_1β̂_1 + ... + c_rβ̂_r

of c'β has the smallest possible variance among all linear estimators of the form

 a'Y = a_1Y_1 + a_2Y_2 + ... + a_nY_n

that are unbiased for c'β.

³Much later, Markov proved a less general result, which misled many writers into attaching his name to this theorem.
Proof. For any fixed c, let a'Y be any unbiased estimator of c'β. Then E(a'Y) = c'β, whatever the value of β. Also, by assumption, E(a'Y) = E(a'Zβ + a'ε) = a'Zβ. Equating the two expected value expressions yields a'Zβ = c'β or (c' - a'Z)β = 0 for all β, including the choice β = (c' - a'Z)'. This implies that c' = a'Z for any unbiased estimator.
Now, c'β̂ = c'(Z'Z)⁻¹Z'Y = a*'Y with a* = Z(Z'Z)⁻¹c. Moreover, from Result 7.2, E(β̂) = β, so c'β̂ = a*'Y is an unbiased estimator of c'β. Thus, for any a satisfying the unbiased requirement c' = a'Z,

 Var(a'Y) = Var(a'Zβ + a'ε) = Var(a'ε) = σ² a'a
          = σ²(a - a* + a*)'(a - a* + a*)
          = σ²[(a - a*)'(a - a*) + a*'a*]

since (a - a*)'a* = (a - a*)'Z(Z'Z)⁻¹c = 0 from the condition (a - a*)'Z = a'Z - a*'Z = c' - c' = 0'. Because a* is fixed and (a - a*)'(a - a*) is positive unless a = a*, Var(a'Y) is minimized by the choice a*'Y = c'(Z'Z)⁻¹Z'Y = c'β̂.  •

This powerful result states that substitution of β̂ for β leads to the best estimator of c'β for any c of interest. In statistical terminology, the estimator c'β̂ is called the best (minimum-variance) linear unbiased estimator (BLUE) of c'β.
7.4 INFERENCES ABOUT THE REGRESSION MODEL

We describe inferential procedures based on the classical linear regression model in (7-3) with the additional (tentative) assumption that the errors ε have a normal distribution. Methods for checking the general adequacy of the model are considered in Section 7.6.

Inferences Concerning the Regression Parameters

Before we can assess the importance of particular variables in the regression function

 E(Y) = β_0 + β_1 z_1 + ... + β_r z_r     (7-11)

we must determine the sampling distributions of β̂ and the residual sum of squares, ε̂'ε̂. To do so, we shall assume that the errors ε have a normal distribution.
Result 7.4. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is distributed as N_n(0, σ²I). Then the maximum likelihood estimator of β is the same as the least squares estimator β̂. Moreover,

 β̂ = (Z'Z)⁻¹Z'Y   is distributed as   N_{r+1}(β, σ²(Z'Z)⁻¹)

and is distributed independently of the residuals ε̂ = Y - Zβ̂. Further,

 nσ̂² = ε̂'ε̂   is distributed as   σ²χ²_{n-r-1}

where σ̂² is the maximum likelihood estimator of σ².
Proof. Given the data and the normal assumption for the errors, the likelihood function for β, σ² is

 L(β, σ²) = ∏_{j=1}^n (1/(√(2π) σ)) e^{-ε_j²/2σ²} = (2π)^{-n/2} σ^{-n} e^{-(y - Zβ)'(y - Zβ)/2σ²}

For a fixed value σ², the likelihood is maximized by minimizing (y - Zβ)'(y - Zβ). But this minimization yields the least squares estimate β̂ = (Z'Z)⁻¹Z'y, which does not depend upon σ². Therefore, under the normal assumption, the maximum likelihood and least squares approaches provide the same estimator β̂. Next, maximizing L(β̂, σ²) over σ² [see (4-18)] gives

 L(β̂, σ̂²) = (2π)^{-n/2}(σ̂²)^{-n/2} e^{-n/2}   where   σ̂² = (y - Zβ̂)'(y - Zβ̂)/n     (7-12)

From (7-10), we can express β̂ and ε̂ as linear combinations of the normal variables ε. Because Z is fixed, Result 4.3 implies the joint normality of β̂ and ε̂. Their mean vectors and covariance matrices were obtained in Result 7.2; in particular, using (7-6), Cov(β̂, ε̂) = 0. Since Cov(β̂, ε̂) = 0 for the normal random vectors β̂ and ε̂, these vectors are independent. (See Result 4.5.)
Next, let (λ, e) be any eigenvalue-eigenvector pair for I - Z(Z'Z)⁻¹Z'. Then, by (7-6), [I - Z(Z'Z)⁻¹Z']² = [I - Z(Z'Z)⁻¹Z'], so

 λe = [I - Z(Z'Z)⁻¹Z']e = [I - Z(Z'Z)⁻¹Z']²e = λ[I - Z(Z'Z)⁻¹Z']e = λ²e

That is, λ = 0 or 1. Now, tr[I - Z(Z'Z)⁻¹Z'] = n - r - 1 (see the proof of Result 7.2), and from Result 4.9, tr[I - Z(Z'Z)⁻¹Z'] = λ_1 + λ_2 + ... + λ_n, where λ_1 ≥ λ_2 ≥ ... ≥ λ_n are the eigenvalues of I - Z(Z'Z)⁻¹Z'. Consequently, exactly n - r - 1 of the λ's equal 1 and the rest are 0, so we can write

 I - Z(Z'Z)⁻¹Z' = e_1e_1' + e_2e_2' + ... + e_{n-r-1}e_{n-r-1}'

where e_1, e_2, ..., e_{n-r-1} are the normalized eigenvectors associated with the eigenvalues λ_1 = λ_2 = ... = λ_{n-r-1} = 1. Let

 V = [e_1'ε, e_2'ε, ..., e_{n-r-1}'ε]'

Then V is normal with mean vector 0 and

 Cov(V_i, V_k) = E(e_i'εε'e_k) = σ² e_i'e_k = σ² if i = k, and 0 otherwise

That is, the V_i are independent N(0, σ²) and, by (7-10),

 nσ̂² = ε̂'ε̂ = ε'[I - Z(Z'Z)⁻¹Z']ε = V_1² + V_2² + ... + V_{n-r-1}²

is distributed as σ²χ²_{n-r-1}.  •
A confidence ellipsoid for β is easily constructed. It is expressed in terms of the estimated covariance matrix s²(Z'Z)⁻¹, where s² = ε̂'ε̂/(n - r - 1).

Result 7.5. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is N_n(0, σ²I). Then a 100(1 - α)% confidence region for β is given by

 (β - β̂)'Z'Z(β - β̂) ≤ (r + 1) s² F_{r+1,n-r-1}(α)

where F_{r+1,n-r-1}(α) is the upper (100α)th percentile of an F-distribution with r + 1 and n - r - 1 d.f. Also, simultaneous 100(1 - α)% confidence intervals for the β_i are given by

 β̂_i ± √(Var̂(β̂_i)) √((r + 1) F_{r+1,n-r-1}(α)),   i = 0, 1, ..., r

where Var̂(β̂_i) is the diagonal element of s²(Z'Z)⁻¹ corresponding to β̂_i.

Proof. Consider the symmetric square-root matrix (Z'Z)^{1/2}. [See (2-22).] Set V = (Z'Z)^{1/2}(β̂ - β) and note that E(V) = 0,

 Cov(V) = (Z'Z)^{1/2} Cov(β̂) (Z'Z)^{1/2} = σ²(Z'Z)^{1/2}(Z'Z)⁻¹(Z'Z)^{1/2} = σ²I

and V is normally distributed, since it consists of linear combinations of the β̂_i's. Therefore, V'V = (β̂ - β)'(Z'Z)^{1/2}(Z'Z)^{1/2}(β̂ - β) = (β̂ - β)'(Z'Z)(β̂ - β) is distributed as σ²χ²_{r+1}. By Result 7.4, (n - r - 1)s² = ε̂'ε̂ is distributed as σ²χ²_{n-r-1}, independently of β̂ and, hence, independently of V. Consequently, [χ²_{r+1}/(r + 1)]/[χ²_{n-r-1}/(n - r - 1)] = [V'V/(r + 1)]/s² has an F_{r+1,n-r-1} distribution, and the confidence ellipsoid for β follows. Projecting this ellipsoid for (β̂ - β) using Result 5A.1 with A⁻¹ = Z'Z/s², c² = (r + 1)F_{r+1,n-r-1}(α), and u' = [0, ..., 0, 1, 0, ..., 0] yields |β_i - β̂_i| ≤ √((r + 1)F_{r+1,n-r-1}(α)) √(Var̂(β̂_i)), where Var̂(β̂_i) is the diagonal element of s²(Z'Z)⁻¹ corresponding to β̂_i.  •
The confidence ellipsoid is centered at the maximum likelihood estimate β̂, and its orientation and size are determined by the eigenvalues and eigenvectors of Z'Z. If an eigenvalue is nearly zero, the confidence ellipsoid will be very long in the direction of the corresponding eigenvector.
Practitioners often ignore the "simultaneous" confidence property of the interval estimates in Result 7.5. Instead, they replace (r + 1)F_{r+1,n-r-1}(α) with the one-at-a-time t value t_{n-r-1}(α/2) and use the intervals

 β̂_i ± t_{n-r-1}(α/2) √(Var̂(β̂_i))     (7-14)

when searching for important predictor variables.
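A sketch of both kinds of intervals, assuming the Z, y, β̂, and residuals from the earlier illustration (the critical values come from scipy; this is not code from the text):

```python
import numpy as np
from scipy import stats

n, rp1 = Z.shape                                    # rp1 = r + 1
s2 = (eps_hat @ eps_hat) / (n - rp1)                # residual mean square s^2
se = np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))  # estimated standard deviations of beta_hat_i

alpha = 0.05
crit_simul = np.sqrt(rp1 * stats.f.ppf(1 - alpha, rp1, n - rp1))  # simultaneous, Result 7.5
crit_t = stats.t.ppf(1 - alpha / 2, n - rp1)                      # one at a time, (7-14)

for i in range(rp1):
    b = beta_hat[i]
    print(i, (b - crit_simul * se[i], b + crit_simul * se[i]),
             (b - crit_t * se[i], b + crit_t * se[i]))
```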
Example 7.4 (Fitting a regression model to real-estate data). The assessment data in Table 7.1 were gathered from 20 homes in a Milwaukee, Wisconsin, neighborhood. Fit the regression model

 Y_j = β_0 + β_1 z_j1 + β_2 z_j2 + ε_j

where z_1 = total dwelling size (in hundreds of square feet), z_2 = assessed value (in thousands of dollars), and Y = selling price (in thousands of dollars), to these data using the method of least squares.

TABLE 7.1  REAL-ESTATE DATA

 z_1 Total dwelling size (100 ft²)   z_2 Assessed value ($1000)   y Selling price ($1000)
 15.31   57.3    74.8
 15.20   63.8    74.0
 16.25   65.4    72.9
 14.33   57.0    70.0
 14.57   63.8    74.9
 17.33   63.2    76.0
 14.48   60.2    72.0
 14.91   57.7    73.5
 15.25   56.4    74.5
 13.89   55.6    73.5
 15.18   62.6    71.5
 14.44   63.4    71.0
 14.87   60.2    78.9
 18.63   67.2    86.5
 15.20   57.1    68.0
 25.76   89.6   102.0
 19.05   68.6    84.0
 15.37   60.1    69.0
 18.06   66.3    88.0
 16.35   65.8    76.0
A computer calculation yields the fitted equation

 ŷ = 30.967 + 2.634 z_1 + .045 z_2
     (7.88)    (.785)     (.285)

with s = 3.473. The numbers in parentheses are the estimated standard deviations of the least squares coefficients. Also, R² = .834, indicating that the data exhibit a strong regression relationship. (See Panel 7.1, which contains the regression analysis of these data using the SAS statistical software package.)

PANEL 7.1  SAS ANALYSIS FOR EXAMPLE 7.4 USING PROC REG (abridged)
 PROGRAM COMMANDS:  title 'Regression Analysis';  ...
 OUTPUT:  Analysis of Variance: Model Sum of Squares 1032.87506 (Mean Square 516.43753), Error Mean Square 12.05853, Total Sum of Squares 1237.87000; F Value 42.828, Prob > F 0.0001. Dep Mean 76.55000, C.V. 4.53630, R-square 0.8344, Adj R-sq 0.8149. Parameter estimates: T for H0: Parameter = 0 of 3.929, 3.353, and 0.158, with Prob > |T| of 0.0011, 0.0038, and 0.8760.

If the residuals ε̂ pass the diagnostic checks described in Section 7.6, the fitted
equation could be used to predict the selling price of another house in the neighborhood from its size and assessed value. We note that a 95% confidence interval for β_2 [see (7-14)] is given by

 β̂_2 ± t_17(.025) √(Var̂(β̂_2)) = .045 ± 2.110(.285)

or

 (-.556, .647)

Since the confidence interval includes β_2 = 0, the variable z_2 might be dropped from the regression model and the analysis repeated with the single predictor variable z_1. Given dwelling size, assessed value seems to add little to the prediction of selling price.  •
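Because the Table 7.1 data survive in full, the fit in Example 7.4 can be reproduced with a short numpy sketch (not part of the text; up to rounding it should recover the coefficients, standard errors, s, and R² quoted above).

```python
import numpy as np

z1 = np.array([15.31, 15.20, 16.25, 14.33, 14.57, 17.33, 14.48, 14.91, 15.25, 13.89,
               15.18, 14.44, 14.87, 18.63, 15.20, 25.76, 19.05, 15.37, 18.06, 16.35])
z2 = np.array([57.3, 63.8, 65.4, 57.0, 63.8, 63.2, 60.2, 57.7, 56.4, 55.6,
               62.6, 63.4, 60.2, 67.2, 57.1, 89.6, 68.6, 60.1, 66.3, 65.8])
y  = np.array([74.8, 74.0, 72.9, 70.0, 74.9, 76.0, 72.0, 73.5, 74.5, 73.5,
               71.5, 71.0, 78.9, 86.5, 68.0, 102.0, 84.0, 69.0, 88.0, 76.0])

Z = np.column_stack([np.ones_like(y), z1, z2])
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
resid = y - Z @ beta_hat

n, rp1 = Z.shape
s = np.sqrt(resid @ resid / (n - rp1))                 # residual standard deviation
se = s * np.sqrt(np.diag(np.linalg.inv(Z.T @ Z)))      # standard errors of the coefficients
R2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(beta_hat, se, s, R2)
```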
Likelihood Ratio Tests for the Regression Parameters

Part of regression analysis is concerned with assessing the effects of particular predictor variables on the response variable. One null hypothesis of interest states that certain of the z_i's do not influence the response Y. These predictors will be labeled z_{q+1}, z_{q+2}, ..., z_r. The statement that z_{q+1}, z_{q+2}, ..., z_r do not influence Y translates into the statistical hypothesis

 H_0: β_{q+1} = β_{q+2} = ... = β_r = 0,   or   H_0: β_(2) = 0     (7-15)

where β_(2) = [β_{q+1}, β_{q+2}, ..., β_r]'.
Setting Z = [Z_1 | Z_2] and β = [β_(1); β_(2)], where Z_1 contains the first q + 1 columns of Z, we have Y = Z_1β_(1) + Z_2β_(2) + ε. Under H_0: β_(2) = 0, the reduced-model least squares estimate is β̂_(1) = (Z_1'Z_1)⁻¹Z_1'y, with residual sum of squares SS_res(Z_1) = (y - Z_1β̂_(1))'(y - Z_1β̂_(1)). The extra sum of squares due to z_{q+1}, ..., z_r is

 SS_res(Z_1) - SS_res(Z)     (7-16)

where SS_res(Z) = (y - Zβ̂)'(y - Zβ̂) is the residual sum of squares for the full model.
Result 7.6. Let Z have full rank r + 1 and ε be distributed as N_n(0, σ²I). The likelihood ratio test of H_0: β_(2) = 0 is equivalent to a test of H_0 based on the extra sum of squares in (7-16) and s² = (y - Zβ̂)'(y - Zβ̂)/(n - r - 1). In particular, the likelihood ratio test rejects H_0 if

 [SS_res(Z_1) - SS_res(Z)]/(r - q) / s²  >  F_{r-q,n-r-1}(α)
where F_{r-q,n-r-1}(α) is the upper (100α)th percentile of an F-distribution with r - q and n - r - 1 d.f.

Proof. Given the data and the normal assumption, the likelihood associated with the parameters β and σ² is

 L(β, σ²) = (2π)^{-n/2} σ^{-n} e^{-(y - Zβ)'(y - Zβ)/2σ²}

and its maximum, occurring at β̂ = (Z'Z)⁻¹Z'y and σ̂² = (y - Zβ̂)'(y - Zβ̂)/n, is (2π)^{-n/2}(σ̂²)^{-n/2}e^{-n/2}. Under the restriction of the null hypothesis, Y = Z_1β_(1) + ε and the maximum of the likelihood is (2π)^{-n/2}(σ̂_1²)^{-n/2}e^{-n/2}, where σ̂_1² = (y - Z_1β̂_(1))'(y - Z_1β̂_(1))/n. The likelihood ratio is therefore (σ̂²/σ̂_1²)^{n/2}, and rejecting H_0 for small values of this ratio is equivalent to rejecting H_0 for large values of (σ̂_1² - σ̂²)/σ̂² = [SS_res(Z_1) - SS_res(Z)]/SS_res(Z). Rescaling by the degrees of freedom gives the F statistic of the result.  •

In other words, to test whether all coefficients in a subset are zero, fit the model with and without the terms corresponding to these coefficients. The improvement in the residual sum of squares (the extra sum of squares) is compared to the residual sum of squares for the full model via the F-ratio. The same procedure applies even in analysis-of-variance situations where Z is not of full rank.⁴
More generally, it is possible to formulate null hypotheses concerning r - q linear combinations of β of the form H_0: Cβ = A_0. Let the (r - q) × (r + 1) matrix C have full rank, let A_0 = 0, and consider

 H_0: Cβ = 0

(This null hypothesis reduces to the previous choice when C = [0 | I], where I is the (r - q) × (r - q) identity matrix.)
Under the full model, Cβ̂ is distributed as N_{r-q}(Cβ, σ²C(Z'Z)⁻¹C'). We reject H_0: Cβ = 0 at level α if 0 does not lie in the 100(1 - α)% confidence ellipsoid for Cβ. Equivalently, we reject H_0: Cβ = 0 if

 (Cβ̂)'(C(Z'Z)⁻¹C')⁻¹(Cβ̂) / (r - q)s²  >  F_{r-q,n-r-1}(α)     (7-17)

where s² = (y - Zβ̂)'(y - Zβ̂)/(n - r - 1) and F_{r-q,n-r-1}(α) is the upper (100α)th percentile of an F-distribution with r - q and n - r - 1 d.f. The test in (7-17) is the likelihood ratio test, and the numerator in the F-ratio is the extra residual sum of squares incurred by fitting the model subject to the restriction that Cβ = 0. (See [22].)

⁴In situations where Z is not of full rank, rank(Z) replaces r + 1 and rank(Z_1) replaces q + 1 in Result 7.6.
The next example illustrates how unbalanced experimental designs are easily handled by the general theory just described.

Example 7.5 (Testing the importance of additional predictors using the extra sum-of-squares approach). Male and female patrons rated the service in three establishments (locations) of a large restaurant chain. The service ratings were converted into an index. Table 7.2 contains the data for n = 18 customers. Each data point in the table is categorized according to location (1, 2, or 3) and gender (male = 0 and female = 1). This categorization has the format of a two-way table with unequal numbers of observations per cell. For instance, the combination of location 1 and male has 5 responses, while the combination of location 2 and female has 2 responses. Introducing three dummy variables to account for location and two dummy variables to account for gender, we can develop a regression model linking the service index Y to location, gender, and their "interaction" using the design matrix shown below.

TABLE 7.2  RESTAURANT-SERVICE DATA
 Z = [18 × 12 design matrix; its columns fall into the groups constant, location, gender, and interaction, and its rows are grouped according to the number of responses in each location-gender cell]
The coefficient vector can be set out as

 β' = [β_0, β_1, β_2, β_3, τ_1, τ_2, γ_11, γ_12, γ_21, γ_22, γ_31, γ_32]

where the β_i's (i > 0) represent the effects of the locations on the determination of service, the τ_i's represent the effects of gender on the service index, and the γ_ik's represent the location-gender interaction effects.
The design matrix Z is not of full rank. (For instance, column 1 equals the sum of columns 2-4 or columns 5-6.) In fact, rank(Z) = 6.
For the complete model, results from a computer program give

 SS_res(Z) = 2977.4

and n - rank(Z) = 18 - 6 = 12.
The model without the interaction terms has the design matrix Z_1 consisting of the first six columns of Z. We find that

 SS_res(Z_1) = 3419.1

with n - rank(Z_1) = 18 - 4 = 14. To test H_0: γ_11 = γ_12 = γ_21 = γ_22 = γ_31 = γ_32 = 0 (no location-gender interaction), we compute

 F = {[SS_res(Z_1) - SS_res(Z)]/(6 - 4)} / [SS_res(Z)/12] = [(3419.1 - 2977.4)/2] / (2977.4/12) = 0.89
The F-ratio may be compared with an appropriate percentage point of an F-distribution with 2 and 12 d.f. This F-ratio is not significant for any reasonable significance level α. Consequently, we conclude that the service index does not depend upon any location-gender interaction, and these terms can be dropped from the model.
Using the extra sum-of-squares approach, we may verify that there is no difference between locations (no location effect), but that gender is significant; that is, males and females do not give the same ratings to service.
In analysis-of-variance situations where the cell counts are unequal, the variation in the response attributable to different predictor variables and their interactions cannot usually be separated into independent amounts. To evaluate the relative influences of the predictors on the response in this case, it is necessary to fit the model with and without the terms in question and compute the appropriate F-test statistics.  •
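A small sketch of the extra sum-of-squares F-ratio for Example 7.5, using the residual sums of squares reported above and the degrees of freedom 14 and 12 (12 appears in the text; 14 is the reduced-model value filled in above):

```python
from scipy import stats

ss_reduced, df_reduced = 3419.1, 14   # model without the interaction terms
ss_full, df_full = 2977.4, 12         # complete model

extra_df = df_reduced - df_full       # = 2
F = ((ss_reduced - ss_full) / extra_df) / (ss_full / df_full)
print(F, stats.f.sf(F, extra_df, df_full))   # F is about 0.89, clearly not significant
```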
7.5 INFERENCES FROM THE ESTIMATED REGRESSION FUNCTION

Once an investigator is satisfied with the fitted regression model, it can be used to solve two prediction problems. Let z_0' = [1, z_01, ..., z_0r] be selected values for the predictor variables. Then z_0 and β̂ can be used (1) to estimate the regression function β_0 + β_1 z_01 + ... + β_r z_0r at z_0 and (2) to estimate the value of the response Y at z_0.

Estimating the Regression Function at z_0

Let Y_0 denote the value of the response when the predictor variables have values z_0' = [1, z_01, ..., z_0r]. According to the model in (7-3), the expected value of Y_0 is

 E(Y_0 | z_0) = β_0 + β_1 z_01 + ... + β_r z_0r = z_0'β     (7-18)
"
Result 7.7 For the linear regression model in (7-3), z0f3 is the unbiased linear estimator of E(Y0 I z0) with minimum variance, Var(z0P) =
z0( Z'Z ) -1 z0o2 If the errors e are normally distributed, then a 100( 1 - a)% confidence interval for
Trang 22Section 7 5 I nferences from the Est i m ated Reg ress ion Fu nction 3 7 5
is distributed as x� -r - 1/(n - r - 1 ) Consequently, the linear combination z'oP is
(z'oP - z'o{J )j\1 o-2z'o(Z' Z)-1z0 (z'o{J - z'o{J )
v?J;?
is distributed as tn-r-1• The confidence interval follows
Forecasting a New Observation at z_0

Prediction of a new observation, such as Y_0, at z_0' = [1, z_01, ..., z_0r] is more uncertain than estimating the expected value of Y_0. According to the regression model of (7-3),

 Y_0 = z_0'β + ε_0

or

 (new response Y_0) = (expected value of Y_0 at z_0) + (new error)

where ε_0 is distributed as N(0, σ²) and is independent of ε and, hence, of β̂ and s². The errors ε influence the estimators β̂ and s² through the responses Y, but ε_0 does not.

Result 7.8. Given the linear regression model of (7-3), a new observation Y_0 has the unbiased predictor

 z_0'β̂ = β̂_0 + β̂_1 z_01 + ... + β̂_r z_0r

The variance of the forecast error Y_0 - z_0'β̂ is

 Var(Y_0 - z_0'β̂) = σ²(1 + z_0'(Z'Z)⁻¹z_0)

When the errors ε have a normal distribution, a 100(1 - α)% prediction interval for Y_0 is given by

 z_0'β̂ ± t_{n-r-1}(α/2) √(s²(1 + z_0'(Z'Z)⁻¹z_0))

where t_{n-r-1}(α/2) is the upper 100(α/2)th percentile of a t-distribution with n - r - 1 degrees of freedom.
Proof. We forecast Y_0 by z_0'β̂, which estimates E(Y_0 | z_0). By Result 7.7, z_0'β̂ has E(z_0'β̂) = z_0'β and Var(z_0'β̂) = z_0'(Z'Z)⁻¹z_0 σ². The forecast error is then

 Y_0 - z_0'β̂ = z_0'β + ε_0 - z_0'β̂ = ε_0 + z_0'(β - β̂)

Thus, E(Y_0 - z_0'β̂) = E(ε_0) + E(z_0'(β - β̂)) = 0, so the predictor is unbiased. Since ε_0 and β̂ are independent,

 Var(Y_0 - z_0'β̂) = Var(ε_0) + Var(z_0'β̂) = σ² + z_0'(Z'Z)⁻¹z_0 σ² = σ²(1 + z_0'(Z'Z)⁻¹z_0)

If it is further assumed that ε has a normal distribution, then β̂ is normally distributed, and so is the linear combination Y_0 - z_0'β̂. Consequently, (Y_0 - z_0'β̂)/√(σ²(1 + z_0'(Z'Z)⁻¹z_0)) is distributed as N(0, 1). Dividing this ratio by √(s²/σ²), which is distributed as √(χ²_{n-r-1}/(n - r - 1)), we obtain

 (Y_0 - z_0'β̂) / √(s²(1 + z_0'(Z'Z)⁻¹z_0))

which is distributed as t_{n-r-1}. The prediction interval follows immediately.  •
The prediction interval for Y_0 is wider than the confidence interval for estimating the value of the regression function E(Y_0 | z_0) = z_0'β. The additional uncertainty in forecasting Y_0, which is represented by the extra term s² in the expression s²(1 + z_0'(Z'Z)⁻¹z_0), comes from the presence of the unknown error term ε_0.
Example 7.6 (Interval estimates for a mean response and a future response). Companies considering the purchase of a computer must first assess their future needs in order to determine the proper equipment. A computer scientist collected data from seven similar company sites so that a forecast equation of computer-hardware requirements for inventory management could be developed. The data are given in Table 7.3 for

 z_1 = customer orders (in thousands)
 z_2 = add-delete item count (in thousands)
 Y = CPU (central processing unit) time (in hours)

TABLE 7.3  COMPUTER DATA
Source: Data taken from H. P. Artis, Forecasting Computer Requirements: A Forecaster's Dilemma (Piscataway, NJ: Bell Laboratories, 1979).

Construct a 95% confidence interval for the mean CPU time, E(Y_0 | z_0) = β_0 + β_1 z_01 + β_2 z_02, at z_0' = [1, 130, 7.5]. Also, find a 95% prediction interval for a new facility's CPU requirement corresponding to the same z_0.
A computer program provides the estimated regression function and, at z_0' = [1, 130, 7.5], gives

 z_0'β̂ = 151.97   and   s√(z_0'(Z'Z)⁻¹z_0) = 1.204(.58928) = .71

We have t_4(.025) = 2.776, so the 95% confidence interval for the mean CPU time at z_0 is

 z_0'β̂ ± t_4(.025) s√(z_0'(Z'Z)⁻¹z_0) = 151.97 ± 2.776(.71)

or (150.00, 153.94).
Since s√(1 + z_0'(Z'Z)⁻¹z_0) = (1.204)(1.16071) = 1.40, a 95% prediction interval for the CPU time at a new facility with conditions z_0 is

 z_0'β̂ ± t_4(.025) s√(1 + z_0'(Z'Z)⁻¹z_0) = 151.97 ± 2.776(1.40)

or (148.08, 155.86).  •
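The two intervals in Example 7.6 depend only on z_0'β̂ = 151.97, s = 1.204, √(z_0'(Z'Z)⁻¹z_0) = .58928, and t_4(.025), so they are easy to recompute (a sketch, not code from the text):

```python
import numpy as np
from scipy import stats

z0_beta_hat = 151.97    # estimated mean response at z0
s = 1.204               # residual standard deviation
sqrt_q = 0.58928        # sqrt(z0'(Z'Z)^{-1} z0), as quoted in the example
df = 4                  # n - r - 1 = 7 - 2 - 1

t_crit = stats.t.ppf(0.975, df)                      # about 2.776
half_ci = t_crit * s * sqrt_q                        # confidence-interval half-width
half_pi = t_crit * s * np.sqrt(1.0 + sqrt_q ** 2)    # prediction-interval half-width

print((z0_beta_hat - half_ci, z0_beta_hat + half_ci))   # about (150.00, 153.94)
print((z0_beta_hat - half_pi, z0_beta_hat + half_pi))   # about (148.08, 155.86)
```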
7.6 MODEL CHECKING AND OTHER ASPECTS OF REGRESSION

Does the Model Fit?

Assuming that the model is "correct," we have used the estimated regression function to make inferences. Of course, it is imperative to examine the adequacy of the model.
If the model is valid, each residual ε̂_j is an estimate of the error ε_j, which is assumed to be a normal random variable with mean zero and variance σ². Although the residuals ε̂ have expected value 0, their covariance matrix σ²[I - Z(Z'Z)⁻¹Z'] = σ²[I - H] is not diagonal. Residuals have unequal variances and nonzero correlations. Fortunately, the correlations are often small and the variances are nearly equal.
Because the residuals ε̂ have covariance matrix σ²[I - H], the variances of the ε̂_j can vary greatly if the diagonal elements of H, the leverages h_jj, are substantially different. Consequently, many statisticians prefer graphical diagnostics based on studentized residuals. Using the residual mean square s² as an estimate of σ², we have

 Var̂(ε̂_j) = s²(1 - h_jj),   j = 1, 2, ..., n

and the studentized residuals are

 ε̂_j* = ε̂_j / √(s²(1 - h_jj)),   j = 1, 2, ..., n     (7-21)
Residuals should be plotted in various ways to detect possible anomalies. For general diagnostic purposes, the following are useful graphs:

1. Plot the residuals ε̂_j against the predicted values ŷ_j = β̂_0 + β̂_1 z_j1 + ... + β̂_r z_jr. Departures from the assumptions of the model are typically indicated by two types of phenomena:
 (a) A dependence of the residuals on the predicted value. This is illustrated in Figure 7.2(a). The numerical calculations are incorrect, or a β_0 term has been omitted from the model.
 (b) The variance is not constant. The pattern of residuals may be funnel shaped, as in Figure 7.2(b), so that there is large variability for large ŷ and small variability for small ŷ. If this is the case, the variance of the error is not constant, and transformations or a weighted least squares approach (or both) are required. (See Exercise 7.3.) In Figure 7.2(d), the residuals form a horizontal band. This is ideal and indicates equal variances and no dependence on ŷ.
2. Plot the residuals ε̂_j against a predictor variable, such as z_1, or products of predictor variables, such as z_1² or z_1z_2. A systematic pattern in these plots suggests the need for more terms in the model. This situation is illustrated in Figure 7.2(c).
3. Q-Q plots and histograms. Do the errors appear to be normally distributed? To answer this question, the residuals ε̂_j or ε̂_j* can be examined using the techniques discussed in Section 4.6. The Q-Q plots, histograms, and dot diagrams help to detect the presence of unusual observations or severe departures from normality that may require special attention in the analysis. If n is large, minor departures from normality will not greatly affect inferences about β.
4. Plot the residuals versus time. The assumption of independence is crucial, but hard to check. If the data are naturally chronological, a plot of the residuals versus time may reveal a systematic pattern. (A plot of the positions of the residuals in space may also reveal associations among the errors.) For instance, residuals that increase over time indicate a strong positive dependence. A statistical test of independence can be constructed from the first autocorrelation,

 r_1 = Σ_{j=2}^n ε̂_j ε̂_{j-1} / Σ_{j=1}^n ε̂_j²     (7-22)

of residuals from adjacent periods. A popular test based on the statistic

 Σ_{j=2}^n (ε̂_j - ε̂_{j-1})² / Σ_{j=1}^n ε̂_j²  ≈  2(1 - r_1)

is called the Durbin-Watson test. (See [13] for a description of this test and tables of critical values.)
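A sketch of these diagnostics (assuming the Z, residuals eps_hat, and residual mean square s2 from the earlier illustration, with the residuals taken in time order for the autocorrelation; not code from the text):

```python
import numpy as np

# Leverages: diagonal elements of the hat matrix H = Z (Z'Z)^{-1} Z'
h = np.diag(Z @ np.linalg.inv(Z.T @ Z) @ Z.T)

# Studentized residuals (7-21)
studentized = eps_hat / np.sqrt(s2 * (1.0 - h))

# First-lag autocorrelation of the residuals (7-22) and the Durbin-Watson statistic
r1 = np.sum(eps_hat[1:] * eps_hat[:-1]) / np.sum(eps_hat ** 2)
dw = np.sum(np.diff(eps_hat) ** 2) / np.sum(eps_hat ** 2)   # approximately 2(1 - r1)

print(studentized, r1, dw)
```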
Example 7.7 (Residual plots). Three residual plots for the computer data discussed in Example 7.6 are shown in Figure 7.3. The sample size n = 7 is really too small to allow definitive judgments; however, it appears as if the regression assumptions are tenable.  •

If several observations of the response are available for the same values of the predictor variables, then a formal test for lack of fit can be carried out. (See [12] for a discussion of the pure-error lack-of-fit test.)
Leverage and Influence

Although a residual analysis is useful in assessing the fit of a model, departures from the regression model are often hidden by the fitting process. For example, there may be "outliers" in either the response or explanatory variables that can have a considerable effect on the analysis yet are not easily detected from an examination of residual plots. In fact, these outliers may determine the fit.
The leverage h_jj is associated with the jth data point and measures, in the space of the explanatory variables, how far the jth observation is from the other n - 1 observations. For simple linear regression with one explanatory variable z,

 h_jj = 1/n + (z_j - z̄)² / Σ_{i=1}^n (z_i - z̄)²

Observations that significantly affect inferences drawn from the data are said to be influential. Methods for assessing influence are typically based on the change in the vector of parameter estimates, β̂, when observations are deleted. Plots based upon leverage and influence statistics and their use in diagnostic checking of regression models are described in [2], [4], and [9]. These references are recommended for anyone involved in an analysis of regression models.
If, after the diagnostic checks, no serious violations of the assumptions are detected, we can make inferences about β and the future Y values with some assurance that we will not be misled.
Additional Problems in Linear Regression

We shall briefly discuss several important aspects of regression that deserve and receive extensive treatments in texts devoted to regression analysis. (See [9], [10], [12], and [20].)

Selecting predictor variables from a large set. In practice, it is often difficult to formulate an appropriate regression function immediately. Which predictor variables should be included? What form should the regression function take?
When the list of possible predictor variables is very large, not all of the variables can be included in the regression function. Techniques and computer programs designed to select the "best" subset of predictors are now readily available. The good ones try all subsets: z_1 alone, z_2 alone, ..., z_1 and z_2, and so forth. The best choice is decided by
examining some criterion quantity like R². [See (7-9).] However, R² always increases with the inclusion of additional predictor variables. Although this problem can be circumvented by using the adjusted R²,

 R̄² = 1 - (1 - R²)(n - 1)/(n - r - 1)

a better statistic for selecting variables seems to be Mallow's C_p statistic (see [11]),

 C_p = (residual sum of squares for subset model with p parameters, including an intercept) / (residual variance for full model)  -  (n - 2p)

A plot of the pairs (p, C_p), one for each subset of predictors, will indicate models that forecast the observed responses well. Good models typically have (p, C_p) coordinates near the 45° line. In Figure 7.4, we have circled the point corresponding to the "best" subset of predictor variables.

Figure 7.4  C_p plot for computer data from Example 7.6 with three predictor variables (z_1 = orders, z_2 = add-delete count, z_3 = number of items; see the example and original source).
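A brute-force sketch of the C_p computation over all subsets of predictors (illustrative only; it assumes a design matrix Z whose first column is the intercept column and a response vector y, for example the real-estate data given earlier):

```python
import numpy as np
from itertools import combinations

def mallows_cp(Z, y):
    n, k = Z.shape                                  # k = r + 1 columns; column 0 is the intercept
    beta_full = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid_full = y - Z @ beta_full
    s2_full = resid_full @ resid_full / (n - k)     # residual variance for the full model

    results = []
    predictors = range(1, k)                        # indices of the non-intercept columns
    for size in range(1, k):
        for subset in combinations(predictors, size):
            cols = (0,) + subset                    # always keep the intercept
            Zs = Z[:, cols]
            b = np.linalg.solve(Zs.T @ Zs, Zs.T @ y)
            rss = np.sum((y - Zs @ b) ** 2)
            p = len(cols)                           # parameters, including the intercept
            cp = rss / s2_full - (n - 2 * p)
            results.append((subset, p, cp))
    return results

for subset, p, cp in mallows_cp(Z, y):
    print(subset, p, round(cp, 2))
```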
If the list of predictor variables is very long, cost considerations limit the number of models that can be examined. Another approach, called stepwise regression (see [12]), attempts to select important predictors without considering all the possibilities. The procedure can be described by listing the basic steps (algorithm) involved in the computations:

Step 1. All possible simple linear regressions are considered. The predictor variable that explains the largest significant proportion of the variation in Y (the variable that has the largest correlation with the response) is the first variable to enter the regression function.
Step 2. The next variable to enter is the one (out of those not yet included) that makes the largest significant contribution to the regression sum of squares. The significance of the contribution is determined by an F-test. (See Result 7.6.) The value of the F-statistic that must be exceeded before the contribution of a variable is deemed significant is often called the F to enter.
Step 3. Once an additional variable has been included in the equation, the individual contributions to the regression sum of squares of the other variables already in the equation are checked for significance using F-tests. If the F-statistic is less than the one (called the F to remove) corresponding to a prescribed significance level, the variable is deleted from the regression function.
Step 4. Steps 2 and 3 are repeated until all possible additions are nonsignificant and all possible deletions are significant. At this point the selection stops.

Because of the step-by-step procedure, there is no guarantee that this approach will select, for example, the best three variables for prediction. A second drawback is that the (automatic) selection methods are not capable of indicating when transformations of variables are useful.
Colinearity. If Z is not of full rank, some linear combination, such as Za, must equal 0. In this situation, the columns are said to be colinear. This implies that Z'Z does not have an inverse. For most regression analyses, it is unlikely that Za = 0 exactly. Yet, if linear combinations of the columns of Z exist that are nearly 0, the calculation of (Z'Z)⁻¹ is numerically unstable. Typically, the diagonal entries of (Z'Z)⁻¹ will be large. This yields large estimated variances for the β̂_i's, and it is then difficult to detect the "significant" regression coefficients β_i. The problems caused by colinearity can be overcome somewhat by (1) deleting one of a pair of predictor variables that are strongly correlated or (2) relating the response Y to the principal components of the predictor variables; that is, the rows z_j' of Z are treated as a sample, and the first few principal components are calculated as is subsequently described in Section 8.3. The response Y is then regressed on these new predictor variables.
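A quick numerical check for near-colinearity is to examine the eigenvalues of Z'Z; a very small eigenvalue signals a nearly exact linear dependence among the columns. A sketch (assuming the Z from an earlier illustration; not from the text):

```python
import numpy as np

eigvals = np.linalg.eigvalsh(Z.T @ Z)
condition_number = np.sqrt(eigvals.max() / eigvals.min())
print(eigvals, condition_number)
# A tiny smallest eigenvalue (equivalently, a huge condition number) means that
# (Z'Z)^{-1} is numerically unstable and the estimated coefficient variances are inflated.
```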
Bias caused by a misspecified model. Suppose some important predictor variables are omitted from the proposed regression model. That is, suppose the true model has Z = [Z_1 | Z_2] with rank r + 1 and

 Y = Z_1β_(1) + Z_2β_(2) + ε     (7-23)

where E(ε) = 0 and Var(ε) = σ²I. However, the investigator unknowingly fits a model using only the first q predictors by minimizing the error sum of squares (Y - Z_1β_(1))'(Y - Z_1β_(1)). The least squares estimator of β_(1) is β̂_(1) = (Z_1'Z_1)⁻¹Z_1'Y. Then, unlike the situation when the model is correct,

 E(β̂_(1)) = (Z_1'Z_1)⁻¹Z_1'E(Y) = (Z_1'Z_1)⁻¹Z_1'(Z_1β_(1) + Z_2β_(2) + E(ε)) = β_(1) + (Z_1'Z_1)⁻¹Z_1'Z_2β_(2)     (7-24)

That is, β̂_(1) is a biased estimator of β_(1) unless the columns of Z_1 are perpendicular to those of Z_2 (that is, Z_1'Z_2 = 0). If important variables are missing from the model, the least squares estimates β̂_(1) may be misleading.
In this section, we consider the problem of modeling the relationship between m re sponses Y]_, sponse is assumed to follow its own regression model, so that Y2, , Ym and a single set of predictor variables z1, z2, , z, Each re
Yi = 13ol + I31 1 Z1 + . · + l3r 1 Zr + e1 Y2 = 13o2 + I312 Z1 + · · · + 13r2 Zr + e2 (7-25)
Ym = 13om + 13I mZ1 + · · · + l3rmZr + em The error term e' = [e1 , e2, , em] has E(e) = 0 and Var(e) = I Thus, the error terms associated with different responses may be correlated To establish notation conforming to the classical linear regression model, let [ Zjo , Zjr, , Zjr] denote the values of the predictor variables for the jth trial, let Yj = [lj 1 , lj2, , �·m] be the responses, and let ej = [ ej1 , ej2, , ejm] be the er
rors In matrix notation, the design matrix
e1 e2
Trang 313 84 Chapter 7 M u ltiva r i ate Linear Reg ress ion Models
Simply stated, the ith response Y_(i) follows the linear regression model

 Y_(i) = Zβ_(i) + ε_(i),   i = 1, 2, ..., m     (7-27)

with Cov(ε_(i)) = σ_ii I. However, the errors for different responses on the same trial can be correlated.
Given the outcomes Y and the values of the predictor variables Z with full column rank, we determine the least squares estimates β̂_(i) exclusively from the observations Y_(i) on the ith response. In conformity with the single-response solution, we take

 β̂_(i) = (Z'Z)⁻¹Z'Y_(i)     (7-28)

Collecting these univariate least squares estimates, we obtain

 β̂ = [β̂_(1) | β̂_(2) | ... | β̂_(m)] = (Z'Z)⁻¹Z'[Y_(1) | Y_(2) | ... | Y_(m)]

or

 β̂ = (Z'Z)⁻¹Z'Y     (7-29)
is Y - ZB The error sum of squares and cross products matrix is (Y - ZB) ' (Y - ZB )
= [ (Y(ll - Zb(lJ\(Y(ll - Zb(ll ) (Y(m) - Zb(m) ) (Y(l) - Zb(l) )
by the choice B = f3 Also, the generalized variance I (Y - ZB ) ' (Y - ZB ) I is min
imized by the least squares estimates /J (See Exercise 7.11 for an additional alized sum of squares property.) ,
gener-Using the least squares estimates f3, we can form the matrices of
Predicted values: Y = z{J = Z (Z' Z)-1Z' Y Residuals: e = Y - Y = [I - Z ( Z'Z)-1Z ' ] Y (7-3 1)
Trang 32Section 7 7 M u ltiva riate M u lt i p l e Reg ress ion 3 8 5
The orthogonality conditions among the residuals, predicted values, and columns of Z, which hold in classical linear regression, hold in multivariate multiple regression. They follow from Z'[I - Z(Z'Z)⁻¹Z'] = Z' - Z' = 0. Specifically,

 Z'ε̂ = Z'[I - Z(Z'Z)⁻¹Z']Y = 0     (7-32)

so the residuals ε̂_(i) are perpendicular to the columns of Z. Also,

 Ŷ'ε̂ = β̂'Z'[I - Z(Z'Z)⁻¹Z']Y = 0     (7-33)

confirming that the predicted values Ŷ_(i) are perpendicular to all residual vectors ε̂_(k). Because Y = Ŷ + ε̂,

 Y'Y = (Ŷ + ε̂)'(Ŷ + ε̂) = Ŷ'Ŷ + ε̂'ε̂

or

 (total sum of squares and cross products) = (predicted sum of squares and cross products) + (residual (error) sum of squares and cross products)

The residual sum of squares and cross products can also be written as

 ε̂'ε̂ = Y'Y - Ŷ'Ŷ = Y'Y - β̂'Z'Zβ̂
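Computationally, the multivariate fit is just m univariate least squares fits sharing a single design matrix, so the whole calculation is one matrix solve. A sketch with hypothetical values (not the text's data):

```python
import numpy as np

# Hypothetical data: n = 5 trials, r = 1 predictor, m = 2 responses
z1 = np.array([0., 1., 2., 3., 4.])
Y = np.array([[1., -1.],
              [3.,  0.],
              [3.,  1.],
              [6.,  3.],
              [8.,  2.]])              # columns are Y_(1) and Y_(2)

Z = np.column_stack([np.ones_like(z1), z1])

B_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)   # beta_hat = (Z'Z)^{-1} Z'Y for all responses at once (7-29)
Y_hat = Z @ B_hat                           # predicted values (7-31)
E_hat = Y - Y_hat                           # residual matrix

print(E_hat.T @ E_hat)                                           # residual SS and cross-products
print(np.allclose(Y.T @ Y, Y_hat.T @ Y_hat + E_hat.T @ E_hat))   # decomposition Y'Y = Yhat'Yhat + Ehat'Ehat
print(np.allclose(Z.T @ E_hat, 0))                               # residuals orthogonal to the columns of Z (7-32)
```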
Example 7.8 (Fitting a multivariate straight-line regression model). Fit the multivariate straight-line model

 Y_1 = β_01 + β_11 z_1 + ε_1
 Y_2 = β_02 + β_12 z_1 + ε_2

to two responses Y_1 and Y_2 using the data in Example 7.3. These data, augmented by observations on an additional response, are as follows:
PANEL 7.2  SAS ANALYSIS FOR EXAMPLE 7.8 USING PROC GLM (abridged)
 PROGRAM COMMANDS:  title 'Multivariate Regression Analysis';  ...
 OUTPUT:  General Linear Models Procedure. The univariate results report Y1 Mean 5.00000000 and Y2 Mean 1.00000000, with an F Value of 7.50 and Pr > F of 0.0714 among the recovered entries. The multivariate test section lists Wilks' Lambda, Pillai's Trace, Hotelling-Lawley Trace, and Roy's Greatest Root, followed by parameter estimates for Y1 and Y2.
Result 7.9. For the least squares estimator β̂ = [β̂_(1) | β̂_(2) | ... | β̂_(m)] determined under the multivariate multiple regression model (7-26) with full rank(Z) = r + 1 < n,

 E(β̂_(i)) = β_(i)   and   Cov(β̂_(i), β̂_(k)) = σ_ik(Z'Z)⁻¹,   i, k = 1, 2, ..., m

The residuals ε̂ = [ε̂_(1) | ε̂_(2) | ... | ε̂_(m)] = Y - Zβ̂ satisfy E(ε̂_(i)) = 0 and

 E(ε̂_(i)'ε̂_(k)) = (n - r - 1)σ_ik

so

 E(ε̂) = 0   and   E[ε̂'ε̂/(n - r - 1)] = Σ

Also, ε̂ and β̂ are uncorrelated.
Proof. The ith response follows the multiple regression model

 Y_(i) = Zβ_(i) + ε_(i),   E(ε_(i)) = 0,   and   E(ε_(i)ε_(i)') = σ_ii I

Also, as in (7-10),

 β̂_(i) - β_(i) = (Z'Z)⁻¹Z'Y_(i) - β_(i) = (Z'Z)⁻¹Z'ε_(i)

and

 ε̂_(i) = Y_(i) - Ŷ_(i) = [I - Z(Z'Z)⁻¹Z']Y_(i) = [I - Z(Z'Z)⁻¹Z']ε_(i)

so E(β̂_(i)) = β_(i) and E(ε̂_(i)) = 0.
Next,

 Cov(β̂_(i), β̂_(k)) = E[(β̂_(i) - β_(i))(β̂_(k) - β_(k))'] = (Z'Z)⁻¹Z'E(ε_(i)ε_(k)')Z(Z'Z)⁻¹ = σ_ik(Z'Z)⁻¹

Using Result 4.9 and the proof of Result 7.2, with U any random vector and A a fixed matrix, we have that E[U'AU] = E[tr(AUU')] = tr[A E(UU')]. Consequently,

 E(ε̂_(i)'ε̂_(k)) = E(ε_(i)'[I - Z(Z'Z)⁻¹Z']ε_(k)) = tr([I - Z(Z'Z)⁻¹Z'] σ_ik I) = σ_ik tr[I - Z(Z'Z)⁻¹Z'] = σ_ik(n - r - 1)

as in the proof of Result 7.2. Dividing each entry ε̂_(i)'ε̂_(k) of ε̂'ε̂ by n - r - 1, we obtain the unbiased estimator of Σ. Finally,

 Cov(β̂_(i), ε̂_(k)) = E[(β̂_(i) - β_(i))ε̂_(k)'] = (Z'Z)⁻¹Z'E(ε_(i)ε_(k)')[I - Z(Z'Z)⁻¹Z'] = σ_ik(Z'Z)⁻¹Z'[I - Z(Z'Z)⁻¹Z'] = 0

so each element of β̂ is uncorrelated with each element of ε̂.  •

For fixed predictor values z_0' = [1, z_01, ..., z_0r], the mean responses E(Y_0i | z_0) = z_0'β_(i) are estimated without bias by z_0'β̂_(i), and the estimation errors have covariances

 E[z_0'(β̂_(i) - β_(i))(β̂_(k) - β_(k))'z_0] = z_0'E[(β̂_(i) - β_(i))(β̂_(k) - β_(k))']z_0 = σ_ik z_0'(Z'Z)⁻¹z_0
The related problem is that of forecasting a new observation vector Y_0' = [Y_01, Y_02, ..., Y_0m] at z_0. According to the regression model, Y_0i = z_0'β_(i) + ε_0i, where the "new" error ε_0' = [ε_01, ε_02, ..., ε_0m] is independent of the errors ε and satisfies E(ε_0i) = 0 and E(ε_0iε_0k) = σ_ik. The forecast error for the ith component of Y_0 is

 Y_0i - z_0'β̂_(i) = Y_0i - z_0'β_(i) + z_0'β_(i) - z_0'β̂_(i) = ε_0i - z_0'(β̂_(i) - β_(i))

so E(Y_0i - z_0'β̂_(i)) = E(ε_0i) - z_0'E(β̂_(i) - β_(i)) = 0, indicating that z_0'β̂_(i) is an unbiased predictor of Y_0i. The forecast errors have covariances

 E[(Y_0i - z_0'β̂_(i))(Y_0k - z_0'β̂_(k))]
  = E[(ε_0i - z_0'(β̂_(i) - β_(i)))(ε_0k - z_0'(β̂_(k) - β_(k)))]
  = E(ε_0iε_0k) + z_0'E[(β̂_(i) - β_(i))(β̂_(k) - β_(k))']z_0 - z_0'E[(β̂_(i) - β_(i))ε_0k] - E[ε_0i(β̂_(k) - β_(k))']z_0
  = σ_ik(1 + z_0'(Z'Z)⁻¹z_0)

Note that E[(β̂_(i) - β_(i))ε_0k] = 0 since β̂_(i) = (Z'Z)⁻¹Z'ε_(i) + β_(i) is independent of ε_0. A similar result holds for E[ε_0i(β̂_(k) - β_(k))'].
Maximum likelihood estimators and their distributions can be obtained when the errors ε have a normal distribution.
Result 7.10. Let the multivariate multiple regression model in (7-26) hold with full rank(Z) = r + 1, n ≥ (r + 1) + m, and let the errors ε have a normal distribution. Then

 β̂ = (Z'Z)⁻¹Z'Y

is the maximum likelihood estimator of β, and β̂ has a normal distribution with E(β̂) = β and Cov(β̂_(i), β̂_(k)) = σ_ik(Z'Z)⁻¹. Also, β̂ is independent of the maximum likelihood estimator of the positive definite Σ given by

 Σ̂ = (1/n) ε̂'ε̂ = (1/n)(Y - Zβ̂)'(Y - Zβ̂)

Proof. According to the regression model, the likelihood is determined from the data Y = [Y_1, Y_2, ..., Y_n]', whose rows are independent, with Y_j distributed as N_m(β'z_j, Σ). We first note that Y - Zβ = [Y_1 - β'z_1, Y_2 - β'z_2, ..., Y_n - β'z_n]', so

 (Y - Zβ)'(Y - Zβ) = Σ_{j=1}^n (Y_j - β'z_j)(Y_j - β'z_j)'

and the exponent in the likelihood is -½ Σ_{j=1}^n (Y_j - β'z_j)'Σ⁻¹(Y_j - β'z_j) = -½ tr[Σ⁻¹(Y - Zβ)'(Y - Zβ)]. Adding and subtracting Zβ̂ and using ε̂'Z = 0 gives

 (Y - Zβ)'(Y - Zβ) = ε̂'ε̂ + (β̂ - β)'Z'Z(β̂ - β)

so, for any Σ, the likelihood is maximized over β by making tr[Σ⁻¹(β̂ - β)'Z'Z(β̂ - β)] = tr[Z(β - β̂)Σ⁻¹(β - β̂)'Z'] as small as possible. The matrix Z(β - β̂)Σ⁻¹(β - β̂)'Z' is of the form A'A, with A = Σ^{-1/2}(β - β̂)'Z', and, from Exercise 2.16, it is nonnegative definite. Therefore, its eigenvalues are nonnegative also. Since, by Result 4.9, its trace is the sum of its eigenvalues, the trace is zero only when the matrix itself is zero; and β_(i) ≠ β̂_(i) implies that Z(β_(i) - β̂_(i)) ≠ 0, in which case

 tr[Z(β - β̂)Σ⁻¹(β - β̂)'Z'] ≥ c'Σ⁻¹c > 0

where c' is any nonzero row of Z(β - β̂). Thus the likelihood is maximized at β = β̂. Applying Result 4.10 with B = ε̂'ε̂, b = n/2, and p = m, we find that β̂ and Σ̂ = n⁻¹ε̂'ε̂ are the maximum likelihood estimators of β and Σ, respectively.
It remains to establish the distributional results. From (7-36), we know that β̂_(i) and ε̂_(i) are linear combinations of the elements of ε. Specifically,

 β̂_(i) = (Z'Z)⁻¹Z'ε_(i) + β_(i)   and   ε̂_(i) = [I - Z(Z'Z)⁻¹Z']ε_(i),   i = 1, 2, ..., m

Therefore, by Result 4.3, β̂_(1), β̂_(2), ..., β̂_(m), ε̂_(1), ε̂_(2), ..., ε̂_(m) are jointly normal. Their mean vectors and covariance matrices are given in Result 7.9. Since ε̂ and β̂ have a zero covariance matrix, by Result 4.5 they are independent. Further, as in the proof of Result 7.4,

 ε̂'ε̂ = ε'[I - Z(Z'Z)⁻¹Z']ε = Σ_{ℓ=1}^{n-r-1} V_ℓ V_ℓ'

where V_ℓ = ε'e_ℓ and the e_ℓ are the n - r - 1 orthonormal eigenvectors of I - Z(Z'Z)⁻¹Z' associated with the eigenvalue 1.
Comment The multivariate multiple regression model poses no A new computational problems Least squares (maximum likelihood) estimates, f3(i) =
(Z'Z)-1Z'y(i) ' are computed individually for each response variable Note, however, that the model requires that the same predictor variables be used for all responses
Once a multivariate multiple regression model has been fit to the data, it should
be subjected to the diagnostic checks described in Section 7.6 for the single-response model The residual vectors [ ej1 , ej2, • • • , ejm] can be examined for normality or outliers using the techniques in Section 4.6
The remainder of this section is devoted to brief discussions of inference for the normal theory multivariate multiple regression model Extended accounts of these procedures appear in [1] and [22]
Likelihood Ratio Tests for Regression Parameters

The multiresponse analog of (7-15), the hypothesis that the responses do not depend on z_{q+1}, z_{q+2}, ..., z_r, becomes

 H_0: β_(2) = 0   where   β = [β_(1); β_(2)],  β_(1) is (q + 1) × m and β_(2) is (r - q) × m     (7-43)

Setting Z = [Z_1 | Z_2], where Z_1 has q + 1 columns and Z_2 has r - q columns, we can write the general model as

 E(Y) = Zβ = [Z_1 | Z_2][β_(1); β_(2)] = Z_1β_(1) + Z_2β_(2)

Under H_0: β_(2) = 0, Y = Z_1β_(1) + ε, and the likelihood ratio test of H_0 is based on the quantities involved in the extra sum of squares and cross products

 (Y - Z_1β̂_(1))'(Y - Z_1β̂_(1)) - (Y - Zβ̂)'(Y - Zβ̂) = n(Σ̂_1 - Σ̂)

where β̂_(1) = (Z_1'Z_1)⁻¹Z_1'Y and Σ̂_1 = n⁻¹(Y - Z_1β̂_(1))'(Y - Z_1β̂_(1)).
From (7-42), the likelihood ratio, Λ, can be expressed in terms of generalized variances:

 Λ = max_{β_(1),Σ} L(β_(1), Σ) / max_{β,Σ} L(β, Σ) = (|Σ̂| / |Σ̂_1|)^{n/2}
Equivalently, Wilks' lambda statistic

 Λ^{2/n} = |Σ̂| / |Σ̂_1|

can be used.

Result 7.11. Let the multivariate multiple regression model of (7-26) hold with Z of full rank r + 1, (r + 1) + m ≤ n, and normally distributed errors ε. The likelihood ratio test of H_0: β_(2) = 0 is equivalent to rejecting H_0 for large values of

 -2 ln Λ = -n ln(|Σ̂| / |Σ̂_1|) = -n ln(|nΣ̂| / |nΣ̂ + n(Σ̂_1 - Σ̂)|)

For n large,⁵ the modified statistic

 -[n - r - 1 - ½(m - r + q + 1)] ln(|Σ̂| / |Σ̂_1|)

has, to a close approximation, a chi-square distribution with m(r - q) d.f.

Proof. (See Supplement 7A.)  •
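A sketch of the large-sample test in Result 7.11, written as a small helper that takes the two fitted error covariance estimates Σ̂ and Σ̂_1 (illustrative only; the function and argument names are assumptions, not the text's notation):

```python
import numpy as np
from scipy import stats

def wilks_lambda_test(n, r, q, Sigma_hat, Sigma1_hat):
    """Bartlett-corrected chi-square approximation to -2 ln(Lambda) for H0: beta_(2) = 0."""
    m = Sigma_hat.shape[0]
    wilks = np.linalg.det(Sigma_hat) / np.linalg.det(Sigma1_hat)   # Lambda^(2/n)
    stat = -(n - r - 1 - 0.5 * (m - r + q + 1)) * np.log(wilks)
    df = m * (r - q)
    return wilks, stat, df, stats.chi2.sf(stat, df)
```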
If Z is not of full rank, but has rank r_1 + 1, then β̂ = (Z'Z)⁻Z'Y, where (Z'Z)⁻ is the generalized inverse discussed in [19]. (See also Exercise 7.6.) The distributional conclusions stated in Result 7.11 remain the same, provided that r is replaced by r_1 and q + 1 by rank(Z_1). However, not all hypotheses concerning β can

⁵Technically, both n - r and n - m should also be large to obtain a good chi-square approximation.