Ebook Applied multivariate statistical analysis (5th edition) Part 2


Part 2 of Applied Multivariate Statistical Analysis (5th edition) covers multivariate linear regression models; principal components; factor analysis and inference for structured covariance matrices; canonical correlation analysis; discrimination and classification; and clustering, distance methods, and ordination.


on the responses. Unfortunately, the name regression, culled from the title of the first paper on the subject by F. Galton [14], in no way reflects either the importance or breadth of application of this methodology.

In this chapter, we first discuss the multiple regression model for the prediction of a single response. This model is then generalized to handle the prediction of several dependent variables. Our treatment must be somewhat terse, as a vast literature exists on the subject. (If you are interested in pursuing regression analysis, see the following books, in ascending order of difficulty: Bowerman and O'Connell [5]; Neter, Wasserman, Kutner, and Nachtsheim [17]; Draper and Smith [12]; Cook and Weisberg [9]; Seber [20]; and Goldberger [15].) Our abbreviated treatment highlights the regression assumptions and their consequences, alternative formulations of the regression model, and the general applicability of regression techniques to seemingly different situations.

7.2 THE CLASSICAL LINEAR REGRESSION MODEL


Let z₁, z₂, ..., z_r be r predictor variables thought to be related to a response variable Y. For example, with r = 4, we might have

Y = current market value of home


z₁ = square feet of living area
z₂ = location (indicator for zone of city)
z₃ = appraised value last year
z₄ = quality of construction (price per square foot)

The classical linear regression model states that Y is composed of a mean, which depends in a continuous manner on the z_i's, and a random error ε, which accounts for measurement error and the effects of other variables not explicitly considered in the model. The values of the predictor variables recorded from the experiment or set by the investigator are treated as fixed. The error (and hence the response) is viewed as a random variable whose behavior is characterized by a set of distributional assumptions. Specifically, the linear regression model with a single response takes the form

Y = β₀ + β₁z₁ + ··· + β_r z_r + ε
[Response] = [mean (depending on z₁, z₂, ..., z_r)] + [error]

The term "linear" refers to the fact that the mean is a linear function of the unknown parameters as first -order terms {30 , {31 , , f3r · The predictor variables may or may not enter the model With n

independent observations on Y and the associated values of zi, the com­ plete model becomes Y1 = f3o + /31 Z1 1 + f32Z12 +

Yn = f3o + /31Zn 1 + f32Zn2 + · · · + f3rZnr + Bn where the error terms are assumed to have the following properties:

or

1 E(sj) = 0;

2 3 Var Cov (sj , sk) = O, j # k ( sj) = a-2 (constant); and

In matrix notation, (7-1) becomes Yi

and

2 Cov (e) = E(ee' ) = a-21

s1 + s2

Bn

(7-1) (7-2)


Note that a one in the first column of the design matrix Z is the multiplier of the constant term β₀. It is customary to introduce the artificial variable z_{j0} = 1, so that

β₀ + β₁z_{j1} + ··· + β_r z_{jr} = β₀z_{j0} + β₁z_{j1} + ··· + β_r z_{jr}

We now provide some examples of the linear regression model.

Example 7.1 (Fitting a straight-line regression model)

Determine the linear regression model for fitting a straight line

Mean response = E(Y) = β₀ + β₁z₁

to the data. Before the responses Y' = [Y₁, Y₂, ..., Y₅] are observed, the errors ε' = [ε₁, ε₂, ..., ε₅] are random, and we can write

Y = Zβ + ε

The data for this model are contained in the observed response vector y and the design matrix Z.


z₁ = 1 if the observation is from population 1, 0 otherwise
z₂ = 1 if the observation is from population 2, 0 otherwise
z₃ = 1 if the observation is from population 3, 0 otherwise

and β₀ = μ, β₁ = τ₁, β₂ = τ₂, β₃ = τ₃. Then

Y_j = β₀ + β₁z_{j1} + β₂z_{j2} + β₃z_{j3} + ε_j,   j = 1, 2, ..., 8

where we arrange the observations from the three populations in sequence. Thus, we obtain the observed response vector and design matrix


7.3 LEAST SQUARES ESTIMATION

One of the objectives of regression analysis is to develop an equation that will allow the investigator to predict the response for given values of the predictor variables. Thus, it is necessary to "fit" the model in (7-3) to the observed y_j corresponding to the known values 1, z_{j1}, ..., z_{jr}. That is, we must determine the values for the regression coefficients β and the error variance σ² consistent with the available data.

Let b be trial values for β. Consider the difference y_j − b₀ − b₁z_{j1} − ··· − b_r z_{jr} between the observed response y_j and the value b₀ + b₁z_{j1} + ··· + b_r z_{jr} that would be expected if b were the "true" parameter vector. Typically, the differences y_j − b₀ − b₁z_{j1} − ··· − b_r z_{jr} will not be zero, because the response fluctuates (in a manner characterized by the error term assumptions) about its expected value. The method of least squares selects b so as to minimize the sum of the squares of the differences:

S(b) = Σ_{j=1}^n (y_j − b₀ − b₁z_{j1} − ··· − b_r z_{jr})²

The coefficients b chosen by the least squares criterion are called least squares estimates of the regression parameters β. They will henceforth be denoted by β̂ to emphasize their role as estimates of β.

The coefficients β̂ are consistent with the data in the sense that they produce estimated (fitted) mean responses, β̂₀ + β̂₁z_{j1} + ··· + β̂_r z_{jr}, the sum of whose squares of the differences from the observed y_j is as small as possible. The deviations

ε̂_j = y_j − β̂₀ − β̂₁z_{j1} − ··· − β̂_r z_{jr},   j = 1, 2, ..., n        (7-5)

are called residuals. The vector of residuals ε̂ = y − Zβ̂ contains the information about the remaining unknown parameter σ². (See Result 7.2.)

Result 7.1. Let Z have full rank r + 1 ≤ n.¹ The least squares estimate of β in (7-3) is given by

β̂ = (Z'Z)⁻¹Z'y

Let ŷ = Zβ̂ = Hy denote the fitted values of y, where H = Z(Z'Z)⁻¹Z' is called the "hat" matrix. Then the residuals

ε̂ = y − ŷ = [I − Z(Z'Z)⁻¹Z']y = (I − H)y

satisfy Z'ε̂ = 0 and ŷ'ε̂ = 0. Also, the residual sum of squares is

Σ_{j=1}^n (y_j − β̂₀ − β̂₁z_{j1} − ··· − β̂_r z_{jr})² = ε̂'ε̂ = y'[I − Z(Z'Z)⁻¹Z']y = y'y − y'Zβ̂

¹ If Z is not full rank, (Z'Z)⁻¹ is replaced by (Z'Z)⁻, a generalized inverse of Z'Z. (See Exercise 7.6.)


Proof. Let β̂ = (Z'Z)⁻¹Z'y as asserted. Then ε̂ = y − ŷ = y − Zβ̂ = [I − Z(Z'Z)⁻¹Z']y. The matrix [I − Z(Z'Z)⁻¹Z'] satisfies

1. [I − Z(Z'Z)⁻¹Z']' = [I − Z(Z'Z)⁻¹Z']   (symmetric);
2. [I − Z(Z'Z)⁻¹Z'][I − Z(Z'Z)⁻¹Z'] = [I − Z(Z'Z)⁻¹Z']   (idempotent);        (7-6)
3. Z'[I − Z(Z'Z)⁻¹Z'] = Z' − Z' = 0.

Consequently, Z'ε̂ = Z'(y − ŷ) = Z'[I − Z(Z'Z)⁻¹Z']y = 0, so ŷ'ε̂ = β̂'Z'ε̂ = 0. Additionally, ε̂'ε̂ = y'[I − Z(Z'Z)⁻¹Z'][I − Z(Z'Z)⁻¹Z']y = y'[I − Z(Z'Z)⁻¹Z']y = y'y − y'Zβ̂. To verify the expression for β̂, we write

y − Zb = y − Zβ̂ + Zβ̂ − Zb = y − Zβ̂ + Z(β̂ − b)

so

S(b) = (y − Zb)'(y − Zb)
     = (y − Zβ̂)'(y − Zβ̂) + (β̂ − b)'Z'Z(β̂ − b) + 2(y − Zβ̂)'Z(β̂ − b)
     = (y − Zβ̂)'(y − Zβ̂) + (β̂ − b)'Z'Z(β̂ − b)

since (y − Zβ̂)'Z = ε̂'Z = 0'. The first term in S(b) does not depend on b, and the second is the squared length of Z(β̂ − b). Because Z has full rank, Z(β̂ − b) ≠ 0 if b ≠ β̂, so the minimum sum of squares is unique and occurs for b = β̂ = (Z'Z)⁻¹Z'y. Note that (Z'Z)⁻¹ exists since Z'Z has rank r + 1 ≤ n. (If Z'Z is not of full rank, Z'Za = 0 for some a ≠ 0, but then a'Z'Za = 0 or Za = 0, which contradicts Z having full rank r + 1.) ∎

Result 7.1 shows how the least squares estimates β̂ and the residuals ε̂ can be obtained from the design matrix Z and responses y by simple matrix operations.
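These matrix operations translate directly into a few lines of linear-algebra code. The following is a minimal NumPy sketch of the quantities in Result 7.1; the function name and the five illustrative data points are ours, not the text's.

```python
import numpy as np

def least_squares_fit(Z, y):
    """Least squares quantities of Result 7.1 computed by direct matrix operations."""
    ZtZ_inv = np.linalg.inv(Z.T @ Z)      # (Z'Z)^{-1}; assumes Z has full rank
    beta_hat = ZtZ_inv @ Z.T @ y          # beta_hat = (Z'Z)^{-1} Z'y
    y_hat = Z @ beta_hat                  # fitted values y_hat = H y
    resid = y - y_hat                     # residuals, orthogonal to the columns of Z
    rss = resid @ resid                   # residual sum of squares = y'y - y'Z beta_hat
    return beta_hat, y_hat, resid, rss

# Illustrative straight-line fit: Z has a column of ones and a single predictor.
z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 3.0, 8.0, 9.0])
Z = np.column_stack([np.ones_like(z1), z1])
beta_hat, y_hat, resid, rss = least_squares_fit(Z, y)
```

In practice one would use np.linalg.solve or np.linalg.lstsq rather than forming the inverse explicitly; the explicit form is kept here only to mirror the formulas.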

Example 7.3 (Calculating the least squares estimates, the residuals, and the residual sum of squares)

Calculate the least squares estimates β̂, the residuals ε̂, and the residual sum of squares for a straight-line model

Y_j = β₀ + β₁z_{j1} + ε_j

fit to the data


y'y = (ŷ + y − ŷ)'(ŷ + y − ŷ) = (ŷ + ε̂)'(ŷ + ε̂) = ŷ'ŷ + ε̂'ε̂        (7-7)


Since the first column of Z is 1, the condition Z'ε̂ = 0 includes the requirement 0 = Σ_{j=1}^n ε̂_j = Σ_{j=1}^n (y_j − ŷ_j), so the fitted values and the observations have the same mean ȳ. Subtracting nȳ² from both sides of (7-7) gives the decomposition of the sums of squares about the mean,

Σ_{j=1}^n (y_j − ȳ)² = Σ_{j=1}^n (ŷ_j − ȳ)² + Σ_{j=1}^n ε̂_j²        (7-8)

The preceding sum of squares decomposition suggests that the quality of the model's fit can be measured by the coefficient of determination

R² = 1 − Σ_{j=1}^n ε̂_j² / Σ_{j=1}^n (y_j − ȳ)² = Σ_{j=1}^n (ŷ_j − ȳ)² / Σ_{j=1}^n (y_j − ȳ)²        (7-9)

The quantity R² gives the proportion of the total variation in the y_j's "explained" by, or attributable to, the predictor variables z₁, z₂, ..., z_r. Here R² (or the multiple correlation coefficient R = +√R²) equals 1 if the fitted equation passes through all the data points, so that ε̂_j = 0 for all j. At the other extreme, R² is 0 if β̂₀ = ȳ and β̂₁ = β̂₂ = ··· = β̂_r = 0. In this case, the predictor variables z₁, z₂, ..., z_r have no influence on the response.
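As a quick computational check, R² can be obtained from the residuals alone. A minimal sketch follows; the function name is ours, not the text's.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination (7-9): 1 - (residual SS)/(total SS about the mean)."""
    resid = y - y_hat
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)
```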

Geometry of Least Squares

A geometrical interpretation of the least squares technique highlights the nature of the concept. According to the classical linear regression model, the mean response vector E(Y) = Zβ lies in the plane generated by the columns of Z, and the least squares fitted value ŷ = Zβ̂ is the projection of y on the plane consisting of all linear combinations of the columns of Z. The residual vector ε̂ = y − ŷ is perpendicular to that plane. This geometry holds even when Z is not of full rank.

When Z has full rank, the projection operation is expressed analytically as multiplication by the matrix Z(Z'Z)⁻¹Z'. To see this, we use the spectral decomposition (2-16) to write

Z'Z = λ₁e₁e₁' + λ₂e₂e₂' + ··· + λ_{r+1}e_{r+1}e'_{r+1}

where λ₁ ≥ λ₂ ≥ ··· ≥ λ_{r+1} > 0 are the eigenvalues of Z'Z and e₁, e₂, ..., e_{r+1} are the corresponding eigenvectors. If Z is of full rank,

(Z'Z)⁻¹ = (1/λ₁)e₁e₁' + (1/λ₂)e₂e₂' + ··· + (1/λ_{r+1})e_{r+1}e'_{r+1}

Consider q_i = λ_i^{-1/2}Ze_i, which is a linear combination of the columns of Z. Then q_i'q_k = λ_i^{-1/2}λ_k^{-1/2}e_i'Z'Ze_k = λ_i^{-1/2}λ_k^{-1/2}e_i'λ_ke_k = 0 if i ≠ k, or 1 if i = k. That is, the r + 1 vectors q_i are mutually perpendicular and have unit length. Their linear combinations span the space of all linear combinations of the columns of Z. Moreover,

Z(Z'Z)⁻¹Z' = Σ_{i=1}^{r+1} λ_i⁻¹Ze_ie_i'Z' = Σ_{i=1}^{r+1} q_iq_i'


According to Result 2A.2 and Definition 2A.12, the projection of y on a linear combination of {q₁, q₂, ..., q_{r+1}} is

Σ_{i=1}^{r+1} (q_i'y)q_i = (Σ_{i=1}^{r+1} q_iq_i')y = Z(Z'Z)⁻¹Z'y = Zβ̂

Thus, multiplication by Z(Z'Z)⁻¹Z' projects a vector onto the space spanned by the columns of Z.² Similarly, [I − Z(Z'Z)⁻¹Z'] is the matrix for the projection of y on the plane perpendicular to the plane spanned by the columns of Z.

Sampling Properties of Classical Least Squares Estimators

The least squares estimator β̂ and the residuals ε̂ have the sampling properties detailed in the next result.

Result 7.2. Under the general linear regression model in (7-3), the least squares estimator β̂ = (Z'Z)⁻¹Z'Y has

E(β̂) = β   and   Cov(β̂) = σ²(Z'Z)⁻¹

The residuals ε̂ have the properties

E(ε̂) = 0   and   Cov(ε̂) = σ²[I − Z(Z'Z)⁻¹Z'] = σ²[I − H]

Also, E(ε̂'ε̂) = (n − r − 1)σ², so defining

s² = ε̂'ε̂/(n − r − 1) = Y'[I − Z(Z'Z)⁻¹Z']Y/(n − r − 1) = Y'[I − H]Y/(n − r − 1)

we have E(s²) = σ². Moreover, β̂ and ε̂ are uncorrelated.

Proof. Before the response Y = Zβ + ε is observed, it is a random vector. Now,

β̂ = (Z'Z)⁻¹Z'Y = (Z'Z)⁻¹Z'(Zβ + ε) = β + (Z'Z)⁻¹Z'ε
ε̂ = [I − Z(Z'Z)⁻¹Z']Y = [I − Z(Z'Z)⁻¹Z'](Zβ + ε) = [I − Z(Z'Z)⁻¹Z']ε        (7-10)

² If Z is not of full rank, Z(Z'Z)⁻Z' still has rank r₁ + 1 and generates the unique projection of y on the space spanned by the linearly independent columns of Z. This is true for any choice of the generalized inverse. (See [20].)


since [I − Z(Z'Z)⁻¹Z']Z = Z − Z = 0. From (2-24) and (2-45),

E(β̂) = β + (Z'Z)⁻¹Z'E(ε) = β
Cov(β̂) = (Z'Z)⁻¹Z'Cov(ε)Z(Z'Z)⁻¹ = σ²(Z'Z)⁻¹Z'Z(Z'Z)⁻¹ = σ²(Z'Z)⁻¹

E(ε̂) = [I − Z(Z'Z)⁻¹Z']E(ε) = 0
Cov(ε̂) = [I − Z(Z'Z)⁻¹Z']Cov(ε)[I − Z(Z'Z)⁻¹Z']' = σ²[I − Z(Z'Z)⁻¹Z']

where the last equality follows from (7-6). Also,

Cov(β̂, ε̂) = E[(β̂ − β)ε̂'] = (Z'Z)⁻¹Z'E(εε')[I − Z(Z'Z)⁻¹Z'] = σ²(Z'Z)⁻¹Z'[I − Z(Z'Z)⁻¹Z'] = 0

because Z'[I − Z(Z'Z)⁻¹Z'] = 0. From (7-10), (7-6), and Result 4.9,

ε̂'ε̂ = ε'[I − Z(Z'Z)⁻¹Z'][I − Z(Z'Z)⁻¹Z']ε
     = ε'[I − Z(Z'Z)⁻¹Z']ε
     = tr(ε'[I − Z(Z'Z)⁻¹Z']ε)
     = tr([I − Z(Z'Z)⁻¹Z']εε')

Now, for an arbitrary n × n random matrix W,

E(tr(W)) = E(W₁₁ + W₂₂ + ··· + W_nn) = E(W₁₁) + E(W₂₂) + ··· + E(W_nn) = tr[E(W)]

Thus, using Result 2A.12, we obtain

E(ε̂'ε̂) = tr([I − Z(Z'Z)⁻¹Z']E(εε')) = σ² tr[I − Z(Z'Z)⁻¹Z'] = σ²(n − tr[Z(Z'Z)⁻¹Z']) = σ²(n − r − 1)

since tr[Z(Z'Z)⁻¹Z'] = tr[(Z'Z)⁻¹Z'Z] = tr(I_{r+1}) = r + 1. ∎

The least squares estimator β̂ possesses a minimum variance property that was first established by Gauss. The following result concerns "best" estimators of linear parametric functions of the form c'β = c₀β₀ + c₁β₁ + ··· + c_rβ_r for any c.

Result 7.3 (Gauss'³ least squares theorem). Let Y = Zβ + ε, where E(ε) = 0, Cov(ε) = σ²I, and Z has full rank r + 1. For any c, the estimator

c'β̂ = c₀β̂₀ + c₁β̂₁ + ··· + c_rβ̂_r

of c'β has the smallest possible variance among all linear estimators of the form

a'Y = a₁Y₁ + a₂Y₂ + ··· + a_nY_n

that are unbiased for c'β.

³ Much later, Markov proved a less general result, which misled many writers into attaching his name to this theorem.

Proof. For any fixed c, let a'Y be any unbiased estimator of c'β. Then E(a'Y) = c'β, whatever the value of β. Also, by assumption, E(a'Y) = E(a'Zβ + a'ε) = a'Zβ. Equating the two expected value expressions yields a'Zβ = c'β or (c' − a'Z)β = 0 for all β, including the choice β = (c' − a'Z)'. This implies that c' = a'Z for any unbiased estimator.

Now, c'β̂ = c'(Z'Z)⁻¹Z'Y = a*'Y with a* = Z(Z'Z)⁻¹c. Moreover, from Result 7.2, E(β̂) = β, so c'β̂ = a*'Y is an unbiased estimator of c'β. Thus, for any a satisfying the unbiased requirement c' = a'Z,

Var(a'Y) = Var(a'Zβ + a'ε) = Var(a'ε) = a'Iσ²a
         = σ²(a − a* + a*)'(a − a* + a*)
         = σ²[(a − a*)'(a − a*) + a*'a*]

since (a − a*)'a* = (a − a*)'Z(Z'Z)⁻¹c = 0 from the condition (a − a*)'Z = a'Z − a*'Z = c' − c' = 0'. Because a* is fixed and (a − a*)'(a − a*) is positive unless a = a*, Var(a'Y) is minimized by the choice a*'Y = c'(Z'Z)⁻¹Z'Y = c'β̂. ∎

This powerful result states that substitution of β̂ for β leads to the best estimator of c'β for any c of interest. In statistical terminology, the estimator c'β̂ is called the best (minimum-variance) linear unbiased estimator (BLUE) of c'β.

7.4 INFERENCES ABOUT THE REGRESSION MODEL

We describe inferential procedures based on the classical linear regression model in (7-3) with the additional (tentative) assumption that the errors ε have a normal distribution. Methods for checking the general adequacy of the model are considered in Section 7.6.

Inferences Concerning the Regression Parameters

Before we can assess the importance of particular variables in the regression function

E(Y) = β₀ + β₁z₁ + ··· + β_r z_r        (7-11)

we must determine the sampling distributions of β̂ and the residual sum of squares, ε̂'ε̂. To do so, we shall assume that the errors ε have a normal distribution.

Result 7.4. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is distributed as N_n(0, σ²I). Then the maximum likelihood estimator of β is the same as the least squares estimator β̂. Moreover,

β̂ = (Z'Z)⁻¹Z'Y is distributed as N_{r+1}(β, σ²(Z'Z)⁻¹)


and is distributed independently of the residuals ε̂ = Y − Zβ̂. Further,

nσ̂² = ε̂'ε̂ is distributed as σ²χ²_{n−r−1}

where σ̂² is the maximum likelihood estimator of σ².

Proof. Given the data and the normal assumption for the errors, the likelihood function for β, σ² is

L(β, σ²) = Π_{j=1}^n (1/(√(2π)σ)) e^{−ε_j²/2σ²} = (2π)^{−n/2}σ^{−n} e^{−ε'ε/2σ²} = (2π)^{−n/2}σ^{−n} e^{−(y−Zβ)'(y−Zβ)/2σ²}

For a fixed value σ², the likelihood is maximized by minimizing (y − Zβ)'(y − Zβ). But this minimization yields the least squares estimate β̂ = (Z'Z)⁻¹Z'y, which does not depend upon σ². Therefore, under the normal assumption, the maximum likelihood and least squares approaches provide the same estimator β̂. Next, maximizing L(β̂, σ²) over σ² [see (4-18)] gives

L(β̂, σ̂²) = (2π)^{−n/2}(σ̂²)^{−n/2} e^{−n/2}   where   σ̂² = (y − Zβ̂)'(y − Zβ̂)/n        (7-12)

From (7-10), we can express β̂ and ε̂ as linear combinations of the normal variables ε. Specifically,

β̂ = β + (Z'Z)⁻¹Z'ε   and   ε̂ = [I − Z(Z'Z)⁻¹Z']ε

Because Z is fixed, Result 4.3 implies the joint normality of β̂ and ε̂. Their mean vectors and covariance matrices were obtained in Result 7.2. Again, using (7-6), we get

Cov(β̂, ε̂) = σ²(Z'Z)⁻¹Z'[I − Z(Z'Z)⁻¹Z'] = 0

Since Cov(β̂, ε̂) = 0 for the normal random vectors β̂ and ε̂, these vectors are independent. (See Result 4.5.) Next, let (λ, e) be any eigenvalue-eigenvector pair for I − Z(Z'Z)⁻¹Z'. Then, by (7-6), [I − Z(Z'Z)⁻¹Z']² = [I − Z(Z'Z)⁻¹Z'], so

λe = [I − Z(Z'Z)⁻¹Z']e = [I − Z(Z'Z)⁻¹Z']²e = λ[I − Z(Z'Z)⁻¹Z']e = λ²e

That is, λ = 0 or 1. Now, tr[I − Z(Z'Z)⁻¹Z'] = n − r − 1 (see the proof of Result 7.2), and from Result 4.9, tr[I − Z(Z'Z)⁻¹Z'] = λ₁ + λ₂ + ··· + λ_n, where λ₁ ≥ λ₂ ≥ ··· ≥ λ_n are the eigenvalues of I − Z(Z'Z)⁻¹Z'. Consequently, exactly n − r − 1 of the λ_i equal 1, and the rest are 0. It then follows from the spectral decomposition that

I − Z(Z'Z)⁻¹Z' = Σ_{i=1}^{n−r−1} e_ie_i'


where e₁, e₂, ..., e_{n−r−1} are the normalized eigenvectors associated with the eigenvalues λ₁ = λ₂ = ··· = λ_{n−r−1} = 1. Let

V = [V₁, V₂, ..., V_{n−r−1}]' = [e₁'ε, e₂'ε, ..., e'_{n−r−1}ε]'

Then V is normal with mean vector 0 and

Cov(V_i, V_k) = e_i'E(εε')e_k = σ²e_i'e_k = σ² if i = k, and 0 otherwise

That is, the V_i are independent N(0, σ²) and, by (7-10),

nσ̂² = ε̂'ε̂ = ε'[I − Z(Z'Z)⁻¹Z']ε = V₁² + V₂² + ··· + V²_{n−r−1}

is distributed as σ²χ²_{n−r−1}. ∎

A confidence ellipsoid for β is easily constructed. It is expressed in terms of the estimated covariance matrix s²(Z'Z)⁻¹, where s² = ε̂'ε̂/(n − r − 1).

Result 7.5. Let Y = Zβ + ε, where Z has full rank r + 1 and ε is N_n(0, σ²I). Then a 100(1 − α)% confidence region for β is given by

(β̂ − β)'Z'Z(β̂ − β) ≤ (r + 1)s²F_{r+1,n−r−1}(α)

where F_{r+1,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r + 1 and n − r − 1 d.f. Also, simultaneous 100(1 − α)% confidence intervals for the β_i are given by

β̂_i ± √(Var̂(β̂_i)) √((r + 1)F_{r+1,n−r−1}(α)),   i = 0, 1, ..., r

where Var̂(β̂_i) is the diagonal element of s²(Z'Z)⁻¹ corresponding to β̂_i.

Proof. Consider the symmetric square-root matrix (Z'Z)^{1/2}. [See (2-22).] Set V = (Z'Z)^{1/2}(β̂ − β) and note that E(V) = 0 and

Cov(V) = (Z'Z)^{1/2}Cov(β̂)(Z'Z)^{1/2} = σ²(Z'Z)^{1/2}(Z'Z)⁻¹(Z'Z)^{1/2} = σ²I

and V is normally distributed, since it consists of linear combinations of the β̂_i's. Therefore,

V'V = (β̂ − β)'(Z'Z)^{1/2}(Z'Z)^{1/2}(β̂ − β) = (β̂ − β)'(Z'Z)(β̂ − β)

is distributed as σ²χ²_{r+1}. By Result 7.4, (n − r − 1)s² = ε̂'ε̂ is distributed as σ²χ²_{n−r−1}, independently of β̂ and, hence, independently of V. Consequently, [χ²_{r+1}/(r + 1)]/[χ²_{n−r−1}/(n − r − 1)] = [V'V/(r + 1)]/s² has an F_{r+1,n−r−1} distribution, and the confidence ellipsoid for β follows. Projecting this ellipsoid for (β̂ − β) using Result 5A.1 with A⁻¹ = Z'Z/s², c² = (r + 1)F_{r+1,n−r−1}(α), and u' = [0, ..., 0, 1, 0, ..., 0] yields

|β_i − β̂_i| ≤ √((r + 1)F_{r+1,n−r−1}(α)) √(Var̂(β̂_i))

where Var̂(β̂_i) is the diagonal element of s²(Z'Z)⁻¹ corresponding to β̂_i. ∎


The confidence ellipsoid is centered at the maximum likelihood estimate β̂, and its orientation and size are determined by the eigenvalues and eigenvectors of Z'Z. If an eigenvalue is nearly zero, the confidence ellipsoid will be very long in the direction of the corresponding eigenvector.

Practitioners often ignore the "simultaneous" confidence property of the interval estimates in Result 7.5. Instead, they replace (r + 1)F_{r+1,n−r−1}(α) with the one-at-a-time t value t_{n−r−1}(α/2) and use the intervals

β̂_i ± t_{n−r−1}(α/2) √(Var̂(β̂_i))        (7-14)

when searching for important predictor variables.
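Both kinds of intervals are easy to compute from the fitted model. The following is a minimal sketch (the function name and return layout are ours); it uses the simultaneous multiplier of Result 7.5 and the one-at-a-time t multiplier of (7-14).

```python
import numpy as np
from scipy import stats

def coefficient_intervals(Z, y, alpha=0.05):
    """Simultaneous (Result 7.5) and one-at-a-time (7-14) intervals for the beta_i."""
    n, rp1 = Z.shape                               # rp1 = r + 1 parameters
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta_hat = ZtZ_inv @ Z.T @ y
    resid = y - Z @ beta_hat
    s2 = resid @ resid / (n - rp1)                 # s^2 = residual SS / (n - r - 1)
    se = np.sqrt(s2 * np.diag(ZtZ_inv))            # estimated std. dev. of each beta_hat_i
    c_simul = np.sqrt(rp1 * stats.f.ppf(1 - alpha, rp1, n - rp1))
    c_one = stats.t.ppf(1 - alpha / 2, n - rp1)
    simultaneous = np.column_stack([beta_hat - c_simul * se, beta_hat + c_simul * se])
    one_at_a_time = np.column_stack([beta_hat - c_one * se, beta_hat + c_one * se])
    return simultaneous, one_at_a_time
```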

Example 7.4 (Fitting a regression model to real-estate data)

The assessment data in Table 7.1 were gathered from 20 homes in a Milwaukee, Wisconsin, neighborhood. Fit the regression model

Y_j = β₀ + β₁z_{j1} + β₂z_{j2} + ε_j

where z₁ = total dwelling size (in hundreds of square feet), z₂ = assessed value (in thousands of dollars), and Y = selling price (in thousands of dollars), to these data using the method of least squares.

TABLE 7.1 REAL-ESTATE DATA

z₁ Total dwelling size (100 ft²)   z₂ Assessed value ($1000)   y Selling price ($1000)
15.31    57.3     74.8
15.20    63.8     74.0
16.25    65.4     72.9
14.33    57.0     70.0
14.57    63.8     74.9
17.33    63.2     76.0
14.48    60.2     72.0
14.91    57.7     73.5
15.25    56.4     74.5
13.89    55.6     73.5
15.18    62.6     71.5
14.44    63.4     71.0
14.87    60.2     78.9
18.63    67.2     86.5
15.20    57.1     68.0
25.76    89.6    102.0
19.05    68.6     84.0
15.37    60.1     69.0
18.06    66.3     88.0
16.35    65.8     76.0

A computer calculation yields


ŷ = 30.967 + 2.634z₁ + .045z₂
     (7.88)    (.785)    (.285)

with s = 3.473. The numbers in parentheses are the estimated standard deviations of the least squares coefficients. Also, R² = .834, indicating that the data exhibit a strong regression relationship. (See Panel 7.1, which contains the regression analysis of these data using the SAS statistical software package.)

PANEL 7.1 SAS ANALYSIS FOR EXAMPLE 7.4 USING PROC REG

PROGRAM COMMANDS: title 'Regression Analysis'; (remaining PROC REG statements not recovered)

OUTPUT (abridged): analysis-of-variance sums of squares 1032.87506 (model), 204.99494 (error), and 1237.87000 (total); mean squares 516.43753 and 12.05853; F value 42.828 with Prob > F 0.0001; dependent mean 76.55000 and C.V. 4.53630; Adj R-sq 0.8149; t statistics for H0: Parameter = 0 of 3.929, 3.353, and 0.158 with Prob > |T| of 0.0011, 0.0038, and 0.8760.


If the residuals ε̂ pass the diagnostic checks described in Section 7.6, the fitted equation could be used to predict the selling price of another house in the neighborhood from its size and assessed value. We note that a 95% confidence interval for β₂ [see (7-14)] is given by

β̂₂ ± t₁₇(.025) √(Var̂(β̂₂)) = .045 ± 2.110(.285)

or (−.556, .647). Since the confidence interval includes β₂ = 0, the variable z₂ might be dropped from the regression model and the analysis repeated with the single predictor variable z₁. Given dwelling size, assessed value seems to add little to the prediction of selling price. ∎
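The fit in this example can be checked by redoing the computation directly from Table 7.1. The sketch below is our code, not the text's; it should reproduce, up to rounding, the coefficients 30.967, 2.634, and .045, the standard deviations 7.88, .785, and .285, s = 3.473, and R² = .834.

```python
import numpy as np

# Table 7.1: z1 = dwelling size (100 ft^2), z2 = assessed value ($1000), y = selling price ($1000)
z1 = np.array([15.31, 15.20, 16.25, 14.33, 14.57, 17.33, 14.48, 14.91, 15.25, 13.89,
               15.18, 14.44, 14.87, 18.63, 15.20, 25.76, 19.05, 15.37, 18.06, 16.35])
z2 = np.array([57.3, 63.8, 65.4, 57.0, 63.8, 63.2, 60.2, 57.7, 56.4, 55.6,
               62.6, 63.4, 60.2, 67.2, 57.1, 89.6, 68.6, 60.1, 66.3, 65.8])
y = np.array([74.8, 74.0, 72.9, 70.0, 74.9, 76.0, 72.0, 73.5, 74.5, 73.5,
              71.5, 71.0, 78.9, 86.5, 68.0, 102.0, 84.0, 69.0, 88.0, 76.0])

n = len(y)
Z = np.column_stack([np.ones(n), z1, z2])
ZtZ_inv = np.linalg.inv(Z.T @ Z)
beta_hat = ZtZ_inv @ Z.T @ y                            # least squares coefficients
resid = y - Z @ beta_hat
s = np.sqrt(resid @ resid / (n - 3))                    # residual standard deviation
std_errs = s * np.sqrt(np.diag(ZtZ_inv))                # estimated std. dev. of the coefficients
r2 = 1 - (resid @ resid) / np.sum((y - y.mean())**2)    # coefficient of determination (7-9)
print(beta_hat, std_errs, s, r2)
```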

Likelihood Ratio Tests for the Regression Parameters

Part of regression analysis is concerned with assessing the effects of particular predictor variables on the response variable. One null hypothesis of interest states that certain of the z_i's do not influence the response Y. These predictors will be labeled z_{q+1}, z_{q+2}, ..., z_r. The statement that z_{q+1}, z_{q+2}, ..., z_r do not influence Y translates into the statistical hypothesis

H₀: β_{q+1} = β_{q+2} = ··· = β_r = 0,   or   H₀: β_(2) = 0        (7-15)

where β_(2)' = [β_{q+1}, β_{q+2}, ..., β_r]. Setting Z = [Z₁ ⋮ Z₂], where Z₁ contains the column of ones and the columns for z₁, ..., z_q, the reduced model Y = Z₁β_(1) + ε has least squares estimate β̂_(1) = (Z₁'Z₁)⁻¹Z₁'y, and the extra sum of squares from fitting the full model is

SS_res(Z₁) − SS_res(Z) = (y − Z₁β̂_(1))'(y − Z₁β̂_(1)) − (y − Zβ̂)'(y − Zβ̂)        (7-16)

Result 7.6. Let Z have full rank r + 1 and ε be distributed as N_n(0, σ²I). The likelihood ratio test of H₀: β_(2) = 0 is equivalent to a test of H₀ based on the extra sum of squares in (7-16) and s² = (y − Zβ̂)'(y − Zβ̂)/(n − r − 1). In particular, the likelihood ratio test rejects H₀ if

[(SS_res(Z₁) − SS_res(Z))/(r − q)] / s² > F_{r−q,n−r−1}(α)

where F_{r−q,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r − q and n − r − 1 d.f.

Proof. Given the data and the normal assumption, the likelihood associated with the parameters β and σ² is

To test whether all coefficients in a subset are zero, fit the model with and without the terms corresponding to these coefficients. The improvement in the residual sum of squares (the extra sum of squares) is compared to the residual sum of squares for the full model via the F-ratio. The same procedure applies even in analysis-of-variance situations where Z is not of full rank.⁴

More generally, it is possible to formulate null hypotheses concerning r − q linear combinations of β of the form H₀: Cβ = A₀. Let the (r − q) × (r + 1) matrix C have full rank, let A₀ = 0, and consider H₀: Cβ = 0. (This null hypothesis reduces to the previous choice when C = [0 ⋮ I_{(r−q)×(r−q)}].)

Under the full model, Cβ̂ is distributed as N_{r−q}(Cβ, σ²C(Z'Z)⁻¹C'). We reject

⁴ In situations where Z is not of full rank, rank(Z) replaces r + 1 and rank(Z₁) replaces q + 1 in Result 7.6.


H₀: Cβ = 0 at level α if 0 does not lie in the 100(1 − α)% confidence ellipsoid for Cβ. Equivalently, we reject H₀: Cβ = 0 if

(Cβ̂)'(C(Z'Z)⁻¹C')⁻¹(Cβ̂)/s² > (r − q)F_{r−q,n−r−1}(α)        (7-17)

where s² = (y − Zβ̂)'(y − Zβ̂)/(n − r − 1) and F_{r−q,n−r−1}(α) is the upper (100α)th percentile of an F-distribution with r − q and n − r − 1 d.f. The test in (7-17) is the likelihood ratio test, and the numerator in the F-ratio is the extra residual sum of squares incurred by fitting the model subject to the restriction that Cβ = 0. (See [22].)

The next example illustrates how unbalanced experimental designs are easily handled by the general theory just described.

Example 7.5 (Testing the importance of additional predictors using the extra sum-of-squares approach)

Male and female patrons rated the service in three establishments (locations) of a large restaurant chain. The service ratings were converted into an index. Table 7.2 contains the data for n = 18 customers. Each data point in the table is categorized according to location (1, 2, or 3) and gender (male = 0 and female = 1). This categorization has the format of a two-way table with unequal numbers of observations per cell. For instance, the combination of location 1 and male has 5 responses, while the combination of location 2 and female has 2 responses. Introducing three dummy variables to account for location and two dummy variables to account for gender, we can develop a regression model linking the service index Y to location, gender, and their "interaction" using the design matrix shown below.

TABLE 7.2 RESTAURANT-SERVICE DATA


Z = [18 × 12 design matrix of zeros and ones; columns grouped as constant, location, gender, and interaction; rows grouped by the numbers of responses in each location-gender cell]

The coefficient vector can be set out as

β' = [β₀, β₁, β₂, β₃, τ₁, τ₂, γ₁₁, γ₁₂, γ₂₁, γ₂₂, γ₃₁, γ₃₂]

where the β_i's (i > 0) represent the effects of the locations on the determination of service, the τ_i's represent the effects of gender on the service index, and the γ_ik's represent the location-gender interaction effects. The design matrix Z is not of full rank. (For instance, column 1 equals the sum of columns 2-4 or columns 5-6.) In fact, rank(Z) = 6.

For the complete model, results from a computer program give

SS_res(Z) = 2977.4   and   n − rank(Z) = 18 − 6 = 12

The model without the interaction terms has the design matrix Z₁ consisting of the first six columns of Z. We find that

SS_res(Z₁) = 3419.1   with   n − rank(Z₁) = 18 − 4 = 14

so, following the approach of footnote 4, the F-statistic for testing that the interaction terms are zero is

[(SS_res(Z₁) − SS_res(Z))/(6 − 4)] / [SS_res(Z)/12] = [(3419.1 − 2977.4)/2] / 248.1 = .89


The F-ratio may be compared with an appropriate percentage point of an F-distribution with 2 and 12 d.f. This F-ratio is not significant for any reasonable significance level α. Consequently, we conclude that the service index does not depend upon any location-gender interaction, and these terms can be dropped from the model.

Using the extra sum-of-squares approach, we may verify that there is no difference between locations (no location effect), but that gender is significant; that is, males and females do not give the same ratings to service.

In analysis-of-variance situations where the cell counts are unequal, the variation in the response attributable to different predictor variables and their interactions cannot usually be separated into independent amounts. To evaluate the relative influences of the predictors on the response in this case, it is necessary to fit the model with and without the terms in question and compute the appropriate F-test statistics. ∎
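The extra sum-of-squares comparison in this example is mechanical enough to wrap in a small routine. A possible sketch (the function name is ours; matrix ranks are used in place of column counts, as in footnote 4, so rank-deficient ANOVA-type design matrices such as the one above are handled):

```python
import numpy as np
from scipy import stats

def extra_ss_f_test(Z_full, Z_reduced, y, alpha=0.05):
    """F test that the coefficients dropped from Z_full are zero (Result 7.6)."""
    def ss_res(Zmat):
        beta = np.linalg.lstsq(Zmat, y, rcond=None)[0]   # lstsq tolerates rank deficiency
        r = y - Zmat @ beta
        return r @ r
    n = len(y)
    df1 = np.linalg.matrix_rank(Z_full) - np.linalg.matrix_rank(Z_reduced)
    df2 = n - np.linalg.matrix_rank(Z_full)
    f_stat = ((ss_res(Z_reduced) - ss_res(Z_full)) / df1) / (ss_res(Z_full) / df2)
    return f_stat, stats.f.ppf(1 - alpha, df1, df2)
```

With the values reported above, SS_res(Z₁) = 3419.1, SS_res(Z) = 2977.4, and 2 and 12 d.f., the statistic is about .89, well below the tabled F percentile.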

7.5 INFERENCES FROM THE ESTIMATED REGRESSION FUNCTION

Once an investigator is satisfied with the fitted regression model, it can be used to solve two prediction problems. Let z₀' = [1, z₀₁, ..., z₀ᵣ] be selected values for the predictor variables. Then z₀ and β̂ can be used (1) to estimate the regression function β₀ + β₁z₀₁ + ··· + β_r z₀ᵣ at z₀ and (2) to estimate the value of the response Y at z₀.

Estimating the Regression Function at z₀

Let Y₀ denote the value of the response when the predictor variables have values z₀' = [1, z₀₁, ..., z₀ᵣ]. According to the model in (7-3), the expected value of Y₀ is

E(Y₀ | z₀) = β₀ + β₁z₀₁ + ··· + β_r z₀ᵣ = z₀'β        (7-18)

"

Result 7.7 For the linear regression model in (7-3), z0f3 is the unbiased linear estimator of E(Y0 I z0) with minimum variance, Var(z0P) =

z0( Z'Z ) -1 z0o2 If the errors e are normally distributed, then a 100( 1 - a)% confidence interval for


Proof. By Result 7.4, β̂ is normally distributed with Cov(β̂) = σ²(Z'Z)⁻¹, and s²/σ² is distributed as χ²_{n−r−1}/(n − r − 1), independently of β̂. Consequently, the linear combination z₀'β̂ is normally distributed with mean z₀'β and variance z₀'(Z'Z)⁻¹z₀σ², and

(z₀'β̂ − z₀'β) / √(s²z₀'(Z'Z)⁻¹z₀)

is distributed as t_{n−r−1}. The confidence interval follows. ∎

Forecasting a New Observation at z₀

Prediction of a new observation, such as Y₀, at z₀' = [1, z₀₁, ..., z₀ᵣ] is more uncertain than estimating the expected value of Y₀. According to the regression model of (7-3),

Y₀ = z₀'β + ε₀

or

(new response Y₀) = (expected value of Y₀ at z₀) + (new error)

where ε₀ is distributed as N(0, σ²) and is independent of ε and, hence, of β̂ and s². The errors ε influence the estimators β̂ and s² through the responses Y, but ε₀ does not.

Result 7.8. Given the linear regression model of (7-3), a new observation Y₀ has the unbiased predictor

z₀'β̂ = β̂₀ + β̂₁z₀₁ + ··· + β̂_r z₀ᵣ

The variance of the forecast error Y₀ − z₀'β̂ is

Var(Y₀ − z₀'β̂) = σ²(1 + z₀'(Z'Z)⁻¹z₀)

When the errors ε have a normal distribution, a 100(1 − α)% prediction interval for Y₀ is given by

z₀'β̂ ± t_{n−r−1}(α/2) √(s²(1 + z₀'(Z'Z)⁻¹z₀))

where t_{n−r−1}(α/2) is the upper 100(α/2)th percentile of a t-distribution with n − r − 1 degrees of freedom.

Proof.

We forecast Y₀ by z₀'β̂, which estimates E(Y₀ | z₀). By Result 7.7, z₀'β̂ has E(z₀'β̂) = z₀'β and Var(z₀'β̂) = z₀'(Z'Z)⁻¹z₀σ². The forecast error is then

Y₀ − z₀'β̂ = z₀'β + ε₀ − z₀'β̂ = ε₀ + z₀'(β − β̂)

Thus, E(Y₀ − z₀'β̂) = E(ε₀) + z₀'E(β − β̂) = 0, so the predictor is unbiased. Since ε₀ and β̂ are independent,

Var(Y₀ − z₀'β̂) = Var(ε₀) + Var(z₀'β̂) = σ² + z₀'(Z'Z)⁻¹z₀σ² = σ²(1 + z₀'(Z'Z)⁻¹z₀)

If it is further assumed that ε has a normal distribution, then β̂ is normally distributed, and so is the linear combination Y₀ − z₀'β̂. Consequently, (Y₀ − z₀'β̂)/√(σ²(1 + z₀'(Z'Z)⁻¹z₀)) is distributed as N(0, 1). Dividing this ratio by √(s²/σ²), which is distributed as √(χ²_{n−r−1}/(n − r − 1)), we obtain

(Y₀ − z₀'β̂) / √(s²(1 + z₀'(Z'Z)⁻¹z₀))

which is distributed as t_{n−r−1}. The prediction interval follows immediately. ∎


The prediction interval for Y₀ is wider than the confidence interval for estimating the value of the regression function E(Y₀ | z₀) = z₀'β. The additional uncertainty in forecasting Y₀, which is represented by the extra term s² in the expression s²(1 + z₀'(Z'Z)⁻¹z₀), comes from the presence of the unknown error term ε₀.

Example 7.6 (Interval estimates for a mean response and a future response)

Companies considering the purchase of a computer must first assess their future needs in order to determine the proper equipment. A computer scientist collected data from seven similar company sites so that a forecast equation of computer-hardware requirements for inventory management could be developed. The data are given in Table 7.3 for

z₁ = customer orders (in thousands)
z₂ = add-delete item count (in thousands)
Y = CPU (central processing unit) time (in hours)

TABLE 7.3 COMPUTER DATA

Source: Data taken from H. P. Artis, Forecasting Computer Requirements: A Forecaster's Dilemma (Piscataway, NJ: Bell Laboratories, 1979).

Construct a 95% confidence interval for the mean CPU time, E(Y₀ | z₀) = β₀ + β₁z₀₁ + β₂z₀₂, at z₀' = [1, 130, 7.5]. Also, find a 95% prediction interval for a new facility's CPU requirement corresponding to the same z₀.

A computer program provides the estimated regression function


and s√(z₀'(Z'Z)⁻¹z₀) = 1.204(.58928) = .71. We have t₄(.025) = 2.776, so the 95% confidence interval for the mean CPU time at z₀ is

z₀'β̂ ± t₄(.025)s√(z₀'(Z'Z)⁻¹z₀) = 151.97 ± 2.776(.71)

or (150.00, 153.94).

Since s√(1 + z₀'(Z'Z)⁻¹z₀) = (1.204)(1.16071) = 1.40, a 95% prediction interval for the CPU time at a new facility with conditions z₀ is

z₀'β̂ ± t₄(.025)s√(1 + z₀'(Z'Z)⁻¹z₀) = 151.97 ± 2.776(1.40)

or (148.08, 155.86). ∎
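The two interval calculations follow the same template. A minimal sketch (our function; Table 7.3 itself is not reproduced above, so the call is illustrative):

```python
import numpy as np
from scipy import stats

def mean_ci_and_prediction_interval(Z, y, z0, alpha=0.05):
    """100(1 - alpha)% confidence interval for E(Y0|z0) (Result 7.7) and
    prediction interval for a new response Y0 (Result 7.8)."""
    n, rp1 = Z.shape
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta_hat = ZtZ_inv @ Z.T @ y
    resid = y - Z @ beta_hat
    s2 = resid @ resid / (n - rp1)
    h0 = z0 @ ZtZ_inv @ z0                      # z0'(Z'Z)^{-1} z0
    fit = z0 @ beta_hat                         # estimated mean response at z0
    t_crit = stats.t.ppf(1 - alpha / 2, n - rp1)
    half_ci = t_crit * np.sqrt(s2 * h0)
    half_pi = t_crit * np.sqrt(s2 * (1 + h0))
    return (fit - half_ci, fit + half_ci), (fit - half_pi, fit + half_pi)
```

With the computer data, z₀ = [1, 130, 7.5] and the values quoted above, the routine would return (150.00, 153.94) and (148.08, 155.86).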

7.6 MODEL CHECKING AND OTHER ASPECTS OF REGRESSION

Does the Model Fit?

Assuming that the model is "correct," we have used the estimated regression function to make inferences. Of course, it is imperative to examine the adequacy of the model.

If the model is valid, each residual ε̂_j is an estimate of the error ε_j, which is assumed to be a normal random variable with mean zero and variance σ². Although the residuals ε̂ have expected value 0, their covariance matrix σ²[I − Z(Z'Z)⁻¹Z'] = σ²[I − H] is not diagonal. Residuals have unequal variances and nonzero correlations. Fortunately, the correlations are often small and the variances are nearly equal.

Because the residuals ε̂ have covariance matrix σ²[I − H], the variances of the ε̂_j can vary greatly if the diagonal elements of H, the leverages h_jj, are substantially different. Consequently, many statisticians prefer graphical diagnostics based on studentized residuals. Using the residual mean square s² as an estimate of σ², we have

Var̂(ε̂_j) = s²(1 − h_jj),   j = 1, 2, ..., n

and the studentized residuals are

ε̂*_j = ε̂_j / √(s²(1 − h_jj)),   j = 1, 2, ..., n
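Computationally, the leverages are the diagonal of the hat matrix and the studentization is elementwise. A minimal sketch (function name ours):

```python
import numpy as np

def studentized_residuals(Z, y):
    """Leverages h_jj and internally studentized residuals eps_j / sqrt(s^2 (1 - h_jj))."""
    n, rp1 = Z.shape
    H = Z @ np.linalg.inv(Z.T @ Z) @ Z.T     # hat matrix
    h = np.diag(H)                           # leverages h_jj
    resid = y - H @ y                        # residuals (I - H) y
    s2 = resid @ resid / (n - rp1)           # residual mean square
    return h, resid / np.sqrt(s2 * (1 - h))
```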


Residuals should be plotted in various ways to detect possible anomalies. For general diagnostic purposes, the following are useful graphs:

1. Plot the residuals ε̂_j against the predicted values ŷ_j = β̂₀ + β̂₁z_{j1} + ··· + β̂_r z_{jr}. Departures from the assumptions of the model are typically indicated by two types of phenomena:
   (a) A dependence of the residuals on the predicted value. This is illustrated in Figure 7.2(a). The numerical calculations are incorrect, or a β₀ term has been omitted from the model.
   (b) The variance is not constant. The pattern of residuals may be funnel shaped, as in Figure 7.2(b), so that there is large variability for large ŷ and small variability for small ŷ. If this is the case, the variance of the error is not constant, and transformations or a weighted least squares approach (or both) are required. (See Exercise 7.3.) In Figure 7.2(d), the residuals form a horizontal band. This is ideal and indicates equal variances and no dependence on ŷ.

2. Plot the residuals ε̂_j against a predictor variable, such as z₁, or products of predictor variables, such as z₁² or z₁z₂. A systematic pattern in these plots suggests the need for more terms in the model. This situation is illustrated in Figure 7.2(c).

3. Q-Q plots and histograms. Do the errors appear to be normally distributed? To answer this question, the residuals ε̂_j or ε̂*_j can be examined using the techniques discussed in Section 4.6. The Q-Q plots, histograms, and dot diagrams help to detect the presence of unusual observations or severe departures from normality that may require special attention in the analysis. If n is large, minor departures from normality will not greatly affect inferences about β.


4. Plot the residuals versus time. The assumption of independence is crucial, but hard to check. If the data are naturally chronological, a plot of the residuals versus time may reveal a systematic pattern. (A plot of the positions of the residuals in space may also reveal associations among the errors.) For instance, residuals that increase over time indicate a strong positive dependence. A statistical test of independence can be constructed from the first autocorrelation,

r₁ = Σ_{j=2}^n ε̂_j ε̂_{j−1} / Σ_{j=1}^n ε̂_j²        (7-22)

of residuals from adjacent periods. A popular test based on the statistic

Σ_{j=2}^n (ε̂_j − ε̂_{j−1})² / Σ_{j=1}^n ε̂_j² ≈ 2(1 − r₁)

is called the Durbin-Watson test. (See [13] for a description of this test and tables of critical values.)
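Both the first autocorrelation (7-22) and the Durbin-Watson statistic are one-line computations on the residual series. A minimal sketch (function name ours):

```python
import numpy as np

def durbin_watson(resid):
    """First autocorrelation r1 of the residuals (7-22) and the Durbin-Watson
    statistic, which is approximately 2(1 - r1)."""
    r1 = np.sum(resid[1:] * resid[:-1]) / np.sum(resid ** 2)
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
    return r1, dw
```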

Example 7.7 (Residual plots)

Three residual plots for the computer data discussed in Example 7.6 are shown in Figure 7.3. The sample size n = 7 is really too small to allow definitive judgments; however, it appears as if the regression assumptions are tenable. ∎


If several observations of the response are available for the same values of the predictor variables, then a formal test for lack of fit can be carried out. (See [12] for a discussion of the pure-error lack-of-fit test.)

Leverage and Influence

Although a residual analysis is useful in assessing the fit of a model, departures from the regression model are often hidden by the fitting process. For example, there may be "outliers" in either the response or explanatory variables that can have a considerable effect on the analysis yet are not easily detected from an examination of residual plots. In fact, these outliers may determine the fit.

The leverage h_jj is associated with the jth data point and measures, in the space of the explanatory variables, how far the jth observation is from the other n − 1 observations. For simple linear regression with one explanatory variable z,

h_jj = 1/n + (z_j − z̄)² / Σ_{i=1}^n (z_i − z̄)²

Observations that significantly affect inferences drawn from the data are said to be influential. Methods for assessing influence are typically based on the change in the vector of parameter estimates, β̂, when observations are deleted. Plots based upon leverage and influence statistics and their use in diagnostic checking of regression models are described in [2], [4], and [9]. These references are recommended for anyone involved in an analysis of regression models.

If, after the diagnostic checks, no serious violations of the assumptions are detected, we can make inferences about β and the future Y values with some assurance that we will not be misled.

Additional Problems in Linear Regression

We shall briefly discuss several important aspects of regression that deserve and receive extensive treatments in texts devoted to regression analysis. (See [9], [10], [12], and [20].)

Selecting predictor variables from a large set. In practice, it is often difficult to formulate an appropriate regression function immediately. Which predictor variables should be included? What form should the regression function take?

When the list of possible predictor variables is very large, not all of the variables can be included in the regression function. Techniques and computer programs designed to select the "best" subset of predictors are now readily available. The good ones try all subsets: z₁ alone, z₂ alone, ..., z₁ and z₂, .... The best choice is decided by


examining some criterion quantity like R². [See (7-9).] However, R² always increases with the inclusion of additional predictor variables. Although this problem can be circumvented by using the adjusted R², R̄² = 1 − (1 − R²)(n − 1)/(n − r − 1), a better statistic for selecting variables seems to be Mallow's C_p statistic (see [11]),

C_p = (residual sum of squares for subset model with p parameters, including an intercept) / (residual variance for full model) − (n − 2p)

A plot of the pairs (p, C_p), one for each subset of predictors, will indicate models that forecast the observed responses well. Good models typically have (p, C_p) coordinates near the 45° line. In Figure 7.4, we have circled the point corresponding to the "best" subset of predictor variables.
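The two selection criteria are simple functions of the residual sums of squares. A minimal sketch (both function names are ours):

```python
def adjusted_r2(r2, n, r):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - r - 1) for a model with r predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - r - 1)

def mallows_cp(rss_subset, p, s2_full, n):
    """Mallow's C_p for a subset model with p parameters (including the intercept);
    s2_full is the residual variance of the full model."""
    return rss_subset / s2_full - (n - 2 * p)
```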

If the list of predictor variables is very long, cost considerations limit the number of models that can be examined. Another approach, called stepwise regression (see [12]), attempts to select important predictors without considering all the possibilities. The procedure can be described by listing the basic steps (algorithm) involved in the computations:

Step 1. All possible simple linear regressions are considered. The predictor variable that explains the largest significant proportion of the variation in Y (the variable that has the largest correlation with the response) is the first variable to enter the regression function.

Figure 7.4 C_p plot for computer data from Example 7.6 with three predictor variables (z₁ = orders, z₂ = add-delete count, z₃ = number of items; see the example and original source).


Step 2. The next variable to enter is the one (out of those not yet included) that makes the largest significant contribution to the regression sum of squares. The significance of the contribution is determined by an F-test. (See Result 7.6.) The value of the F-statistic that must be exceeded before the contribution of a variable is deemed significant is often called the F to enter.

Step 3. Once an additional variable has been included in the equation, the individual contributions to the regression sum of squares of the other variables already in the equation are checked for significance using F-tests. If the F-statistic is less than the one (called the F to remove) corresponding to a prescribed significance level, the variable is deleted from the regression function.

Step 4. Steps 2 and 3 are repeated until all possible additions are nonsignificant and all possible deletions are significant. At this point the selection stops.

Because of the step-by-step procedure, there is no guarantee that this approach will select, for example, the best three variables for prediction. A second drawback is that the (automatic) selection methods are not capable of indicating when transformations of variables are useful.

Colinearity. If Z is not of full rank, some linear combination, such as Za, must equal 0. In this situation, the columns are said to be colinear. This implies that Z'Z does not have an inverse. For most regression analyses, it is unlikely that Za = 0 exactly. Yet, if linear combinations of the columns of Z exist that are nearly 0, the calculation of (Z'Z)⁻¹ is numerically unstable. Typically, the diagonal entries of (Z'Z)⁻¹ will be large. This yields large estimated variances for the β̂_i's, and it is then difficult to detect the "significant" regression coefficients β̂_i. The problems caused by colinearity can be overcome somewhat by (1) deleting one of a pair of predictor variables that are strongly correlated or (2) relating the response Y to the principal components of the predictor variables; that is, the rows z_j' of Z are treated as a sample, and the first few principal components are calculated as is subsequently described in Section 8.3. The response Y is then regressed on these new predictor variables.

Bias caused by a misspecified model. Suppose some important predictor variables are omitted from the proposed regression model. That is, suppose the true model has Z = [Z₁ ⋮ Z₂] with rank r + 1 and

Y = Z₁β_(1) + Z₂β_(2) + ε        (7-23)

where β_(1) is (q + 1) × 1, β_(2) is (r − q) × 1, E(ε) = 0, and Var(ε) = σ²I. However, the investigator unknowingly fits a model using only the first q predictors by minimizing the error sum of squares (Y − Z₁β_(1))'(Y − Z₁β_(1)). The least squares estimator of β_(1) is β̂_(1) = (Z₁'Z₁)⁻¹Z₁'Y. Then, unlike the situation when the model is correct,

E(β̂_(1)) = (Z₁'Z₁)⁻¹Z₁'E(Y) = (Z₁'Z₁)⁻¹Z₁'(Z₁β_(1) + Z₂β_(2) + E(ε)) = β_(1) + (Z₁'Z₁)⁻¹Z₁'Z₂β_(2)        (7-24)


That is, β̂_(1) is a biased estimator of β_(1) unless the columns of Z₁ are perpendicular to those of Z₂ (that is, Z₁'Z₂ = 0). If important variables are missing from the model, the least squares estimates β̂_(1) may be misleading.

7.7 MULTIVARIATE MULTIPLE REGRESSION

In this section, we consider the problem of modeling the relationship between m responses Y₁, Y₂, ..., Y_m and a single set of predictor variables z₁, z₂, ..., z_r. Each response is assumed to follow its own regression model, so that

Y₁ = β₀₁ + β₁₁z₁ + ··· + β_{r1}z_r + ε₁
Y₂ = β₀₂ + β₁₂z₁ + ··· + β_{r2}z_r + ε₂
⋮
Y_m = β₀ₘ + β₁ₘz₁ + ··· + β_{rm}z_r + ε_m        (7-25)

The error term ε' = [ε₁, ε₂, ..., ε_m] has E(ε) = 0 and Var(ε) = Σ. Thus, the error terms associated with different responses may be correlated.

To establish notation conforming to the classical linear regression model, let [z_{j0}, z_{j1}, ..., z_{jr}] denote the values of the predictor variables for the jth trial, let Y_j' = [Y_{j1}, Y_{j2}, ..., Y_{jm}] be the responses, and let ε_j' = [ε_{j1}, ε_{j2}, ..., ε_{jm}] be the errors. In matrix notation, the design matrix Z is the same as that for the single-response regression model, and the multivariate linear regression model is

Y = Zβ + ε        (7-26)

with the n × m response matrix Y = [Y_(1) ⋮ Y_(2) ⋮ ··· ⋮ Y_(m)], the (r + 1) × m coefficient matrix β = [β_(1) ⋮ β_(2) ⋮ ··· ⋮ β_(m)], and the n × m error matrix ε = [ε_(1) ⋮ ε_(2) ⋮ ··· ⋮ ε_(m)].


Simply stated, the ith response Y_(i) follows the linear regression model

Y_(i) = Zβ_(i) + ε_(i),   i = 1, 2, ..., m        (7-27)

with Cov(ε_(i)) = σ_ii I. However, the errors for different responses on the same trial can be correlated.

Given the outcomes Y and the values of the predictor variables Z with full column rank, we determine the least squares estimates β̂_(i) exclusively from the observations Y_(i) on the ith response. In conformity with the single-response solution, we take

β̂_(i) = (Z'Z)⁻¹Z'Y_(i)        (7-28)

Collecting these univariate least squares estimates, we obtain

β̂ = [β̂_(1) ⋮ β̂_(2) ⋮ ··· ⋮ β̂_(m)] = (Z'Z)⁻¹Z'[Y_(1) ⋮ Y_(2) ⋮ ··· ⋮ Y_(m)]

or

β̂ = (Z'Z)⁻¹Z'Y        (7-29)

For any choice of parameters B = [b_(1) ⋮ b_(2) ⋮ ··· ⋮ b_(m)], the matrix of errors is Y − ZB. The error sum of squares and cross products matrix

(Y − ZB)'(Y − ZB)        (7-30)

has (i, k)th entry (Y_(i) − Zb_(i))'(Y_(k) − Zb_(k)). The selection b_(i) = β̂_(i) minimizes the ith diagonal sum of squares (Y_(i) − Zb_(i))'(Y_(i) − Zb_(i)), so tr[(Y − ZB)'(Y − ZB)] is minimized by the choice B = β̂. Also, the generalized variance |(Y − ZB)'(Y − ZB)| is minimized by the least squares estimates β̂. (See Exercise 7.11 for an additional generalized sum of squares property.)

Using the least squares estimates β̂, we can form the matrices of

Predicted values:  Ŷ = Zβ̂ = Z(Z'Z)⁻¹Z'Y
Residuals:  ε̂ = Y − Ŷ = [I − Z(Z'Z)⁻¹Z']Y        (7-31)
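Because each column of β̂ is an ordinary univariate least squares fit, the whole multivariate fit is one matrix computation. A minimal sketch (the function name is ours; the divisor n − r − 1 gives the unbiased estimator of Σ from Result 7.9 below):

```python
import numpy as np

def multivariate_least_squares(Z, Y):
    """Multivariate multiple regression fit (7-29)-(7-31): columns of beta_hat are the
    separate univariate least squares estimates."""
    n, rp1 = Z.shape
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)    # (r+1) x m coefficient matrix
    Y_hat = Z @ beta_hat                            # predicted values
    resid = Y - Y_hat                               # n x m residual matrix
    sigma_hat = resid.T @ resid / (n - rp1)         # unbiased estimator of Sigma
    return beta_hat, Y_hat, resid, sigma_hat
```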


The orthogonality conditions among the residuals, predicted values, and columns of Z, which hold in classical linear regression, hold in multivariate multiple regression. They follow from Z'[I − Z(Z'Z)⁻¹Z'] = Z' − Z' = 0. Specifically,

Z'ε̂ = Z'[I − Z(Z'Z)⁻¹Z']Y = 0        (7-32)

so the residuals ε̂_(i) are perpendicular to the columns of Z. Also,

Ŷ'ε̂ = β̂'Z'[I − Z(Z'Z)⁻¹Z']Y = 0        (7-33)

confirming that the predicted values Ŷ_(i) are perpendicular to all residual vectors ε̂_(k). Because Y = Ŷ + ε̂,

Y'Y = (Ŷ + ε̂)'(Ŷ + ε̂) = Ŷ'Ŷ + ε̂'ε̂

or

(total sum of squares and cross products) = (predicted sum of squares and cross products) + (residual (error) sum of squares and cross products)        (7-34)

The residual sum of squares and cross products can also be written as

ε̂'ε̂ = Y'Y − Ŷ'Ŷ = Y'Y − β̂'Z'Zβ̂        (7-35)

Example 7.8 (Fitting a multivariate straight-line regression model)

Fit a straight-line regression model to two responses Y₁ and Y₂ using the data in Example 7.3. These data, augmented by observations on an additional response, are as follows:

(data table not recovered; see Panel 7.2 for the SAS analysis of these data)


PANEL 7.2 SAS ANALYSIS FOR EXAMPLE 7.8 USING PROC GLM

PROGRAM COMMANDS: title 'Multivariate Regression Analysis'; (remaining PROC GLM statements not recovered)

OUTPUT (abridged): General Linear Models Procedure results for the two responses, including Y1 Mean 5.00000000, Y2 Mean 1.00000000, an F value of 7.50 with Pr > F 0.0714, and the multivariate test statistics Wilks' Lambda, Pillai's Trace, Hotelling-Lawley Trace, and Roy's Greatest Root.


Result 7.9. For the least squares estimator β̂ = [β̂_(1) ⋮ β̂_(2) ⋮ ··· ⋮ β̂_(m)] determined under the multivariate multiple regression model (7-26) with full rank(Z) = r + 1, r + 1 < n,

E(β̂_(i)) = β_(i)   and   Cov(β̂_(i), β̂_(k)) = σ_ik(Z'Z)⁻¹,   i, k = 1, 2, ..., m

The residuals ε̂ = [ε̂_(1) ⋮ ε̂_(2) ⋮ ··· ⋮ ε̂_(m)] = Y − Zβ̂ satisfy E(ε̂_(i)) = 0 and E(ε̂_(i)'ε̂_(k)) = (n − r − 1)σ_ik, so

E(ε̂) = 0   and   E(ε̂'ε̂ / (n − r − 1)) = Σ

Also, ε̂ and β̂ are uncorrelated.


Proof. The ith response follows the multiple regression model

Y_(i) = Zβ_(i) + ε_(i),   E(ε_(i)) = 0,   and   E(ε_(i)ε_(i)') = σ_ii I

Also, as in (7-10),

β̂_(i) − β_(i) = (Z'Z)⁻¹Z'ε_(i)   and   ε̂_(i) = Y_(i) − Ŷ_(i) = [I − Z(Z'Z)⁻¹Z']Y_(i) = [I − Z(Z'Z)⁻¹Z']ε_(i)        (7-36)

so E(β̂_(i)) = β_(i) and E(ε̂_(i)) = 0.

Next,

Cov(β̂_(i), β̂_(k)) = E[(β̂_(i) − β_(i))(β̂_(k) − β_(k))'] = (Z'Z)⁻¹Z'E(ε_(i)ε_(k)')Z(Z'Z)⁻¹ = σ_ik(Z'Z)⁻¹

Using Result 4.9 and the proof of Result 7.2, with U any random vector and A a fixed matrix, we have that E[U'AU] = E[tr(AUU')] = tr[AE(UU')]. Consequently,

E(ε̂_(i)'ε̂_(k)) = E(ε_(i)'[I − Z(Z'Z)⁻¹Z']ε_(k)) = tr([I − Z(Z'Z)⁻¹Z']σ_ik I) = σ_ik tr[I − Z(Z'Z)⁻¹Z'] = σ_ik(n − r − 1)

as in the proof of Result 7.2. Dividing each entry ε̂_(i)'ε̂_(k) of ε̂'ε̂ by n − r − 1, we obtain the unbiased estimator of Σ. Finally,

Cov(β̂_(i), ε̂_(k)) = E[(Z'Z)⁻¹Z'ε_(i)ε_(k)'(I − Z(Z'Z)⁻¹Z')] = (Z'Z)⁻¹Z'E(ε_(i)ε_(k)')(I − Z(Z'Z)⁻¹Z') = σ_ik(Z'Z)⁻¹Z'[I − Z(Z'Z)⁻¹Z'] = 0

so β̂ and ε̂ are uncorrelated. ∎

For a fixed choice z₀ of the predictor variables, z₀'β̂_(i) estimates the ith expected response z₀'β_(i), and

Cov(z₀'β̂_(i), z₀'β̂_(k)) = E[z₀'(β̂_(i) − β_(i))(β̂_(k) − β_(k))'z₀] = z₀'E[(β̂_(i) − β_(i))(β̂_(k) − β_(k))']z₀ = σ_ik z₀'(Z'Z)⁻¹z₀


The related problem is that of forecasting a new observation vector Y₀' = [Y₀₁, Y₀₂, ..., Y₀ₘ] at z₀. According to the regression model, Y₀ᵢ = z₀'β_(i) + ε₀ᵢ, where the "new" error ε₀' = [ε₀₁, ε₀₂, ..., ε₀ₘ] is independent of the errors ε and satisfies E(ε₀ᵢ) = 0 and E(ε₀ᵢε₀ₖ) = σ_ik. The forecast error for the ith component of Y₀ is

Y₀ᵢ − z₀'β̂_(i) = Y₀ᵢ − z₀'β_(i) + z₀'β_(i) − z₀'β̂_(i) = ε₀ᵢ − z₀'(β̂_(i) − β_(i))

so E(Y₀ᵢ − z₀'β̂_(i)) = E(ε₀ᵢ) − z₀'E(β̂_(i) − β_(i)) = 0, indicating that z₀'β̂_(i) is an unbiased predictor of Y₀ᵢ. The forecast errors have covariances

E[(Y₀ᵢ − z₀'β̂_(i))(Y₀ₖ − z₀'β̂_(k))]
 = E[(ε₀ᵢ − z₀'(β̂_(i) − β_(i)))(ε₀ₖ − z₀'(β̂_(k) − β_(k)))]
 = E(ε₀ᵢε₀ₖ) + z₀'E[(β̂_(i) − β_(i))(β̂_(k) − β_(k))']z₀ − z₀'E[(β̂_(i) − β_(i))ε₀ₖ] − E[ε₀ᵢ(β̂_(k) − β_(k))']z₀
 = σ_ik(1 + z₀'(Z'Z)⁻¹z₀)

Note that E[(β̂_(i) − β_(i))ε₀ₖ] = 0, since β̂_(i) = (Z'Z)⁻¹Z'ε_(i) + β_(i) is independent of ε₀. A similar result holds for E[ε₀ᵢ(β̂_(k) − β_(k))'].

Maximum likelihood estimators and their distributions can be obtained when the errors ε have a normal distribution.

Result 7.10. Let the multivariate multiple regression model in (7-26) hold with full rank(Z) = r + 1, n > (r + 1) + m, and let the errors ε have a normal distribution. Then

β̂ = (Z'Z)⁻¹Z'Y

is the maximum likelihood estimator of β, and β̂ has a normal distribution with E(β̂) = β and Cov(β̂_(i), β̂_(k)) = σ_ik(Z'Z)⁻¹. Also, β̂ is independent of the maximum likelihood estimator of the positive definite Σ, given by

Σ̂ = (1/n) ε̂'ε̂ = (1/n)(Y − Zβ̂)'(Y − Zβ̂)

Proof. According to the regression model, the likelihood is determined from the data Y = [Y₁, Y₂, ..., Y_n]', whose rows are independent, with Y_j distributed as N_m(β'z_j, Σ). We first note that Y − Zβ = [Y₁ − β'z₁, Y₂ − β'z₂, ..., Y_n − β'z_n]', so

(Y − Zβ)'(Y − Zβ) = Σ_{j=1}^n (Y_j − β'z_j)(Y_j − β'z_j)'

The matrix Z(β̂ − β)Σ⁻¹(β̂ − β)'Z' is of the form A'A, with A = Σ^{-1/2}(β̂ − β)'Z', and, from Exercise 2.16, it is nonnegative definite. Therefore, its eigenvalues are nonnegative, and, by Result 4.9, its trace is the sum of these eigenvalues. Moreover, β̂_(i) − β_(i) ≠ 0 implies that Z(β̂_(i) − β_(i)) ≠ 0, in which case

tr[Z(β̂ − β)Σ⁻¹(β̂ − β)'Z'] ≥ c'Σ⁻¹c > 0

where c' is any nonzero row of Z(β̂ − β). Applying Result 4.10 with B = ε̂'ε̂, b = n/2, and p = m, we find that β̂ and Σ̂ = n⁻¹ε̂'ε̂ are the maximum likelihood estimators of β and Σ, respectively.

It remains to establish the distributional results. From (7-36), we know that β̂_(i) and ε̂_(i) are linear combinations of the elements of ε. Specifically,

β̂_(i) = (Z'Z)⁻¹Z'ε_(i) + β_(i)   and   ε̂_(i) = [I − Z(Z'Z)⁻¹Z']ε_(i),   i = 1, 2, ..., m

Trang 39

392 Chapter 7 M u ltiva r i ate Linear Reg ress ion Models

Therefore, by Result 4.3, β̂_(1), β̂_(2), ..., β̂_(m), ε̂_(1), ε̂_(2), ..., ε̂_(m) are jointly normal. Their mean vectors and covariance matrices are given in Result 7.9. Since ε̂ and β̂ have a zero covariance matrix, by Result 4.5 they are independent. Further, as in the proof of Result 7.4,

ε̂'ε̂ = ε'[I − Z(Z'Z)⁻¹Z']ε = Σ_{ℓ=1}^{n−r−1} ε'e_ℓe_ℓ'ε = Σ_{ℓ=1}^{n−r−1} V_ℓV_ℓ'

where V_ℓ = ε'e_ℓ.

Result 7.10 provides additional support for using least squares estimates. When the errors are normally distributed, β̂ and n⁻¹ε̂'ε̂ are the maximum likelihood estimators of β and Σ, respectively. Therefore, for large samples, they have nearly the smallest possible variances.

Comment. The multivariate multiple regression model poses no new computational problems. Least squares (maximum likelihood) estimates, β̂_(i) = (Z'Z)⁻¹Z'y_(i), are computed individually for each response variable. Note, however, that the model requires that the same predictor variables be used for all responses.

Once a multivariate multiple regression model has been fit to the data, it should be subjected to the diagnostic checks described in Section 7.6 for the single-response model. The residual vectors [ε̂_{j1}, ε̂_{j2}, ..., ε̂_{jm}] can be examined for normality or outliers using the techniques in Section 4.6.

The remainder of this section is devoted to brief discussions of inference for the normal theory multivariate multiple regression model. Extended accounts of these procedures appear in [1] and [22].

Likelihood Ratio Tests for Regression Parameters

The multiresponse analog of (7-15), the hypothesis that the responses do not depend on z_{q+1}, z_{q+2}, ..., z_r, becomes

H₀: β_(2) = 0        (7-43)

where β is partitioned as β = [β_(1) over β_(2)], with β_(1) of dimension (q + 1) × m and β_(2) of dimension (r − q) × m.

Trang 40

Section 7 7 M u ltivari ate M u ltiple Reg ress ion 3 9 3

Setting Z = [Z₁ ⋮ Z₂], with Z₁ of dimension n × (q + 1) and Z₂ of dimension n × (r − q), we can write the general model as

E(Y) = Zβ = [Z₁ ⋮ Z₂] [β_(1) over β_(2)] = Z₁β_(1) + Z₂β_(2)

Under H₀: β_(2) = 0, Y = Z₁β_(1) + ε, and the likelihood ratio test of H₀ is based on the quantities involved in the extra sum of squares and cross products

(Y − Z₁β̂_(1))'(Y − Z₁β̂_(1)) − (Y − Zβ̂)'(Y − Zβ̂) = n(Σ̂₁ − Σ̂)

where β̂_(1) = (Z₁'Z₁)⁻¹Z₁'Y and Σ̂₁ = n⁻¹(Y − Z₁β̂_(1))'(Y − Z₁β̂_(1)).

From (7-42), the likelihood ratio, Λ, can be expressed in terms of generalized variances:

Λ = max_{β_(1),Σ} L(β_(1), Σ) / max_{β,Σ} L(β, Σ) = (|Σ̂| / |Σ̂₁|)^{n/2}

Equivalently, Wilks' lambda statistic

Λ^{2/n} = |Σ̂| / |Σ̂₁|

can be used. The likelihood ratio test of H₀: β_(2) = 0 is equivalent to rejecting H₀ for large values of

−2 ln Λ = −n ln(|Σ̂| / |Σ̂₁|) = −n ln(|nΣ̂| / |nΣ̂ + n(Σ̂₁ − Σ̂)|)

For n large,⁵ the modified statistic

−[n − r − 1 − ½(m − r + q + 1)] ln(|Σ̂| / |Σ̂₁|)

has, to a close approximation, a chi-square distribution with m(r − q) d.f.

Proof. (See Supplement 7A.)

"

If Z is not of full rank, but has rank r1 + 1, then {3 = (Z' Z)-Z'Y, where

(Z'Z)- is the generalized inverse discussed in [19] (See also Exercise 7.6.) The dis­ tributional conclusions stated in Result 7.11 remain the same, provided that r is re­ placed by r1 and q + 1 by rank (Z1) However, not all hypotheses concerning {3 can

5Technically, both n - r and n - m should also be large to obtain a good chi-square approximation
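A minimal sketch of the large-sample test described above (the function is ours; Z₁ is taken to be the first q + 1 columns of Z, and the maximum likelihood estimators use the divisor n):

```python
import numpy as np
from scipy import stats

def wilks_lambda_test(Z, Y, q, alpha=0.05):
    """Likelihood ratio test of H0: beta_(2) = 0 using Wilks' lambda and the
    modified chi-square statistic with m(r - q) d.f."""
    n, rp1 = Z.shape
    r, m = rp1 - 1, Y.shape[1]
    def sigma_hat(Zmat):
        B = np.linalg.lstsq(Zmat, Y, rcond=None)[0]
        E = Y - Zmat @ B
        return E.T @ E / n                      # maximum likelihood estimator of Sigma
    wilks = np.linalg.det(sigma_hat(Z)) / np.linalg.det(sigma_hat(Z[:, :q + 1]))
    stat = -(n - r - 1 - 0.5 * (m - r + q + 1)) * np.log(wilks)
    df = m * (r - q)
    return wilks, stat, stats.chi2.ppf(1 - alpha, df)
```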
