
CHAPTER 15*

The multivariate normal distribution

15.1 Multivariate distributions

The multivariate normal distribution is by far the most important distribution in statistical inference, for a variety of reasons, including the fact that some of the statistics based on sampling from such a distribution have tractable distributions themselves. It forms the backbone of Part IV on statistical models in econometrics, and thus a closer study of this distribution will greatly simplify the discussion that follows. Before we consider the multivariate normal distribution, however, let us introduce some notation and various simple results related to random vectors and their distributions in general.

Let $X = (X_1, X_2, \ldots, X_n)'$ be an $n \times 1$ random vector defined on the probability space $(S, \mathscr{F}, P(\cdot))$. The mean vector $E(X)$ is defined by

$$E(X) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{pmatrix} = \mu, \quad \text{an } n \times 1 \text{ vector}, \tag{15.1}$$

and the covariance matrix $\operatorname{Cov}(X)$ by

$$\operatorname{Cov}(X) = E[(X - \mu)(X - \mu)'] = \begin{pmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_n) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots & \operatorname{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_n, X_1) & \operatorname{Cov}(X_n, X_2) & \cdots & \operatorname{Var}(X_n) \end{pmatrix} = \Sigma, \tag{15.2}$$


where $\Sigma$ is an $n \times n$ symmetric non-negative definite matrix, i.e. $\Sigma' = \Sigma$ and $\alpha'\Sigma\alpha \geq 0$ for any $\alpha \in \mathbb{R}^n$. The $(i, j)$th element of $\Sigma$ is

$$\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)], \quad i, j = 1, 2, \ldots, n. \tag{15.3}$$

In relation to $\alpha'\Sigma\alpha \geq 0$ we can show that if there exists an $\alpha \in \mathbb{R}^n$, $\alpha \neq 0$, such that $\operatorname{Var}(\alpha'X) = \alpha'\Sigma\alpha = 0$, then $\Pr(\alpha'X = c) = 1$, where $c$ is a constant (only constants have zero variance); i.e. there is a linear relationship holding among the r.v.'s $X_1, \ldots, X_n$ with probability one.
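As a quick numerical sketch (the matrix below is made up for illustration), the defining properties of a covariance matrix, symmetry and non-negative definiteness, can be checked directly with NumPy:

```python
import numpy as np

# A made-up 3x3 covariance matrix.
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# Symmetry: Sigma' = Sigma.
assert np.allclose(Sigma, Sigma.T)

# Non-negative definiteness: a' Sigma a >= 0 for any a in R^n.
rng = np.random.default_rng(0)
for _ in range(100):
    a = rng.normal(size=3)
    assert a @ Sigma @ a >= 0

# Equivalently, all eigenvalues of Sigma are non-negative.
print(np.linalg.eigvalsh(Sigma).min() >= 0)
```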

Lemma 15.1

If the random vector $X$ has a continuous density function, then $\Sigma > 0$. This is because $\Pr(\alpha'X = c) = 0$ for all $\alpha$ and $c$ in this case.

Lemma 15.2

If $X$ has mean $\mu$ and covariance $\Sigma$, then for $Z = AX + b$:

(i) $E(Z) = AE(X) + b = A\mu + b$;

(ii) $\operatorname{Cov}(Z) = E[(AX + b - (A\mu + b))(AX + b - (A\mu + b))'] = A\,E[(X - \mu)(X - \mu)']\,A' = A\Sigma A'$.
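Lemma 15.2 can be illustrated by simulation (all parameter values below are made up): the sample mean and covariance of $Z = AX + b$ should approach $A\mu + b$ and $A\Sigma A'$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up parameters for a 3-dimensional X.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# Z = AX + b with A: 2x3 and b: 2x1.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([3.0, -1.0])

# Lemma 15.2: E(Z) = A mu + b and Cov(Z) = A Sigma A'.
EZ = A @ mu + b
CovZ = A @ Sigma @ A.T

# Monte Carlo check on a large simulated sample.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Z = X @ A.T + b
assert np.allclose(Z.mean(axis=0), EZ, atol=0.05)
assert np.allclose(np.cov(Z.T), CovZ, atol=0.15)
```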

Let $X$ and $Y$ be $n \times 1$ and $m \times 1$ random vectors with $E(X) = \mu_x$ and $E(Y) = \mu_y$; then

$$\operatorname{Cov}(X, Y) = [\operatorname{Cov}(X_i, Y_j)]_{i,j} = E[(X - \mu_x)(Y - \mu_y)']. \tag{15.4}$$

Correlation

So far correlation has been defined for random variables (r.v.'s) only, and the question arises whether it can be generalised to random vectors. Let $E(X) = 0$ (without any loss of generality), $\operatorname{Cov}(X) = \Sigma$, $X$: $n \times 1$, and partition $X$ into

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad X_1: 1 \times 1, \quad X_2: (n-1) \times 1, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

Define $Z = \alpha'X_2$, and let us consider the correlation between $X_1$ and $Z$:

$$\operatorname{Corr}(X_1, Z) = \frac{\alpha'\sigma_{21}}{[\sigma_{11}(\alpha'\Sigma_{22}\alpha)]^{1/2}}.$$

This is maximised with respect to $\alpha$ (see Chapter 7) by $\alpha^* = \Sigma_{22}^{-1}\sigma_{21}$, and we define the multiple correlation coefficient to be

$$R = \left(\frac{\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}}\right)^{1/2} \geq \operatorname{Corr}(X_1, \alpha'X_2), \quad 0 \leq R \leq 1. \tag{15.8}$$
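The inequality in (15.8) can be verified numerically (the covariance matrix below is made up): no choice of weights $\alpha$ gives a correlation with $X_1$ exceeding $R$, and $\alpha^* = \Sigma_{22}^{-1}\sigma_{21}$ attains it.

```python
import numpy as np

# Made-up covariance for X = (X1, X2')' with X1 scalar and X2: 2x1.
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.4],
                  [0.8, 0.4, 1.0]])
s11 = Sigma[0, 0]
s21 = Sigma[1:, 0]       # sigma_21
S22 = Sigma[1:, 1:]      # Sigma_22

# Optimal weights and the multiple correlation coefficient (15.8).
alpha_star = np.linalg.solve(S22, s21)       # Sigma_22^{-1} sigma_21
R = np.sqrt(s21 @ alpha_star / s11)

# alpha* attains R ...
corr_star = (alpha_star @ s21) / np.sqrt(s11 * (alpha_star @ S22 @ alpha_star))
assert np.isclose(corr_star, R)

# ... and no other weights do better.
rng = np.random.default_rng(2)
for _ in range(200):
    a = rng.normal(size=2)
    corr = (a @ s21) / np.sqrt(s11 * (a @ S22 @ a))
    assert corr <= R + 1e-12
```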

In the case where

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad X_1: k \times 1, \quad X_2: (n-k) \times 1, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \quad k > 1,$$

we could define the r.v.'s $Z_1 = \alpha_1'X_1$ and $Z_2 = \alpha_2'X_2$, whose correlation coefficient is

$$\operatorname{Corr}(Z_1, Z_2) = \frac{\alpha_1'\Sigma_{12}\alpha_2}{(\alpha_1'\Sigma_{11}\alpha_1)^{1/2}(\alpha_2'\Sigma_{22}\alpha_2)^{1/2}}. \tag{15.9}$$

Maximising this with respect to $\alpha_1$ and $\alpha_2$, it follows that the matrix $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ has at most $k$ non-zero eigenvalues, which measure the association between $X_1$ and $X_2$ and are called canonical correlations.

Let

$$X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}, \quad X_1: 1 \times 1, \quad X_2: 1 \times 1, \quad X_3: (n-2) \times 1,$$

with

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13}' \\ \sigma_{21} & \sigma_{22} & \sigma_{23}' \\ \sigma_{31} & \sigma_{32} & \Sigma_{33} \end{pmatrix}.$$

Another form of correlation of interest in this context is the correlation between $X_1$ and $X_2$ given that the effect of $X_3$ is taken away. For this we form the r.v.'s

$$Y_1 = X_1 - b_1'X_3 \quad \text{and} \quad Y_2 = X_2 - b_2'X_3,$$

and $\operatorname{Corr}(Y_1, Y_2)$ is maximised by $b_1 = \Sigma_{33}^{-1}\sigma_{31}$ and $b_2 = \Sigma_{33}^{-1}\sigma_{32}$, as seen above. Hence we define the partial correlation coefficient between $X_1$ and $X_2$ given $X_3$ to be

$$\rho_{12\cdot 3} = \frac{\sigma_{12} - \sigma_{13}'\Sigma_{33}^{-1}\sigma_{32}}{[\sigma_{11} - \sigma_{13}'\Sigma_{33}^{-1}\sigma_{31}]^{1/2}[\sigma_{22} - \sigma_{23}'\Sigma_{33}^{-1}\sigma_{32}]^{1/2}}. \tag{15.11}$$
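A small numerical sketch of (15.11) with made-up numbers; the same coefficient can also be read off the conditional covariance matrix of $(X_1, X_2)$ given $X_3$, which serves as a cross-check.

```python
import numpy as np

# Made-up covariance for (X1, X2, X3')' with X3: 2x1 (n = 4).
Sigma = np.array([[3.0, 1.0, 0.9, 0.3],
                  [1.0, 2.5, 0.8, 0.3],
                  [0.9, 0.8, 2.0, 0.2],
                  [0.3, 0.3, 0.2, 1.0]])

s11, s12, s22 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
s13 = Sigma[0, 2:]
s23 = Sigma[1, 2:]
S33_inv = np.linalg.inv(Sigma[2:, 2:])

# Partial correlation coefficient (15.11).
num = s12 - s13 @ S33_inv @ s23
den = np.sqrt((s11 - s13 @ S33_inv @ s13) * (s22 - s23 @ S33_inv @ s23))
rho_12_3 = num / den

# Cross-check via the conditional covariance of (X1, X2) given X3.
Saa = Sigma[:2, :2]
Sab = Sigma[:2, 2:]
C = Saa - Sab @ S33_inv @ Sab.T
assert np.isclose(rho_12_3, C[0, 1] / np.sqrt(C[0, 0] * C[1, 1]))
print(round(float(rho_12_3), 4))
```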


15.2 The multivariate normal distribution

The univariate normal density function discussed above was of the form

$$f(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\}. \tag{15.12}$$

The density function of $X = (X_1, X_2, \ldots, X_n)'$ when the $X_i$'s are IID normally distributed r.v.'s was shown to be of the form

$$f(x; \mu, \sigma^2) = \prod_{i=1}^n f(x_i; \mu, \sigma^2) = (2\pi)^{-n/2}(\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right\}. \tag{15.13}$$

Similarly, the density function of $X$ when the $X_i$'s are only independent, i.e. $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, takes the form

$$f(x; \mu_1, \ldots, \mu_n, \sigma_1^2, \ldots, \sigma_n^2) = \prod_{i=1}^n f(x_i; \mu_i, \sigma_i^2) = (2\pi)^{-n/2}(\sigma_1^2\sigma_2^2\cdots\sigma_n^2)^{-1/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^n \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right\}. \tag{15.14}$$

Comparing the above three density functions we can discern a developing pattern which is very suggestive for the density function of an arbitrary normal vector $X$ with $E(X) = \mu$ and $\operatorname{Cov}(X) = \Sigma$, which takes the form

$$f(x; \mu, \Sigma) = (2\pi)^{-n/2}(\det \Sigma)^{-1/2}\exp\{-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\}, \tag{15.15}$$

and we write $X \sim N(\mu, \Sigma)$. If the $X_i$'s are IID r.v.'s, $\Sigma = \sigma^2 I_n$ and $\det \Sigma = (\sigma^2)^n$.

On the other hand, if the $X_i$'s are independent but not identically distributed,

$$\Sigma = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2) \quad \text{and} \quad \det \Sigma = \prod_{i=1}^n \sigma_i^2.$$

In the case of $n = 2$,

$$\det \Sigma = \sigma_1^2\sigma_2^2(1 - \rho^2) > 0 \quad \text{for } -1 < \rho < 1,$$


and

$$\Sigma^{-1} = \frac{1}{1 - \rho^2}\begin{pmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\ \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{pmatrix}.$$

Thus the bivariate normal density function is

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2(1 - \rho^2)^{1/2}}\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1 - \mu_1}{\sigma_1}\right)\left(\frac{x_2 - \mu_2}{\sigma_2}\right) + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right]\right\} \tag{15.17}$$

(see Chapter 6). The standard bivariate density function can be obtained by defining the new r.v.'s

$$z_i = \left(\frac{x_i - \mu_i}{\sigma_i}\right), \quad i = 1, 2,$$

whose density function is

$$f(z_1, z_2; \rho) = \frac{1}{2\pi(1 - \rho^2)^{1/2}}\exp\left\{-\frac{1}{2(1 - \rho^2)}(z_1^2 - 2\rho z_1 z_2 + z_2^2)\right\}. \tag{15.18}$$
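As a consistency check (a sketch in plain NumPy, with no values taken from the text), the general density (15.15) evaluated with $\sigma_1 = \sigma_2 = 1$ should agree with the standard bivariate form (15.18) at every point:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """General multivariate normal density (15.15)."""
    n = len(mu)
    d = x - mu
    return ((2 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5)
            * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))

def std_bivariate_pdf(z1, z2, rho):
    """Standard bivariate normal density (15.18)."""
    return (np.exp(-(z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2)
                   / (2 * (1 - rho ** 2)))
            / (2 * np.pi * np.sqrt(1 - rho ** 2)))

rho = 0.6
Sigma = np.array([[1.0, rho], [rho, 1.0]])   # standardised case
mu = np.zeros(2)

# The two formulas agree at arbitrary points.
rng = np.random.default_rng(3)
for _ in range(50):
    z = rng.normal(size=2)
    assert np.isclose(mvn_pdf(z, mu, Sigma),
                      std_bivariate_pdf(z[0], z[1], rho))
```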

(1) Properties

(N1) Let $X \sim N(\mu, \Sigma)$; then $Y = AX + b \sim N(A\mu + b, A\Sigma A')$ for $A$: $m \times n$ and $b$: $m \times 1$ constant matrices; e.g. if $Y = cX$, $c \neq 0$, then $Y \sim N(c\mu, c^2\Sigma)$. This property shows that if $X$ is normally distributed, then any linear function of $X$ is also normally distributed.

(N2) Let $X_t \sim N(\mu_t, \Sigma_t)$, $t = 1, 2, \ldots, T$, be independently distributed random vectors; then for any arbitrary constant matrices $A_t$, $t = 1, 2, \ldots, T$,

$$\left(\sum_{t=1}^T A_t X_t\right) \sim N\left(\sum_{t=1}^T A_t\mu_t,\; \sum_{t=1}^T A_t\Sigma_t A_t'\right).$$

The converse also holds. If the $X_t$'s are IID, then $\mu_t = \mu$, $\Sigma_t = \Sigma$, $t = 1, 2, \ldots, T$, and

$$\left(\frac{1}{T}\sum_{t=1}^T X_t\right) \sim N\left(\mu,\; \frac{1}{T}\Sigma\right).$$


(N3) Let $X \sim N(\mu, \Sigma)$; then the $X_i$'s are independent if and only if $\sigma_{ij} = 0$, $i \neq j$, $i, j = 1, 2, \ldots, n$, i.e. $\Sigma = \operatorname{diag}(\sigma_{11}, \ldots, \sigma_{nn})$. In general, zero covariance does not imply independence, but in the case of normality the two are equivalent.

(N4) If $X \sim N(\mu, \Sigma)$, then the marginal distribution of any $k \times 1$ subset $X_1$, where

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$

is $X_1 \sim N(\mu_1, \Sigma_{11})$. This follows from property N1 for $A = (I_k : 0)$, $(k \times n)$, $b = 0$. Similarly, $X_2 \sim N(\mu_2, \Sigma_{22})$. These can be verified directly using

$$f(x_1; \theta_1) = \int f(x; \theta)\,\mathrm{d}x_2 \quad \text{and} \quad f(x_2; \theta_2) = \int f(x; \theta)\,\mathrm{d}x_1,$$

although the manipulations involved are rather cumbersome. Taking $k = 1$, this property implies that each component of $X \sim N(\mu, \Sigma)$ is also normally distributed; the converse, however, is not true.

(N5) For the same partition of $X$ considered in N4, the conditional distribution of $X_1$ given $X_2$ takes the form

$$(X_1 \mid X_2) \sim N(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(X_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}).$$

This follows from property N1 for

$$A = \begin{pmatrix} I_k & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I_{n-k} \end{pmatrix},$$

since

$$E(AX) = \begin{pmatrix} \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}\mu_2 \\ \mu_2 \end{pmatrix}, \quad \operatorname{Cov}(AX) = \begin{pmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}. \tag{15.20}$$

From this we can deduce that if $\Sigma_{12} = 0$, then $X_1$ and $X_2$ are independent, since $(X_1 \mid X_2) \sim N(\mu_1, \Sigma_{11})$. Moreover, for any $\Sigma_{12}$, $(X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2)$ and $X_2$ are independent, given that their covariance is zero. Similarly, $(X_2 \mid X_1) \sim N(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1),\; \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})$. In the case $n = 2$,

$$(X_1 \mid X_2) \sim N\left(\mu_1 + \rho\frac{\sigma_1}{\sigma_2}(X_2 - \mu_2),\; \sigma_1^2(1 - \rho^2)\right). \tag{15.21}$$

These results can be verified directly using the formula

$$f(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f(x_2)}.$$


N5 suggests that the regression function

$$E(X_1 \mid X_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \quad \text{is linear in } x_2, \tag{15.23}$$

and the skedasticity function

$$\operatorname{Cov}(X_1 \mid X_2 = x_2) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \quad \text{is free of } x_2. \tag{15.24}$$

These are very important properties of the multivariate normal distribution and will play a crucial role in Part IV.
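The regression and skedasticity functions (15.23)–(15.24) are straightforward to compute; the sketch below uses made-up parameters and checks that the conditional mean is linear in $x_2$ while the conditional covariance does not depend on it at all.

```python
import numpy as np

# Made-up parameters for X = (X1, X2')' with X1: 1x1 and X2: 2x1.
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.4],
                  [0.8, 1.5, 0.2],
                  [0.4, 0.2, 1.0]])

mu1, mu2 = mu[:1], mu[1:]
S11 = Sigma[:1, :1]
S12 = Sigma[:1, 1:]
S22 = Sigma[1:, 1:]

B = S12 @ np.linalg.inv(S22)     # Sigma_12 Sigma_22^{-1}
cond_cov = S11 - B @ S12.T       # skedasticity (15.24): free of x2

def regression(x2):
    """Conditional mean E(X1 | X2 = x2), equation (15.23)."""
    return mu1 + B @ (x2 - mu2)

# Linearity: the increment from x2 to x2 + h is B h, whatever x2 is.
rng = np.random.default_rng(4)
x2, h = rng.normal(size=2), rng.normal(size=2)
assert np.allclose(regression(x2 + h) - regression(x2), B @ h)
assert cond_cov[0, 0] > 0
```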

(2) Multiple correlation

Without any loss of generality let $X \sim N(0, \Sigma)$, $X$: $n \times 1$, and define the partition

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad X_1: 1 \times 1, \quad X_2: (n-1) \times 1, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

The squared multiple correlation coefficient takes the form

$$R^2 = 1 - \frac{\operatorname{Var}(X_1 \mid X_2)}{\operatorname{Var}(X_1)} = \frac{\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}}.$$

(3) Partial correlation

Let $X_2$ be partitioned further into

$$X_2 = \begin{pmatrix} X_2 \\ X_3 \end{pmatrix}, \quad X_2: 1 \times 1, \quad X_3: (n-2) \times 1,$$

with

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13}' \\ \sigma_{21} & \sigma_{22} & \sigma_{23}' \\ \sigma_{31} & \sigma_{32} & \Sigma_{33} \end{pmatrix}.$$

The partial correlation between $X_1$ and $X_2$ given $X_3$ takes the form

$$\rho_{12\cdot 3} = \frac{\operatorname{Cov}(X_1, X_2 \mid X_3)}{[\operatorname{Var}(X_1 \mid X_3)\operatorname{Var}(X_2 \mid X_3)]^{1/2}} = \frac{\sigma_{12} - \sigma_{13}'\Sigma_{33}^{-1}\sigma_{32}}{[\sigma_{11} - \sigma_{13}'\Sigma_{33}^{-1}\sigma_{31}]^{1/2}[\sigma_{22} - \sigma_{23}'\Sigma_{33}^{-1}\sigma_{32}]^{1/2}},$$


with

$$\left(\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \Big| X_3\right) \sim N\left(\begin{pmatrix} \sigma_{13}'\Sigma_{33}^{-1}X_3 \\ \sigma_{23}'\Sigma_{33}^{-1}X_3 \end{pmatrix},\; \begin{pmatrix} \sigma_{11} - \sigma_{13}'\Sigma_{33}^{-1}\sigma_{31} & \sigma_{12} - \sigma_{13}'\Sigma_{33}^{-1}\sigma_{32} \\ \sigma_{21} - \sigma_{23}'\Sigma_{33}^{-1}\sigma_{31} & \sigma_{22} - \sigma_{23}'\Sigma_{33}^{-1}\sigma_{32} \end{pmatrix}\right). \tag{15.27}$$

15.3 Quadratic forms related to the normal distribution

(Q1) Let $X \sim N(\mu, \Sigma)$, where $\Sigma > 0$, $X$: $n \times 1$; then

(i) $(X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(n)$ — chi-square;

(ii) $X'\Sigma^{-1}X \sim \chi^2(n; \delta)$ — non-central chi-square, where $\delta = \mu'\Sigma^{-1}\mu$.

These results depend crucially on $\Sigma$ being a positive definite matrix, because for $\Sigma > 0$ there exists a non-singular matrix $H$ such that $\Sigma = HH'$, so that $Z = H^{-1}(X - \mu) \sim N(0, I_n)$, i.e. the $Z_i$'s are independent and $(X - \mu)'\Sigma^{-1}(X - \mu) = Z'Z = \sum_{i=1}^n Z_i^2$. Similarly for (ii).

For the MLE of $\mu$,

$$\hat{\mu} = \left(\frac{1}{T}\sum_{t=1}^T X_t\right) \sim N\left(\mu,\; \frac{1}{T}\Sigma\right) \quad \text{from N2},$$

and hence

$$T(\hat{\mu} - \mu)'\Sigma^{-1}(\hat{\mu} - \mu) \sim \chi^2(n).$$
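A Monte Carlo sketch of the first quadratic-form result above, with made-up $\mu$ and $\Sigma$: the simulated values of $(X - \mu)'\Sigma^{-1}(X - \mu)$ should reproduce the chi-square moments, mean $n$ and variance $2n$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 3, 200_000

mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Sigma_inv = np.linalg.inv(Sigma)

# The decomposition behind the result: Sigma = HH' with H non-singular.
H = np.linalg.cholesky(Sigma)
assert np.allclose(H @ H.T, Sigma)

# Simulate q = (X - mu)' Sigma^{-1} (X - mu), which should be chi^2(n).
X = rng.multivariate_normal(mu, Sigma, size=N)
D = X - mu
q = np.einsum('ti,ij,tj->t', D, Sigma_inv, D)

assert q.min() >= 0                    # quadratic form in a p.d. matrix
assert abs(q.mean() - n) < 0.05        # E chi^2(n) = n
assert abs(q.var() - 2 * n) < 0.2      # Var chi^2(n) = 2n
```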

(Q2) Let $X \sim N(\mu, I_n)$; then for $A$ a symmetric ($A' = A$) matrix,

(i) $(X - \mu)'A(X - \mu) \sim \chi^2(\operatorname{tr} A)$;

(ii) $X'AX \sim \chi^2(\operatorname{tr} A; \delta)$, $\delta = \mu'A\mu$,

if and only if $A$ is idempotent (i.e. $A^2 = A$). Note that $\operatorname{tr} A$ refers to the trace of $A$ ($\operatorname{tr} A = \sum_{i=1}^n a_{ii}$).

(Q3) Let $X \sim N(\mu, \Sigma)$, $\Sigma > 0$, and let $A$ be a symmetric matrix; then

(i) $(X - \mu)'A(X - \mu) \sim \chi^2(\operatorname{tr} A\Sigma)$;

(ii) $X'AX \sim \chi^2(\operatorname{tr} A\Sigma; \delta)$, $\delta = \mu'A\mu$;

if and only if $A\Sigma$ is idempotent (i.e. $A\Sigma A = A$).

(Q4) Let

$$X \sim N(\mu, \Sigma), \quad \Sigma > 0, \quad X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix};$$

then for $X_1$: $k \times 1$ the difference

$$[(X - \mu)'\Sigma^{-1}(X - \mu) - (X_1 - \mu_1)'\Sigma_{11}^{-1}(X_1 - \mu_1)] \sim \chi^2(n - k).$$


(Q5) Let $X \sim N(\mu, \Sigma)$; then for $A$ and $B$ symmetric and idempotent matrices, $q_1 = X'AX$ and $q_2 = X'BX$ are independent if and only if $A\Sigma B = 0$.

(Q6) Let $X \sim N(\mu, I_n)$; then for $A$ a symmetric and idempotent matrix and $B$ a $k \times n$ matrix, $X'AX$ and $BX$ are independent if $BA = 0$.

(Q7) Let $X \sim N(\mu, I_n)$ and $Z \sim N(0, I_m)$ be independent; then for $A$ and $B$ symmetric idempotent matrices,

$$\frac{X'AX/\operatorname{tr} A}{Z'BZ/\operatorname{tr} B} \sim F(\operatorname{tr} A, \operatorname{tr} B; \delta), \quad \delta = \mu'A\mu,$$

a non-central $F$ distribution.

15.4 Estimation

Let $X = (X_1, X_2, \ldots, X_T)'$ be a random sample from $N(\mu, \Sigma)$, i.e. $X_t \sim N(\mu, \Sigma)$, $t = 1, 2, \ldots, T$, $X$ being a $T \times n$ matrix. The likelihood function takes the form

$$L(\theta; X) = k(X)\prod_{t=1}^T \left[(2\pi)^{-n/2}(\det \Sigma)^{-1/2}\exp\{-\tfrac{1}{2}(X_t - \mu)'\Sigma^{-1}(X_t - \mu)\}\right]$$

$$= k(X)(2\pi)^{-nT/2}(\det \Sigma)^{-T/2}\exp\left\{-\frac{1}{2}\sum_{t=1}^T (X_t - \mu)'\Sigma^{-1}(X_t - \mu)\right\}. \tag{15.28}$$

$$\log L(\theta; X) = c - \frac{nT}{2}\log 2\pi - \frac{T}{2}\log(\det \Sigma) - \frac{1}{2}\sum_{t=1}^T (X_t - \mu)'\Sigma^{-1}(X_t - \mu). \tag{15.29}$$

Since

$$\sum_{t=1}^T (X_t - \mu)'\Sigma^{-1}(X_t - \mu) = \operatorname{tr} \Sigma^{-1}A + T(\bar{X}_T - \mu)'\Sigma^{-1}(\bar{X}_T - \mu)$$

for

$$A = \sum_{t=1}^T (X_t - \bar{X}_T)(X_t - \bar{X}_T)' \quad \text{and} \quad \bar{X}_T = \frac{1}{T}\sum_{t=1}^T X_t,$$

$$\log L(\theta; X) = c^* - \frac{T}{2}\log(\det \Sigma) - \frac{1}{2}\operatorname{tr} \Sigma^{-1}A - \frac{T}{2}(\bar{X}_T - \mu)'\Sigma^{-1}(\bar{X}_T - \mu). \tag{15.31}$$

Solving the first-order conditions $\partial \log L/\partial \mu = 0$ and $\partial \log L/\partial \Sigma = 0$ yields

$$\hat{\mu} = \bar{X}_T = \frac{1}{T}\sum_{t=1}^T X_t, \qquad \hat{\Sigma} = \frac{1}{T}\sum_{t=1}^T (X_t - \bar{X}_T)(X_t - \bar{X}_T)'. \tag{15.33}$$

Hence, $\bar{X}_T$ and $\hat{\Sigma}$ are the MLE's of $\mu$ and $\Sigma$ respectively.

(1) Properties

Looking at $\bar{X}_T$ and $\hat{\Sigma}$ we can see that they correspond directly to the MLE's in the univariate case. It turns out that the analogy between the univariate and multivariate cases extends to the properties of $\bar{X}_T$ and $\hat{\Sigma}$.

In order to discuss the small-sample properties of $\bar{X}_T$ and $\hat{\Sigma}$ we need their distributions. Since $\bar{X}_T$ is a linear function of normally distributed random vectors, it is itself normally distributed:

$$\bar{X}_T \sim N\left(\mu,\; \frac{1}{T}\Sigma\right).$$

The distribution of $\hat{\Sigma}$ is a direct generalisation of the chi-square distribution, the so-called Wishart distribution with $T - 1$ degrees of freedom (see Appendix 24.1), i.e.

$$T\hat{\Sigma} \sim W(\Sigma, T - 1).$$

From these we can deduce that $E(\bar{X}_T) = \mu$ — an unbiased estimator of $\mu$ — and $E(\hat{\Sigma}) = [(T - 1)/T]\Sigma$ — a biased estimator of $\Sigma$. $S = [1/(T - 1)]\sum_{t=1}^T (X_t - \bar{X}_T)(X_t - \bar{X}_T)'$ is an unbiased estimator of $\Sigma$.

$\bar{X}_T$ and $\hat{\Sigma}$ are independent and jointly sufficient for $(\mu, \Sigma)$.
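A short sketch of (15.33) on simulated data (all numbers made up): the MLE $\hat{\Sigma}$ and the unbiased estimator $S$ differ by exactly the factor $(T-1)/T$, and both are close to the true $\Sigma$ in a large sample.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 5_000
mu = np.array([0.5, -1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=T)   # T x n data matrix

# MLE's (15.33).
x_bar = X.mean(axis=0)
D = X - x_bar
Sigma_hat = (D.T @ D) / T        # biased: E(Sigma_hat) = ((T-1)/T) Sigma
S = (D.T @ D) / (T - 1)          # unbiased estimator of Sigma

# The two differ exactly by the factor (T-1)/T.
assert np.allclose(Sigma_hat, S * (T - 1) / T)

# Consistency check against the truth on a large sample.
assert np.allclose(x_bar, mu, atol=0.1)
assert np.allclose(S, Sigma, atol=0.2)
```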

(2) Useful distributions

Using the distribution $(T - 1)S \sim W(\Sigma, T - 1)$, the following results relating to the sample correlations can be derived (see Muirhead (1982)).

(i) Simple correlation

$$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}s_{jj}}}, \quad S = [s_{ij}]_{i,j}, \quad i, j = 1, 2, \ldots, n. \tag{15.37}$$


If $\sigma_{ij} = 0$,

$$\frac{r_{ij}\sqrt{T - 2}}{\sqrt{1 - r_{ij}^2}} \sim \operatorname{St}(T - 2),$$

Student's $t$ with $T - 2$ degrees of freedom. For $M = [r_{ij}]_{i,j}$, when $\Sigma = \operatorname{diag}(\sigma_{11}, \ldots, \sigma_{nn})$,

$$-\left[T - 1 - \frac{2n + 5}{6}\right]\log(\det M) \sim \chi^2(\tfrac{1}{2}n(n - 1)) \tag{15.38}$$

asymptotically.
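A sketch of the $t$-statistic for testing $\sigma_{ij} = 0$, on simulated data where the true $\sigma_{12}$ is in fact zero (everything below is made up): the statistic behaves like a Student's $t$ draw, so its absolute value is almost surely moderate.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 200, 3

# Data generated with Sigma = I, so sigma_ij = 0 for every i != j.
X = rng.normal(size=(T, n))
D = X - X.mean(axis=0)
S = (D.T @ D) / (T - 1)
r12 = S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

# Under sigma_12 = 0: r12 sqrt(T-2) / sqrt(1 - r12^2) ~ St(T-2).
t_stat = r12 * np.sqrt(T - 2) / np.sqrt(1 - r12 ** 2)
assert abs(t_stat) < 5    # a St(198) draw is essentially never this large
```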

(ii) Multiple correlation

Under $R^2 = 0$, in particular,

$$E(\hat{R}^2) = \frac{n - 1}{T - 1}, \qquad \operatorname{Var}(\hat{R}^2) = \frac{2(n - 1)(T - n)}{(T^2 - 1)(T - 1)}. \tag{15.41}$$

The distribution of $\hat{R}^2$ when $R^2 \neq 0$ is rather complicated, and instead we commonly use its asymptotic distribution:

$$\sqrt{T}(\hat{R}^2 - R^2) \sim N(0, 4R^2(1 - R^2)^2), \quad 0 < R^2 < 1. \tag{15.42}$$

On the other hand, under $R^2 = 0$,

$$\frac{\hat{R}^2}{1 - \hat{R}^2} \cdot \frac{T - n}{n - 1} \sim F(n - 1, T - n).$$

A closely related sample equivalent to $R^2$ is the quantity

$$\hat{R}^2 = \frac{x_1'X_2(X_2'X_2)^{-1}X_2'x_1}{x_1'x_1}, \quad x_1: T \times 1, \quad X_2: T \times k.$$

The sampling distribution of $\hat{R}^2$ was derived by Fisher (1928), but it is far too complicated to be of direct interest. Its mean and variance, however, are of some interest:

$$E(\hat{R}^2) = R^2 + \frac{k}{T}(1 - R^2) + O(T^{-2}), \tag{15.45}$$

$$\operatorname{Var}(\hat{R}^2) = \frac{4R^2(1 - R^2)^2}{T} + O(T^{-2}) \tag{15.46}$$

(see Muirhead (1982); on the $O(\cdot)$ notation see Chapter 10). Hence, the mean of $\hat{R}^2$ increases as $k$ increases, and for $R^2 = 0$,

$$E(\hat{R}^2) = \frac{k}{T - 1} + O(T^{-2}). \tag{15.47}$$

(iii) Partial correlation

$$r_{12\cdot 3} = \frac{s_{12} - s_{13}S_{33}^{-1}s_{32}}{[s_{11} - s_{13}S_{33}^{-1}s_{31}]^{1/2}[s_{22} - s_{23}S_{33}^{-1}s_{32}]^{1/2}}.$$

Under $\rho_{12\cdot 3} = 0$,

$$\frac{r_{12\cdot 3}\sqrt{T - n}}{\sqrt{1 - r_{12\cdot 3}^2}} \sim \operatorname{St}(T - n).$$

15.5 Hypothesis testing and confidence regions

Hypothesis testing in the context of the multivariate normal distribution will form the backbone of testing in Part IV, where the normal distribution plays a very important role.

For expositional purposes let us consider an example of testing and confidence estimation in the context of the statistical model of Section 15.4. Consider the null hypothesis $H_0$: $\mu = 0$ against $H_1$: $\mu \neq 0$ when $\Sigma$ is unknown. Using the likelihood ratio test procedure with

$$\max_{\theta \in \Theta} L(\theta; x) = c^*(\det \hat{\Sigma})^{-T/2}\exp\{-\tfrac{1}{2}Tn\}, \tag{15.50}$$

$$\max_{\theta \in \Theta_0} L(\theta; x) = c^*(\det(\hat{\Sigma} + \bar{X}_T\bar{X}_T'))^{-T/2}\exp\{-\tfrac{1}{2}Tn\}, \tag{15.51}$$

we get

$$\lambda(x) = \frac{\max_{\theta \in \Theta_0} L(\theta; x)}{\max_{\theta \in \Theta} L(\theta; x)} = \left[\frac{\det \hat{\Sigma}}{\det(\hat{\Sigma} + \bar{X}_T\bar{X}_T')}\right]^{T/2}, \tag{15.52}$$

where
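The ratio in (15.50)–(15.52) is easy to compute; the sketch below simulates data under $H_0$ with a made-up $\Sigma$ and evaluates $\lambda$, which by construction lies in $(0, 1]$ since $\det(\hat{\Sigma} + \bar{x}\bar{x}') = \det(\hat{\Sigma})(1 + \bar{x}'\hat{\Sigma}^{-1}\bar{x}) \geq \det(\hat{\Sigma})$.

```python
import numpy as np

rng = np.random.default_rng(8)
T, n = 100, 2

# Simulate under H0: mu = 0, with a made-up (unknown-to-the-test) Sigma.
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
X = rng.multivariate_normal(np.zeros(n), Sigma, size=T)

x_bar = X.mean(axis=0)
D = X - x_bar
Sigma_hat = (D.T @ D) / T

# Likelihood ratio (15.52).
lam = (np.linalg.det(Sigma_hat)
       / np.linalg.det(Sigma_hat + np.outer(x_bar, x_bar))) ** (T / 2)

assert 0 < lam <= 1
lr = -2 * np.log(lam)        # the usual monotone transformation
assert lr >= 0
```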
