CHAPTER 15*
The multivariate normal distribution
15.1 Multivariate distributions
The multivariate normal distribution is by far the most important distribution in statistical inference, for a variety of reasons, including the fact that some of the statistics based on sampling from such a distribution have tractable distributions themselves. It forms the backbone of Part IV on statistical models in econometrics, and thus a closer study of this distribution will greatly simplify the discussion that follows. Before we consider the multivariate normal distribution, however, let us introduce some notation and various simple results related to random vectors and their distributions in general.
Let X = (X_1, X_2, ..., X_n)' be an n × 1 random vector defined on the probability space (S, ℱ, P(·)). The mean vector E(X) is defined by
\[
E(X) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{pmatrix} = \mu, \quad \text{an } n \times 1 \text{ vector}, \tag{15.1}
\]

and the covariance matrix Cov(X) by
\[
\mathrm{Cov}(X) = E[(X - \mu)(X - \mu)'] =
\begin{pmatrix}
\mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_n) \\
\mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \cdots & \mathrm{Var}(X_n)
\end{pmatrix} = \Sigma, \tag{15.2}
\]
where Σ is an n × n symmetric non-negative definite matrix, i.e. Σ' = Σ and α'Σα ≥ 0 for any α ∈ ℝⁿ. The (i, j)th element of Σ is

\[
\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)], \quad i, j = 1, 2, \ldots, n. \tag{15.3}
\]

In relation to the non-negative definiteness of Σ we can show that if there exists an α ∈ ℝⁿ, α ≠ 0, such that Var(α'X) = α'Σα = 0, then Pr(α'X = c) = 1, where c is a constant (only constants have zero variance), i.e. there is a linear relationship holding among the r.v.'s X_1, ..., X_n with probability one.
Lemma 15.1

If the random vector X has a continuous density function, then Σ > 0. This is because Pr(α'X = c) = 0 for all α and c in this case.

Lemma 15.2
If X has mean μ and covariance Σ, then for Z = AX + b:

(i) E(Z) = A E(X) + b = Aμ + b;
(ii) Cov(Z) = E[(AX + b − (Aμ + b))(AX + b − (Aμ + b))'] = A E[(X − μ)(X − μ)'] A' = AΣA'.
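Lemma 15.2 is easy to check numerically. The following Python sketch (not part of the original text; numpy is assumed, and the particular μ, Σ, A and b are arbitrary illustrative choices) simulates a large sample of X and compares the sample mean and covariance of Z = AX + b with Aμ + b and AΣA':

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])                 # illustrative mean vector
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])             # symmetric positive definite
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                # 2 x 3 constant matrix
b = np.array([0.5, 1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of X
Z = X @ A.T + b                                        # Z = AX + b for each draw

print(Z.mean(axis=0))            # should approximate A mu + b
print(A @ mu + b)
print(np.cov(Z, rowvar=False))   # should approximate A Sigma A'
print(A @ Sigma @ A.T)
```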
Let X and Y be n × 1 and m × 1 random vectors with E(X) = μ_x and E(Y) = μ_y; then

\[
\mathrm{Cov}(X, Y) = [\mathrm{Cov}(X_i, Y_j)]_{i,j} = E[(X - \mu_x)(Y - \mu_y)']. \tag{15.4}
\]
Correlation
So far correlation has been defined for random variables (r.v.'s) only, and the question arises whether it can be generalised to random vectors. Let E(X) = 0 (without any loss of generality), Cov(X) = Σ, X: n × 1, and partition X into

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad X_1\colon 1 \times 1, \ X_2\colon (n-1) \times 1, \quad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{21}' \\ \sigma_{21} & \Sigma_{22} \end{pmatrix}.
\]

Define Z = α'X_2, and let us consider the correlation between X_1 and Z:

\[
\mathrm{Corr}(X_1, Z) = \frac{\alpha' \sigma_{21}}{[\sigma_{11} (\alpha' \Sigma_{22} \alpha)]^{1/2}}.
\]

This is maximised for the value of α which minimises E(X_1 − α'X_2)² (see Chapter 7), which is α = Σ_{22}^{-1} σ_{21}, and we define the multiple correlation coefficient to be

\[
R = \left( \frac{\sigma_{21}' \Sigma_{22}^{-1} \sigma_{21}}{\sigma_{11}} \right)^{1/2} \geq \mathrm{Corr}(X_1, \alpha' X_2), \quad 0 \leq R \leq 1. \tag{15.8}
\]
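As an illustration of (15.8), the following Python sketch (not from the original text; the Σ used is an arbitrary positive definite example) computes R and verifies that the maximising weights α = Σ₂₂⁻¹σ₂₁ attain it:

```python
import numpy as np

# Partitioned covariance: X1 scalar, X2 two-dimensional (illustrative values).
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.4],
                  [0.8, 0.4, 1.0]])
s11 = Sigma[0, 0]          # Var(X1)
s21 = Sigma[1:, 0]         # Cov(X2, X1)
S22 = Sigma[1:, 1:]        # Cov(X2)

alpha = np.linalg.solve(S22, s21)     # maximising weights Sigma22^{-1} sigma21
R = np.sqrt(s21 @ alpha / s11)        # multiple correlation, formula (15.8)

# Corr(X1, alpha'X2) should equal R at the maximising alpha.
corr = (alpha @ s21) / np.sqrt(s11 * (alpha @ S22 @ alpha))
print(R, corr)
```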
In the case where

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad X_1\colon k \times 1, \ X_2\colon (n-k) \times 1, \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \quad k > 1,
\]

we could define the r.v.'s Z_1 = α_1'X_1 and Z_2 = α_2'X_2, whose correlation coefficient is

\[
\mathrm{Corr}(Z_1, Z_2) = \frac{\alpha_1' \Sigma_{12} \alpha_2}{(\alpha_1' \Sigma_{11} \alpha_1)^{1/2} (\alpha_2' \Sigma_{22} \alpha_2)^{1/2}}. \tag{15.9}
\]

From the above inequality it follows that for α_2 = Σ_{22}^{-1} Σ_{21} α_1,

\[
\mathrm{Corr}^2(Z_1, Z_2) = \frac{\alpha_1' \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \alpha_1}{\alpha_1' \Sigma_{11} \alpha_1}.
\]

The matrix Σ_{11}^{-1} Σ_{12} Σ_{22}^{-1} Σ_{21} has at most k non-zero eigenvalues; these measure the association between X_1 and X_2 and are called canonical correlations. Let
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}, \quad X_3\colon (n-2) \times 1, \quad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13}' \\ \sigma_{21} & \sigma_{22} & \sigma_{23}' \\ \sigma_{31} & \sigma_{32} & \Sigma_{33} \end{pmatrix}.
\]
Another form of correlation of interest in this context is the correlation between X_1 and X_2 given that the effect of X_3 is taken away. For this we form the r.v.'s

\[
Y_1 = X_1 - b_1' X_3 \quad \text{and} \quad Y_2 = X_2 - b_2' X_3,
\]

and Corr(Y_1, Y_2) is maximised by b_1 = Σ_{33}^{-1}σ_{31} and b_2 = Σ_{33}^{-1}σ_{32}, as seen above. Hence we define the partial correlation coefficient between X_1 and X_2 given X_3 to be

\[
\rho_{12 \cdot 3} = \frac{\sigma_{12} - \sigma_{13}' \Sigma_{33}^{-1} \sigma_{32}}{[\sigma_{11} - \sigma_{13}' \Sigma_{33}^{-1} \sigma_{31}]^{1/2} [\sigma_{22} - \sigma_{23}' \Sigma_{33}^{-1} \sigma_{32}]^{1/2}}. \tag{15.11}
\]
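Formula (15.11) translates directly into code. A minimal Python sketch (the Σ below is an illustrative choice, not taken from the book):

```python
import numpy as np

# X1, X2 scalar; X3 two-dimensional. Illustrative covariance matrix.
Sigma = np.array([[1.0, 0.6, 0.5, 0.4],
                  [0.6, 1.0, 0.5, 0.3],
                  [0.5, 0.5, 1.0, 0.2],
                  [0.4, 0.3, 0.2, 1.0]])
s12 = Sigma[0, 1]
s13, s23 = Sigma[0, 2:], Sigma[1, 2:]
S33 = Sigma[2:, 2:]

b1 = np.linalg.solve(S33, s13)   # Sigma33^{-1} sigma31
b2 = np.linalg.solve(S33, s23)   # Sigma33^{-1} sigma32

num = s12 - s13 @ b2
den = np.sqrt((Sigma[0, 0] - s13 @ b1) * (Sigma[1, 1] - s23 @ b2))
print(num / den)                 # partial correlation (15.11)
```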
15.2 The multivariate normal distribution
The univariate normal density function discussed above was of the form
\[
f(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\}. \tag{15.12}
\]
The density function of X = (X_1, X_2, ..., X_n)' when the X_i's are IID normally distributed r.v.'s was shown to be of the form

\[
f(x; \mu, \sigma^2) = \prod_{i=1}^{n} f(x_i; \mu, \sigma^2)
= (2\pi)^{-n/2} (\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right\}. \tag{15.13}
\]
Similarly, the density function of X when the X_i's are only independent, i.e. X_i ~ N(μ_i, σ_i²), i = 1, 2, ..., n, takes the form

\[
f(x; \mu_1, \ldots, \mu_n, \sigma_1^2, \ldots, \sigma_n^2) = \prod_{i=1}^{n} f(x_i; \mu_i, \sigma_i^2)
= (2\pi)^{-n/2} (\sigma_1^2 \sigma_2^2 \cdots \sigma_n^2)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \right\}. \tag{15.14}
\]
Comparing the above three density functions we can discern a pattern which is very suggestive of the density function of an arbitrary normal vector X with E(X) = μ and Cov(X) = Σ, which takes the form

\[
f(x; \mu, \Sigma) = (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}, \tag{15.15}
\]
and we write X ~ N(μ, Σ). If the X_i's are IID r.v.'s, Σ = σ²I_n and (det Σ) = (σ²)ⁿ. On the other hand, if the X_i's are independent but not identically distributed,

\[
\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2) \quad \text{and} \quad (\det \Sigma) = \prod_{i=1}^{n} \sigma_i^2.
\]

In the case of n = 2,

\[
(\det \Sigma) = \sigma_1^2 \sigma_2^2 (1 - \rho^2) > 0 \quad \text{for } -1 < \rho < 1,
\]
and

\[
\Sigma^{-1} = \frac{1}{1 - \rho^2}
\begin{pmatrix}
\dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1 \sigma_2} \\[2ex]
\dfrac{-\rho}{\sigma_1 \sigma_2} & \dfrac{1}{\sigma_2^2}
\end{pmatrix}. \tag{15.16}
\]

Thus the bivariate normal density function is

\[
f(x; \mu, \Sigma) = \frac{1}{2\pi \sigma_1 \sigma_2 (1 - \rho^2)^{1/2}}
\exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2
- 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right) \left( \frac{x_2 - \mu_2}{\sigma_2} \right)
+ \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\} \tag{15.17}
\]

(see Chapter 6). The standard bivariate density function can be obtained by defining the new r.v.'s
\[
Z_i = \frac{X_i - \mu_i}{\sigma_i}, \quad i = 1, 2,
\]

whose density function is

\[
f(z_1, z_2; \rho) = \frac{1}{2\pi (1 - \rho^2)^{1/2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} (z_1^2 - 2\rho z_1 z_2 + z_2^2) \right\}. \tag{15.18}
\]
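To check the algebra of (15.18), one can compare a hand-coded version against a library implementation. A sketch, assuming scipy is available (the function names below are scipy's and ours, not the book's):

```python
import numpy as np
from scipy.stats import multivariate_normal

def f_std_bivariate(z1, z2, rho):
    # standard bivariate normal density, formula (15.18)
    c = 2 * np.pi * np.sqrt(1 - rho**2)
    q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return np.exp(-0.5 * q) / c

rho = 0.7
z1, z2 = 0.3, -1.1
mvn = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
print(f_std_bivariate(z1, z2, rho))   # hand-coded (15.18)
print(mvn.pdf([z1, z2]))              # should agree
```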
(1) Properties
(N1) Let X ~ N(μ, Σ); then Y = (AX + b) ~ N(Aμ + b, AΣA') for A: m × n and b: m × 1 constant matrices; e.g. if Y = cX, c ≠ 0, then Y ~ N(cμ, c²Σ). This property shows that if X is normally distributed then any linear function of X is also normally distributed.
(N2) Let X_t ~ N(μ_t, Σ_t), t = 1, 2, ..., T, be independently distributed random vectors; then for any arbitrary constant matrices A_t, t = 1, 2, ..., T,

\[
\left( \sum_{t=1}^{T} A_t X_t \right) \sim N\left( \sum_{t=1}^{T} A_t \mu_t, \ \sum_{t=1}^{T} A_t \Sigma_t A_t' \right).
\]

The converse also holds. If the X_t's are IID, then μ_t = μ, Σ_t = Σ, t = 1, 2, ..., T, and

\[
\left( \frac{1}{T} \sum_{t=1}^{T} X_t \right) \sim N\left( \mu, \frac{1}{T} \Sigma \right).
\]
(N3) Let X ~ N(μ, Σ); then the X_i's are independent if and only if σ_ij = 0, i ≠ j, i, j = 1, 2, ..., n, i.e. Σ = diag(σ_11, ..., σ_nn). In general, zero covariance does not imply independence, but in the case of normality the two are equivalent.
(N4) If X ~ N(μ, Σ), then the marginal distribution of any k × 1 subset X_1, where

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},
\]

is X_1 ~ N(μ_1, Σ_11). This follows from property N1 for A = (I_k : 0), (k × n), b = 0. Similarly, X_2 ~ N(μ_2, Σ_22). These can be verified directly using

\[
f(x_1; \theta_1) = \int f(x; \theta) \, dx_2 \quad \text{and} \quad f(x_2; \theta_2) = \int f(x; \theta) \, dx_1,
\]

although the manipulations involved are rather cumbersome. Taking k = 1, this property implies that each component of X ~ N(μ, Σ) is also normally distributed; the converse, however, is not true.
(N5) For the same partition of X considered in N4, the conditional distribution of X_1 given X_2 takes the form

\[
(X_1 / X_2) \sim N\left( \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (X_2 - \mu_2), \ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right). \tag{15.19}
\]

This follows from property N1 for

\[
A = \begin{pmatrix} I_k & -\Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I_{n-k} \end{pmatrix},
\]

since

\[
AX \sim N(A\mu, A \Sigma A'), \quad \mathrm{Cov}(AX) = \begin{pmatrix} \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}. \tag{15.20}
\]

From this we can deduce that if Σ_12 = 0 then X_1 and X_2 are independent, since (X_1/X_2) ~ N(μ_1, Σ_11). Moreover, for any Σ_12, (X_1 − Σ_12 Σ_22⁻¹ X_2) and X_2 are independent, given that their covariance is zero. Similarly, (X_2/X_1) ~ N(μ_2 + Σ_21 Σ_11⁻¹ (X_1 − μ_1), Σ_22 − Σ_21 Σ_11⁻¹ Σ_12). In the case n = 2,

\[
(X_1 / X_2) \sim N\left( \mu_1 + \rho \frac{\sigma_1}{\sigma_2} (X_2 - \mu_2), \ \sigma_1^2 (1 - \rho^2) \right). \tag{15.21}
\]
These results can be verified using the formula

\[
f(x_1 / x_2) = \frac{f(x_1, x_2; \theta)}{f(x_2; \theta_2)}. \tag{15.22}
\]
N5 suggests that the regression function

\[
E(X_1 / X_2 = x_2) = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) \quad \text{is linear in } x_2, \tag{15.23}
\]

and the skedasticity function

\[
\mathrm{Cov}(X_1 / X_2 = x_2) = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \quad \text{is free of } x_2. \tag{15.24}
\]

These are very important properties of the multivariate normal distribution and will play a crucial role in Part IV.
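The conditional moment formulas (15.23)-(15.24) are one-liners in matrix code. A Python sketch with illustrative μ, Σ and conditioning value x₂ (all assumed for the example, not taken from the book):

```python
import numpy as np

# Partition X = (X1', X2')' with X1 the first k components.
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[2.0, 0.6, 0.4],
                  [0.6, 1.0, 0.2],
                  [0.4, 0.2, 1.5]])
k = 1
mu1, mu2 = mu[:k], mu[k:]
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

x2 = np.array([0.5, 1.0])                  # conditioning value
B = S12 @ np.linalg.inv(S22)               # regression coefficients Sigma12 Sigma22^{-1}
cond_mean = mu1 + B @ (x2 - mu2)           # (15.23): linear in x2
cond_cov = S11 - B @ S21                   # (15.24): free of x2
print(cond_mean, cond_cov)
```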
(2) Multiple correlation
Without any loss of generality let X ~ N(0, Σ), X: n × 1, and define the partition

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{21}' \\ \sigma_{21} & \Sigma_{22} \end{pmatrix},
\]

where X_1: 1 × 1 and X_2: (n−1) × 1. The squared multiple correlation coefficient takes the form

\[
R^2 = 1 - \frac{\mathrm{Var}(X_1 / X_2)}{\mathrm{Var}(X_1)} = \frac{\sigma_{21}' \Sigma_{22}^{-1} \sigma_{21}}{\sigma_{11}}. \tag{15.25}
\]
(3) Partial correlation
Let X_2 be partitioned further into

\[
X_2 = \begin{pmatrix} X_2 \\ X_3 \end{pmatrix}, \quad X_2\colon 1 \times 1, \ X_3\colon (n-2) \times 1,
\]

with

\[
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13}' \\ \sigma_{21} & \sigma_{22} & \sigma_{23}' \\ \sigma_{31} & \sigma_{32} & \Sigma_{33} \end{pmatrix}.
\]

The partial correlation between X_1 and X_2 given X_3 takes the form

\[
\rho_{12 \cdot 3} = \frac{\mathrm{Cov}(X_1, X_2 / X_3)}{[\mathrm{Var}(X_1 / X_3)\, \mathrm{Var}(X_2 / X_3)]^{1/2}}
= \frac{\sigma_{12} - \sigma_{13}' \Sigma_{33}^{-1} \sigma_{32}}{[\sigma_{11} - \sigma_{13}' \Sigma_{33}^{-1} \sigma_{31}]^{1/2} [\sigma_{22} - \sigma_{23}' \Sigma_{33}^{-1} \sigma_{32}]^{1/2}}, \tag{15.26}
\]
with

\[
\begin{pmatrix} X_1 / X_3 \\ X_2 / X_3 \end{pmatrix} \sim
N\left( \begin{pmatrix} \sigma_{13}' \Sigma_{33}^{-1} X_3 \\ \sigma_{23}' \Sigma_{33}^{-1} X_3 \end{pmatrix},
\begin{pmatrix} \sigma_{11} - \sigma_{13}' \Sigma_{33}^{-1} \sigma_{31} & \sigma_{12} - \sigma_{13}' \Sigma_{33}^{-1} \sigma_{32} \\
\sigma_{21} - \sigma_{23}' \Sigma_{33}^{-1} \sigma_{31} & \sigma_{22} - \sigma_{23}' \Sigma_{33}^{-1} \sigma_{32} \end{pmatrix} \right). \tag{15.27}
\]
15.3 Quadratic forms related to the normal distribution
(Q1) Let X ~ N(μ, Σ), where Σ > 0, X: n × 1; then

(i) (X − μ)'Σ⁻¹(X − μ) ~ χ²(n) (chi-square);
(ii) X'Σ⁻¹X ~ χ²(n; δ) (non-central chi-square), where δ = μ'Σ⁻¹μ.

These results depend crucially on Σ being a positive definite matrix, because for Σ > 0 there exists a non-singular matrix H such that Σ = HH', and hence Z = H⁻¹(X − μ) ~ N(0, I_n), i.e. the Z_i's are independent standard normal r.v.'s and

\[
(X - \mu)' \Sigma^{-1} (X - \mu) = Z'Z = \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n).
\]

Similarly for (ii). For the MLE of μ,

\[
\hat{\mu} = \left( \frac{1}{T} \sum_{t=1}^{T} X_t \right) \sim N\left( \mu, \frac{1}{T} \Sigma \right) \quad \text{from N2},
\]

and hence

\[
T (\hat{\mu} - \mu)' \Sigma^{-1} (\hat{\mu} - \mu) \sim \chi^2(n).
\]
(Q2) Let X ~ N(μ, I_n); then for A a symmetric (A' = A) matrix,

(i) (X − μ)'A(X − μ) ~ χ²(tr A);
(ii) X'AX ~ χ²(tr A; δ), δ = μ'Aμ,

if and only if A is idempotent (i.e. A² = A). Note that tr A refers to the trace of A (tr A = ∑_{i=1}^{n} a_ii).
(Q3) Let X ~ N(μ, Σ), Σ > 0, and A a symmetric matrix; then

(i) (X − μ)'A(X − μ) ~ χ²(tr AΣ);
(ii) X'AX ~ χ²(tr AΣ; δ), δ = μ'Aμ,

if and only if AΣ is idempotent (i.e. AΣA = A).
(Q4) Let X ~ N(μ, Σ), Σ > 0, and partition

\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix};
\]

then for X_1: k × 1 the difference

\[
[(X - \mu)' \Sigma^{-1} (X - \mu) - (X_1 - \mu_1)' \Sigma_{11}^{-1} (X_1 - \mu_1)] \sim \chi^2(n - k).
\]
(Q5) Let X ~ N(μ, Σ); then for A and B symmetric and idempotent matrices, q_1 = X'AX and q_2 = X'BX are independent if and only if AΣB = 0.
(Q6) Let X ~ N(μ, I_n); then for A a symmetric and idempotent matrix and B a k × n matrix, X'AX and BX are independent if BA = 0.

(Q7) Let X ~ N(μ, I_n) and Z ~ N(0, I_m); then for A and B symmetric idempotent matrices,

\[
\frac{X'AX / \mathrm{tr}\, A}{Z'BZ / \mathrm{tr}\, B} \sim F(\mathrm{tr}\, A, \mathrm{tr}\, B; \delta), \quad \delta = \mu' A \mu,
\]

a non-central F distribution.
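Results such as (Q1)(i) are easy to check by simulation. A Python sketch (numpy and scipy assumed; μ and Σ are arbitrary illustrative choices) comparing the empirical quantiles of (X − μ)'Σ⁻¹(X − μ) with those of χ²(n):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 4
mu = np.array([1.0, -1.0, 0.5, 2.0])
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)        # a positive definite covariance
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(mu, Sigma, size=100_000)
d = X - mu
q = np.einsum('ti,ij,tj->t', d, Sigma_inv, d)   # (X-mu)' Sigma^{-1} (X-mu)

# Empirical quantiles should track the chi-square(n) quantiles.
for p in (0.5, 0.9, 0.99):
    print(p, np.quantile(q, p), chi2.ppf(p, df=n))
```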
15.4 Estimation
Let X ≡ (X_1, X_2, ..., X_T)' be a random sample from N(μ, Σ), i.e. X_t ~ N(μ, Σ), t = 1, 2, ..., T, X being a T × n matrix. The likelihood function takes the form

\[
L(\theta; X) = k(X) \prod_{t=1}^{T} \left[ (2\pi)^{-n/2} (\det \Sigma)^{-1/2} \exp\left\{ -\tfrac{1}{2} (X_t - \mu)' \Sigma^{-1} (X_t - \mu) \right\} \right]
\]
\[
= k(X) (2\pi)^{-nT/2} (\det \Sigma)^{-T/2} \exp\left\{ -\frac{1}{2} \sum_{t=1}^{T} (X_t - \mu)' \Sigma^{-1} (X_t - \mu) \right\}, \tag{15.28}
\]

\[
\log L(\theta; X) = c - \frac{nT}{2} \log 2\pi - \frac{T}{2} \log(\det \Sigma) - \frac{1}{2} \sum_{t=1}^{T} (X_t - \mu)' \Sigma^{-1} (X_t - \mu). \tag{15.29}
\]
Since

\[
\sum_{t=1}^{T} (X_t - \mu)' \Sigma^{-1} (X_t - \mu) = \mathrm{tr}\, \Sigma^{-1} A + T (\bar{X}_T - \mu)' \Sigma^{-1} (\bar{X}_T - \mu)
\]

for

\[
\bar{X}_T = \frac{1}{T} \sum_{t=1}^{T} X_t, \quad A = \sum_{t=1}^{T} (X_t - \bar{X}_T)(X_t - \bar{X}_T)', \tag{15.30}
\]

the log-likelihood can be written as

\[
\log L(\theta; X) = c^* - \frac{T}{2} \log(\det \Sigma) - \frac{1}{2} \mathrm{tr}\, \Sigma^{-1} A - \frac{T}{2} (\bar{X}_T - \mu)' \Sigma^{-1} (\bar{X}_T - \mu). \tag{15.31}
\]
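The decomposition (15.30)-(15.31) is also the numerically convenient way to evaluate the log-likelihood. A sketch of such an evaluator in Python (the function name and interface are ours, not the book's; the constant k(X) is ignored):

```python
import numpy as np

def mvn_loglik(mu, Sigma, X):
    """Log-likelihood (15.29) evaluated via the decomposition (15.30)-(15.31).
    X is a T x n data matrix whose rows are the observations X_t."""
    T, n = X.shape
    xbar = X.mean(axis=0)
    D = X - xbar
    A = D.T @ D                              # sum of (X_t - xbar)(X_t - xbar)'
    Sinv = np.linalg.inv(Sigma)
    sign, logdet = np.linalg.slogdet(Sigma)  # stable log(det Sigma)
    quad = T * (xbar - mu) @ Sinv @ (xbar - mu)
    return (-0.5 * T * n * np.log(2 * np.pi)
            - 0.5 * T * logdet
            - 0.5 * (np.trace(Sinv @ A) + quad))

# Illustrative call on simulated data:
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
print(mvn_loglik(np.zeros(2), np.eye(2), X))
```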
The first-order conditions for a maximum are

\[
\frac{\partial \log L(\theta; X)}{\partial \mu} = T \Sigma^{-1} (\bar{X}_T - \mu) = 0 \ \Rightarrow \ \hat{\mu} = \bar{X}_T = \frac{1}{T} \sum_{t=1}^{T} X_t, \tag{15.32}
\]

\[
\frac{\partial \log L(\theta; X)}{\partial \Sigma} = -\frac{T}{2} \Sigma^{-1} + \frac{1}{2} \Sigma^{-1} A \Sigma^{-1} = 0 \ \Rightarrow \ \hat{\Sigma} = \frac{1}{T} \sum_{t=1}^{T} (X_t - \bar{X}_T)(X_t - \bar{X}_T)'. \tag{15.33}
\]
Hence, X̄_T and Σ̂ are the MLE's of μ and Σ respectively.
(1) Properties
Looking at X̄_T and Σ̂ we can see that they correspond directly to the MLE's in the univariate case. It turns out that the analogy between the univariate and the multivariate cases extends to the properties of X̄_T and Σ̂.

In order to discuss the small sample properties of X̄_T and Σ̂ we need their distributions. Since X̄_T is a linear function of normally distributed random vectors, it is itself normally distributed: X̄_T ~ N(μ, (1/T)Σ).

The distribution of Σ̂ is a direct generalisation of the chi-square distribution, the so-called Wishart distribution with T − 1 degrees of freedom (see Appendix 24.1), i.e. TΣ̂ ~ W(Σ, T − 1). From these we can deduce that E(X̄_T) = μ (an unbiased estimator of μ) and E(Σ̂) = [(T − 1)/T]Σ (a biased estimator of Σ); S = [1/(T − 1)] ∑_{t=1}^{T} (X_t − X̄_T)(X_t − X̄_T)' is an unbiased estimator of Σ.

X̄_T and Σ̂ are independent and jointly sufficient for (μ, Σ).
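In code, the estimators above are a few lines. A Python sketch (the true parameters and sample size are assumed purely for the simulation) computing X̄_T, the MLE Σ̂ of (15.33), and the unbiased S:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([0.0, 1.0])
Sigma_true = np.array([[1.0, 0.4],
                       [0.4, 2.0]])
T = 500
X = rng.multivariate_normal(mu_true, Sigma_true, size=T)

xbar = X.mean(axis=0)             # MLE of mu
D = X - xbar
Sigma_hat = (D.T @ D) / T         # MLE of Sigma, biased by factor (T-1)/T
S = (D.T @ D) / (T - 1)           # unbiased estimator of Sigma
print(xbar, Sigma_hat, S, sep='\n')
```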
(2) Useful distributions
Using the distribution (T − 1)S ~ W(Σ, T − 1), the following results relating to the sample correlations can be derived (see Muirhead (1982)):

(i) Simple correlation

\[
r_{ij} = \frac{s_{ij}}{(s_{ii} s_{jj})^{1/2}}, \quad S = [s_{ij}]_{i,j}, \quad i, j = 1, 2, \ldots, n. \tag{15.37}
\]
If σ_ij = 0,

\[
\frac{\sqrt{T - 2}\; r_{ij}}{\sqrt{1 - r_{ij}^2}} \sim t(T - 2).
\]
For M = [r_{ij}]_{i,j}, when Σ = diag(σ_11, ..., σ_nn),

\[
-(T - 2) \log(\det M) \sim \chi^2\left( \tfrac{1}{2} n(n - 1) \right) \quad \text{asymptotically}. \tag{15.38}
\]
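The t-test based on (15.37) can be sketched as follows in Python (the data are simulated under σ_12 = 0, so the statistic should behave like a t(T − 2) draw; all names are ours):

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(3)
T = 100
X = rng.standard_normal((T, 2))            # two independent series: sigma_12 = 0

r = np.corrcoef(X, rowvar=False)[0, 1]     # sample correlation r_12, (15.37)
stat = np.sqrt(T - 2) * r / np.sqrt(1 - r**2)
pval = 2 * student_t.sf(abs(stat), df=T - 2)
print(r, stat, pval)
```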
(ii) Multiple correlation
The sample multiple correlation is obtained by replacing Σ with S in (15.25):

\[
\hat{R}^2 = \frac{s_{21}' S_{22}^{-1} s_{21}}{s_{11}}. \tag{15.39}
\]

Under R² = 0,

\[
\left( \frac{T - n}{n - 1} \right) \frac{\hat{R}^2}{1 - \hat{R}^2} \sim F(n - 1, T - n). \tag{15.40}
\]

In particular,

\[
E(\hat{R}^2) = \frac{n - 1}{T - 1}, \quad \mathrm{Var}(\hat{R}^2) = \frac{2(n - 1)(T - n)}{(T^2 - 1)(T - 1)}. \tag{15.41}
\]
The distribution of R̂² when R ≠ 0 is rather complicated, and instead we commonly use its asymptotic distribution:

\[
\sqrt{T} (\hat{R}^2 - R^2) \sim N\left( 0, 4R^2 (1 - R^2)^2 \right), \quad 0 < R^2 < 1. \tag{15.42}
\]
On the other hand, under R² = 0,

\[
T \hat{R}^2 \sim \chi^2(n - 1) \quad \text{asymptotically}. \tag{15.43}
\]
A closely related sample equivalent to R² is the quantity

\[
\hat{R}^2 = \frac{x_1' X_2 (X_2' X_2)^{-1} X_2' x_1}{x_1' x_1}, \quad x_1\colon T \times 1, \ X_2\colon T \times k. \tag{15.44}
\]
The sampling distribution of R̂² was derived by Fisher (1928), but it is far too complicated to be of direct interest. Its mean and variance, however, are of some interest:

\[
E(\hat{R}^2) = R^2 + \frac{k}{T}(1 - R^2) - \frac{2}{T} R^2 (1 - R^2) + O(T^{-2}), \tag{15.45}
\]
\[
\mathrm{Var}(\hat{R}^2) = \frac{4R^2 (1 - R^2)^2}{T} + O(T^{-2}) \tag{15.46}
\]

(see Muirhead (1982); on the O(·) notation see Chapter 10). Hence the mean of R̂² increases as k increases, and for R² = 0,

\[
E(\hat{R}^2) = \frac{k}{T} + O(T^{-2}). \tag{15.47}
\]
(iii) Partial correlation

\[
\hat{r}_{12 \cdot 3} = \frac{s_{12} - s_{13}' S_{33}^{-1} s_{32}}{(s_{11} - s_{13}' S_{33}^{-1} s_{31})^{1/2} (s_{22} - s_{23}' S_{33}^{-1} s_{32})^{1/2}}. \tag{15.48}
\]
Under ρ_{12·3} = 0,

\[
\frac{\sqrt{T - n}\; \hat{r}_{12 \cdot 3}}{\sqrt{1 - \hat{r}_{12 \cdot 3}^2}} \sim t(T - n). \tag{15.49}
\]
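A sample version of (15.48)-(15.49) in Python (simulated independent data, so ρ_{12·3} = 0 holds by construction; the degrees of freedom T − n follow the result quoted above, which is itself a reconstruction):

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(4)
T, n = 200, 4                              # X1, X2 and an (n-2)-dimensional X3
X = rng.standard_normal((T, n))
S = np.cov(X, rowvar=False)

s12 = S[0, 1]
s13, s23 = S[0, 2:], S[1, 2:]
S33 = S[2:, 2:]
b1 = np.linalg.solve(S33, s13)
b2 = np.linalg.solve(S33, s23)

# Sample partial correlation (15.48)
r123 = (s12 - s13 @ b2) / np.sqrt((S[0, 0] - s13 @ b1) * (S[1, 1] - s23 @ b2))
stat = np.sqrt(T - n) * r123 / np.sqrt(1 - r123**2)   # df = T - n: n-2 variables partialled out
pval = 2 * student_t.sf(abs(stat), df=T - n)
print(r123, stat, pval)
```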
15.5 Hypothesis testing and confidence regions
Hypothesis testing in the context of the multivariate normal distribution will form the backbone of testing in Part IV, where the normal distribution plays a very important role.
For expositional purposes let us consider an example of testing and confidence estimation in the context of the statistical model of Section 15.4. Consider the null hypothesis H₀: μ = 0 against H₁: μ ≠ 0 when Σ is unknown. Using the likelihood ratio test procedure with
\[
\max_{\theta \in \Theta} L(\theta; x) = c^* (\det \hat{\Sigma})^{-T/2} \exp\{ -\tfrac{1}{2} T n \}, \tag{15.50}
\]

\[
\max_{\theta \in \Theta_0} L(\theta; x) = c^* \left( \det(\hat{\Sigma} + \bar{X}_T \bar{X}_T') \right)^{-T/2} \exp\{ -\tfrac{1}{2} T n \}, \tag{15.51}
\]

we get

\[
\lambda(x) = \left[ \frac{\det \hat{\Sigma}}{\det(\hat{\Sigma} + \bar{X}_T \bar{X}_T')} \right]^{T/2}, \tag{15.52}
\]

where