CHAPTER 9
Random Matrices
The step from random vectors to random matrices (and higher-order random arrays) is not as big as the step from individual random variables to random vectors. We will first give a few quite trivial verifications that the expected value operator is indeed a linear operator, and then make some not quite as trivial observations about the expected values and higher moments of quadratic forms.
9.1 Linearity of Expected Values

Definition 9.1.1. Let Z be a random matrix with elements z_ij. Then E[Z] is the matrix with elements E[z_ij].
Theorem 9.1.2. If A, B, and C are constant matrices, then E[AZB + C] = A E[Z] B + C.

Proof by multiplying out.
Theorem 9.1.3. E[Z^⊤] = (E[Z])^⊤; E[tr Z] = tr E[Z].
Theorem 9.1.4. For partitioned matrices, E[ [X; Y] ] = [ E[X]; E[Y] ] (stacking X on top of Y).

Special cases: If C is a constant, then E[C] = C; E[AX + BY] = A E[X] + B E[Y]; and E[a·X + b·Y] = a·E[X] + b·E[Y].
If X and Y are random matrices, then the covariance of these two matrices is a four-way array containing the covariances of all elements of X with all elements of Y. Certain conventions are necessary to arrange this four-way array in a two-dimensional scheme that can be written on a sheet of paper. Before we develop those, we will first define the covariance matrix for two random vectors.
Definition 9.1.5. The covariance matrix of two random vectors is defined as

(9.1.1)  C[x, y] = E[(x − E[x])(y − E[y])^⊤].
Theorem 9.1.6. C[x, y] = E[xy^⊤] − (E[x])(E[y])^⊤.

Theorem 9.1.7. C[Ax + b, Cy + d] = A C[x, y] C^⊤.
Problem 152. Prove Theorem 9.1.7.
Theorem 9.1.8.

C[ [x; y], [u; v] ] = [ C[x,u]  C[x,v] ]
                      [ C[y,u]  C[y,v] ]

Special case: C[Ax + By, Cu + Dv] = A C[x,u]C^⊤ + A C[x,v]D^⊤ + B C[y,u]C^⊤ + B C[y,v]D^⊤. To show this, express each of the arguments as a partitioned matrix, then use Theorem 9.1.7.
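These covariance rules are easy to check numerically. The following minimal NumPy sketch (with arbitrary simulated data and arbitrary matrices A, B, C, D; the constant offsets of Theorem 9.1.7 are omitted) compares the two sides of the special case above; since the sample covariance is itself bilinear, the two sides agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Draw correlated samples of four 2-vectors x, y, u, v (rows = observations).
S = rng.standard_normal((8, 8))
data = rng.standard_normal((N, 8)) @ S.T
x, y, u, v = data[:, 0:2], data[:, 2:4], data[:, 4:6], data[:, 6:8]

A, B, C, D = (rng.standard_normal((2, 2)) for _ in range(4))

def cov(p, q):
    """Sample version of C[p, q] = E[(p - E[p])(q - E[q])^T]."""
    pc, qc = p - p.mean(0), q - q.mean(0)
    return pc.T @ qc / (len(p) - 1)

lhs = cov(x @ A.T + y @ B.T, u @ C.T + v @ D.T)
rhs = (A @ cov(x, u) @ C.T + A @ cov(x, v) @ D.T
       + B @ cov(y, u) @ C.T + B @ cov(y, v) @ D.T)
print(np.allclose(lhs, rhs))    # True: sample covariances obey the same bilinear rule
```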
Definition 9.1.9. V[x] = C[x, x] is called the dispersion matrix.
It follows from Theorem 9.1.8 that

V[ [x; y] ] = [ V[x]    C[x,y] ]
              [ C[y,x]  V[y]   ]

Theorem 9.1.10. Assume var[y_i] exists for every component y_i of the vector y. Then the whole dispersion matrix V[y] exists.
Theorem 9.1.11. V[x] is singular if and only if a vector a ≠ o exists so that a^⊤x is almost surely a constant.

Proof: Call V[x] = Σ. Then Σ is singular iff a vector a ≠ o exists with Σa = o, iff such an a exists with a^⊤Σa = var[a^⊤x] = 0, iff an a exists so that a^⊤x is almost surely a constant.
This means that singular random variables have a restricted range: their values are contained in a linear subspace. This has relevance for estimators involving singular random variables: two such estimators (i.e., functions of a singular random variable) should still be considered the same if their values coincide in that subspace in which the values of the random variable are concentrated, even if their values differ elsewhere.
Problem 154. [Seb77, exercise 1a–3 on p. 13] Let x = [x_1, ..., x_n]^⊤ be a vector of random variables, and let y_1 = x_1 and y_i = x_i − x_{i−1} for i = 2, 3, ..., n. What must the dispersion matrix V[x] be so that the y_i are uncorrelated with each other and each have unit variance?

Answer. cov[x_i, x_j] = min(i, j).
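A quick numerical confirmation of this answer (a minimal sketch with an arbitrary choice of n): build the matrix with entries min(i, j), apply the differencing transformation, and check that the resulting dispersion matrix is the identity, as Theorem 9.1.7 predicts via V[y] = D V[x] D^⊤.

```python
import numpy as np

n = 6
i = np.arange(1, n + 1)
V = np.minimum.outer(i, i)            # V[x] with entries cov[x_i, x_j] = min(i, j)

D = np.eye(n) - np.eye(n, k=-1)       # differencing: y_1 = x_1, y_i = x_i - x_{i-1}
Vy = D @ V @ D.T                      # dispersion of y by theorem 9.1.7

print(np.allclose(Vy, np.eye(n)))     # True: the y_i are uncorrelated with unit variance
```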
9.2 Means and Variances of Quadratic Forms

Theorem 9.2.1. Assume y is a random vector with E[y] = η and V[y] = σ²Ψ, and let A be a matrix of constants. Then

(9.2.1)  E[y^⊤Ay] = σ² tr(AΨ) + η^⊤Aη.

Proof: Write y as the sum of η and ε = y − η; then

(9.2.2)  y^⊤Ay = (ε + η)^⊤A(ε + η)
(9.2.3)        = ε^⊤Aε + ε^⊤Aη + η^⊤Aε + η^⊤Aη.

η^⊤Aη is nonstochastic, and since E[ε] = o it follows E[y^⊤Ay] = E[ε^⊤Aε] + η^⊤Aη = E[tr(Aεε^⊤)] + η^⊤Aη = tr(A E[εε^⊤]) + η^⊤Aη = σ² tr(AΨ) + η^⊤Aη.
Since we are now writing σ²Ψ = Σ, it follows E[yy^⊤] = ηη^⊤ + Σ.
Problem 155. Assume y_1, y_2, ..., y_n are independently distributed with common mean E[y_i] = η but possibly different variances var[y_i] = σ_i². Show that (1/(n(n−1))) Σ_i (y_i − ȳ)² is an unbiased estimator of var[ȳ].

Answer. Write y = [y_1, y_2, ..., y_n]^⊤ and Σ = diag(σ_1², σ_2², ..., σ_n²). Then the vector [y_1 − ȳ, y_2 − ȳ, ..., y_n − ȳ]^⊤ can be written as (I − (1/n)ιι^⊤)y. (1/n)ιι^⊤ is idempotent, therefore D = I − (1/n)ιι^⊤ is idempotent too. Our estimator is (1/(n(n−1))) y^⊤Dy, and since the mean vector η = ιη satisfies Dη = o, Theorem 9.2.1 gives

E[y^⊤Dy] = tr[DΣ] = tr[Σ] − (1/n) tr[ιι^⊤Σ] = (σ_1² + ··· + σ_n²) − (1/n)(σ_1² + ··· + σ_n²).

Divide this by n(n − 1) to get (σ_1² + ··· + σ_n²)/n², which is var[ȳ], as claimed.
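Here is a small numerical cross-check of this computation (a sketch with arbitrary variances σ_i²): tr(DΣ)/(n(n−1)) should equal var[ȳ] = (σ_1² + ··· + σ_n²)/n².

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7
sigma2 = rng.uniform(0.5, 3.0, size=n)        # heterogeneous variances sigma_i^2
Sigma = np.diag(sigma2)

iota = np.ones((n, 1))
D = np.eye(n) - iota @ iota.T / n             # idempotent centering matrix

# E[y'Dy] = tr(D Sigma) because D eta = o; divide by n(n-1) for the estimator's mean.
estimator_mean = np.trace(D @ Sigma) / (n * (n - 1))
var_ybar = sigma2.sum() / n**2                # variance of the sample mean

print(np.isclose(estimator_mean, var_ybar))   # True
```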
For the variances of quadratic forms we need the third and fourth moments of the underlying random variables.
Problem 156. Let µ_i = E[(y − E[y])^i] be the ith centered moment of y, and let σ = √µ_2 be its standard deviation. Then the skewness is defined as γ_1 = µ_3/σ³, and the kurtosis as γ_2 = (µ_4/σ⁴) − 3. Show that the skewness and kurtosis of ay + b are equal to those of y if a > 0; for a < 0 the skewness changes its sign. Show that skewness γ_1 and kurtosis γ_2 always satisfy

(9.2.13)  γ_1² ≤ γ_2 + 2.
Answer. Define ε = y − µ, where µ = E[y], and apply the Cauchy-Schwarz inequality to the variables ε and ε²:

(9.2.14)  (σ³γ_1)² = (E[ε³])² = (cov[ε, ε²])² ≤ var[ε] var[ε²] = σ⁶(γ_2 + 2).
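As an illustration (a small sketch with a few arbitrarily chosen distributions), one can tabulate skewness and excess kurtosis and confirm the bound γ_1² ≤ γ_2 + 2; scipy reports exactly these two quantities.

```python
from scipy import stats

examples = [
    ("normal",      stats.norm()),
    ("exponential", stats.expon()),
    ("uniform",     stats.uniform()),
    ("chi2(3)",     stats.chi2(3)),
]

for name, dist in examples:
    g1, g2 = dist.stats(moments="sk")   # skewness gamma_1, excess kurtosis gamma_2
    print(f"{name:12s}  gamma1^2 = {float(g1)**2:6.3f}  <=  gamma2 + 2 = {float(g2) + 2:6.3f}")
```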
Show also that every pair (γ_1, γ_2) satisfying this inequality can occur as the skewness and kurtosis of a random variable.

Answer. To show that all combinations satisfying this inequality are possible, define
Theorem 9.2.2. Given a random vector ε of independent variables ε_i with zero expected value E[ε_i] = 0, and whose second and third moments are identical. Call var[ε_i] = σ² and E[ε_i³] = σ³γ_1 (where σ is the positive square root of σ²); here γ_1 is called the skewness of these variables. Then the following holds for the third mixed moments:

(9.2.16)  E[ε_iε_jε_k] = σ³γ_1 if i = j = k, and 0 otherwise.

From (9.2.16) it follows that for any n × 1 vector a and any symmetric n × n matrix C whose vector of diagonal elements is c,

(9.2.17)  E[(a^⊤ε)(ε^⊤Cε)] = σ³γ_1 a^⊤c.
One would like to have a matrix notation for (9.2.16) from which (9.2.17) follows by a trivial operation. This is not easily possible in the usual notation, but it is possible in tile notation. Since the diagonal array ∆ applied to C gives the vector of diagonal elements of C, called c, the last term in equation (9.2.21) is the scalar product a^⊤c.
Given a random vector ε of independent variables ε_i with zero expected value E[ε_i] = 0 and identical second and fourth moments, call var[ε_i] = σ² and E[ε_i⁴] = σ⁴(γ_2 + 3), where γ_2 is the kurtosis. Then the following holds for the fourth moments (equation (9.2.23) expresses this in tile notation):

E[ε_iε_jε_kε_l] = σ⁴(γ_2 + 3) if i = j = k = l; σ⁴ if the indices are equal in two distinct pairs (i = j ≠ k = l, i = k ≠ j = l, or i = l ≠ j = k); and 0 otherwise.

Problem. Show that for any symmetric n × n matrices A and B, whose vectors of diagonal elements are a and b,

(9.2.24)  E[(ε^⊤Aε)(ε^⊤Bε)] = σ⁴(tr A tr B + 2 tr(AB) + γ_2 a^⊤b).
Answer. (9.2.24) is an immediate consequence of (9.2.23); this step is now trivial due to the linearity of the expected value:

[tile-notation expansion of E[(ε^⊤Aε)(ε^⊤Bε)] into four terms, the last involving the diagonal array ∆]

The first term is tr AB. The second is tr AB^⊤, but since A and B are symmetric, this is equal to tr AB. The third term is tr A tr B. What is the fourth term? Diagonal arrays exist with any number of arms, and any connected concatenation of diagonal arrays is again a diagonal array; see (B.2.1). For instance,
[tile diagram: a concatenation of diagonal arrays ∆, which is again a diagonal array]

From this together with (B.1.4) one can see that the fourth term is the scalar product of the diagonal vectors a and b, i.e., γ_2σ⁴ a^⊤b.
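Equation (9.2.24) can be verified exactly on a small example (a sketch with arbitrary symmetric matrices): for i.i.d. ε_i taking the values ±1 with probability 1/2 each, σ² = 1 and the kurtosis is γ_2 = −2, and enumerating all sign vectors for n = 4 gives the left-hand expectation exactly.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2     # arbitrary symmetric matrices
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
a, b = np.diag(A), np.diag(B)

# eps_i = +/-1 with prob 1/2: sigma^2 = 1, E[eps^4] = 1, hence gamma_2 = -2.
lhs = np.mean([(e @ A @ e) * (e @ B @ e)
               for e in (np.array(s) for s in itertools.product([-1.0, 1.0], repeat=n))])

gamma2 = -2.0
rhs = np.trace(A) * np.trace(B) + 2 * np.trace(A @ B) + gamma2 * a @ b

print(np.isclose(lhs, rhs))                            # True (exact up to rounding)
```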
Problem. Let A be a symmetric n × n matrix and denote its vector of diagonal elements by a. Let x = θ + ε, where ε satisfies the conditions of Theorem 9.2.2 and equation (9.2.23). Then

(9.2.27)  var[x^⊤Ax] = 4σ²θ^⊤A²θ + 4σ³γ_1 θ^⊤Aa + σ⁴(γ_2 a^⊤a + 2 tr(A²)).
Answer. Proof: var[x^⊤Ax] = E[(x^⊤Ax)²] − (E[x^⊤Ax])². Since by assumption V[x] = σ²I, the second term is, by Theorem 9.2.1, (σ² tr A + θ^⊤Aθ)². Now look at the first term. Again using the notation ε = x − θ, it follows from (9.2.3) that

(9.2.28)  (x^⊤Ax)² = (ε^⊤Aε)² + 4(θ^⊤Aε)² + (θ^⊤Aθ)²
(9.2.29)            + 4 ε^⊤Aε θ^⊤Aε + 2 ε^⊤Aε θ^⊤Aθ + 4 θ^⊤Aε θ^⊤Aθ.
We will take expectations of these terms one by one. Use (9.2.24) with B = A for the first term:

(9.2.30)  E[(ε^⊤Aε)²] = σ⁴((tr A)² + 2 tr(A²) + γ_2 a^⊤a).
To deal with the second term in (9.2.29) define b = Aθ; then

(9.2.31)  (θ^⊤Aε)² = (b^⊤ε)² = b^⊤ε ε^⊤b = tr(b^⊤εε^⊤b) = tr(εε^⊤bb^⊤)
(9.2.32)  E[(θ^⊤Aε)²] = σ² tr(bb^⊤) = σ² b^⊤b = σ² θ^⊤A²θ.
The third term is a constant which remains as it is; for the fourth term use (9.2.17):

(9.2.33)  ε^⊤Aε θ^⊤Aε = ε^⊤Aε b^⊤ε
(9.2.34)  E[ε^⊤Aε θ^⊤Aε] = σ³γ_1 a^⊤b = σ³γ_1 a^⊤Aθ.
If one takes expected values, the fifth term becomes 2σ² tr(A) θ^⊤Aθ, and the last term falls away. Putting the pieces together, the statement follows.
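The variance formula (9.2.27) can likewise be checked exactly on a toy example (a sketch with an arbitrarily chosen two-point distribution, symmetric matrix, and mean vector): compute σ, γ_1, γ_2 from the two-point distribution, enumerate the finitely many outcomes of x = θ + ε for n = 3, and compare the exact variance of x^⊤Ax with the right-hand side of (9.2.27).

```python
import itertools
import numpy as np

# Two-point distribution for each eps_i: value 2 w.p. 1/3, value -1 w.p. 2/3 (mean zero).
vals = np.array([2.0, -1.0]); probs = np.array([1/3, 2/3])
sigma2 = probs @ vals**2
sigma = np.sqrt(sigma2)
gamma1 = (probs @ vals**3) / sigma**3
gamma2 = (probs @ vals**4) / sigma**4 - 3

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
a = np.diag(A)
theta = rng.standard_normal(n)

# Exact first and second moments of x'Ax by enumerating the 2^n outcomes of eps.
m1 = m2 = 0.0
for idx in itertools.product(range(2), repeat=n):
    p = np.prod(probs[list(idx)])
    x = theta + vals[list(idx)]
    q = x @ A @ x
    m1 += p * q
    m2 += p * q**2
var_exact = m2 - m1**2

var_formula = (4 * sigma2 * theta @ A @ A @ theta
               + 4 * sigma**3 * gamma1 * theta @ A @ a
               + sigma**4 * (gamma2 * a @ a + 2 * np.trace(A @ A)))

print(np.isclose(var_exact, var_formula))     # True
```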
CHAPTER 10
The Multivariate Normal Probability Distribution
10.1 More About the Univariate Case
By definition, z is a standard normal variable, in symbols z ∼ N(0, 1), if it has the density function

f_z(z) = (1/√(2π)) e^{−z²/2}.

One can verify that this integrates to 1 by the following trick: take two independent standard normal variables x and y; their joint density is (1/(2π)) e^{−(x²+y²)/2}. In order to see that this joint density integrates to 1, go over to polar coordinates x = r cos φ, y = r sin φ, i.e., compute the joint distribution of r and φ from that of x and y: the absolute value of the Jacobian determinant is r, i.e., dx dy = r dr dφ, therefore

∫∫ (1/(2π)) e^{−(x²+y²)/2} dx dy = ∫_{φ=0}^{2π} ∫_{r=0}^{∞} (1/(2π)) e^{−r²/2} r dr dφ = ∫_{r=0}^{∞} e^{−r²/2} r dr = [−e^{−r²/2}]_{r=0}^{∞} = 1.
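This computation can also be cross-checked numerically; the following minimal SciPy sketch (with finite integration limits chosen large enough to be effectively infinite) evaluates the double integral directly.

```python
import numpy as np
from scipy.integrate import dblquad

# Integrate (1/2pi) exp(-(x^2+y^2)/2) over a large box covering essentially all the mass.
val, err = dblquad(lambda y, x: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi),
                   -10, 10, lambda x: -10, lambda x: 10)
print(round(val, 6))   # 1.0
```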
A univariate normal variable with mean µ and variance σ² is a variable x whose standardized version z = (x − µ)/σ ∼ N(0, 1). In this transformation from x to z, the Jacobian determinant is dz/dx = 1/σ; therefore the density function of x ∼ N(µ, σ²) is

f_x(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}.
Problem. Given n independent observations y_1, ..., y_n of a normally distributed variable y ∼ N(µ, 1). Show that the sample mean ȳ is a sufficient statistic for µ. Here is a formulation of the factorization theorem for sufficient statistics, which you will need for this question: Given a family of probability densities f_y(y_1, ..., y_n; θ) defined on R^n, which depend on a parameter θ ∈ Θ. The statistic T : R^n → R, (y_1, ..., y_n) ↦ T(y_1, ..., y_n), is sufficient for parameter θ if and only if there exists a function of two variables g : R × Θ → R, (t, θ) ↦ g(t; θ), and a function of n variables h : R^n → R, (y_1, ..., y_n) ↦ h(y_1, ..., y_n), so that

f_y(y_1, ..., y_n; θ) = g(T(y_1, ..., y_n); θ) · h(y_1, ..., y_n).
10.2 Definition of Multivariate Normal

The multivariate normal distribution is an important family of distributions with very nice properties. But one must be a little careful how to define it. One might naively think a multivariate normal is a vector random variable each component of which is univariate normal. But this is not the right definition. Normality of the components is a necessary but not sufficient condition for a multivariate normal vector. If u = [x; y] with both x and y multivariate normal, u is not necessarily multivariate normal.
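A standard counterexample, sketched below in NumPy with an arbitrary sample size, makes this concrete: if x ∼ N(0, 1) and s is an independent random sign, then y = sx is again N(0, 1), but x + y has a point mass at 0, so (x, y) cannot be bivariate normal.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

x = rng.standard_normal(n)
s = rng.choice([-1.0, 1.0], size=n)   # random sign, independent of x
y = s * x                             # marginally N(0,1), but (x, y) is not bivariate normal

# The marginal of y looks standard normal ...
print(y.mean(), y.var())              # ~0, ~1
# ... yet x + y has a point mass at 0, which no normal variable has.
print(np.mean(x + y == 0))            # ~0.5
```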
Here is a recursive definition from which one gets all multivariate normal distributions:

(1) The univariate standard normal z, considered as a vector with one component, is multivariate normal.

(2) If x and y are multivariate normal and they are independent, then u = [x; y] is multivariate normal.

(3) If y is multivariate normal, and A a matrix of constants (which need not be square and is allowed to be singular), and b a vector of constants, then Ay + b is multivariate normal. In words: a vector consisting of linear combinations of the same set of multivariate normal variables is again multivariate normal.
For simplicity we will now go over to the bivariate normal distribution.
10.3 Special Case: Bivariate Normal

The following two simple rules allow one to obtain all bivariate normal random variables:

(1) If x and y are independent and each of them has a (univariate) normal distribution with mean 0 and the same variance σ², then they are bivariate normal. (They would be bivariate normal even if their variances were different and their means not zero, but for the calculations below we will use only this special case, which together with principle (2) is sufficient to get all bivariate normal distributions.)
(2) If x = [x; y] is bivariate normal and P is a 2 × 2 nonrandom matrix and µ a nonrandom column vector with two elements, then Px + µ is bivariate normal as well.
All other properties of bivariate normal variables can be derived from these two rules. First let us derive the density function of a bivariate normal distribution. Write x = [x; y], where x and y are independent and each N(0, σ²); by rule (1) this vector x is bivariate normal. Take any nonsingular 2 × 2 matrix P and a 2-vector µ = [µ; ν], and define u = [u; v] = Px + µ; by rule (2), u is bivariate normal as well. We know the density of x:

(10.3.1)  f_{x,y}(x, y) = (1/(2πσ²)) exp(−(x² + y²)/(2σ²)).
For the next step, remember that we have to express the old variable in terms of the new one: x = P^{−1}(u − µ). The Jacobian determinant is therefore J = det(P^{−1}). Also notice that, after the substitution, the exponent in the joint density function of x and y is

−(1/(2σ²))(x² + y²) = −(1/(2σ²)) x^⊤x = −(1/(2σ²)) (u − µ)^⊤(P^{−1})^⊤P^{−1}(u − µ).

Therefore the transformation theorem for density functions gives
(10.3.2)  f_{u,v}(u, v) = (1/(2πσ²)) |det(P^{−1})| exp(−(1/(2σ²)) (u − µ)^⊤(P^{−1})^⊤P^{−1}(u − µ)).

The covariance matrix of u is V[[u; v]] = σ²PP^⊤ = σ²Ψ, say. Since (P^{−1})^⊤P^{−1}PP^⊤ = I, it follows (P^{−1})^⊤P^{−1} = Ψ^{−1} and |det(P^{−1})| = 1/√(det Ψ), therefore
(10.3.3)  f_{u,v}(u, v) = (1/(2πσ²)) (1/√(det Ψ)) exp(−(1/(2σ²)) (u − µ)^⊤Ψ^{−1}(u − µ)).

The same formula holds in n dimensions: if x is an n-dimensional normal vector with E[x] = µ and V[x] = σ²Ψ, Ψ nonsingular, then

(10.3.4)  f_x(x) = (2πσ²)^{−n/2} (det Ψ)^{−1/2} exp(−(1/(2σ²)) (x − µ)^⊤Ψ^{−1}(x − µ)).
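A small simulation sketch of this construction (with an arbitrary nonsingular P, mean vector, σ, and evaluation point): draw x with independent N(0, σ²) components, form u = Px + µ, compare the empirical covariance of u with σ²Ψ = σ²PP^⊤, and compare the density (10.3.3) with scipy's bivariate normal pdf.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)
sigma = 1.5
P = np.array([[2.0, 0.3], [-1.0, 0.8]])       # arbitrary nonsingular 2x2 matrix
mu = np.array([1.0, -2.0])
Psi = P @ P.T

# Simulate u = P x + mu with x having independent N(0, sigma^2) components.
x = sigma * rng.standard_normal((200_000, 2))
u = x @ P.T + mu
print(np.cov(u, rowvar=False))                 # approx sigma^2 * Psi
print(sigma**2 * Psi)

# Density (10.3.3) at an arbitrary point versus scipy's bivariate normal pdf.
pt = np.array([0.5, -1.0])
d = pt - mu
f = (1 / (2 * np.pi * sigma**2 * np.sqrt(np.linalg.det(Psi)))
     * np.exp(-d @ np.linalg.inv(Psi) @ d / (2 * sigma**2)))
print(np.isclose(f, multivariate_normal(mean=mu, cov=sigma**2 * Psi).pdf(pt)))   # True
```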
Problem 163. 1 point. Show that the matrix product of (P^{−1})^⊤P^{−1} and PP^⊤ is the identity matrix.
Problem 164. 3 points. All vectors in this question are n × 1 column vectors. Let y = α + ε, where α is a vector of constants and ε is jointly normal with E[ε] = o. Often, the covariance matrix V[ε] is not given directly, but an n × n nonsingular matrix T is known which has the property that the covariance matrix of Tε is σ² times the n × n unit matrix, i.e.,

(10.3.5)  V[Tε] = σ²I_n.

Show that in this case the density function of y is

(10.3.6)  f_y(y) = (2πσ²)^{−n/2} |det(T)| exp(−(1/(2σ²)) (T(y − α))^⊤ T(y − α)).

Hint: define z = Tε, write down the density function of z, and make a transformation between z and y.
Answer. Since E[z] = o and V[z] = σ²I_n, its density function is (2πσ²)^{−n/2} exp(−z^⊤z/(2σ²)). Now express z, whose density we know, as a function of y, whose density function we want to know: z = T(y − α), or

(10.3.7)  z_1 = t_11(y_1 − α_1) + t_12(y_2 − α_2) + ··· + t_1n(y_n − α_n)
(10.3.8)       ⋮
(10.3.9)  z_n = t_n1(y_1 − α_1) + t_n2(y_2 − α_2) + ··· + t_nn(y_n − α_n);

therefore the Jacobian determinant is det(T). This gives the result.
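A numerical spot-check of (10.3.6) (a sketch with an arbitrary nonsingular T, α, σ, and evaluation point): since V[Tε] = σ²I implies V[y] = σ²(T^⊤T)^{−1}, the formula should agree with scipy's multivariate normal density with mean α and that covariance matrix.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(6)
n, sigma = 3, 0.7
T = rng.standard_normal((n, n)) + 2 * np.eye(n)    # some nonsingular matrix
alpha = rng.standard_normal(n)
y = rng.standard_normal(n)                          # arbitrary evaluation point

z = T @ (y - alpha)
f_formula = (2 * np.pi * sigma**2) ** (-n / 2) * abs(np.linalg.det(T)) \
            * np.exp(-z @ z / (2 * sigma**2))

cov_y = sigma**2 * np.linalg.inv(T.T @ T)           # V[y] implied by V[T eps] = sigma^2 I
f_scipy = multivariate_normal(mean=alpha, cov=cov_y).pdf(y)

print(np.isclose(f_formula, f_scipy))               # True
```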
10.3.1 Most Natural Form of Bivariate Normal Density

Problem 165. In this problem the bivariate normal density is brought into its most natural form. For this we set the multiplicative “nuisance parameter” σ² = 1, i.e., we write the covariance matrix as Ψ instead of σ²Ψ.
• a. 1 point. Write the covariance matrix Ψ = V[[u; v]] in terms of the standard deviations σ_u and σ_v and the correlation coefficient ρ.
• b. 1 point. Show that the inverse of a 2 × 2 matrix has the following form:

[a  b; c  d]^{−1} = (1/(ad − bc)) [d  −b; −c  a].
• c. 2 points. Show that

(10.3.11)  q² = [u − µ  v − ν] Ψ^{−1} [u − µ; v − ν]
              = (1/(1 − ρ²)) ( (u − µ)²/σ_u² − 2ρ(u − µ)(v − ν)/(σ_u σ_v) + (v − ν)²/σ_v² ).
• d. 2 points. Show the following quadratic decomposition:

q² = (u − µ)²/σ_u² + (1/((1 − ρ²)σ_v²)) ( v − ν − ρ(σ_v/σ_u)(u − µ) )².

• f. 1 point. Show that d = √(det Ψ) can be split up, not additively but multiplicatively, as follows: d = σ_u · σ_v √(1 − ρ²).
Putting the pieces together, the joint density factors as

(10.3.15)  f_{u,v}(u, v) = (1/(σ_u√(2π))) exp(−u²/(2σ_u²)) · (1/(σ_v√(2π(1 − ρ²)))) exp(−(v − ρ(σ_v/σ_u)u)²/(2(1 − ρ²)σ_v²))

(written here for the zero-mean case µ = ν = 0). The second factor in (10.3.15) is the density of a N(ρ(σ_v/σ_u)u, (1 − ρ²)σ_v²) evaluated at v, and the first factor does not depend on v. Therefore if I integrate v out to get the marginal density of u, this simply gives me the first factor. The conditional density of v given u = u is the joint divided by the marginal, i.e., it is the second factor. In other words, by completing the square we wrote the joint density function in its natural form as the product of a marginal and a conditional density function:

f_{u,v}(u, v) = f_u(u) · f_{v|u}(v; u).
From this decomposition one can draw the following conclusions:

• u ∼ N(0, σ_u²) is normal and, by symmetry, v is normal as well. Note that u (or v) can be chosen to be any nonzero linear combination of x and y. Any nonzero linear transformation of independent standard normal variables is therefore univariate normal.

• If ρ = 0 then the joint density function is the product of two independent univariate normal density functions. In other words, if the variables are normal, then they are independent whenever they are uncorrelated. For general distributions only the reverse is true.

• The conditional density of v conditionally on u = u is the second factor on the rhs of (10.3.15), i.e., it is normal too.
• The conditional mean is

(10.3.16)  E[v | u = u] = ρ(σ_v/σ_u) u,

i.e., it is a linear function of u. If the (unconditional) means are not zero, then the conditional mean is

(10.3.17)  E[v | u = u] = µ_v + ρ(σ_v/σ_u)(u − µ_u).

• The conditional variance is var[v | u = u] = (1 − ρ²)σ_v²,
which can also be written as

(10.3.20)  var[v | u = u] = var[v] − (cov[u, v])²/var[u].
We did this in such detail because any bivariate normal with zero mean has this form. A multivariate normal distribution is determined by its means and variances and covariances (or correlation coefficients). If the means are not zero, then the densities merely differ from the above by an additive constant in the arguments, i.e., if one needs formulas for nonzero means, one has to replace u and v in the above equations by u − µ_u and v − µ_v. du and dv remain the same, because the Jacobian of the translation u ↦ u − µ_u, v ↦ v − µ_v is 1. While the univariate normal was determined by mean and standard deviation, the bivariate normal is determined by the two means µ_u and µ_v, the two standard deviations σ_u and σ_v, and the correlation coefficient ρ.
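These conditional-moment formulas can be checked by simulation (a sketch with arbitrary parameter values): regressing v on u in a large bivariate normal sample recovers the slope ρσ_v/σ_u of (10.3.16)/(10.3.17), and the residual variance approximates var[v | u] = (1 − ρ²)σ_v² from (10.3.20).

```python
import numpy as np

rng = np.random.default_rng(7)
mu_u, mu_v = 1.0, -0.5
sigma_u, sigma_v, rho = 2.0, 0.8, 0.6
cov = [[sigma_u**2, rho * sigma_u * sigma_v],
       [rho * sigma_u * sigma_v, sigma_v**2]]

u, v = rng.multivariate_normal([mu_u, mu_v], cov, size=500_000).T

slope, intercept = np.polyfit(u, v, 1)          # least-squares line of v on u
residual_var = np.var(v - (slope * u + intercept))

print(slope, rho * sigma_v / sigma_u)           # both ~0.24
print(residual_var, (1 - rho**2) * sigma_v**2)  # both ~0.41
```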
10.3.2 Level Lines of the Normal Density

Written in terms of the angle δ defined by ρ = cos δ, the covariance matrix (??) has the form

[ σ_u²           σ_uσ_v cos δ ]
[ σ_uσ_v cos δ   σ_v²         ]

The parametrized vector given in (10.3.22) satisfies x^⊤Ψ^{−1}x = r². The opposite holds too: all vectors x satisfying x^⊤Ψ^{−1}x = r² can be written in the form (10.3.22) for some φ, but I am not asking to prove this. This formula can be used to draw level lines of the bivariate normal density and confidence ellipses; more details are in (??).
Problem 167. The ellipse in Figure 1 contains all the points x, y for which

(10.3.23)  [x − 1  y − 1] [ 0.5  −0.25; −0.25  ⋯ ] [x − 1; y − 1] ≤ ⋯