CHAPTER 19
Digression about Correlation Coefficients
19.1 A Unified Definition of Correlation Coefficients
Correlation coefficients measure linear association. The usual definition of the simple correlation coefficient between two variables ρ_xy (sometimes we also use the notation corr[x, y]) is their standardized covariance

(19.1.1) ρ_xy = cov[x, y] / (√var[x] √var[y]).
Problem 254. Given the constant scalars a ≠ 0 and c ≠ 0 and b and d arbitrary. Show that corr[x, y] = ± corr[ax + b, cy + d], with the + sign being valid if a and c have the same sign, and the − sign otherwise.

Answer. Start with cov[ax + b, cy + d] = ac cov[x, y] and go from there.
Besides the simple correlation coefficient ρ_xy between two scalar variables y and x, one can also define the squared multiple correlation coefficient ρ²_{y(x)} between one scalar variable y and a whole vector of variables x, and the partial correlation coefficient ρ_{12.x} between two scalar variables y₁ and y₂, with a vector of other variables x "partialled out." The multiple correlation coefficient measures the strength of a linear association between y and all components of x together, and the partial correlation coefficient measures the strength of that part of the linear association between y₁ and y₂ which cannot be attributed to their joint association with x. One can also define partial multiple correlation coefficients. If one wants to measure the linear association between two vectors, then one number is no longer enough, but one needs several numbers, the "canonical correlations."
The multiple or partial correlation coefficients are usually defined as simple correlation coefficients involving the best linear predictor or its residual. But all these correlation coefficients share the property that they indicate a proportionate reduction in the MSE. See e.g. [Rao73, pp. 268–70]. Problem 255 makes this point for the simple correlation coefficient:
Problem 255. 4 points Show that the proportionate reduction in the MSE of the best predictor of y, if one goes from predictors of the form y* = a to predictors of the form y* = a + bx, is equal to the squared correlation coefficient between y and x. You are allowed to use the results of Problems 229 and 240. To set notation, call the minimum MSE in the first prediction (Problem 229) MSE[constant term; y], and the minimum MSE in the second prediction (Problem 240) MSE[constant term and x; y]. In other words, show that

(19.1.2) ρ²_{yx} = (MSE[constant term; y] − MSE[constant term and x; y]) / MSE[constant term; y].
Answer. The minimum MSE with only a constant is var[y], and (18.2.32) says that MSE[constant term and x; y] = var[y] − (cov[x, y])²/var[x]. Therefore the difference in MSE's is (cov[x, y])²/var[x], and if one divides by var[y] to get the relative difference, one gets exactly the squared correlation coefficient.
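A quick numerical illustration of this answer, as an R sketch: it simulates a pair of variables (the data-generating values are purely illustrative) and compares the proportionate reduction in MSE with the squared sample correlation.

    # Sketch: proportionate reduction in MSE vs. squared correlation (Problem 255).
    set.seed(1)
    x <- rnorm(10000)
    y <- 2 + 3 * x + rnorm(10000)                 # any linear relationship plus noise
    mse.const  <- var(y)                          # best constant predictor: MSE = var[y]
    mse.linear <- var(y) - cov(x, y)^2 / var(x)   # best linear predictor, cf. (18.2.32)
    (mse.const - mse.linear) / mse.const          # proportionate reduction in MSE
    cor(x, y)^2                                   # agrees up to sampling error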
Multiple Correlation Coefficients. Now assume x is a vector while y remains a scalar. Their joint mean vector and dispersion matrix are

(19.1.3) [x; y] ∼ [µ; ν], σ² [Ω_xx, ω_xy; ω_xy^⊤, ω_yy].
By theorem ??, the best linear predictor of y based on x has the formula
(19.1.4) y* = ν + ω_xy^⊤ Ω_xx^− (x − µ).
y* has the following additional extremal value property: no linear combination b^⊤x has a higher squared correlation with y than y*. This maximal value of the squared correlation is called the squared multiple correlation coefficient

(19.1.5) ρ²_{y(x)} = ω_xy^⊤ Ω_xx^− ω_xy / ω_yy.

The multiple correlation coefficient itself is the positive square root, i.e., it is always nonnegative, while some other correlation coefficients may take on negative values.
The squared multiple correlation coefficient can also be defined in terms of the proportionate reduction in MSE. It is equal to the proportionate reduction in the MSE of the best predictor of y if one goes from predictors of the form y* = a to predictors of the form y* = a + b^⊤x, i.e.,

(19.1.6) ρ²_{y(x)} = (MSE[constant term; y] − MSE[constant term and x; y]) / MSE[constant term; y].

There are therefore two natural definitions of the multiple correlation coefficient. These two definitions correspond to the two formulas for R² in (18.3.6).
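In a sample, the proportionate reduction in the SSE of a regression plays the role of (19.1.6) and coincides with the familiar R². A minimal R sketch with illustrative simulated data:

    # Sketch: proportionate reduction in SSE equals R^2, the sample analogue of (19.1.6).
    set.seed(2)
    n <- 500
    X <- cbind(rnorm(n), rnorm(n))
    y <- 1 + X %*% c(0.8, -0.5) + rnorm(n)
    sse.const <- sum((y - mean(y))^2)           # constant-only prediction
    sse.x     <- sum(resid(lm(y ~ X))^2)        # constant and x
    (sse.const - sse.x) / sse.const             # proportionate reduction in SSE
    summary(lm(y ~ X))$r.squared                # the same number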
Partial Correlation Coefficients. Now assume y = [y₁, y₂]^⊤ is a vector with two elements, and partition the joint mean vector and dispersion matrix of y and x accordingly.
The partial correlation coefficient can be defined as the relative reduction in the MSE if one adds y₁ to x as a predictor of y₂:

(19.1.8) ρ²_{21.x} = (MSE[constant term and x; y₂] − MSE[constant term, x, and y₁; y₂]) / MSE[constant term and x; y₂].
Problem 256. Using the definitions in terms of MSE's, show that the following relationship holds between the squares of multiple and partial correlation coefficients:

(19.1.11) ρ²_{2(x,1)} = ρ²_{2(x)} + (1 − ρ²_{2(x)}) ρ²_{21.x}
An alternative proof of (19.1.11) is given in [Gra76, pp. 116/17].
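Identity (19.1.11) can also be checked numerically on sample SSE's, where it holds exactly (up to floating-point error). The following R sketch uses illustrative simulated data together with the SSE-based definitions (19.1.6) and (19.1.8):

    # Sketch: sample check of (19.1.11) via SSE-based definitions.
    set.seed(3)
    n  <- 1000
    x  <- matrix(rnorm(2 * n), n, 2)
    y1 <- x %*% c(1, 1) + rnorm(n)
    y2 <- 0.5 * y1 + x %*% c(-1, 2) + rnorm(n)
    sse <- function(fit) sum(resid(fit)^2)
    sse.0  <- sum((y2 - mean(y2))^2)            # constant only
    sse.x  <- sse(lm(y2 ~ x))                   # constant and x
    sse.x1 <- sse(lm(y2 ~ x + y1))              # constant, x, and y1
    rho2.2x  <- (sse.0 - sse.x)  / sse.0        # squared multiple correlation of y2 with x
    rho2.21x <- (sse.x - sse.x1) / sse.x        # squared partial correlation, cf. (19.1.8)
    rho2.2x1 <- (sse.0 - sse.x1) / sse.0        # squared multiple correlation of y2 with (x, y1)
    rho2.2x1 - (rho2.2x + (1 - rho2.2x) * rho2.21x)   # zero, cf. (19.1.11)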
Mixed cases: One can also form multiple correlation coefficients with some of the variables partialled out. The dot notation used here is due to Yule, [Yul07]. The notation, definition, and formula for the squared correlation coefficient is

(19.1.12) ρ²_{y(x).z} = (MSE[constant term and z; y] − MSE[constant term, z, and x; y]) / MSE[constant term and z; y]
(19.1.13)            = ω_{xy.z}^⊤ Ω_{xx.z}^− ω_{xy.z} / ω_{yy.z}
19.2 Correlation Coefficients and the Associated Least Squares Problem

One can define the correlation coefficients also as proportionate reductions in the objective functions of the associated GLS problems. However one must reverse predictor and predictand, i.e., one must look at predictions of a vector x by linear functions of a scalar y.
Here it is done for multiple correlation coefficients: The value of the GLS objective function if one predicts x by the best linear predictor x*, which is the minimum attainable when the scalar observation y is given and the vector x can be chosen freely, as long as it satisfies the constraint x = µ + Ω_xx q for some q, is
On the other hand, the value of the GLS objective function when one predicts x by the best constant x = µ is

(19.2.3) SSE[y; x = µ] = (y − ν)^⊤ ω_{yy.x}^− (y − ν).
The proportionate reduction in the objective function is

(19.2.4) (SSE[y; x = µ] − SSE[y; best x]) / SSE[y; x = µ],

which is again the squared multiple correlation coefficient ρ²_{y(x)}.
19.3 Canonical Correlations

Now what happens with the correlation coefficients if both predictor and predictand are vectors? In this case one has more than one correlation coefficient. One first finds those two linear combinations of the two vectors which have highest correlation, then those which are uncorrelated with the first and have second highest correlation, and so on. Here is the mathematical construction needed:
Let x and y be two column vectors consisting of p and q scalar random variables, respectively, and let

(19.3.1) V[x; y] = σ² [Ω_xx, Ω_xy; Ω_yx, Ω_yy],
where Ω_xx and Ω_yy are nonsingular, and let r be the rank of Ω_xy. Then there exist two separate transformations

(19.3.2) u = Lx,  v = My

such that

(19.3.3) V[u; v] = σ² [I_p, Λ; Λ^⊤, I_q],

where Λ is a (usually rectangular) diagonal matrix with only r diagonal elements positive, and the others zero, and where these diagonal elements are sorted in descending order.
Proof: One obtains the matrix Λ by a singular value decomposition of Ω_xx^{−1/2} Ω_xy Ω_yy^{−1/2} = A, say. Let A = P^⊤ Λ Q be its singular value decomposition with fully orthogonal matrices, as in equation (A.9.8). Define L = P Ω_xx^{−1/2} and M = Q Ω_yy^{−1/2}. Therefore L Ω_xx L^⊤ = I, M Ω_yy M^⊤ = I, and L Ω_xy M^⊤ = P Ω_xx^{−1/2} Ω_xy Ω_yy^{−1/2} Q^⊤ = P A Q^⊤ = Λ.
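As a concrete illustration of this construction, here is an R sketch that carries out the same steps for made-up dispersion blocks (all names and numbers are illustrative, not taken from the text); the sample analogue is available in R as cancor().

    # Sketch: canonical correlations via the SVD construction in the proof.
    Oxx <- matrix(c(2.0, 0.5,
                    0.5, 1.0), 2, 2)
    Oyy <- matrix(c(1.0, 0.3,
                    0.3, 2.0), 2, 2)
    Oxy <- matrix(c(0.4, 0.1, 0.2, 0.3), 2, 2)
    msqrt.inv <- function(S) {                  # symmetric inverse square root S^(-1/2)
      e <- eigen(S, symmetric = TRUE)
      e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
    }
    A <- msqrt.inv(Oxx) %*% Oxy %*% msqrt.inv(Oyy)
    s <- svd(A)                                 # A = u diag(d) v', so P = t(u), Q = t(v)
    L <- t(s$u) %*% msqrt.inv(Oxx)
    M <- t(s$v) %*% msqrt.inv(Oyy)
    s$d                                         # the canonical correlations (diagonal of Lambda)
    round(L %*% Oxx %*% t(L), 12)               # identity matrix
    round(L %*% Oxy %*% t(M), 12)               # diag(s$d), i.e. Lambda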
The next problems show how one gets from this the maximization property of the canonical correlation coefficients:
Problem 257. Show that for every p-vector l and q-vector m,

(19.3.4) |corr(l^⊤x, m^⊤y)| ≤ λ₁

where λ₁ is the first (and therefore biggest) diagonal element of Λ. Equality in (19.3.4) holds if l = l₁, the first row in L, and m = m₁, the first row in M.
Answer: If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L⁻¹)^⊤ l and q = (M⁻¹)^⊤ m satisfy p^⊤p = 1 and q^⊤q = 1. Then l^⊤x = p^⊤Lx = p^⊤u and m^⊤y = q^⊤My = q^⊤v, and

(19.3.5) V[p^⊤u; q^⊤v] = σ² [p^⊤, o^⊤; o^⊤, q^⊤] [I_p, Λ; Λ^⊤, I_q] [p, o; o, q] = σ² [1, p^⊤Λq; q^⊤Λ^⊤p, 1].

Since the matrix on the righthand side has ones in the diagonal, it is the correlation matrix, i.e., p^⊤Λq = corr(l^⊤x, m^⊤y). Therefore (19.3.4) follows from Problem 258.
Problem 258. If ∑ᵢ pᵢ² = ∑ᵢ qᵢ² = 1, and λᵢ ≥ 0, show that |∑ᵢ pᵢλᵢqᵢ| ≤ maxᵢ λᵢ. Hint: first get an upper bound for |∑ᵢ pᵢλᵢqᵢ| through a Cauchy–Schwarz-type argument.
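One way to carry out the hint (a sketch, not necessarily the author's intended answer): by the triangle inequality and the Cauchy–Schwarz inequality,

|∑ᵢ pᵢλᵢqᵢ| ≤ ∑ᵢ λᵢ|pᵢ||qᵢ| ≤ (maxᵢ λᵢ) ∑ᵢ |pᵢ||qᵢ| ≤ (maxᵢ λᵢ) √(∑ᵢ pᵢ²) √(∑ᵢ qᵢ²) = maxᵢ λᵢ.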
Problem 259. Show that for every p-vector l and q-vector m such that l^⊤x is uncorrelated with l₁^⊤x, and m^⊤y is uncorrelated with m₁^⊤y,

(19.3.6) |corr(l^⊤x, m^⊤y)| ≤ λ₂

where λ₂ is the second diagonal element of Λ. Equality in (19.3.6) holds if l = l₂, the second row in L, and m = m₂, the second row in M.
Answer. If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L⁻¹)^⊤ l and q = (M⁻¹)^⊤ m satisfy p^⊤p = 1 and q^⊤q = 1. Now write e₁ for the first unit vector, which has a 1 as first component and zeros everywhere else:

(19.3.7) cov[l^⊤x, l₁^⊤x] = cov[p^⊤Lx, e₁^⊤Lx] = σ² p^⊤e₁ = σ² p₁.

This covariance is zero iff p₁ = 0; in the same way, cov[m^⊤y, m₁^⊤y] = 0 iff q₁ = 0. Furthermore one also needs the following, directly from the proof above:

V[p^⊤u; q^⊤v] = σ² [p^⊤, o^⊤; o^⊤, q^⊤] [I_p, Λ; Λ^⊤, I_q] [p, o; o, q] = σ² [1, p^⊤Λq; q^⊤Λ^⊤p, 1].

Since the matrix on the righthand side has ones in the diagonal, it is the correlation matrix, i.e., p^⊤Λq = corr(l^⊤x, m^⊤y). Equation (19.3.6) follows from Problem 258 if one lets the subscript i start at 2 instead of 1, which is possible since p₁ = q₁ = 0.
Problem 260. (Not eligible for in-class exams) Extra credit question for good mathematicians: Reformulate the above treatment of canonical correlations without the assumption that Ω_xx and Ω_yy are nonsingular.
19.4 Some Remarks about the Sample Partial Correlation Coefficients
The definition of the partial sample correlation coefficients is analogous to that of the partial population correlation coefficients: Given two data vectors y and z, and the matrix X (which includes a constant term), let M = I − X(X^⊤X)⁻¹X^⊤ be the "residual maker" with respect to X. Then the squared partial sample correlation is the squared simple correlation between the least squares residuals:

(19.4.1) r²_{zy.X} = (z^⊤My)² / ((z^⊤Mz)(y^⊤My))
Alternatively, one can define it as the proportionate reduction in the SSE. Although X is assumed to incorporate a constant term, I am giving it here separately, in order to show the analogy with (19.1.8):

(19.4.2) r²_{zy.X} = (SSE[constant term and X; y] − SSE[constant term, X, and z; y]) / SSE[constant term and X; y].
[Gre97, p. 248] considers it unintuitive that this can be computed using t-statistics. Our approach explains why this is so. First of all, note that the square of the t-statistic is the F-statistic. Secondly, the formula for the F-statistic for the inclusion of z into the regression is

(19.4.3) t² = F = (SSE[constant term and X; y] − SSE[constant term, X, and z; y]) / (SSE[constant term, X, and z; y]/(n − k − 1)).
This is very similar to the formula for the squared partial correlation coefficient. From (19.4.3) follows

(19.4.4) F + (n − k − 1) = (n − k − 1) SSE[constant term and X; y] / SSE[constant term, X, and z; y]

and therefore

(19.4.5) r²_{zy.X} = F / (F + n − k − 1),

which is [Gre97, (6-29) on p. 248].
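A quick sample check of (19.4.1) and (19.4.5) in R; the data are illustrative, and here k counts the columns of X including the constant term:

    # Sketch: partial r^2 from residuals and from the t-statistic, cf. (19.4.1) and (19.4.5).
    set.seed(4)
    n <- 60
    X <- matrix(rnorm(2 * n), n, 2)
    z <- rnorm(n)
    y <- 1 + X %*% c(1, -1) + 0.4 * z + rnorm(n)
    k <- ncol(X) + 1                              # regressors in X plus the constant term
    t.z <- coef(summary(lm(y ~ X + z)))["z", "t value"]
    My  <- resid(lm(y ~ X))                       # y with constant and X partialled out
    Mz  <- resid(lm(z ~ X))                       # z with constant and X partialled out
    cor(My, Mz)^2                                 # squared partial sample correlation (19.4.1)
    t.z^2 / (t.z^2 + n - k - 1)                   # the same number, cf. (19.4.5)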
It should also be noted here that [Gre97, (6-36) on p. 254] is the sample equivalent of (19.1.11).

CHAPTER 20

Numerical Methods for Computing OLS Estimates
20.1 QR Decomposition

One precise and fairly efficient method to compute the Least Squares estimates is the QR decomposition. It amounts to going over to an orthonormal basis in R[X]. It uses the following mathematical fact:

Every matrix X which has full column rank can be decomposed into the product of two matrices QR, where Q has the same number of rows and columns as X, and is "suborthogonal" or "incomplete orthogonal," i.e., it satisfies Q^⊤Q = I. The other factor R is upper triangular and nonsingular.
To construct the least squares estimates, make a QR decomposition of the matrix of explanatory variables X (which is assumed to have full column rank). With X = QR, the normal equations X^⊤X β̂ = X^⊤y read

R^⊤Q^⊤QR β̂ = R^⊤Q^⊤y, i.e., R^⊤R β̂ = R^⊤Q^⊤y,

and since R is nonsingular this reduces to R β̂ = Q^⊤y, a triangular system that can be solved by back-substitution.
Problem 261. 2 points You have a QR-decomposition X = QR, where Q^⊤Q = I, and R is upper triangular and nonsingular. For an estimate of V[β̂] you need (X^⊤X)⁻¹. How can this be computed without computing X^⊤X? And why would you want to avoid computing X^⊤X?

Answer. X^⊤X = R^⊤Q^⊤QR = R^⊤R; its inverse is therefore (X^⊤X)⁻¹ = R⁻¹(R⁻¹)^⊤, which only requires inverting the triangular matrix R. Forming X^⊤X explicitly is best avoided because it roughly squares the condition number of the problem and can therefore lose numerical precision.
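The following R sketch (illustrative data; the qr() interface together with the base helpers qr.R(), qr.qty(), and backsolve()) puts these pieces together: β̂ from the triangular system R β̂ = Q^⊤y, and (X^⊤X)⁻¹ as R⁻¹(R⁻¹)^⊤.

    # Sketch: least squares via the QR decomposition.
    set.seed(5)
    n <- 100
    X <- cbind(1, rnorm(n), rnorm(n))
    y <- X %*% c(2, 1, -1) + rnorm(n)
    qx   <- qr(X)                                  # factorization, stored as qr and qraux
    R    <- qr.R(qx)
    Qty  <- qr.qty(qx, y)[1:ncol(X)]               # first k elements of Q'y
    beta <- backsolve(R, Qty)                      # solve R beta = Q'y by back-substitution
    beta
    XtX.inv <- backsolve(R, backsolve(R, diag(ncol(X)), transpose = TRUE))  # R^{-1}(R^{-1})'
    range(XtX.inv - solve(t(X) %*% X))             # agrees with the direct computation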
Problem 262. Compute the QR decomposition of

Answer. Write X = QR, i.e., X = [q₁ q₂ q₃]R, where q₁^⊤q₁ = q₂^⊤q₂ = q₃^⊤q₃ = 1 and q₁^⊤q₂ = q₁^⊤q₃ = q₂^⊤q₃ = 0. First column: x₁ = q₁r₁₁ and q₁ must have unit length. This gives q₁^⊤ = [1/2 1/2 1/2 1/2] and r₁₁ = 2. Second column:

(20.1.8) x₂ = q₁r₁₂ + q₂r₂₂.

Premultiplying by q₁^⊤ gives r₁₂ = q₁^⊤x₂; then q₂r₂₂ = x₂ − q₁r₁₂, where q₂ must again have unit length. This gives q₂^⊤ = [−1/2 1/2 −1/2 1/2] and r₂₂ = 4. The rest remains a homework problem.
Problem 263. 2 points Compute trace and determinant of

To prove the existence of the QR decomposition, and also for the numerical procedure, we will build Q^⊤ as the product of several orthogonal matrices, each converting one column of X into one with zeros below the diagonal.
First note that for every vector v ≠ o, the matrix I − (2/(v^⊤v)) v v^⊤ is orthogonal.
Given X, let x be the first column of X. If x = o, then go on to the next column; otherwise define v = x + σ√(x^⊤x) e₁ with σ = ±1. (Mathematically, either σ = +1 or σ = −1 would do; but if one gives σ the same sign as x₁₁, then the first element of v gets the largest possible absolute value, which improves numerical accuracy.) Then
(20.2.2) v^⊤v = (x₁₁² + 2σx₁₁√(x^⊤x) + x^⊤x) + x₂₁² + ··· + x_{n1}²
(20.2.3)      = 2(x^⊤x + σx₁₁√(x^⊤x))
(20.2.4) v^⊤x = x^⊤x + σx₁₁√(x^⊤x)
By (20.2.3) and (20.2.4), 2v^⊤x/(v^⊤v) = 1, and therefore

(I − (2/(v^⊤v)) v v^⊤) x = x − v = [−σ√(x^⊤x), 0, ..., 0]^⊤.

This generates zeros below the diagonal. Instead of writing the zeros into that matrix, the implementation uses the "free" space to store the vector v. There is almost enough room; the first nonzero element of v must be stored elsewhere. This is why the QR decomposition in Splus has two main components: qr is a matrix like a, and qraux is a vector of length ncols(a).
LINPACK does not use or store exactly the same v as given here, but uses u = v/(σ√(x^⊤x)) instead. The normalization does not affect the resulting orthogonal transformation; its advantage is that the leading element of each vector, which is stored in qraux, is at the same time equal to u^⊤u/2. In other words, qraux doubles up as the divisor in the construction of the orthogonal matrices.
In Splus type help(qr). At the end of the help file a program is given which shows how the Q might be constructed from the fragments qr and qraux.
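To make the construction in (20.2.2)–(20.2.4) concrete, here is a small R sketch of a single Householder step on an illustrative matrix (it shows the mathematics only, not the LINPACK storage scheme):

    # Sketch: one Householder reflection zeroing the first column below the diagonal.
    set.seed(6)
    X <- matrix(rnorm(12), 4, 3)
    x <- X[, 1]                                   # first column of X (x_11 nonzero here)
    sigma <- sign(x[1])                           # same sign as x_11 for numerical accuracy
    v <- x + sigma * sqrt(sum(x^2)) * c(1, 0, 0, 0)
    H <- diag(4) - (2 / sum(v^2)) * v %*% t(v)    # the reflector I - (2/v'v) vv'
    round(H %*% x, 12)                            # first element -sigma*sqrt(x'x), zeros below
    round(t(H) %*% H, 12)                         # H is orthogonal (and symmetric)
    H %*% X                                       # first column now has zeros below the diagonal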