CHAPTER 19
Digression about Correlation Coefficients
19.1 A Unified Definition of Correlation Coefficients
Correlation coefficients measure linear association. The usual definition of the simple correlation coefficient between two variables ρ_xy (sometimes we also use the notation corr[x, y]) is their standardized covariance

(19.1.1) ρ_xy = cov[x, y] / (√var[x] √var[y]).
Problem 254. Given the constant scalars a ≠ 0 and c ≠ 0 and b and d arbitrary. Show that corr[x, y] = ± corr[ax + b, cy + d], with the + sign being valid if a and c have the same sign, and the − sign otherwise.

Answer. Start with cov[ax + b, cy + d] = ac cov[x, y] and go from there.
Besides the simple correlation coefficient ρ_xy between two scalar variables y and x, one can also define the squared multiple correlation coefficient ρ²_{y(x)} between one scalar variable y and a whole vector of variables x, and the partial correlation coefficient ρ_{12.x} between two scalar variables y₁ and y₂, with a vector of other variables x "partialled out." The multiple correlation coefficient measures the strength of a linear association between y and all components of x together, and the partial correlation coefficient measures the strength of that part of the linear association between y₁ and y₂ which cannot be attributed to their joint association with x. One can also define partial multiple correlation coefficients. If one wants to measure the linear association between two vectors, then one number is no longer enough, but one needs several numbers, the "canonical correlations."
The multiple or partial correlation coefficients are usually defined as simple correlation coefficients involving the best linear predictor or its residual. But all these correlation coefficients share the property that they indicate a proportionate reduction in the MSE. See e.g. [Rao73, pp. 268–70]. Problem 255 makes this point for the simple correlation coefficient:
Problem 255. 4 points Show that the proportionate reduction in the MSE of the best predictor of y, if one goes from predictors of the form y* = a to predictors of the form y* = a + bx, is equal to the squared correlation coefficient between y and x. You are allowed to use the results of Problems 229 and 240. To set notation, call the minimum MSE in the first prediction (Problem 229) MSE[constant term; y], and the minimum MSE in the second prediction (Problem 240) MSE[constant term and x; y]. In other words, show that

(19.1.2) ρ²_{yx} = (MSE[constant term; y] − MSE[constant term and x; y]) / MSE[constant term; y].
Answer. The minimum MSE with only a constant is var[y], and (18.2.32) says that MSE[constant term and x; y] = var[y] − (cov[x, y])²/var[x]. Therefore the difference in MSE's is (cov[x, y])²/var[x], and if one divides by var[y] to get the relative difference, one gets exactly the squared correlation coefficient.
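A quick numerical illustration of this answer, as an R sketch: it simulates a pair of variables (the data-generating values are purely illustrative) and compares the proportionate reduction in MSE with the squared sample correlation.

    # Sketch: proportionate reduction in MSE vs. squared correlation (Problem 255).
    set.seed(1)
    x <- rnorm(10000)
    y <- 2 + 3 * x + rnorm(10000)                 # any linear relationship plus noise
    mse.const  <- var(y)                          # best constant predictor: MSE = var[y]
    mse.linear <- var(y) - cov(x, y)^2 / var(x)   # best linear predictor, cf. (18.2.32)
    (mse.const - mse.linear) / mse.const          # proportionate reduction in MSE
    cor(x, y)^2                                   # agrees up to sampling error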
Multiple Correlation Coefficients. Now assume x is a vector while y remains a scalar. Their joint mean vector and dispersion matrix are

(19.1.3) [x; y] ∼ [µ; ν], σ² [Ω_xx, ω_xy; ω_xy^⊤, ω_yy].
By theorem ??, the best linear predictor of y based on x has the formula
(19.1.4) y* = ν + ω_xy^⊤ Ω_xx^− (x − µ).
y* has the following additional extremal value property: no linear combination b^⊤x has a higher squared correlation with y than y*. This maximal value of the squared correlation is called the squared multiple correlation coefficient

(19.1.5) ρ²_{y(x)} = ω_xy^⊤ Ω_xx^− ω_xy / ω_yy.

The multiple correlation coefficient itself is the positive square root, i.e., it is always nonnegative, while some other correlation coefficients may take on negative values.
The squared multiple correlation coefficient can also be defined in terms of the proportionate reduction in MSE. It is equal to the proportionate reduction in the MSE of the best predictor of y if one goes from predictors of the form y* = a to predictors of the form y* = a + b^⊤x, i.e.,

(19.1.6) ρ²_{y(x)} = (MSE[constant term; y] − MSE[constant term and x; y]) / MSE[constant term; y].

There are therefore two natural definitions of the multiple correlation coefficient. These two definitions correspond to the two formulas for R² in (18.3.6).
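In a sample, the proportionate reduction in the SSE of a regression plays the role of (19.1.6) and coincides with the familiar R². A minimal R sketch with illustrative simulated data:

    # Sketch: proportionate reduction in SSE equals R^2, the sample analogue of (19.1.6).
    set.seed(2)
    n <- 500
    X <- cbind(rnorm(n), rnorm(n))
    y <- 1 + X %*% c(0.8, -0.5) + rnorm(n)
    sse.const <- sum((y - mean(y))^2)           # constant-only prediction
    sse.x     <- sum(resid(lm(y ~ X))^2)        # constant and x
    (sse.const - sse.x) / sse.const             # proportionate reduction in SSE
    summary(lm(y ~ X))$r.squared                # the same number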
Partial Correlation Coefficients. Now assume y = [y₁, y₂]^⊤ is a vector with two elements, and partition the joint mean vector and dispersion matrix of y and x accordingly.
The partial correlation coefficient can be defined as the relative reduction in the MSE if one adds y₁ to x as a predictor of y₂:

(19.1.8) ρ²_{21.x} = (MSE[constant term and x; y₂] − MSE[constant term, x, and y₁; y₂]) / MSE[constant term and x; y₂].
Problem 256. Using the definitions in terms of MSE's, show that the following relationship holds between the squares of multiple and partial correlation coefficients:

(19.1.11) ρ²_{2(x,1)} = ρ²_{2(x)} + (1 − ρ²_{2(x)}) ρ²_{21.x}
An alternative proof of (19.1.11) is given in [Gra76, pp. 116/17].
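Identity (19.1.11) can also be checked numerically on sample SSE's, where it holds exactly (up to floating-point error). The following R sketch uses illustrative simulated data together with the SSE-based definitions (19.1.6) and (19.1.8):

    # Sketch: sample check of (19.1.11) via SSE-based definitions.
    set.seed(3)
    n  <- 1000
    x  <- matrix(rnorm(2 * n), n, 2)
    y1 <- x %*% c(1, 1) + rnorm(n)
    y2 <- 0.5 * y1 + x %*% c(-1, 2) + rnorm(n)
    sse <- function(fit) sum(resid(fit)^2)
    sse.0  <- sum((y2 - mean(y2))^2)            # constant only
    sse.x  <- sse(lm(y2 ~ x))                   # constant and x
    sse.x1 <- sse(lm(y2 ~ x + y1))              # constant, x, and y1
    rho2.2x  <- (sse.0 - sse.x)  / sse.0        # squared multiple correlation of y2 with x
    rho2.21x <- (sse.x - sse.x1) / sse.x        # squared partial correlation, cf. (19.1.8)
    rho2.2x1 <- (sse.0 - sse.x1) / sse.0        # squared multiple correlation of y2 with (x, y1)
    rho2.2x1 - (rho2.2x + (1 - rho2.2x) * rho2.21x)   # zero, cf. (19.1.11)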
Mixed cases: One can also form multiple correlation coefficients with some of the variables partialled out. The dot notation used here is due to Yule, [Yul07]. The notation, definition, and formula for the squared correlation coefficient is

(19.1.12) ρ²_{y(x).z} = (MSE[constant term and z; y] − MSE[constant term, z, and x; y]) / MSE[constant term and z; y]
(19.1.13)            = ω_{xy.z}^⊤ Ω_{xx.z}^− ω_{xy.z} / ω_{yy.z}
19.2 Correlation Coefficients and the Associated Least Squares Problem

One can define the correlation coefficients also as proportionate reductions in the objective functions of the associated GLS problems. However one must reverse predictor and predictand, i.e., one must look at predictions of a vector x by linear functions of a scalar y.
Here it is done for multiple correlation coefficients: The value of the GLS objective function if one predicts x by the best linear predictor x*, which is the minimum attainable when the scalar observation y is given and the vector x can be chosen freely, as long as it satisfies the constraint x = µ + Ω_xx q for some q, is
On the other hand, the value of the GLS objective function when one predicts x by the best constant x = µ is

(19.2.3) SSE[y; x = µ] = (y − ν)^⊤ ω_{yy.x}^− (y − ν).
The proportionate reduction in the objective function is

(19.2.4) (SSE[y; x = µ] − SSE[y; best x]) / SSE[y; x = µ],

which is again the squared multiple correlation coefficient ρ²_{y(x)}.
19.3 Canonical Correlations

Now what happens with the correlation coefficients if both predictor and predictand are vectors? In this case one has more than one correlation coefficient. One first finds those two linear combinations of the two vectors which have highest correlation, then those which are uncorrelated with the first and have second highest correlation, and so on. Here is the mathematical construction needed:
Let x and y be two column vectors consisting of p and q scalar random variables, respectively, and let

(19.3.1) V[x; y] = σ² [Ω_xx, Ω_xy; Ω_yx, Ω_yy],
where Ω_xx and Ω_yy are nonsingular, and let r be the rank of Ω_xy. Then there exist two separate transformations

(19.3.2) u = Lx,  v = My

such that

(19.3.3) V[u; v] = σ² [I_p, Λ; Λ^⊤, I_q],

where Λ is a (usually rectangular) diagonal matrix with only r diagonal elements positive, and the others zero, and where these diagonal elements are sorted in descending order.
Proof: One obtains the matrix Λ by a singular value decomposition of Ω_xx^{−1/2} Ω_xy Ω_yy^{−1/2} = A, say. Let A = P^⊤ Λ Q be its singular value decomposition with fully orthogonal matrices, as in equation (A.9.8). Define L = P Ω_xx^{−1/2} and M = Q Ω_yy^{−1/2}. Therefore L Ω_xx L^⊤ = I, M Ω_yy M^⊤ = I, and L Ω_xy M^⊤ = P Ω_xx^{−1/2} Ω_xy Ω_yy^{−1/2} Q^⊤ = P A Q^⊤ = Λ.
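As a concrete illustration of this construction, here is an R sketch that carries out the same steps for made-up dispersion blocks (all names and numbers are illustrative, not taken from the text); the sample analogue is available in R as cancor().

    # Sketch: canonical correlations via the SVD construction in the proof.
    Oxx <- matrix(c(2.0, 0.5,
                    0.5, 1.0), 2, 2)
    Oyy <- matrix(c(1.0, 0.3,
                    0.3, 2.0), 2, 2)
    Oxy <- matrix(c(0.4, 0.1, 0.2, 0.3), 2, 2)
    msqrt.inv <- function(S) {                  # symmetric inverse square root S^(-1/2)
      e <- eigen(S, symmetric = TRUE)
      e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
    }
    A <- msqrt.inv(Oxx) %*% Oxy %*% msqrt.inv(Oyy)
    s <- svd(A)                                 # A = u diag(d) v', so P = t(u), Q = t(v)
    L <- t(s$u) %*% msqrt.inv(Oxx)
    M <- t(s$v) %*% msqrt.inv(Oyy)
    s$d                                         # the canonical correlations (diagonal of Lambda)
    round(L %*% Oxx %*% t(L), 12)               # identity matrix
    round(L %*% Oxy %*% t(M), 12)               # diag(s$d), i.e. Lambda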
The next problems show how one gets from this the maximization property of the canonical correlation coefficients:
Problem 257. Show that for every p-vector l and q-vector m,

(19.3.4) |corr(l^⊤x, m^⊤y)| ≤ λ₁

where λ₁ is the first (and therefore biggest) diagonal element of Λ. Equality in (19.3.4) holds if l = l₁, the first row in L, and m = m₁, the first row in M.
Answer: If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L⁻¹)^⊤ l and q = (M⁻¹)^⊤ m satisfy p^⊤p = 1 and q^⊤q = 1. Then l^⊤x = p^⊤Lx = p^⊤u and m^⊤y = q^⊤My = q^⊤v, and

(19.3.5) V[p^⊤u; q^⊤v] = σ² [p^⊤, o^⊤; o^⊤, q^⊤] [I_p, Λ; Λ^⊤, I_q] [p, o; o, q] = σ² [1, p^⊤Λq; q^⊤Λ^⊤p, 1].

Since the matrix on the righthand side has ones in the diagonal, it is the correlation matrix, i.e., p^⊤Λq = corr(l^⊤x, m^⊤y). Therefore (19.3.4) follows from Problem 258.
Problem 258. If ∑ᵢ pᵢ² = ∑ᵢ qᵢ² = 1, and λᵢ ≥ 0, show that |∑ᵢ pᵢλᵢqᵢ| ≤ maxᵢ λᵢ. Hint: first get an upper bound for |∑ᵢ pᵢλᵢqᵢ| through a Cauchy–Schwarz-type argument.
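One way to carry out the hint (a sketch, not necessarily the author's intended answer): by the triangle inequality and the Cauchy–Schwarz inequality,

|∑ᵢ pᵢλᵢqᵢ| ≤ ∑ᵢ λᵢ|pᵢ||qᵢ| ≤ (maxᵢ λᵢ) ∑ᵢ |pᵢ||qᵢ| ≤ (maxᵢ λᵢ) √(∑ᵢ pᵢ²) √(∑ᵢ qᵢ²) = maxᵢ λᵢ.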
Problem 259. Show that for every p-vector l and q-vector m such that l^⊤x is uncorrelated with l₁^⊤x, and m^⊤y is uncorrelated with m₁^⊤y,

(19.3.6) |corr(l^⊤x, m^⊤y)| ≤ λ₂

where λ₂ is the second diagonal element of Λ. Equality in (19.3.6) holds if l = l₂, the second row in L, and m = m₂, the second row in M.
Answer. If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L⁻¹)^⊤ l and q = (M⁻¹)^⊤ m satisfy p^⊤p = 1 and q^⊤q = 1. Now write e₁ for the first unit vector, which has a 1 as first component and zeros everywhere else:

(19.3.7) cov[l^⊤x, l₁^⊤x] = cov[p^⊤Lx, e₁^⊤Lx] = σ² p^⊤e₁ = σ² p₁.

This covariance is zero iff p₁ = 0; in the same way, cov[m^⊤y, m₁^⊤y] = 0 iff q₁ = 0. Furthermore one also needs the following, directly from the proof above:

V[p^⊤u; q^⊤v] = σ² [p^⊤, o^⊤; o^⊤, q^⊤] [I_p, Λ; Λ^⊤, I_q] [p, o; o, q] = σ² [1, p^⊤Λq; q^⊤Λ^⊤p, 1].

Since the matrix on the righthand side has ones in the diagonal, it is the correlation matrix, i.e., p^⊤Λq = corr(l^⊤x, m^⊤y). Equation (19.3.6) follows from Problem 258 if one lets the subscript i start at 2 instead of 1, which is possible since p₁ = q₁ = 0.
Problem 260. (Not eligible for in-class exams) Extra credit question for good mathematicians: Reformulate the above treatment of canonical correlations without the assumption that Ω_xx and Ω_yy are nonsingular.
19.4 Some Remarks about the Sample Partial Correlation Coefficients
The definition of the partial sample correlation coefficients is analogous to that of the partial population correlation coefficients: Given two data vectors y and z, and the matrix X (which includes a constant term), let M = I − X(X^⊤X)⁻¹X^⊤ be the "residual maker" with respect to X. Then the squared partial sample correlation is the squared simple correlation between the least squares residuals:

(19.4.1) r²_{zy.X} = (z^⊤My)² / ((z^⊤Mz)(y^⊤My))
Alternatively, one can define it as the proportionate reduction in the SSE. Although X is assumed to incorporate a constant term, I am giving it here separately, in order to show the analogy with (19.1.8):

(19.4.2) r²_{zy.X} = (SSE[constant term and X; y] − SSE[constant term, X, and z; y]) / SSE[constant term and X; y].
[Gre97, p. 248] considers it unintuitive that this can be computed using t-statistics. Our approach explains why this is so. First of all, note that the square of the t-statistic is the F-statistic. Secondly, the formula for the F-statistic for the inclusion of z into the regression is

(19.4.3) t² = F = (SSE[constant term and X; y] − SSE[constant term, X, and z; y]) / (SSE[constant term, X, and z; y]/(n − k − 1)).
This is very similar to the formula for the squared partial correlation coefficient. From (19.4.3) follows

(19.4.4) F + (n − k − 1) = (n − k − 1) SSE[constant term and X; y] / SSE[constant term, X, and z; y]

and therefore

(19.4.5) r²_{zy.X} = F / (F + n − k − 1),

which is [Gre97, (6-29) on p. 248].
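A quick sample check of (19.4.1) and (19.4.5) in R; the data are illustrative, and here k counts the columns of X including the constant term:

    # Sketch: partial r^2 from residuals and from the t-statistic, cf. (19.4.1) and (19.4.5).
    set.seed(4)
    n <- 60
    X <- matrix(rnorm(2 * n), n, 2)
    z <- rnorm(n)
    y <- 1 + X %*% c(1, -1) + 0.4 * z + rnorm(n)
    k <- ncol(X) + 1                              # regressors in X plus the constant term
    t.z <- coef(summary(lm(y ~ X + z)))["z", "t value"]
    My  <- resid(lm(y ~ X))                       # y with constant and X partialled out
    Mz  <- resid(lm(z ~ X))                       # z with constant and X partialled out
    cor(My, Mz)^2                                 # squared partial sample correlation (19.4.1)
    t.z^2 / (t.z^2 + n - k - 1)                   # the same number, cf. (19.4.5)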
It should also be noted here that [Gre97, (6-36) on p. 254] is the sample equivalent of (19.1.11).

CHAPTER 20

Numerical Methods for Computing OLS Estimates
20.1 QR Decomposition

One precise and fairly efficient method to compute the Least Squares estimates is the QR decomposition. It amounts to going over to an orthonormal basis in R[X]. It uses the following mathematical fact:

Every matrix X which has full column rank can be decomposed into the product of two matrices QR, where Q has the same number of rows and columns as X, and is "suborthogonal" or "incomplete orthogonal," i.e., it satisfies Q^⊤Q = I. The other factor R is upper triangular and nonsingular.
To construct the least squares estimates, make a QR decomposition of the matrix of explanatory variables X (which is assumed to have full column rank). With X = QR, the normal equations X^⊤X β̂ = X^⊤y read

R^⊤Q^⊤QR β̂ = R^⊤Q^⊤y, i.e., R^⊤R β̂ = R^⊤Q^⊤y,

and since R is nonsingular this reduces to R β̂ = Q^⊤y, a triangular system that can be solved by back-substitution.
Problem 261. 2 points You have a QR-decomposition X = QR, where Q^⊤Q = I, and R is upper triangular and nonsingular. For an estimate of V[β̂] you need (X^⊤X)⁻¹. How can this be computed without computing X^⊤X? And why would you want to avoid computing X^⊤X?

Answer. X^⊤X = R^⊤Q^⊤QR = R^⊤R; its inverse is therefore (X^⊤X)⁻¹ = R⁻¹(R⁻¹)^⊤, which only requires inverting the triangular matrix R. Forming X^⊤X explicitly is best avoided because it roughly squares the condition number of the problem and can therefore lose numerical precision.
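The following R sketch (illustrative data; the qr() interface together with the base helpers qr.R(), qr.qty(), and backsolve()) puts these pieces together: β̂ from the triangular system R β̂ = Q^⊤y, and (X^⊤X)⁻¹ as R⁻¹(R⁻¹)^⊤.

    # Sketch: least squares via the QR decomposition.
    set.seed(5)
    n <- 100
    X <- cbind(1, rnorm(n), rnorm(n))
    y <- X %*% c(2, 1, -1) + rnorm(n)
    qx   <- qr(X)                                  # factorization, stored as qr and qraux
    R    <- qr.R(qx)
    Qty  <- qr.qty(qx, y)[1:ncol(X)]               # first k elements of Q'y
    beta <- backsolve(R, Qty)                      # solve R beta = Q'y by back-substitution
    beta
    XtX.inv <- backsolve(R, backsolve(R, diag(ncol(X)), transpose = TRUE))  # R^{-1}(R^{-1})'
    range(XtX.inv - solve(t(X) %*% X))             # agrees with the direct computation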
Problem 262. Compute the QR decomposition of

Answer. Write X = QR, i.e., X = [q₁ q₂ q₃]R, where q₁^⊤q₁ = q₂^⊤q₂ = q₃^⊤q₃ = 1 and q₁^⊤q₂ = q₁^⊤q₃ = q₂^⊤q₃ = 0. First column: x₁ = q₁r₁₁ and q₁ must have unit length. This gives q₁^⊤ = [1/2 1/2 1/2 1/2] and r₁₁ = 2. Second column:

(20.1.8) x₂ = q₁r₁₂ + q₂r₂₂.

Premultiplying by q₁^⊤ gives r₁₂ = q₁^⊤x₂; then q₂r₂₂ = x₂ − q₁r₁₂, where q₂ must again have unit length. This gives q₂^⊤ = [−1/2 1/2 −1/2 1/2] and r₂₂ = 4. The rest remains a homework problem.
Problem 263. 2 points Compute trace and determinant of

To prove the existence of the QR decomposition, and also for the numerical procedure, we will build Q^⊤ as the product of several orthogonal matrices, each converting one column of X into one with zeros below the diagonal.
First note that for every vector v ≠ o, the matrix I − (2/(v^⊤v)) v v^⊤ is orthogonal.
Given X, let x be the first column of X. If x = o, then go on to the next column; otherwise define v = x + σ√(x^⊤x) e₁ with σ = ±1. (Mathematically, either σ = +1 or σ = −1 would do; but if one gives σ the same sign as x₁₁, then the first element of v gets the largest possible absolute value, which improves numerical accuracy.) Then
(20.2.2) v^⊤v = (x₁₁² + 2σx₁₁√(x^⊤x) + x^⊤x) + x₂₁² + ··· + x_{n1}²
(20.2.3)      = 2(x^⊤x + σx₁₁√(x^⊤x))
(20.2.4) v^⊤x = x^⊤x + σx₁₁√(x^⊤x)
By (20.2.3) and (20.2.4), 2v^⊤x/(v^⊤v) = 1, and therefore

(I − (2/(v^⊤v)) v v^⊤) x = x − v = [−σ√(x^⊤x), 0, ..., 0]^⊤.

This generates zeros below the diagonal. Instead of writing the zeros into that matrix, the implementation uses the "free" space to store the vector v. There is almost enough room; the first nonzero element of v must be stored elsewhere. This is why the QR decomposition in Splus has two main components: qr is a matrix like a, and qraux is a vector of length ncols(a).
LINPACK does not use or store exactly the same v as given here, but uses u = v/(σ√(x^⊤x)) instead. The normalization does not affect the resulting orthogonal transformation; its advantage is that the leading element of each vector, which is stored in qraux, is at the same time equal to u^⊤u/2. In other words, qraux doubles up as the divisor in the construction of the orthogonal matrices.
In Splus type help(qr). At the end of the help file a program is given which shows how the Q might be constructed from the fragments qr and qraux.
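To make the construction in (20.2.2)–(20.2.4) concrete, here is a small R sketch of a single Householder step on an illustrative matrix (it shows the mathematics only, not the LINPACK storage scheme):

    # Sketch: one Householder reflection zeroing the first column below the diagonal.
    set.seed(6)
    X <- matrix(rnorm(12), 4, 3)
    x <- X[, 1]                                   # first column of X (x_11 nonzero here)
    sigma <- sign(x[1])                           # same sign as x_11 for numerical accuracy
    v <- x + sigma * sqrt(sum(x^2)) * c(1, 0, 0, 0)
    H <- diag(4) - (2 / sum(v^2)) * v %*% t(v)    # the reflector I - (2/v'v) vv'
    round(H %*% x, 12)                            # first element -sigma*sqrt(x'x), zeros below
    round(t(H) %*% H, 12)                         # H is orthogonal (and symmetric)
    H %*% X                                       # first column now has zeros below the diagonal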