3.10 Joint Moments, Covariance, and Correlation

In the case of multivariate random variables, the concept of joint moments becomes relevant. The formal definitions of joint moments about the origin and about the mean are as follows:

Definition 3.22 Joint Moment About the Origin

Let X and Y be two random variables having joint density function f(x,y). Then the (r,s)th joint moment of (X,Y) (or of f(x,y)) about the origin is defined by

$$\mu'_{r,s} = \sum_{x \in R(X)} \sum_{y \in R(Y)} x^r y^s f(x,y) \quad \text{(discrete)}$$

$$\mu'_{r,s} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^r y^s f(x,y)\,dx\,dy \quad \text{(continuous)}$$

Definition 3.23 Joint Moments About the Mean (or Central Joint Moment)

Let X and Y be two random variables having joint density function f(x,y). Then the (r,s)th joint moment of (X,Y) (or of f(x,y)) about the mean is defined by

$$\mu_{r,s} = \sum_{x \in R(X)} \sum_{y \in R(Y)} (x - E(X))^r (y - E(Y))^s f(x,y) \quad \text{(discrete)}$$

$$\mu_{r,s} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - E(X))^r (y - E(Y))^s f(x,y)\,dx\,dy \quad \text{(continuous)}$$

3.10.1 Covariance and Correlation

Regarding joint moments, our immediate interest is in a particular joint moment about the mean, $\mu_{1,1}$, and the relationship between this moment and moments about the origin. The central moment $\mu_{1,1}$ is given a special name and symbol, and we will see that $\mu_{1,1}$ is useful as a measure of "linear association" between X and Y.

Definition 3.24 Covariance

The central joint moment $\mu_{1,1} = E\big((X - E(X))(Y - E(Y))\big)$ is called the covariance between X and Y, and is denoted by the symbol $\sigma_{XY}$, or by cov(X,Y).

Note that there is a simple relationship between $\sigma_{XY}$ and moments about the origin that can be used for the calculation of the covariance.

Theorem 3.30 Covariance in Terms of Moments About the Origin

$$\sigma_{XY} = E(XY) - E(X)E(Y).$$

Proof This result follows directly from the properties of the expectation operation. In particular, by definition

$$\sigma_{XY} = E\big((X - E(X))(Y - E(Y))\big) = E\big(XY - E(X)Y - XE(Y) + E(X)E(Y)\big) = E(XY) - E(X)E(Y). \qquad ∎$$

Example 3.25 Covariance Calculation

Let the bivariate random variable (X,Y) have a joint density function $f(x,y) = (x + y)\, I_{[0,1]}(x)\, I_{[0,1]}(y)$. Find cov(X,Y).

Answer: Note that

$$E(XY) = \int_0^1 \int_0^1 xy(x + y)\,dx\,dy = \int_0^1 \int_0^1 \big(x^2 y + x y^2\big)\,dx\,dy = \frac{1}{3},$$

$$E(X) = \int_0^1 \int_0^1 x(x + y)\,dx\,dy = \int_0^1 \int_0^1 \big(x^2 + xy\big)\,dx\,dy = \frac{7}{12},$$

$$E(Y) = \int_0^1 \int_0^1 y(x + y)\,dx\,dy = \int_0^1 \int_0^1 \big(xy + y^2\big)\,dx\,dy = \frac{7}{12}.$$

Then, by Theorem 3.30, cov(X,Y) = 1/3 − (7/12)(7/12) = −(1/144). □
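The integrals above can also be checked numerically. The following is a minimal sketch of ours (not from the text), assuming SciPy is available; it integrates the density f(x,y) = x + y over the unit square to reproduce E(XY), E(X), E(Y), and the covariance.

```python
# Numerical check of Example 3.25 (a sketch, assuming SciPy is installed).
from scipy.integrate import dblquad

f = lambda x, y: x + y  # joint density on [0,1] x [0,1]

# dblquad integrates g(y, x): the first argument of g is the inner (y) variable.
E_XY, _ = dblquad(lambda y, x: x * y * f(x, y), 0, 1, 0, 1)
E_X, _ = dblquad(lambda y, x: x * f(x, y), 0, 1, 0, 1)
E_Y, _ = dblquad(lambda y, x: y * f(x, y), 0, 1, 0, 1)

print(E_XY, E_X, E_Y)     # ~0.3333, ~0.5833, ~0.5833
print(E_XY - E_X * E_Y)   # ~ -0.00694, i.e., -1/144
```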

A useful corollary to Theorem 3.30 is that the expectation of a product of two random variables is the product of the expectations iff $\sigma_{XY} = 0$, formally stated as follows.

Corollary 3.5 Expectation of Product Equals Product of Expectations

$$E(XY) = E(X)E(Y) \;\text{ iff }\; \sigma_{XY} = 0.$$

Proof This follows directly from Theorem 3.30 upon setting $\sigma_{XY}$ to zero (sufficiency) or setting E(XY) equal to E(X)E(Y) (necessity). ∎

What does $\sigma_{XY}$ measure? The covariance is a measure of the linear association between two random variables, where the precise meaning of linear association will be made clear shortly. Our discussion will be facilitated by observing that the value of $\sigma_{XY}$ exhibits a definite upper bound in absolute value, which is expressible as a function of the variances of the two random variables involved. The bound on $\sigma_{XY}$ follows from the following inequality.

Theorem 3.31 Cauchy-Schwarz Inequality

$$(E(WZ))^2 \le E(W^2)\,E(Z^2).$$

Proof The quantity $E\big((\lambda_1 W + \lambda_2 Z)^2\big)$ must be greater than or equal to 0 $\forall (\lambda_1, \lambda_2)$ since $(\lambda_1 W + \lambda_2 Z)^2$ is a random variable having only non-negative outcomes. Thus $\lambda_1^2 E(W^2) + \lambda_2^2 E(Z^2) + 2\lambda_1\lambda_2 E(WZ) \ge 0\ \forall (\lambda_1, \lambda_2)$, which in matrix terms can be represented as

$$[\lambda_1\ \ \lambda_2] \begin{bmatrix} E(W^2) & E(WZ) \\ E(WZ) & E(Z^2) \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \ge 0 \quad \forall (\lambda_1, \lambda_2).$$

The last inequality is precisely the defining property of positive semidefiniteness for the (2×2) matrix in brackets,¹⁵ and the matrix in brackets will be positive semidefinite iff $E(W^2)E(Z^2) - (E(WZ))^2 \ge 0$ (see the Appendix, Section 3.12). ∎
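As a quick illustration of the positive semidefiniteness argument, the following sketch (our own construction, assuming NumPy is available) forms the sample analogue of the matrix of second moments for an arbitrary pair (W, Z) and confirms that its eigenvalues are nonnegative, so that $(E(WZ))^2 \le E(W^2)E(Z^2)$ holds for the sample moments.

```python
# A sketch (assumes NumPy): sample second-moment matrix of (W, Z) is positive semidefinite.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=100_000)
Z = 0.3 * W + rng.normal(size=100_000)   # an arbitrary dependent pair

M = np.array([[np.mean(W * W), np.mean(W * Z)],
              [np.mean(W * Z), np.mean(Z * Z)]])

print(np.linalg.eigvalsh(M))   # both eigenvalues >= 0
print(np.linalg.det(M))        # = E(W^2)E(Z^2) - (E(WZ))^2 >= 0
```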

The covariance bound we seek is stated in the following theorem.

Theorem 3.32 Covariance Bound

$$|\sigma_{XY}| \le \sigma_X \sigma_Y.$$

Proof Let W = (X − E(X)) and Z = (Y − E(Y)) in the Cauchy-Schwarz inequality. Then

$$\Big(E\big((X - E(X))(Y - E(Y))\big)\Big)^2 \le E\big((X - E(X))^2\big)\,E\big((Y - E(Y))^2\big),$$

or equivalently, $\sigma_{XY}^2 \le \sigma_X^2 \sigma_Y^2$, which holds iff $|\sigma_{XY}| \le \sigma_X \sigma_Y$. ∎

¹⁵ Recall that a matrix A is positive semidefinite iff $t'At \ge 0\ \forall t$, and A is positive definite iff $t'At > 0\ \forall t \ne 0$.

Thus, the covariance between X and Y is upper-bounded in absolute value by the product of the standard deviations of X and Y. Using this bound, we can define a useful scaled version of the covariance, called the correlation between X and Y, as follows.

Definition 3.25 Correlation

The correlation between two random variables X and Y is defined by $\mathrm{corr}(X,Y) = \rho_{XY} = \dfrac{\sigma_{XY}}{\sigma_X \sigma_Y}$.

Example 3.26 Correlation Calculation

Refer to Example 3.25. Note that

$$E(X^2) = \int_0^1 \int_0^1 x^2 (x + y)\,dx\,dy = \frac{5}{12}, \qquad E(Y^2) = \int_0^1 \int_0^1 y^2 (x + y)\,dx\,dy = \frac{5}{12},$$

so that

$$\sigma_X^2 = E(X^2) - (E(X))^2 = 5/12 - (7/12)^2 = 11/144, \quad \text{and} \quad \sigma_Y^2 = E(Y^2) - (E(Y))^2 = 5/12 - (7/12)^2 = 11/144.$$

Then the correlation between X and Y is given by

$$\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{-1/144}{(11/144)^{1/2}(11/144)^{1/2}} = -\frac{1}{11}. \qquad □$$
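Continuing the numerical sketch begun after Example 3.25 (again an illustration of ours, assuming SciPy is available), the second moments, variances, and correlation can be reproduced as follows.

```python
# Numerical check of Example 3.26 (a sketch, assuming SciPy is installed).
from scipy.integrate import dblquad

f = lambda x, y: x + y
E_X2, _ = dblquad(lambda y, x: x**2 * f(x, y), 0, 1, 0, 1)
E_Y2, _ = dblquad(lambda y, x: y**2 * f(x, y), 0, 1, 0, 1)

E_X = E_Y = 7 / 12                 # from Example 3.25
cov_XY = 1 / 3 - E_X * E_Y         # -1/144, from Example 3.25
var_X = E_X2 - E_X**2              # 11/144
var_Y = E_Y2 - E_Y**2              # 11/144

print(cov_XY / (var_X**0.5 * var_Y**0.5))   # ~ -0.0909, i.e., -1/11
```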

Bounds on the correlation between X and Y follow directly from the bounds on the covariance between X and Y.

Theorem 3.33 Correlation Bound

$$-1 \le \rho_{XY} \le 1.$$

Proof This follows directly from Theorem 3.32 via division by $\sigma_X \sigma_Y$. ∎

The covariance equals its upper bound value of $\sigma_X \sigma_Y$ iff the correlation equals its upper bound value of 1, and the covariance equals its lower bound value of $-\sigma_X \sigma_Y$ iff the correlation equals its lower bound value of −1.

Assuming that the covariance exists, a necessary condition for the independence of X and Y is that $\sigma_{XY} = 0$ (or equivalently, that $\rho_{XY} = 0$ if $\sigma_X \sigma_Y \ne 0$).

Theorem 3.34 Relationship Between Independence and Covariance

If X and Y are independent, then $\sigma_{XY} = 0$ (assuming the covariance exists).


Proof If X and Y are independent, then $f(x,y) = f_X(x) f_Y(y)$. It follows that

$$\sigma_{XY} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - E(X))(y - E(Y))\,dF(x,y) = \int_{-\infty}^{\infty} (x - E(X))\,dF_X(x) \int_{-\infty}^{\infty} (y - E(Y))\,dF_Y(y) = (E(X) - E(X))(E(Y) - E(Y)) = 0 \cdot 0 = 0. \qquad ∎$$

The converse of Theorem 3.34 is not true: there can be dependence between X and Y, even functional dependence, and the covariance between X and Y could nonetheless be zero, as the following example illustrates.

Example 3.27 Bivariate Functional Dependence with $\sigma_{XY} = 0$

Let X and Y be two random variables having a joint density function given by $f(x,y) = 1.5\, I_{[-1,1]}(x)\, I_{[0,x^2]}(y)$. Note this density implies that (x,y) points are equally likely to occur on and below the parabola represented by the graph of $y = x^2$. There is a direct functional dependence between X and the range of Y, so that f(y|x) will change as x changes, and thus X and Y must be dependent random variables. Nonetheless, $\sigma_{XY} = 0$. To see this, note that

$$E(XY) = 1.5 \int_{-1}^{1} \int_0^{x^2} xy\,dy\,dx = 1.5 \int_{-1}^{1} \tfrac{1}{2} x^5\,dx = .75\, \frac{x^6}{6}\bigg|_{-1}^{1} = 0,$$

$$E(X) = 1.5 \int_{-1}^{1} \int_0^{x^2} x\,dy\,dx = 1.5 \int_{-1}^{1} x^3\,dx = 1.5\, \frac{x^4}{4}\bigg|_{-1}^{1} = 0,$$

$$E(Y) = 1.5 \int_{-1}^{1} \int_0^{x^2} y\,dy\,dx = 1.5 \int_{-1}^{1} \tfrac{1}{2} x^4\,dx = .75\, \frac{x^5}{5}\bigg|_{-1}^{1} = .3.$$

Therefore, $\sigma_{XY} = E(XY) - E(X)E(Y) = 0 - 0(.3) = 0$. □
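A simulation makes the point of Example 3.27 concrete. The sketch below (our own construction, assuming NumPy is available) draws from the parabola density by sampling X from its marginal $f_X(x) = 1.5x^2$ on [−1, 1] via the inverse CDF and then Y uniformly on $[0, x^2]$; the sample covariance is essentially zero even though Y is functionally tied to X.

```python
# Monte Carlo illustration of Example 3.27 (a sketch, assuming NumPy).
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
u = rng.uniform(size=n)
x = np.cbrt(2 * u - 1)            # inverse CDF of f_X(x) = 1.5 x^2 on [-1, 1]
y = rng.uniform(size=n) * x**2    # Y | X = x is uniform on [0, x^2]

print(np.mean(x * y) - np.mean(x) * np.mean(y))   # ~ 0   (sigma_XY)
print(np.mean(y))                                  # ~ 0.3 (E(Y))
print(np.corrcoef(x**2, y)[0, 1])                  # clearly positive: Y depends on X
```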

3.10.2 Correlation, Linear Association and Degeneracy

We now demonstrate that when the covariance takes its maximum absolute value, and thus $\rho_{XY} = +1$ or $-1$, there is a perfect positive ($\rho_{XY} = +1$) or negative ($\rho_{XY} = -1$) linear relationship between X and Y that holds with probability one (i.e., $P(y = a + bx) = 1$ or $P(y = a - bx) = 1$). The demonstration is facilitated by the following useful result.

Theorem 3.35 Degeneracy when σ² = 0

Let Z be a random variable for which $\sigma_Z^2 = 0$. Then $P(z = E(Z)) = 1$.

Proof Let $g(Z) = (Z - E(Z))^2$. Then

$$P(E(Z) - a < z < E(Z) + a) = P\big((z - E(Z))^2 < a^2\big) = 1 - P\big((z - E(Z))^2 \ge a^2\big) \ge 1 - \sigma_Z^2 / a^2,$$

where the inequality is established using Markov's inequality. If $\sigma_Z^2 = 0$, then $P(E(Z) - a < z < E(Z) + a) = 1\ \forall a > 0$, and since only z = E(Z) satisfies the inequality $\forall a > 0$, $P(z = E(Z)) = 1$ when $\sigma_Z^2 = 0$. ∎

The result on the linear relationship between X and Y when $\rho_{XY} = +1$ or $-1$, or equivalently, when $\sigma_{XY}$ achieves its upper or lower bound, is as follows.

Theorem 3.36 Correlation Bounds and Linearity

If $\rho_{XY} = +1$ or $-1$, then $P(y = a_1 + bx) = 1$ or $P(y = a_2 - bx) = 1$, respectively, where $a_1 = E(Y) - (\sigma_Y/\sigma_X)E(X)$, $a_2 = E(Y) + (\sigma_Y/\sigma_X)E(X)$, and $b = \sigma_Y/\sigma_X$.

Proof Define $Z = \lambda_1 (X - E(X)) + \lambda_2 (Y - E(Y))$, and note that E(Z) = 0. It follows immediately that

$$\sigma_Z^2 = E(Z^2) = E\big((\lambda_1 (X - E(X)) + \lambda_2 (Y - E(Y)))^2\big) = \lambda_1^2 E\big((X - E(X))^2\big) + \lambda_2^2 E\big((Y - E(Y))^2\big) + 2\lambda_1 \lambda_2 \sigma_{XY} \ge 0 \quad \forall \lambda_1, \lambda_2,$$

which can be represented in matrix terms as

$$\sigma_Z^2 = [\lambda_1\ \ \lambda_2] \begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} \ge 0 \quad \forall (\lambda_1, \lambda_2).$$

If $\rho_{XY} = +1$ or $-1$, then $\sigma_{XY}$ achieves either its (nominal) upper or lower bound, respectively, or equivalently, $\sigma_{XY}^2 = \sigma_X^2 \sigma_Y^2$. It follows that the above 2×2 matrix is singular, since its determinant would be zero. Then the columns of the matrix are linearly dependent, so that there exist nonzero values of $\lambda_1$ and $\lambda_2$ such that

$$\begin{bmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

and for these λ-values, the quadratic form above, and thus $\sigma_Z^2$, achieves the value 0.

A solution for $\lambda_1$ and $\lambda_2$ is given by $\lambda_1 = \sigma_{XY}/\sigma_X^2$ and $\lambda_2 = -1$, which can be validated by substituting these values for $\lambda_1$ and $\lambda_2$ in the linear equations, and noting that $\sigma_Y^2 = \sigma_{XY}^2 / \sigma_X^2$ under the prevailing assumptions. Since $\sigma_Z^2 = 0$ at these values of $\lambda_1$ and $\lambda_2$, it follows from Theorem 3.35 that P(z = 0) = 1 (recall E(Z) = 0).

Given the definition of Z, substituting the above solution values for $\lambda_1$ and $\lambda_2$ obtains an equivalent probability statement $P\big(y = (E(Y) - (\sigma_{XY}/\sigma_X^2) E(X)) + (\sigma_{XY}/\sigma_X^2) x\big) = 1$. If $\rho_{XY} = +1$, then $\sigma_{XY} = \sigma_X \sigma_Y$, yielding $P(y = a_1 + bx) = 1$ in the statement of the theorem, while if $\rho_{XY} = -1$, then $\sigma_{XY} = -\sigma_X \sigma_Y$, yielding $P(y = a_2 - bx) = 1$ in the statement of the theorem. ∎

The theorem implies that when $\rho_{XY} = +1$ (or $-1$), the event that the outcome of (X,Y) is on a straight line with positive (or negative) slope occurs with probability 1. As a diagrammatic illustration, if (X,Y) is a discrete bivariate random variable, then the situation where $\rho_{XY} = +1$ would be exemplified

by Figure 3.10. Note in Figure 3.10 that f(x,y) assumes positive values only for points along the line y = a + bx, reflecting the fact that P(y = a + bx) = 1. This situation illustrates what is known as a degenerate random variable and a degenerate density function. The defining characteristic of a degenerate random variable is that it is an n-variate random variable $(X_1,\ldots,X_n)$ whose components satisfy one or more linear functional relationships with probability one, i.e., if $P\big(a_i + \sum_{j=1}^{n} b_{ij} x_j = 0\big) = 1$ for $i = 1,\ldots,m$, then $(X_1,\ldots,X_n)$ is a degenerate random variable.¹⁶ A characteristic of the accompanying degenerate density function for $(X_1,\ldots,X_n)$ is that the entire mass of probability (a mass of 1) is concentrated on a collection of points that lie on a hyperplane of dimension less than n, the hyperplane being defined by the collection of linear functional relationships.

Degeneracy causes no particular difficulty in the discrete case: probabilities of events for the degenerate random variable $(X_1,\ldots,X_n)$ can be calculated in the usual way by summing the degenerate density function over the outcomes in the event of interest. However, degeneracy in the continuous case results in $f(x_1,\ldots,x_n)$ not being a density function according to our original definition of the concept. For a heuristic description of the problem, examine the diagrammatic illustration in Figure 3.11 for a degenerate bivariate random variable in the continuous case. Intuitively, because there is no volume under the graph of f(x,y), $\int_{x_1}^{x_2} \int_{y_1}^{y_2} f(x,y)\,dy\,dx = 0$ $\forall x_1 \le x_2$ and $\forall y_1 \le y_2$, and f(x,y) cannot be integrated in the usual way to assign probabilities to events for (X,Y). However, there is area below the graph of f(x,y) and above the line y = a + bx, representing the probability mass of 1 distributed over a segment (or perhaps all) of this line. Since only subsets of the set $\{(x,y): y = a + bx,\ x \in R(X)\}$¹⁷ are assigned nonzero probability, the degenerate density function can be used to assign probabilities to events by use of line integrals,¹⁸ which essentially integrate f(x,y) over subsets of points along the line y = a + bx. The general concept of line integrals is beyond the scope of our study, but in essence, the relevant integral in the current context is of the form $\int_{x \in A} f(x, a + bx)\,dx$. Note the linear relationship linking y and x is explicitly accounted for by substituting a + bx for y in f(x,y), which converts f(x,y) into a function of the single variable x. Then the function of x is integrated over the points in the event A for x, which determines the probability of the event $B = \{(x,y): y = a + bx,\ x \in A\}$ for the bivariate random variable (X,Y).

Figure 3.10 $\rho_{XY} = +1$, discrete case (the density f(x,y) places positive mass only at points on the line y = a + bx).

¹⁶ The concept of degeneracy can be extended by calling $(X_1,\ldots,X_n)$ degenerate if the components satisfy one or more functional relationships (not necessarily linear) with probability 1. We will not examine this generalization here.

¹⁷ Equivalently, $\{(x,y): x = b^{-1}(y - a),\ y \in R(Y)\}$.
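As a hypothetical numerical illustration of the line-integral idea (the density and constants below are ours, not the text's), suppose all probability mass lies on the line y = 1 + 2x with f(x, 1 + 2x) = 1 for x ∈ [0, 1]. Probabilities of events are then obtained by integrating over x alone after substituting y = 1 + 2x.

```python
# A hypothetical line-integral probability calculation (sketch, assuming SciPy).
from scipy.integrate import quad

a, b = 1.0, 2.0
f_on_line = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0   # f(x, a + b*x) along the line

# P(B) for B = {(x, y): y = a + b*x, x in [0, 0.5]}
prob, _ = quad(f_on_line, 0.0, 0.5)
print(prob)   # 0.5
```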

Having introduced the concept of degeneracy, we can alternatively characterize $\rho_{XY} = +1$ or $-1$ as a case where the bivariate random variable (X,Y), and its accompanying joint density function, are degenerate, with X and Y satisfying, respectively, a positively or negatively sloped linear functional relationship with probability one. What can be said about the relationship between X and Y when $|\rho_{XY}| < 1$? The closer $|\rho_{XY}|$ is to one, the closer the relationship between X and Y is to being linear, where "closeness" can be interpreted as follows. Define the random variable $\hat{Y} = a + bX$ to represent predictions of Y outcomes based on a linear function of X. We will choose the coefficients a and b so that $\hat{Y}$ is the best linear prediction of Y, where best is taken to mean "minimum expected squared distance between outcomes of Y and outcomes of $\hat{Y}$."

Theorem 3.37 Best Linear Prediction of Y Outcomes

Let (X,Y) have moments of at least the second order, and let $\hat{Y} = a + bX$. Then the choices of a and b that minimize $E(d^2(Y,\hat{Y})) = E\big((Y - (a + bX))^2\big)$ are given by $a = E(Y) - (\sigma_{XY}/\sigma_X^2)E(X)$ and $b = \sigma_{XY}/\sigma_X^2$.

Figure 3.11 $\rho_{XY} = +1$, continuous case (the probability mass of 1 is concentrated on the line y = a + bx).

¹⁸ For an introduction to the concept of line integrals, see E. Kreyszig (1979), Advanced Engineering Mathematics, 4th ed. New York: Wiley, Chapter 9.


Proof Left to the reader. ∎

Now define $V = Y - \hat{Y}$ to represent the deviations between outcomes of Y and outcomes of the best linear prediction of Y outcomes as defined in Theorem 3.37. Because $E(\hat{Y}) = E(Y)$, $E(V) = 0$. It follows that

$$\sigma_Y^2 = E\big(Y - E(Y)\big)^2 = E\big((\hat{Y} - E(\hat{Y})) + V\big)^2 = \sigma_{\hat{Y}}^2 + \sigma_V^2 + 2\sigma_{\hat{Y}V},$$

where

$$\sigma_V^2 = E(V^2) = E\big(d^2(Y, \hat{Y})\big) = E\big(d^2(Y, a + bX)\big) = \sigma_Y^2 - \sigma_{XY}^2/\sigma_X^2 = \sigma_Y^2\big(1 - \rho_{XY}^2\big),$$

$$\sigma_{\hat{Y}}^2 = E\big(\hat{Y} - E(\hat{Y})\big)^2 = \sigma_Y^2 \rho_{XY}^2,$$

$$\sigma_{\hat{Y}V} = E\big((\hat{Y} - E(\hat{Y}))V\big) = (\sigma_{XY}/\sigma_X^2)\, E\big((X - E(X))V\big) = 0.$$

Thus, the variance of Y is decomposed into a proportion $\rho_{XY}^2$ due to $\hat{Y}$ and a proportion $(1 - \rho_{XY}^2)$ due to V, i.e., $\sigma_Y^2 = \sigma_{\hat{Y}}^2 + \sigma_V^2 = \sigma_Y^2 \rho_{XY}^2 + \sigma_Y^2 (1 - \rho_{XY}^2)$.

We can now interpret values of $\rho_{XY} \in (-1, 1)$. Specifically, $\rho_{XY}^2$ is the proportion of the variance in Y that is explained by the best linear prediction of the form $\hat{Y} = a + bX$, and the proportion of the variance unexplained is $(1 - \rho_{XY}^2)$. Relatedly, $\sigma_Y^2(1 - \rho_{XY}^2)$ is precisely the expected squared distance between outcomes of Y and outcomes of the best linear prediction $\hat{Y} = a + bX$. Thus, the closer $|\rho_{XY}|$ is to 1, the more the variance in Y is explained by the linear function a + bX, and the smaller is the expected squared distance between Y and $\hat{Y} = a + bX$. It is in this sense that the higher the value of $|\rho_{XY}|$, the closer is the linear association between Y and X.
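The decomposition can be checked by simulation. The sketch below (an arbitrary example of ours, assuming NumPy is available) computes the best linear prediction coefficients of Theorem 3.37 from sample moments and confirms that the variance of Y splits into an explained share $\rho_{XY}^2 \sigma_Y^2$ and an unexplained share $(1 - \rho_{XY}^2)\sigma_Y^2$.

```python
# Simulation check of Theorem 3.37 and the variance decomposition (sketch, assuming NumPy).
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.normal(2.0, 1.5, size=n)
y = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, size=n)   # an arbitrary (X, Y) pair

sxy = np.mean((x - x.mean()) * (y - y.mean()))     # sample sigma_XY
b = sxy / np.var(x)                                # b = sigma_XY / sigma_X^2
a = np.mean(y) - b * np.mean(x)                    # a = E(Y) - b E(X)
y_hat = a + b * x
v = y - y_hat

rho2 = np.corrcoef(x, y)[0, 1] ** 2
print(np.var(y_hat), rho2 * np.var(y))             # explained variance: the two agree
print(np.var(v), (1 - rho2) * np.var(y))           # unexplained variance: the two agree
```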

If $\rho_{XY} = 0$, the random variables are said to be uncorrelated. In this case, Theorem 3.37 indicates that the best linear predictor is E(Y): there is effectively no linear association with X whatsoever. The reader should note that Y and X can be interchanged in the preceding argument, leading to an analogous interpretation of the degree of linear association between X and $\hat{X} = a + bY$ (for appropriate changes in the definitions of a and b).
