Random variables X1 and X2 are said to be independent if the relation
P(X1 ∈ S1, X2 ∈ S2) = P(X1 ∈ S1) P(X2 ∈ S2)     (20.2.5.7)
holds for any measurable sets S1 and S2.
THEOREM 1. Random variables X1 and X2 are independent if and only if
FX1,X2(x1, x2) = FX1(x1) FX2(x2).
THEOREM 2. Random variables X1 and X2 are independent if and only if the characteristic function of the bivariate random variable (X1, X2) is equal to the product of the characteristic functions of X1 and X2,
fX1,X2(x1, x2) = fX1(x1) fX2(x2).
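For a discrete pair, relation (20.2.5.7) applied to the single-point sets S1 = {x1i}, S2 = {x2j} says that the joint probability mass function must factor into the product of the marginals. A minimal numerical sketch in Python with NumPy (the pmf table below is hypothetical and chosen to factorize):

import numpy as np

# Joint pmf of (X1, X2) as a matrix: p[i, j] = P(X1 = x1[i], X2 = x2[j]).
# Hypothetical example constructed as a product, so the pair is independent.
p = np.outer([0.2, 0.5, 0.3], [0.4, 0.6])

# Marginal pmfs.
p1 = p.sum(axis=1)   # P(X1 = x1[i])
p2 = p.sum(axis=0)   # P(X2 = x2[j])

# X1 and X2 are independent iff p[i, j] = p1[i] * p2[j] for all i, j
# (the discrete counterpart of FX1,X2 = FX1 * FX2).
print(np.allclose(p, np.outer(p1, p2)))   # True for this table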
20.2.5-4 Numerical characteristics of bivariate random variables.
The expectation of a function g(X1, X2) of a bivariate random variable (X1, X2) is defined by the formula
E{g(X1, X2)} = Σ_i Σ_j g(x1i, x2j) pij   in the discrete case,
E{g(X1, X2)} = ∫_{–∞}^{+∞} ∫_{–∞}^{+∞} g(x1, x2) p(x1, x2) dx1 dx2   in the continuous case,
(20.2.5.8)
if these expressions exist in the sense of absolute convergence; otherwise, one says that
E{g(X1, X2)} does not exist.
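The discrete case of formula (20.2.5.8) translates directly into code. A minimal sketch in Python with NumPy; the value grids and the joint pmf table are hypothetical:

import numpy as np

# Values taken by X1 and X2 and the joint pmf p[i, j] (hypothetical numbers, total mass 1).
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([-1.0, 1.0])
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

# E{g(X1, X2)} = sum over i, j of g(x1[i], x2[j]) * p[i, j]  (discrete case of (20.2.5.8)).
def expectation(g):
    X1, X2 = np.meshgrid(x1, x2, indexing="ij")
    return np.sum(g(X1, X2) * p)

print(expectation(lambda a, b: a * b))        # E{X1 X2}
print(expectation(lambda a, b: (a + b)**2))   # E{(X1 + X2)^2}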
The moment of order r1 + r2 of a two-dimensional random variable (X1, X2) about a point (a1, a2) is defined as the expectation E{(X1 – a1)^{r1} (X2 – a2)^{r2}}.
If a1 = a2 = 0, then the moment of order r1 + r2 of a two-dimensional random variable (X1, X2) is called simply the moment, or the initial moment. The initial moment of order r1 + r2 is usually denoted by αr1,r2; i.e.,
αr1,r2 = E{X1^{r1} X2^{r2}}.
The first initial moments are the expectations of the random variables X1 and X2; i.e.,
α1,0 = E{X1^1 X2^0} = E{X1}   and   α0,1 = E{X1^0 X2^1} = E{X2}.
The point (E{X1}, E{X2}) on the OXY-plane characterizes the position of the random point (X1, X2), which spreads about the point (E{X1}, E{X2}). Obviously, the first central moments are zero. The second initial moments are given by the formulas
α2,0 = α2(X1),   α0,2 = α2(X2),   α1,1 = E{X1 X2}.
If a1 = E{X1} and a2 = E{X2}, then the moment of order r1 + r2 of the bivariate random variable (X1, X2) is called the central moment. The central moment of order r1 + r2 is usually denoted by μr1,r2; i.e.,
μr1,r2 = E{(X1 – E{X1})^{r1} (X2 – E{X2})^{r2}}.
The second central moments are of special interest and have special names and notation:
λ11 = μ2,0 = Var{X1},   λ22 = μ0,2 = Var{X2},
λ12 = λ21 = μ1,1 = E{(X1 – E{X1})(X2 – E{X2})}.
The first two of these moments are the variances of the respective random variables, and the third is called the covariance; it is considered below.
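The initial moments αr1,r2 and the central moments μr1,r2 of a discrete pair can be computed in the same way as the expectation above. A sketch in Python with NumPy, reusing the hypothetical pmf table from the previous sketch:

import numpy as np

x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([-1.0, 1.0])
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

def alpha(r1, r2):
    # initial moment alpha_{r1,r2} = E{X1^r1 X2^r2}
    return np.sum(X1**r1 * X2**r2 * p)

m1, m2 = alpha(1, 0), alpha(0, 1)    # first initial moments E{X1}, E{X2}

def mu(r1, r2):
    # central moment mu_{r1,r2} = E{(X1 - E{X1})^r1 (X2 - E{X2})^r2}
    return np.sum((X1 - m1)**r1 * (X2 - m2)**r2 * p)

print(mu(2, 0), mu(0, 2), mu(1, 1))  # Var{X1}, Var{X2}, and the covariance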
20.2.5-5 Covariance and correlation of two random variables.
The covariance (correlation moment, or mixed second central moment) Cov(X1, X2) of random variables X1 and X2 is defined as the central moment of order (1 + 1):
Cov(X1, X2) = μ1,1 = E{(X1 – E{X1})(X2 – E{X2})}.     (20.2.5.9)
Properties of the covariance:
1. Cov(X1, X2) = Cov(X2, X1).
2. Cov(X, X) = Var{X}.
3. If the random variables X1 and X2 are independent, then Cov(X1, X2) = 0. If Cov(X1, X2) ≠ 0, then the random variables X1 and X2 are dependent.
4. If Y1 = a1X1 + b1 and Y2 = a2X2 + b2, then Cov(Y1, Y2) = a1a2 Cov(X1, X2).
5. Cov(X1, X2) = E{X1 X2} – E{X1} E{X2}.
6. |Cov(X1, X2)| ≤ √(Var{X1} Var{X2}). Moreover, |Cov(X1, X2)| = √(Var{X1} Var{X2}) if and only if the random variables X1 and X2 are linearly dependent.
7. Var{X1 + X2} = Var{X1} + Var{X2} + 2 Cov(X1, X2).
If Cov(X1, X2) = 0, then the random variables X1 and X2 are said to be uncorrelated; if Cov(X1, X2) ≠ 0, then they are correlated. Independent random variables are always uncorrelated, but uncorrelated random variables are not necessarily independent.
Example 1. Suppose that we throw two dice. Let X1 be the number of spots on top of the first die, and let X2 be the number of spots on top of the second die. We consider the random variables Y1 = X1 + X2 and Y2 = X1 – X2 (the sum and difference of the points obtained). Then
Cov(Y1, Y2) = E{(X1 + X2 – E{X1 + X2})(X1 – X2 – E{X1 – X2})}
            = E{(X1 – E{X1})^2 – (X2 – E{X2})^2} = Var{X1} – Var{X2} = 0,
since X1 and X2 are identically distributed and hence Var{X1} = Var{X2}. But Y1 and Y2 are obviously dependent; for example, if Y1 = 2, then one necessarily has Y2 = 0.
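Example 1 can be verified by enumerating all 36 equally likely outcomes of the two dice. A short sketch in Python with NumPy; the covariance is computed via property 5, Cov = E{Y1 Y2} – E{Y1} E{Y2}:

import numpy as np

# All 36 equally likely outcomes (x1, x2) of two fair dice, as in Example 1.
x1, x2 = np.meshgrid(np.arange(1, 7), np.arange(1, 7), indexing="ij")
y1, y2 = (x1 + x2).ravel(), (x1 - x2).ravel()   # Y1 = X1 + X2, Y2 = X1 - X2

cov = np.mean(y1 * y2) - np.mean(y1) * np.mean(y2)
print(cov)                          # 0.0: Y1 and Y2 are uncorrelated

# ...but not independent: given Y1 = 2, the only possible value of Y2 is 0.
print(np.unique(y2[y1 == 2]))       # [0]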
The covariance of random variables X1 and X2 characterizes both the degree of their dependence on each other and their spread around the point (E{X1}, E{X2}). The covariance of X1 and X2 has dimension equal to the product of the dimensions of X1 and X2. Along with the covariance of X1 and X2, one often uses the correlation ρ(X1, X2), which is a dimensionless normalized quantity. The correlation (or correlation coefficient) of random variables X1 and X2 is the ratio of the covariance of X1 and X2 to the product of their standard deviations,
ρ(X1, X2) = Cov(X1, X2) / (σX1 σX2).
The correlation of random variables X1 and X2 indicates the degree of linear dependence between the variables. If ρ(X1, X2) = 0, then there is no linear relation between the random variables, but there may well be some other, nonlinear relation between them.
Properties of the correlation:
1. ρ(X1, X2) = ρ(X2, X1).
2. ρ(X, X) = 1.
3. If random variables X1 and X2 are independent, then ρ(X1, X2) = 0. If ρ(X1, X2) ≠ 0, then the random variables X1 and X2 are dependent.
4. If Y1 = a1X1 + b1 and Y2 = a2X2 + b2 with a1a2 ≠ 0, then ρ(Y1, Y2) = ρ(X1, X2) for a1a2 > 0 and ρ(Y1, Y2) = –ρ(X1, X2) for a1a2 < 0.
5. |ρ(X1, X2)| ≤ 1. Moreover, |ρ(X1, X2)| = 1 if and only if the random variables X1 and X2 are linearly dependent.
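That ρ(X1, X2) = 0 does not exclude a nonlinear relation between the variables is easy to see numerically. A small Monte Carlo sketch in Python with NumPy; the pair X2 = X1^2 with X1 standard normal is a hypothetical example chosen only for illustration:

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100_000)
x2 = x1**2                       # completely determined by X1, but not linearly

cov = np.mean(x1 * x2) - np.mean(x1) * np.mean(x2)
rho = cov / (np.std(x1) * np.std(x2))
print(round(rho, 3))             # close to 0: no linear relation, yet X1 and X2 are dependent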
20.2.5-6 Conditional distributions.
The joint distribution of random variables X1 and X2 determines the conditional distribution of one of the random variables given that the other random variable takes a certain value (or lies in a certain interval). If the joint distribution is discrete, then the conditional distributions of X1 and X2 are also discrete. The conditional distributions are described by the formulas
P1|2(x1i | x2j) = P(X1 = x1i | X2 = x2j) = P(X1 = x1i, X2 = x2j) / P(X2 = x2j) = pij / pX2,j,
P2|1(x2j | x1i) = P(X2 = x2j | X1 = x1i) = P(X1 = x1i, X2 = x2j) / P(X1 = x1i) = pij / pX1,i,
i = 1, ..., m;  j = 1, ..., n.     (20.2.5.11)
The probabilities P1|2(x1i | x2j), i = 1, ..., m, define the conditional probability mass function of the random variable X1 given X2 = x2j; and the probabilities P2|1(x2j | x1i), j = 1, ..., n, define the conditional probability mass function of the random variable X2 given X1 = x1i. These conditional probability mass functions have the properties of usual probability mass functions; for example, the sum of the probabilities in each of them is equal to one:
Σ_i P1|2(x1i | x2j) = Σ_j P2|1(x2j | x1i) = 1.
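For a joint distribution given as a table, the conditional probability mass functions (20.2.5.11) are obtained by dividing the table by the corresponding marginal probabilities. A minimal sketch in Python with NumPy (hypothetical table):

import numpy as np

# Hypothetical joint pmf: p[i, j] = P(X1 = x1[i], X2 = x2[j]).
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

p_x1 = p.sum(axis=1)             # marginal pmf of X1
p_x2 = p.sum(axis=0)             # marginal pmf of X2

P1_given_2 = p / p_x2            # column j is the pmf of X1 given X2 = x2[j]
P2_given_1 = p / p_x1[:, None]   # row i is the pmf of X2 given X1 = x1[i]

print(P1_given_2.sum(axis=0))    # each column sums to 1
print(P2_given_1.sum(axis=1))    # each row sums to 1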
If the joint distribution is continuous, then the conditional distributions of the random variables X1 and X2 are also continuous and are described by the conditional probability density functions
p1|2(x1 | x2) = pX1,X2(x1, x2) / pX2(x2),   p2|1(x2 | x1) = pX1,X2(x1, x2) / pX1(x1).     (20.2.5.12)
The conditional distributions of the random variables X1 and X2 can also be described by the conditional cumulative distribution functions
FX2(x2 | X1 = x1) = P(X2 < x2 | X1 = x1),
FX1(x1 | X2 = x2) = P(X1 < x1 | X2 = x2).     (20.2.5.13)
The total probability formulas for the cumulative distribution functions of continuous random variables have the form
FX2(x2) = ∫_{–∞}^{+∞} FX2(x2 | X1 = x1) pX1(x1) dx1,
FX1(x1) = ∫_{–∞}^{+∞} FX1(x1 | X2 = x2) pX2(x2) dx2.     (20.2.5.14)
THEOREM ON MULTIPLICATION OF DENSITIES. The joint probability density function of two random variables is equal to the product of the probability density function of one random variable by the conditional probability density function of the other random variable, given the value of the first random variable:
pX1,X2(x1, x2) = pX2(x2) p1|2(x1 | x2) = pX1(x1) p2|1(x2 | x1).     (20.2.5.15)
Bayes’ formulas:
P1|2(x1i | x2j) = P(X1 = x1i) P2|1(x2j | x1i) / Σ_i P(X1 = x1i) P2|1(x2j | x1i),
P2|1(x2j | x1i) = P(X2 = x2j) P1|2(x1i | x2j) / Σ_j P(X2 = x2j) P1|2(x1i | x2j);     (20.2.5.16)
p1|2(x1 | x2) = pX1(x1) p2|1(x2 | x1) / ∫_{–∞}^{+∞} pX1(x1) p2|1(x2 | x1) dx1,
p2|1(x2 | x1) = pX2(x2) p1|2(x1 | x2) / ∫_{–∞}^{+∞} pX2(x2) p1|2(x1 | x2) dx2.     (20.2.5.17)
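The first of the discrete Bayes formulas (20.2.5.16) is illustrated below with a minimal sketch in Python with NumPy; the prior pmf of X1 and the conditional pmf of X2 given X1 are hypothetical numbers:

import numpy as np

prior = np.array([0.3, 0.7])            # P(X1 = x1_i)
P2_given_1 = np.array([[0.9, 0.1],      # P(X2 = x2_j | X1 = x1_0)
                       [0.2, 0.8]])     # P(X2 = x2_j | X1 = x1_1)

# Posterior pmf of X1 given the observed value X2 = x2_j (first formula in (20.2.5.16)).
j = 0
posterior = prior * P2_given_1[:, j]
posterior /= posterior.sum()            # denominator: sum_i P(X1 = x1_i) P2|1(x2_j | x1_i)
print(posterior)                        # [0.6585..., 0.3414...]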
20.2.5-7 Conditional expectation. Regression.
The conditional expectation of a discrete random variable X2, given X1 = x1 (where x1 is a possible value of the random variable X1), is defined to be the sum of the products of the possible values of X2 by their conditional probabilities,
E{X2 | X1 = x1} = Σ_j x2j P2|1(x2j | x1).     (20.2.5.18)
For continuous random variables,
E{X2 | X1 = x1} = ∫_{–∞}^{+∞} x2 p2|1(x2 | x1) dx2.     (20.2.5.19)
Properties of the conditional expectation:
1. If random variables X and Y are independent, then their conditional expectations coincide with the unconditional expectations; i.e., E{Y | X = x} = E{Y} and E{X | Y = y} = E{X}.
2. E{f(X)h(Y) | X = x} = f(x) E{h(Y) | X = x}.
3. Additivity of the conditional expectation:
E{Y1 + Y2 | X = x} = E{Y1 | X = x} + E{Y2 | X = x}.
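Formula (20.2.5.18) for a discrete pair amounts to a weighted average over each row of the conditional pmf table. A sketch in Python with NumPy for a hypothetical joint pmf; the last line checks the identity E{E{X2 | X1}} = E{X2} (the law of total expectation, a standard fact not listed above):

import numpy as np

x2 = np.array([-1.0, 1.0])
p = np.array([[0.10, 0.15],               # hypothetical joint pmf p[i, j]
              [0.20, 0.25],
              [0.05, 0.25]])

P2_given_1 = p / p.sum(axis=1, keepdims=True)   # conditional pmf of X2 given X1 = x1[i]
cond_exp = P2_given_1 @ x2                      # E{X2 | X1 = x1[i]} by (20.2.5.18)
print(cond_exp)

p_x1 = p.sum(axis=1)
print(np.sum(cond_exp * p_x1), np.sum(p * x2))  # both equal E{X2}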
A function g2(X1) is called the best mean-square approximation to a random variable X2 if the expectation E{[X2 – g2(X1)]^2} takes the least possible value; the function g2(x1) is called the mean-square regression of X2 on X1.
The conditional expectation E{X2 | X1} is a function of X1,
E{X2 | X1} = g2(X1).     (20.2.5.20)
It is called the regression function of X2 on X1 and is the mean-square regression of X2 on X1.
In the majority of cases, it suffices to approximate the regression (20.2.5.20) by the linear function
ĝ2(X1) = α + β21X1 = E{X2} + β21(X1 – E{X1}).
Here the coefficient β21 = ρ12 σX2/σX1 is called the regression coefficient of X2 on X1 (ρ12 = ρ(X1, X2)). The number σX2^2 (1 – ρ12^2) is called the residual variance of the random variable X2 with respect to the random variable X1; this number characterizes the error arising if X2 is replaced by the linear function ĝ2(X1) = α + β21X1.
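For sample data, β21 and the residual variance can be estimated from the sample standard deviations and the sample correlation. A Monte Carlo sketch in Python with NumPy on a hypothetical linearly related pair (the model X2 = 2 X1 + noise is an assumption made only for illustration):

import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=10_000)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=10_000)   # hypothetical model

s1, s2 = np.std(x1), np.std(x2)
rho12 = np.corrcoef(x1, x2)[0, 1]

beta21 = rho12 * s2 / s1                 # regression coefficient of X2 on X1
resid_var = s2**2 * (1.0 - rho12**2)     # residual variance of X2 with respect to X1
print(beta21, resid_var)                 # roughly 2.0 and 0.25 for this model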
Remark 1. The regression (20.2.5.20) can be approximated more precisely by a polynomial of degree k > 1 (parabolic regression of order k) or by other nonlinear functions (exponential regression, logarithmic regression, etc.).
Remark 2. If X2 is taken for the independent variable, then we obtain the mean-square regression
E{X1 | X2} = g1(X2)
of X1 on X2 and the linear regression
ĝ1(X2) = E{X1} + β12(X2 – E{X2}),   β12 = ρ12 σX1/σX2,
of X1 on X2.
Remark 3. All regression lines pass through the point (E{X1}, E{X2}).
20.2.5-8 Distribution function of a multivariate random variable.
The probability P(X1 < x1, ..., Xn < xn), treated as a function of a point x = (x1, ..., xn) of the n-dimensional space and denoted by
FX(x) = F(x) = P(X1 < x1, ..., Xn < xn),     (20.2.5.21)
is called the multiple (or joint) distribution function of the n-dimensional random vector X = (X1, ..., Xn).
Properties of the joint distribution function of a random vector X:
1. F(x) is a nondecreasing function of each of the arguments.
2. If at least one of the arguments x1, ..., xn is equal to –∞, then the joint distribution function is equal to zero.
3. The m-dimensional distribution function of the subsystem of m < n random variables X1, ..., Xm can be determined by setting the arguments corresponding to the remaining random variables Xm+1, ..., Xn equal to +∞,
FX1,...,Xm(x1, ..., xm) = FX(x1, ..., xm, +∞, ..., +∞).
(The m-dimensional distribution function FX1,...,Xm(x1, ..., xm) is usually called the marginal distribution function.)
4. The function FX(x) is left continuous in each of the arguments.
An n-dimensional random variable X is said to be discrete if each of the random variables X1, X2, ..., Xn is discrete. The distribution of a subsystem X1, ..., Xm of random variables and the conditional distributions are defined as in Paragraphs 20.2.5-6 and 20.2.5-7.
An n-dimensional random variable X is said to be continuous if its distribution function F(x) can be written in the form
F(x) = ∫_{–∞}^{x1} ... ∫_{–∞}^{xn} p(y) dy,     (20.2.5.22)
where dy = dy1 ... dyn and the function p(x), called the multiple (or joint) probability function of the random variables X1, ..., Xn, is piecewise continuous. The joint probability function can be expressed via the joint distribution function by the formula
p(x) = ∂^n FX(x) / (∂x1 ... ∂xn);     (20.2.5.23)
i.e., the joint probability function is the nth mixed partial derivative (one differentiation in each of the arguments) of the joint distribution function.
Formulas (20.2.5.22) and (20.2.5.23) establish a one-to-one correspondence (up to sets of probability zero) between the joint probability functions and the joint distribution functions of continuous multivariate random variables. The differential p(x) dx is called a probability element. The joint probability function of n random variables X1, X2, ..., Xn has the same properties as the joint probability function of two random variables X1 and X2 (see Paragraph 20.2.1-4). The marginal probability functions and conditional probability functions obtained from a continuous n-dimensional probability distribution are defined precisely as in Paragraphs 20.2.1-4 and 20.2.1-8.
Remark 1. The distribution of a system of two or more multivariate random variables X1 = (X11, X12, ...) and X2 = (X21, X22, ...) is the joint distribution of all the variables X11, X12, ...; X21, X22, ...; etc.
Remark 2. A joint distribution can be discrete in some of the random variables and continuous in the others.
20.2.5-9 Numerical characteristics of multivariate random variables.
The expectation of a function g(X) of a multivariate random variable X is defined by the formula
E{g(X)} = Σ_{i1} ... Σ_{in} g(x1i1, ..., xnin) pi1i2...in   in the discrete case,
E{g(X)} = ∫_{–∞}^{+∞} ... ∫_{–∞}^{+∞} g(x) p(x) dx   in the continuous case,
(20.2.5.24)
if these expressions exist in the sense of absolute convergence; otherwise, one says that E{g(X)} does not exist.
The moment of order r1 + ... + rn of a random variable X about a point (a1, ..., an) is defined as the expectation E{(X1 – a1)^{r1} ... (Xn – an)^{rn}}.
For a1 = ... = an = 0, the moment of order r1 + ... + rn of an n-dimensional random variable X is called the initial moment and is denoted by
αr1...rn = E{X1^{r1} ... Xn^{rn}}.
The first initial moments are the expectations of the coordinates X1, ..., Xn. The point (E{X1}, ..., E{Xn}) in the space R^n characterizes the position of the random point (X1, ..., Xn), which spreads about the point (E{X1}, ..., E{Xn}). The first central moments are naturally zero.
If a1 = E{X1}, ..., an = E{Xn}, then the moment of order r1 + ... + rn of the n-dimensional random variable X is called the central moment and is denoted by
μr1...rn = E{(X1 – E{X1})^{r1} ... (Xn – E{Xn})^{rn}}.
The second central moments have the following notation:
λij = λji = E{(Xi – E{Xi})(Xj – E{Xj})} =
    Var{Xi} = σi^2   for i = j,
    Cov(Xi, Xj)      for i ≠ j.     (20.2.5.25)
The moments λij given by relation (20.2.5.25) determine the covariance matrix (matrix of moments) [λij]. Obviously, the covariance matrix is real and symmetric; its determinant det[λij] is called the generalized variance of the n-dimensional distribution. The correlations
ρij = ρ(Xi, Xj) = Cov(Xi, Xj) / (σXi σXj) = λij / √(λii λjj)   (i, j = 1, 2, ..., n)     (20.2.5.26)
determine the correlation matrix [ρij] of the n-dimensional distribution, provided that all the variances Var{Xi} are nonzero. Obviously, the correlation matrix is real and symmetric. The quantity det[ρij] is called the spread coefficient.
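The covariance matrix, the generalized variance, the correlation matrix, and the spread coefficient can all be estimated from a sample. A sketch in Python with NumPy; the three-dimensional normal model and its covariance matrix are hypothetical:

import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[2.0, 0.8, 0.3],
                                 [0.8, 1.0, 0.5],
                                 [0.3, 0.5, 1.5]],
                            size=50_000)          # one observation per row

Lam = np.cov(X, rowvar=False)         # covariance matrix [lambda_ij]
gen_var = np.linalg.det(Lam)          # generalized variance det[lambda_ij]

R = np.corrcoef(X, rowvar=False)      # correlation matrix [rho_ij]
spread = np.linalg.det(R)             # spread coefficient det[rho_ij]
print(gen_var, spread)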
20.2.5-10 Regression.
A function g1(X2, ..., Xn) is called the best mean-square approximation to a random variable X1 if the expectation E{[X1 – g1(X2, ..., Xn)]^2} takes the least possible value. The function g1(x2, ..., xn) is called the mean-square regression of X1 on X2, ..., Xn.
The conditional expectation E{X1 | X2, ..., Xn} is a function of X2, ..., Xn,
E{X1 | X2, ..., Xn} = g1(X2, ..., Xn).     (20.2.5.27)
It is called the regression function of X1 on X2, ..., Xn and is the mean-square regression of X1 on X2, ..., Xn.
In the majority of cases, it suffices to approximate the regression (20.2.5.27) by the linear function
ĝi = E{Xi} + Σ_{j≠i} βij (Xj – E{Xj}).     (20.2.5.28)
Relation (20.2.5.28) determines the linear regression of Xi on the other n – 1 variables. The regression coefficients βij are determined by the relation
βij = –Λij / Λii,
where the Λij are the entries of the inverse of the covariance matrix. The measure of correlation between Xi and the other n – 1 variables is the multiple correlation coefficient
ρ(Xi, ĝi) = √(1 – 1/(λii Λii)).
The residual of Xi with respect to the other n – 1 variables is defined as the random variable Δi = Xi – ĝi. It satisfies the relations
Cov(Δi, Xj) = 0 for j ≠ i,   Cov(Δi, Xi) = Var{Δi}  (the residual variance).
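The regression coefficients βij = –Λij/Λii, the multiple correlation coefficient, and the residual variance can be computed directly from the inverse of the covariance matrix. A sketch in Python with NumPy for a hypothetical 3×3 covariance matrix; the identity Var{Δi} = 1/Λii used in the last line is a standard consequence of the relations above, not a formula stated in the text:

import numpy as np

# Hypothetical covariance matrix [lambda_ij] of (X1, X2, X3).
Lam = np.array([[2.0, 0.8, 0.3],
                [0.8, 1.0, 0.5],
                [0.3, 0.5, 1.5]])
Prec = np.linalg.inv(Lam)             # entries Lambda_ij of the inverse

i = 0                                 # regress X1 on the other variables
beta = -Prec[i] / Prec[i, i]          # beta_ij = -Lambda_ij / Lambda_ii (the i-th entry is not used)

mult_corr = np.sqrt(1.0 - 1.0 / (Lam[i, i] * Prec[i, i]))   # multiple correlation coefficient
resid_var = 1.0 / Prec[i, i]                                # residual variance Var{Delta_i}
print(beta, mult_corr, resid_var)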
20.2.5-11 Characteristic functions.
The characteristic function of a random variable X is defined as the expectation of the random variable exp(i Σ_{j=1}^{n} tj Xj):
fX(t) = f(t) = E{exp(i Σ_{j=1}^{n} tj Xj)},
where t = (t1, ..., tn) and i is the imaginary unit, i^2 = –1.
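The defining expectation can be estimated by a sample average. A Monte Carlo sketch in Python with NumPy for a hypothetical standard normal vector, compared with its known characteristic function exp(–|t|^2/2):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100_000, 2))      # sample of a bivariate standard normal vector

def char_fun(t):
    # f_X(t) = E{exp(i * sum_j t_j X_j)}, estimated by a sample average
    return np.mean(np.exp(1j * X @ np.asarray(t)))

t = np.array([0.5, -1.0])
print(char_fun(t))                     # approximately exp(-|t|^2 / 2) plus small sampling error
print(np.exp(-0.5 * np.dot(t, t)))     # exact value for the standard normal vector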