CHAPTER 5
Random vectors and their distributions
The probability model formulated in the previous chapter was in the form of a parametric family of densities associated with a random variable (r.v.) X:

$\Phi = \{f(x; \theta), \ \theta \in \Theta\}.$

In practice, however, there are many observable phenomena where the outcome comes in the form of several quantitative attributes. For example, data on personal income might be related to number of children, social class, type of occupation, age class, etc. In order to be able to model such real phenomena we need to extend the above framework for a single r.v. to one for multidimensional r.v.'s or random vectors, that is,
$X = (X_1, X_2, \ldots, X_n),$
where each $X_i$, $i = 1, 2, \ldots, n$, measures a particular quantifiable attribute of the random experiment's ($\mathcal{E}$) outcomes.
For expositional purposes we shall restrict attention to the two-dimensional (bivariate) case, which is quite adequate for a proper understanding of the concepts involved, giving only scanty references to the n-dimensional random vector case (just for notational purposes). In the next section we consider the concept of a random vector and its joint distribution and density functions in direct analogy to the random variable case. In Sections 5.3 and 5.4 we consider two very important forms of the joint density function, the marginal and conditional densities respectively. These forms of the joint density function will play a very important role in Part IV.
5.1 Joint distribution and density functions
Consider the random experiment $\mathcal{E}$ of tossing a fair coin twice. The sample
space takes the form $S = \{(HT), (TH), (HH), (TT)\}$. Define the function
$X_1(\cdot)$ to be the number of 'heads' and $X_2(\cdot)$ to be the number of 'tails'. Both of these functions map $S$ into the real line $\mathbb{R}$ in the form
$(X_1(HT), X_2(HT)) = (1, 1),$
$(X_1(TH), X_2(TH)) = (1, 1),$
$(X_1(HH), X_2(HH)) = (2, 0),$
$(X_1(TT), X_2(TT)) = (0, 2).$
This is shown in Fig. 5.1. The function $(X_1(\cdot), X_2(\cdot)): S \to \mathbb{R}^2$ is a two-dimensional vector function which assigns to each element $s$ of $S$ the pair of ordered numbers $(x_1, x_2)$, where $x_1 = X_1(s)$, $x_2 = X_2(s)$. As in the one-dimensional case, for the vector function to define a random vector it has to satisfy certain conditions which ensure that the probabilistic and event structure of $(S, \mathcal{F}, P(\cdot))$ is preserved. In direct analogy with the single variable case we say that the mapping
$X(\cdot) = (X_1(\cdot), X_2(\cdot)): S \to \mathbb{R}^2 \qquad (5.1)$

defines a random vector if for each event in the Borel field product $\mathcal{B} \times \mathcal{B} = \mathcal{B}^2$, say $B = B_1 \times B_2$, the event defined by

$X^{-1}(B) = \{s: X_1(s) \in B_1,\ X_2(s) \in B_2,\ s \in S\} \qquad (5.2)$

belongs to $\mathcal{F}$.
Fig. 5.1 A bivariate random vector $X(\cdot) = (X_1(\cdot), X_2(\cdot))$.
Extending the result that $\mathcal{B}$ can be profitably seen as being the $\sigma$-field generated by half-closed intervals of the form $(-\infty, x]$ to the case of the direct product $\mathcal{B} \times \mathcal{B}$, we can show that the random vector $X(\cdot)$ satisfying

$X^{-1}((-\infty, \mathbf{x}]) \in \mathcal{F}$ for all $\mathbf{x} \in \mathbb{R}^2$

implies

$X^{-1}(B) \in \mathcal{F}$ for all $B \in \mathcal{B}^2. \qquad (5.3)$
This allows us to define a random vector as follows:
Definition 1
A random vector $X(\cdot): S \to \mathbb{R}^2$ is a vector function such that for any two real numbers $(x_1, x_2) = \mathbf{x}$, the event

$X^{-1}((-\infty, \mathbf{x}]) = \{s: -\infty < X_1(s) \le x_1,\ -\infty < X_2(s) \le x_2,\ s \in S\} \in \mathcal{F}.$
Note. $(-\infty, \mathbf{x}] = (-\infty, x_1] \times (-\infty, x_2]$ represents an infinite rectangle (see Fig. 5.2). The random vector (as in the case of a single random variable) induces a probability space $(\mathbb{R}^2, \mathcal{B}^2, P_X(\cdot))$, where $\mathcal{B}^2$ are the Borel subsets of the plane and $P_X(\cdot)$ is a probability set function defined over events in $\mathcal{B}^2$, in a way which preserves the probability structure of the original probability
Fig. 5.2 The infinite rectangle $(-\infty, \mathbf{x}^*]$, $\mathbf{x}^* = (x_1^*, x_2^*)$.
space $(S, \mathcal{F}, P(\cdot))$. This is achieved by attributing to each $B \in \mathcal{B}^2$ the probability

$P_X(B) = P(X^{-1}(B)) = P(\{s: X(s) \in B\}), \quad B \in \mathcal{B}^2.$

This enables us to reduce $P_X(\cdot)$ to a point function $F(x_1, x_2)$, which we call the joint (cumulative) distribution function.
Definition 2
Let $X = (X_1, X_2)$ be a random vector defined on $(S, \mathcal{F}, P(\cdot))$. The function $F(\cdot, \cdot): \mathbb{R}^2 \to [0, 1]$ defined by

$F(\mathbf{x}) = F(x_1, x_2) = P_X((-\infty, \mathbf{x}]) = P(X_1 \le x_1, X_2 \le x_2) = \Pr(X \le \mathbf{x})$

is said to be the joint distribution function of $X$.
In the coin-tossing example above, the random vector $X(\cdot)$ takes the values $(1, 1)$, $(2, 0)$, $(0, 2)$ with probabilities $\frac{1}{2}$, $\frac{1}{4}$ and $\frac{1}{4}$ respectively. In order to derive the joint distribution function (DF) we have to define all the events of the form $\{s: X_1(s) \le x_1,\ X_2(s) \le x_2,\ s \in S\}$ for all $(x_1, x_2) \in \mathbb{R}^2$:
$\{s: X_1(s) \le x_1,\ X_2(s) \le x_2,\ s \in S\} = \begin{cases} \emptyset, & x_1 < 0 \text{ or } x_2 < 0, \\ \{(HT),(TH)\}, & 0 \le x_1 < 2,\ 0 \le x_2 < 2, \\ \{(HT),(TH),(HH)\}, & x_1 \ge 2,\ 0 \le x_2 < 2, \\ \{(HT),(TH),(TT)\}, & 0 \le x_1 < 2,\ x_2 \ge 2, \\ S, & x_1 \ge 2,\ x_2 \ge 2. \end{cases}$
Note the degree of arbitrariness in choosing the infinite rectangles $(-\infty, \mathbf{x}]$. The joint DF of $X_1$ and $X_2$ is given by
$F(x_1, x_2) = \begin{cases} 0, & x_1 < 0 \text{ or } x_2 < 0, \\ \frac{1}{2}, & 0 \le x_1 < 2,\ 0 \le x_2 < 2, \\ \frac{3}{4}, & x_1 \ge 2,\ 0 \le x_2 < 2, \\ \frac{3}{4}, & 0 \le x_1 < 2,\ x_2 \ge 2, \\ 1, & x_1 \ge 2,\ x_2 \ge 2. \end{cases}$
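To make the derivation concrete, here is a minimal Python sketch (assuming nothing beyond the fair-coin setup above) which enumerates the sample space, builds the joint density of $(X_1, X_2)$, and evaluates the joint DF by direct summation:

```python
from itertools import product
from fractions import Fraction

# Sample space of two fair coin tosses; each outcome has probability 1/4.
S = list(product("HT", repeat=2))
P = Fraction(1, len(S))

def X(s):
    # The random vector X(s) = (X1(s), X2(s)) = (no. of heads, no. of tails).
    return (s.count("H"), s.count("T"))

# Joint density: f(x1, x2) = P(X1 = x1, X2 = x2).
f = {}
for s in S:
    f[X(s)] = f.get(X(s), Fraction(0)) + P

def F(x1, x2):
    # Joint DF: F(x1, x2) = P(X1 <= x1, X2 <= x2).
    return sum(p for (a, b), p in f.items() if a <= x1 and b <= x2)

print(f[(1, 1)], f[(2, 0)], f[(0, 2)])   # 1/2 1/4 1/4
print(F(1, 1), F(2, 1), F(2, 2))         # 1/2 3/4 1
```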
Table 5.1 Joint density function of $(X_1, X_2)$

             x1 = 0    x1 = 1    x1 = 2
    x2 = 0      0         0        1/4
    x2 = 1      0        1/2        0
    x2 = 2     1/4        0         0
From the definition of the joint DF we can deduce that $F(x_1, x_2)$ is a monotone non-decreasing function in each variable separately, and

$\lim_{x_1 \to -\infty,\, x_2 \to -\infty} F(x_1, x_2) = 0, \qquad \lim_{x_1 \to \infty,\, x_2 \to \infty} F(x_1, x_2) = 1.$
As in the case of one r.v., we concentrate exclusively on discrete and continuous joint DFs; singular distributions are not considered.
Definition 3
The joint DF of $X_1$ and $X_2$ is called a discrete distribution if there exists a density function $f(\cdot, \cdot)$ such that

$f(x_1, x_2) \ge 0, \quad (x_1, x_2) \in \mathbb{R}^2, \qquad (5.10)$

and it takes the value zero everywhere except at a finite or countably infinite set of points in the plane, with

$\sum_{x_1} \sum_{x_2} f(x_1, x_2) = 1. \qquad (5.11)$
In the coin-tossing example the density function is represented in rectangular array form in Table 5.1. Fig. 5.3 represents the graph of the joint density function of $X = (X_1, X_2)$. The joint DF is obtained from $f(x_1, x_2)$ via the relation

$F(x_1, x_2) = \sum_{u \le x_1} \sum_{v \le x_2} f(u, v). \qquad (5.12)$
Definition 4
The joint DF of $X_1$ and $X_2$ is called (absolutely) continuous if there exists a non-negative function $f(x_1, x_2)$ such that

$F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(u, v)\, \mathrm{d}v\, \mathrm{d}u. \qquad (5.13)$
Fig. 5.3 The bivariate density function of Table 5.1.
$f(x_1, x_2)$ is called the joint (probability) density function of $X_1, X_2$. This definition implies the following properties for $f(x_1, x_2)$:

(F1) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2)\, \mathrm{d}x_1\, \mathrm{d}x_2 = 1; \qquad (5.14)$

(F2) $\dfrac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2} = f(x_1, x_2) \qquad (5.15)$

if $f(\cdot, \cdot)$ is continuous at $(x_1, x_2)$.
5.2 Some bivariate distributions
(1) Bivariate normal distribution
$f(x_1, x_2) = \dfrac{1}{2\pi \sigma_1 \sigma_2 (1 - \rho^2)^{1/2}} \exp\left\{ -\dfrac{1}{2(1 - \rho^2)} \left[ \left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)\left(\dfrac{x_2 - \mu_2}{\sigma_2}\right) + \left(\dfrac{x_2 - \mu_2}{\sigma_2}\right)^2 \right] \right\},$

$\theta = (\mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho) \in \mathbb{R}^2 \times \mathbb{R}_+^2 \times [-1, 1].$
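As an illustration (a sketch only, with arbitrary parameter values), the density above can be coded directly and checked against scipy's multivariate_normal, whose covariance matrix has $\sigma_1^2$, $\sigma_2^2$ on the diagonal and $\rho\sigma_1\sigma_2$ off it:

```python
import numpy as np
from scipy.stats import multivariate_normal

def biv_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    # Direct transcription of the bivariate normal density above.
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = z1**2 - 2 * rho * z1 * z2 + z2**2
    return np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

mu1, mu2, s1, s2, rho = 0.0, 0.0, 1.0, 1.0, 0.5   # illustrative values
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
rv = multivariate_normal(mean=[mu1, mu2], cov=cov)
print(biv_normal_pdf(0.3, -0.7, mu1, mu2, s1, s2, rho))  # agrees with ...
print(rv.pdf([0.3, -0.7]))                               # ... the library value
```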
Fig. 5.4 The density function of a standard bivariate normal distribution.
It is interesting to note that the expression inside the square brackets, when set equal to a constant $c$,

$\left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho \left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)\left(\dfrac{x_2 - \mu_2}{\sigma_2}\right) + \left(\dfrac{x_2 - \mu_2}{\sigma_2}\right)^2 = c,$

defines a sequence of ellipses of points with equal probability, which can be viewed as map-like contours of the graph of $f(x_1, x_2)$ represented in Fig. 5.4.
(2) Bivariate Pareto distribution
$f(x_1, x_2) = \lambda(\lambda + 1)(a_1 a_2)^{\lambda + 1} (a_2 x_1 + a_1 x_2 - a_1 a_2)^{-(\lambda + 2)}, \qquad (5.19)$

$(\lambda > 0,\ x_1 > a_1 > 0,\ x_2 > a_2 > 0),$

$\theta = (\lambda, a_1, a_2).$
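A quick numerical sanity check, assuming the form of (5.19) as reconstructed above and illustrative parameter values: the density should integrate to one over its support $x_1 > a_1$, $x_2 > a_2$.

```python
import numpy as np
from scipy.integrate import dblquad

lam, a1, a2 = 2.0, 1.0, 1.0   # illustrative parameter values

def pareto_pdf(x1, x2):
    # Bivariate Pareto density (5.19) as given above.
    return lam * (lam + 1) * (a1 * a2) ** (lam + 1) \
        * (a2 * x1 + a1 * x2 - a1 * a2) ** (-(lam + 2))

# dblquad integrates func(y, x): inner variable x2, outer variable x1.
total, _ = dblquad(lambda x2, x1: pareto_pdf(x1, x2), a1, np.inf, a2, np.inf)
print(total)   # approximately 1.0
```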
(3) Bivariate binomial distribution
$f(x_1, x_2) = \dfrac{n!}{x_1!\, x_2!}\, p_1^{x_1} p_2^{x_2}, \quad x_1 + x_2 = n, \quad p_1 + p_2 = 1. \qquad (5.20)$
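Summing (5.20) over the admissible pairs gives $(p_1 + p_2)^n = 1$, which the following short check confirms for illustrative $n$ and $(p_1, p_2)$:

```python
from math import factorial

n, p1, p2 = 5, 0.3, 0.7   # illustrative values with p1 + p2 = 1

def f(x1, x2):
    # Bivariate binomial density (5.20), defined on pairs with x1 + x2 = n.
    return factorial(n) / (factorial(x1) * factorial(x2)) * p1**x1 * p2**x2

print(sum(f(x1, n - x1) for x1 in range(n + 1)))   # 1.0
```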
The extension of the concept of a random variable $X$ to that of a random vector $X = (X_1, X_2, \ldots, X_n)$ enables us to generalise the probability model
to that of a parametric family of joint density functions

$\Phi = \{f(x_1, x_2, \ldots, x_n; \theta), \ \theta \in \Theta\}. \qquad (5.21)$
This is a very important generalisation since in most applied disciplines, including econometrics, the real phenomena to be modelled are usually multidimensional in the sense that there is more than one quantifiable feature to be considered.
5.3 Marginal distributions
Let $X = (X_1, X_2)$ be a bivariate random vector defined on $(S, \mathcal{F}, P(\cdot))$ with a joint distribution function $F(x_1, x_2)$. The question which naturally arises is whether we could separate $X_1$ and $X_2$ and consider them as individual random variables. The answer to this question leads us to the concept of a marginal distribution. The marginal distribution functions of $X_1$ and $X_2$ are defined by

$F_1(x_1) = \lim_{x_2 \to \infty} F(x_1, x_2), \qquad F_2(x_2) = \lim_{x_1 \to \infty} F(x_1, x_2).$
Having separated $X_1$ and $X_2$ we need to see whether they can be considered as single r.v.'s defined on the same probability space. In defining a random vector we imposed the condition that

$X^{-1}((-\infty, \mathbf{x}]) \in \mathcal{F}$ for all $\mathbf{x} \in \mathbb{R}^2.$

In the definition of the marginal distribution function we used the event

$\{s: X_1(s) \le x_1,\ X_2(s) < \infty,\ s \in S\},$

which we know belongs to $\mathcal{F}$. This event, however, can be written as the intersection of two sets of the form

$\{s: X_1(s) \le x_1,\ s \in S\} \cap \{s: X_2(s) < \infty,\ s \in S\},$

but the second set is $S$, i.e. $\{s: X_2(s) < \infty,\ s \in S\} = S$, which implies that

$\{s: X_1(s) \le x_1,\ X_2(s) < \infty,\ s \in S\} = \{s: X_1(s) \le x_1,\ s \in S\}, \qquad (5.28)$

which indeed belongs to $\mathcal{F}$, and this is the condition needed for $X_1$ to be a r.v. with a distribution function $F_1(x_1)$; the same is true for $X_2$. In order to see
this, consider the joint distribution function

$F(x_1, x_2) = 1 - e^{-\theta x_1} - e^{-\theta x_2} + \exp\{-\theta(x_1 + x_2)\}, \quad (x_1, x_2) \in \mathbb{R}_+^2, \qquad (5.29)$

$F_1(x_1) = \lim_{x_2 \to \infty} F(x_1, x_2) = 1 - e^{-\theta x_1}, \quad x_1 \in \mathbb{R}_+,$

since $\lim_{x_2 \to \infty} (e^{-\theta x_2}) = 0$. Similarly,

$F_2(x_2) = 1 - e^{-\theta x_2}, \quad x_2 \in \mathbb{R}_+.$
Note that $F_1(x_1)$ and $F_2(x_2)$ are proper distribution functions.

Given that the probability model has been defined in terms of the joint density functions, it is important to consider the above operation of marginalisation in terms of these density functions. The marginal density functions of $X_1$ and $X_2$ are defined by
$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, \mathrm{d}x_2, \qquad f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\, \mathrm{d}x_1,$
that is, the marginal density of $X_i$ $(i = 1, 2)$ is derived by integrating out $X_j$ $(j \ne i)$ from the joint density. In the discrete case this amounts to summing out with respect to the other variable:

$f_1(x_1) = \sum_{x_2} f(x_1, x_2), \qquad f_2(x_2) = \sum_{x_1} f(x_1, x_2).$
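For instance, here is a short sketch of the discrete case using the coin-tossing density of Table 5.1: the marginals are obtained by summing the joint density over the other variable.

```python
# Joint density of (X1, X2) from Table 5.1.
f = {(1, 1): 0.5, (2, 0): 0.25, (0, 2): 0.25}

f1, f2 = {}, {}
for (x1, x2), p in f.items():
    f1[x1] = f1.get(x1, 0.0) + p   # f1(x1) = sum over x2 of f(x1, x2)
    f2[x2] = f2.get(x2, 0.0) + p   # f2(x2) = sum over x1 of f(x1, x2)

print(f1)   # {1: 0.5, 2: 0.25, 0: 0.25}
print(f2)   # {1: 0.5, 0: 0.25, 2: 0.25}
```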
Example
Consider the working population of the UK classified by income and age as follows:
Income: £2000–4000, £4000–8000, £8000–12 000, £12 000–20 000, £20 000–50 000, over £50 000.

Age: young, middle-aged, senior.

Define the random variables $X_1$, income class, taking values 1–6, and $X_2$, age class, taking values 1–3. Let the joint density be as in Table 5.2:
Table 5.2 Joint density of $(X_1, X_2)$
The marginal density function of $X_1$ is shown in the column representing row totals, and it refers to the probabilities that a randomly selected person will belong to the various income classes. The marginal density of $X_2$ is the row representing column totals, and it refers to the probabilities that a randomly selected person will belong to the various age classes. That is, the marginal distribution of $X_1$ ($X_2$) incorporates no information relating to $X_2$ ($X_1$). Moreover, it is quite obvious that knowing the joint density function of $X_1$ and $X_2$ we can derive their marginal density functions; the reverse, however, is not true in general. Knowledge of $f_1(x_1)$ and $f_2(x_2)$ is enough to derive $f(x_1, x_2)$ only when

$f(x_1, x_2) = f_1(x_1) f_2(x_2) \quad \text{for all } (x_1, x_2) \in \mathbb{R}^2,$

in which case we say that $X_1$ and $X_2$ are independent r.v.'s. Independence in terms of the distribution functions takes the same form:

$F(x_1, x_2) = F_1(x_1) F_2(x_2) \quad \text{for all } (x_1, x_2) \in \mathbb{R}^2.$
In the case of the income–age example it is clear that

$f(x_1, x_2) \ne f_1(x_1) f_2(x_2),$

e.g.

$0.250 \ne (0.275)(0.4),$
and hence $X_1$ and $X_2$ are not independent r.v.'s, i.e. income and age are related in some probabilistic sense; it is more probable to be middle-aged and rich than young and rich!
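Since the body of Table 5.2 is not reproduced above, the following sketch illustrates the independence check $f(x_1, x_2) = f_1(x_1) f_2(x_2)$ on the coin-tossing density of Table 5.1 instead, which is likewise a dependent pair:

```python
f = {(1, 1): 0.5, (2, 0): 0.25, (0, 2): 0.25}   # joint density (Table 5.1)
f1 = {0: 0.25, 1: 0.5, 2: 0.25}                  # marginal of X1
f2 = {0: 0.25, 1: 0.5, 2: 0.25}                  # marginal of X2

# X1 and X2 are independent iff f(x1, x2) = f1(x1) * f2(x2) for every pair.
independent = all(abs(f.get((x1, x2), 0.0) - f1[x1] * f2[x2]) < 1e-12
                  for x1 in f1 for x2 in f2)
print(independent)   # False: f(1, 1) = 0.5 while f1(1) * f2(1) = 0.25
```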
In the continuous r.v.'s example we can easily verify that

$F_1(x_1) F_2(x_2) = (1 - e^{-\theta x_1})(1 - e^{-\theta x_2}) = F(x_1, x_2), \qquad (5.34)$

and thus $X_1$ and $X_2$ are indeed independent.
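The factorisation in (5.34) is easy to verify symbolically; a minimal sympy sketch:

```python
import sympy as sp

x1, x2, theta = sp.symbols('x1 x2 theta', positive=True)

# Joint DF (5.29) and its marginals.
F = 1 - sp.exp(-theta * x1) - sp.exp(-theta * x2) + sp.exp(-theta * (x1 + x2))
F1 = 1 - sp.exp(-theta * x1)
F2 = 1 - sp.exp(-theta * x2)

print(sp.simplify(F - F1 * F2))   # 0, confirming F = F1 * F2
```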
Note that two events, $A_1$ and $A_2$, in the context of the probability space $(S, \mathcal{F}, P(\cdot))$ are said to be independent (see Section 3.3) if

$P(A_1 \cap A_2) = P(A_1) \cdot P(A_2).$
It must be stressed that marginal density functions are proper density functions satisfying all the properties of such functions. In the income–age example it can be seen that $f_1(x_{1i}) \ge 0$, $f_2(x_{2j}) \ge 0$, and $\sum_i f_1(x_{1i}) = 1$ and $\sum_j f_2(x_{2j}) = 1$.
Because of its importance in what follows let us consider the marginal density functions in the case of the bivariate normal density:
$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\, \mathrm{d}x_2 = \dfrac{1}{\sigma_1 \sqrt{2\pi}} \exp\left\{ -\dfrac{(x_1 - \mu_1)^2}{2\sigma_1^2} \right\} \int_{-\infty}^{\infty} \dfrac{1}{\sigma_2 \sqrt{2\pi(1 - \rho^2)}} \exp\left\{ -\dfrac{\left[x_2 - \mu_2 - \rho \frac{\sigma_2}{\sigma_1}(x_1 - \mu_1)\right]^2}{2\sigma_2^2 (1 - \rho^2)} \right\} \mathrm{d}x_2 = \dfrac{1}{\sigma_1 \sqrt{2\pi}} \exp\left\{ -\dfrac{(x_1 - \mu_1)^2}{2\sigma_1^2} \right\},$

since the integral equals one, the integrand being a proper conditional density function (see Section 5.4 below).
Similarly, we can show that

$f_2(x_2) = \dfrac{1}{\sigma_2 \sqrt{2\pi}} \exp\left\{ -\dfrac{(x_2 - \mu_2)^2}{2\sigma_2^2} \right\}.$

Hence, the marginal density functions of jointly normal r.v.'s are univariate normal.
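A numerical illustration of this result (a sketch with arbitrary parameter values): integrating the bivariate normal density over $x_2$ should reproduce the univariate $N(\mu_1, \sigma_1^2)$ density at any point $x_1$.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.integrate import quad

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 1.5, 0.6   # illustrative values
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
joint = multivariate_normal(mean=[mu1, mu2], cov=cov)

x1 = 0.7
marginal_at_x1, _ = quad(lambda x2: joint.pdf([x1, x2]), -np.inf, np.inf)
print(marginal_at_x1)            # integral of the joint density over x2
print(norm.pdf(x1, mu1, s1))     # univariate N(mu1, s1^2) density: same value
```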
In conclusion we observe that marginalisation provides us with ways to simplify a probability model, when such a model is defined in terms of joint density functions, by 'taking out' any unwanted random variables. In general, the marginal density of the r.v.'s of interest $X_1, X_2, \ldots, X_k$ can be derived from the joint density function of $X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_n$ via

$f(x_1, x_2, \ldots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, \mathrm{d}x_{k+1} \cdots \mathrm{d}x_n.$
In the income–age example, if age is not relevant in our investigation, we can simplify the probability model by marginalising out $X_2$, as sketched below.
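In the discrete case the general formula amounts to summing a multidimensional array over the axes of the unwanted variables; a small sketch with a hypothetical 2x2x2 joint density:

```python
import numpy as np

# Hypothetical joint density of (X1, X2, X3) as a 2x2x2 array summing to 1.
joint = np.array([[[0.10, 0.05], [0.15, 0.10]],
                  [[0.20, 0.10], [0.05, 0.25]]])

f12 = joint.sum(axis=2)   # marginalise out X3, leaving the density of (X1, X2)
print(f12)
print(f12.sum())          # 1.0: the marginal is a proper density
```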
5.4 Conditional distributions
In the previous section we considered the question of simplifying probability models of the form (5.21) by marginalising out some subset of the r.v.'s $X_1, X_2, \ldots, X_n$. This amounts to 'throwing away' the information related to the r.v.'s integrated out, as being irrelevant. In this section we consider the question of simplifying $\Phi$ by conditioning with respect to some subset of the r.v.'s.
In the context of the probability space $(S, \mathcal{F}, P(\cdot))$ the conditional probability of event $A_1$ given event $A_2$ is defined by (see Section 3.3):

$P(A_1 \mid A_2) = \dfrac{P(A_1 \cap A_2)}{P(A_2)}, \quad P(A_2) > 0. \qquad (5.41)$
By choosing $A_1 = \{s: X_1(s) \le x_1\}$ we could use the above formula to derive an analogous definition in terms of distribution functions, that is,

$F(x_1 \mid A_2) = P(X_1 \le x_1 \mid A_2), \qquad (5.42)$

where

$P(X_1 \le x_1 \mid A_2) = \dfrac{P(\{s: X_1(s) \le x_1\} \cap A_2)}{P(A_2)}. \qquad (5.43)$
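In the discrete case (5.43) is unproblematic; a sketch using the coin-tossing density of Table 5.1, with $A_2 = \{X_2 = x_2\}$:

```python
f = {(1, 1): 0.5, (2, 0): 0.25, (0, 2): 0.25}   # joint density (Table 5.1)

def F_cond(x1, x2):
    # F(x1 | X2 = x2) = P(X1 <= x1, X2 = x2) / P(X2 = x2), as in (5.43).
    num = sum(p for (a, b), p in f.items() if a <= x1 and b == x2)
    den = sum(p for (a, b), p in f.items() if b == x2)
    return num / den

print(F_cond(1, 1))   # 1.0: given exactly one tail, X1 <= 1 with certainty
print(F_cond(1, 0))   # 0.0: given no tails, X1 must equal 2
```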
As far as event $A_2$ is concerned, there are two related forms we are particularly interested in: $A_2 = \{X_2 = x_2\}$, where $x_2$ is a specific value taken by $X_2$, and $A_2 = \sigma(X_2)$, i.e. the $\sigma$-field generated by $X_2$. In the case where $A_2 = \sigma(X_2)$, there are no particular problems arising in the definition of the conditional distribution function, since $\sigma(X_2) \subset \mathcal{F}$, although it is not particularly clear what form $F(x_1 \mid \sigma(X_2))$ will take. In the case where $A_2 = \{s: X_2(s) = x_2\}$, however, it is immediately obvious that since $P(s: X_2(s) = x_2) = 0$ when $X_2$ is a continuous r.v., there will