CHAPTER 6
Functions of random variables
One of the most important problems in probability theory and statistical
inference is to derive the distribution of a function h(X₁, X₂, ..., Xₙ) when
the distribution of the random vector X = (X₁, X₂, ..., Xₙ) is known. This
problem is important for at least two reasons:
(i) it is often the case that in modelling observable phenomena we are
primarily interested in functions of random variables; and
(ii) in statistical inference the quantities of primary interest are
commonly functions of random variables.
It is no exaggeration to say that the whole of statistical inference is based on
our ability to derive the distribution of various functions of r.v.'s. In the first
subsection we are going to consider the distribution of functions of a single
r.v. and then consider the case of functions of random vectors.
6.1 Functions of one random variable
Let X be a r.v. on the probability space (S, ℱ, P(·)). By definition, X(·):
S → ℝ, i.e. X is a real-valued function on S. Suppose that h(·): ℝ → ℝ, where
h is a continuous function with at most a countable number of
discontinuities. More formally, we need h(·) to be a Borel function.

Definition 1
A function h(·): ℝ → ℝ is said to be a Borel function if for any a ∈ ℝ
the set B_a = {x: h(x) ≤ a} is a Borel set, i.e. B_a ∈ ℬ, where ℬ
is the Borel field on ℝ (see Section 3.2).
Requiring that h(·) be a Borel function is a natural condition to impose,
given that we need h(X) to be a random variable itself. We know that X is a
function from S to ℝ, and thus h(X(·)) can be considered a function from S to ℝ;
the Borel condition ensures that the composite function h(X): S → ℝ is indeed a
random variable, i.e. the set A = {s: h(X(s)) ∈ B_a} belongs to ℱ for any
B_a ∈ ℬ (see Fig. 6.1).

[Fig. 6.1 A Borel function of a random variable]

Let us denote the r.v. h(X) by Y; then Y induces a probability set function
P_Y(·) such that P_Y(B_a) = P_X(h⁻¹(B_a)) = P(A), in order to preserve the
probability structure of the original (S, ℱ, P(·)). Note that the reason we
need h(·) to be a Borel function is to preserve the event structure of ℱ.
Having ensured that the function h(·) of the r.v. X is itself a r.v. Y = h(X),
we want to derive the distribution of Y when the distribution of X is known.
Let us consider the discrete case first. When X is a discrete r.v., then
Y = h(X) is again a discrete r.v., and all we need to do is give the set of
values of Y and the corresponding probabilities. Consider the coin-tossing
example where X is the r.v. defined by X = (number of H's) − (number of T's);
then, since S = {HT, TH, HH, TT}, X(HT) = X(TH) = 0, X(HH) = 2, X(TT) = −2,
and the probability function is

x          −2    0    2
P(X = x)   1/4   1/2  1/4

Let Y = X²; then Y takes the values (−2)² = 4, 0² = 0, 2² = 4 with the same
probabilities as X, but since 4 occurs twice we add the probabilities, i.e.

y          0    4
P(Y = y)   1/2  1/2
In general, the distribution function of Y is defined as

F_Y(y) = P(s: Y(s) ≤ y) = P(s: X(s) ∈ h⁻¹((−∞, y])),   (6.1)

where the inverse function h⁻¹(·) need not be unique.
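In the discrete case, (6.1) reduces to summing probabilities over each preimage, as in the coin-tossing example above. A minimal sketch of this operation, ours and purely illustrative, assuming a standard Python environment:

    # Distribution of Y = h(X) for discrete X: sum P(X = x) over the
    # preimage of each value y, i.e. over all x with h(x) = y.
    from collections import defaultdict

    pmf_X = {-2: 0.25, 0: 0.5, 2: 0.25}    # the coin-tossing example

    def pmf_of_function(pmf, h):
        pmf_Y = defaultdict(float)
        for x, p in pmf.items():
            pmf_Y[h(x)] += p               # 4 receives P(X=-2) + P(X=2)
        return dict(pmf_Y)

    print(pmf_of_function(pmf_X, lambda x: x ** 2))   # {4: 0.5, 0: 0.5}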
In the case where X is a continuous r.v., deriving the distribution of Y =
h(X) is not as simple as in the discrete case because, firstly, Y is not always a
continuous r.v. as well and, secondly, the solution to the problem depends
crucially on the nature of h(·). A sufficient condition for Y to be a
continuous r.v. as well is given by the following lemma.
Lemma 1
Let X be a continuous r.v. and Y = h(X), where h(x) is differentiable
for all x ∈ ℝ and either dh(x)/dx > 0 for all x or dh(x)/dx < 0 for all x. Then
the density function of Y is given by

f_Y(y) = f_X(h⁻¹(y)) · |d h⁻¹(y)/dy|   for a < y < b,   (6.2)

where |·| stands for the absolute value and a and b refer to the
smallest and biggest values y can take, respectively.
Example 1
Let X ~ N(μ, σ²) and Y = (X − μ)/σ, which implies that dh(x)/dx = 1/σ >
0 for all x ∈ ℝ, since σ > 0 by definition; h⁻¹(y) = σy + μ and dh⁻¹(y)/dy = σ.
Thus, since

f_Y(y) = f_X(σy + μ) · σ = σ · (1/(σ√(2π))) exp{−(σy + μ − μ)²/(2σ²)} = (1/√(2π)) exp{−½y²},

Y ~ N(0, 1), the standard normal distribution.
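Formula (6.2) can be checked numerically for this example. The following sketch is ours, assuming Python with numpy and scipy available:

    # Check f_Y(y) = f_X(h^{-1}(y)) |dh^{-1}(y)/dy| for Y = (X - mu)/sigma:
    # the right-hand side should equal the N(0,1) density.
    import numpy as np
    from scipy import stats

    mu, sigma = 3.0, 2.0
    y = np.linspace(-4, 4, 17)
    lhs = stats.norm.pdf(y)                                    # N(0,1) density
    rhs = stats.norm.pdf(sigma * y + mu, loc=mu, scale=sigma) * sigma
    print(np.allclose(lhs, rhs))                               # True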
In cases where the conditions of Lemma 1 are not satisfied we need to
derive the distribution from the relationship

F_Y(y) = Pr(h(X) ≤ y) = Pr(X ∈ h⁻¹((−∞, y])).   (6.3)
Example 2
Let X ~ N(0, 1) and Y = X² (see Fig. 6.2). Since dh(x)/dx = 2x, we can see
that h(x) is monotonically increasing for x > 0 and monotonically
decreasing for x < 0, and Lemma 1 is not satisfied. However, for y > 0,

F_Y(y) = Pr(h(X) ≤ y) = Pr(X ∈ h⁻¹((−∞, y]))
       = Pr(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y)

[Fig. 6.2 The function Y = X² where X is normally distributed]

(see Fig. 6.2). In this form we can apply the above lemma, with dh⁻¹(y)/dy =
1/(2√y), for x > 0 and x < 0 separately, to get

f_Y(y) = f_X(√y)·(1/(2√y)) + f_X(−√y)·(1/(2√y))   for y > 0
       = ½(2π)^(−1/2) exp{−½y}·y^(−1/2) + ½(2π)^(−1/2) exp{−½y}·y^(−1/2)
       = (1/(2^(1/2)Γ(½))) y^(−1/2) exp(−½y),   y > 0,

using Γ(½) = √π.
That is, f_Y(y) is the so-called gamma density, where Γ(·) is the gamma function (Γ(n) = ∫₀^∞ v^(n−1) e^(−v) dv). A gamma r.v., denoted by Y ~ G(r, p), has a density of the form f(y) = [p/Γ(r)](py)^(r−1) exp(−py), y > 0. The above distribution is G(½, ½) and is known as the chi-square distribution with one degree of freedom, an important distribution in statistical inference; see Appendix 6.1.
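The derivation can be corroborated by simulation. A sketch, ours, again assuming numpy and scipy:

    # Example 2 numerically: squares of N(0,1) draws should follow the
    # chi-square(1) density derived above.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    y = rng.standard_normal(200_000) ** 2

    grid = np.linspace(0.05, 6, 50)
    derived = (2 * np.pi) ** -0.5 * grid ** -0.5 * np.exp(-grid / 2)
    print(np.allclose(derived, stats.chi2.pdf(grid, df=1)))   # True
    print(stats.kstest(y, stats.chi2(df=1).cdf).pvalue)       # not small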
6.2* Functions of several random variables
As in the case of a single r.v., for a Borel function h(·): ℝⁿ → ℝ and a random
vector X = (X₁, X₂, ..., Xₙ), h(X) is a random variable. Let us consider
certain commonly used functions of random variables, concentrating on the two-variable case for convenience of exposition.
[Fig. 6.3 The function Y = X₁ + X₂]
(1) The distribution of X₁ + X₂
By definition the distribution function of Y = X₁ + X₂ (see Fig. 6.3) is

F_Y(y) = Pr(X₁ + X₂ ≤ y),

so that the density function takes the form

f_Y(y) = ∫_{−∞}^{∞} f(y − x₂, x₂) dx₂,   y ∈ ℝ.   (6.4)

In particular, if X₁ and X₂ are independent, then

f_Y(y) = ∫_{−∞}^{∞} f₁(y − x₂) f₂(x₂) dx₂ = ∫_{−∞}^{∞} f₁(x₁) f₂(y − x₁) dx₁,   (6.5)
by symmetry; this is the convolution formula. Using an analogous argument we can show that for Y = X₁ − X₂,

f_Y(y) = ∫_{−∞}^{∞} f(y + x₂, x₂) dx₂,

and for X₁ and X₂ independent,

f_Y(y) = ∫_{−∞}^{∞} f₁(y + x₂) f₂(x₂) dx₂.
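The convolution formula (6.5) can also be evaluated numerically by discretising the densities on a grid; the following sketch (ours, assuming numpy) anticipates Example 4 below:

    # Numerical convolution: f_Y(y) = integral f1(y - x) f2(x) dx for the
    # sum of two independent U(-1,1) r.v.'s; the result is triangular.
    import numpy as np

    dx = 0.001
    x = np.arange(-1, 1, dx)
    f1 = np.full_like(x, 0.5)            # U(-1,1) density on its support
    f2 = f1.copy()

    f_sum = np.convolve(f1, f2) * dx     # density of X1 + X2 on (-2, 2)
    y = np.arange(f_sum.size) * dx - 2.0
    print(f_sum[np.argmin(np.abs(y))])   # approx. 0.5, the peak at y = 0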
Example 3
Let X₁ ~ N(μ₁, σ₁²), X₂ ~ N(μ₂, σ₂²), X₁ and X₂ being independent r.v.'s. Define
Y = X₁ + X₂; then, from (6.5),

f_Y(y) = ∫_{−∞}^{∞} (1/(2πσ₁σ₂)) exp{−(y − x₂ − μ₁)²/(2σ₁²) − (x₂ − μ₂)²/(2σ₂²)} dx₂
       = (1/√(2π(σ₁² + σ₂²))) exp{−(y − μ₁ − μ₂)²/(2(σ₁² + σ₂²))}.

Hence, Y ~ N(μ₁ + μ₂, σ₁² + σ₂²). In general, if X₁, X₂, ..., Xₙ are independent r.v.'s with Xᵢ ~ N(μᵢ, σᵢ²), then

Y = Σᵢ₌₁ⁿ Xᵢ ~ N(Σᵢ₌₁ⁿ μᵢ, Σᵢ₌₁ⁿ σᵢ²).
Example 4
Let Xᵢ ~ U(−1, 1), i = 1, 2 (uniformly distributed independent r.v.'s), and define Y = X₁ + X₂. Using Fig. 6.3 we can show that

f_Y(y) = (2 − |y|)/4 for |y| ≤ 2,   f_Y(y) = 0 for |y| > 2

(see Fig. 6.4).

[Fig. 6.4 The density function of Y = X₁ + X₂ where X₁ and X₂ are uniformly distributed]

For Xᵢ ~ U(−1, 1), i = 1, 2, 3, and Y = X₁ + X₂ + X₃ we can show that

f_Y(y) = (3 − y²)/8 for 0 ≤ |y| ≤ 1,
f_Y(y) = (3 − |y|)²/16 for 1 ≤ |y| ≤ 3,
f_Y(y) = 0 for |y| ≥ 3.

[Fig. 6.5 The density function of Y = X₁ + X₂ + X₃ where Xᵢ, i = 1, 2, 3, are uniformly distributed]

This density function is shown in Fig. 6.5 and, as can be seen, it is not
only continuous but also differentiable everywhere. The shape of the curve
is very much like the normal density. This reflects a general result: for
Xᵢ ~ U(−1, 1), i = 1, 2, ..., n, independent uniformly distributed r.v.'s,
Y = Σᵢ₌₁ⁿ Xᵢ has a distribution which is closer to a normal distribution the
greater the value of n; a particular case of the central limit theorem (see
Chapter 9).
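This convergence is easy to visualise by simulation. A sketch, ours, assuming numpy and scipy:

    # The standardised sum of n independent U(-1,1) r.v.'s approaches
    # the normal: the Kolmogorov-Smirnov distance shrinks as n grows.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    for n in (1, 2, 3, 12):
        y = rng.uniform(-1, 1, size=(100_000, n)).sum(axis=1)
        z = y / np.sqrt(n / 3.0)          # Var(U(-1,1)) = 1/3
        print(n, stats.kstest(z, "norm").statistic)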
(2) The distribution of X₁/X₂
Consider two r.v.'s X₁ and X₂ and let Y = X₁/X₂. The distribution of Y
takes the form

F_Y(y) = ∫∫_{x₁/x₂ ≤ y} f(x₁, x₂) dx₁ dx₂,   (6.6)

as suggested by Fig. 6.6. Changing variables to u = x₁/x₂,

F_Y(y) = ∫_{−∞}^{∞} ∫_{−∞}^{y} |x₂| f(ux₂, x₂) du dx₂,

so that

f_Y(y) = ∫_{−∞}^{∞} |x₂| f(yx₂, x₂) dx₂,   y ∈ ℝ.   (6.7)

[Fig. 6.6 The function Y = X₁/X₂ for Y < 0 and Y > 0]
In the case where X₁ and X₂ are independent this becomes

f_Y(y) = ∫_{−∞}^{∞} |x₂| f₁(yx₂) f₂(x₂) dx₂,   y ∈ ℝ.   (6.8)
Example 5 (the mathematical manipulations are not important!)
Let X₁ ~ N(0, 1) and X₂ ~ χ²(n), chi-square with n degrees of freedom, X₁ and X₂ being independent. Define Y = X₁/√(X₂/n) and let us derive its distribution. The density function of the denominator Z = √(X₂/n) is given by

f_Z(z) = (2(n/2)^(n/2)/Γ(n/2)) z^(n−1) exp{−nz²/2},   z > 0.

Since f(x₁, z) = f₁(x₁)·f_Z(z), and f_Z takes values only for z > 0, (6.8) implies that

f_Y(y) = ∫₀^∞ z f₁(yz) f_Z(z) dz
       = (Γ[(n + 1)/2]/(√(nπ) Γ(n/2))) (1 + y²/n)^(−(n+1)/2),   y ∈ ℝ.

This is the density of Student's t-distribution with n degrees of freedom.
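A simulation check of this result (a sketch of ours, assuming numpy and scipy):

    # X1 ~ N(0,1), X2 ~ chi-square(n) independent:
    # Y = X1/sqrt(X2/n) should follow Student's t with n d.f.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 5
    x1 = rng.standard_normal(200_000)
    x2 = rng.chisquare(n, 200_000)
    y = x1 / np.sqrt(x2 / n)
    print(stats.kstest(y, stats.t(df=n).cdf).pvalue)   # not small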
Example 6
Let X₁ ~ χ²(n₁) and X₂ ~ χ²(n₂) be two independent r.v.'s and define

Y = (X₁/n₁)/(X₂/n₂) = (n₂/n₁)(X₁/X₂).

Using the same argument as in Example 5 we can deduce that

f_Y(y) = (Γ[(n₁ + n₂)/2]/(Γ(n₁/2)Γ(n₂/2))) (n₁/n₂)^(n₁/2) y^((n₁/2)−1) [1 + (n₁/n₂)y]^(−(n₁+n₂)/2),   y > 0.

This represents the density of Fisher's F-distribution with n₁ and n₂ degrees
of freedom.
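The derived density can be checked against a library implementation; the following comparison is ours, assuming numpy and scipy:

    # The F(n1, n2) density derived in Example 6 versus scipy's.
    import numpy as np
    from scipy import stats
    from scipy.special import gamma

    n1, n2 = 4, 9
    u = np.linspace(0.1, 5, 40)
    derived = (gamma((n1 + n2) / 2) / (gamma(n1 / 2) * gamma(n2 / 2))
               * (n1 / n2) ** (n1 / 2) * u ** (n1 / 2 - 1)
               * (1 + (n1 / n2) * u) ** (-(n1 + n2) / 2))
    print(np.allclose(derived, stats.f.pdf(u, n1, n2)))   # True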
Example 7
Let X₁ ~ N(0, σ₁²), X₂ ~ N(0, σ₂²), X₁ and X₂ independent r.v.'s, and define
Y = X₁/X₂. Using (6.8), the density function of Y takes the form

f_Y(y) = ∫_{−∞}^{∞} |x₂| (1/(2πσ₁σ₂)) exp{−(x₂²/2)(y²/σ₁² + 1/σ₂²)} dx₂
       = (1/(πσ₁σ₂)) ∫₀^∞ x₂ exp{−(x₂²/2)(y²/σ₁² + 1/σ₂²)} dx₂
       = (1/(πσ₁σ₂)) (y²/σ₁² + 1/σ₂²)^(−1)   (substituting u = (x₂²/2)(y²/σ₁² + 1/σ₂²))
       = σ₁σ₂/(π(σ₂²y² + σ₁²)),   y ∈ ℝ.

The density of Y is known as the Cauchy density function (with location 0 and scale σ₁/σ₂).
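Again simulation corroborates the result; a sketch of ours, assuming numpy and scipy:

    # The ratio of two independent centred normals is Cauchy with
    # scale sigma1/sigma2.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    s1, s2 = 2.0, 0.5
    y = rng.normal(0, s1, 200_000) / rng.normal(0, s2, 200_000)
    print(stats.kstest(y, stats.cauchy(scale=s1 / s2).cdf).pvalue)   # not small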
(3) The distribution of Y = min(X₁, X₂)
The distribution function of Y = min(X₁, X₂) for two r.v.'s X₁, X₂ takes the
general form

F_Y(y) = Pr(min(X₁, X₂) ≤ y) = 1 − Pr(min(X₁, X₂) > y) = 1 − Pr(X₁ > y, X₂ > y),

illustrated in Fig. 6.7. In the case where X₁ and X₂ are independent,

F_Y(y) = 1 − [1 − F₁(y)][1 − F₂(y)].

[Fig. 6.7 The function Y = min(X₁, X₂)]
Example 8
Let X₁ and X₂ be independent r.v.'s with Weibull distribution functions Fᵢ(x) = 1 − exp(−θᵢxᵅ), x > 0, i = 1, 2. Then F_Y(y) = 1 − exp(−(θ₁ + θ₂)yᵅ), i.e. Y = min(X₁, X₂) again has the Weibull distribution function.
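A numerical check for the exponential special case (taking α = 1 above, an assumption of ours), assuming numpy and scipy:

    # The minimum of two independent exponentials with rates a and b is
    # exponential with rate a + b, by F_Y = 1 - (1 - F1)(1 - F2).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    a, b = 1.0, 3.0
    y = np.minimum(rng.exponential(1 / a, 100_000),
                   rng.exponential(1 / b, 100_000))
    print(stats.kstest(y, stats.expon(scale=1 / (a + b)).cdf).pvalue)  # not small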
Having considered various simple functions of r.v.'s separately, let us now
consider them together. Let (X₁, X₂, ..., Xₙ) be a random vector with
joint probability density function f(x₁, x₂, ..., xₙ) and define the one-to-one
transformation

y₁ = h₁(x₁, x₂, ..., xₙ)
y₂ = h₂(x₁, x₂, ..., xₙ)
...
yₙ = hₙ(x₁, x₂, ..., xₙ)   (6.10)

whose inverse takes the form hᵢ⁻¹(·) = gᵢ(·), i = 1, 2, ..., n:

x₁ = g₁(y₁, y₂, ..., yₙ)
...
xₙ = gₙ(y₁, y₂, ..., yₙ)   (6.11)

Assume:
(i) hᵢ(·) and gᵢ(·) are continuous;
(ii) the partial derivatives ∂xᵢ/∂yⱼ, i, j = 1, 2, ..., n, exist and are
continuous; and
(iii) the Jacobian of the inverse transformation

J = det(∂(x₁, ..., xₙ)/∂(y₁, ..., yₙ)) ≠ 0.

These assumptions enable us to deduce that

f_Y(y₁, ..., yₙ) = |J| · f_X(g₁(y₁, ..., yₙ), ..., gₙ(y₁, ..., yₙ)).   (6.12)
Example 9
Let Xᵢ ~ N(0, 1), i = 1, 2, be two independent r.v.'s and

Y₁ = h₁(X₁, X₂) = X₁ + X₂,   Y₂ = h₂(X₁, X₂) = X₁/X₂.

Since

x₁ = y₁y₂/(1 + y₂),   x₂ = y₁/(1 + y₂),   J = −y₁/(1 + y₂)²,

this implies that

f(y₁, y₂) = (|y₁|/(2π(1 + y₂)²)) exp{−y₁²(1 + y₂²)/(2(1 + y₂)²)}.
The main drawback of this approach is well demonstrated by the above
example. The method provides us with a way to derive the joint density
function of the Yᵢ's and not the marginal density functions. These can be
derived by integrating out the other variables; for instance,

f_{Y₁}(y₁) = ∫_{−∞}^{∞} f(y₁, y₂) dy₂,

and in the above example the marginals take the form

f_{Y₁}(y₁) = (1/(2√π)) exp{−y₁²/4}   (the N(0, 2) density),

f_{Y₂}(y₂) = 1/(π(1 + y₂²))   (the Cauchy density).

The derivations of these marginal density functions, however, involve some
complicated mathematical manipulations.
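Both marginals can be verified by simulation; a sketch of ours, assuming numpy and scipy:

    # Example 9: Y1 = X1 + X2 should be N(0, 2) and Y2 = X1/X2 standard
    # Cauchy when X1, X2 ~ N(0,1) are independent.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    x1 = rng.standard_normal(200_000)
    x2 = rng.standard_normal(200_000)
    y1, y2 = x1 + x2, x1 / x2
    print(stats.kstest(y1, stats.norm(scale=np.sqrt(2)).cdf).pvalue)  # not small
    print(stats.kstest(y2, stats.cauchy().cdf).pvalue)                # not small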
6.3 Functions of normally distributed random variables, a summary
The above examples on functions of random variables show clearly that
deriving the distribution of h(X₁, ..., Xₙ) when f(x₁, ..., xₙ) is known is
not an easy exercise. Indeed, this is one of the most difficult problems in
probability theory, as argued below. Some of the above results, although
involved (as far as mathematical manipulations are concerned), have been
included because they play a very important role in statistical inference.
Because of their importance, generalisations of these results are
summarised below for reference purposes.
Lemma 6.1
If Xᵢ ~ N(μᵢ, σᵢ²), i = 1, 2, ..., n, are independent r.v.'s, then
(Σᵢ₌₁ⁿ Xᵢ) ~ N(Σᵢ₌₁ⁿ μᵢ, Σᵢ₌₁ⁿ σᵢ²) — normal.

Lemma 6.2
If Xᵢ ~ N(0, 1), i = 1, 2, ..., n, are independent r.v.'s, then (Σᵢ₌₁ⁿ Xᵢ²) ~
χ²(n) — chi-square with n degrees of freedom.

Lemma 6.2*
If Xᵢ ~ N(μᵢ, σ²), i = 1, 2, ..., n, are independent r.v.'s, then
(Σᵢ₌₁ⁿ Xᵢ²/σ²) ~ χ²(n; δ) — non-central chi-square with non-centrality
parameter δ = Σᵢ₌₁ⁿ μᵢ²/σ².

Lemma 6.3
If X₁ ~ N(0, 1), X₂ ~ χ²(n), and X₁, X₂ are independent r.v.'s, then
X₁/[√(X₂/n)] ~ t(n) — Student's t with n degrees of freedom.

Lemma 6.3*
If X₁ ~ N(μ, σ²), X₂ ~ σ²χ²(n), and X₁, X₂ are independent r.v.'s, then
X₁/[√(X₂/n)] ~ t(n; δ) — non-central t with non-centrality
parameter δ = μ/σ.

Lemma 6.4
If X₁ ~ χ²(n₁), X₂ ~ χ²(n₂), and X₁, X₂ are independent r.v.'s, then
(X₁/n₁)/(X₂/n₂) ~ F(n₁, n₂) — Fisher's F with n₁ and n₂ degrees of
freedom.

Lemma 6.4*
If X₁ ~ χ²(n₁; δ), X₂ ~ χ²(n₂), X₁, X₂ being independent r.v.'s, then
(X₁/n₁)/(X₂/n₂) ~ F(n₁, n₂; δ) — non-central F, δ being the non-
centrality parameter.
[Fig. 6.8 The normal and related distributions]
Lemma 6.5
If Xᵢ ~ N(0, 1), i = 1, 2, are two independent r.v.'s, then (X₁/X₂) ~
C(0, 1) — Cauchy distribution.

The relationships among the distributions referred to in these lemmas are
depicted in Fig. 6.8. For a summary of these distributions see Appendix 6.1
below; for a more extensive discussion see the excellent book by Johnson
and Kotz (1970).

Note that if X ~ t(n), then Y = X² ~ F(1, n), and for n = 1, t(1) = C(0, 1).
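These last two relationships are easy to verify numerically through quantiles; a sketch of ours, assuming numpy and scipy:

    # t(n)^2 = F(1, n) and t(1) = Cauchy(0, 1), checked via quantiles.
    import numpy as np
    from scipy import stats

    n = 7
    q = np.linspace(0.55, 0.99, 9)
    # If X ~ t(n), P(X^2 <= y) = 2 F_t(sqrt(y)) - 1, so quantiles satisfy:
    print(np.allclose(stats.t.ppf((1 + q) / 2, n) ** 2,
                      stats.f.ppf(q, 1, n)))                     # True
    print(np.allclose(stats.t.ppf(q, 1), stats.cauchy.ppf(q)))   # True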
In this chapter we considered the distribution of functions of random
variables. Although the mathematical manipulations are in general rather
involved, this is a very important facet of probability theory for two reasons:
(i) It often occurs in practice that the probability model is not defined
in terms of the original r.v.'s but in terms of some functions of these.
(ii) Statistical inference is crucially dependent on the distribution of
functions of random variables. Estimators and test statistics are
functions of r.v.'s of the form h(X₁, X₂, ..., Xₙ) and the distribution
of such functions is the basis of any inference related to the
unknown parameters θ.
From the above discussion it is obvious that determining the distribution
of h(X₁, X₂, ..., Xₙ) is by no means a trivial exercise. It turns out that more
often than not we cannot determine the distribution exactly. Because of the
importance of the problem, however, we are forced to develop
approximations; this is the subject matter of Chapter 10.
It is no exaggeration to say that most of the results derived in the context
of the various statistical models in econometrics, discussed in Part IV,
depend crucially on the results summarised in Section 6.3 above.
Estimation, testing and prediction in the context of these models are based on
the results related to functions of normally distributed random variables,
and the normal, Student's t, Fisher's F and chi-square distributions are
used extensively in Part IV.
Appendix 6.1 — The normal and related distributions
(1) Univariate normal — X ~ N(μ, σ²)

f(x; μ, σ²) = (1/(σ√(2π))) exp{−(x − μ)²/(2σ²)},   x ∈ ℝ.

E(X) = μ, Var(X) = σ², skewness α₃ = 0, kurtosis α₄ = 3.
Higher moments:

E(X − μ)^r = 0 for r odd,   E(X − μ)^r = σ^r r!/(2^(r/2)(r/2)!) for r even.

Characteristic function: φ(t) = exp(iμt − ½σ²t²).
Cumulants: κ₁ = μ, κ₂ = σ², κᵣ = 0, r = 3, 4, ...
Some properties:
(a) Z = [(X − μ)/σ] ~ N(0, 1) — the standard normal distribution.
(b) Reproductive property: if Xᵢ ~ N(μᵢ, σᵢ²), i = 1, 2, ..., n, are independent
r.v.'s, then (Σᵢ₌₁ⁿ Xᵢ) ~ N(Σᵢ₌₁ⁿ μᵢ, Σᵢ₌₁ⁿ σᵢ²).
[Fig. 6.9 The density functions of a central and non-central chi-square]
(2) Chi-square distribution — Y ~ χ²(n)

f(y; n) = (1/(2^(n/2)Γ(n/2))) y^((n/2)−1) e^(−y/2),   y > 0,   n = 1, 2, ...

E(Y) = n (the degrees of freedom), Var(Y) = 2n.
The density function is illustrated for several values of n in Fig. 6.9.
Reproductive property: if Y₁, Y₂, ..., Y_k are independent r.v.'s, Yᵢ ~ χ²(nᵢ), i = 1, 2, ..., k, then
(Σᵢ₌₁^k Yᵢ) ~ χ²(n₁ + n₂ + ··· + n_k).
(3) Non-central chi-square distribution — Y ~ χ²(n; δ)

f(y; n, δ) = Σ_{k=0}^∞ [e^(−δ/2)(δ/2)^k/k!] (1/(2^((n/2)+k)Γ((n/2)+k))) y^((n/2)+k−1) e^(−y/2),
   y > 0,   δ > 0,   n = 1, 2, ...

E(Y) = n + δ, Var(Y) = 2(n + 2δ). Hence, the important difference from the central chi-square is that the
density function is shifted to the right and the variance increases.
Reproductive property: if Y₁, Y₂, ..., Y_k are independent r.v.'s, Yᵢ ~ χ²(nᵢ; δᵢ), i = 1, 2, ..., k, then
(Σᵢ₌₁^k Yᵢ) ~ χ²(Σᵢ₌₁^k nᵢ; Σᵢ₌₁^k δᵢ).
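The first two moments can be checked via the defining representation (a sum of squared unit-variance normals); a sketch of ours, assuming numpy and scipy:

    # Non-central chi-square from its definition: a sum of squared
    # N(mu_i, 1) r.v.'s has df = n and nc = delta = sum(mu_i^2).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    mu = np.array([0.5, -1.0, 2.0])
    n, delta = mu.size, float((mu ** 2).sum())
    y = ((rng.standard_normal((100_000, n)) + mu) ** 2).sum(axis=1)
    print(y.mean(), stats.ncx2(df=n, nc=delta).mean())   # both approx n + delta
    print(y.var(), 2 * (n + 2 * delta))                  # both approx 2(n + 2*delta)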
[Fig. 6.10 Comparison of a t and standard normal density]
(4) Student's t-distribution — W ~ t(n)

f(w; n) = (Γ[(n + 1)/2]/(√(nπ)Γ(n/2))) [1 + w²/n]^(−(n+1)/2),   w ∈ ℝ.

E(W) = 0 (n > 1),   Var(W) = n/(n − 2), n > 2,   α₃ = 0,   α₄ = 3 + 6/(n − 4), n > 4.

These moments show that for large n the t-distribution is very close to the
normal (see Fig. 6.10).
(5) Non-central t-distribution — W ~ t(n; δ), δ > 0
If X₁ ~ N(δ, 1) and X₂ ~ χ²(n) are independent, then W = X₁/√(X₂/n) ~ t(n; δ).
The density has a complicated series form; see Johnson and Kotz (1970). For large n,

E(W) ≈ δ   and   Var(W) ≈ 1,

i.e. W is approximately distributed as N(δ, 1).
(6) Fisher's F-distribution — U ~ F(n₁, n₂)

f(u; n₁, n₂) = (Γ[(n₁ + n₂)/2]/(Γ(n₁/2)Γ(n₂/2))) (n₁/n₂)^(n₁/2) u^((n₁/2)−1) [1 + (n₁/n₂)u]^(−(n₁+n₂)/2),   u > 0.

E(U) = n₂/(n₂ − 2), n₂ > 2,   Var(U) = (2n₂²(n₁ + n₂ − 2))/(n₁(n₂ − 2)²(n₂ − 4)), n₂ > 4.

The central and non-central F-distribution density functions are shown in
Fig. 6.11 for purposes of comparison.
(7) Non-central F-distribution — U ~ F(n₁, n₂; δ), δ > 0
The density can be written as a Poisson mixture of central F densities:

f(u; n₁, n₂; δ) = Σ_{k=0}^∞ [e^(−δ/2)(δ/2)^k/k!] (n₁/(n₁ + 2k)) f_F(un₁/(n₁ + 2k); n₁ + 2k, n₂),   u > 0,

where f_F(·; n₁ + 2k, n₂) denotes the central F(n₁ + 2k, n₂) density.

E(U) = (n₂(n₁ + δ))/(n₁(n₂ − 2)),   n₂ > 2,

Var(U) = 2(n₂/n₁)² [(n₁ + δ)² + (n₁ + 2δ)(n₂ − 2)]/((n₂ − 2)²(n₂ − 4)),   n₂ > 4.
[Fig. 6.11 Central and non-central F density functions]
Important concepts
Borel functions, distribution of a Borel function of a r.v., normal and related
distributions, Student's t, chi-square, Fisher's F and Cauchy distributions.
Questions
1. Why should we be interested in Borel functions of r.v.'s and their
distributions?
2. 'A Borel function is nothing more than a r.v. relative to the Borel field ℬ
on the real line.' Discuss.
3. Explain intuitively why a Borel function of a r.v. is a r.v. itself.
4. Explain the relationships between the normal, chi-square, Student's t,
Fisher's F and Cauchy distributions.
5. What is the difference between central and non-central chi-square and
F-distributions?
Exercises
1. Let X be a r.v. with density function f(x). Derive the density functions of
(i) Y = X²;
(ii) Y = e^X.
2. Let the density function of the r.v. X be f(x) = e^(−x), x > 0. Find the
distribution of Y = logₑ X.
3. Let the joint density function of X₁ and X₂ be f(x₁, x₂). Derive the
distribution of
(ii) Y = min(X₁, X₂).
4. Let X ~ N(0, 1); derive the distribution of Y = X².
Additional references
Clarke (1975); Cramér (1946); Giri (1974); Mood, Graybill and Boes (1974); Pfeiffer (1978); Rao (1973); Rohatgi (1976).