Random variables and probability distributions
developed much further than stating certain properties of P(·) and introducing the idea of conditional probability. This is because the model based on (S, F, P(·)) does not provide us with a flexible enough framework. The main purpose of this section is to change this probability space by mapping it into a much more flexible one using the concept of a random variable.
The basic idea underlying the construction of (S, F, P(·)) was to set up a framework for studying probabilities of events as a prelude to analysing problems involving uncertainty. The probability space was proposed as a formalisation of the concept of a random experiment ℰ. One facet of ℰ which can help us suggest a more flexible probability space is the fact that when the experiment is performed the outcome is often considered in relation to some quantifiable attribute, i.e. an attribute which can be represented by numbers. Real-world outcomes are more often than not expressed in numbers. It turns out that assigning numbers to qualitative outcomes makes possible a much more flexible formulation of probability theory. This suggests that if we could find a consistent way to assign numbers to outcomes we might be able to change (S, F, P(·)) to something more easily handled. The concept of a random variable is designed to do just that, without changing the underlying probabilistic structure of (S, F, P(·)).
4.1 The concept of a random variable
Fig. 4.1 illustrates the mathematical model (S, F, P(·)) for the coin-tossing example discussed in Chapter 3, with the σ-field of interest being F = {S, ∅, {(HH)}, {(TT)}, {(HH),(TT)}, {(TH),(HT)}, {(HT),(TH),(HH)}, {(HT),(TH),(TT)}}. The probability set function P(·) is defined on F and takes values in the interval [0, 1], i.e. P(·) assigns probabilities to the events in F. As can be seen, various combinations of the elementary events in S define the σ-field F (ensure that it is a σ-field!) and the probability set function P(·) assigns probabilities to the elements of F.
The main problem with the mathematical model (S, F, P(·)) is that the general nature of S and F, being defined as arbitrary sets, makes the mathematical manipulation of P(·) very difficult, its domain being a σ-field of arbitrary sets. For example, in order to define P(·) we will often have to derive all the elements of F and tabulate them (a daunting task for large or infinite F's), to say nothing of the differentiation or integration of such a set function.
Let us consider the possibility of defining a function X(·) which maps S directly into the real line R, that is, X(·): S → R. The question which arises is whether any such function provides us with a consistent way of attaching numbers to elementary events; consistent in the sense of preserving the event structure of the probability space (S, F, P(·)). The answer, unsurprisingly, is certainly not. This is because, although X is a function defined on S, probabilities are assigned to events in F, and the issue we have to face is how to define the values taken by X for the different elements of S in a way which preserves the event structure of F. In order to illustrate this let us return to the earlier example, with X defined as the number of heads. To each value of X, equal to 0, 1 and 2, there corresponds some subset of S, i.e. X⁻¹(0) = {(TT)}, X⁻¹(1) = {(HT),(TH)}, X⁻¹(2) = {(HH)};
the inverse mapping X⁻¹(·) preserves the set-theoretic operations, that is, it preserves unions, intersections and complements (e.g. X⁻¹(A ∪ B) = X⁻¹(A) ∪ X⁻¹(B)). In other words,
for each subset N of R, the inverse image X⁻¹(N) must be an event in F. Looking at X as defined above we can see that X⁻¹({0}) ∈ F, X⁻¹({1} ∪ {2}) ∈ F, that is, X(·) does indeed preserve the event structure of F. On the other hand, the function Y(·): S → R, defined by Y({HT}) = Y({HH}) = 1, Y({TH}) = Y({TT}) = 0, does not preserve the event structure of F, since Y⁻¹(0) ∉ F and Y⁻¹(1) ∉ F. This prompts us to define a random variable X to be any such function satisfying this event-preserving condition in relation to some σ-field defined on R; for generality we always take the Borel field B on R.
Three important features of this definition are worth emphasising:
(i) A random variable is always defined relative to some specific σ-field F.
(ii) In deciding whether some function Y(·): S → R is a random variable we proceed from the elements of the Borel field B to those of the σ-field F, and not the other way around.
(iii) A random variable is neither 'random' nor 'a variable'.
Let us consider these important features in some more detail in order to enhance our understanding of the concept of a random variable; undoubtedly the most important concept in the present book.
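The event-preserving condition can be made concrete for the coin-tossing example. The following is an illustrative sketch only (the data structures and function names are mine, not the text's): it checks by brute force whether a function from S to R is a random variable relative to the σ-field F of Fig. 4.1.

```python
# Illustrative sketch: checking the event-preserving condition for the
# coin-tossing example of Fig. 4.1.
S = {"HH", "HT", "TH", "TT"}

# The sigma-field F of Fig. 4.1, written as a set of frozensets.
F = {frozenset(e) for e in [
    S, set(),
    {"HH"}, {"TT"}, {"HH", "TT"}, {"TH", "HT"},
    {"HT", "TH", "HH"}, {"HT", "TH", "TT"},
]}

def is_random_variable(func):
    """True if the preimage of every value of func is an event in F; on a
    finite outcome set this settles the question for every Borel set of
    values, since F is closed under unions and complements."""
    return all(
        frozenset(s for s in S if func[s] == v) in F
        for v in set(func.values())
    )

X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}  # X = number of heads
Y = {"HH": 1, "HT": 1, "TH": 0, "TT": 0}  # the Y of the text

print(is_random_variable(X))  # True
print(is_random_variable(Y))  # False: Y^{-1}(1) = {HH, HT} is not in F
```

The brute-force check is only feasible because S is finite; the point it illustrates is exactly feature (ii): we start from sets of values and ask whether their inverse images land in F.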
The question 'is X(·): S → R a random variable?' does not make any sense unless some σ-field F is also specified. In the case of the function X, the number of heads in the coin-tossing example, we see that it is a random variable relative to the σ-field F as defined in Fig. 4.1. On the other hand, Y, as defined above, is not a random variable relative to F. This, however, does not preclude Y from being a random variable with respect to some other σ-field F₁; for instance F₁ = {S, ∅, {(HH),(HT)}, {(TH),(TT)}}. Intuition suggests that for any real-valued function X(·): S → R we should be able to define a σ-field F₁ on S such that X is a random variable. In the previous section we considered the σ-field generated by some set of events C. Similarly, we can generate σ-fields by functions X(·): S → R which turn X(·) into a random variable. Indeed, F₁ above is the minimal σ-field generated by Y, denoted by σ(Y). The way to generate such a minimal σ-field is to start from the set of events of the inverse mapping Y⁻¹(·), i.e. {(HT),(HH)} = Y⁻¹(1) and {(TH),(TT)} = Y⁻¹(0), and generate a σ-field by taking unions, intersections and complements. In the same way we can see
that the minimal σ-field generated by X, the number of heads, σ(X), coincides with the σ-field F of Fig. 4.2; verify this assertion. In general, however, the σ-field F associated with S on which a random variable X is defined does not necessarily coincide with σ(X). Consider the function
The above example is a special case of an important general result, where X₁, X₂, …, Xₙ are random variables on the same probability space (S, F, P(·)) and we define the new random variables
Y₁ = X₁, Y₂ = X₁ + X₂, Y₃ = X₁ + X₂ + X₃, …, Yₙ = X₁ + X₂ + ⋯ + Xₙ.
If σ(Y₁), σ(Y₂), …, σ(Yₙ) denote the minimal σ-fields generated by Y₁, Y₂, …, Yₙ, respectively, we can show that
σ(Y₁) ⊂ σ(Y₂) ⊂ ⋯ ⊂ σ(Yₙ) ⊂ F,
i.e. σ(Y₁), …, σ(Yₙ) form an increasing sequence of σ-fields in F. In the above example we can see that if we define a new random variable X₂(·): S → R by
X₂({HH}) = 1, X₂({HT}) = X₂({TH}) = X₂({TT}) = 0,
then X = X₁ + X₂ (see Table 4.1) is also a random variable relative to σ(X); X is defined as the number of Hs (see Table 4.1).
Note that X₁ is defined as 'at least one H' and X₂ as 'two Hs'.
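The generation of minimal σ-fields, and the increasing-sequence result above, can be sketched computationally for a finite S. This is an illustrative construction of my own, not from the text: it closes the preimages of a function under complement and union, which on a finite outcome set yields the minimal σ-field.

```python
# Illustrative sketch: generating sigma(X) on a finite outcome set by
# closing the preimages of X under complement and (finite) union.
from itertools import combinations

S = frozenset({"HH", "HT", "TH", "TT"})

def preimages(func):
    """The sets X^{-1}(v) for each value v in the range of func."""
    return {frozenset(s for s in S if func[s] == v) for v in set(func.values())}

def generated_sigma_field(func):
    """Close the preimages under complement and union; intersections come
    for free via De Morgan.  On a finite S this is sigma(X)."""
    events = preimages(func) | {S, frozenset()}
    changed = True
    while changed:
        changed = False
        for a in list(events):
            if S - a not in events:
                events.add(S - a)
                changed = True
        for a, b in combinations(list(events), 2):
            if a | b not in events:
                events.add(a | b)
                changed = True
    return events

X1 = {"HH": 1, "HT": 1, "TH": 1, "TT": 0}   # 'at least one H'
X2 = {"HH": 1, "HT": 0, "TH": 0, "TT": 0}   # 'two Hs'
X  = {s: X1[s] + X2[s] for s in S}          # number of Hs

# sigma(X1) is a sub-sigma-field of sigma(X1 + X2): an increasing sequence.
assert generated_sigma_field(X1) <= generated_sigma_field(X)
print(len(generated_sigma_field(X1)), len(generated_sigma_field(X)))  # 4 8
```

Here σ(X₁) has the four events {∅, S, {TT}, {HH,HT,TH}}, while σ(X) is the eight-event σ-field F of the example, illustrating σ(Y₁) ⊂ σ(Y₂).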
The above concept of σ-fields generated by random variables will prove very useful in the discussion of conditional expectation and martingales (see Chapters 7 and 8). The concept of a σ-field generated by a random variable enables us to concentrate on particular aspects of an experiment without having to consider everything associated with the experiment at the same time. Hence, when we choose to define a r.v. and the associated σ-field we make an implicit choice about the features of the random experiment we are interested in.
How do we decide that some function X(·): S → R is a random variable relative to a given σ-field F? From the above discussion of the concept of a
random variable it seems that if we want to decide whether a function X is a random variable with respect to F we have to consider the Borel field B on R, or at least the Borel field Bₓ on Rₓ; a daunting task. It turns out, however, that this is not necessary. From the discussion of the σ-field σ(J) generated by the set J = {Bₓ: x ∈ R}, where Bₓ = (−∞, x], we know that B = σ(J), and if X(·) is such that
X⁻¹((−∞, x]) = {s: X(s) ∈ (−∞, x], s ∈ S} ∈ F for all (−∞, x] ∈ J, (4.6)
then X(·) is a random variable with respect to F.
In other words, when we want to establish that X is a random variable, or define Pₓ(·), we have to look no further than the half-closed intervals (−∞, x] and the σ-field σ(J) they generate, whatever the range Rₓ. Let us use the shorthand notation {X(s) ≤ x} instead of {s: X(s) ∈ (−∞, x], s ∈ S} to consider the above argument in the case of X, the number of Hs, with respect to F in Fig. 4.2.
we can see that X⁻¹((−∞, x]) ∈ F for all x ∈ R, and thus X(·) is a random variable with respect to F. On the other hand, for Y as defined above, Y⁻¹((−∞, 0]) = {(TH),(TT)} ∉ F, and thus Y is not a random variable with respect to F.
The term random variable is rather unfortunate because, as can be seen from the above definition, X is neither 'random' nor a 'variable'; it is a real-valued function, and the notion of probability does not enter its definition. Probability enters the picture after the random variable has been defined, in an attempt to complete the mathematical model induced by X.
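Criterion (4.6) can also be sketched computationally for the coin-tossing example. This is illustrative code of my own (not the author's): on a finite range only the cut points at the values taken by the function matter, so a handful of half-closed intervals settle the question.

```python
# Illustrative sketch of criterion (4.6): only the preimages of the
# half-closed intervals (-inf, x] need to lie in F, and on a finite
# range only the cut points at the values of the function matter.
F = {frozenset(e) for e in [
    {"HH", "HT", "TH", "TT"}, set(),
    {"HH"}, {"TT"}, {"HH", "TT"}, {"TH", "HT"},
    {"HT", "TH", "HH"}, {"HT", "TH", "TT"},
]}

def half_interval_preimage(func, x):
    """{s : func(s) <= x}, i.e. the inverse image of (-inf, x]."""
    return frozenset(s for s, v in func.items() if v <= x)

X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}  # number of heads
Y = {"HH": 1, "HT": 1, "TH": 0, "TT": 0}

print(all(half_interval_preimage(X, x) in F for x in (0, 1, 2)))  # True
print(half_interval_preimage(Y, 0) in F)  # False: {TH, TT} is not in F
```

Between consecutive values of X the preimage does not change, so checking x = 0, 1, 2 covers every half-closed interval, in line with the shorthand {X(s) ≤ x} above.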
be consistent with the probabilities assigned to the corresponding events in F. Formally, we need to define a set function Pₓ(·): B → [0, 1] such that
Pₓ(B) = P(X⁻¹(B)) = P(s: X(s) ∈ B, s ∈ S) for all B ∈ B. (4.10)
For example, in the case illustrated in Table 4.1,
Pₓ({0}) = ¼, Pₓ({1}) = ½, Pₓ({2}) = ¼, Pₓ({0} ∪ {1} ∪ {2}) = 1, Pₓ(∅) = 0, etc.
The question which arises is whether, in order to define the set function Pₓ(·), we need to consider all the elements of the Borel field B. The answer is that we do not need to do that because, as argued above, any such element of B can be expressed in terms of the semi-closed intervals (−∞, x]. This implies that by choosing such semi-closed intervals 'intelligently', we can define Pₓ(·) with the minimum of effort. For example, Pₓ(·) for X, as defined in Table 4.1, can be defined as follows:
Pₓ((−∞, 0]) = ¼, Pₓ((−∞, 1]) = ¾, Pₓ((−∞, 2]) = 1.
As we can see, the semi-closed intervals were chosen to divide the real line at the points corresponding to the values taken by X. This way of defining the semi-closed intervals is clearly non-unique, but it will prove very convenient in the next section.
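The definition of Pₓ(·) on the chosen semi-closed intervals can be sketched as follows (a fair coin is assumed for the numbers; the code is an illustration, not the book's):

```python
# Illustrative sketch: P_X((-inf, x]) obtained from P on S via the
# inverse image of X, for the two-toss fair-coin example.
P = {"HH": 0.25, "HT": 0.25, "TH": 0.25, "TT": 0.25}  # P on outcomes
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}              # number of heads

def P_X(x):
    """P_X((-inf, x]) = P({s : X(s) <= x})."""
    return sum(p for s, p in P.items() if X[s] <= x)

# Dividing the real line at the values 0, 1, 2 taken by X:
print(P_X(0), P_X(1), P_X(2))  # 0.25 0.75 1.0
```

The three printed values are exactly the Pₓ((−∞, 0]), Pₓ((−∞, 1]), Pₓ((−∞, 2]) of the example above.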
The discerning reader will have noted that since we introduced the concept of a random variable X(·) on (S, F, P(·)) we have in effect
developed an alternative but equivalent probability space (R, B, Pₓ(·)) induced by X. The event and probability structure of (S, F, P(·)) is preserved in the induced probability space (R, B, Pₓ(·)), and the latter has a much 'easier to handle' mathematical structure; we traded S, a set of arbitrary elements, for R, the real line; F, a σ-field of subsets of S, for B, the Borel field on the real line; and P(·), a set function defined on arbitrary sets, for Pₓ(·), a set function on semi-closed intervals of the real line. In order to illustrate the transition from the probability space (S, F, P(·)) to (R, B, Pₓ(·)) let us return to Fig. 4.1 and consider the probability space induced by the random variable X, the number of heads, defined above. As can be seen from Fig. 4.3, the random variable X(·) maps S into {0, 1, 2}. Choosing the semi-closed intervals (−∞, 0], (−∞, 1], (−∞, 2] we can generate a Borel field on R which forms the domain of Pₓ(·). The concept of a random variable enables us to assign numbers to arbitrary elements of a set (S), and we choose to assign semi-closed intervals to events in F as induced by X. By defining Pₓ(·) over these semi-closed intervals we complete the procedure of assigning probabilities, which is consistent with the one used in Fig. 4.1. The important advantage of the latter procedure is that the mathematical structure of the probability space (R, B, Pₓ(·)) is a lot more flexible as a framework for developing a probability model. The purpose of what follows in this part of the book is to develop such a flexible mathematical framework. It must be stressed, however, that the original probability space (S, F, P(·)) has a role to play in the new mathematical framework, both as a reference point and as the basis of the probability model we propose to build. Any new concept to be introduced has to be related to (S, F, P(·)) to ensure that it makes sense in its context.
Fig. 4.3 The change from (S, F, P(·)) to (R, B, Pₓ(·)) induced by X.
4.2 The distribution and density functions
In the previous section the introduction of the concept of a random variable (r.v.), X, enabled us to trade the probability space (S, F, P(·)) for (R, B, Pₓ(·)), which has a much more convenient mathematical structure. The latter probability space, however, is not as yet simple enough, because Pₓ(·) is still a set function, albeit on real-line intervals. In order to simplify it we need to transform it into a point function (a function from a point to a point), with which we are so familiar.
The first step in transforming Pₓ(·) into a point function comes in the form of the result discussed in the previous section, that Pₓ(·) need only be defined on semi-closed intervals (−∞, x], x ∈ R, because the Borel field B can be viewed as the minimal σ-field generated by such intervals. With this in mind we can proceed to argue that, in view of the fact that all such intervals have a common starting point (−∞), we could conceivably define a point function
F(x) = Pₓ((−∞, x]), for all x ∈ R,
which is, seemingly, only a function of x. In effect, however, this function will do exactly the same job as Pₓ(·). Heuristically, this is achieved by defining F(·) as a point function by
Pₓ((−∞, x]) = F(x) − F(−∞), for all x ∈ R, (4.13)
and assigning the value zero to F(−∞). Moreover, given that as x increases the interval it implicitly represents becomes bigger, we need to ensure that F(x) is a non-decreasing function with one being its maximum value (i.e. F(x₁) ≤ F(x₂) if x₁ < x₂, and lim_{x→∞} F(x) = 1). For mathematical reasons we also require F(·) to be continuous from the right.
(iii) F(x) is continuous from the right (i.e. lim_{h↓0} F(x + h) = F(x), for all x ∈ R). (4.17)
It can be shown (see Chung (1974)) that this defines a unique point function for every set function Pₓ(·).
The great advantage of F(·) over P(·) and Pₓ(·) is that the former is a point function and can be represented in the form of an algebraic formula: the kind of function we are so familiar with from elementary mathematics. This will provide us with a very convenient way of attributing probabilities to events.
Fig. 4.4 represents the graph of the DF of the r.v. X in the coin-tossing example discussed in the previous section, illustrating its properties in the case of a discrete r.v., X, the number of Hs.
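The step-function DF of Fig. 4.4 can be written down directly. The sketch below assumes the fair-coin probabilities of the example; it is illustrative code, not the book's:

```python
# Illustrative sketch: the DF of the discrete r.v. X = number of heads
# in two tosses of a fair coin -- the step function of Fig. 4.4.
PROBS = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x):
    """F(x) = P_X((-inf, x]): sum the point probabilities up to x."""
    return sum(p for v, p in PROBS.items() if v <= x)

# Non-decreasing, 0 below the smallest value, 1 above the largest,
# and constant between the jump points:
print(F(-1), F(0), F(0.5), F(1), F(2))  # 0 0.25 0.25 0.75 1.0
```

The jumps at 0, 1, 2 have heights ¼, ½, ¼; between the jump points F is flat, and it is continuous from the right at each jump, as required by (4.17).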
Definition 3
A random variable X is called discrete if its range Rₓ is some subset of the set of integers Z = {0, ±1, ±2, …}.
In this book we shall restrict ourselves to only two types of random variables, namely, discrete and (absolutely) continuous.
Definition 4
A random variable X is called (absolutely) continuous if its distribution function F(x) is continuous for all x ∈ R and there exists a non-negative function f(·) on the real line such that
F(x) = ∫_{−∞}^{x} f(u) du, for all x ∈ R.
It must be stressed that, for X to be continuous, it is not enough for the distribution function F(x) to be continuous. The above definition postulates that F(x) must also be derivable by integrating some non-negative function f(x). So far the examples used to illustrate the various concepts referred to discrete random variables. From now on, however, emphasis will be placed almost exclusively on continuous random variables. The reason for this is that continuous random variables (r.v.'s) are susceptible to a more flexible mathematical treatment than discrete r.v.'s, and this helps in the construction of probability models and facilitates the mathematical and statistical analysis.
In defining the concept of a continuous r.v. we introduced the function f(x), which is directly related to F(x); at every point where F(x) is differentiable,
f(x) = dF(x)/dx,
and f(·) is said to be the (probability) density function (pdf) of X.
In the coin-tossing example, f(0) = ¼, f(1) = ½, and f(2) = ¼ (see Fig. 4.5). In order to compare the F(x) and f(x) of a discrete r.v. with those of a continuous r.v., let us consider the case where X takes values in the interval [a, b] and all values of X are attributed the same probability; we express this by saying that X is uniformly distributed on the interval [a, b], and we write X ~ U(a, b). The DF of X takes the form
F(x) = 0 for x < a, F(x) = (x − a)/(b − a) for a ≤ x ≤ b, F(x) = 1 for x > b.
Comparing Figs. 4.4 and 4.5 with 4.6 and 4.7, we can see that in the case of a discrete random variable the DF is a step function and the density function attributes probabilities at discrete points. On the other hand, for a continuous r.v. the density function cannot be interpreted as attributing probabilities because, by definition, if X is a continuous r.v., P(X = x) = 0 for all x ∈ R. This can be seen from the definition of f(x) at every continuity point of F(x).
we gain in simplicity and added intuition. It enhances intuition to view density functions as distributing probability mass over the range of X. The density function satisfies the following properties:
(i) f(x) ≥ 0, for all x ∈ R;
(ii) ∫_{−∞}^{∞} f(x) dx = 1;
(iii) F(b) − F(a) = ∫_{a}^{b} f(x) dx. (4.28)
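These properties can be checked numerically for the uniform density of the previous example. The sketch below is illustrative only; the choices a = 0, b = 2 are mine:

```python
# Illustrative sketch: checking the density properties for the uniform
# density f(x) = 1/(b - a) on [a, b], with a = 0, b = 2 assumed.
a, b = 0.0, 2.0

def f(x):
    """Uniform density: non-negative everywhere (property (i))."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def F(x):
    """Closed-form DF of U(a, b)."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

# (ii) the density integrates to one (left Riemann sum over [a, b]):
n = 100_000
h = (b - a) / n
print(round(sum(f(a + i * h) for i in range(n)) * h, 6))  # 1.0

# (iii) interval probabilities as differences of the DF:
print(F(1.5) - F(0.5))  # 0.5
```

The same checks for a discrete r.v. would replace the Riemann sum by a plain sum over the points of the range, as noted in the text.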
Properties (ii) and (iii) can be translated for discrete r.v.'s by substituting 'Σ' for '∫ … dx'. It must be noted that a continuous r.v. is not simply one with a continuous DF F(·); continuity refers to the condition that also requires the existence of a non-negative function f(·) such that F(x) = ∫_{−∞}^{x} f(u) du.
4.3 The notion of a probability model
Let us summarise the discussion so far in order to put it in perspective. The axiomatic approach to probability, formalising the concept of a random experiment ℰ, proposed the probability space (S, F, P(·)), where S represents the set of all possible outcomes, F is the set of events, and P(·) assigns probabilities to events in F. The uncertainty relating to the outcome of a particular performance of ℰ is formalised in P(·). The concept of a random variable X enabled us to map S into the real line R and construct an equivalent probability space induced by X, (R, B, Pₓ(·)), which has a much 'easier to handle' mathematical structure, being defined on the real line. Although Pₓ(·) is simpler than P(·), it is still a set function, albeit on the Borel field B. Using the idea of σ-fields generated by particular sets of events we defined Pₓ(·) on semi-closed intervals of the form (−∞, x] and managed to define the point function F(·), the three being related by
P(s: X(s) ∈ (−∞, x], s ∈ S) = Pₓ((−∞, x]) = F(x). (4.30)
The distribution function F(x) was simplified even further by introducing the density function f(x) via F(x) = ∫_{−∞}^{x} f(u) du. This introduced further flexibility into the probability model because f(x) is definable in closed algebraic form. This enables us to transform the original uncertainty related to ℰ to uncertainty related to unknown parameters θ of f(·); in order to emphasise this we write the pdf as f(x; θ). We are now in a position to define our probability model in the form of a parametric family of density functions, which we denote by
Φ = {f(x; θ), θ ∈ Θ}.
Φ represents a set of density functions indexed by the unknown parameter(s) θ, which are assumed to belong to a parameter space Θ (usually a subset of the real line). In order to illustrate these concepts let us consider an example
Fig. 4.8 The density function of a Pareto distributed random variable for different values of the parameter θ.
of a parametric family of density functions, the Pareto distribution:
Φ = {f(x; θ) = θ x₀^θ x^{−(θ+1)}, x ≥ x₀, θ ∈ Θ}.
The shape of f(x; θ) for different values of θ can be seen from Fig. 4.8.
When such a probability model is postulated it is intended as a description of the chance mechanism generating the observed data. For example, the model in Fig. 4.8 is commonly postulated in modelling personal incomes exceeding a certain level x₀. If we compare the above graph with the histogram of personal income data in Chapter 2 for incomes over £4500, we can see that postulating a Pareto probability density seems to be a reasonable model. In practice there are numerous such parametric families of densities we can choose from, some of which will be considered in the next section. The choice of one such family, when modelling a particular real phenomenon, is usually determined by previous experience in modelling similar phenomena or by a preliminary study of the data.
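A parametric family such as the Pareto can be coded as one function of x indexed by θ. The sketch below is illustrative; the threshold x₀ = 1 and the θ values are my choices, and the standard Pareto pdf and DF are used:

```python
# Illustrative sketch: the Pareto family Phi = {f(x; theta)}, theta > 0,
# with threshold x0 = 1 assumed.  Each theta picks out one density in Phi.
x0 = 1.0

def pareto_pdf(x, theta):
    """f(x; theta) = theta * x0**theta / x**(theta + 1) for x >= x0."""
    return theta * x0**theta / x**(theta + 1) if x >= x0 else 0.0

def pareto_cdf(x, theta):
    """F(x; theta) = 1 - (x0 / x)**theta for x >= x0."""
    return 1.0 - (x0 / x)**theta if x >= x0 else 0.0

# Larger theta concentrates the probability mass nearer the threshold x0,
# as Fig. 4.8 illustrates:
for theta in (1.0, 2.0, 4.0):
    print(theta, pareto_pdf(x0, theta), round(pareto_cdf(2.0, theta), 4))
```

Estimating θ from observed incomes then amounts to choosing one member of Φ, which is exactly the sense in which the uncertainty about ℰ is transferred to the unknown parameter.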
When a particular parametric family of densities Φ is chosen as the appropriate probability model for modelling a real phenomenon, we are in effect assuming that the observed data available were generated by the 'chance mechanism' described by one of the densities in Φ. The original uncertainty relating to the outcome of a particular trial of the experiment