… be new for the economics student, so our pace is quite leisurely. In particular, we discuss algebras and σ-algebras in detail, pay due attention to Borel σ-algebras, and prove several elementary properties of probability measures. Moreover, we outline the constructions of some useful probability spaces, including those that are induced by distribution functions. As usual, these constructions are achieved by invoking the fundamental extension theorem of Carathéodory. We omit the proof of the existence part of this theorem, but prove its uniqueness part as an application of the Sierpinski Class Lemma. We then introduce the notion of a random variable, and discuss the notion of Borel measurability at some length.
The high point of the chapter is the introduction of Lebesgue integration theory within the context of finite measure spaces. In fact, we almost exclusively work with probability measures, so the Lebesgue integral for the present exposition is none other than the so-called expectation functional. Our treatment is again leisurely. In particular, we introduce the fundamental convergence theorems for the Lebesgue integral by means of a step-by-step approach. For instance, the Monotone Convergence Theorem is given in four different formulations. First, we prove it for a sequence of nonnegative random variables the pointwise limit of which is real-valued. Then we drop the nonnegativity assumption from the statement of the theorem, and then reintroduce it, but this time work with sequences that converge almost surely to an extended real-valued function. Our fourth formulation states the result in full generality. We also study other important properties of the expectation functional, such as its linearity, the change of variables formula, and Jensen’s Inequality. The chapter concludes with a brief introduction to the normed linear space of integrable random variables, and other related spaces.
There is, of course, no shortage of truly excellent textbooks on probability theory. In particular, the classic treatments of Billingsley (1986), Durrett (1991), Shiryaev (1996) and Chung (2001) have a scope far more comprehensive than ours. The proofs that we omit here and in the following chapters can be recovered from any one of these books. A more recent reference, which the present author finds most commendable, is Fristedt and Gray (1997).
1 Event Spaces
The most fundamental notion of probability theory is that of a probability measure. Roughly speaking, a probability measure tells us the likelihood of observing any conceivable event in an experiment the outcome of which is uncertain. To formally introduce this concept, however, we need to model the elusive term “conceivable event” in this description; hence the next subsection.
1.1 σ-Algebras
Definition. Given any nonempty set X, let A and Σ be nonempty subsets of 2^X. The class A is called an algebra on X if
(i) X\A ∈ A for all A ∈ A; and
(ii) A ∪ B ∈ A for all A, B ∈ A.
The collection Σ is called a σ-algebra on X if it satisfies (i) and
(iii) $\bigcup_{i=1}^\infty A_i \in \Sigma$ whenever Ai ∈ Σ for each i = 1, 2, ....
Any element of Σ is called a Σ-measurable set in X. If Σ is a σ-algebra on X, we refer to the pair (X, Σ) as a measurable space.
In words, an algebra on X is a nonempty collection of subsets of X that is closed under complementation and taking pairwise (and thus finite) unions. It is readily verified that both ∅ and X belong to any algebra A on X, and that an algebra is closed under taking pairwise (and thus finite) intersections. (To prove the first claim, observe that, since A is nonempty, there exists an A ⊆ X in A, and hence X\A belongs to A. Thus X = A ∪ (X\A) ∈ A, and so ∅ = X\X ∈ A.) Moreover, a collection Σ of subsets of X is a σ-algebra iff it is an algebra and is closed under taking countable unions. By the de Morgan Law, this also implies that Σ is closed under taking countable (finite or infinite) intersections: If C is a nonempty countable subset of Σ, then ⋂C ∈ Σ. It is useful to note that there is no difference between an algebra and a σ-algebra when the ground set X under consideration is finite.¹
Before considering some examples, let us provide a quick interpretation of the formal model at hand. Given a nonempty set X and a σ-algebra Σ on X, we think of X as the set of all possible outcomes that may result in an experiment, the so-called sample space, and view any member of Σ (and only such a subset of X) as an “event” that may take place in the experiment. To illustrate, consider the experiment of rolling an ordinary die once. It is natural to take X := {1, ..., 6} as the sample space of this experiment. But what is an “event” here? The answer depends on the actual scenario that one wishes to model. If it is possible to discern
¹ These are relatively easy claims, but it is probably a good idea to warm up by proving them.
In particular, how do you know that a σ-algebra is actually an algebra?
the differences between all subsets of X, then we would take 2^X as the σ-algebra of the model, thereby deeming any subset of X a conceivable event (e.g. {1, 2, 3} would be the event that “a number strictly less than 4 comes up”). On the other hand, the situation we wish to model may call for a different type of event space. For example, if we want to model the beliefs of a person who will be told after the experiment only whether or not 1 has come up, {1, 2, 3} would not really be deemed a conceivable event. (If the outcome is 2, one would like to say that {1, 2, 3} has occurred, but given her informational limitation, our individual has no way of concluding this.) Indeed, this person may have an assessment only of the likelihood of 1 coming up in the experiment, so a nontrivial “event” for her is either “1 comes up” or “1 doesn’t come up.” Consequently, to model the beliefs of this individual, it makes more sense to choose a σ-algebra like {∅, X, {1}, {2, ..., 6}}. An “event” in this model would then be one of the four members of this particular collection of sets.
In practice, then, there is some latitude in choosing a particular class Σ of events to endow a sample space X with. However, we cannot do this in a completely arbitrary way. If A is an event, then we need to be able to talk about this event not occurring, that is, to deem the set X\A also an event. This is guaranteed by condition (i) above. Similarly, we wish to be able to talk about at least one of countably many events occurring, and this is the rationale behind condition (iii) above. In addition, conditions (i) and (iii) force us to view “countably many events occurring simultaneously” as an event as well. To give an example, consider the experiment of rolling an ordinary die arbitrarily many times. Clearly, we would take X = ℕ^∞ as the sample space of this experiment. Suppose next that we would like to be able to talk about the situation that in the ith roll of the die, number 2 comes up. Then we would choose a σ-algebra that would certainly contain all sets of the form Ai := {(ωm) ∈ ℕ^∞ : ωi = 2}. This σ-algebra must contain many other types of subsets of X. For instance, the situation that “in neither the first nor the second roll 2 turns up” must formally be an event, because {(ωm) ∈ ℕ^∞ : ω1, ω2 ≠ 2} equals (X\A1) ∩ (X\A2). Similarly, since each Ai is deemed an “event,” a σ-algebra maintains that $\bigcup_{i=1}^\infty A_i$ (“2 comes up at least once through the rolls”) and $\bigcap_{i=1}^\infty A_i$ (“each roll results in 2 coming up”) are considered “events” in our model.
In short, given a σ-algebra Σ on X, the intuitive concept of an “event” is formalized as any Σ-measurable set. That is, and mark this, we say that A is an event if and only if A ∈ Σ, and for this reason a σ-algebra on X is often referred to as an event space on X. One may define many different event spaces on a given sample space, so what an “event” really is depends on the model one chooses to work with.
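For a finite sample space such as the single die roll, conditions (i) and (ii) can be checked mechanically. The sketch below uses a helper of our own, `is_algebra` (not something defined in the text), and represents events as Python frozensets. Since countable unions of subsets of a finite X reduce to finite unions, an algebra on a finite X is automatically a σ-algebra, so on such X the same check also covers condition (iii).

```python
from itertools import combinations


def is_algebra(X, family):
    """Check conditions (i) and (ii) for a family of subsets of a finite set X.

    On a finite X, countable unions reduce to finite ones, so a family passing
    this check is also a σ-algebra on X.
    """
    X = frozenset(X)
    sets = {frozenset(A) for A in family}
    if not sets or any(not A <= X for A in sets):
        return False  # must be a nonempty collection of subsets of X
    closed_under_complement = all(X - A in sets for A in sets)                  # condition (i)
    closed_under_union = all(A | B in sets for A, B in combinations(sets, 2))  # condition (ii)
    return closed_under_complement and closed_under_union


die = range(1, 7)
coarse = [set(), set(die), {1}, {2, 3, 4, 5, 6}]      # the observer told only "1 or not 1"
print(is_algebra(die, coarse))                        # True
print(is_algebra(die, [set(), set(die), {1, 2, 3}]))  # False: {4, 5, 6} is missing
```

The check only needs distinct pairs for condition (ii), because A ∪ A = A is trivially in the family.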
Example 1. [1] 2^X and {∅, X} are σ-algebras on any nonempty set X. The collection 2^X corresponds to the finest event space, allowing each subset of X to be deemed an “event.”² By contrast, {∅, X} is the coarsest possible event space, which allows one to conceive of only two types of events, “nothing happens” and “something happens.”
² I have already told you that certain subsets of X may not be deemed “events” for an observer with limited information, so 2^X may not always be the relevant event space to endow X with. (I will talk about this issue at greater length when studying the notion of conditional probability in Chapter F.) Apart from this, there are also technical reasons why one cannot always view 2^X as a useful event space. Roughly speaking, when X is an infinite set, 2^X may be “too large” a set for one to be able to assign probabilities to each element of 2^X in a nontrivial way. (More on this in Section 3.5.)
[2] Let X := {a, b, c, d}. None of the collections {∅}, {X}, {∅, X, {a}} and {∅, X, {a}, {b, c, d}, {b}, {a, c, d}} qualifies as an algebra on X. On the other hand, each of the collections {∅, X}, {∅, X, {a}, {b, c, d}} and {∅, X, {a}, {b, c, d}, {b}, {a, c, d}, {a, b}, {c, d}} is an algebra on X.
[3] If X is finite and A is an algebra on X, then A is a σ-algebra. So, as noted earlier, the distinction between the notions of an algebra and a σ-algebra disappears in the case of finite sample spaces.
[4] Let us agree to call an interval right-semiclosed if it has the form (a, b] with −∞ ≤ a ≤ b < ∞, or the form (a, ∞) with −∞ ≤ a. The class of all right-semiclosed intervals is obviously not an algebra on ℝ. But the set A of all finite unions of right-semiclosed intervals (called the algebra induced by right-semiclosed intervals) is an algebra on ℝ. In fact, A is the smallest algebra that contains all right-semiclosed intervals. It is not a σ-algebra. (Proofs?)
[5] A := {S ⊆ ℕ : min{|S|, |ℕ\S|} < ∞} is an algebra on ℕ, but it is not a σ-algebra. Indeed, {i} ∈ A for each odd i ∈ ℕ, but {1, 3, ...} ∉ A.
Exercise 1 Let X be a metric space, and let A1 be the class that consists of all open subsets of X, A2 the class of all closed subsets of X, and A3 := A1 ∪ A2. Determine if any of these classes is an algebra or a σ-algebra.
Exercise 2 Let X be any nonempty set, and Ω a class of σ-algebras on X.
(a) Show that ⋂Ω is a σ-algebra on X.
(b) Give an example to show that ⋃Ω need not be an algebra even if Ω is finite.
Exercise 3 Define
$$\mathcal{A} := \Big\{A \subseteq \mathbb{N} : \Big(\tfrac{1}{n}\,|A \cap \{1, \dots, n\}|\Big) \text{ is convergent}\Big\}.$$
(Note. For any A ∈ A, the number $\lim_{n\to\infty} \frac{1}{n}|A \cap \{1, \dots, n\}|$ is called the asymptotic density of A.) True or false: A is an algebra but not a σ-algebra.
∗Exercise 4 Show that a σ-algebra cannot be countably infinite.
In practice it is not uncommon that we have a pretty good idea about the kinds of sets we wish to consider as events, but we have difficulty in finding a “good” σ-algebra for the problem because the collection of sets we have at hand does not constitute a σ-algebra. The resolution is usually to extend the collection of sets in which we are interested to a σ-algebra in a minimal way. (We consider a minimal extension because we wish to depart from our “interesting” sets as little as possible. Otherwise taking 2^X as the event space would trivially solve the problem of extension.) This idea leads us to the following fundamental concept.
Definition. Let X be a nonempty set and A a nonempty subclass of 2^X. The smallest σ-algebra on X that contains A (in the sense that this σ-algebra is included in any other σ-algebra that contains A) is called the σ-algebra generated by A, and is denoted as σ(A).
For example, if X := {a, b, c}, then σ({∅}) = σ({X}) = {∅, X}, σ({∅, X, {a}}) = {∅, X, {a}, {b, c}}, and σ({∅, X, {a}, {b}}) = 2^X. Of course, we have Σ = σ(Σ) for any σ-algebra Σ on any nonempty set.
Does any nonempty class of sets generate a σ-algebra? The answer does not follow readily from the definition above, because it is not self-evident that we can always find a smallest σ-algebra that extends any given nonempty class of sets. Our first proposition, however, shows that we can actually do this, so there is really no existence problem regarding generated σ-algebras.³
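Proposition 1. Let X be a nonempty set and A a nonempty subclass of 2^X. There exists a unique smallest σ-algebra that includes A, so σ(A) is well-defined. We have
$$\sigma(\mathcal{A}) = \bigcap\{\Sigma : \Sigma \text{ is a } \sigma\text{-algebra on } X \text{ and } \mathcal{A} \subseteq \Sigma\}.$$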
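When X is finite, σ(A) can also be computed from below rather than through the intersection formula: start with A and keep adding complements and pairwise unions until nothing new appears. The result contains A, is an algebra (hence a σ-algebra on a finite X), and every set it adds must lie in any σ-algebra containing A, so it is the smallest one. The sketch below assumes X is finite; the function name is ours. It reproduces the small examples given above for X := {a, b, c}.

```python
def generated_sigma_algebra(X, family):
    """Return σ(family) for a finite set X, as a set of frozensets.

    Repeatedly closes the family under complements (relative to X) and pairwise
    unions. The loop stops because every added set is a subset of the finite X.
    """
    X = frozenset(X)
    sigma = {frozenset(A) for A in family}
    while True:
        new = {X - A for A in sigma} | {A | B for A in sigma for B in sigma}
        if new <= sigma:
            return sigma
        sigma |= new


X = "abc"
print(len(generated_sigma_algebra(X, [set()])))                        # 2: {∅, X}
print(len(generated_sigma_algebra(X, [set(), set(X), {"a"}])))         # 4
print(len(generated_sigma_algebra(X, [set(), set(X), {"a"}, {"b"}])))  # 8 = |2^X|
```

This bottom-up closure is only a finite-X device; for infinite X the generated σ-algebra generally cannot be reached by finitely many such rounds, which is why Proposition 1 is phrased through an intersection.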
Exercise 5.H Prove Proposition 1.
Exercise 6 Does the σ-algebra generated by the algebra of Example 1.[4] include all open sets in ℝ?
Exercise 7.H Compute σ(A), where A := {S ⊆ ℝ : min{|S|, |ℝ\S|} < ∞}.
1.2 Borel σ-algebras
Let X be any metric space, and let O_X stand for the set of all open sets in X. The members of O_X are of obvious importance, but unfortunately O_X need not even be an algebra. In metric spaces, then, it is natural to consider the σ-algebra generated by O_X. This σ-algebra is called the Borel σ-algebra on X, and its members are referred to as Borel sets (or in probabilistic jargon, Borel events). Throughout this text, we denote the Borel σ-algebra on a metric space X by B(X). By definition, therefore, we have B(X) = σ(O_X).
³ As you will soon painfully find out, however, the explicit characterization of a generated σ-algebra can be a seriously elusive problem. Just to get a feeling for the difficulties that one may encounter in this regard, try to “compute” the σ-algebra σ({{a} : a ∈ ℚ}) on ℝ.
Notation. We write B[a, b] for B([a, b]), and B(a, b] for B((a, b]), where −∞ < a < b < ∞.
Example 2. By definition, B(ℝ) = σ(O_ℝ), but one does not actually need all open sets in ℝ for generating B(ℝ). For instance, what if we used instead the class of all open intervals, call it A1, as a primitive collection and attempted to find σ(A1)? This would lead us exactly to the σ-algebra σ(O_ℝ)! To see this, observe first that σ(O_ℝ) is obviously a σ-algebra that contains A1, so that we clearly have σ(A1) ⊆ σ(O_ℝ). (Recall the definition of σ(A1)!) To establish the converse containment, remember that every open set in ℝ can be written as the union of countably many open intervals. (Right?) Thus, we have O_ℝ ⊆ σ(A1). (Why exactly?) But then, since σ(O_ℝ) is the smallest σ-algebra that contains O_ℝ, and σ(A1) is of course a σ-algebra, we must have σ(O_ℝ) ⊆ σ(A1). So, we conclude: σ(O_ℝ) = σ(A1).
In fact, there are all sorts of other ways of generating the Borel σ-algebra on ℝ. For instance, consider the following classes:
A2 := the set of all closed intervals
A3 := the set of all closed sets in R
A4 := the set of all intervals of the form (a, b]
A5 := the set of all intervals of the form (−∞, a]
A6 := the set of all intervals of the form (−∞, a)
It is easy to show that all of these collections generate the same σ-algebra:
$$\mathcal{B}(\mathbb{R}) := \sigma(\mathcal{O}_{\mathbb{R}}) = \sigma(\mathcal{A}_1) = \cdots = \sigma(\mathcal{A}_6). \tag{1}$$
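We have already shown that σ(O_ℝ) = σ(A1). On the other hand, for any closed interval [a, b], we have $[a, b] = \bigcap_{i=1}^\infty \left(a - \tfrac{1}{i}, b + \tfrac{1}{i}\right) \in \sigma(\mathcal{O}_{\mathbb{R}})$, so A2 ⊆ σ(O_ℝ), and hence σ(A2) ⊆ σ(O_ℝ). Conversely, for any open interval (a, b), we have $(a, b) = \bigcup_{i=1}^\infty \left[a + \tfrac{1}{i}, b - \tfrac{1}{i}\right] \in \sigma(\mathcal{A}_2)$. So A1 ⊆ σ(A2), and it follows that σ(A1) ⊆ σ(A2). The …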
This example shows that different collections of sets might well generate the same σ-algebra. In fact, it is generally true that the Borel σ-algebra on a metric space is also generated by the class of all closed subsets of this space. That is, for any metric space X,
B(X) := σ({O ⊆ X : O is open}) = σ({S ⊆ X : S is closed})
(Verify!) The following exercises play on this theme a bit more
Exercise 8 Show that there is a countable subset A of 2^ℝ such that σ(A) = B(ℝ).
Exercise 9 For any n ∈ ℕ, let
A1 := {J1 × ··· × Jn : Ji is a bounded open interval, i = 1, ..., n},
A2 := {J1 × ··· × Jn : Ji is a bounded right-closed interval, i = 1, ..., n},
A3 := {J1 × ··· × Jn : Ji is a bounded closed interval, i = 1, ..., n}.
Show that we have B(ℝ^n) = σ(A1) = σ(A2) = σ(A3).
Exercise 10 Prove: If X is a separable metric space, then B(X) = σ({N_{ε,X}(x) : …
… we (you) have “computed” σ(A) by using the definition of the “generated σ-algebra” directly. The following exercise provides another illustration of this.
Exercise 12.H Let X be a metric space, and Y a metric subspace of X. Prove that
B(Y) = {B ∩ Y : B ∈ B(X)}.
The observation noted in the previous exercise is quite useful. For instance, it implies that the knowledge of B(ℝ) is sufficient to describe the class of all Borel subsets of [0, 1]; we have B[0, 1] = {B ∩ [0, 1] : B ∈ B(ℝ)}. Similarly, B(ℝ^n_+) = {B ∩ ℝ^n_+ : B ∈ B(ℝ^n)}. We conclude with a less immediate corollary.
Exercise 13.H For any S ∈ B[0, 1] and α ∈ ℝ, show that (S + α) ∩ [0, 1] ∈ B[0, 1].
We are now ready to introduce the concept of a probability measure.⁴
Definition. Let (X, Σ) be a measurable space. A function p : Σ → ℝ is said to be σ-additive if
$$p\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty p(A_i)$$
for any (Am) ∈ Σ^∞ with Ai ∩ Aj = ∅ for each i ≠ j. Any σ-additive function p : Σ → ℝ+ with p(∅) = 0 is called a measure on Σ (or on X if Σ is clear from the context), and we refer to the list (X, Σ, p) as a measure space. If p(X) < ∞, then p is called a finite measure, and the list (X, Σ, p) is referred to as a finite measure space. In particular, if p(X) = 1 holds, then p is said to be a probability measure, and in this case, (X, Σ, p) is called a probability space.
Definition. Given a metric space X, any measure p on B(X) is called a Borel measure on X, and in this case (X, B(X), p) is referred to as a Borel space. If, in addition, p is a probability measure, then (X, B(X), p) is called a Borel probability space.
Notation. Throughout this text, the set of all Borel probability measures on a metric space X is denoted as P(X).
We think of a probability measure p as a function that assigns to each event (that is, to each member of the σ-algebra that p is defined on) a number between 0 and 1. This number corresponds to the likelihood of the occurrence of that event. The map p is σ-additive in the sense that it is additive with respect to countably many pairwise disjoint events. This additivity property, which is the heart and soul of measure theory, entails several other useful properties for probability measures. For instance, it implies that any probability measure is finitely additive, that is,
$$p\Big(\bigcup_{i=1}^m A_i\Big) = \sum_{i=1}^m p(A_i)$$
for any m ∈ ℕ and pairwise disjoint A1, ..., Am ∈ Σ. (Apply σ-additivity to the sequence (A1, ..., Am, ∅, ∅, ...) and use p(∅) = 0.)
⁴ The origins of probability theory go back to the famous exchange between Blaise Pascal and Pierre Fermat that started in 1654. While Pascal and Fermat were mostly concerned with gambling-type problems, the importance and applicability of the general topic was soon understood, and the subject was developed by many mathematicians, including Jakob Bernoulli, Abraham de Moivre, and Pierre Laplace. Despite the host of work that took place in the 18th and 19th centuries, however, a universally agreed definition of “probability” did not appear until 1933. At this date Andrei Kolmogorov introduced the (axiomatic) definition that we are about to present, and set the theory on rigorous grounds, much the same way Euclid gave an axiomatic basis for planar geometry.
Exercise 14 Let (X, Σ, p) be a probability space, m ∈ ℕ, and let A, B, Ai ∈ Σ, i = 1, ..., m. Prove:
(a) If A ⊆ B, then p(A) ≤ p(B);
(b) p(X\A) = 1 − p(A);
(c) $p\big(\bigcup_{i=1}^m A_i\big) \le \sum_{i=1}^m p(A_i)$;
(d) (Bonferroni’s Inequality) $p\big(\bigcap_{i=1}^m A_i\big) \ge \sum_{i=1}^m p(A_i) - (m - 1)$.
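A quick numerical sanity check of Exercise 14 on a small finite probability space may help fix ideas; it is an illustration, not a proof. The four-point sample space and its weights below are hypothetical choices, and exact `Fraction` arithmetic is used so that the equality in part (b) is tested without floating-point noise.

```python
import random
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical finite probability space: weights on four outcomes, summing to 1.
X = ["a", "b", "c", "d"]
weight = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}


def p(S):
    """Probability of an event S ⊆ X: the sum of the weights of its outcomes."""
    return sum((weight[w] for w in S), Fraction(0))


def power_set(X):
    return [frozenset(c) for c in chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))]


events = power_set(X)
whole = frozenset(X)

# (a) monotonicity and (b) the complement rule, over every event
assert all(p(A) <= p(B) for A in events for B in events if A <= B)
assert all(p(whole - A) == 1 - p(A) for A in events)

# (c) finite subadditivity and (d) Bonferroni's Inequality, on randomly drawn families
rng = random.Random(0)
for _ in range(1000):
    m = rng.randint(1, 5)
    family = [rng.choice(events) for _ in range(m)]
    union = frozenset().union(*family)
    inter = whole.intersection(*family)
    assert p(union) <= sum(p(A) for A in family)
    assert p(inter) >= sum(p(A) for A in family) - (m - 1)
print("Exercise 14 (a)-(d) hold on all tested events")
```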
Warning. One is often tempted to conclude from Exercise 14.(a) that any subset of an event of probability zero occurs with probability zero. There is a catch here. How do you know that this subset is assigned a probability at all? For instance, let X := {a, b, c}, Σ := {∅, X, {a, b}, {c}}, and let p be the probability measure on Σ that satisfies p({c}) = 1. Here, while p({a, b}) = 0, it is not true that p({a}) = 0, since p is not even defined at {a}. This probability space maintains that {a} is not an event.
Note. Those probability spaces for which any subset of an event of probability zero is an event (and hence occurs with probability zero) are called complete. With the exception of a few (optional) remarks, however, this notion will not play an important role in the present exposition.
Exercise 15 (The Exclusion-Inclusion Formula) Let (X, Σ, p) be a probability space, m ∈ ℕ, and A1, ..., Am ∈ Σ. Where Nt := {(i1, ..., it) ∈ {1, ..., m}^t : …
The following are simple but surprisingly useful observations.
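Proposition 2. Let (X, Σ, p) be a probability space, and let (Am) ∈ Σ^∞. If A1 ⊆ A2 ⊆ ··· (in which case we say that (Am) is an increasing sequence), then
$$p\Big(\bigcup_{i=1}^\infty A_i\Big) = \lim_{m\to\infty} p(A_m),$$
while if A1 ⊇ A2 ⊇ ··· (in which case we say that (Am) is a decreasing sequence), then
$$p\Big(\bigcap_{i=1}^\infty A_i\Big) = \lim_{m\to\infty} p(A_m).$$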
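Proof. Let (Am) ∈ Σ^∞ be an increasing sequence. Set B1 := A1 and Bi := Ai\Ai−1, i = 2, 3, ..., and note that Bi ∈ Σ for each i and $\bigcup_{i=1}^\infty A_i = \bigcup_{i=1}^\infty B_i$. But Bi ∩ Bj = ∅ for any i ≠ j, so, by σ-additivity,
$$p\Big(\bigcup_{i=1}^\infty A_i\Big) = \sum_{i=1}^\infty p(B_i) = \lim_{m\to\infty} \sum_{i=1}^m p(B_i) = \lim_{m\to\infty} p(A_m),$$
where the last equality uses finite additivity and Am = B1 ∪ ··· ∪ Bm.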
As an immediate application of this result and Exercise 14.(c), we obtain a basic inequality of probability theory:
Boole’s Inequality. For any probability space (X, Σ, p),
$$p\Big(\bigcup_{i=1}^\infty A_i\Big) \le \sum_{i=1}^\infty p(A_i) \qquad \text{for any } (A_m) \in \Sigma^\infty.$$
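Proposition 2 and Boole’s Inequality can be watched at work on a concrete countable example. The sketch assumes the hypothetical measure p({n}) := 2^(−n) on ℕ (a probability measure on 2^ℕ, since these weights sum to 1), the increasing events Am := {1, ..., m} whose union is ℕ, and the decreasing events Cm := {m, m + 1, ...} whose intersection is empty. Only finitely many terms are computed, so the code illustrates the limits rather than proving them.

```python
from fractions import Fraction


def p_initial(m):
    """p({1, ..., m}) under the hypothetical weights p({n}) = 2**-n."""
    return sum((Fraction(1, 2**n) for n in range(1, m + 1)), Fraction(0))


def p_tail(m):
    """p({m, m+1, ...}) via the complement rule: 1 - p({1, ..., m-1})."""
    return 1 - p_initial(m - 1)


# Continuity from below: p(A_m) = 1 - 2**-m increases towards p(ℕ) = 1.
# Continuity from above: p(C_m) = 2**-(m-1) decreases towards p(∅) = 0.
for m in (1, 5, 10, 20):
    print(m, float(p_initial(m)), float(p_tail(m)))

# Boole's Inequality for the overlapping events C_1, ..., C_5:
# their union is C_1 = ℕ, while the sum of their probabilities exceeds 1.
union_prob = p_tail(1)
sum_of_probs = sum(p_tail(m) for m in range(1, 6))
assert union_prob <= sum_of_probs
```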
Exercise 16.H Let (X, Σ, p) be a probability space. Show that if (Am) ∈ Σ^∞ satisfies p(Ai ∩ Aj) = 0 for every i ≠ j, then …
… for all (Am), (Bm) ∈ Σ^∞ with Bm ⊆ Am, m = 1, 2, ....
Exercise 18.H Let (X, B(X), p) be a Borel probability space, and let O_X and C_X denote the class of all open and closed subsets of X, respectively.
(a) Prove that
sup{p(T) : T ∈ C_X and T ⊆ S} = p(S) = inf{p(O) : O ∈ O_X and S ⊆ O}
for any S ∈ B(X).
(b) Show that, if X is σ-compact, that is, it can be written as a union of countably many compact subsets of itself, then
p(S) = sup{p(K) : K is a compact subset of X with K ⊆ S}
for any S ∈ B(X). (Note. Such a Borel probability measure is said to be regular.)
The observations noted in Proposition 2 are often referred to as the continuity (from below and above, respectively) properties of a probability measure.⁵ As the proof of this result makes transparent, these properties are derived directly from the σ-additivity of a probability measure. Indeed, any finite measure on Σ satisfies these properties.
Warning. The first claim of Proposition 2 is valid even when p is an infinite measure. (The proof goes through verbatim.) In this case, the validity of the second claim, however, requires the additional assumption that p(Ak) < ∞ for some k. To see the need for this additional hypothesis, consider the measure space (ℕ, 2^ℕ, q), where q(S) := |S|. (q is called the counting measure.) Here, if Am := {m, m + 1, ...} for each m ∈ ℕ, then $\bigcap_{i=1}^\infty A_i = \emptyset$ and yet q(Am) = ∞ for each m.
It is useful to observe that a partial converse of this observation is also true. To make this precise, let us agree to call a function defined on an algebra A σ-additive on A if it is additive with respect to any countable collection of pairwise disjoint members of A whose union belongs to A. (Notice that this definition conforms with the way we used the term “σ-additive” for a measure so far.) It turns out that finite additivity and continuity of a set function imply its σ-additivity.
Proposition 3. Let A be an algebra (on some nonempty set), and q : A → ℝ+ a finitely additive function such that, for any decreasing sequence (Cm) ∈ A^∞ with $\bigcap_{i=1}^\infty C_i = \emptyset$, we have lim q(Cm) = 0. Then q is σ-additive on A.
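Proof. Take any class {Am ∈ A : m = 1, 2, ...} such that Ai ∩ Aj = ∅ for each i ≠ j, and $A := \bigcup_{i=1}^\infty A_i \in \mathcal{A}$. Let $B_m := \bigcup_{i=1}^m A_i$, and observe that q(A) = q(A\Bm) + q(Bm) for each m, by finite additivity of q. But (A\Bm) is a decreasing sequence in A with $\bigcap_{i=1}^\infty (A \setminus B_i) = \emptyset$, so that, by hypothesis, lim q(A\Bm) = 0. Therefore, letting m → ∞ in the equation q(A) = q(A\Bm) + q(Bm) and using the finite additivity of q again,
$$q(A) = \lim_{m\to\infty} q(B_m) = \lim_{m\to\infty} \sum_{i=1}^m q(A_i) = \sum_{i=1}^\infty q(A_i).$$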
In words, a finitely additive set function which is continuous from above at the empty set is σ-additive. As you can now show easily, an analogous result can also be proved in terms of continuity from below.
⁵ It is customary to write $A_m \nearrow \bigcup_{i=1}^\infty A_i$ if (Am) is an increasing sequence of sets, and $A_m \searrow \bigcap_{i=1}^\infty A_i$ if it is a decreasing one. Thus Proposition 2 states that $A_m \nearrow \bigcup_{i=1}^\infty A_i$ implies $p(A_m) \nearrow p\big(\bigcup_{i=1}^\infty A_i\big)$, and similarly for decreasing sequences of events. This is the motivation behind the term “continuity of a probability measure.”
Exercise 19 Let A := {S ⊆ ℕ : min{|S|, |ℕ\S|} < ∞}, and define p : A → [0, 1] as
$$p(S) := \begin{cases} 1, & |\mathbb{N}\setminus S| < \infty, \\ 0, & |S| < \infty. \end{cases}$$
Show that p is finitely additive, but not σ-additive.
Let us conclude this section with a brief summary. At this point, you should be somewhat comfortable with the notion of a probability space (X, Σ, p). In such a space, X stands for the sample space of the experiment being modeled, the set of all outcomes, so to speak. The σ-algebra Σ, on the other hand, tells us which subsets of X can be discerned in the experiment, that is, for which subsets of X we can talk about the likelihood of their occurring or not occurring. (But recall that things are not completely arbitrary; by definition of a σ-algebra, there is quite a bit of consistency in what is and is not deemed an event.) Finally, the probability measure p quantifies the likelihood of the members of Σ. Things hang tightly together by the property of σ-additivity: the likelihood of the union of countably many pairwise disjoint events is simply the sum of the individual probabilities of these events.
And now, it’s time to make things a bit more concrete
3.1 Motivating Examples
Our first example provides the formal description of the canonical probability space whose sample space is finite. This space corresponds to a special case of the more general formulation of what is said to be a simple probability space.
Example 3. Let X be any metric space. The support of any function f ∈ ℝ^X is defined as the set
…
… Ps(X), and say that (X, Σ, pf) is a simple probability space. It is easily seen that any probability space (X, 2^X, p) with |X| < ∞ is a simple probability space. (For, f ∈ [0, 1]^X defined by f(ω) := p({ω}) is a simple density function, and p = pf.⁶) Since singleton sets are closed in any metric space, every subset of a finite metric space X is a finite union of closed sets, hence closed; thus B(X) = 2^X, and (X, 2^X, p) is a Borel probability space whenever X is finite. More generally, we define the support of any Borel probability measure p ∈ P(X), denoted supp(p), as the smallest closed set S such that p(S) = 1. Given this definition, a Borel probability measure is simple iff it has finite support. (Note. Such measures are referred to as simple lotteries in …
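The definition of a simple density is cut off in this copy of the text, so the following sketch assumes the reading suggested by footnote 6 and Exercise 20: a simple density f is nonnegative with finite support and total mass 1, and pf(S) is the sum of f(ω) over the ω ∈ S in that support. The function name `simple_measure` and the dictionary representation of f are our own, hypothetical choices.

```python
from fractions import Fraction


def simple_measure(f):
    """Return p_f for a simple density f given as {outcome: weight}.

    Assumptions (hypothetical representation): the keys of f contain its
    support, all weights are nonnegative, and they sum exactly to 1.
    Events are any objects supporting `in`, e.g. Python sets.
    """
    if any(v < 0 for v in f.values()) or sum(f.values()) != 1:
        raise ValueError("f must be nonnegative with total mass 1")
    support = [w for w, v in f.items() if v != 0]

    def p(S):
        # Sum f over the part of the support that lies in S.
        return sum((f[w] for w in support if w in S), Fraction(0))

    return p


# The fair die of Section 1.1 as a simple probability space.
die = simple_measure({k: Fraction(1, 6) for k in range(1, 7)})
print(die({1, 2, 3}))  # 1/2
print(die(set()))      # 0
```

Exact `Fraction` weights are used so that the total-mass check is not defeated by floating-point rounding.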
Exercise 20 Let X be a nonempty countable set and p : 2^X → ℝ. Prove:
(a) (X, 2^X, p) is a probability space iff there exists an f ∈ [0, 1]^X with $\sum_{\omega \in X} f(\omega) = 1$ such that $p(S) = \sum_{\omega \in S} f(\omega)$ for each S ∈ 2^X;
(b) If (X, 2^X, p) is a probability space such that there exist an ε > 0 and f ∈ [ε, 1]^X with $p(S) = \sum_{\omega \in S} f(\omega)$ for each S ∈ 2^X, then |X| < ∞.
Exercise 21.H Let X be a countably infinite set, and take any f ∈ B(X) with f ≥ 0. Show that there exists a g ∈ ℝ^X_+ such that (X, 2^X, p) is a probability space, where p : 2^X → ℝ is given by $p(S) := \sum_{\omega \in S} g(\omega) f(\omega)$.
The following example is very important. It shows that constructing non-simple probability measures on an infinite sample space is in general not a trivial matter. We will revisit this example several times throughout the sequel.
Example 4. Consider the experiment of tossing successively k many (fair) coins. Denoting ‘heads’ by 1 and ‘tails’ by 0 (for convenience), the sample space of this experiment can be written as X := {0, 1}^k. Given that X is finite, there is no problem with taking 2^X as the relevant event space. After all, thanks to the finiteness of X, there is a natural way of assigning probabilities to events by using the notion of relative frequency. Hence we define p(S) := |S|/2^k for any event S ∈ 2^X. Of course, p is none other than the simple probability measure induced by the simple density function f(ω) := 1/|X| for each ω ∈ X. If we drop the finiteness assumption, however, things get slightly icy, and it is actually at this point that the formalism of the general probability model kicks in.
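For finitely many tosses, the relative-frequency assignment p(S) := |S|/2^k can be computed by brute-force enumeration of {0, 1}^k. The sketch below is a helper of our own that describes an event S by a predicate on outcome tuples (1 coding heads, as in the text). It is only practical for small k, which is precisely the finite case where no measure-theoretic machinery is needed.

```python
from fractions import Fraction
from itertools import product


def coin_probability(k, event):
    """p(S) = |S| / 2**k, where S = {ω ∈ {0,1}^k : event(ω)} and 1 codes heads."""
    favourable = sum(1 for w in product((0, 1), repeat=k) if event(w))
    return Fraction(favourable, 2**k)


print(coin_probability(3, lambda w: sum(w) >= 2))  # 1/2: at least two heads in three tosses
print(coin_probability(4, lambda w: w[0] == 1))    # 1/2: the first toss is heads
```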
Consider the experiment of tossing a (fair) coin infinitely many times. The sample space of this experiment is the sequence space {0, 1}^∞. How do we define events and probabilities here? The problem is that the infinite cardinality of our sample space makes it impossible to use the idea of relative frequency to assign probabilities to all the events that we are interested in. Yet, intuitively, we still want to use the relative frequency interpretation of probability here. For instance, we really want to be able to say that the probability of observing infinitely many tails is 1. Or, what is a bit more problematic, we want to be able to say that after sufficiently many tosses, the probability that the relative frequency of heads tends to 1/2 is large (because the coin is fair).
⁶ That is to say, if X is finite, specifying p on singleton events defines p on the entire 2^X: $p(S) = \sum_{\omega \in S} p(\{\omega\})$ for any S ⊆ X.
So, how should we define our event space? Here is the idea. Let us first deal with the “easy” events. For example, consider the set
{(ωm) ∈ {0, 1}^∞ : ω1 = a1, ..., ωk = ak}, where k ∈ ℕ and a1, ..., ak ∈ {0, 1}. This set is said to be a cylinder set, for it is completely determined by finitely many of its initial coordinates. This property makes our relative frequency intuition operational. Clearly, we wish to assign probability 1/2^k to this event. More generally, we want to consider as an event any cylinder set, that is, any set of the form {(ωm) ∈ {0, 1}^∞ : (ω1, ..., ωk) ∈ S}, where k ∈ ℕ and S ⊆ {0, 1}^k. Since this is the event that “the outcome of the first k tosses belongs to S,” it is natural to assign to it the probability |S|/2^k. What next? Well, it turns out that this is all we need to do. So far we know that we wish to include the set of all cylinder sets
$$\mathcal{A} := \bigcup_{k=1}^\infty \Big\{\{(\omega_m) : (\omega_1, \dots, \omega_k) \in S\} : S \subseteq \{0, 1\}^k\Big\}$$
in our event space. (Check that this class is an algebra on {0, 1}^∞.) So why don’t we consider A as the nucleus of our event space, and take the σ-algebra that it generates, σ(A), as the event space for the problem? After all, not only is σ(A) a σ-algebra that differs from A in a minimal way, it also contains all sorts of interesting events that are not contained in A. For instance, in contrast to the collection A, σ(A) maintains that “all tosses after the fifth toss come up heads” is an event, for {(ωm) : ωk = 1} is a cylinder set for each k, and hence
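$$\{(\omega_m) : \omega_k = 1 \text{ for all } k \ge 6\} = \bigcap_{k=6}^\infty \{(\omega_m) : \omega_k = 1\} \in \sigma(\mathcal{A}). \tag{2}$$
Similarly, the situation “infinitely many heads come up throughout the experiment” is captured by σ(A) (but not by A). For,
$$\{(\omega_m) : \omega_k = 1 \text{ for infinitely many } k\} = \bigcap_{k=1}^\infty \bigcup_{i=k}^\infty \{(\omega_m) : \omega_i = 1\} \in \sigma(\mathcal{A}). \tag{3}$$
(This is not entirely obvious; make sure you verify both of the claims made in (3).)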
So, you see, there are many non-cylinder events that we can deduce from cylinder sets by taking unions, intersections and complements, and by taking unions, intersections and complements of the resulting sets, and so on. The end point of this process, the explicit description of which cannot really be given, is none other than σ(A).⁷
⁷ Quiz. True or false: $\{(\omega_m) : \lim_{k\to\infty} \frac{1}{k}\sum_{i=1}^k \omega_i = \frac{1}{2}\} \in \sigma(\mathcal{A})$.
All this is good; σ(A) certainly looks like a good event space to endow our sample space {0, 1}^∞ with. But it is worrisome that we only know what probabilities to assign to the members of A so far. What do we do about the members of σ(A)\A? (And you are quite right if you suspect that there are very many such sets.) The good news is that we don’t have to do anything about them, because the probabilities of …
3.2 Carathéodory’s Extension Theorem
The stage is now set for the following fundamental theorem of measure theory, which
we state here without proof
Carathéodory’s Extension Theorem. Let A be an algebra on a nonempty set X and q : A → ℝ+. If q is σ-additive on A, then there exists a measure p on σ(A) such that p(A) = q(A) for each A ∈ A. Moreover, if q(X) < ∞, then p is unique.
This is a powerful theorem that allows us to construct a probability measure (uniquely) on a σ-algebra by specifying the behavior of the measure only on the algebra that generates this σ-algebra. Since algebras are often much easier than σ-algebras to work with, Carathéodory’s Extension Theorem turns out to be extremely useful in constructing probability measures. For instance, we may apply this theorem to Example 4 where, as discussed above, we know how to assign probabilities to the cylinder subsets of {0, 1}^∞.
More on Example 4. Consider the framework of Example 4, and define q ∈ [0, 1]^A by q({0, 1}^∞) := 1 and
q({(ωm) : (ω1, ..., ωk) ∈ S}) := |S|/2^k
for each k ∈ ℕ and S ⊆ {0, 1}^k. (Is q well-defined?) Now q is easily checked to be finitely additive. Moreover, as we show next, q is continuous from above at the empty set, so we have the following fact:
Claim. q is a σ-additive function on A.
Proof of Claim. Take any decreasing sequence (A_m) of cylinder sets in {0, 1}^∞ with $\bigcap_{i=1}^\infty A_i = \emptyset$. Since {0, 1}^∞ is compact (why?), and each A_i is closed in {0, 1}^∞, $\bigcap_{i=1}^\infty A_i = \emptyset$ implies that the class {A_1, A_2, ...} cannot have the finite intersection property. Thus $\bigcap_{i=1}^M A_i = \emptyset$ for some M ∈ N. It follows that, for any m ≥ M, we have $A_m \subseteq A_M = \bigcap_{i=1}^M A_i = \emptyset$, so that lim q(A_m) = q(∅) = 0. Applying Proposition 3 completes the argument.
The stage is now set for Carathéodory's Extension Theorem. Applying this theorem, we actually find a unique probability measure p on σ(A) which agrees with q on each cylinder set. In turn, this solves nicely the problem of finding the "right" probability space for the experiment of Example 4.8
But how does p attach probabilities to every event in σ(A)? Unfortunately, a complete answer would require us to go through the proof of Carathéodory's Extension Theorem in this particular context, and we wish to avoid this at this stage. However, it is not difficult to find the probability of at least some non-cylinder events in our experiment. For instance, let us compute the probability of the event that "all tosses after the fifth toss come up heads" (recall (2)). By using Proposition 2, this is done easily. Defining the cylinder sets A_k := {(ω_m) : ω_6 = ··· = ω_k = 1} for each k ≥ 6, and using that proposition, we get
$$p(\{(\omega_m) : \omega_k = 1 \text{ for all } k \ge 6\}) = p\Big(\bigcap_{k=6}^{\infty} A_k\Big) = \lim_{k\to\infty} p(A_k) = \lim_{k\to\infty} \frac{2^5}{2^k} = 0,$$
a very agreeable finding. We can use a similar technique to compute the probability of many other interesting events. For instance, in the case of the event given in (3),
Exercise 22. Consider the probability space ({0, 1}^∞, σ(A), p) we have constructed above for the experiment of tossing infinitely many fair coins. Show that the statements "at least one head comes up after the tenth toss," "only heads come up after finitely many tosses," and "a tail comes up at every even toss" are formally captured as events in the model at hand. Compute the probability of each of these events.
Exercise 23. Let (X, Σ, q) be a probability space, and S a subset of X with S ∉ Σ.
(a) Show that
σ(Σ∪ {S}) = {(S ∩ A) ∪ ((X\S) ∩ B) : A, B ∈ Σ}
(b) By using Carathéodory's Extension Theorem, show that there is a probability measure p on σ(Σ ∪ {S}) such that p(A) = q(A) for each A ∈ Σ.
(c) Do part (b) without using Carathéodory’s Extension Theorem.
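For the cylinder sets A_k := {(ω_m) : ω_6 = ··· = ω_k = 1} used above, the values q(A_k) can be obtained by brute-force counting of prefixes. The following sketch (the function name q_A is hypothetical) confirms that q(A_k) = 2^{-(k-5)}, so these numbers decrease to 0 as k grows, which is what continuity from above turns into the probability of the intersection of the A_k.

```python
from fractions import Fraction
from itertools import product

def q_A(k):
    """q(A_k) for the cylinder A_k = {(w_m) : w_6 = ... = w_k = 1}, k >= 6, computed by
    counting the length-k 0/1 prefixes whose entries in positions 6, ..., k all equal 1."""
    good = sum(1 for w in product((0, 1), repeat=k) if all(w[i] == 1 for i in range(5, k)))
    return Fraction(good, 2 ** k)

for k in range(6, 13):
    # Each line shows k, q(A_k), and whether q(A_k) equals 1 / 2^(k - 5).
    print(k, q_A(k), q_A(k) == Fraction(1, 2 ** (k - 5)))
```

Brute force is exponential in k, so this is only a sanity check on small k; the closed form 2^5 / 2^k is what the argument actually uses.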
8 This example attests to the usefulness of the notion of σ-algebra. Suppose you instead designated 2^{{0,1}^∞} as the event space of this experiment. How would you define the probability of an arbitrary set in {0, 1}^∞?
Quiz. Show that 2^{{0,1}^∞} ≠ σ(A) in the example at hand.
3.3 The Lebesgue-Stieltjes Probability Measure
We next move to another example in which we again use Carathéodory's Extension Theorem to construct the "right" probability space. This example will play a fundamental role in much of what follows.
Let us first recall the notion of distribution function.
Definition. A map F : R → [0, 1] is said to be a distribution function if it is increasing, right-continuous, and satisfies F(−∞) = 0 = 1 − F(∞).9
Exercise 24. Show that a distribution function can have at most countably many discontinuity points. Also show that if a distribution function is continuous, then it must be uniformly continuous.
Example 5. Let A be the algebra induced by the right-semiclosed intervals (Example 1.[4]). Let F be a distribution function. Define the map q ∈ [0, 1]^A as follows:
(i) If −∞ ≤ a ≤ b < ∞, then q((a, b]) := F(b) − F(a);
(ii) If −∞ ≤ a < ∞, then q((a, ∞)) := 1 − F(a); and
(iii) If A_1, ..., A_m are finitely many disjoint intervals in A, then q(A_1 ∪ ··· ∪ A_m) := q(A_1) + ··· + q(A_m).
We are not done though; we still have to establish the σ-additivity of q. The strategy of attack is identical to that in the case of Example 4. Since q is obviously finitely additive, it is enough to establish the continuity of q from above at the empty
9 Notation. F(−∞) := lim_{t→−∞} F(t) and F(∞) := lim_{t→∞} F(t). Moreover, for any real number a, F(a−) denotes the left-limit of F at a, that is, F(a−) := lim_{m→∞} F(a − 1/m). The expression F(a+) is understood similarly.
10 How do we know that q is well-defined? Since a right-semiclosed interval can be written as a finite union of other right-semiclosed intervals, we have at the moment two different ways of computing the probability of such intervals, which may, in principle, give different answers. But things work out fine here. For instance, it is immediately verified that q((−∞, b]) = q((−∞, b − 1] ∪ (b − 1, b]) for any b ∈ R. One may easily generalize this example to verify that q is well-defined.
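set (Proposition 3). To this end, take a decreasing sequence (A_m) in A such that $\bigcap_{i=1}^\infty A_i = \emptyset$. Take an arbitrary ε > 0, and fix some index i ∈ N. Since A_i equals the union of finitely many right-closed intervals, and F is right-continuous, one can show that we can find a bounded set B_i in A such that $\mathrm{cl}_{\mathbb{R}}(B_i) \subseteq A_i$ and q(A_i) − q(B_i) < ε/2^i. (Proof. Exercise.) The boundedness of B_i implies that $\mathrm{cl}_{\mathbb{R}}(B_i)$ is closed in $\overline{\mathbb{R}}$. But $\overline{\mathbb{R}}$ is a compact metric space (yes?), and $\bigcap_{i=1}^\infty \mathrm{cl}_{\mathbb{R}}(B_i) = \emptyset$. It follows that $\bigcap_{i=1}^M \mathrm{cl}_{\mathbb{R}}(B_i) = \emptyset$ for some sufficiently large positive integer M. (Why?) Consequently, for each m ≥ M, we have $\bigcap_{i=1}^m B_i = \emptyset$, and therefore,
$$q(A_m) = q\Big(A_m \setminus \bigcap_{i=1}^m B_i\Big) = q\Big(\bigcup_{i=1}^m (A_m \setminus B_i)\Big) \le q\Big(\bigcup_{i=1}^m (A_i \setminus B_i)\Big),$$
since A_1 ⊇ ··· ⊇ A_m. We are almost done. All we need to observe now is that $q\big(\bigcup_{i=1}^m (A_i \setminus B_i)\big) \le \sum_{i=1}^m q(A_i \setminus B_i)$.11 From this, it follows that
$$q(A_m) \le \sum_{i=1}^m q(A_i \setminus B_i) = \sum_{i=1}^m \big(q(A_i) - q(B_i)\big) < \sum_{i=1}^m \frac{\varepsilon}{2^i} < \varepsilon.$$
Conclusion: For any ε > 0, there exists an M ∈ N such that q(A_m) < ε for all m ≥ M.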
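The rules (i) to (iii) of Example 5 are easy to evaluate for a concrete distribution function. The sketch below assumes, purely for illustration, the exponential distribution function F(t) = 1 − e^{−t} for t > 0 (and F(t) = 0 otherwise); the names F_exp and q are hypothetical. It checks the instance of well-definedness mentioned in footnote 10 and shows q((0, 1/m]) shrinking to 0, which is where the right-continuity of F enters.

```python
import math

def F_exp(t):
    """Hypothetical example distribution function: exponential with rate 1,
    F(t) = 1 - e^{-t} for t > 0 and F(t) = 0 otherwise; F(inf) = 1 and F(-inf) = 0."""
    if t == math.inf:
        return 1.0
    return 1.0 - math.exp(-t) if t > 0 else 0.0

def q(intervals, F=F_exp):
    """q of a finite union of pairwise disjoint intervals, each encoded as a pair (a, b):
    (a, b] when b < inf, and (a, inf) when b = inf; a = -inf is allowed.
    Rules (i)-(ii) give F(b) - F(a) for each piece, and rule (iii) adds the pieces up."""
    return sum(F(b) - F(a) for a, b in intervals)

b = 2.0
print(q([(-math.inf, b)]), q([(-math.inf, b - 1), (b - 1, b)]))  # two descriptions of (-inf, 2]
print(q([(-math.inf, math.inf)]))                                 # the whole real line: 1.0
print([q([(0.0, 1.0 / m)]) for m in (1, 10, 100, 1000)])          # q((0, 1/m]) decreases toward 0
```

Floating-point arithmetic is used only for convenience here; the two descriptions of (−∞, 2] agree up to rounding.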
A major upshot of Example 5 is the following: One can always define a probability measure on the reals by means of a distribution function. Interestingly, the converse of this is also true. That is to say, any Borel probability measure p on R arises this way. Indeed, for any such p, the map t ↦ p((−∞, t]) on R is a distribution function. This means that, on R, a Borel probability measure can actually be identified with a distribution function.
Exercise 25.H Show that, for any p ∈ P(R), the map F_p : t ↦ p((−∞, t]) is a distribution function. Moreover, prove that
p({t}) = F_p(t) − F_p(t−) for any t ∈ R,
so F_p is continuous at t iff p({t}) = 0.
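A discrete example makes the formula p({t}) = F_p(t) − F_p(t−) concrete. In the sketch below, the three-atom measure is a hypothetical choice, and F_p(t−) is computed as p((−∞, t)), which equals the left limit of F_p at t by continuity from below. The jump of F_p at t is exactly the mass of the atom at t, and it is 0 wherever there is no atom.

```python
from fractions import Fraction

# Hypothetical Borel probability measure on R concentrated on three atoms: atom -> mass.
atoms = {Fraction(0): Fraction(1, 5), Fraction(1): Fraction(1, 2), Fraction(5, 2): Fraction(3, 10)}

def F(t):
    """F_p(t) = p((-inf, t]): total mass of the atoms x <= t."""
    return sum((m for x, m in atoms.items() if x <= t), Fraction(0))

def F_left(t):
    """F_p(t-) = p((-inf, t)): total mass of the atoms x < t."""
    return sum((m for x, m in atoms.items() if x < t), Fraction(0))

for t in (Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(1), Fraction(5, 2), Fraction(3)):
    print(t, F(t), F(t) - F_left(t))   # the jump F_p(t) - F_p(t-) equals p({t})
```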
11 Where does this come from? Well, from Exercise 14.(c), or better, from
Boole's Inequality for Finitely Additive Set Functions: If C is an algebra and r : C → [0, 1] is finitely additive, then $r\big(\bigcup_{i=1}^m C_i\big) \le \sum_{i=1}^m r(C_i)$ for any m ∈ N and C_1, ..., C_m ∈ C.
The proof is easy. For m = 2, let D = C_1 ∩ C_2 and use finite additivity to get
We have constructed in Example 5 the Lebesgue-Stieltjes measure induced by a distribution function F on the entire R. The analogous construction works for any interval in R. For instance, if X := (a, b] with −∞ < a < b < ∞, and F ∈ [0, 1]^X is an increasing and right-continuous function with F(a+) = 0 and F(b) = 1, then we can define the Lebesgue-Stieltjes probability measure induced by F on (a, b] by using precisely the approach developed in Example 5. It is not difficult to show that this measure is the restriction of the Lebesgue-Stieltjes probability measure induced by the distribution function G on R to B(a, b], where G is the (unique) distribution function with G|_X = F.
We should also note that F(b) = 1 amounts only to a normalization here. Indeed, the argument outlined in Example 5 works for any right-continuous and increasing F ∈ R^{(a,b]} such that F(b) > F(a+). (The only modification needed in the argument given in Example 5 is that we now consider only the right-closed intervals that are in (a, b] when defining q via F, and set q(X) = F(b) − F(a+).) Of course, the resulting (unique) measure p, now called the Lebesgue-Stieltjes measure induced by F on (a, b], is not a probability measure (unless F(b) − F(a+) = 1). This measure rather assesses the "measure" of the space X as F(b) − F(a+).
Even though we focus on probability measures throughout this text, we need to consider at least one infinite measure, which is, geometrically speaking, the natural measure of the real line. To introduce this measure, take any i ∈ Z, let X_i := (i, i+1], and define F_i : X_i → [0, 1] by F_i(t) := t − i. Let ℓ_i denote the Lebesgue-Stieltjes measure induced by F_i on X_i for each i ∈ Z. We define the Lebesgue measure on B(R) by
$$\ell(S) := \sum_{i \in \mathbb{Z}} \ell_i(S \cap X_i) \quad \text{for each } S \in \mathcal{B}(\mathbb{R}).$$
Exercise 26.H Prove that (R, B(R), ℓ) is a measure space such that
ℓ((a, b]) = b − a, −∞ ≤ a < b < ∞.
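To see how the unit-interval pieces combine, here is a sketch that computes ℓ((a, b]) for finite a < b, assuming (as the construction with the X_i and ℓ_i indicates) that ℓ adds up the measures ℓ_i of the pieces (a, b] ∩ X_i. The function name ell_interval is hypothetical, and only bounded intervals, not general Borel sets, are handled.

```python
import math
from fractions import Fraction

def ell_interval(a, b):
    """ell((a, b]) for finite a < b, obtained by adding ell_i((a, b] ∩ (i, i+1]) over i in Z.
    On X_i = (i, i+1], ell_i is induced by F_i(t) = t - i, so ell_i((c, d]) = F_i(d) - F_i(c)."""
    total = Fraction(0)
    for i in range(math.floor(a), math.ceil(b)):   # only these X_i can meet (a, b]
        c, d = max(a, i), min(b, i + 1)             # (a, b] ∩ (i, i+1] = (c, d] when c < d
        if c < d:
            total += (d - i) - (c - i)               # F_i(d) - F_i(c)
    return total

print(ell_interval(Fraction(-3, 2), Fraction(7, 3)))  # 23/6, which is 7/3 - (-3/2)
```

Exact rational arithmetic keeps the check ℓ((a, b]) = b − a free of rounding error.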
The restriction of ℓ to any Borel subset X of R is a Borel measure on X, that is, (X, B(X), ℓ|_{B(X)}) is a Borel measure space for any X ∈ B(R). For brevity, we denote this measure space simply as (X, B(X), ℓ) in what follows. For instance, ([0, 1], B[0, 1], ℓ) is a Borel probability space.
12 Quiz. Why is ℓ well-defined?
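Let us establish a few elementary facts about the Lebesgue measure. First of all, what is the Lebesgue measure of a singleton set? The answer is:
$$\ell(\{a\}) = \ell\Big(\bigcap_{m=1}^{\infty} \big(a - \tfrac{1}{m}, a\big]\Big) = \lim_{m\to\infty} \ell\big(\big(a - \tfrac{1}{m}, a\big]\big) = \lim_{m\to\infty} \frac{1}{m} = 0$$
for any real number a. (Why the second equality?) Consequently, any singleton set in R has Lebesgue measure zero. In fact, any countable set has this property, since, for any countable S ⊆ R,
$$\ell(S) = \sum_{a \in S} \ell(\{a\}) = 0.$$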
3.5 More on the Lebesgue Measure
The previous subsection contains just about all you need to know about the Lebesgue measure to follow the subsequent development. So, if you wish to get to the core of probability theory right away, you may proceed at this point directly to Section 5. The present subsection aims at completing the above discussion by going over a few highlights of the Lebesgue measure theory. The presentation takes place mostly by means of exercises.
Exercise 27.H (Translation Invariance of ℓ) For any (S, α) ∈ B(R) × R, show that
ℓ(S + α) = ℓ(S).
Exercise 28.H (a) (Non-atomicity of ℓ) Show that, for any A ∈ B(R) with ℓ(A) > 0, there exists a B ∈ B(R) with B ⊆ A and 0 < ℓ(B) < ℓ(A).
(b) Give an example of a measure on B(R) which does not possess either of the properties mentioned in part (a) and Exercise 27.
13 While various attempts at formulating (what we now call) the Lebesgue measure were made prior to the contributions of Emile Borel and Henri Lebesgue (in their respective doctoral theses of 1894 and 1902), these attempts were not brought to fruition precisely because they too assigned measure zero to countably infinite sets, an implication that was deemed "absurd" by the mathematical community of the day. (Even the otherwise revolutionary Cantor was no exception to this.) In succession, Borel and Lebesgue set the theory on a completely rigorous foundation, and as the structure of countable sets was better understood in time, it was eventually accepted that an infinite set can be deemed "very small," in fact "negligible," from the measure-theoretic perspective. (See Hawkins (1980), especially pp. 172-180, for a beautiful survey on the origins of the theory of the Lebesgue measure and integral.)
The situation may at first seem somewhat reminiscent of Cantor's countability theory, which views countably infinite sets as "smaller" than uncountable sets, but this is misleading. For, there are in fact uncountable sets in [a, b] which have Lebesgue measure zero. While such sets are somewhat esoteric, and will not concern us here, you should make note of the fact that the "relative size" of a subset of the real line from the "countability" and "measure" perspectives may well be radically different.
It is worth noting that the probability space ([0, 1], B[0, 1], ℓ) is not complete, that is, there are ℓ-null sets A in B[0, 1] such that B ∉ B[0, 1] for some B ⊂ A. (Here by an ℓ-null set A, we mean any Borel subset A of [0, 1] with ℓ(A) = 0.) However, we can "complete" this space in a straightforward manner. Define
L[0, 1] := {S ∪ B : S ∈ B[0, 1] and B is a subset of an ℓ-null event}.
(Any member of L[0, 1] is said to be a Lebesgue measurable set.) Now define
ℓ*(S ∪ B) := ℓ(S)
for any S ∈ B[0, 1] and any subset B of an ℓ-null event. Then ([0, 1], L[0, 1], ℓ*) is a complete probability space.14 This space, called the Lebesgue probability space, extends ([0, 1], B[0, 1], ℓ) in the sense that B[0, 1] ⊆ L[0, 1] and ℓ*|_{B[0,1]} = ℓ. Moreover, it is the smallest such extension in the sense that if ([0, 1], Σ, μ) is any complete probability space with B[0, 1] ⊆ Σ and μ|_{B[0,1]} = ℓ, then L[0, 1] ⊆ Σ. Curiously, L[0, 1] is much larger than B[0, 1].15 And yet, there are still sets in [0, 1] which do not belong to L[0, 1], that is, there are sets that are not Lebesgue measurable. The following exercise walks you through a proof of this fact.
∗ Exercise 29.H For any set S in [0, 1] and any α ∈ [0, 1], let us agree to write S ⊕ α for the set {t ∈ [0, 1] : t = s + α (mod 1) and s ∈ S}.16
(a) Show that S ⊕ α is Lebesgue measurable if S is Lebesgue measurable, and in this case ℓ(S ⊕ α) = ℓ(S).
Now define the equivalence relation ≈ on [0, 1] by α ≈ β iff α − β ∈ Q. Use the Axiom of Choice to select exactly one element from each of the induced equivalence classes, and denote the resulting collection by S. Enumerate next the rationals in [0, 1] as {r_1, r_2, ...}, and define S_m := S ⊕ r_m for each m.
(b) Show that {S_1, S_2, ...} is a partition of [0, 1].
(c) Use parts (a) and (b) to conclude that we would have ℓ([0, 1]) ∈ {0, ∞} if S were Lebesgue measurable. Thus, S cannot be Lebesgue measurable.17
(d) Prove Vitali's Theorem: There is no probability space ([0, 1], 2^{[0,1]}, p) such that
p(S ⊕ α) = p(S) for all S ⊆ [0, 1] and α ∈ [0, 1].
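For intervals, the operation ⊕ of Exercise 29 can be computed explicitly: (c, d] ⊕ α splits into at most two intervals whose lengths add up to d − c, which illustrates part (a) in the simplest possible case. The sketch below uses the "mod 1" convention of footnote 16; the names shift_mod1 and length are hypothetical. Of course, nothing of this sort can be computed for the set S built with the Axiom of Choice.

```python
from fractions import Fraction

def shift_mod1(c, d, alpha):
    """Image of (c, d] ⊆ [0, 1] under s -> s + alpha (mod 1), where, as in footnote 16,
    s + alpha (mod 1) is s + alpha if s + alpha <= 1 and s + alpha - 1 otherwise.
    Returns a list of disjoint half-open intervals (lo, hi], encoded as pairs (lo, hi)."""
    pieces = []
    split = 1 - alpha                   # points s <= split do not wrap around
    if c < min(d, split):               # unwrapped part: (c, min(d, split)] shifted by alpha
        pieces.append((c + alpha, min(d, split) + alpha))
    if max(c, split) < d:               # wrapped part: (max(c, split), d] shifted by alpha - 1
        pieces.append((max(c, split) + alpha - 1, d + alpha - 1))
    return pieces

def length(pieces):
    """Total length of a finite list of disjoint intervals."""
    return sum((hi - lo for lo, hi in pieces), Fraction(0))

S = (Fraction(1, 4), Fraction(3, 4))
for alpha in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    pieces = shift_mod1(*S, alpha)
    print(alpha, pieces, length(pieces))   # the total length is always 1/2
```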
14 Quiz Prove!
15 The cardinality of L[0, 1] is strictly larger than that of B[0, 1]. (The cardinality of B[0, 1] is the same as that of R.) This is clearly not the right place to prove these facts. If you are interested, have a look at Hewitt and Stromberg (1965), pp. 133-134.
16 For any a, b ∈ [0, 1], a + b (mod 1) equals a + b if a + b ≤ 1, and a + b − 1 otherwise.
17 More generally, every set of positive Lebesgue measure in [0, 1] (or in R) contains a Lebesgue nonmeasurable subset. The present proof, which is due to Giuseppe Vitali, can easily be modified to establish this stronger statement. Thomas (1985) provides an alternative proof that derives from basic graph theory.
Note. Lebesgue nonmeasurable sets cannot be found by the finite constructive method. Loosely speaking, Solovay (1970) has shown that the existence of such a set in [0, 1] cannot be proved (within the axiomatic system of standard set theory) without invoking the Axiom of Choice. (If you're interested in these sorts of things, you may want to read the expository account of Briggs and Schaffter (1979).)
Vitali's Theorem shows that the use of probability spaces that take the power set of the sample space as the event space may sometimes be seriously limited. Insight: The σ-algebra technology is indispensable for the development of probability theory.
In this short section we provide a proof of the uniqueness part of Carathéodory's Extension Theorem. As you will see later, the technique we will introduce for this purpose is useful on a good number of other occasions as well.
Let us agree to call a class S of subsets of a given nonempty set X a Sierpinski class (or, for short, an S-class) on X, provided that
(i) if A, B ∈ S and A ⊆ B, then B\A ∈ S, and
(ii) if A_1, A_2, ... ∈ S and A_1 ⊆ A_2 ⊆ ···, then $\bigcup_{i=1}^\infty A_i \in S$.
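The smallest S-class on X that contains a given class of subsets of X, say A, is called the S-class generated by A, and is denoted by s(A). It is not difficult to verify that such a class exists. Indeed, we have
$$s(\mathcal{A}) = \bigcap \{\mathcal{S} : \mathcal{S} \text{ is an S-class on } X \text{ and } \mathcal{A} \subseteq \mathcal{S}\}.$$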
Exercise 30.H Let X be a nonempty set, and S an S-class on X. Prove that if X ∈ S and S is closed under taking finite intersections, then S must be a σ-algebra.
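On a finite set X, both s(A) and σ(A) can be generated by brute-force closure, which makes the definitions and Exercise 30 easy to experiment with. The sketch below is illustrative only (all function names are hypothetical). It contrasts a class that contains X but is not closed under finite intersections with one that is; for the latter, the generated S-class coincides with the generated σ-algebra.

```python
def s_class(A):
    """Smallest S-class containing the class A of subsets of a finite set.
    On a finite set every increasing sequence A_1 ⊆ A_2 ⊆ ... is eventually constant,
    so condition (ii) holds automatically; we only close under (i): B - A for A ⊆ B."""
    S = {frozenset(a) for a in A}
    changed = True
    while changed:
        changed = False
        for a in list(S):
            for b in list(S):
                if a <= b and b - a not in S:
                    S.add(b - a)
                    changed = True
    return S

def sigma_algebra(X, A):
    """Smallest sigma-algebra on the finite set X containing A (closure under complements and unions)."""
    X = frozenset(X)
    S = {frozenset(a) for a in A} | {frozenset(), X}
    changed = True
    while changed:
        changed = False
        for a in list(S):
            for b in list(S):
                for c in (X - a, a | b):
                    if c not in S:
                        S.add(c)
                        changed = True
    return S

def is_sigma_algebra(X, S):
    """On a finite set: S contains X and is closed under complements and pairwise unions."""
    X = frozenset(X)
    return X in S and all(X - a in S for a in S) and all(a | b in S for a in S for b in S)

X = {1, 2, 3, 4}
A1 = [X, {1, 2}, {2, 3}]          # contains X, but not closed under intersections
A2 = [X, {1, 2}, {2, 3}, {2}]     # contains X and is closed under finite intersections
print(len(s_class(A1)), len(sigma_algebra(X, A1)))  # 6 16
print(s_class(A2) == sigma_algebra(X, A2))          # True
```

The quadratic closure loops are fine for a handful of points but are not meant for anything larger.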
Here comes a major result that we shall later use again and again.
σ-algebra. From this it would follow that σ(A) ⊆ S_0 ⊆ S, as we seek.19
18 This result is often referred to as Dynkin's π-λ Theorem (where an S-class is instead called a λ-system). However, historically speaking, I think it is more suitable to use the terminology we adopt here, for an even stronger result was proved by Sierpinski (1928), albeit in a non-probabilistic context. (I learned this from Bert Fristedt.)
19 So, my objective is to derive the statement
A ∩ B ∈ S_0 for all (A, B) ∈ S_0 × S_0,
S_1 := {A ⊆ X : A ∩ B ∈ S_0 for all B ∈ A}.
By hypothesis, we have A ⊆ S_1. Moreover, S_1 is an S-class on X. Indeed, if A, C ∈ S_1
A, B ∈ S_0. By induction, it follows that S_0 is closed under taking finite intersections,
What is the point of all this? Well, the idea is the following. If we learned somehow that a property holds for all sets in a class A which contains the sample space and is closed under taking finite intersections, and if, in addition, we managed to show that the class of all sets for which this property is true is an S-class, then we may use the Sierpinski Class Lemma to conclude that all sets in the σ-algebra generated by A actually belong to the latter class, and hence satisfy the property in question. Since it is usually easier to work with S-classes than with σ-algebras, this observation may, in turn, help when one needs to "go from a given set to the σ-algebra generated by that set." To illustrate, consider the following claim:
Proposition 4. Let X and A be as in the Sierpinski Class Lemma. If p and q are two finite measures on σ(A) such that p|_A = q|_A, then p = q.
A moment's reflection will show that this is even stronger than the uniqueness part of Carathéodory's Extension Theorem (for here we do not require A to be an algebra).
from the statement
A ∩ B ∈ S_0 for all (A, B) ∈ A × A,
which is true by hypothesis. Watch out for a very pretty trick! I will first prove the intermediate statement
A ∩ B ∈ S_0 for all (A, B) ∈ S_0 × A,
and then deal the final blow by using this intermediate step.
How does one prove something like this? Let's use the idea outlined informally above. Define S := {S ∈ σ(A) : p(S) = q(S)}. Using Proposition 2, it is easy to verify that S is an S-class on X. But, by hypothesis, the property p(S) = q(S) holds for all S in A. By the Sierpinski Class Lemma, then, σ(A) ⊆ S, that is, p(S) = q(S) holds for all S ∈ σ(A), and we are done. (Nice trick, no?)
Warning. The uniqueness result reported in Proposition 4 is not valid for infinite measures in general. However, if X can be written as a countable union of disjoint sets X_i ∈ A with p(X_i) = q(X_i) < ∞ for each i, then Proposition 4 applies even though p(X) = q(X) = ∞.
We conclude by noting that the closedness of A under taking finite intersections is crucial for Proposition 4. To see this, let X := {a, b, c, d} and A := {{a, b}, {b, c}}, so that σ(A) = 2^X. Now let p be the probability measure on 2^X that assigns probability 1/2 to each of the outcomes b and d, and let q be the probability measure that assigns probability 1/2 to each of the outcomes a and c. Clearly, p and q are probability measures on 2^X with p = q on A, but p ≠ q. (Compare with Proposition 4.) What goes wrong here is that the Sierpinski Class Lemma does not work when A is not closed under taking finite intersections. Indeed, S := {S ⊆ X : p(S) = q(S)} = {∅, {a, b}, {b, c}, {c, d}, {a, d}, X} is an S-class that contains A, and yet σ(A) = 2^X is not contained in S in this example. Notice that the problem would disappear if we replaced A with A′ := {{a, b}, {b, c}, {b}, X}. Since A′ is closed under taking finite intersections, by the Sierpinski Class Lemma, any two probability measures on 2^X that agree on A′ must agree on σ(A′) = 2^X.20
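The counterexample above can be verified by enumerating all 16 subsets of X. The sketch below (names are hypothetical) computes the class of events on which p and q agree, confirms that it contains A and is an S-class, and shows how far it is from all of 2^X.

```python
from fractions import Fraction
from itertools import combinations

X = ("a", "b", "c", "d")
p_mass = {"a": Fraction(0), "b": Fraction(1, 2), "c": Fraction(0), "d": Fraction(1, 2)}
q_mass = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1, 2), "d": Fraction(0)}

def measure(mass, S):
    """Probability of the set S under the point masses in `mass`."""
    return sum((mass[w] for w in S), Fraction(0))

power_set = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
A = [frozenset("ab"), frozenset("bc")]

# The class of all subsets of X on which p and q agree.
agree = {S for S in power_set if measure(p_mass, S) == measure(q_mass, S)}

print(all(S in agree for S in A))   # True: p = q on A
print(len(agree), len(power_set))   # 6 16: p and q disagree on 10 of the 16 events
# `agree` is an S-class: on a finite set it suffices that B - S is in it whenever S ⊆ B.
print(all(B - S in agree for S in agree for B in agree if S <= B))  # True
```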
Exercise 31. If X is a metric space and p, q ∈ P(X) with p(O) = q(O) for all open subsets O of X, then p = q. Give two proofs of this, one that uses the uniqueness part of Carathéodory's Extension Theorem, and another that uses the Sierpinski Class Lemma.
Exercise 32.H Let X be a nonempty set. A class M ⊆ 2^X is said to be a monotone class on X if, for any (A_m) ∈ M^∞, A_1 ⊆ A_2 ⊆ ··· implies $\bigcup_{i=1}^\infty A_i \in M$, and A_1 ⊇ A_2 ⊇ ··· implies $\bigcap_{i=1}^\infty A_i \in M$.
(a) Show that a monotone class on X which is an algebra is a σ-algebra on X.
(b) Show that if A ⊆ 2^X is an algebra, then the smallest monotone class that contains A, denoted m(A), must be an algebra on X.
(c) (Halmos) Prove the Monotone Class Lemma: If A is an algebra on X, and if A* is a monotone class on X, then A ⊆ A* implies σ(A) ⊆ A*.
(d ) Prove the uniqueness part of Carathéodory’s Extension Theorem by using the Monotone Class Lemma.
20 For concreteness, here is a direct proof. Let p and q be two such probability measures. Then p and q agree on both {a, b} and {b}, so that p({a}) = p({a, b}) − p({b}) = q({a, b}) − q({b}) = q({a}). One can similarly show that p({c}) = q({c}). Finally, these measures agree on {d} as well, because $p(\{d\}) = p(X) - \sum_{t\in\{a,b,c\}} p(\{t\}) = q(X) - \sum_{t\in\{a,b,c\}} q(\{t\}) = q(\{d\})$.
5 Random Variables
One is often interested in a particular characteristic of the outcome of a random experiment. To deal with such situations we need to transform a given probability space (that models the mother experiment) into another probability space whose sample space is a subset of R (or of a more complex metric space). This transformation is done by means of a random variable. For instance, consider the experiment of tossing (independently) two fair dice, and suppose that for some reason we are interested in the sum of the faces of these dice. We could model here the mother experiment by means of the probability space (X, 2^X, p) where X := {1, ..., 6}^2 and p(S) := |S|/36 for any S ∈ 2^X. On the other hand, this is not immediately useful, for we are interested in the experiment only insofar as its implications for the sum of the faces of the two dice are concerned. To obtain the probability space that is tailored for our purposes here, we would use the map x : X → {2, ..., 12} which is defined by x(i, j) := i + j. (This map is an example of a random variable.) Indeed, the probability space we are after is none other than (Y, 2^Y, q), where Y = {2, ..., 12} and q(S) := p({ω ∈ X : x(ω) ∈ S}) for each S ∈ 2^Y. Of course, we could get to this space directly by defining $q(S) := \sum_{a \in S} \frac{1}{36}(6 - |7 - a|)$ for each S ∈ 2^Y, but as you will see, the previous method is far superior.
None of this is really new to you, so let us move on to the formal development
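The two routes to q in the dice example can be compared directly. In this sketch (the function names are hypothetical), q is computed as p(x⁻¹(S)) by enumerating the 36 outcomes of the mother experiment, and the result is checked against the direct formula.

```python
from fractions import Fraction
from itertools import product

X = list(product(range(1, 7), repeat=2))   # the 36 outcomes (i, j) of the mother experiment

def p(S):
    """Probability on 2^X: p(S) = |S| / 36."""
    return Fraction(len(S), 36)

def x(omega):
    """The random variable x(i, j) = i + j."""
    i, j = omega
    return i + j

def q(S):
    """Induced probability on Y = {2, ..., 12}: q(S) = p({omega in X : x(omega) in S})."""
    return p([omega for omega in X if x(omega) in S])

def q_direct(S):
    """The direct formula: q(S) = sum over a in S of (6 - |7 - a|) / 36."""
    return sum((Fraction(6 - abs(7 - a), 36) for a in S), Fraction(0))

print(q({7}), q_direct({7}))                                # 1/6 1/6
print(all(q({a}) == q_direct({a}) for a in range(2, 13)))   # True
```

The first route only needs p and x, and would work unchanged for any other function of the two faces; the direct formula has to be worked out anew for each such function.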
5.1 Random Variables as Measurable Functions
Here is the formal definition of a random variable.
Definition. Let (X, Σ) be a measurable space. A mapping x : X → R such that x⁻¹(B) ∈ Σ for every Borel subset B of R is called a random variable on (X, Σ). More generally, if Y is a metric space, and x is a map from X into Y such that x⁻¹(B) ∈ Σ for every B ∈ B(Y), then x is called a Y-valued random variable on (X, Σ).21
Notation. In this book the set of all random variables on a measurable space (X, Σ) is denoted as RV(X, Σ). Moreover, we define
RV_+(X, Σ) := {x ∈ RV(X, Σ) : x ≥ 0}.
(This notation is not standard in the literature.)
A few remarks on terminology are in order.
Remark 1. [1] One often talks about a random variable "on (X, Σ, p)," but, strictly speaking, this means that x is a random variable on (X, Σ). Indeed, the measure p
21 Thus, by convention, I call an R-valued random variable simply a "random variable."
does not play any role in the definition of a random variable. As will be clear shortly, p is instead used to assign probabilities to events that are defined through a random variable on (X, Σ).
[2] In real analysis what we refer to here as a "random variable on (X, Σ)" is called a Σ-measurable real function on X. Furthermore, a random variable on a Borel probability space is said to be Borel measurable. While we mostly stick with the probabilistic jargon in this book, you should nonetheless familiarize yourself with this alternative terminology since it is widely used elsewhere. (In fact, we too will occasionally use this terminology in later chapters.)
[3] Many authors refer to an R^n-valued random variable as a random n-vector if n ≥ 2. Analogously, an R^∞-valued random variable may be called a random real sequence, a B[0, 1]-valued random variable a random bounded map on [0, 1], and so on. More generally, Resnick (1999) refers to a Y-valued random variable (for any metric space Y) as a random element of Y. None of these terminologies is
Let (X, Σ) be a measurable space and Y a metric space. In principle, to verify that a map x : X → Y is a Y-valued random variable, we need to show that x⁻¹(B) ∈ Σ for every B ∈ B(Y). Fortunately, there is a redundancy in this definition. Indeed, if, for some class A of Borel subsets of Y that generates B(Y), we have
x⁻¹(A) ∈ Σ for every A ∈ A,
then we may conclude that x is a Y-valued random variable. For instance, if
x⁻¹(O) ∈ Σ for every open subset O of Y, or
x⁻¹(S) ∈ Σ for every closed subset S of Y,
then x is a Y-valued random variable. These observations, which are routinely invoked in practice, are proved as follows.
Exercise 33. Let (X, Σ) be a measurable space, Y a metric space, and A any class of subsets of Y such that σ(A) = B(Y). Prove:
(a) for any map x : X → Y, {A ⊆ Y : x⁻¹(A) ∈ Σ} is a σ-algebra on Y;
(b) a map x : X → Y is a Y-valued random variable on (X, Σ) iff x⁻¹(A) ∈ Σ for all A ∈ A.
The following simple consequence of this exercise, and the fact that the set of all intervals of the form (−∞, a] generates B(R) (Example 2), is used so frequently that it may be a good idea to single it out.
Observation 1. Given any measurable space (X, Σ), a map x : X → R is a random variable if, and only if,
{ω ∈ X : x(ω) ≤ a} ∈ Σ for any a ∈ R.22
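Observation 1 turns measurability into a finite check when X is finite. The sketch below assumes a hypothetical measurable space: X = {1, ..., 6} with Σ generated by the partition {{1, 2}, {3, 4}, {5, 6}}, so that Σ consists of all unions of blocks. For such an X, the set {x ≤ a} changes only when a crosses a value of x, so it suffices to test a ∈ x(X).

```python
from itertools import combinations

# Hypothetical finite measurable space: X = {1, ..., 6}, Sigma = all unions of the blocks below.
X = frozenset(range(1, 7))
blocks = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
Sigma = {frozenset().union(*c) for r in range(len(blocks) + 1) for c in combinations(blocks, r)}

def is_random_variable(x):
    """Observation 1 on a finite X: x is a random variable iff {w : x(w) <= a} is in Sigma for all a.
    The set {x <= a} is empty for a < min x(X) and otherwise equals {x <= v} for the largest
    value v of x with v <= a, so testing a in x(X) is enough."""
    return all(frozenset(w for w in X if x(w) <= a) in Sigma for a in {x(w) for w in X})

print(len(Sigma))                                   # 8
print(is_random_variable(lambda w: (w + 1) // 2))   # True: constant on each block
print(is_random_variable(lambda w: w))              # False: {w : w <= 1} = {1} is not in Sigma
```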
We should now go through some examples. But first, let us agree on the following jargon.
Definition. Given any metric space Y, a Y-valued random variable (in particular, a random variable) is called simple if its range is a finite set, and discrete if its range is countable.
Example 6. [1] If X is a nonempty set and Y a metric space, then any map from X into Y is a Y-valued random variable on (X, 2^X); in particular, R^X = RV(X, 2^X). If X is finite (countable), then any such map is a simple (discrete) random variable on (X, 2^X).
[2] Let (X, Σ) be a measurable space. For any event A ∈ Σ, recall that the indicator function of A, 1_A ∈ {0, 1}^X, is defined as
$$\mathbf{1}_A(\omega) := \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{otherwise.} \end{cases}$$
Clearly, 1_A, and more generally, $\sum_{i=1}^m a_i \mathbf{1}_{A_i}$, is a simple random variable on (X, Σ, p) for any A, A_1, ..., A_m ∈ Σ and a_1, ..., a_m ∈ R, m being any positive integer. Any simple random variable can be expressed in this way. (Really?)
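On a finite set, one can pass from any representation Σ a_i 1_{A_i} (with possibly overlapping A_i) to the representation that uses the distinct values of x and their level sets x⁻¹(a). The sketch below (the helper names are hypothetical) performs this conversion and checks that both representations define the same function.

```python
def indicator(A):
    """The indicator function 1_A: 1 on A, 0 elsewhere."""
    return lambda w: 1 if w in A else 0

def linear_combination(coeffs, sets):
    """The simple function sum_i a_i 1_{A_i}."""
    inds = [indicator(A) for A in sets]
    return lambda w: sum(a * ind(w) for a, ind in zip(coeffs, inds))

def canonical_form(x, X):
    """For x with finite range on X: return the distinct values a and the level sets x^{-1}(a),
    so that x = sum over a of a * 1_{x^{-1}(a)}."""
    values = sorted({x(w) for w in X})
    return values, [frozenset(w for w in X if x(w) == a) for a in values]

X = range(10)
x = linear_combination([2, -1, 5], [{0, 1, 2, 3}, {2, 3, 4}, {9}])   # overlapping A_i are allowed
values, level_sets = canonical_form(x, X)
y = linear_combination(values, level_sets)
print(values)                          # [-1, 0, 1, 2, 5]
print(all(x(w) == y(w) for w in X))    # True
```

The level sets x⁻¹(a) are pairwise disjoint and cover X, which is why this particular representation is convenient later on.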
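[3] Let X and Y be two metric spaces. If x ∈ Y^X is continuous, then x is a Y-valued random variable on (X, B(X)). (Proof. Apply Exercise 33.) In particular, the map x : [0, 1] → R defined by
$$x(\omega) := \begin{cases} \frac{1}{2}, & \text{if } \omega = \frac{1}{2}, \\ \frac{1}{2\omega - 1}, & \text{otherwise}, \end{cases}$$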
22 In this statement, ≤ can be replaced with <. Right?
is a random variable on ([0, 1], B[0, 1]).23
[4] Let x and y be random variables on a measurable space (X, Σ). Then x + y is a random variable on this space as well since, for every real number a, we have
$$(x + y)^{-1}((-\infty, a)) = \bigcup_{r \in \mathbb{Q}} \{\omega : x(\omega) < r \text{ and } y(\omega) < a - r\}.$$
But the latter set lies in Σ, for, thanks to Observation 1 (in its strict-inequality form; see footnote 22), both x⁻¹((−∞, r)) and y⁻¹((−∞, a − r)) belong to Σ for any r ∈ Q.
By induction, we may generalize this finding in the obvious way: If x_1, ..., x_m ∈ RV(X, Σ), then $\sum_{i=1}^m x_i \in$ RV(X, Σ).24
[5] Let x and y be two random variables on a measurable space (X, Σ). If f ∈ C(R^2), then f(x, y) is a random variable on (X, Σ). To see this, let O be any open set in R and observe that f⁻¹(O) is open in R^2 by continuity of f. But every open set in R^2 can be expressed as a countable union of open rectangles with sides parallel to the axes. (Why?) So, we may write $f^{-1}(O) = \bigcup_{i=1}^\infty (I_1^i \times I_2^i)$, where (I_1^m) and (I_2^m) are two sequences of open intervals in R. Thus,
$$(f(x, y))^{-1}(O) = \bigcup_{i=1}^{\infty} \big(x^{-1}(I_1^i) \cap y^{-1}(I_2^i)\big) \in \Sigma,$$
and hence f(x, y) is a random variable on (X, Σ). This argument extends to any finite number of random variables in the obvious way.
[6] Let (x_m) be an increasing sequence of random variables on a measurable space (X, Σ). (By increasing here, we mean that x_1 ≤ x_2 ≤ ···.) Then x := lim x_m (i.e., the pointwise limit of the sequence (x_m)) is an extended real-valued (that is, $\overline{\mathbb{R}}$-valued) random variable. In particular, if we know that x(X) ⊆ R, then x ∈ RV(X, Σ). Indeed, the hypothesis x_1 ≤ x_2 ≤ ··· guarantees that $x^{-1}((-\infty, a]) = \bigcap_{i=1}^\infty x_i^{-1}((-\infty, a])$ for each a ∈ R, so Observation 1 yields the assertion easily.
[7] Using part [4], we may conclude that RV(X, Σ) is a linear space under the usual addition and scalar multiplication operations. The set of all simple random variables on (X, Σ) constitutes a subspace of this linear space. By part [3], if X is a metric space, then C(X) is also a subspace of RV(X, B(X)). (The same is not true for the set of all monotonic functions on X, since this set is not a linear space in its own right.)
While neither B(X) nor RV(X, Σ) includes the other,25 the set RV_b(X, Σ) := B(X) ∩ RV(X, Σ) can be viewed as a normed linear subspace of B(X).26 As you are asked to prove in Exercise 37, this subspace is closed, and is thus a Banach space.
Exercise 34. Let (X, Σ) be a measurable space, and Y and Z two metric spaces. Let x be a Y-valued random variable on (X, Σ), and y a Z-valued random variable on (Y, B(Y)). Show that y ∘ x is a Z-valued random variable on (X, Σ).
Exercise 35.H Show that if x ∈ R^X is a random variable on a measurable space (X, Σ), then so is |x|, but not conversely.
Exercise 36. Given any metric space X, show that any upper (or lower) semicontinuous map x ∈ R^X is a random variable on (X, B(X)).
Exercise 37.H Let (X, Σ) be a measurable space, and (x_m) a sequence of R-valued random variables on (X, Σ). Show that inf x_m, sup x_m, lim inf x_m and lim sup x_m are $\overline{\mathbb{R}}$-valued random variables on (X, Σ).27 So if lim x_m is a real-valued function, then it is a random variable.
Exercise 38.H Let X be a metric space. Show that B(X) is the smallest σ-algebra Σ on X such that C(X) ⊆ RV(X, Σ).
5.2 The Distribution of a Random Variable
Now that we have played around with its formal definition a bit, let us recall the idea behind the concept of random variable. We begin with a random experiment modeled by a probability space (X, Σ, p). Then we pick a function x mapping X to R, so the values of x are random in the sense that x assumes a given value a iff a certain event occurs in the experiment. A natural candidate for this event is, of course, x⁻¹(a). But for this to make sense formally, the set x⁻¹(a) must really be an event, which explains why we require x⁻¹(a) ∈ Σ when defining a random variable. More generally, to assess the likelihood of the event that the value of x belongs to some open (or closed, or semiclosed) interval I, we need x⁻¹(I) to belong to Σ. A random variable x on (X, Σ) is a real map on X that has precisely this property.
So, for any B ∈ B(R), what is the probability of x⁻¹(B)? Given the probability space at hand, we have an obvious way of assigning a probability to this event, namely, p(x⁻¹(B)). (Notice that we can do this precisely because x⁻¹(B) ∈ Σ = the domain
27 Notation. These functions are understood to be defined pointwise. That is, sup x_m is the map defined on X by ω ↦ sup x_m(ω), lim inf x_m is the map defined by ω ↦ lim inf x_m(ω), etc.
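Definition. A random variable x on a probability space (X, Σ, p) induces a Borel probability measure p_x on R as follows:
$$p_x(B) := p(x^{-1}(B)) \quad \text{for each } B \in \mathcal{B}(\mathbb{R}).$$
We need to check that p_x ∈ P(R), that is, p_x is really a Borel probability measure. Of course, we have p_x(R) = p(x⁻¹(R)) = p(X) = 1 and similarly p_x(∅) = 0. On the other hand, for any pairwise disjoint sequence (A_m) in B(R), we have x⁻¹(A_i) ∩ x⁻¹(A_j) = ∅ for each i ≠ j, and hence
$$p_x\Big(\bigcup_{i=1}^{\infty} A_i\Big) = p\Big(\bigcup_{i=1}^{\infty} x^{-1}(A_i)\Big) = \sum_{i=1}^{\infty} p(x^{-1}(A_i)) = \sum_{i=1}^{\infty} p_x(A_i).$$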
We say that the random variable x is continuous if F_x is a continuous function. (So, p({ω : x(ω) = a}) = p_x({a}) = 0 for any continuous random variable x and a ∈ R. Why?)
Warning. A continuous random variable is distinct from a random variable which is continuous. The former concept is universally defined, while the latter concept demands (something like) a metric structure on the sample space. Even in the case of Borel probability spaces these are distinct notions. If X is finite, any real function on X is a random variable on (X, 2^X) which is continuous on X (where we view X as metrized by the discrete metric). But, obviously, in this case no real function on X is a continuous random variable.
28 Again, there is nothing probabilistic about the notion of random variable. It is thus a bit silly to say that x is defined on a probability space (X, Σ, p): x is fully identified by the measurable space (X, Σ). However, in probability theory, one is foremost interested in the distribution of a random variable, and that surely depends on the probability measure that one uses on (X, Σ). For this reason, probabilists often talk of "a random variable x on a probability space (X, Σ, p)," and henceforth, I will adhere to this convention as well.
Remark 2. We often describe the probabilistic behavior of a random variable by specifying its distribution function directly. So the statement "let x be a random variable on X with the distribution function F" means that x is a random variable on some probability space (X, Σ, p) such that F = F_x, that is, p_x = p ∘ x⁻¹ induces the distribution function F (or, equivalently, p_x is the Lebesgue-Stieltjes measure induced by F). In turn, a distribution function F is often defined in terms of a piecewise continuous function f : R → R+ with $\int_{-\infty}^{\infty} f(t)\,dt = 1$ as follows:29
$$F(a) = \int_{-\infty}^{a} f(t)\,dt.$$
In this case, f is said to be a density for F, or equivalently, we say that F is induced
by the density function f.30
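The relation between a density f and the distribution function it induces can be checked numerically. The sketch below approximates F(x) = ∫_{−∞}^x f(t) dt by the midpoint rule for the uniform density on [2, 5] (the first of the examples that follow); the endpoints, the helper names and the number of subintervals are arbitrary choices made for illustration. The results should match (x − 2)/3 on [2, 5], with 0 to the left and 1 to the right.

```python
def uniform_density(a, b):
    """The uniform density on [a, b]: 1/(b - a) on [a, b] and 0 elsewhere."""
    return lambda t: 1.0 / (b - a) if a <= t <= b else 0.0

def F_from_density(f, x, lower, n=100_000):
    """Approximate F(x) = integral of f over (-inf, x] with the midpoint rule on [lower, x].
    Assumes f vanishes to the left of `lower`, so the omitted tail contributes nothing."""
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    return h * sum(f(lower + (k + 0.5) * h) for k in range(n))

f = uniform_density(2.0, 5.0)
for x in (1.0, 2.75, 3.5, 5.0, 6.0):
    print(x, round(F_from_density(f, x, lower=2.0), 6))   # compare with (x - 2)/3 clipped to [0, 1]
```

The midpoint rule is exact for a constant integrand, so the only error here comes from the single subinterval straddling the jump of f at b, and that error is at most of the order of the step size h.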
Two typical examples of distribution functions induced by density functions are the uniform distribution on [a, b], a < b, for which
$$f(t) = \begin{cases} \frac{1}{b-a}, & a \le t \le b, \\ 0, & \text{otherwise}, \end{cases}$$
Remark 3. In Section 3.3 we have seen that there is a one-to-one correspondence between Borel probability measures on R and distribution functions. Interestingly, there is a similar relation between such measures and random variables on the probability space ((0, 1), B(0, 1), ℓ) as well. One direction is indeed obvious: any such random variable x induces a Borel probability measure on R, namely, its distribution ℓ_x. Conversely, every Borel probability measure p on R arises this way, that is, p equals the distribution of some such x. This is easy to see if p is induced by a continuous and strictly increasing distribution function, say F. Since F is then a bijection from R
|a − b| < ε/K, we have $F(b) - F(a) \le \int_a^b |f(t)|\,dt \le K(b - a) < \varepsilon$.) The claim is in fact true for any Riemann integrable f, but we don't need to prove this here.