
Probability Theory

Richard F. Bass

These notes are copyrighted by the author and may be used, but not for commercial purposes.

Instead of saying a property occurs almost everywhere, we talk about properties occurring almost surely, written a.s. Real-valued measurable functions from Ω to R are called random variables and are usually denoted by X or Y or other capital letters. We often abbreviate "random variable" by r.v.

We let A^c = {ω ∈ Ω : ω ∉ A} (called the complement of A) and B − A = B ∩ A^c.

Integration (in the sense of Lebesgue) is called expectation or expected value, and we write E X for ∫ X dP. The notation E[X; A] is often used for ∫_A X dP.

The random variable 1_A is the function that is one if ω ∈ A and zero otherwise. It is called the indicator of A (the name characteristic function in probability refers to the Fourier transform). Events such as (ω : X(ω) > a) are almost always abbreviated by (X > a).

Given a random variable X, we can define a probability on R by

P_X(A) = P(X ∈ A),   A ⊂ R Borel.

The probability P_X is called the law of X or the distribution of X. We define F_X : R → [0, 1] by

F_X(x) = P_X((−∞, x]) = P(X ≤ x).

The function F_X is called the distribution function of X.

As an example, let Ω = {H, T}, F all subsets of Ω (there are 4 of them), P(H) = P(T) = 1/2. Let X(H) = 1 and X(T) = 0. Then P_X = (1/2)δ_0 + (1/2)δ_1, where δ_x is point mass at x, that is, δ_x(A) = 1 if x ∈ A and 0 otherwise. F_X(a) = 0 if a < 0, 1/2 if 0 ≤ a < 1, and 1 if a ≥ 1.
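A quick numerical illustration of this example (a sketch added here, not part of the notes; the sample size and seed are arbitrary choices):

```python
import numpy as np

# Simulate the example: Omega = {H, T}, P(H) = P(T) = 1/2, X(H) = 1, X(T) = 0.
rng = np.random.default_rng(0)
samples = rng.integers(0, 2, size=100_000)   # draws of X: 0 or 1, each with probability 1/2

def empirical_FX(a):
    """Empirical distribution function P(X <= a) estimated from the samples."""
    return np.mean(samples <= a)

for a in (-0.5, 0.0, 0.5, 1.0, 1.5):
    exact = 0.0 if a < 0 else (0.5 if a < 1 else 1.0)
    print(f"F_X({a:5.2f}) ~ {empirical_FX(a):.3f}   exact {exact:.3f}")
```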

Proposition 1.1 The distribution function FX of a random variable X satisfies:

(a) FX is nondecreasing;

(b) FX is right continuous with left limits;

(c) lim_{x→∞} F_X(x) = 1 and lim_{x→−∞} F_X(x) = 0.

Proof. We prove the first part of (b) and leave the others to the reader. If x_n ↓ x, then (X ≤ x_n) ↓ (X ≤ x), and so P(X ≤ x_n) ↓ P(X ≤ x) since P is a measure.

Note that if x_n ↑ x, then (X ≤ x_n) ↑ (X < x), and so F_X(x_n) ↑ P(X < x).

Any function F : R → [0, 1] satisfying (a)-(c) of Proposition 1.1 is called a distribution function, whether or not it comes from a random variable.


Proposition 1.2. Suppose F is a distribution function. There exists a random variable X such that F = F_X.

Proof. Let Ω = [0, 1], F the Borel σ-field, and P Lebesgue measure. Define X(ω) = sup{x : F(x) < ω}. It is routine to check that F_X = F.

In the above proof, essentially X = F^{−1}. However F may have jumps or be constant over some intervals, so some care is needed in defining X.
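A minimal sketch of this construction (my own illustration; the exponential choice of F, the grid, and the sample size are assumptions made for the example), approximating the sup over a fine grid:

```python
import numpy as np

# Proposition 1.2 in code: with omega ~ uniform[0,1] and X(omega) = sup{x : F(x) < omega},
# X should have distribution function F.  Here F is the exponential(1) distribution function.
rng = np.random.default_rng(1)

F = lambda x: 1.0 - np.exp(-x)            # a concrete distribution function to test
grid = np.linspace(0.0, 20.0, 200_001)
F_grid = F(grid)                          # nondecreasing values of F on the grid

omega = rng.uniform(size=50_000)
idx = np.searchsorted(F_grid, omega)      # first grid index with F(x) >= omega
X = grid[np.maximum(idx - 1, 0)]          # last grid point with F(x) < omega

for x in (0.5, 1.0, 2.0):
    print(f"P(X <= {x}) ~ {np.mean(X <= x):.3f}    F({x}) = {F(x):.3f}")
```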

Certain distributions or laws are very common. We list some of them.

(a) Bernoulli. A random variable is Bernoulli if P(X = 1) = p, P(X = 0) = 1 − p for some p ∈ [0, 1].

(b) Binomial. This is defined by P(X = k) = (n choose k) p^k (1 − p)^{n−k}, where n is a positive integer, 0 ≤ k ≤ n, and p ∈ [0, 1].

(c) Geometric. For p ∈ (0, 1) we set P(X = k) = (1 − p)p^k. Here k is a nonnegative integer.

(d) Poisson. For λ > 0 we set P(X = k) = e^{−λ}λ^k/k!. Again k is a nonnegative integer.

(e) Uniform. For some positive integer n, set P(X = k) = 1/n for 1 ≤ k ≤ n.

If F is absolutely continuous, we call f = F′ the density of F. Some examples of distributions characterized by densities are the following.

(f) Uniform on [a, b]. Define f(x) = (b − a)^{−1} 1_{[a,b]}(x). This means that if X has a uniform distribution, then

P(X ∈ A) = ∫_A (b − a)^{−1} 1_{[a,b]}(x) dx.

(g) Exponential. For x > 0 let f(x) = λe^{−λx}.

(h) Standard normal. Define f(x) = (1/√(2π)) e^{−x²/2}, so

P(X ∈ A) = (1/√(2π)) ∫_A e^{−x²/2} dx.

(i) N(µ, σ²). We shall see later that a standard normal has mean zero and variance one. If Z is a standard normal, then a N(µ, σ²) random variable has the same distribution as µ + σZ. It is an exercise in calculus to check that such a random variable has density

(1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}.

We can use the law of a random variable to calculate expectations.

Proposition 1.3. If g is bounded or nonnegative, then

E g(X) = ∫ g(x) P_X(dx).

Proof. If g is the indicator of an event A, this is just the definition of P_X. By linearity, the result holds for simple functions. By the monotone convergence theorem, the result holds for nonnegative functions, and by linearity again, it holds for bounded g.


If F_X has a density f, then P_X(dx) = f(x) dx. So, for example, E X = ∫ x f(x) dx and E X² = ∫ x² f(x) dx. (We need E|X| finite to justify this if X is not necessarily nonnegative.)

We define the mean of a random variable to be its expectation, and the variance of a random variable is defined by

Var X = E(X − E X)².

For example, it is routine to see that the mean of a standard normal is zero and its variance is one.

Note

Var X = E(X² − 2X E X + (E X)²) = E X² − (E X)².

Another equality that is useful is the following.

Proposition 1.4. If X ≥ 0 a.s. and p > 0, then

E X^p = ∫_0^∞ p λ^{p−1} P(X > λ) dλ.

The proof will show that this equality is also valid if we replace P(X > λ) by P(X ≥ λ).

Proof. Use Fubini's theorem and write

∫_0^∞ p λ^{p−1} P(X > λ) dλ = E ∫_0^∞ p λ^{p−1} 1_{(λ,∞)}(X) dλ = E ∫_0^X p λ^{p−1} dλ = E X^p.
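As a numerical sanity check (a sketch I am adding; the exponential(1) example, p = 2.5, and the truncation of the integral at 30 are arbitrary choices), both sides of Proposition 1.4 can be estimated from a sample:

```python
import numpy as np

# Estimate E X^p directly and via the tail formula of Proposition 1.4.
rng = np.random.default_rng(2)
X = np.sort(rng.exponential(scale=1.0, size=500_000))
p = 2.5

lhs = np.mean(X ** p)                               # Monte Carlo estimate of E X^p

lam = np.linspace(0.0, 30.0, 30_001)                # truncate the integral; the tail is tiny here
tail = 1.0 - np.searchsorted(X, lam) / len(X)       # empirical P(X > lam)
rhs = np.sum(p * lam ** (p - 1) * tail) * (lam[1] - lam[0])   # Riemann sum for the integral

print(lhs, rhs)    # both should be close to Gamma(1 + p) ~ 3.32
```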

We need two elementary inequalities.

Proposition 1.5 (Chebyshev's inequality). If X ≥ 0 and a > 0, then

P(X ≥ a) ≤ E X / a.

If we apply this to X = (Y − E Y)², we obtain

P(|Y − E Y| ≥ a) = P((Y − E Y)² ≥ a²) ≤ Var Y / a².     (1.4)

This special case of Chebyshev's inequality is sometimes itself referred to as Chebyshev's inequality, while Proposition 1.5 is sometimes called the Markov inequality.
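A short check of (1.4) by simulation (my own sketch; the exponential(1) choice of Y and the values of a are arbitrary):

```python
import numpy as np

# Compare P(|Y - E Y| >= a) with the Chebyshev bound Var Y / a^2.
rng = np.random.default_rng(3)
Y = rng.exponential(scale=1.0, size=1_000_000)      # mean 1, variance 1
m, v = Y.mean(), Y.var()

for a in (1.0, 2.0, 3.0):
    prob = np.mean(np.abs(Y - m) >= a)
    print(f"a = {a}:  P(|Y - EY| >= a) ~ {prob:.4f},  bound Var Y / a^2 = {v / a**2:.4f}")
```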

The second inequality we need is Jensen's inequality, not to be confused with Jensen's formula of complex analysis.


Proposition 1.6. Suppose g is convex and X and g(X) are both integrable. Then

g(E X) ≤ E g(X).

Proof. One property of convex functions is that they lie above their tangent lines, and more generally their support lines. So if x_0 ∈ R, we have

g(x) ≥ g(x_0) + c(x − x_0)

for some constant c. Take x = X(ω) and take expectations to obtain

E g(X) ≥ g(x_0) + c(E X − x_0).

Now set x_0 equal to E X.

If A_n is a sequence of sets, define (A_n i.o.), read "A_n infinitely often," by

(A_n i.o.) = ∩_{n=1}^∞ ∪_{i=n}^∞ A_i.

This set consists of those ω that are in infinitely many of the A_n.

A simple but very important proposition is the Borel-Cantelli lemma. It has two parts, and we prove the first part here, leaving the second part to the next section.

Proposition 1.7 (Borel-Cantelli lemma). If Σ_n P(A_n) < ∞, then P(A_n i.o.) = 0.

Proof. Since (A_n i.o.) ⊂ ∪_{i=n}^∞ A_i for each n, we have P(A_n i.o.) ≤ Σ_{i=n}^∞ P(A_i), and the right hand side tends to 0 as n → ∞ because Σ_i P(A_i) < ∞.

2 Independence

We say two events A and B are independent if P(A ∩ B) = P(A)P(B). If A and B are independent, then so are A^c and B, since

P(A^c ∩ B) = P(B) − P(A ∩ B) = P(B) − P(A)P(B) = P(B)(1 − P(A)) = P(B)P(A^c).

We say two σ-fields F and G are independent if A and B are independent whenever A ∈ F and B ∈ G. Two random variables X and Y are independent if the σ-field generated by X and the σ-field generated by Y are independent. (Recall that the σ-field generated by a random variable X is given by {(X ∈ A) : A a Borel subset of R}.) We define the independence of n σ-fields or n random variables in the obvious way.

Proposition 2.1 tells us that A and B are independent if the random variables 1_A and 1_B are independent, so the definitions above are consistent.

If f and g are Borel functions and X and Y are independent, then f(X) and g(Y) are independent. This follows because the σ-field generated by f(X) is a sub-σ-field of the one generated by X, and similarly for g(Y).

Let F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) denote the joint distribution function of X and Y. (The comma inside the set means "and.")

Proposition 2.2. F_{X,Y}(x, y) = F_X(x)F_Y(y) if and only if X and Y are independent.

Proof. If X and Y are independent, then 1_{(−∞,x]}(X) and 1_{(−∞,y]}(Y) are independent by the above comments. Using the above comments and the definition of independence, this shows F_{X,Y}(x, y) = F_X(x)F_Y(y).

Conversely, if the equality holds, fix y and let M_y denote the collection of sets A for which P(X ∈ A, Y ≤ y) = P(X ∈ A)P(Y ≤ y). M_y contains all sets of the form (−∞, x]. It follows by linearity that M_y contains all sets of the form (x, z], and then by linearity again, all sets that are finite unions of such half-open, half-closed intervals. Note that the collection of finite unions of such intervals, A, is an algebra generating the Borel σ-field. It is clear that M_y is a monotone class, so by the monotone class lemma, M_y contains the Borel σ-field.

For a fixed set A, let M_A denote the collection of sets B for which P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B). Again, M_A is a monotone class and by the preceding paragraph contains the σ-field generated by the collection of finite unions of intervals of the form (x, z], hence contains the Borel sets. Therefore X and Y are independent.

The following is known as the multiplication theorem.

Proposition 2.3. If X, Y, and XY are integrable and X and Y are independent, then E XY = (E X)(E Y).

Proof. Consider the random variables in σ(X) (the σ-field generated by X) and σ(Y) for which the multiplication theorem is true. It holds for indicators by the definition of X and Y being independent. It holds for simple random variables, that is, linear combinations of indicators, by linearity of both sides. It holds for nonnegative random variables by monotone convergence. And it holds for integrable random variables by linearity again.
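A one-line numerical illustration of the multiplication theorem (a sketch; the particular distributions are arbitrary choices):

```python
import numpy as np

# For independent X and Y, the sample mean of XY should match the product of the sample means.
rng = np.random.default_rng(4)
X = rng.exponential(scale=2.0, size=1_000_000)   # E X = 2
Y = rng.uniform(0.0, 1.0, size=1_000_000)        # E Y = 1/2, generated independently of X

print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # both close to 1
```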

Let us give an example of independent random variables. Let Ω = Ω_1 × Ω_2 and let P = P_1 × P_2, where (Ω_i, F_i, P_i) are probability spaces for i = 1, 2. We use the product σ-field. Then it is clear that F_1 and F_2 are independent by the definition of P. If X_1 is a random variable such that X_1(ω_1, ω_2) depends only on ω_1 and X_2 depends only on ω_2, then X_1 and X_2 are independent.

This example can be extended to n independent random variables, and in fact, if one has independent random variables, one can always view them as coming from a product space. We will not use this fact. Later on, we will talk about countable sequences of independent r.v.s and the reader may wonder whether such things can exist. That they can is a consequence of the Kolmogorov extension theorem; see PTA, for example.

If X_1, ..., X_n are independent, then so are X_1 − E X_1, ..., X_n − E X_n. Assuming everything is integrable,

E[(X_1 − E X_1) + · · · + (X_n − E X_n)]² = E(X_1 − E X_1)² + · · · + E(X_n − E X_n)²,

using the multiplication theorem to show that the expectations of the cross product terms are zero. We have thus shown

Var(X_1 + · · · + X_n) = Var X_1 + · · · + Var X_n.     (2.1)

We finish up this section by proving the second half of the Borel-Cantelli lemma.

Proposition 2.4. Suppose A_n is a sequence of independent events. If Σ_n P(A_n) = ∞, then P(A_n i.o.) = 1.

Note that here the A_n are independent, while in the first half of the Borel-Cantelli lemma no such assumption was necessary.

In this section we consider three ways a sequence of r.v.s X_n can converge.

We say X_n converges to X almost surely if (X_n ↛ X) has probability zero. X_n converges to X in probability if for each ε, P(|X_n − X| > ε) → 0 as n → ∞. X_n converges to X in L^p if E|X_n − X|^p → 0 as n → ∞.

The following proposition shows some relationships among the types of convergence

Proposition 3.1 (a) If Xn→ X a.s., then Xn→ X in probability

(b) If Xn→ X in Lp, then Xn→ X in probability

(c) If X_n → X in probability, there exists a subsequence n_j such that X_{n_j} converges to X almost surely.

Proof. To prove (a), note X_n − X tends to 0 almost surely, so 1_{(−ε,ε)^c}(X_n − X) also converges to 0 almost surely. Now apply the dominated convergence theorem.

(b) comes from Chebyshev's inequality:

P(|X_n − X| > ε) ≤ E|X_n − X|^p / ε^p → 0.

For an example where convergence in probability holds but almost sure convergence fails, let Ω be the unit circle with normalized arc length measure, let t_n = Σ_{i=1}^n i^{−1} (taken mod 2π), and let A_n = {θ : t_{n−1} ≤ θ < t_n}. Let X_n = 1_{A_n}. Any point on the unit circle will be in infinitely many A_n, so X_n does not converge almost surely to 0. But P(A_n) = 1/(2πn) → 0, so X_n → 0 in probability and in L^p.


4 Weak law of large numbers.

Suppose X_n is a sequence of independent random variables. Suppose also that they all have the same distribution, that is, F_{X_n} = F_{X_1} for all n. This situation comes up so often it has a name: independent, identically distributed, which is abbreviated i.i.d.

Theorem 4.1 (Weak law of large numbers). Suppose the X_i are i.i.d. and E X_1² < ∞. Let S_n = Σ_{i=1}^n X_i. Then S_n/n → E X_1 in probability.

Proof. Since the X_i are i.i.d., they all have the same expectation, and so E S_n = nE X_1. Hence E(S_n/n − E X_1)² is the variance of S_n/n. If ε > 0, by Chebyshev's inequality,

P(|S_n/n − E X_1| > ε) ≤ Var(S_n/n)/ε² = (Σ_{i=1}^n Var X_i)/(n²ε²) = n Var X_1/(n²ε²) = Var X_1/(nε²).     (4.1)

Since E X_1² < ∞, then Var X_1 < ∞, and the result follows by letting n → ∞.
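A simulation sketch of the weak law (not from the notes; the uniform(0, 1) summands, ε = 0.02, and 1000 repetitions are my choices): the probability of a deviation larger than ε shrinks as n grows.

```python
import numpy as np

# Estimate P(|S_n/n - E X_1| > eps) for increasing n, with X_i ~ uniform(0,1), E X_1 = 1/2.
rng = np.random.default_rng(5)
eps, trials = 0.02, 1000

for n in (10, 100, 1000, 10_000):
    means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)   # `trials` independent copies of S_n/n
    print(n, np.mean(np.abs(means - 0.5) > eps))
```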

A nice application of the weak law of large numbers is a proof of the Weierstrass approximation theorem.

Theorem 4.2. Suppose f is a continuous function on [0, 1] and ε > 0. There exists a polynomial P such that sup_{x∈[0,1]} |f(x) − P(x)| < ε.

Proof. Fix x ∈ [0, 1] for the moment. Let X_i be i.i.d. Bernoulli r.v.s with parameter x. Then S_n, the partial sum, is a binomial, and hence

P(x) = E f(S_n/n) = Σ_{k=0}^n f(k/n) (n choose k) x^k (1 − x)^{n−k}

is a polynomial in x. The mean of S_n/n is x. Since f is continuous on [0, 1], it is uniformly continuous; choose δ so that |f(y) − f(x)| < ε/2 whenever |y − x| ≤ δ, and let M = 2 sup_{x∈[0,1]} |f(x)|. We have

|P(x) − f(x)| = |E f(S_n/n) − f(E X_1)| ≤ E|f(S_n/n) − f(E X_1)|
≤ M P(|S_n/n − x| > δ) + ε/2.

By (4.1) the first term will be less than

M Var X_1/(nδ²) ≤ M x(1 − x)/(nδ²) ≤ M/(4nδ²),

which will be less than ε/2 if n is large enough, uniformly in x.
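The construction in the proof is easy to compute directly; the sketch below (my own, with an arbitrary continuous f) evaluates the Bernstein polynomial P(x) = E f(S_n/n) and reports the sup-norm error on a grid:

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Bernstein polynomial E f(S_n/n) at x, where S_n is binomial(n, x)."""
    k = np.arange(n + 1)
    coeff = np.array([comb(n, i) for i in k], dtype=float)
    weights = coeff * x ** k * (1.0 - x) ** (n - k)     # binomial(n, x) probabilities
    return np.sum(f(k / n) * weights)

f = lambda x: np.abs(x - 0.3) + np.sin(3 * x)           # an arbitrary continuous function on [0, 1]
xs = np.linspace(0.0, 1.0, 201)

for n in (10, 50, 200):
    err = max(abs(bernstein(f, n, x) - f(x)) for x in xs)
    print(n, err)     # the uniform error decreases as n grows
```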

5 Techniques related to almost sure convergence

Our aim is the strong law of large numbers (SLLN), which says that S_n/n converges to E X_1 almost surely if E|X_1| < ∞.

We first prove it under the assumption that E X_1^4 < ∞.


Proposition 5.1. Suppose X_i is an i.i.d. sequence with E X_i^4 < ∞ and E X_i = 0, and let S_n = Σ_{i=1}^n X_i. Then S_n/n → 0 a.s.

Proof. By Chebyshev's inequality,

P(|S_n/n| > ε) ≤ E S_n^4/(n^4 ε^4).

If we expand S_n^4, we will have terms involving X_i^4, terms involving X_i²X_j², terms involving X_i³X_j, terms involving X_i²X_jX_k, and terms involving X_iX_jX_kX_ℓ, with i, j, k, ℓ all being different. By the multiplication theorem and the fact that the X_i have mean 0, the expectations of all the terms will be 0 except for those of the first two types. So E S_n^4 ≤ c n² for some constant c, and hence

P(|S_n/n| > ε) ≤ c/(n²ε^4).

Consequently P(|S_n/n| > ε i.o.) = 0 by Borel-Cantelli. Since ε is arbitrary, this implies S_n/n → 0 a.s.

Before we can prove the SLLN assuming only the finiteness of first moments, we need some preliminaries.

Proposition 5.2. If Y ≥ 0, then E Y < ∞ if and only if Σ_n P(Y > n) < ∞.

Proof. By Proposition 1.4, E Y = ∫_0^∞ P(Y > x) dx. P(Y > x) is nonincreasing in x, so the integral is bounded above by Σ_{n=0}^∞ P(Y > n) and bounded below by Σ_{n=1}^∞ P(Y > n).

If X_i is a sequence of r.v.s, the tail σ-field is defined by ∩_{n=1}^∞ σ(X_n, X_{n+1}, ...). An example of an event in the tail σ-field is (lim sup_{n→∞} X_n > a). Another example is (lim sup_{n→∞} S_n/n > a). The reason for this is that if k < n is fixed,

S_n/n = S_k/n + (Σ_{i=k+1}^n X_i)/n.

The first term on the right tends to 0 as n → ∞. So lim sup S_n/n = lim sup (Σ_{i=k+1}^n X_i)/n, which is in σ(X_{k+1}, X_{k+2}, ...). This holds for each k. The set (lim sup S_n > a) is easily seen not to be in the tail σ-field.

Theorem 5.3 (Kolmogorov 0-1 law). If the X_i are independent, then the events in the tail σ-field have probability 0 or 1.

This implies that in the case of i.i.d. random variables, if S_n/n has a limit with positive probability, then it has a limit with probability one, and the limit must be a constant.

Proof. Let M be the collection of sets in σ(X_{n+1}, ...) that are independent of every set in σ(X_1, ..., X_n). M is easily seen to be a monotone class and it contains σ(X_{n+1}, ..., X_N) for every N > n. Therefore M must be equal to σ(X_{n+1}, ...).

If A is in the tail σ-field, then for each n, A is independent of σ(X_1, ..., X_n). The class M_A of sets independent of A is a monotone class, hence is a σ-field containing σ(X_1, ..., X_n) for each n. Therefore M_A contains σ(X_1, X_2, ...).


We thus have that the event A is independent of itself, or

P(A) = P(A ∩ A) = P(A)P(A) = P(A)².

This implies P(A) is zero or one.

The next proposition shows that in considering a law of large numbers we can consider truncated random variables.

Proposition 5.4. Suppose X_i is an i.i.d. sequence of r.v.s with E|X_1| < ∞. Let X′_n = X_n 1_{(|X_n|≤n)}. Then

(a) X_n converges almost surely if and only if X′_n does;

Next is Kolmogorov's inequality, a special case of Doob's inequality.

Proposition 5.5. Suppose the X_i are independent and E X_i = 0 for each i. Then

P(max_{1≤i≤n} |S_i| ≥ λ) ≤ E S_n²/λ².

Proof. Let A_k = (|S_k| ≥ λ, |S_1| < λ, ..., |S_{k−1}| < λ). Note the A_k are disjoint and that A_k ∈ σ(X_1, ..., X_k). Therefore A_k is independent of S_n − S_k. Then

E S_n² ≥ Σ_k E[S_n²; A_k] = Σ_k E[(S_k + (S_n − S_k))²; A_k] ≥ Σ_k E[S_k²; A_k] ≥ λ² Σ_k P(A_k) = λ² P(max_{1≤i≤n} |S_i| ≥ λ).

Our result is immediate from this.

The last result we need for now is a special case of what is known as Kronecker's lemma.

Proposition 5.6. Suppose x_i are real numbers and s_n = Σ_{i=1}^n x_i. If Σ_{j=1}^∞ (x_j/j) converges, then s_n/n → 0.

Proof. Let b_n = Σ_{j=1}^n (x_j/j), b_0 = 0, and suppose b_n → b. As is well known, this implies (Σ_{i=1}^n b_i)/n → b. Since x_i = i(b_i − b_{i−1}),

s_n = Σ_{i=1}^n i(b_i − b_{i−1}) = Σ_{i=1}^n i b_i − Σ_{i=1}^{n−1} (i + 1) b_i = n b_n − Σ_{i=1}^{n−1} b_i,

and therefore

s_n/n = b_n − (Σ_{i=1}^{n−1} b_i)/n → b − b = 0.
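A numerical illustration of Kronecker's lemma (a sketch with an arbitrary example): for x_j = (−1)^j j / log(j + 2) the series Σ x_j/j converges by the alternating series test, and s_n/n goes to 0, though only at a logarithmic rate.

```python
import numpy as np

# x_j = (-1)^j * j / log(j + 2):  sum of x_j / j converges, so s_n / n should tend to 0 (slowly).
j = np.arange(1, 200_001)
x = (-1.0) ** j * j / np.log(j + 2)

b = np.cumsum(x / j)      # partial sums of x_j / j  (these converge)
s = np.cumsum(x)          # s_n

for n in (10, 1000, 100_000, 200_000):
    print(n, round(b[n - 1], 4), round(s[n - 1] / n, 4))
```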


6 Strong law of large numbers.

This section is devoted to a proof of Kolmogorov's strong law of large numbers. We showed earlier that if E X_i² < ∞, where the X_i are i.i.d., then the weak law of large numbers (WLLN) holds: S_n/n converges to E X_1 in probability. The WLLN can be improved greatly; it is enough that xP(|X_1| > x) → 0 as x → ∞. Here we show the strong law (SLLN): if one has a finite first moment, then there is almost sure convergence. First we need a lemma.

Lemma 6.1. Suppose V_i is a sequence of independent r.v.s, each with mean 0. Let W_n = Σ_{i=1}^n V_i. If Σ_{i=1}^∞ Var V_i < ∞, then W_n converges almost surely.

Proof. Choose n_j > n_{j−1} such that Σ_{i=n_j}^∞ Var V_i < 2^{−3j}. If n > n_j, then applying Kolmogorov's inequality shows that

P(max_{n_j≤i≤n} |W_i − W_{n_j}| > 2^{−j}) ≤ 2^{−3j}/2^{−2j} = 2^{−j}.

Letting n → ∞, we have P(A_j) ≤ 2^{−j}, where

A_j = (max_{n_j≤i} |W_i − W_{n_j}| > 2^{−j}).

By the Borel-Cantelli lemma, P(A_j i.o.) = 0.

Suppose ω ∉ (A_j i.o.). Let ε > 0. Choose j large enough so that 2^{−j+1} < ε and ω ∉ A_j. If n, m > n_j, then

|W_n − W_m| ≤ |W_n − W_{n_j}| + |W_m − W_{n_j}| ≤ 2^{−j+1} < ε.

Since ε is arbitrary, W_n(ω) is a Cauchy sequence, and hence converges.

Theorem 6.2 (SLLN). Let X_i be a sequence of i.i.d. random variables. Then S_n/n converges almost surely if and only if E|X_1| < ∞, and in that case the limit is E X_1.

Proof. First suppose S_n/n converges almost surely. Then X_n/n = S_n/n − ((n − 1)/n)(S_{n−1}/(n − 1)) → 0 a.s., so P(|X_n| > n i.o.) = 0. Since the events (|X_n| > n) are independent, the second half of the Borel-Cantelli lemma gives Σ_{n=1}^∞ P(|X_n| > n) = Σ_{n=1}^∞ P(|X_1| > n) < ∞, and by Proposition 5.2, E|X_1| < ∞.

Now suppose E|X_1| < ∞. By looking at X_i − E X_i, we may suppose without loss of generality that E X_i = 0. We truncate, and let Y_i = X_i 1_{(|X_i|≤i)}. It suffices to show Σ_{i=1}^n Y_i/n → 0 a.s., by Proposition 5.4. Next we estimate. We have

E Y_i = E[X_i 1_{(|X_i|≤i)}] = E[X_1 1_{(|X_1|≤i)}] → E X_1 = 0.


The convergence follows by the dominated convergence theorem, since the integrands are bounded by |X1|.

To estimate the second moments of the Y_i, we write

E Y_i² = 2 ∫_0^∞ y P(|Y_i| > y) dy ≤ 2 ∫_0^∞ 1_{(y≤i)} y P(|X_1| ≥ y) dy,

and therefore, using Σ_{i : i≥y} i^{−2} ≤ 2/y,

Σ_{i=1}^∞ E Y_i²/i² ≤ 2 ∫_0^∞ (Σ_{i : i≥y} i^{−2}) y P(|X_1| ≥ y) dy ≤ 4 ∫_0^∞ P(|X_1| ≥ y) dy = 4 E|X_1| < ∞.

Let U_i = Y_i − E Y_i. Then Var U_i = Var Y_i ≤ E Y_i², and by the above,

Σ_{i=1}^∞ Var(U_i/i) < ∞.

By Lemma 6.1 (with V_i = U_i/i), Σ_{i=1}^n (U_i/i) converges almost surely. By Kronecker's lemma, (Σ_{i=1}^n U_i)/n → 0 almost surely. Finally, since E Y_i → 0, then (Σ_{i=1}^n E Y_i)/n → 0, and therefore Σ_{i=1}^n Y_i/n → 0 a.s., which is what we needed to show.

7 Uniform integrability

A family of random variables X_i is uniformly integrable if sup_i E[|X_i|; |X_i| > M] → 0 as M → ∞.

Proposition 7.1. Suppose there exists ϕ : [0, ∞) → [0, ∞) such that ϕ is nondecreasing, ϕ(x)/x → ∞ as x → ∞, and sup_i E ϕ(|X_i|) < ∞. Then the X_i are uniformly integrable.

Proof. Let ε > 0 and choose x_0 such that x/ϕ(x) < ε if x ≥ x_0. If M ≥ x_0,

∫_{(|X_i|>M)} |X_i| = ∫ (|X_i|/ϕ(|X_i|)) ϕ(|X_i|) 1_{(|X_i|>M)} ≤ ε ∫ ϕ(|X_i|) ≤ ε sup_i E ϕ(|X_i|).


Proposition 7.2. If X_n and Y_n are two uniformly integrable sequences, then X_n + Y_n is also a uniformly integrable sequence.

Proof. Since there exists M_0 such that sup_n E[|X_n|; |X_n| > M_0] < 1 and sup_n E[|Y_n|; |Y_n| > M_0] < 1, then sup_n E|X_n| ≤ M_0 + 1, and similarly for the Y_n. Let ε > 0 and choose M_1 > 4(M_0 + 1)/ε such that sup_n E[|X_n|; |X_n| > M_1] < ε/4 and sup_n E[|Y_n|; |Y_n| > M_1] < ε/4. Let M_2 = 4M_1².

Note P(|X_n| + |Y_n| > M_2) ≤ (E|X_n| + E|Y_n|)/M_2 ≤ ε/(4M_1) by Chebyshev's inequality. Then

E[|X_n + Y_n|; |X_n + Y_n| > M_2] ≤ E[|X_n|; |X_n| > M_1]
+ E[|X_n|; |X_n| ≤ M_1, |X_n + Y_n| > M_2]
+ E[|Y_n|; |Y_n| > M_1]
+ E[|Y_n|; |Y_n| ≤ M_1, |X_n + Y_n| > M_2].

The first and third terms on the right are each less than ε/4 by our choice of M_1. The second and fourth terms are each less than M_1 P(|X_n + Y_n| > M_2) ≤ ε/2.

The main result we need in this section is Vitali's convergence theorem.

Theorem 7.3. If X_n → X almost surely and the X_n are uniformly integrable, then E|X_n − X| → 0.

Proof. By the above proposition, X_n − X is uniformly integrable and tends to 0 a.s., so without loss of generality, we may assume X = 0. Let ε > 0 and choose M such that sup_n E[|X_n|; |X_n| > M] < ε. Then

E|X_n| ≤ E[|X_n|; |X_n| > M] + E[|X_n| ∧ M] ≤ ε + E[|X_n| ∧ M],

and the last term tends to 0 by dominated convergence. Since ε is arbitrary, E|X_n| → 0.

As a consequence, in the strong law of large numbers the convergence also takes place in L¹: if the X_i are i.i.d. and E|X_1| < ∞, then

E|S_n/n − E X_1| → 0.

Proof. Without loss of generality we may assume E X_1 = 0. By the SLLN, S_n/n → 0 a.s. So we need to show that the sequence S_n/n is uniformly integrable.

Pick M_1 such that E[|X_1|; |X_1| > M_1] < ε/2. Pick M_2 = M_1 E|X_1|/ε.


Theorem 8.2. Let X_i be a sequence of independent random variables, A > 0, and Y_i = X_i 1_{(|X_i|≤A)}. Then Σ X_i converges if and only if all of the following three series converge:

(a) Σ_n P(|X_n| > A);
(b) Σ_n E Y_n;
(c) Σ_n Var Y_n.

If A_1, A_2, ... are disjoint events whose union is Ω, P(A_i) > 0 for all i, and F is the σ-field generated by the A_i's, then

P(A | F) = Σ_i (P(A ∩ A_i)/P(A_i)) 1_{A_i}.

This follows since the right-hand side is F measurable and its expectation over any set A_i is P(A ∩ A_i).

As an example, suppose we toss a fair coin independently 5 times and let X_i be 1 or 0 depending on whether the ith toss was a heads or tails. Let A be the event that there were 5 heads and let F_i = σ(X_1, ..., X_i). Then P(A) = 1/32, while P(A | F_1) is equal to 1/16 on the event (X_1 = 1) and 0 on the event (X_1 = 0). P(A | F_2) is equal to 1/8 on the event (X_1 = 1, X_2 = 1) and 0 otherwise.
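This example is small enough to verify by brute force; the sketch below (added here, not in the notes) enumerates the 32 outcomes and applies the partition formula for P(A | F):

```python
from itertools import product

outcomes = list(product([0, 1], repeat=5))      # the 32 equally likely outcomes of 5 fair tosses
A = set(w for w in outcomes if sum(w) == 5)     # the event "5 heads"

def cond_prob_given_first_k(k):
    """P(A | sigma(X_1,...,X_k)) as a function of the first k tosses (the atoms of the sigma-field)."""
    values = {}
    for prefix in product([0, 1], repeat=k):
        atom = [w for w in outcomes if w[:k] == prefix]            # the atom determined by the prefix
        values[prefix] = sum(w in A for w in atom) / len(atom)     # P(A and atom) / P(atom)
    return values

print(cond_prob_given_first_k(1))   # 1/16 = 0.0625 on (X_1 = 1), 0 on (X_1 = 0)
print(cond_prob_given_first_k(2))   # 1/8 = 0.125 on (X_1 = 1, X_2 = 1), 0 otherwise
```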

We have

E[E[X | F]] = E X,

because E[E[X | F]] = E[E[X | F]; Ω] = E[X; Ω] = E X.

The following is easy to establish

Proposition 9.1 (a) If X ≥ Y are both integrable, then E [X | F] ≥ E [Y | F] a.s

(b) If X and Y are integrable and a ∈ R, then E [aX + Y | F] = aE [X | F] + E [Y | F]

It is easy to check that limit theorems such as monotone convergence and dominated convergence have conditional expectation versions, as do inequalities like Jensen's and Chebyshev's inequalities. Thus, for example, we have the following.

Proposition 9.2 (Jensen's inequality for conditional expectations). If g is convex and X and g(X) are integrable, then

E[g(X) | F] ≥ g(E[X | F]),   a.s.

A key fact is the following


Proposition 9.3. If X and XY are integrable and Y is measurable with respect to F, then

E[XY | F] = Y E[X | F].

Proposition 9.4. If E ⊆ F ⊆ G, then

E[E[X | F] | E] = E[X | E] = E[E[X | E] | F].

Proof. The right equality holds because E[X | E] is E measurable, hence F measurable. To show the left equality, let A ∈ E. Then since A is also in F,

E[E[E[X | F] | E]; A] = E[E[X | F]; A] = E[X; A] = E[E[X | E]; A].

Since both sides are E measurable, the equality follows.

To show the existence of E[X | F], we proceed as follows.

Proposition 9.5. If X is integrable, then E[X | F] exists.

Proof. Using linearity, we need only consider X ≥ 0. Define a measure Q on F by Q(A) = E[X; A] for A ∈ F. This is trivially absolutely continuous with respect to P|_F, the restriction of P to F. Let E[X | F] be the Radon-Nikodym derivative of Q with respect to P|_F. The Radon-Nikodym derivative is F measurable by construction and so provides the desired random variable.

When F = σ(Y), one usually writes E[X | Y] for E[X | F]. Notation that is commonly used (however, we will use it only very occasionally and only for heuristic purposes) is E[X | Y = y]. The definition is as follows. If A ∈ σ(Y), then A = (Y ∈ B) for some Borel set B by the definition of σ(Y), or 1_A = 1_B(Y). By linearity and taking limits, if Z is σ(Y) measurable, then Z = f(Y) for some Borel measurable function f. Set Z = E[X | Y] and choose f Borel measurable so that Z = f(Y). Then E[X | Y = y] is defined to be f(y).

If X ∈ L² and M = {Y ∈ L² : Y is F-measurable}, one can show that E[X | F] is equal to the projection of X onto the subspace M. We will not use this in these notes.

10 Stopping times

We next want to talk about stopping times. Suppose we have a sequence of σ-fields F_i such that F_i ⊂ F_{i+1} for each i. An example would be if F_i = σ(X_1, ..., X_i). A random mapping N from Ω to {0, 1, 2, ...} is called a stopping time if for each n, (N ≤ n) ∈ F_n. A stopping time is also called an optional time in the Markov theory literature.

The intuition is that the sequence knows whether N has happened by time n by looking at F_n. Suppose some motorists are told to drive north on Highway 99 in Seattle and stop at the first motorcycle shop past the second realtor after the city limits. So they drive north, pass the city limits, pass two realtors, and come to the next motorcycle shop, and stop. That is a stopping time. If they are instead told to stop at the third stop light before the city limits (and they had not been there before), they would need to drive to the city limits, then turn around and return past three stop lights. That is not a stopping time, because they have to go ahead of where they wanted to stop to know to stop there.

We use the notation a ∧ b = min(a, b) and a ∨ b = max(a, b). The proof of the following is immediate from the definitions.


Proposition 10.1.

(a) Fixed times n are stopping times

(b) If N_1 and N_2 are stopping times, then so are N_1 ∧ N_2 and N_1 ∨ N_2.

(c) If N_n is a nondecreasing sequence of stopping times, then so is N = sup_n N_n.

(d) If N_n is a nonincreasing sequence of stopping times, then so is N = inf_n N_n.

(e) If N is a stopping time, then so is N + n.

We define F_N = {A : A ∩ (N ≤ n) ∈ F_n for all n}.

11 Martingales

Suppose F_n is an increasing sequence of σ-fields and M_n is a sequence of integrable random variables with M_n measurable with respect to F_n. If we have E[M_n | F_{n−1}] = M_{n−1} a.s. for every n, then M_n is a martingale. If we have E[M_n | F_{n−1}] ≥ M_{n−1} a.s. for every n, then M_n is a submartingale. If we have E[M_n | F_{n−1}] ≤ M_{n−1}, we have a supermartingale. Submartingales have a tendency to increase.

Let us take a moment to look at some examples. If X_i is a sequence of mean zero i.i.d. random variables and S_n is the partial sum process, then M_n = S_n is a martingale, since E[M_n | F_{n−1}] = M_{n−1} + E[M_n − M_{n−1} | F_{n−1}] = M_{n−1} + E[M_n − M_{n−1}] = M_{n−1}, using independence. If the X_i's have variance one and M_n = S_n² − n, then

E[S_n² | F_{n−1}] = E[(S_n − S_{n−1})² | F_{n−1}] + 2S_{n−1}E[S_n | F_{n−1}] − S_{n−1}² = 1 + S_{n−1}²,

using independence. It follows that M_n is a martingale.

Another example is the following: if X ∈ L¹ and M_n = E[X | F_n], then M_n is a martingale.

If M_n is a martingale and H_n ∈ F_{n−1} for each n, it is easy to check that N_n = Σ_{i=1}^n H_i(M_i − M_{i−1}) is also a martingale.

There are several optional stopping theorems, depending on what conditions one puts on the stopping times.

Theorem 12.1. If N is a bounded stopping time with respect to F_n and M_n a martingale, then E M_N = E M_0.

The assumption that N be bounded cannot be entirely dispensed with. For example, let M_n be the partial sums of a sequence of i.i.d. random variables that take the values ±1, each with probability 1/2. If N = min{i : M_i = 1}, we will see later on that N < ∞ a.s., but E M_N = 1 ≠ 0 = E M_0.
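A simulation sketch of this contrast (the number of paths, horizon, and seed are arbitrary): the stopped value at the bounded time N ∧ K averages to E M_0 = 0, while M_N = 1 on every path that ever reaches 1.

```python
import numpy as np

rng = np.random.default_rng(8)
K, paths = 1000, 100_000
M = rng.choice([-1, 1], size=(paths, K)).cumsum(axis=1)   # the +-1 random walk

reached = (M == 1).any(axis=1)
first_hit = (M == 1).argmax(axis=1)                 # index of the first visit to 1 (0 if never)
stop_idx = np.where(reached, first_hit, K - 1)      # the bounded stopping time N ∧ K, as an index

print(np.mean(M[np.arange(paths), stop_idx]))       # ~ 0 = E M_0  (optional stopping, bounded time)
print(np.mean(reached))                             # most paths have hit 1 by time K, where M_N = 1
```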

The same proof as that in Theorem 12.1 gives the following corollary

Corollary 12.2 If N is bounded by K and Mn is a submartingale, then E MN ≤ E MK

Also the same proof gives

Corollary 12.3. If N is bounded by K, A ∈ F_N, and M_n is a submartingale, then E[M_N; A] ≤ E[M_K; A].

Proposition 12.4. If N_1 ≤ N_2 are stopping times bounded by K and M is a martingale, then E[M_{N_2} | F_{N_1}] = M_{N_1} a.s.

Subtracting E[M_{N_2}; A^c] from each side completes the proof.

The following is known as the Doob decomposition.

Proposition 12.5. Suppose X_k is a submartingale with respect to an increasing sequence of σ-fields F_k. Then we can write X_k = M_k + A_k such that M_k is a martingale adapted to the F_k and A_k is a sequence of random variables with A_k being F_{k−1}-measurable and A_0 ≤ A_1 ≤ · · ·.

Proof. Let a_k = E[X_k | F_{k−1}] − X_{k−1} for k = 1, 2, .... Since X_k is a submartingale, then each a_k ≥ 0. Then let A_k = Σ_{i=1}^k a_i. The fact that the A_k are increasing and measurable with respect to F_{k−1} is clear. Set M_k = X_k − A_k; it is easy to check that M_k is a martingale.


Corollary 12.6. Suppose X_k is a submartingale, and N_1 ≤ N_2 are bounded stopping times. Then E X_{N_1} ≤ E X_{N_2}.

Write M_n* = max_{1≤j≤n} |M_j| for a martingale M_n. The following is Doob's weak maximal inequality: for a > 0,

aP(M_n* ≥ a) ≤ E|M_n|.

Proof. Set M_{n+1} = M_n. Let N = min{j : |M_j| ≥ a} ∧ (n + 1). Since | · | is convex, |M_n| is a submartingale. If A = (M_n* ≥ a), then A ∈ F_N and by Corollary 12.3,

aP(M_n* ≥ a) ≤ E[|M_N|; A] ≤ E[|M_n|; A] ≤ E|M_n|.

For p > 1, we have the following inequality.

Theorem 13.2. If p > 1 and E|M_i|^p < ∞ for i ≤ n, then

E(M_n*)^p ≤ (p/(p − 1))^p E|M_n|^p.

Proof. Write

E(M_n*)^p = ∫_0^∞ pa^{p−1} P(M_n* > a) da ≤ ∫_0^∞ pa^{p−1} E[|M_n| 1_{(M_n*≥a)}/a] da
= E ∫_0^{M_n*} pa^{p−2} |M_n| da = (p/(p − 1)) E[(M_n*)^{p−1}|M_n|],

and the result follows by applying Hölder's inequality to the right-hand side.

14 Martingale convergence theorems

The martingale convergence theorems are another set of important consequences of optional stopping. The main step is the upcrossing lemma. The number of upcrossings of an interval [a, b] is the number of times a process crosses from below a to above b.

To be more exact, let

S_1 = min{k : X_k ≤ a},   T_1 = min{k > S_1 : X_k ≥ b},

and

S_{i+1} = min{k > T_i : X_k ≤ a},   T_{i+1} = min{k > S_{i+1} : X_k ≥ b}.

The number of upcrossings U_n before time n is U_n = max{j : T_j ≤ n}.


Theorem 14.1 (Upcrossing lemma). If X_k is a submartingale, then

E U_n ≤ (b − a)^{−1} E[(X_n − a)^+].

Proof. The number of upcrossings of [a, b] by X_k is the same as the number of upcrossings of [0, b − a] by Y_k = (X_k − a)^+. Moreover Y_k is still a submartingale. If we obtain the inequality for the number of upcrossings of the interval [0, b − a] by the process Y_k, we will have the desired inequality for upcrossings of X.

So we may assume a = 0. Fix n and define Y_{n+1} = Y_n. This will still be a submartingale. Define the S_i, T_i as above, and let S′_i = S_i ∧ (n + 1), T′_i = T_i ∧ (n + 1). Since T_{i+1} > S_{i+1} > T_i, then T′_{n+1} = n + 1.

This leads to the martingale convergence theorem.

Theorem 14.2. If X_n is a submartingale such that sup_n E X_n^+ < ∞, then X_n converges a.s. as n → ∞.

Proof. Let U(a, b) = lim_{n→∞} U_n, where U_n is the number of upcrossings of [a, b]. For each pair of rationals a < b, by monotone convergence,

E U(a, b) ≤ (b − a)^{−1} sup_n E[(X_n − a)^+] < ∞.

So U(a, b) < ∞ a.s. Taking the union over all pairs of rationals a, b, we see that a.s. the sequence X_n(ω) cannot have lim sup X_n > lim inf X_n. Therefore X_n converges a.s., although we still have to rule out the possibility of the limit being infinite. Since X_n is a submartingale, E X_n ≥ E X_0, and thus

E|X_n| = E X_n^+ + E X_n^− = 2E X_n^+ − E X_n ≤ 2E X_n^+ − E X_0.

By Fatou's lemma, E lim_n |X_n| ≤ sup_n E|X_n| < ∞, so X_n converges a.s. to a finite limit.

Corollary 14.3. If X_n is a positive supermartingale or a martingale bounded above or below, X_n converges a.s.

Proof. If X_n is a positive supermartingale, −X_n is a submartingale bounded above by 0. Now apply Theorem 14.2.

If X_n is a martingale bounded above, by considering −X_n, we may assume X_n is bounded below. Looking at X_n + M for fixed M will not affect the convergence, so we may assume X_n is bounded below by 0. Now apply the first assertion of the corollary.

Proposition 14.4. If X_n is a martingale with sup_n E|X_n|^p < ∞ for some p > 1, then the convergence is in L^p as well as a.s. This is also true when X_n is a submartingale. If X_n is a uniformly integrable martingale, then the convergence is in L¹. If X_n → X_∞ in L¹, then X_n = E[X_∞ | F_n].

X_n is a uniformly integrable martingale if the collection of random variables X_n is uniformly integrable.

Proof. The L^p convergence assertion follows by using Doob's inequality (Theorem 13.2) and dominated convergence. The L¹ convergence assertion follows since a.s. convergence together with uniform integrability implies L¹ convergence. Finally, if j < n, we have X_j = E[X_n | F_j]. If A ∈ F_j, then E[X_j; A] = E[X_n; A] → E[X_∞; A] by the L¹ convergence, and hence X_j = E[X_∞ | F_j].

An elegant application of martingales is a proof of the SLLN. Fix N large. Let Y_i be i.i.d. Let Z_n = E[Y_1 | S_n, S_{n+1}, ..., S_N]. We claim Z_n = S_n/n. Certainly S_n/n is σ(S_n, ..., S_N) measurable. If A ∈ σ(S_n, ..., S_N) for some n, then A = ((S_n, ..., S_N) ∈ B) for some Borel subset B of R^{N−n+1}. Since the Y_i are i.i.d., for each k ≤ n,

E[Y_1; (S_n, ..., S_N) ∈ B] = E[Y_k; (S_n, ..., S_N) ∈ B].

Summing over k and dividing by n,

E[Y_1; (S_n, ..., S_N) ∈ B] = E[S_n/n; (S_n, ..., S_N) ∈ B].

Therefore E[Y_1; A] = E[S_n/n; A] for every A ∈ σ(S_n, ..., S_N). Thus Z_n = S_n/n.

Let X_k = Z_{N−k}, and let F_k = σ(S_{N−k}, S_{N−k+1}, ..., S_N). Note F_k gets larger as k gets larger, and by the above X_k = E[Y_1 | F_k]. This shows that X_k is a martingale (cf. the next to last example in Section 11). By Doob's upcrossing inequality, if U_n^X is the number of upcrossings of [a, b] by X, then E U_N^X ≤ E X_N^+/(b − a) ≤ E|Z_0|/(b − a) = E|Y_1|/(b − a). This differs by at most one from the number of upcrossings of [a, b] by Z_1, ..., Z_N. So the expected number of upcrossings of [a, b] by Z_k for k ≤ N is bounded by 1 + E|Y_1|/(b − a). Now let N → ∞. By Fatou's lemma, the expected number of upcrossings of [a, b] by Z_1, Z_2, ... is finite. Arguing as in the proof of the martingale convergence theorem, this says that Z_n = S_n/n does not oscillate.

It is conceivable that |S_n/n| → ∞. But by Fatou's lemma,

E[lim |S_n/n|] ≤ lim inf E|S_n/n| ≤ lim inf nE|Y_1|/n = E|Y_1| < ∞.

Another application of martingale techniques is Wald’s identities


Proposition 15.1. Suppose the Y_i are i.i.d. with E|Y_1| < ∞, N is a stopping time with E N < ∞, and N is independent of the Y_i. Then E S_N = (E N)(E Y_1), where the S_n are the partial sums of the Y_i.

Proof. S_n − n(E Y_1) is a martingale, so E S_{n∧N} = E(n ∧ N) E Y_1 by optional stopping. The right hand side tends to (E N)(E Y_1) by monotone convergence. S_{n∧N} converges almost surely to S_N, and we need to show the expected values converge. Since N is independent of the Y_i,

E Σ_{i=1}^N |Y_i| = Σ_{i=1}^∞ E[|Y_i|; N ≥ i] = E|Y_1| Σ_{i=1}^∞ P(N ≥ i) = (E|Y_1|)(E N) < ∞,

and |S_{n∧N}| ≤ Σ_{i=1}^N |Y_i|. So by dominated convergence, we have E S_{n∧N} → E S_N.

Wald's second identity is a similar expression for the variance of S_N.
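A simulation sketch of Wald's first identity (the Poisson choice of N and the exponential Y_i are arbitrary, but N is generated independently of the Y_i as the proposition requires):

```python
import numpy as np

# N ~ 1 + Poisson(4), so E N = 5; Y_i ~ exponential(1), so E Y_1 = 1.  Then E S_N should be 5.
rng = np.random.default_rng(9)
trials = 100_000

N = 1 + rng.poisson(4, size=trials)
S_N = np.array([rng.exponential(1.0, size=n).sum() for n in N])

print(S_N.mean(), N.mean() * 1.0)    # both close to 5 = (E N)(E Y_1)
```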

We can use martingales to find certain hitting probabilities

Proposition 15.2. Suppose the Y_i are i.i.d. with P(Y_1 = 1) = 1/2, P(Y_1 = −1) = 1/2, and S_n is the partial sum process. Suppose a and b are positive integers. Then

P(S_n hits −a before b) = b/(a + b).

If N = min{n : S_n ∈ {−a, b}}, then E N = ab.

Proof. S_n² − n is a martingale, so E S_{n∧N}² = E(n ∧ N). Let n → ∞. The right hand side converges to E N by monotone convergence. Since S_{n∧N} is bounded in absolute value by a + b, the left hand side converges by dominated convergence to E S_N², which is finite. So E N is finite, hence N is finite almost surely.

S_n is a martingale, so E S_{n∧N} = E S_0 = 0. By dominated convergence, and the fact that N < ∞ a.s., hence S_{n∧N} → S_N, we have E S_N = 0, or

−aP(S_N = −a) + bP(S_N = b) = 0.

Since P(S_N = −a) + P(S_N = b) = 1, solving gives P(S_N = −a) = b/(a + b), the first result. Since

E N = E S_N² = a²P(S_N = −a) + b²P(S_N = b),

substituting gives the second result.

Based on this proposition, if we let a → ∞, we see that P(N_b < ∞) = 1 and E N_b = ∞, where N_b = min{n : S_n = b}.
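A short simulation sketch of Proposition 15.2 (a = 2, b = 3 and the number of trials are arbitrary choices): the walk should hit −a before b with probability b/(a + b) = 0.6 and take ab = 6 steps on average.

```python
import numpy as np

rng = np.random.default_rng(10)
a, b, trials = 2, 3, 20_000
hit_minus_a, total_steps = 0, 0

for _ in range(trials):
    s, n = 0, 0
    while -a < s < b:                       # run until the walk reaches -a or b
        s += 1 if rng.random() < 0.5 else -1
        n += 1
    hit_minus_a += (s == -a)
    total_steps += n

print(hit_minus_a / trials, b / (a + b))    # ~ 0.6
print(total_steps / trials, a * b)          # ~ 6
```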

Next we give a version of the Borel-Cantelli lemma


Proposition 15.3. Suppose A_n ∈ F_n. Then (A_n i.o.) and (Σ_{n=1}^∞ P(A_n | F_{n−1}) = ∞) differ by a null set.

Proof. Let X_n = Σ_{m=1}^n [1_{A_m} − P(A_m | F_{m−1})]. Note |X_n − X_{n−1}| ≤ 1. Also, it is easy to see that E[X_n − X_{n−1} | F_{n−1}] = 0, so X_n is a martingale.

We claim that for almost every ω either lim X_n exists and is finite, or else lim sup X_n = ∞ and lim inf X_n = −∞. In fact, if N = min{n : X_n ≤ −k}, then X_{n∧N} ≥ −k − 1, so X_{n∧N} converges by the martingale convergence theorem. Therefore lim X_n exists and is finite on (N = ∞). So if lim X_n does not exist or is not finite, then N < ∞. This is true for all k, hence lim inf X_n = −∞. A similar argument shows lim sup X_n = ∞ in this case.

Now if lim X_n exists and is finite, then Σ_{n=1}^∞ 1_{A_n} = ∞ if and only if Σ P(A_n | F_{n−1}) = ∞. On the other hand, if the limit does not exist or is not finite, then Σ 1_{A_n} = ∞ and Σ P(A_n | F_{n−1}) = ∞.

16 Weak convergence

We will see later that if the X_i are i.i.d. with mean zero and variance one, then S_n/√n converges in the sense

P(S_n/√n ∈ [a, b]) → P(Z ∈ [a, b]),

where Z is a standard normal. If S_n/√n converged in probability or almost surely, then by the zero-one law it would converge to a constant, contradicting the above. We want to generalize the above type of convergence.

We say F_n converges weakly to F if F_n(x) → F(x) for all x at which F is continuous. Here F_n and F are distribution functions. We say X_n converges weakly to X if F_{X_n} converges weakly to F_X. We sometimes say X_n converges in distribution or converges in law to X. Probabilities µ_n converge weakly if their corresponding distribution functions converge, that is, if F_{µ_n}(x) = µ_n(−∞, x] converges weakly.

An example that illustrates why we restrict the convergence to continuity points of F is the following. Let X_n = 1/n with probability one, and X = 0 with probability one. F_{X_n}(x) is 0 if x < 1/n and 1 otherwise. F_{X_n}(x) converges to F_X(x) for all x except x = 0.
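A simulation sketch of the convergence of S_n/√n described at the start of this section (the ±1 steps, the interval [−1, 1], and the sample sizes are my choices): P(S_n/√n ∈ [a, b]) approaches the corresponding normal probability.

```python
import numpy as np
from math import erf, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))     # standard normal distribution function

rng = np.random.default_rng(11)
a, b, trials = -1.0, 1.0, 10_000

for n in (10, 100, 1000):
    Z = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1) / np.sqrt(n)   # S_n / sqrt(n)
    print(n, np.mean((a <= Z) & (Z <= b)), Phi(b) - Phi(a))                  # last column ~ 0.6827
```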

Proposition 16.1. X_n converges weakly to X if and only if E g(X_n) → E g(X) for all g bounded and continuous.

The idea that E g(X_n) converges to E g(X) for all g bounded and continuous makes sense for any metric space and is used as a definition of weak convergence for X_n taking values in general metric spaces.

Proof. First suppose E g(X_n) converges to E g(X). Let x be a continuity point of F, let ε > 0, and choose δ such that |F(y) − F(x)| < ε if |y − x| < δ. Choose g continuous such that g is one on (−∞, x], takes values between 0 and 1, and is 0 on [x + δ, ∞). Then F_{X_n}(x) ≤ E g(X_n) → E g(X) ≤ F_X(x + δ) ≤ F(x) + ε. Similarly, if h is a continuous function taking values between 0 and 1 that is 1 on (−∞, x − δ] and 0 on [x, ∞), then F_{X_n}(x) ≥ E h(X_n) → E h(X) ≥ F_X(x − δ) ≥ F(x) − ε. Since ε is arbitrary, F_{X_n}(x) → F_X(x).

Now suppose X_n converges weakly to X. If a and b are continuity points of F and of all the F_{X_n}, then E 1_{[a,b]}(X_n) = F_{X_n}(b) − F_{X_n}(a) → F(b) − F(a) = E 1_{[a,b]}(X). By taking linear combinations, we have E g(X_n) → E g(X) for every g which is a step function where the end points of the intervals are continuity points for all the F_{X_n} and for F_X. Since the set of points that are not a continuity point for some F_{X_n} or for F_X is countable, and we can approximate any continuous function on an interval by such step functions uniformly, we have E g(X_n) → E g(X) for all g such that the support of g is a closed interval whose endpoints are continuity points of F_X and g is continuous on its support.

Let ε > 0 and choose M such that F_X(M) > 1 − ε and F_X(−M) < ε and so that M and −M are continuity points of F_X and of the F_{X_n}. By the above argument, E(1_{[−M,M]}g)(X_n) → E(1_{[−M,M]}g)(X), where g is a bounded continuous function. The difference between E(1_{[−M,M]}g)(X) and E g(X) is bounded by ‖g‖_∞ P(X ∉ [−M, M]) ≤ 2ε‖g‖_∞. Similarly, when X is replaced by X_n, the difference is bounded by ‖g‖_∞ P(X_n ∉ [−M, M]) → ‖g‖_∞ P(X ∉ [−M, M]). So for n large, it is less than 3ε‖g‖_∞. Since ε is arbitrary, E g(X_n) → E g(X) whenever g is bounded and continuous.

Let us examine the relationship between weak convergence and convergence in probability. The example of S_n/√n shows that one can have weak convergence without convergence in probability.

Proposition 16.2 (a) If Xn converges to X in probability, then it converges weakly

(b) If Xn converges weakly to a constant, it converges in probability

(c) (Slutsky's theorem) If X_n converges weakly to X and Y_n converges weakly to a constant c, then X_n + Y_n converges weakly to X + c and X_nY_n converges weakly to cX.

Proof. To prove (a), let g be a bounded and continuous function. If n_j is any subsequence, then there exists a further subsequence n_{j_k} such that X_{n_{j_k}} converges almost surely to X. Then by dominated convergence, E g(X_{n_{j_k}}) → E g(X). That suffices to show E g(X_n) converges to E g(X).

For (b), if X_n converges weakly to c,

P(X_n − c > ε) = P(X_n > c + ε) = 1 − P(X_n ≤ c + ε) → 1 − P(c ≤ c + ε) = 0.

We use the fact that if Y ≡ c, then c + ε is a point of continuity for F_Y. A similar equation shows P(X_n − c ≤ −ε) → 0, so P(|X_n − c| > ε) → 0.

We now prove the first part of (c), leaving the second part for the reader. Let x be a point such that x − c is a continuity point of F_X. Choose ε so that x − c + ε is again a continuity point. Then

P(X_n + Y_n ≤ x) ≤ P(X_n + c ≤ x + ε) + P(|Y_n − c| > ε) → P(X ≤ x − c + ε).

So lim sup P(X_n + Y_n ≤ x) ≤ P(X + c ≤ x + ε). Since ε can be as small as we like and x − c is a continuity point of F_X, then lim sup P(X_n + Y_n ≤ x) ≤ P(X + c ≤ x). The lim inf is done similarly.

We say a sequence of distribution functions {F_n} is tight if for each ε > 0 there exists M such that F_n(M) ≥ 1 − ε and F_n(−M) ≤ ε for all n. A sequence of r.v.s is tight if the corresponding distribution functions are tight; this is equivalent to P(|X_n| ≥ M) ≤ ε for all n.

Theorem 16.3 (Helly's theorem). Let F_n be a sequence of distribution functions that is tight. There exists a subsequence n_j and a distribution function F such that F_{n_j} converges weakly to F.

What could happen is that X_n = n, so that F_{X_n} → 0; the tightness precludes this.

Proof. Let q_k be an enumeration of the rationals. Since F_n(q_k) ∈ [0, 1], any subsequence has a further subsequence that converges. Use diagonalization so that F_{n_j}(q_k) converges for each q_k and call the limit F(q_k). F is nondecreasing, and define F(x) = inf_{q_k≥x} F(q_k). So F is right continuous and nondecreasing.

If x is a point of continuity of F and ε > 0, then there exist r and s rational such that r < x < s and F(s) − ε < F(x) < F(r) + ε. Then

F_{n_j}(x) ≥ F_{n_j}(r) → F(r) > F(x) − ε

and

F_{n_j}(x) ≤ F_{n_j}(s) → F(s) < F(x) + ε.


Proposition 16.4. Suppose there exists ϕ : [0, ∞) → [0, ∞) that is increasing and ϕ(x) → ∞ as x → ∞. If c = sup_n E ϕ(|X_n|) < ∞, then the X_n are tight.

Proof. Let ε > 0. Choose M such that ϕ(x) ≥ c/ε if x > M. Then

P(|X_n| > M) ≤ ∫ (ϕ(|X_n|)/(c/ε)) 1_{(|X_n|>M)} dP ≤ (ε/c) E ϕ(|X_n|) ≤ ε.

17 Characteristic functions

We define the characteristic function of a random variable X by ϕ_X(t) = E e^{itX} for t ∈ R.

Note that ϕ_X(t) = ∫ e^{itx} P_X(dx). So if X and Y have the same law, they have the same characteristic function. Also, if the law of X has a density, that is, P_X(dx) = f_X(x) dx, then ϕ_X(t) = ∫ e^{itx} f_X(x) dx, so in this case the characteristic function is the same as (one definition of) the Fourier transform of f_X.

Proposition 17.1. ϕ(0) = 1, |ϕ(t)| ≤ 1, ϕ(−t) is the complex conjugate of ϕ(t), and ϕ is uniformly continuous.

Proof. Since |e^{itx}| ≤ 1, everything follows immediately from the definitions except the uniform continuity. For that we write

|ϕ(t + h) − ϕ(t)| = |E e^{i(t+h)X} − E e^{itX}| ≤ E|e^{itX}(e^{ihX} − 1)| = E|e^{ihX} − 1|.

As h → 0, |e^{ihX} − 1| tends to 0 almost surely, so the right hand side tends to 0 by dominated convergence. Note that the right hand side is independent of t.

Proposition 17.2. ϕ_{aX}(t) = ϕ_X(at) and ϕ_{X+b}(t) = e^{itb}ϕ_X(t).

Proof. The first follows from E e^{it(aX)} = E e^{i(at)X}, and the second is similar.

Proposition 17.3. If X and Y are independent, then ϕ_{X+Y}(t) = ϕ_X(t)ϕ_Y(t).

Proof. From the multiplication theorem,

E e^{it(X+Y)} = E e^{itX}e^{itY} = E e^{itX} E e^{itY}.

Note that if X_1 and X_2 are independent and identically distributed, then

ϕ_{X_1−X_2}(t) = ϕ_{X_1}(t)ϕ_{−X_2}(t) = ϕ_{X_1}(t)ϕ_{X_2}(−t),

and since ϕ_{X_2}(−t) is the complex conjugate of ϕ_{X_1}(t), this equals |ϕ_{X_1}(t)|².

Let us look at some examples of characteristic functions.

(a) Bernoulli: By direct computation, this is pe^{it} + (1 − p) = 1 − p(1 − e^{it}).

(b) Coin flip (i.e., P(X = +1) = P(X = −1) = 1/2): We have (1/2)e^{it} + (1/2)e^{−it} = cos t.


(e) Binomial: Write X as the sum of n independent Bernoulli r.v.s B_i. So

ϕ(t) = (1 − p(1 − e^{it}))^n.

Uniform on [a, b]:

ϕ(t) = (1/(b − a)) ∫_a^b e^{itx} dx = (e^{itb} − e^{ita})/((b − a)it).

Note that when a = −b this reduces to sin(bt)/(bt).

(h) Exponential:

ϕ(t) = ∫_0^∞ λe^{itx}e^{−λx} dx = λ ∫_0^∞ e^{(it−λ)x} dx = λ/(λ − it).

(i) Standard normal: ϕ(t) = e^{−t²/2}.
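The formulas above are easy to check against the empirical characteristic function, the sample average of e^{itX}; the sketch below (my own, with arbitrary sample size and values of t) does this for the coin flip and a Bernoulli(0.3) variable.

```python
import numpy as np

rng = np.random.default_rng(12)
coin = rng.choice([-1.0, 1.0], size=200_000)              # the coin flip of example (b)
bern = (rng.uniform(size=200_000) < 0.3).astype(float)    # Bernoulli with p = 0.3, as in example (a)

for t in (0.5, 1.0, 2.0):
    emp_coin = np.mean(np.exp(1j * t * coin))             # empirical characteristic function at t
    emp_bern = np.mean(np.exp(1j * t * bern))
    exact_bern = 1.0 - 0.3 * (1.0 - np.exp(1j * t))
    print(t, emp_coin.real, np.cos(t), abs(emp_bern - exact_bern))
```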
