Convergence of Probability Measures, by Patrick Billingsley (1968)



Convergence of

Probability Measures

Patrick Billingsley

Departments of Statistics and Mathematics

The University of Chicago

JOHN WILEY & SONS, New York • Chichester • Brisbane • Toronto


Reproduction or translation of any part of this work beyond that permitted by Sections 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Catalog Card Number: 68-23922

SBN 471 07242 7

Printed in the United States of America

20 19 18 17 16 15 14 13


TO MY MOTHER


Asymptotic distribution theorems in probability and statistics have from the beginning depended on the classical theory of weak convergence of distribution functions in Euclidean space: convergence, that is, at continuity points of the limit function. The past several decades have seen the creation and extensive application of a more inclusive theory of weak convergence of probability measures on metric spaces. There are many asymptotic results that can be formulated within the classical theory but require for their proofs this more general theory, which thus does not merely study itself. This book is about weak-convergence methods in metric spaces, with applications sufficient to show their power and utility.

The Introduction motivates the definitions and indicates how the theory will yield solutions to problems arising outside it. Chapter 1 sets out the basic general theorems, which are then specialized in Chapter 2 to the space of continuous functions on the unit interval and in Chapter 3 to the space of functions with discontinuities of the first kind. The results of the first three chapters are used in Chapter 4 to derive a variety of limit theorems for dependent sequences of random variables.

Although standard measure-theoretic probability and metric-space topology are assumed, no general (nonmetric) topology is used, and the few results required from functional analysis are proved in the text or in an appendix. Mastering the impulse to hoard the examples and applications till the last, thereby obliging the reader to persevere to the end, I have instead spread them evenly through the book to illustrate the theory as it emerges in stages.

Chicago, March 1968

Patrick Billingsley


My thanks go to Soren Johansen, Samuel Karlin, David Kendall, Ronald Pyke, and Flemming Topsoe, who read large parts of the manuscript; the book owes much to their detailed suggestions, and I am very grateful. I should also like to thank Mary Woolridge for her typing, cheerful, swift, and error-free.

The writing of this book was supported in part by the Statistics Branch, Office of Naval Research, and in part by Research Grant No. 8026 from the Division of Mathematical, Physical, and Engineering Sciences of the National Science Foundation.


CHAPTER 1 WEAK CONVERGENCE IN METRIC SPACES 7

1. Measures in Metric Spaces, 7
Measures and Integrals, 7; Tightness, 9

2. Properties of Weak Convergence, 11
Portmanteau Theorem, 11; Other Criteria, 14

3. Some Special Cases, 17
Euclidean Space, 17; The Circle, 18; The Space $R^\infty$, 19; The Space C, 19; Product Spaces, 20

4. Convergence in Distribution, 22
Random Elements, 22; Convergence in Distribution, 23; Convergence in Probability, 24; Product Spaces, 26

5. Weak Convergence and Mappings, 29
Continuous Mappings, 29; Main Theorem, 30; Integration to the Limit, 31; An Extension of Theorem 5.1, 33

6. Prohorov's Theorem, 35
Relative Compactness, 35; The Direct Theorem, 37; The Converse, 40

7. First Applications, 41
Smooth Functions, 41; The Central Limit Theorem, 42; Characteristic Functions, 45; The Cramér-Wold Device, 48; Local and Integral Limit Theorems, 49; Weak Convergence on the Circle and Torus, 50

CHAPTER 2 THE SPACE C 54

8. Weak Convergence and Tightness in C, 54
Weak Convergence, 54; Tightness, 54; Random Functions, 57; Coordinate Variables, 60

9. The Existence of Wiener Measure, 61
Wiener Measure, 61; The Brownian Bridge, 64; Separable Stochastic Processes, 65

10. Donsker's Theorem, 68
The Theorem, 68; An Application, 70; A Necessary Condition for Tightness, 73; Another Proof of Donsker's Theorem, 73

11. Functions of Brownian Motion Paths, 77
Maximum and Minimum, 77; The Arc Sine Law, 80; The Brownian Bridge, 83

12. Fluctuations of Partial Sums, 87
Maxima, 87; Product Moments, 88; Applications, 89; Proof of Theorem 12.1, 91; Moments, 94; A Tightness Criterion, 95; Further Inequalities, 98

13. Empirical Distribution Functions, 103

CHAPTER 3 THE SPACE D

Existence, 130; Other Criteria, 133; Separable Stochastic Processes, 134

16. Applications, 137
Donsker's Theorem, 137; Dominated Measures, 139; Empirical Distribution Functions, 141

17. Random Change of Time, 143
Randomly Selected Partial Sums, 143; Random Change of Time, 144; Applications, 145; Renewal Theory, 148

18. The Uniform Topology, 150

CHAPTER 4 DEPENDENT VARIABLES

21. Functions of Mixing Processes, 182
Preliminaries, 182; Functional Central Limit Theorem, 184; Applications, 191; Diophantine Approximation, 193; Nonstationarity, 194

22. Empirical Distribution Functions, 195
$\varphi$-Mixing Processes, 195; Functions of $\varphi$-Mixing Processes, 199

APPENDIX II MISCELLANY 222
Measurability, 222; Change of Variable, 222; Tail Probabilities, 223; Scheffé's Theorem, 223; Subspaces, 224; Product Spaces, 224; Measurability of Dh, 225; Helly's Theorem, 226; Kolmogorov's Theorem, 228; Measurability of Some Mappings, 230; More Measurability, 232

The Problem of Measure, 233; Separable Measures, 234; The Topology of Weak Convergence, 236; Prohorov's Theorem, 239


CONVERGENCE OF PROBABILITY MEASURES


is the unit normal distribution function, then

$$\lim_{n\to\infty} F_n(x) = F(x) \qquad (3)$$

for all $x$ ($n \to \infty$, the probability $p$ of success fixed).
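The convergence in (3) can be checked numerically. This sketch is an illustration, not part of the text: it assumes, as in the De Moivre-Laplace theorem, that $F_n$ is the distribution function of $(S_n - np)/\sqrt{np(1-p)}$ with $S_n$ binomial, and compares it with the unit normal distribution function at a few fixed points. The function names are hypothetical.

```python
# Illustrative sketch; all function names are hypothetical.
import math


def binomial_normalized_cdf(n: int, p: float, x: float) -> float:
    """F_n(x) = P[(S_n - n p) / sqrt(n p (1 - p)) <= x] for S_n ~ Binomial(n, p)."""
    sd = math.sqrt(n * p * (1 - p))
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if (k - n * p) / sd <= x
    )


def unit_normal_cdf(x: float) -> float:
    """F(x) = (2 pi)^(-1/2) * integral from -inf to x of exp(-u^2 / 2) du."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_error(n: int, p: float = 0.3, points=(-1.5, -0.5, 0.0, 0.5, 1.5)) -> float:
    """Largest |F_n(x) - F(x)| over a few fixed points x (all continuity points of F)."""
    return max(abs(binomial_normalized_cdf(n, p, x) - unit_normal_cdf(x)) for x in points)


for n in (10, 100, 1000):
    print(n, round(max_error(n), 4))
```

The error shrinks as $n$ grows; since $F$ is continuous, every $x$ is a continuity point and no exemption is needed here.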

We say of arbitrary distribution functions $F_n$ and $F$ on the line that $F_n$ converges weakly to $F$, which we indicate by writing $F_n \Rightarrow F$, if (3) holds for all continuity points $x$ of $F$. Thus the De Moivre-Laplace theorem asserts that (1) converges weakly to (2); since (2) is everywhere continuous, the proviso about continuity points is vacuous in this case. If $F_n$ and $F$ are defined by


$F_n$ and $F$. These probability measures, defined on the class of Borel subsets of the line, are uniquely determined by the requirements

holds for each $x$.

Let $\partial A$ denote the boundary of a subset $A$ of the line; $\partial A$ consists of those points that are limits of sequences of points in $A$ and are also limits of sequences of points outside $A$. Since the boundary of $(-\infty, x]$ consists of the single point $x$, (6) is equivalent to

$$P(\partial A) = 0 \implies \lim_{n\to\infty} P_n(A) = P(A), \qquad (7)$$

where we have written $A$ for $(-\infty, x]$. The fact of the matter is that $F_n \Rightarrow F$ holds if and only if the implication (7) is true for every Borel set $A$, a result proved in Chapter 1.

Let us distinguish by the term $P$-continuity set those Borel sets $A$ for which $P(\partial A) = 0$, and let us say that $P_n$ converges weakly to $P$, and write $P_n \Rightarrow P$, if $P_n(A) \to P(A)$ for each $P$-continuity set, that is, if (7) holds. As just asserted, $P_n \Rightarrow P$ if and only if the corresponding distribution functions satisfy $F_n \Rightarrow F$.

This reformulation of the concept of weak convergence clarifies the reason why we allow (3) to fail if $F$ has a jump at $x$. Without this exemption, (4) would not converge weakly to (5), but this example may appear artificial.

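If we turn our attention to probability measures $P_n$ and $P$, however, we see that $P_n(A) \to P(A)$ may fail if $P(\partial A) > 0$, even in the De Moivre-Laplace theorem. The measures $P_n$ and $P$ generated by (1) and (2) satisfy

$$P_n(A) = \sum_{0 \le k \le n:\ (k - np)/\sqrt{np(1-p)} \in A} \binom{n}{k} p^k (1-p)^{n-k} \qquad (8)$$

and

$$P(A) = \frac{1}{\sqrt{2\pi}} \int_A e^{-u^2/2}\,du \qquad (9)$$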


for Borel sets $A$. Now if $A$ consists of the countably many points $(k - np)/\sqrt{np(1-p)}$, $0 \le k \le n$, $n = 1, 2, \ldots$, then $P_n(A) = 1$ for all $n$ and $P(A) = 0$, so that $P_n(A) \to P(A)$ is impossible. Since $\partial A$ is the entire real line, this does not violate (7).

Although the concept of weak convergence of distribution functions is tied to the real line (or to Euclidean space, at any rate), the concept of weak convergence of probability measures can be formulated for the general metric space, which is the real reason for preferring the latter concept. Let $S$ be an arbitrary metric space, let $\mathcal{S}$ be the class of Borel subsets of $S$ ($\mathcal{S}$ is the $\sigma$-field generated by the open sets), and consider probability measures $P_n$ and $P$ defined on $\mathcal{S}$. Exactly as before, we define weak convergence $P_n \Rightarrow P$ by requiring the implication (7) to hold for all Borel sets $A$. In Chapter 1 we investigate the general theory of this concept of convergence and see what it reduces to in various particular metric spaces. We prove there, for example, that $P_n$ converges weakly to $P$ if and only if

$$\lim_{n\to\infty} \int_S f\,dP_n = \int_S f\,dP \quad \text{for all } f \in C(S), \qquad (10)$$

where $C(S)$ denotes the class of bounded, continuous real-valued functions on $S$. (In order to conform with general mathematical usage, we take (10) as the definition of weak convergence, so that (7) becomes a necessary and sufficient condition instead of a definition.)

Chapter 2 concerns weak convergence in the space $C = C[0, 1]$ with the uniform topology; $C$ is the space of all continuous real functions on the closed unit interval $[0, 1]$, metrized by taking the distance between two functions $x = x(t)$ and $y = y(t)$ to be

$$\rho(x, y) = \sup_{0 \le t \le 1} |x(t) - y(t)|. \qquad (11)$$
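As a small illustration of (11), the following sketch approximates $\rho(x, y)$ by maximizing $|x(t) - y(t)|$ over an equally spaced grid in $[0, 1]$. The grid size and the function name are assumptions made for the example; the grid maximum never exceeds the true supremum and, for continuous $x$ and $y$, approaches it as the grid is refined.

```python
# Illustrative sketch; the function name and grid size are hypothetical choices.
import math
from typing import Callable

Path = Callable[[float], float]


def sup_distance(x: Path, y: Path, grid_size: int = 10_001) -> float:
    """Approximate rho(x, y) = sup_{0 <= t <= 1} |x(t) - y(t)| on an equally spaced grid."""
    return max(
        abs(x(i / (grid_size - 1)) - y(i / (grid_size - 1))) for i in range(grid_size)
    )


# rho(t, t^2) = 1/4, attained at t = 1/2, which lies on the default grid.
print(sup_distance(lambda t: t, lambda t: t * t))
# rho(sin(2 pi t), 0) = 1, attained at t = 1/4.
print(sup_distance(lambda t: math.sin(2 * math.pi * t), lambda t: 0.0))
```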

An example of the sort of application made in Chapter 2 will show the utility of a general treatment of weak convergence, one that goes beyond the classical Euclidean case. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent, identically distributed random variables defined on some probability space $(\Omega, \mathcal{F}, P)$. If the $\xi_n$ have mean 0 and variance $\sigma^2$, then, according to the Lindeberg-Lévy central limit theorem, the distribution of the normalized sum

$$\frac{S_n}{\sigma\sqrt{n}}, \qquad S_n = \xi_1 + \cdots + \xi_n, \qquad (12)$$


0). In other words, construct the function $X_n(\omega)$ whose value at a point $t$ of $[0, 1]$ is

(13)

For each $\omega$, $X_n(\omega)$ is an element of the space $C$. Let $P_n$ be the distribution of $X_n(\omega)$ in $C$, defined for Borel subsets $A$ of $C$ (Borel sets relative to the metric (11)) by

by a particle under Brownian motion

If $A = \{x : x(1) \le a\}$, then, since the value of the function $X_n(\omega)$ at $t = 1$

It also turns out that

$$W\{x : x(1) \le a\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{a} e^{-u^2/2}\,du,$$

so that (14) does contain the Lindeberg-Lévy theorem. The sum $\xi_1 + \cdots + \xi_n$ may be interpreted as the position at time $n$ in a random walk. The central limit theorem asserts that this position (properly normalized) is, for $n$ large, distributed approximately as the position at time $t = 1$ of a particle in Brownian motion. The relation (14) asserts that the entire path of the random walk during the first $n$ steps is, for $n$ large, distributed approximately as the path up to time $t = 1$ of a particle under Brownian motion.
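The random element $X_n(\omega)$ of $C$ is built from the partial sums. Formula (13) itself is missing from this copy, so the sketch below assumes the usual polygonal construction: $X_n(t)$ equals $S_i/(\sigma\sqrt{n})$ at $t = i/n$ and is linear in between. That choice makes $X_n(\omega)$ continuous and gives $X_n(1, \omega) = S_n/(\sigma\sqrt{n})$, the normalized sum (12). The name `random_walk_path` is hypothetical.

```python
# Illustrative sketch of an assumed form of (13); `random_walk_path` is a hypothetical name.
import math
from typing import Callable, Sequence


def random_walk_path(xis: Sequence[float], sigma: float = 1.0) -> Callable[[float], float]:
    """Return t -> X_n(t): linear interpolation of S_0 = 0, S_1, ..., S_n at the points i/n,
    scaled by 1 / (sigma * sqrt(n))."""
    n = len(xis)
    partial_sums = [0.0]
    for xi in xis:
        partial_sums.append(partial_sums[-1] + xi)
    scale = sigma * math.sqrt(n)

    def X(t: float) -> float:
        if not 0.0 <= t <= 1.0:
            raise ValueError("t must lie in [0, 1]")
        i = min(math.floor(n * t), n - 1)  # segment index; t = 1 uses the last segment
        frac = n * t - i                   # position within the segment, in [0, 1]
        return (partial_sums[i] + frac * (partial_sums[i + 1] - partial_sums[i])) / scale

    return X


X = random_walk_path([1, -1, 1, 1])  # n = 4 and sigma = 1, so the scale factor is 2
print(X(0.0), X(0.125), X(0.5), X(1.0))
```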

To see in a concrete way that (14) contains information going beyond the central limit theorem, consider the set

$$A = \{x : \sup_{0 \le t \le 1} x(t) \le c\}.$$

Again it turns out that $W(\partial A) = 0$, so that (14) implies

If we evaluate the right-most member of (15) (which we can do in a number of ways, for example, by computing the limit on the left for some specially selected sequence $\{\xi_n\}$ that makes the computation easy), then we have a limit theorem for the distribution of $\max_{k \le n} S_k$ under the hypotheses of the Lindeberg-Lévy theorem.
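One specially selected sequence that makes the left side of (15) easy to compute is the simple symmetric random walk, with $\xi_k = \pm 1$ each with probability 1/2 (so $\sigma = 1$). The sketch below computes $P\{\max_{k \le n} S_k \le m\}$ exactly by dynamic programming and compares it, for $m = \lfloor c\sqrt{n} \rfloor$, with $2F(c) - 1$ ($c \ge 0$), where $F$ is the unit normal distribution function. That closed form for $W(A)$ is the standard reflection-principle value, quoted here rather than taken from this text; the helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical. Steps are +-1 with probability 1/2.
import math


def prob_max_at_most(n: int, m: int) -> float:
    """Exact P(max_{0 <= k <= n} S_k <= m) for the simple symmetric random walk S_k."""
    if m < 0:
        return 0.0  # S_0 = 0 already exceeds m
    # dist[j] = P(S_k = j - n and S_0, ..., S_k all <= m); the offset n keeps indices >= 0.
    dist = [0.0] * (2 * n + 1)
    dist[n] = 1.0
    for _ in range(n):
        new = [0.0] * (2 * n + 1)
        for j, mass in enumerate(dist):
            if mass == 0.0:
                continue
            for step in (-1, 1):
                if j - n + step <= m:  # paths that reach level m + 1 are discarded
                    new[j + step] += 0.5 * mass
        dist = new
    return sum(dist)


def unit_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


n, c = 1000, 1.0
print(prob_max_at_most(n, math.floor(c * math.sqrt(n))), 2 * unit_normal_cdf(c) - 1)
```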

As a final example involving $X_n(\omega)$, take $A$ to be the set of $x$ in $C$ for which the set $\{t : x(t) > 0\}$ has Lebesgue measure at most $\alpha$ (we assume $0 < \alpha < 1$). As before, $P_n(A) \to W(A)$. Since the Lebesgue measure of $\{t : X_n(t, \omega) > 0\}$ is essentially the fraction of the partial sums $S_1, S_2, \ldots, S_n$ that exceed 0, this argument leads to an arc sine law under the hypotheses of the Lindeberg-Lévy theorem. Chapter 2 contains the details of all these derivations.
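The arc sine law shows up in a small simulation. This sketch again assumes $\pm 1$ steps as a concrete choice of $\{\xi_n\}$; it counts the steps on which the polygonal path through $0, S_1, \ldots, S_n$ lies above 0 (with $\pm 1$ steps this is exactly $n$ times the Lebesgue measure of $\{t : X_n(t) > 0\}$), and compares the simulated distribution function with $(2/\pi)\arcsin\sqrt{\alpha}$, the standard form of the arc sine law, quoted rather than derived here. Sample sizes, seed, and names are arbitrary, hypothetical choices.

```python
# Illustrative Monte Carlo sketch; names, sample sizes, and seed are hypothetical choices.
import math
import random


def fraction_positive(n: int, rng: random.Random) -> float:
    """Fraction of the n steps on which the polygonal +-1 random walk path lies above 0,
    that is, the Lebesgue measure of {t : X_n(t) > 0}."""
    prev, positive = 0, 0
    for _ in range(n):
        cur = prev + (1 if rng.random() < 0.5 else -1)
        positive += prev > 0 or cur > 0  # with +-1 steps the segment is above 0 exactly then
        prev = cur
    return positive / n


def arcsine_cdf(alpha: float) -> float:
    """(2 / pi) * arcsin(sqrt(alpha)), the arc sine distribution function on [0, 1]."""
    return 2.0 / math.pi * math.asin(math.sqrt(alpha))


def max_discrepancy(n=400, trials=3000, seed=12345, alphas=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Largest gap between the simulated distribution function and the arc sine law."""
    rng = random.Random(seed)
    samples = [fraction_positive(n, rng) for _ in range(trials)]
    return max(abs(sum(f <= a for f in samples) / trials - arcsine_cdf(a)) for a in alphas)


print(round(max_discrepancy(), 3))
```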

We can in this way use the theory of weak convergence in $C$ to obtain a whole class of limit theorems for functions of the partial sums $S_1, S_2, \ldots, S_n$. The fact that Wiener measure $W$ is the weak limit of the distribution in $C$ of the random function $X_n(\omega)$ can also be used to prove theorems about $W$, and $W$ is interesting in its own right.

Chapter 3 specializes the theory of weak convergence to another space of functions on $[0, 1]$: the space $D = D[0, 1]$ of functions having discontinuities of at most the first kind. This is the natural space in which to analyze the behavior of empirical distribution functions, for example. Let $\xi_1, \xi_2, \ldots$ be independent random variables on $(\Omega, \mathcal{F}, P)$, each uniformly distributed on $[0, 1]$. Let $F_n(t, \omega)$ be the empirical distribution function of $\xi_1(\omega), \ldots, \xi_n(\omega)$; for $0 \le t \le 1$, $F_n(t, \omega)$ is the fraction of integers $k$, $1 \le k \le n$, for which $\xi_k(\omega) \le t$. Now let $Y_n(\omega)$ be the element of $D$ whose value at $t$ is

$$Y_n(t, \omega) = \sqrt{n}\,(F_n(t, \omega) - t).$$


If $D$ is metrized in the right way, it becomes possible to speak of the distribution in $D$ of the random function $Y_n(\omega)$ and to prove that this distribution converges weakly as $n$ tends to infinity. Just as in the case of the random element of $C$ defined by (13), we can then go on to derive the limiting distributions of

$$\sup_{0 \le t \le 1} \sqrt{n}\,(F_n(t, \omega) - t) = \sup_{0 \le t \le 1} Y_n(t, \omega)$$

and related quantities that arise in statistics.
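For a given sample, $\sup_t Y_n(t, \omega)$ can be computed exactly from the order statistics: between jumps $F_n(t) - t$ decreases, so the supremum is attained at $t = 0$ (value 0) or at an order statistic $\xi_{(k)}$, where it equals $k/n - \xi_{(k)}$. The sketch below, with a hypothetical function name, implements that observation and, for large $n$, compares simulated values with $1 - e^{-2c^2}$, the standard limit law of this supremum (that of the Brownian bridge), quoted rather than derived here.

```python
# Illustrative sketch; the function name, sample sizes, and seed are hypothetical choices.
import math
import random
from typing import Sequence


def sup_empirical_process(sample: Sequence[float]) -> float:
    """sup_{0 <= t <= 1} sqrt(n) * (F_n(t) - t) for observations in [0, 1]."""
    n = len(sample)
    order_stats = sorted(sample)
    # F_n(t) - t is largest at t = 0 or at an order statistic u_(k), where it is k/n - u_(k).
    best = max(0.0, max((k + 1) / n - u for k, u in enumerate(order_stats)))
    return math.sqrt(n) * best


print(sup_empirical_process([0.9, 0.1, 0.5]))  # sqrt(3) * (1/3 - 0.1)

rng = random.Random(2024)
values = [sup_empirical_process([rng.random() for _ in range(200)]) for _ in range(2000)]
c = 1.0
print(sum(v <= c for v in values) / len(values), 1 - math.exp(-2 * c * c))
```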

Chapter 4 concerns weak convergence of the distributions of random functions derived from various dependent sequences of random variables. Many of the conclusions in Chapters 2, 3, and 4, although not requiring function-space concepts for their statement, could hardly have been derived without function-space methods.

Standard measure-theoretic probability and metric-space topology are used from the beginning. Although the point of view throughout is that of functional analysis (a function is a point in a space), nothing of functional analysis is assumed (beyond an initial willingness to view a function as a point in a space); all function-analytic results needed are proved in the text or else in Appendix I (which also gathers together for easy reference some results in metric-space topology).

Remarks. The main papers that led to the development of this theory were Kolmogorov (1931), Erdős and Kac (1946 and 1947), Doob (1949), and Donsker (1951 and 1952). Prohorov (1953 and 1956) and Skorohod (1956) gave the theory its present form. Le Cam (1957) and Varadarajan (1958a and 1961a) have extended it to general topological spaces.


CHAPTER 1

Weak Convergence in Metric Spaces

1 MEASURES IN METRIC SPACES

Let $S$ be a metric space. We shall study probability measures on the class $\mathcal{S}$ of Borel sets in $S$. Here $\mathcal{S}$ is the $\sigma$-field generated by the open sets (the smallest $\sigma$-field containing all the open sets), and a probability measure on $\mathcal{S}$ is a nonnegative, countably additive set function $P$ with $P(S) = 1$.

If such probability measures $P_n$ and $P$ satisfy $\int_S f\,dP_n \to \int_S f\,dP$ for every bounded, continuous real function $f$ on $S$, we say that $P_n$ converges weakly to $P$ and write $P_n \Rightarrow P$. Our aim in this chapter is to study this concept in detail; we begin with some properties of individual probability measures on $(S, \mathcal{S})$.

Although we must sometimes assume separability or completeness, most of the theorems in this chapter hold for an arbitrary metric space $S$. The spaces in our applications are usually separable and complete; since they rarely have further regularity properties, such as local compactness, we never impose further restrictions.

Measures and Integrals

THEOREM 1.1 Every probability measure on $(S, \mathcal{S})$ is regular; that is, if $A \in \mathcal{S}$ and $\varepsilon > 0$, then there exist a closed set $F$ and an open set $G$ such that $F \subset A \subset G$ and $P(G - F) < \varepsilon$.


for some $\delta$, since the latter sets decrease to $A$ as $\delta \downarrow 0$. Hence we need only show that the class $\mathcal{G}$ of Borel sets with the asserted property is a $\sigma$-field.† Given sets $A_n$ in $\mathcal{G}$, choose closed sets $F_n$ and open sets $G_n$ such that $F_n \subset A_n \subset G_n$ and $P(G_n - F_n) < \varepsilon/2^{n+1}$. If $G = \bigcup_n G_n$, and if $F = \bigcup_{n \le n_0} F_n$, with $n_0$ so chosen that $P(\bigcup_n F_n - F) < \varepsilon/2$, then $F \subset \bigcup_n A_n \subset G$ and $P(G - F) < \varepsilon$. Thus $\mathcal{G}$ is closed under the formation of countable unions; since $\mathcal{G}$ is obviously closed under complementation, the proof is complete.

Theorem 1.1 implies that $P$ is determined by the values of $P(F)$ for closed sets $F$. Theorem 1.3 shows that $P$ is determined by the values of $\int f\,dP$‡ for bounded, continuous real functions $f$ defined on $S$. Denote by $C(S)$ the class of such functions $f$. It is shown on p. 222§ that each $f$ in $C(S)$ is measurable $\mathcal{S}$. Everything depends on the following result, which shows how to approximate the indicator (or characteristic function) $I_F$ of a closed set $F$ by elements of $C(S)$.

THEOREM 1.2 If $F$ is closed and $\varepsilon$ positive, there is a function $f$ in $C(S)$ such that $f(x) = 1$ if $x \in F$, $f(x) = 0$ if $\rho(x, F) \ge \varepsilon$, and $0 \le f(x) \le 1$ for all $x$. The function $f$ may be taken to be uniformly continuous.

† We have defined the class $\mathcal{S}$ of Borel sets as the $\sigma$-field generated by the open sets, which is the same thing as the $\sigma$-field generated by the closed sets and is the one appropriate for the present theory. For related (mostly inappropriate) $\sigma$-fields, see Problem 6.

‡ When it is the entire space, we omit the region of integration.

§ Of Appendix II, a miscellany to which most measurability questions are relegated


then $f$ has the required properties; it is even uniformly continuous. The drawing graphs this $f$ for $F = [a, b]$ on the line.
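The function of Theorem 1.2 can be written down explicitly. The text later uses $f(y) = \varphi(\varepsilon^{-1}\rho(x, y))$ with $\varphi$ defined by (1.1), so this sketch assumes $\varphi(t) = 1$ for $t \le 0$, $1 - t$ for $0 \le t \le 1$, and $0$ for $t \ge 1$, and takes $f(x) = \varphi(\rho(x, F)/\varepsilon)$ for $F = [a, b]$ on the line. The helper names are hypothetical. Since $|\rho(x, F) - \rho(y, F)| \le |x - y|$, this $f$ is Lipschitz with constant $1/\varepsilon$, hence uniformly continuous.

```python
# Illustrative sketch; phi is an assumed form of (1.1) and all names are hypothetical.
def phi(t: float) -> float:
    """1 for t <= 0, 1 - t for 0 <= t <= 1, 0 for t >= 1."""
    return min(1.0, max(0.0, 1.0 - t))


def dist_to_interval(x: float, a: float, b: float) -> float:
    """rho(x, F) for the closed interval F = [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0


def approx_indicator(x: float, a: float, b: float, eps: float) -> float:
    """f(x) = phi(rho(x, F) / eps): equals 1 on F, 0 where rho(x, F) >= eps, values in [0, 1]."""
    return phi(dist_to_interval(x, a, b) / eps)


for x in (-0.2, -0.05, 0.0, 0.5, 1.0, 1.05, 1.2):
    print(x, approx_indicator(x, 0.0, 1.0, 0.1))
```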

THEOREM 1.3 Probability measures $P$ and $Q$ on $(S, \mathcal{S})$ coincide if

$$\int f\,dP = \int f\,dQ \qquad (1.3)$$

for each $f$ in $C(S)$.

Proof. Suppose $F$ is closed. Start with (1.1) and define, for each positive integer $u$,

$$\varphi_u(t) = \varphi(ut) \qquad (1.4)$$

and

$$f_u(x) = \varphi_u(\rho(x, F)). \qquad (1.5)$$

Then $\{f_u\}$ is a nonincreasing sequence of elements of $C(S)$ converging pointwise to $I_F$ (if $x \notin F$, then $\rho(x, F) > 0$ because $F$ is closed, so $f_u(x) = 0$ for large $u$). By the bounded convergence theorem, $P(F) = \lim_u \int f_u\,dP$ and $Q(F) = \lim_u \int f_u\,dQ$, so that, if (1.3) holds for all $f$ in $C(S)$, $P(F) = Q(F)$. Since $P$ and $Q$ agree for all closed sets, it follows by Theorem 1.1 that $P$ and $Q$ are identical.
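The monotone approximation in this proof can be watched with exact arithmetic. The sketch takes a hypothetical discrete $P$ with four atoms, $F = [0, 1]$, and the same assumed form of $\varphi$ as (1.1); $\int f_u\,dP$ decreases in $u$ and equals $P(F)$ as soon as $u\rho(x, F) \ge 1$ at every atom outside $F$. All names are hypothetical.

```python
# Illustrative sketch; the measure P and all names are hypothetical.
from fractions import Fraction


def phi(t):
    """Assumed form of (1.1): 1 for t <= 0, 1 - t for 0 <= t <= 1, 0 for t >= 1."""
    return min(Fraction(1), max(Fraction(0), 1 - t))


def dist_to_interval(x, a, b):
    """rho(x, F) for F = [a, b]."""
    return max(a - x, x - b, Fraction(0))


# Hypothetical discrete P: atom -> mass (the masses sum to 1).
P = {Fraction(1, 2): Fraction(2, 5), Fraction(6, 5): Fraction(3, 10),
     Fraction(2): Fraction(1, 10), Fraction(-3, 10): Fraction(1, 5)}
a, b = Fraction(0), Fraction(1)


def integral_f_u(u):
    """Integral of f_u(x) = phi(u * rho(x, F)) with respect to P."""
    return sum(mass * phi(u * dist_to_interval(x, a, b)) for x, mass in P.items())


P_F = sum(mass for x, mass in P.items() if a <= x <= b)
print([integral_f_u(u) for u in (1, 2, 5, 10)], P_F)
```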

Thus the values of $\int f\,dP$ for $f$ in $C(S)$ completely determine the values of $P(A)$ for $A$ in $\mathcal{S}$. This fact underlies the circle of ideas centering on the notion of weak convergence; although we have defined weak convergence by requiring the convergence of the integrals of functions in $C(S)$, in the next section we shall characterize it in terms of the convergence of the measures of certain sets.

Tightness

The following notion of tightness proves important both in the theory of weak convergence and in its applications. A probability measure $P$ on $(S, \mathcal{S})$ is tight if for each positive $\varepsilon$ there exists a compact (p. 217) set $K$ such that $P(K) > 1 - \varepsilon$. Clearly, $P$ is tight if and only if it has a $\sigma$-compact support.† By Theorem 1.1, $P$ is tight if and only if $P(A)$ is, for each $A$ in $\mathcal{S}$, the supremum of $P(K)$ over the compact subsets $K$ of $A$.

In a space that is $\sigma$-compact, every probability measure is tight, which covers $k$-dimensional Euclidean space. The following result, which also covers the Euclidean case, is more useful.
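On the line, for instance, the closed intervals $[-r, r]$ are compact, so exhibiting tightness of a given $P$ comes down to finding, for each $\varepsilon$, an $r$ with $P[-r, r] > 1 - \varepsilon$. The sketch below does this by bisection for the unit normal distribution, chosen here only as an example; the function names are hypothetical.

```python
# Illustrative sketch for the unit normal distribution; function names are hypothetical.
import math


def normal_mass(r: float) -> float:
    """P([-r, r]) for the unit normal distribution."""
    return math.erf(r / math.sqrt(2.0))


def tight_radius(eps: float, tol: float = 1e-10) -> float:
    """An r with P([-r, r]) > 1 - eps, within tol of the smallest such r (0 < eps < 1)."""
    lo, hi = 0.0, 1.0
    while normal_mass(hi) <= 1 - eps:  # grow until K = [-hi, hi] carries enough mass
        hi *= 2
    while hi - lo > tol:               # invariant: normal_mass(hi) > 1 - eps >= normal_mass(lo)
        mid = (lo + hi) / 2
        if normal_mass(mid) > 1 - eps:
            hi = mid
        else:
            lo = mid
    return hi


for eps in (0.1, 0.05, 0.01):
    print(eps, round(tight_radius(eps), 4))
```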

† A support of a probability measure $P$ is a set $A$ in $\mathcal{S}$ with $P(A) = 1$; a set is $\sigma$-compact if it can be represented as a countable union of compact sets. The characterization of a tight $P$ as having a $\sigma$-compact support is inappropriate as a definition because it does not generalize in the right way to families of probability measures (see Section 6).


THEOREM 1.4 If $S$ is separable and complete, then each probability measure on $(S, \mathcal{S})$ is tight.

Proof. Since $S$ is separable, there is, for each $n$, a sequence $A_{n1}, A_{n2}, \ldots$ of open $1/n$-spheres covering $S$. Choose $i_n$ so that $P(\bigcup_{i \le i_n} A_{ni}) > 1 - \varepsilon/2^n$. By the completeness hypothesis, the totally bounded set $\bigcap_{n \ge 1} \bigcup_{i \le i_n} A_{ni}$ has compact closure $K$ (see p. 217). Since clearly $P(K) > 1 - \varepsilon$, the theorem follows.

Theorem 1.4 is false without the hypothesis of completeness; whether the hypothesis of separability can be suppressed is equivalent to the problem of measure. These matters are discussed in Appendix III.†

Remarks. Theorem 1.4 is due to Ulam (see Oxtoby and Ulam (1939)); Le Cam (1957) introduced the term "tight."

PROBLEMS‡

1. Say that a function $f$ separates sets $A$ and $B$ if $f(x) = 0$ for $x$ in $A$, $f(x) = 1$ for $x$ in $B$, and $0 \le f(x) \le 1$ for all $x$. If $A$ and $B$ are at positive distance, they can be separated by a uniformly continuous $f$ [Theorem 1.2]. If $A$ and $B$ have disjoint closures but are at distance 0, they can be separated by a continuous $f$ [$f(x) = \rho(x, A)/(\rho(x, A) + \rho(x, B))$] but not by a uniformly continuous $f$. There is no continuous $f$ separating $A$ and $B$ if their closures meet; there is no $f$ separating $A$ and $B$ if they meet.
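The separating function of Problem 1 can be tried on a familiar pair of closed sets at distance 0, say $A = \{2, 3, 4, \ldots\}$ and $B = \{n + 1/n : n \ge 2\}$ on the line (this pair is an assumed example, not from the text). The sketch truncates both sets, which does not affect the values near the points examined. Since $f(n) = 0$ and $f(n + 1/n) = 1$ while these points are only $1/n$ apart, no single $\delta$ serves all $n$, so $f$ is continuous but not uniformly continuous. The names are hypothetical.

```python
# Illustrative sketch with an assumed pair of sets; names are hypothetical.
from typing import Iterable


def separating_f(x: float, A: Iterable[float], B: Iterable[float]) -> float:
    """f(x) = rho(x, A) / (rho(x, A) + rho(x, B)): 0 on A, 1 on B, between 0 and 1 elsewhere."""
    rho_a = min(abs(x - a) for a in A)
    rho_b = min(abs(x - b) for b in B)
    return rho_a / (rho_a + rho_b)  # the denominator is positive since A and B are disjoint


N = 50
A = [float(n) for n in range(2, N)]   # truncation of {2, 3, 4, ...}
B = [n + 1 / n for n in range(2, N)]  # truncation of {n + 1/n : n >= 2}

for n in (2, 10, 40):
    gap = (n + 1 / n) - n
    print(n, round(gap, 4), separating_f(n, A, B), separating_f(n + 1 / n, A, B))
```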

2 Give examples of distinct topologies that give rise to the same class of Borel sets

3. If $S$ can be embedded as an open set in some complete metric space, then [Kelley (1955, p. 207)] it is topologically complete. Since a locally compact $S$ is open in its completion, it is topologically complete. Hence Theorem 1.4 applies if $S$ is separable and locally compact. Since such an $S$ is $\sigma$-compact [being a union of open sets with compact closures and hence (p. 216) a countable such union], it also follows directly that each probability measure on it is tight; Euclidean space is an example.

4. Let $S$ be a Hilbert space with a countably infinite orthonormal basis $x_1, x_2, \ldots$. Since $S$ is separable and complete, Theorem 1.4 applies. However, no set with nonempty interior is compact [a nonempty interior must, for some $x$ and $\varepsilon$, contain all the points $x + \varepsilon x_n$], so that $S$ is neither locally compact nor [Baire's category theorem; Kelley (1955, p. 200)] $\sigma$-compact. If $P$ assigns positive mass to each element of a countable, dense set, then $P$ has no support locally compact in the relative topology.

5. Adapt Problem 4 to the general Banach space of countably infinite dimension [there exist points $x_1, x_2, \ldots$ with $\sup_n \|x_n\| < \infty$ and $\inf_{m \ne n} \|x_m - x_n\| > 0$; see Banach (1932, p. 83)]; $C[0, 1]$, important in probability, is such a space, which explains why a theory based on local compactness is of small utility in this subject. (See also Problem 5 in Section 3.)

† Although Theorem 1.4 as given suffices for all the applications in this book, it is natural to inquire after extensions. It is to questions of just this sort that Appendix III is devoted.

‡ Some problems involve concepts not required for an understanding of the text itself; there are no problems whose solutions are used later in the text. A simple assertion is understood to be prefaced by "show that." Square brackets contain hints or indications of solutions.


6. We have defined $\mathcal{S}$ as the $\sigma$-field generated by the open sets, which we can indicate by writing $\mathcal{S} = \sigma(\text{open sets})$. In the same way, define $\mathcal{S}_1 = \sigma(\text{closed } G_\delta \text{ sets})$ (a set is a $G_\delta$ if it is a countable intersection of open sets), define $\mathcal{S}_2 = \sigma(C(S))$ (the smallest $\sigma$-field with respect to which each function in $C(S)$ is measurable), and define $\mathcal{S}_3 = \sigma(\text{open spheres})$, $\mathcal{S}_4 = \sigma(\text{compact sets})$, and $\mathcal{S}_5 = \sigma(\text{compact } G_\delta \text{ sets})$. In a metric space each closed set is a $G_\delta$. Use this fact and Theorem 1.2 to prove $\mathcal{S} = \mathcal{S}_1 = \mathcal{S}_2 \supset \mathcal{S}_3 \supset \mathcal{S}_4 = \mathcal{S}_5$. Show that $\mathcal{S} = \mathcal{S}_3$ if $S$ is separable. Show that $\mathcal{S} = \mathcal{S}_5$ if $S$ is $\sigma$-compact (which will be true if $S$ is separable and locally compact). We may have $\mathcal{S}_2 \ne \mathcal{S}_3$ (even if $S$ is locally compact): Take $S$ uncountable and discrete. We may have $\mathcal{S}_3 \ne \mathcal{S}_4$ (even if $S$ is separable and complete): Take $S$ to be the Hilbert space in Problem 4. (The situation differs in the general topological space, where one must consider two classes of sets: The Borel sets are taken as the elements sometimes of $\mathcal{S}$ and sometimes of $\mathcal{S}_4$, and the Baire sets are taken as the elements sometimes of $\mathcal{S}_2$ and sometimes of $\mathcal{S}_5$; the terminology varies.)

7. In connection with tightness, this fact is interesting: Suppose $P$ is defined on $(S, \mathcal{S})$, but suppose at the outset only that it is finitely additive. If, for each $A$ in $\mathcal{S}$, $P(A) = \sup P(K)$ with $K$ ranging over the compact subsets of $A$, then $P$ is countably additive after all.

2 PROPERTIES OF WEAK CONVERGENCE

We have defined $P_n \Rightarrow P$ to mean that $\int f\,dP_n \to \int f\,dP$ for each $f$ in the class $C(S)$ of bounded, continuous real functions on $S$. Note that, since the integrals $\int f\,dP$ completely determine $P$ (Theorem 1.3), the sequence $\{P_n\}$ cannot converge weakly to two different limits at the same time. Note also that weak convergence depends only on the topology of $S$, not on the specific metric that generates it: Two metrics generating the same topology give rise to the same classes $\mathcal{S}$ and $C(S)$ and hence to the same notion of weak convergence.†

$i = 1, \ldots, k$, where $\varepsilon$ is positive and the $f_i$ lie in $C(S)$, then weak convergence is convergence in this topology. The topological structure of $Z(S)$, which will be of no direct concern to us, is discussed in Appendix III.


(i) $P_n \Rightarrow P$.

(ii) $\lim_n \int f\,dP_n = \int f\,dP$ for all bounded, uniformly continuous real $f$.

(iii) $\limsup_n P_n(F) \le P(F)$ for all closed $F$.

(iv) $\liminf_n P_n(G) \ge P(G)$ for all open $G$.

(v) $\lim_n P_n(A) = P(A)$ for all $P$-continuity sets $A$.

A couple of examples will show the significance of these conditions. Let $P$ be a unit mass at the point $x$ ($P(A)$ is 1 or 0 according as $x$ lies in $A$ or not), and let $P_n$ be a unit mass at $x_n$. If $x_n \to x$, then $\int f\,dP_n = f(x_n) \to f(x) = \int f\,dP$ for all $f$ in $C(S)$, so that $P_n \Rightarrow P$. If $x_n$ does not converge to $x$, then, for some positive $\varepsilon$, we have $\rho(x_n, x) \ge \varepsilon$ for infinitely many $n$. If $f(y) = \varphi(\varepsilon^{-1}\rho(x, y))$ with $\varphi$ defined by (1.1), then $f \in C(S)$, $f(x) = 1$, and $f(x_n) = 0$ for infinitely many $n$; hence $P_n$ cannot converge weakly to $P$. Thus $P_n \Rightarrow P$ if and only if $x_n \to x$, which provides an example we shall often use. (Many putative weak-convergence theorems that are in fact not theorems can be disproved by specializing this example.) Since $A$ is a $P$-continuity set if and only if $x \notin \partial A$, it is easy to check the equivalence of (i) and (v) in this case.

If $x_n \to x$ but the $x_n$ all differ from $x$, then there is strict inequality in (iii) for $F = \{x\}$ and strict inequality in (iv) for the complementary set $G = F^c$; moreover, if the $x_n$ are all distinct and $A = \{x_2, x_4, \ldots\}$, then $P_n(A)$ does not converge to $P(A)$ or to anything else.
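The point-mass example is easy to tabulate. This sketch takes $S$ to be the line, $x = 0$ and $x_n = 1/n$ (an assumed concrete choice), represents sets by membership tests, and shows the strict inequalities in (iii) and (iv) for $F = \{x\}$ and $G = F^c$, the oscillation of $P_n(A)$ for $A = \{x_2, x_4, \ldots\}$ (truncated), and the convergence $\int f\,dP_n = f(x_n) \to f(x)$ for the bounded continuous $f = \cos$. All names are hypothetical.

```python
# Illustrative sketch; all names are hypothetical. S is the real line, x = 0, x_n = 1/n.
import math


def unit_mass(point: float, member) -> float:
    """P(A) for the unit mass at `point`, with A given by its membership test `member`."""
    return 1.0 if member(point) else 0.0


x = 0.0


def x_n(n: int) -> float:
    return 1.0 / n  # distinct points converging to x


def in_F(y: float) -> bool:  # closed set F = {x}
    return y == x


def in_G(y: float) -> bool:  # open set G = complement of F
    return y != x


EVEN_POINTS = {x_n(m) for m in range(2, 2001, 2)}  # A = {x_2, x_4, ...}, truncated


def in_A(y: float) -> bool:
    return y in EVEN_POINTS


ns = range(1, 11)
print([unit_mass(x_n(n), in_F) for n in ns], unit_mass(x, in_F))  # lim sup 0 < P(F) = 1
print([unit_mass(x_n(n), in_G) for n in ns], unit_mass(x, in_G))  # lim inf 1 > P(G) = 0
print([unit_mass(x_n(n), in_A) for n in ns])                        # 0, 1, 0, 1, ...: no limit
print([math.cos(x_n(n)) for n in (1, 10, 1000)], math.cos(x))      # f(x_n) -> f(x)
```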

On the line with the ordinary metric, the De Moivre-Laplace theorem also illustrates the conditions in the theorem. For a simpler example equally relevant, consider the measure $P_n$ corresponding to a mass of $1/n$ at each of the points $i/n$, $i = 1, 2, \ldots, n$. Now $P_n$ converges weakly to Lebesgue measure $P$ confined to the unit interval, as follows from the fact that $\int f\,dP_n$ is an approximating sum to $\int f\,dP$ viewed as a Riemann integral. If $A$ consists of the rationals, then $P_n(A) = 1$ does not converge to $P(A) = 0$; if $G$ is an open set containing the rationals and having Lebesgue measure near 0, then there is strict inequality in (iv).
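The Riemann-sum example can be checked with exact arithmetic. For $f(t) = t^2$, $\int f\,dP_n = n^{-3}\sum_{i=1}^n i^2 = (n+1)(2n+1)/(6n^2)$, which tends to $\int_0^1 t^2\,dt = 1/3$. The sketch below uses Python's `fractions` module, so the points $i/n$ stay rational, in keeping with the fact that $P_n$ puts all its mass on the rationals. The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
from fractions import Fraction
from typing import Callable


def integral_Pn(f: Callable[[Fraction], Fraction], n: int) -> Fraction:
    """Integral of f against P_n, the measure with mass 1/n at each point i/n, i = 1, ..., n."""
    return sum(f(Fraction(i, n)) for i in range(1, n + 1)) / n


def square(t: Fraction) -> Fraction:
    return t * t


for n in (10, 100, 1000):
    value = integral_Pn(square, n)
    assert value == Fraction((n + 1) * (2 * n + 1), 6 * n * n)
    print(n, value, float(value - Fraction(1, 3)))
```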

We prove Theorem 2.1 by establishing the implications in the following diagram.

(i) ⇒ (ii) ⇒ (iii) ⇔ (iv); (iii) ⇒ (i); (iii) ⇔ (v)

$\delta > 0$. For small enough $\varepsilon$, $G = \{x : \rho(x, F) < \varepsilon\}$ satisfies $P(G) < P(F) + \delta$,

† Each subset of $S$ mentioned is assumed to lie in $\mathcal{S}$.


since the sets of this form decrease to $F$ as $\varepsilon \downarrow 0$. If $f(x)$ is the function defined by (1.2), then $f$ is uniformly continuous on $S$, $f(x) = 1$ on $F$, $f(x) = 0$ on the complement $G^c$ of $G$, and $0 \le f(x) \le 1$ for all $x$. Since (ii) holds, we have $\lim_n \int f\,dP_n = \int f\,dP$, which, together with the relations

$$P_n(F) = \int_F f\,dP_n \le \int f\,dP_n$$

and

$$\int f\,dP = \int_G f\,dP \le P(G) < P(F) + \delta,$$

implies

$$\limsup_n P_n(F) \le \lim_n \int f\,dP_n = \int f\,dP < P(F) + \delta.$$

Since $\delta$ was arbitrary, (iii) follows.

Proof of (iii) ⇒ (i). Suppose that (iii) holds and that $f \in C(S)$. We shall first show that

$$\limsup_n \int f\,dP_n \le \int f\,dP. \qquad (2.1)$$

$$\sum_{i=1}^{k} \frac{i}{k}\,[P(F_{i-1}) - P(F_i)] = \frac{1}{k} + \frac{1}{k} \sum_{i=1}^{k-1} P(F_i).$$

This and a similar transformation of the sum on the left yield

Applying (2.1) to $-f$ yields $\liminf_n \int f\,dP_n \ge \int f\,dP$, which, together with (2.1) itself, proves weak convergence.
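The identity displayed in this proof is a summation by parts. The definition of the sets $F_i$ falls in a gap in this copy, so the sketch below assumes $0 \le f < 1$ and $F_i = \{x : f(x) \ge i/k\}$, which gives $F_0 = S$ and $F_k$ empty. It checks the identity exactly for a hypothetical discrete $P$, and also that the left side, an upper sum for $\int f\,dP$, exceeds the integral by at most $1/k$. All names are hypothetical.

```python
# Illustrative sketch; the definition of F_i, the measure, and all names are assumptions.
from fractions import Fraction

# Hypothetical discrete P: value of f at each atom (in [0, 1)), and the atom's mass.
values = [Fraction(1, 10), Fraction(3, 7), Fraction(9, 10)]
masses = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]


def P_F(i: int, k: int) -> Fraction:
    """P(F_i) with F_i = {x : f(x) >= i/k} (assumed definition)."""
    return sum((m for v, m in zip(values, masses) if v >= Fraction(i, k)), Fraction(0))


def both_sides(k: int):
    """Left and right sides of the displayed identity."""
    lhs = sum(Fraction(i, k) * (P_F(i - 1, k) - P_F(i, k)) for i in range(1, k + 1))
    rhs = Fraction(1, k) + Fraction(1, k) * sum((P_F(i, k) for i in range(1, k)), Fraction(0))
    return lhs, rhs


integral = sum(v * m for v, m in zip(values, masses))  # integral of f dP
for k in (2, 5, 20):
    lhs, rhs = both_sides(k)
    print(k, lhs == rhs, integral < lhs <= integral + Fraction(1, k))
```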

The equivalence of (iii) and (iv) follows easily by complementation


Proof of (iii) ⇒ (v). Let $A^\circ$ denote the interior of $A$, and let $A^-$ denote its closure. If (iii) holds, then so does (iv), and hence, for each $A$,

$$P(A^-) \ge \limsup_n P_n(A^-) \ge \limsup_n P_n(A) \ge \liminf_n P_n(A) \ge \liminf_n P_n(A^\circ) \ge P(A^\circ). \qquad (2.3)$$

If $P(\partial A) = 0$, then the extreme terms equal $P(A)$, and $\lim_n P_n(A) = P(A)$ follows.

Proof of (v) ⇒ (iii). Since $\partial\{x : \rho(x, F) \le \delta\}$ is contained† in $\{x : \rho(x, F) = \delta\}$, these boundaries are disjoint for distinct $\delta$, and hence at most countably many of them can have positive $P$-measure. Therefore, for some sequence of positive $\delta_k$ going to 0, the sets $F_k = \{x : \rho(x, F) \le \delta_k\}$ are $P$-continuity sets. If (v) holds, then $\limsup_n P_n(F) \le \lim_n P_n(F_k) = P(F_k)$ for each $k$; if $F$ is closed, then $F_k \downarrow F$, so that (iii) follows. This completes the proof of Theorem 2.1.
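The countability argument can be made concrete for a discrete $P$: $P\{x : \rho(x, F) = \delta\}$ is positive only when $\delta$ is the distance from $F$ of some atom, so any sequence $\delta_k \downarrow 0$ avoiding those values gives $P$-continuity sets $F_k$. The sketch below (hypothetical atoms, $F = [0, 1]$ on the line, hypothetical names) halves $\delta$ repeatedly, skips the bad values, and shows $P(F_k) \downarrow P(F)$.

```python
# Illustrative sketch; the atoms of P and all names are hypothetical.
from fractions import Fraction


def dist_to_unit_interval(x: Fraction) -> Fraction:
    """rho(x, F) for F = [0, 1]."""
    return max(-x, x - 1, Fraction(0))


# Hypothetical discrete P: atom -> mass.
P = {Fraction(1, 2): Fraction(1, 2), Fraction(3, 2): Fraction(1, 4),
     Fraction(5, 4): Fraction(1, 8), Fraction(-1, 2): Fraction(1, 8)}

# Positive deltas with P{x : rho(x, F) = delta} > 0.
bad = {dist_to_unit_interval(x) for x in P} - {Fraction(0)}


def continuity_deltas(count: int) -> list:
    """Positive deltas decreasing to 0, none in `bad`, so each F_k = {rho <= delta_k}
    has a boundary of P-measure 0."""
    deltas, delta = [], Fraction(1)
    while len(deltas) < count:
        if delta not in bad:
            deltas.append(delta)
        delta /= 2
    return deltas


deltas = continuity_deltas(6)
P_Fk = [sum(m for x, m in P.items() if dist_to_unit_interval(x) <= d) for d in deltas]
print(deltas)
print(P_Fk)  # decreases to P(F) = 1/2
```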

Other Criteria

It is sometimes convenient to prove weak convergence by showing that $P_n(A) \to P(A)$ for some special class of sets $A$.

THEOREM 2.2 Let $\mathcal{U}$ be a subclass of $\mathcal{S}$ such that (i) $\mathcal{U}$ is closed under the formation of finite intersections and (ii) each open set in $S$ is a finite or countable union of elements of $\mathcal{U}$. If $P_n(A) \to P(A)$ for every $A$ in $\mathcal{U}$, then $P_n \Rightarrow P$.

Proof. If $A_1, \ldots, A_m$ lie in $\mathcal{U}$, then so do their intersections; hence, by the inclusion-exclusion formula,

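$$\begin{aligned} P_n\Big(\bigcup_{i=1}^{m} A_i\Big) &= \sum_i P_n(A_i) - \sum_{i<j} P_n(A_iA_j) + \sum_{i<j<k} P_n(A_iA_jA_k) - \cdots \\ &\to \sum_i P(A_i) - \sum_{i<j} P(A_iA_j) + \sum_{i<j<k} P(A_iA_jA_k) - \cdots = P\Big(\bigcup_{i=1}^{m} A_i\Big). \end{aligned}$$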

If $G$ is open, then $G = \bigcup_i A_i$ for some sequence $\{A_i\}$ of elements of $\mathcal{U}$. Given $\varepsilon$, choose $m$ so that $P(\bigcup_{i \le m} A_i) > P(G) - \varepsilon$. By the relation just proved, $P(G) - \varepsilon < P(\bigcup_{i \le m} A_i) = \lim_n P_n(\bigcup_{i \le m} A_i) \le \liminf_n P_n(G)$. Since $\varepsilon$ was arbitrary, condition (iv) of the preceding theorem holds.
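The inclusion-exclusion step is what carries convergence from the finite intersections to the finite unions: the probability of a union is a fixed finite combination of probabilities of intersections. The sketch below, with hypothetical helper names and a hypothetical uniform measure on six points, evaluates that alternating sum and checks it against the probability of the union.

```python
# Illustrative sketch; the measure, the sets, and all names are hypothetical.
from fractions import Fraction
from itertools import combinations


def prob(P: dict, A: frozenset) -> Fraction:
    return sum((P[x] for x in A), Fraction(0))


def union_by_inclusion_exclusion(P: dict, sets: list) -> Fraction:
    """P(A_1 u ... u A_m) = sum_i P(A_i) - sum_{i<j} P(A_i A_j) + ..., using intersections only."""
    total = Fraction(0)
    for r in range(1, len(sets) + 1):
        sign = 1 if r % 2 == 1 else -1
        for combo in combinations(sets, r):
            total += sign * prob(P, frozenset.intersection(*combo))
    return total


P = {x: Fraction(1, 6) for x in range(6)}  # uniform on {0, ..., 5}
sets = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({3, 4, 0})]
print(union_by_inclusion_exclusion(P, sets), prob(P, frozenset().union(*sets)))
```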

Let $S(x, \varepsilon)$ denote the (open) $\varepsilon$-sphere about $x$.

COROLLARY 1 Let $\mathcal{U}$ be a class of sets such that (i) $\mathcal{U}$ is closed under the formation of finite intersections and (ii) for every $x$ in $S$ and every positive $\varepsilon$

† The inclusion may be strict, in a discrete space, for example.


there is an $A$ in $\mathcal{U}$ with $x \in A^\circ \subset A \subset S(x, \varepsilon)$. If $S$ is separable and if $P_n(A) \to P(A)$ for every $A$ in $\mathcal{U}$, then $P_n \Rightarrow P$.

Proof. Condition (ii)† implies that, for each point $x$ of an open set $G$, $x \in A^\circ \subset A \subset G$ for some $A$ in $\mathcal{U}$. Since $S$ is separable, there exists (see p. 216) in $\mathcal{U}$ a finite or infinite sequence $\{A_i\}$ such that $G = \bigcup_i A_i^\circ$ and $A_i \subset G$, which implies $G = \bigcup_i A_i$. Thus $\mathcal{U}$ satisfies the hypotheses of Theorem 2.2.

COROLLARY 2 Suppose that, for each finite intersection $A$ of open spheres, we have $P_n(A) \to P(A)$, provided $A$ is a $P$-continuity set. If $S$ is separable, then $P_n \Rightarrow P$.

Proof. The boundaries $\partial S(x, \varepsilon)$, being contained in the sets $\{y : \rho(x, y) = \varepsilon\}$, are disjoint (for fixed $x$) and hence have $P$-measure 0, with at most countably many exceptions. Since

$$\partial(A \cap B) \subset (\partial A) \cup (\partial B),$$

it follows that the hypotheses of Corollary 1 are satisfied by the class $\mathcal{U}$ of those $P$-continuity sets that are finite intersections of spheres, and the result follows.

Let us agree to call a subclass $\mathcal{U}$ of $\mathcal{S}$ a convergence-determining class if convergence $P_n(A) \to P(A)$ for all $P$-continuity sets $A$ in $\mathcal{U}$ invariably entails the weak convergence of $P_n$ to $P$. Corollary 2 becomes: In a separable space, the finite intersections of spheres constitute a convergence-determining class. Let us further agree to call $\mathcal{U}$ a determining class if $P$ and $Q$ are identical whenever they agree on $\mathcal{U}$. The class of closed sets is a determining class and so is any field that generates $\mathcal{S}$. Although each convergence-determining class is clearly also a determining class, the following example shows that the converse fails. Let $S$ be the half-open interval $[0, 1)$ with the ordinary metric; let $\mathcal{U}$ be the class of sets $[a, b)$ with $0 < a < b < 1$. Then $\mathcal{U}$ is a determining class but not (as may be seen by taking $P_n$ [$P$] a unit mass at $1 - 1/n$ [0]) a convergence-determining class. Although this one is artificial, we shall see that the applications abound with real examples of determining classes that are not convergence-determining classes.
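The example of a determining class that is not convergence-determining can be checked directly. With $P_n$ the unit mass at $1 - 1/n$ and $P$ the unit mass at 0, every set $[a, b)$ with $0 < a < b < 1$ eventually has $P_n$-measure 0, which is its $P$-measure; yet for the bounded continuous function $f(x) = x$ on $[0, 1)$, $\int f\,dP_n = 1 - 1/n$ does not converge to $\int f\,dP = 0$. The function names below are hypothetical.

```python
# Illustrative sketch; function names and the sample intervals are hypothetical.
def Pn_interval(n: int, a: float, b: float) -> float:
    """P_n([a, b)) for P_n = unit mass at 1 - 1/n."""
    return 1.0 if a <= 1 - 1 / n < b else 0.0


def P_interval(a: float, b: float) -> float:
    """P([a, b)) for P = unit mass at 0."""
    return 1.0 if a <= 0.0 < b else 0.0


intervals = [(0.1, 0.5), (0.25, 0.9), (0.5, 0.999)]
for a, b in intervals:
    # Once 1 - 1/n >= b, P_n([a, b)) = 0 = P([a, b)).
    print((a, b), [Pn_interval(n, a, b) for n in (2, 10, 10_000)], P_interval(a, b))

# f(x) = x is bounded and continuous on [0, 1); its integral is f at the atom.
print([1 - 1 / n for n in (2, 10, 10_000)], 0.0)
```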

We close this section with another condition for weak convergence. A sequence $\{x_n\}$ of real numbers converges to a limit $x$ if and only if each subsequence $\{x_{n'}\}$ contains a further subsequence $\{x_{n''}\}$ that converges to $x$. (It is convenient to denote a sequence of integers by $\{n'\}$ rather than $\{n_k\}$ and a subsequence of $\{n'\}$ by $\{n''\}$ rather than $\{n_{k_i}\}$.) From this fact it is easy to deduce a weak-convergence analogue.

† This condition is slightly stronger than the requirement that the interiors of the elements of $\mathscr{U}$ form a base for the topology of $S$.


THEOREM 2.3. We have $P_n \Rightarrow P$ if and only if each subsequence $\{P_{n'}\}$ contains a further subsequence $\{P_{n''}\}$ such that $P_{n''} \Rightarrow P$.

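We shall deal occasionally with weak convergence of $P_t$ to $P$ when $t$ goes to infinity in a continuous manner. Of course, this is defined to mean that

$$\lim_{t\to\infty} \int f\,dP_t = \int f\,dP \quad \text{for every } f \text{ in } C(S). \tag{2.4}$$

Since a real function of $t$ has a limit as $t \to \infty$ if and only if it has that same limit along each sequence $\{t_n\}$ going to infinity, $P_t \Rightarrow P$ as $t \to \infty$ if and only if $P_{t_n} \Rightarrow P$ for each sequence $\{t_n\}$ going to infinity, and nothing really new is involved. We can also let $t$ approach some finite value $t_0$ in a continuous manner.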

Remarks. Theorem 2.1 dates back at least to Alexandrov (1940-43). Theorem 2.2 is due to Kolmogorov and Prohorov (1954). For other accounts of the theory, see the books of Gikhman and Skorohod (1965), Hennequin and Tortrat (1965), and Parthasarathy (1967).

The Banach space $C(S)$ has for its adjoint $C^*(S)$ the space of finite signed measures on $\mathscr{S}$. The weak* topology, or the $C(S)$ topology of $C^*(S)$, relativized to the space $\mathscr{P}(S)$ of probability measures on $\mathscr{S}$, is the topology described in the first footnote in this section (hence the "weak" in our "weak convergence"); see Dunford and Schwartz (1958, pp. 262 and 419). Varadarajan (1958a and 1961a) investigates the topological structure of $\mathscr{P}(S)$; see also Appendix III.

For extensions of the theory to general topological spaces, see LeCam (1957), Varadarajan (1958a and 1961a), and Kallianpur (1961).

If the metric space $S$ is not separable, the σ-field $\mathscr{S}_0$ generated by the spheres may be smaller than $\mathscr{S}$. Dudley (1966 and 1967) has a theory of weak convergence involving only sets in $\mathscr{S}_0$ and functions measurable $\mathscr{S}_0$.

If $P_n \Rightarrow P$, one can ask whether $P_n(A) \to P(A)$ holds uniformly over a given class of $P$-continuity sets; see Ranga Rao (1962) and Billingsley and Topsoe (1967).

PROBLEMS

1 If $S$ is countable and discrete, then $P_n \Rightarrow P$ if and only if $P_n\{x\} \to P\{x\}$ for each one-point set $\{x\}$.

2 Let $P_n$ and $P$ be given by densities $p_n$ and $p$ with respect to a measure $\lambda$ on $(S, \mathscr{S})$. If $p_n(x) \to p(x)$ except for $x$ in a set of $\lambda$-measure 0, then $P_n \Rightarrow P$ [see Scheffe's theorem on p. 224]. Show by example that $p_n(x) \to p(x)$ may fail on a set of positive measure even though $P_n \Rightarrow P$.

3 Even though $P_n \Rightarrow P$, $\int f\,dP_n \to \int f\,dP$ may fail if $f$ is bounded but not continuous, or if $f$ is continuous but not bounded (even if the integrals exist). Give examples. If $S$ is compact, the second possibility does not exist. What if $S$ is not compact but $P$ has compact support?

4 The class of $P$-continuity sets ($P$ fixed) forms a field.


5 If $\mathscr{U}$ is a determining class, if $P_n(A) \to Q(A)$ for $A \in \mathscr{U}$, and if $P_n \Rightarrow P$, it does not follow that $P = Q$. [Define probability measures on the line by $P_n\{n^{-1}\} = P_n\{1 + n^{-1}\} = P\{0\} = P\{1\} = \frac12$ and $Q\{0\} = 1$. Let $B$ consist of the points $0$, $1$, $n^{-1}$, and $1 + n^{-1}$ ($n = 1, 2, \ldots$). Define $\mathscr{U}$ as the field of Borel sets $A$ such that either (i) $A \cap B$ is finite and $0 \notin A$, or else (ii) $A^c \cap B$ is finite and $0 \notin A^c$. (This example is due to O. Bjornsson.)]

6 Define what one should mean by determining classes and convergence-determining classes of elements of $C(S)$. Give an example of a determining class that is not a convergence-determining class. Show that a class uniformly dense in $C(S)$ is a convergence-determining class.

7 If $f$ is bounded and upper semicontinuous (p. 218), then $P_n \Rightarrow P$ implies $\limsup_n \int f\,dP_n \le \int f\,dP$.

8 Let $\{f_\theta\}$ be a family of real functions on $S$, equicontinuous at each $x$ (for each $x$ and $\varepsilon$, there exists a $\delta$ for which $\rho(x, y) < \delta$ implies $|f_\theta(x) - f_\theta(y)| < \varepsilon$ for all $\theta$). If $\{f_\theta\}$ is uniformly bounded and $S$ is separable, then $P_n \Rightarrow P$ implies that $\int f_\theta\,dP_n \to \int f_\theta\,dP$ uniformly in $\theta$. [First show that, for each $\varepsilon$, there exists a countable partition $\mathscr{D}_\varepsilon = \{D_{\varepsilon k}\}$ of $S$ into $P$-continuity sets such that $|f_\theta(x) - f_\theta(y)| < \varepsilon$ for all $\theta$ if $x$ and $y$ lie in the same $D_{\varepsilon k}$. Approximate $\int f_\theta\,dP$ by $\sum_k f_\theta(x_k)P(D_{\varepsilon k})$ with $x_k$ in $D_{\varepsilon k}$, similarly for $\int f_\theta\,dP_n$, and apply Scheffe's theorem (p. 224).] (This result is due to Ranga Rao (1962); for extensions, see Billingsley and Topsoe (1967) and Topsoe (1967a and 1967b).)
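For Problem 3, here is one possible pair of examples on $S = R$, sketched in Python. They are our choice, not a solution taken from the text, and the helper names are hypothetical. Measures are stored as lists of (weight, point) pairs, so every integral is a finite sum. In (a), $P_n$ is a unit mass at $1/n$ and $f$ is bounded but jumps at 0. In (b), $Q_n$ puts mass $1/n$ at the point $n$, which does not disturb weak convergence to a unit mass at 0, but the integral of the unbounded $f(x) = x$ stays at 1.

```python
# One possible answer to Problem 3 on S = R (illustrative, not the book's).

def integrate_point_masses(f, masses):
    """Integral of f against a finite combination of point masses [(weight, point), ...]."""
    return sum(w * f(x) for w, x in masses)


P = [(1.0, 0.0)]                                  # unit mass at 0


def p_n(n):
    return [(1.0, 1.0 / n)]                       # unit mass at 1/n; P_n => P


def q_n(n):
    # (1 - 1/n)(mass at 0) + (1/n)(mass at n); also Q_n => P
    return [(1.0 - 1.0 / n, 0.0), (1.0 / n, float(n))]


step = lambda x: 1.0 if x > 0 else 0.0            # (a) bounded, discontinuous at 0
identity = lambda x: x                            # (b) continuous, unbounded
g = lambda x: 1.0 / (1.0 + x * x)                 # bounded and continuous

for n in (10, 1000, 10**6):
    print(n,
          integrate_point_masses(step, p_n(n)),      # stays 1, but the P-integral is 0
          integrate_point_masses(identity, q_n(n)),  # stays 1, but the P-integral is 0
          integrate_point_masses(g, q_n(n)))         # tends to g(0) = 1, as Q_n => P requires
```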

3 SOME SPECIAL CASES

To say that $F$ is continuous from above at $x$ means that for each positive $\varepsilon$ there exists a positive $\delta$ such that $x \le y < x + \delta e$ implies $|F(x) - F(y)| < \varepsilon$.


From the definition (3.2) it follows that $F$ is nondecreasing in each variable. Hence $F$ is continuous from above at $x$ if and only if $F(x)$ coincides with $\inf_{\delta > 0} F(x + \delta e) = \inf_{\delta > 0} P\{y : y \le x + \delta e\}$. This infimum is just the $P$-measure of the intersection $\bigcap_{\delta > 0}\{y : y \le x + \delta e\} = \{y : y \le x\}$. Therefore $F$ is continuous from above at each $x$.

Since $F$ is nondecreasing in each variable and is everywhere continuous from above, it is continuous at $x$ if and only if it is continuous from below at that point, that is, if and only if for each positive $\varepsilon$ there exists a positive $\delta$ such that $x - \delta e < y \le x$ implies $|F(x) - F(y)| < \varepsilon$. Using the monotonicity once more, we see that this condition is in turn equivalent to $F(x) = \sup_{\delta > 0} F(x - \delta e)$. The supremum is the $P$-measure of the union $\bigcup_{\delta > 0}\{y : y \le x - \delta e\} = \{y : y < x\}$, so $F$ is continuous at $x$ if and only if $F(x) = P\{y : y < x\}$. Since $\{y : y \le x\} - \{y : y < x\}$ is exactly the boundary of $\{y : y \le x\}$, $F$ is continuous at $x$ if and only if $\{y : y \le x\}$ is a $P$-continuity set.
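The role of the continuity points shows up already for point masses on the line ($k = 1$). In this Python sketch (helper names hypothetical), $P_n$ is a unit mass at $1/n$ and $P$ a unit mass at 0, so $P_n \Rightarrow P$. The distribution functions converge everywhere except at $x = 0$. That is exactly the point where $\{y : y \le x\}$ fails to be a $P$-continuity set, because its boundary $\{0\}$ carries all of $P$'s mass.

```python
def point_mass_df(c):
    """Distribution function F(x) = P{y : y <= x} of a unit mass at c on the line."""
    return lambda x: 1.0 if x >= c else 0.0


F = point_mass_df(0.0)               # P = unit mass at 0; F jumps at x = 0


def F_n(n):
    return point_mass_df(1.0 / n)    # P_n = unit mass at 1/n, which converges weakly to P


# At the jump x = 0, F_n(0) = 0 never approaches F(0) = 1.
print([F_n(n)(0.0) for n in (1, 10, 100)], F(0.0))

# At continuity points of F the values converge (here they agree once 1/n <= |x|).
for x in (-0.5, 0.25, 2.0):
    print(x, [F_n(n)(x) for n in (1, 10, 100)], F(x))
```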

For distribution functions $F_n$ and $F$, let us define $F_n \Rightarrow F$ to mean that there is convergence $F_n(x) \to F(x)$ at continuity points $x$ of $F$. By what has just been proved, if $P_n \Rightarrow P$, then the corresponding distribution functions satisfy $F_n \Rightarrow F$. Now an interval $(a, b]$ is determined by the $2k$ $(k-1)$-dimensional hyperplanes containing its faces; let $\mathscr{U}$ be the class of intervals for which all these hyperplanes have $P$-measure 0. Each vertex of an element of $\mathscr{U}$ is a continuity point of $F$, and $\mathscr{U}$ is closed under the formation of finite intersections. Since only countably many parallel hyperplanes can have positive $P$-measure, it follows by Corollary 1 to Theorem 2.2 that, if $P_n(A) \to P(A)$ for each $A$ in $\mathscr{U}$, then $P_n \Rightarrow P$. Since $P(a, b]$ is a sum $\sum \pm F(x)$ with $x$ ranging over the $2^k$ vertices of $(a, b]$, and similarly for $P_n(a, b]$, $F_n \Rightarrow F$ implies that $P_n(A) \to P(A)$ holds for each $A$ in $\mathscr{U}$. Therefore $P_n \Rightarrow P$ and $F_n \Rightarrow F$ are equivalent.

Thus the notion of weak convergence reduces in $R^k$ to the ordinary notion of the convergence of distribution functions. In other words, the sets $\{y : y \le x\}$ form a convergence-determining class. The proof above also shows that the rectangles $(a, b]$ form a convergence-determining class.
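The identity that expresses $P(a, b]$ as a signed sum of $F$ over the $2^k$ vertices can be sketched in Python as follows. The function name `rectangle_prob` and the two sample distributions are illustrative assumptions. A vertex takes each coordinate from $a$ or from $b$, and its sign is $+$ or $-$ according as an even or odd number of coordinates come from $a$.

```python
from itertools import product


def rectangle_prob(F, a, b):
    """P(a, b] from the distribution function F of a measure on R^k.

    Sums sign * F(v) over the 2^k vertices v of (a, b]; each coordinate of v
    is taken from a or from b, and sign = (-1)^(number of coordinates taken from a).
    """
    k = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=k):          # 1 means "take b_i"
        vertex = [b[i] if c else a[i] for i, c in enumerate(choice)]
        total += (-1) ** (k - sum(choice)) * F(vertex)
    return total


def uniform_square_df(x):
    """F for the uniform distribution on the unit square (an illustrative choice)."""
    clip = lambda t: min(max(t, 0.0), 1.0)
    return clip(x[0]) * clip(x[1])


def two_atoms_df(x):
    """F for mass 1/2 at (0, 0) and 1/2 at (1, 1) (another illustrative choice)."""
    return 0.5 * (x[0] >= 0 and x[1] >= 0) + 0.5 * (x[0] >= 1 and x[1] >= 1)


print(rectangle_prob(uniform_square_df, (0.2, 0.1), (0.5, 0.7)))  # area 0.3 * 0.6
print(rectangle_prob(two_atoms_df, (0.0, 0.0), (1.0, 1.0)))       # (a, b] omits the atom at a
```

In the second call the rectangle is open at its lower-left corner, so the result is the mass of the atom at $(1, 1)$ alone.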

The Circle

The same sort of result obviously holds if $S$ is the unit circle in the complex plane: $P_n \Rightarrow P$ if and only if $P_n(A) \to P(A)$ for every arc $A$ whose endpoints have $P$-measure 0. A sequence $\{x_1, x_2, \ldots\}$ of points of $S$ (complex numbers of modulus 1) is said to be uniformly distributed if the allotment of points to each arc is proportional to its length, in the sense that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} I_A(x_i) = P(A), \tag{3.3}$$


where $I_A$ is the indicator, or characteristic function, of $A$, and $P$ is circular Lebesgue measure so normalized that $P(S) = 1$. Let $P_n$ denote the $n$th empirical distribution of the sequence, that is, the measure corresponding to a mass of $1/n$ at each of the points $x_1, x_2, \ldots, x_n$. Then this condition reduces to $P_n(A) \to P(A)$ for arcs $A$, so that the sequence is uniformly distributed if and only if $P_n \Rightarrow P$. Therefore, if every arc contains its proper quota of points, in the sense of (3.3), then so does every other Borel set whose boundary has Lebesgue measure 0. We prove in Section 7 a famous theorem of Weyl, according to which $\{x_1, x_2, \ldots\}$ is uniformly distributed if and only if $n^{-1}\sum_{i=1}^{n} x_i^u \to 0$ holds for every nonzero integer $u$.
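Uniform distribution and Weyl's criterion can be explored numerically. The sketch below assumes the classical test sequence $x_k = e^{2\pi i k\alpha}$, which this section does not discuss and which is uniformly distributed for irrational $\alpha$. The values $\alpha = \sqrt2$ and $\alpha = 1/4$ are arbitrary illustrative choices, and the helper names are hypothetical. Each point is stored through its normalized angle $s_k = \operatorname{frac}(k\alpha)$, so arcs are subintervals of $[0, 1)$.

```python
import cmath
import math


def positions(alpha, n):
    """s_k = frac(k * alpha), k = 1..n, so that x_k = exp(2*pi*i*s_k) (hypothetical test sequence)."""
    return [(k * alpha) % 1.0 for k in range(1, n + 1)]


def arc_fraction(ss, length):
    """Empirical P_n(A) for the arc A = {exp(2*pi*i*s) : 0 <= s < length}."""
    return sum(1 for s in ss if s < length) / len(ss)


def weyl_average(ss, u):
    """n^{-1} * sum_k x_k^u with x_k = exp(2*pi*i*s_k)."""
    return sum(cmath.exp(2j * math.pi * u * s) for s in ss) / len(ss)


irrational = positions(math.sqrt(2), 100_000)
print(arc_fraction(irrational, 0.25), abs(weyl_average(irrational, 1)))  # near 0.25 and near 0

rational = positions(0.25, 1000)              # only four distinct points on the circle
print(arc_fraction(rational, 0.1), abs(weyl_average(rational, 4)))      # 0.25 vs 0.1; near 1, not 0
```

For $\alpha = 1/4$ the arc of normalized length 0.1 receives a quarter of the points, and the Weyl average for $u = 4$ is 1. Both show that the sequence is not uniformly distributed.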

Let $\pi_k$ denote the natural projection from $R^\infty$ to $R^k$, defined by $\pi_k(x) = (x_1, \ldots, x_k)$. A finite-dimensional set, or cylinder, is by definition a set of the form $\pi_k^{-1}H$ with $k \ge 1$ and $H \in \mathscr{R}^k$. Since each $\pi_k$ is continuous and hence measurable (see p. 222), the finite-dimensional sets lie in the σ-field $\mathscr{R}^\infty$ of Borel sets in $R^\infty$. Let $\mathscr{F}$ denote the class of finite-dimensional sets. Since each set (3.4) lies in $\mathscr{F}$ and since $R^\infty$ is separable, $\mathscr{F}$ generates $\mathscr{R}^\infty$. Since $\mathscr{F}$ is a (finitely additive) field, it follows that $\mathscr{F}$ is a determining class. For fixed $k$ and $x$, the sets (3.4) for different values of $\varepsilon$ have disjoint boundaries ($\varepsilon < \delta$ implies $\overline{N_{k,\varepsilon}(x)} \subset N_{k,\delta}(x)$). Applying Corollary 1 of Theorem 2.2 to the class $\mathscr{U}$ of $P$-continuity sets in $\mathscr{F}$, we see therefore that $\mathscr{F}$ is even a convergence-determining class. Thus $P_n \Rightarrow P$ if and only if $P_n(A) \to P(A)$ holds for all finite-dimensional $P$-continuity sets $A$.

The Space C

In $C = C[0, 1]$, the space of continuous functions on $[0, 1]$ with the uniform metric $\rho(x, y) = \sup_t |x(t) - y(t)|$ (see p. 220), the situation differs markedly from that in $R^\infty$. For points $t_1, \ldots, t_k$ in $[0, 1]$, let $\pi_{t_1 \cdots t_k}$ be the mapping that carries the point $x$ of $C$ to the point $(x(t_1), \ldots, x(t_k))$ of $R^k$. The finite-dimensional sets are now defined as sets of the form $\pi_{t_1 \cdots t_k}^{-1}H$ with $H \in \mathscr{R}^k$.


Since $\pi_{t_1 \cdots t_k}$ is continuous, these sets lie in the class $\mathscr{C}$ of Borel sets in $C$.

On the other hand, the closed sphere $\{y : \rho(x, y) \le \varepsilon\}$ is the limit of the finite-dimensional sets $\{y : |x(i/n) - y(i/n)| \le \varepsilon,\ i = 1, \ldots, n\}$. Since $C$ is separable, each open set is a countable union of open spheres and hence of closed spheres, so that the finite-dimensional sets generate $\mathscr{C}$. Since they form a field, the finite-dimensional sets are thus a determining class.

An example shows that the finite-dimensional sets do not form a convergence-determining class. Let $P$ be a unit mass at $0$ (the function that vanishes identically), and let $P_n$ be a unit mass at the function $x_n$, where

$$x_n(t) = \begin{cases} nt & \text{if } 0 \le t \le n^{-1}, \\ 2 - nt & \text{if } n^{-1} \le t \le 2n^{-1}, \\ 0 & \text{if } 2n^{-1} \le t \le 1. \end{cases} \tag{3.5}$$

Then $P_n$ cannot converge weakly to $P$. (For example, if $A = S(0, \frac12)$, then $P(\partial A) = 0$, while $P_n(A) = 0$ does not converge to $P(A) = 1$.) But $P_n(A) \to P(A)$ does hold for finite-dimensional $P$-continuity sets. In fact, if $A = \pi_{t_1 \cdots t_k}^{-1}H$ and if $2/n$ is smaller than the least of the nonzero $t_i$, then $P_n(A) = P(A)$.

The finite-dimensional sets are thus a determining class in $C$ but not a convergence-determining class. The difficulty, interest, and usefulness of weak convergence in $C$ all spring from the fact that it involves considerations going beyond those of finite-dimensional sets.
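The counterexample (3.5) can be checked directly. This Python sketch (helper names hypothetical) approximates $\rho(x_n, 0)$ on a fine grid and evaluates $x_n$ at a fixed, arbitrarily chosen set of points $t_1, \ldots, t_k$. The distance to the zero function stays at 1. The projection $(x_n(t_1), \ldots, x_n(t_k))$, however, coincides with that of the zero function once $2/n$ falls below the least nonzero $t_i$.

```python
def tent(n):
    """The function x_n of (3.5): peak 1 at t = 1/n, identically 0 from t = 2/n on."""
    def x(t):
        if t <= 1.0 / n:
            return n * t
        if t <= 2.0 / n:
            return 2.0 - n * t
        return 0.0
    return x


def sup_distance_to_zero(x, grid_size=100_001):
    """Grid approximation to rho(x, 0) = sup_t |x(t)| over [0, 1]."""
    return max(abs(x(i / (grid_size - 1))) for i in range(grid_size))


ts = (0.0, 0.05, 0.3, 1.0)          # an arbitrary choice of t_1, ..., t_k
for n in (10, 100, 1000):
    x_n = tent(n)
    # rho(x_n, 0) stays at 1, while the projection at ts is all zeros once 2/n < 0.05.
    print(n, sup_distance_to_zero(x_n), [x_n(t) for t in ts])
```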

Product Spaces

Let $S = S' \times S''$ be the product of metric spaces $S'$ and $S''$. If $S$ is separable (which requires that $S'$ and $S''$ be separable), then the σ-fields $\mathscr{S}$, $\mathscr{S}'$, and $\mathscr{S}''$ of Borel sets in these spaces are related by $\mathscr{S} = \mathscr{S}' \times \mathscr{S}''$ (see p. 225).

The two marginal distributions of a probability measure $P$ on $(S, \mathscr{S})$ are defined by $P'(A') = P(A' \times S'')$, $A' \in \mathscr{S}'$, and $P''(A'') = P(S' \times A'')$, $A'' \in \mathscr{S}''$.

THEOREM 3.1. If $S$ is separable, then a necessary and sufficient condition for $P_n \Rightarrow P$ is that $P_n(A' \times A'') \to P(A' \times A'')$ for each $P'$-continuity set $A'$ and each $P''$-continuity set $A''$, where $P'$ and $P''$ are the marginal distributions of $P$.

Proof. Let $\partial$, $\partial'$, and $\partial''$ denote the boundary operators in $S$, $S'$, and $S''$,


To prove sufficiency, we apply Corollary 1 of Theorem 2.2 to the class $\mathscr{U}$ of sets $A' \times A''$ with $A'$ a $P'$-continuity set and $A''$ a $P''$-continuity set. The class $\mathscr{U}$ is closed under the formation of finite intersections and, by hypothesis, $P_n(A) \to P(A)$ for $A$ in $\mathscr{U}$.

Given $(x', x'')$ in $S$ and $\varepsilon > 0$, consider the sets

$$A_\delta = \{y' : \rho'(x', y') < \delta\} \times \{y'' : \rho''(x'', y'') < \delta\}.$$

For distinct $\delta$, the sets $\partial'\{y' : \rho'(x', y') < \delta\}$ are disjoint, and so are the sets $\partial''\{y'' : \rho''(x'', y'') < \delta\}$. Hence only countably many of them can have positive $P'$-measure (respectively $P''$-measure), and therefore $A_\delta$ lies in $\mathscr{U}$ for some $\delta$ with $0 < \delta < \varepsilon$. If $S$ is metrized by

$$\rho((x', x''), (y', y'')) = \max\{\rho'(x', y'), \rho''(x'', y'')\},$$

then $A_\delta$ is just the open sphere with center $(x', x'')$ and radius $\delta$. Hence $\mathscr{U}$ satisfies the hypotheses of Corollary 1 of Theorem 2.2, as required.

The sufficiency of the condition in Theorem 3.1 implies that the measurable rectangles form a convergence-determining class. It says more than this, since $P(\partial(A' \times A'')) = 0$ does not imply $P'(\partial' A') = P''(\partial'' A'') = 0$.

For given probability measures $P'$ and $P''$ on $(S', \mathscr{S}')$ and $(S'', \mathscr{S}'')$, the product measure $P' \times P''$ is a probability measure on $\mathscr{S}' \times \mathscr{S}''$ and hence, if $S$ is separable, on $\mathscr{S}$. The following theorem, in which $P'_n$ and $P'$ are probability measures on $(S', \mathscr{S}')$ and $P''_n$ and $P''$ are probability measures on $(S'', \mathscr{S}'')$, is an immediate consequence of Theorem 3.1.

THEOREM 3.2. If $S$ is separable, then $P'_n \times P''_n \Rightarrow P' \times P''$ if and only if $P'_n \Rightarrow P'$ and $P''_n \Rightarrow P''$.

PROBLEMS

1 Show directly that $F_n \Rightarrow F$ if and only if $F_n(x) \to F(x)$ for all $x$ in a dense set, and that $F_n \Rightarrow F$ if and only if $\limsup_n F_n(x) \le F(x)$ and $\liminf_n F_n(x - 0) \ge F(x - 0)$ for all $x$ (here $F(x - 0) = \sup_{y < x} F(y)$).

2 If $k > 1$, the set of discontinuities of $F$, although having dense complement, need not be countable. A $(k - 1)$-dimensional hyperplane can contain at most countably many discontinuities if it is normal to none of the axes. [To see the problem, consider first the hyperplane $\{(x_1, x_2) : x_1 = -x_2\}$ in $R^2$.]

3 If $F_n \Rightarrow F$ and if $F$ is continuous at each point of a closed set $A$, then $\sup_{x \in A} |F_n(x) - F(x)| \to 0$.

4 The Levy distance $\lambda(F, G)$ between two one-dimensional distribution functions is the infimum of those positive $\varepsilon$ such that $F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon$ for all $x$.


Interpret $\lambda(F, G)$ geometrically in terms of the graphs of $F$ and $G$. Show that $F_n \Rightarrow F$ if and only if $\lambda(F_n, F) \to 0$; prove that the collection of one-dimensional distribution functions is a separable, complete metric space under $\lambda$.

5 Problem 5 in Section 1 adapts the problem preceding it to the general Banach space of countably infinite dimension. In $C$ a simple direct analysis is possible: work with the functions $x_n$ defined by (3.5).

6 The uniform distribution on the unit square and the uniform distribution on its diagonal have identical marginal distributions. Relate to Theorem 3.2.

7 Extend the product-space theory at the end of the section by showing that, for a countable product of separable spaces, the finite-dimensional sets (appropriately defined) form a convergence-determining class ($R^\infty$ is a special case).
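For Problem 4, the Levy distance can be approximated numerically. The helper below is hypothetical, not an algorithm from the text. It bisects on $\varepsilon$, relying on two facts: the defining inequalities always hold for $\varepsilon = 1$, and once they hold for some $\varepsilon$ they hold for every larger one. It checks the inequalities only on a finite grid of $x$ values, so it can slightly underestimate $\lambda$ when the grid misses the $x$ at which they fail. For unit masses at $1/n$ and $0$ the distance is $1/n$, which tends to 0, in line with $F_n \Rightarrow F$.

```python
def levy_distance(F, G, xs, tol=1e-7):
    """Approximate the Levy distance between one-dimensional distribution functions F and G.

    Bisection on eps in [0, 1]; the defining inequalities are checked only at
    the grid points xs.
    """
    def holds(eps):
        return all(F(x - eps) - eps <= G(x) <= F(x + eps) + eps for x in xs)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if holds(mid):
            hi = mid
        else:
            lo = mid
    return hi


def point_mass_df(c):
    return lambda x: 1.0 if x >= c else 0.0


grid = [k / 1000 for k in range(-2000, 2001)]
for n in (2, 5, 10):
    print(n, levy_distance(point_mass_df(1.0 / n), point_mass_df(0.0), grid))
```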

4 CONVERGENCE IN DISTRIBUTION

The theory of weak convergence can be paraphrased as the theory of convergence in distribution. When stated in the terminology of the latter theory, which involves no new ideas, many results assume a compact and perspicuous form.

Random Elements

Let $X$ be a mapping from a probability space $(\Omega, \mathscr{B}, P)$ into a metric space $S$. If $X$ is measurable (in the sense that $X^{-1}\mathscr{S} \subset \mathscr{B}$; see p. 222), we call it a random element. We shall say $X$ is defined on its domain $\Omega$ (or $(\Omega, \mathscr{B}, P)$) and takes its values in its range $S$, and we call it a random element of $S$. If $S = R^1$, we call $X$ a random variable; if $S = R^k$, we call $X$ a random vector; if $S = C$, we call $X$ a random function.

Random variables and random vectors are familiar objects, and the Introduction contains (see formula (13) there) an example of a useful random function (although its measurability was not proved). A variety of random functions arise in a natural way in probability theory.

The distribution† of $X$ is the probability measure $PX^{-1}$ on $(S, \mathscr{S})$ defined by

$$PX^{-1}(A) = P(X^{-1}A) = P\{\omega : X(\omega) \in A\} = P\{X \in A\}, \quad A \in \mathscr{S}. \tag{4.1}$$

† This has nothing to do with the distributions of Schwartz.


$E\{f(X_n)\}$. Since $\int_S f\,d(PX^{-1}) = E\{f(X)\}$ by the change-of-variable formula (4.3), and similarly $\int_S f\,d(PX_n^{-1}) = E\{f(X_n)\}$, we have $X_n \xrightarrow{\mathcal{D}} X$ if and only if $E\{f(X_n)\} \to E\{f(X)\}$ for every $f \in C(S)$.

Theorem 2.1 asserts the equivalence of the following five statements. Call a set $A$ in $\mathscr{S}$ an $X$-continuity set if $P\{X \in \partial A\} = 0$.


(i) $X_n \xrightarrow{\mathcal{D}} X$.

(ii) $\lim_n E\{f(X_n)\} = E\{f(X)\}$ for all bounded, uniformly continuous real $f$.

(iii) $\limsup_n P\{X_n \in F\} \le P\{X \in F\}$ for all closed $F$.

(iv) $\liminf_n P\{X_n \in G\} \ge P\{X \in G\}$ for all open $G$.

(v) $\lim_n P\{X_n \in A\} = P\{X \in A\}$ for all $X$-continuity sets $A$.

Each theorem about weak convergence can be similarly recast. The following hybrid terminology is useful. If $X_n$ are random elements of $S$, if $P_n$ are the corresponding distributions, and if $P$ is a probability measure on $(S, \mathscr{S})$, we say the $X_n$ converge in distribution to $P$, and write

$$X_n \xrightarrow{\mathcal{D}} P, \tag{4.7}$$

in case $P_n \Rightarrow P$. There is the obvious corresponding version of Theorem 2.1.
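To see statements (ii) and (v) at work, take $X$ uniform on $[0, 1]$ and $X_n$ uniform on the grid $\{1/n, 2/n, \ldots, n/n\}$. These are illustrative choices, and the Python helper names are hypothetical. The set of rationals in $[0, 1]$ shows why (v) is restricted to $X$-continuity sets. Every $X_n$ lies in it with probability 1 while $X$ lies in it with probability 0, and its boundary is all of $[0, 1]$.

```python
import math


def expectation(f, n):
    """E f(X_n) for X_n uniformly distributed on {1/n, 2/n, ..., n/n} (an illustrative choice)."""
    return sum(f(k / n) for k in range(1, n + 1)) / n


def probability(in_A, n):
    """P{X_n in A}, with the set A given by its membership test."""
    return expectation(lambda x: 1.0 if in_A(x) else 0.0, n)


# X is uniform on [0, 1], so E cos(X) = sin(1) and P{X <= 1/2} = 1/2.
for n in (10, 100, 10_000):
    print(n,
          expectation(math.cos, n),             # statement (ii) with f = cos
          probability(lambda x: x <= 0.5, n))   # statement (v) with A = [0, 1/2]
print("limit", math.sin(1.0), 0.5)
```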

It is a great convenience to be able to pass from one to another of the three equivalent concepts (4.5), (4.6), and (4.7), and we shall do so freely. This is largely a matter of expedient phraseology. For example, if random variables $X_n$ have asymptotically a normal distribution with mean $\mu$ and variance $\sigma^2$, we shall express this fact by writing

$a$ of $S$,

$$P\{\rho(X_n, a) > \varepsilon\} \to 0 \tag{4.11}$$

for each positive $\varepsilon$, we say $X_n$ converges in probability to $a$ and write

$$X_n \xrightarrow{P} a. \tag{4.12}$$


If $a$ is conceived as a constant-valued random element, then, as is easily proved, $X_n \xrightarrow{P} a$ if and only if $X_n \xrightarrow{\mathcal{D}} a$. Alternatively, $X_n \xrightarrow{P} a$ if and only if the distribution of $X_n$ converges weakly to the probability measure corresponding to a mass of 1 at the point $a$. The random elements $X_n$ in (4.12) may, as usual, be defined on distinct probability spaces; only the range $S$ need be common to them all.

If $X_n$ and $Y_n$ have a common domain, it makes sense to speak of the distance $\rho(X_n, Y_n)$, the function with value $\rho(X_n(\omega), Y_n(\omega))$ at $\omega$. If $S$ is separable, $\rho(X_n, Y_n)$ is a random variable (see p. 225). In the following theorem, we assume that, for each $n$, $X_n$ and $Y_n$ do have a common domain and that $S$ is separable.

THEOREM 4.1. If $X_n \xrightarrow{\mathcal{D}} X$ and $\rho(X_n, Y_n) \xrightarrow{P} 0$, then $Y_n \xrightarrow{\mathcal{D}} X$.

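Proof. If $F_\varepsilon = \{x : \rho(x, F) \le \varepsilon\}$, then

$$P\{Y_n \in F\} \le P\{\rho(X_n, Y_n) > \varepsilon\} + P\{X_n \in F_\varepsilon\}.$$

Since $F_\varepsilon$ is closed, the hypotheses imply

$$\limsup_n P\{Y_n \in F\} \le \limsup_n P\{X_n \in F_\varepsilon\} \le P\{X \in F_\varepsilon\}.$$

If $F$ is closed, then $F_\varepsilon \downarrow F$ as $\varepsilon \downarrow 0$, and the result follows by Theorem 2.1 (the version corresponding to convergence in distribution).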

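In the next theorem† we assume that, for each $n$, $Y_n, X_{1n}, X_{2n}, \ldots$ have a common domain and that $S$ is separable.

THEOREM 4.2. Suppose that, for each $u$, $X_{un} \xrightarrow{\mathcal{D}} X_u$ as $n \to \infty$ and that $X_u \xrightarrow{\mathcal{D}} X$ as $u \to \infty$. Suppose further that

$$\lim_{u\to\infty} \limsup_{n\to\infty} P\{\rho(X_{un}, Y_n) > \varepsilon\} = 0 \tag{4.13}$$

for each positive $\varepsilon$. Then $Y_n \xrightarrow{\mathcal{D}} X$ as $n \to \infty$.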

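Proof. Defining $F_\varepsilon$ as before by $F_\varepsilon = \{x : \rho(x, F) \le \varepsilon\}$, we have

$$P\{Y_n \in F\} \le P\{X_{un} \in F_\varepsilon\} + P\{\rho(X_{un}, Y_n) > \varepsilon\}.$$

By the hypothesis $X_{un} \xrightarrow{\mathcal{D}} X_u$ ($n \to \infty$),

$$\limsup_{n\to\infty} P\{Y_n \in F\} \le P\{X_u \in F_\varepsilon\} + \limsup_{n\to\infty} P\{\rho(X_{un}, Y_n) > \varepsilon\}.$$

By (4.13) and the hypothesis $X_u \xrightarrow{\mathcal{D}} X$ ($u \to \infty$),

$$\limsup_{n\to\infty} P\{Y_n \in F\} \le P\{X \in F_\varepsilon\}.$$

The result follows as before.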

† The remainder of this section is not central to the theory; after a cursory reading, it can be consulted as the need arises.


Suppose now that $X, X_1, X_2, \ldots$ all have a common domain and that $S$ is separable. If $P\{\rho(X_n, X) > \varepsilon\} \to 0$ for every positive $\varepsilon$, we say that $X_n$ converges in probability to $X$ and write

$$X_n \xrightarrow{P} X. \tag{4.14}$$

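$\le P\{\rho(X, A) < \varepsilon,\ X \notin A\} + P\{\rho(X, A^c) < \varepsilon,\ X \in A\}.$

As $\varepsilon \to 0$, the right-hand member of this inequality tends to $P\{X \in \partial A\} = 0$.

Since (4.15) implies $P\{X_n \in A\} \to P\{X \in A\}$, it follows from Theorem 4.3 that $X_n \xrightarrow{P} X$ implies $X_n \xrightarrow{\mathcal{D}} X$.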

Product Spaces‡

Let $X'$ and $X'_n$ be random elements of $S'$, and let $X''$ and $X''_n$ be random elements of $S''$. In the rest of this section we assume that $X'$ and $X''$ have the same domain, that $X'_n$ and $X''_n$ have the same domain for each $n$, and that $S'$ and $S''$ are separable, so that $(X', X'')$ and $(X'_n, X''_n)$ are random elements of $S' \times S''$ (see p. 225). We seek conditions under which

$$(X'_n, X''_n) \xrightarrow{\mathcal{D}} (X', X''). \tag{4.16}$$

The random elements $X'$ and $X''$ are by definition independent if $P\{X' \in A', X'' \in A''\} = P\{X' \in A'\}\,P\{X'' \in A''\}$ for all $A' \in \mathscr{S}'$ and $A'' \in \mathscr{S}''$. If $X'$ and $X''$ are independent and $X'_n$ and $X''_n$ are independent for each $n$, then, by Theorem 3.2, (4.16) is

† Some notation: $E^c$ is the complement of $E$, $E_1 - E_2 = E_1 \cap E_2^c$ is the difference of $E_1$ and $E_2$, and $E_1 + E_2 = (E_1 - E_2) \cup (E_2 - E_1)$ is the symmetric difference of $E_1$ and $E_2$.

‡ The next-to-last footnote still applies.


equivalent to

$X''$, then it is natural to regard $X'_n$ and $X''_n$ as asymptotically independent.

By Theorem 3.1, (4.16) holds if and only if

$$P\{X'_n \in A', X''_n \in A''\} \to P\{X' \in A', X'' \in A''\} \tag{4.18}$$

for all $X'$-continuity sets $A'$ and all $X''$-continuity sets $A''$.

THEOREM 4.4. If $X'_n \xrightarrow{\mathcal{D}} X'$ and $X''_n \xrightarrow{P} a''$, then $(X'_n, X''_n) \xrightarrow{\mathcal{D}} (X', a'')$.

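Proof. We must verify (4.18) with $X''$ identically equal to $a''$. Suppose that $A'$ is an $X'$-continuity set and that $A''$ is an $X''$-continuity set (that is, $a'' \notin \partial'' A''$). If $a'' \in A''$, then $a''$ is an interior point of $A''$, so $P\{X''_n \notin A''\} \to 0$, and (4.18) follows from $X'_n \xrightarrow{\mathcal{D}} X'$ and

$$P\{X'_n \in A'\} - P\{X''_n \notin A''\} \le P\{X'_n \in A', X''_n \in A''\} \le P\{X'_n \in A'\}.$$

If $a'' \notin A''$, then $a''$ lies outside the closure of $A''$, and (4.18) follows from

$$P\{X'_n \in A', X''_n \in A''\} \le P\{X''_n \in A''\} \to 0.$$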

In the next theorem we assume that $X''_n$ converges in probability to a random element $Y''$ that is not necessarily constant, which requires that $Y''$ and all the $(X'_n, X''_n)$ have the same domain $(\Omega, \mathscr{B}, P)$. Let $\mathscr{B}_0$ be a (finitely additive) field contained in $\mathscr{B}$, and denote by $\sigma(\mathscr{B}_0)$ the σ-field generated

is measurable $\sigma(\mathscr{B}_0)$, then $(X'_n, X''_n) \xrightarrow{\mathcal{D}} (X', X'')$.

Note that (4.19) implies $X'_n \xrightarrow{\mathcal{D}} X'$ (take $E = \Omega$). Note also that the domain of $(X', X'')$ need not be that common to $Y''$ and the $(X'_n, X''_n)$. We may replace $(X', X'')$ in the conclusion by $(Y', Y'')$ if the domain of $Y''$ supports some random element $Y'$ that is independent of $Y''$ and has the proper distribution; but it would be an unnecessary restriction to assume in general the existence of such a $Y'$.


Proof. Fix an $X'$-continuity set $A'$ and an $X''$-continuity set $A''$. We are to prove (4.18), which, since $X'$ and $X''$ are independent and $X''$ has the distribution of $Y''$, is the same thing as

$$P\{X'_n \in A', Y'' \in A''\} \to P\{X' \in A'\}\,P\{Y'' \in A''\}. \tag{4.21}$$

Write $E_n = \{X'_n \in A'\}$ and $\alpha = P\{X' \in A'\}$, and let $g$ denote the indicator of the set $\{Y'' \in A''\}$. Then (4.21) takes the form

$$E\{I_{E_n} g\} \to \alpha\, E\{g\}. \tag{4.22}$$

Since each $X'_n$ is measurable $\sigma(\mathscr{B}_0)$, each $E_n$ lies in $\sigma(\mathscr{B}_0)$.

We shall prove that (4.22) holds if $g$ is an arbitrary integrable function (measurable $\mathscr{B}$). If $g$ is the indicator of a set in $\mathscr{B}_0$, then (4.22) holds because it is the same thing as (4.19). Clearly (4.22) then also holds if $g$ is a simple function measurable $\mathscr{B}_0$. If $g$ is integrable and measurable $\sigma(\mathscr{B}_0)$, then, for each positive $\varepsilon$, there is a simple function $g_\varepsilon$, measurable $\mathscr{B}_0$, with

$E\{|g - g_\varepsilon|\} < \varepsilon$; but then

so that (4.22) follows for all such $g$.

Finally, suppose that $g$ is measurable $\mathscr{B}$ and integrable but not necessarily measurable $\sigma(\mathscr{B}_0)$. By the properties of conditional expected values,† we still have, since $E_n \in \sigma(\mathscr{B}_0)$,

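$X$, and that (b) if $X_n \xrightarrow{\mathcal{D}} X$, $|X_n - Y_n| \le Z_n|Y_n|$, and $Z_n \xrightarrow{P} 0$, then $Y_n \xrightarrow{\mathcal{D}} X$. [Reduce (b) to (a) via the fact that $|x - y| \le \varepsilon|y|$ with $\varepsilon < \frac12$ implies $|x - y| \le 2\varepsilon|x|$.]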

† See Doob (1953) or Billingsley (1965). The central results in this book do not require conditional probabilities and expected values.


3 Three fair coins are tossed independently. Let $E_{ij}$ be the event that coins $i$ and $j$ show the same face, let $X' = X'_n = X''_n = Y''$ be the indicator of $E_{13}$, let $X''$ be the indicator of $E_{12}$, and let $\mathscr{B}_0 = \{E_{12}, E_{23}\}$. The conclusion of Theorem 4.5 fails, although its hypotheses are satisfied except for the requirement that $\mathscr{B}_0$ be a field.

4 Let $X_1, X_2, \ldots$ be independent and have a common distribution $P$ on $S$. Let $P_{n,\omega}$ be the empirical measure for $X_1(\omega), \ldots, X_n(\omega)$; $P_{n,\omega}(A)$ is the fraction of $k$, $1 \le k \le n$, for which $X_k(\omega) \in A$:

$$P_{n,\omega}(A) = \frac{1}{n}\sum_{k=1}^{n} I_A(X_k(\omega)).$$

Show that, if $S$ is separable, then $P_{n,\omega} \Rightarrow P$ with probability 1. [Use the strong law of large numbers for Bernoulli trials and Corollary 1 to Theorem 2.2.] (This result is due to Varadarajan (1958b); see Ranga Rao (1962) for extensions.)

5 Show that random variables Xn and X satisfy Xn X if and only if

6 For a probability measure $P$ on $(S, \mathscr{S})$, (4.4) shows how to construct on a probability space $(\Omega, \mathscr{B}, P)$ a random element $X$ with distribution $P$. If $S$ is separable and complete, we can take for this probability space the unit interval $\Omega$ with Lebesgue measure on the class $\mathscr{B}$ of its Borel sets. [Let $\mathscr{A}_k = \{A_{ku}\}$ be a decomposition of $S$ into $P$-continuity sets of diameter less than $1/k$, and let $\mathscr{I}_k = \{I_{ku}\}$ be a decomposition of $\Omega$ into subintervals whose lengths are the numbers $P(A_{ku})$; arrange that $\mathscr{A}_{k+1}$ refines $\mathscr{A}_k$ and $\mathscr{I}_{k+1}$ refines $\mathscr{I}_k$. For $\omega \in I_{ku}$, give $X_k(\omega)$ some value in $A_{ku}$, show that $\{X_k(\omega)\}$ is fundamental for each $\omega$, and use Theorem 4.1 to show that $X(\omega) = \lim_k X_k(\omega)$ has distribution $P$.] (Skorohod (1956) has the stronger result that, if $P_n \Rightarrow P$, random elements $X_n$ and $X$ with these distributions can be constructed on the unit interval in such a way that $X_n(\omega) \to X(\omega)$ for each $\omega$.)

7 Show that $(X'_n, X''_n) \xrightarrow{\mathcal{D}} (X', X'')$ if $X'$ and $X''$ are independent, $X''_n \xrightarrow{P} Y''$, where $Y''$ has the distribution of $X''$, and (4.19) holds for each $E$ in the σ-field generated by $Y''$.
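The coin example in Problem 3 can be verified by enumerating the eight equally likely outcomes exactly with Python's `fractions.Fraction`. The statement of (4.19) does not survive in this section. The code therefore assumes, as the proof of the theorem indicates, that (4.19) reads $P(\{X'_n \in A'\} \cap E) \to P\{X' \in A'\}P(E)$ for $E$ in $\mathscr{B}_0$. Here $X'_n$ does not depend on $n$, so the limit is an equality, and it is checked for both values of the indicator $X'_n$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# The eight equally likely outcomes (c1, c2, c3) of three fair coins.
outcomes = list(product("HT", repeat=3))


def prob(event):
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))


E12 = lambda w: w[0] == w[1]
E13 = lambda w: w[0] == w[2]
E23 = lambda w: w[1] == w[2]

indicator = lambda E: (lambda w: 1 if E(w) else 0)
X1 = indicator(E13)   # X' = X'_n = X''_n = Y''
X2 = indicator(E12)   # X''

# X' and X'' are independent.
independent = all(
    prob(lambda w: X1(w) == a and X2(w) == b) == prob(lambda w: X1(w) == a) * prob(lambda w: X2(w) == b)
    for a in (0, 1) for b in (0, 1)
)

# Assumed form of (4.19) for E in B_0 = {E12, E23}.
hypothesis_holds = all(
    prob(lambda w: X1(w) == v and E(w)) == prob(lambda w: X1(w) == v) * prob(E)
    for v in (0, 1) for E in (E12, E23)
)

# The asserted limit (X', X'') puts mass 1/4 on (1, 1); (X'_n, X''_n) = (X1, X1) puts 1/2 there.
limit_mass = prob(lambda w: X1(w) == 1) * prob(lambda w: X2(w) == 1)
actual_mass = prob(lambda w: X1(w) == 1)
print(independent, hypothesis_holds, limit_mass, actual_mass)
```

The failure comes from $\mathscr{B}_0$ not being a field. For example, $E_{12} \cap E_{23} \subset E_{13}$, so the product rule breaks on the field generated by $\mathscr{B}_0$.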

5 WEAK CONVERGENCE AND MAPPINGS

Continuous Mappings

If $h$ is a measurable mapping of $S$ into another metric space $S'$ (with metric $\rho'$ and σ-field $\mathscr{S}'$ of Borel sets), then each probability measure $P$ on $(S, \mathscr{S})$ induces on $(S', \mathscr{S}')$ a unique probability measure $Ph^{-1}$, defined by $Ph^{-1}(A) = P(h^{-1}A)$ for $A \in \mathscr{S}'$. We need conditions under which $P_n \Rightarrow P$ implies $P_nh^{-1} \Rightarrow Ph^{-1}$. One such condition is that $h$ be continuous. Then $f(h(x))$ is bounded and continuous on $S$ whenever $f(y)$ is bounded and continuous on $S'$, so $P_n \Rightarrow P$ implies $\int f(h(x))\,P_n(dx) \to \int f(h(x))\,P(dx)$. Upon transformation of the integrals (see p. 223), this relation becomes

$$\int_{S'} f(y)\,P_nh^{-1}(dy) \to \int_{S'} f(y)\,Ph^{-1}(dy).$$
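The continuous-mapping argument can be illustrated with point masses, along with the way it breaks when $h$ jumps where $P$ has mass. In this Python sketch (names hypothetical), $P_n$ is a unit mass at $1/n$ and $P$ a unit mass at 0, and $f = \arctan$ serves as a bounded continuous test function on $S' = R$. For a unit mass at a point, the integral of $f$ against the image measure is just $f(h(\text{point}))$. For the discontinuous $h$, $P_nh^{-1}$ is a unit mass at 1 for every $n$, while $Ph^{-1}$ is a unit mass at 0. Since $D_h = \{0\}$ carries all of $P$'s mass, this failure is consistent with Theorem 5.1 below.

```python
import math


def image_integral(f, h, point):
    """Integral of f against (unit mass at point) h^{-1}, which equals f(h(point))."""
    return f(h(point))


f = math.atan                                   # bounded and continuous on R

h_continuous = lambda x: x * x
h_jump = lambda x: 1.0 if x > 0 else 0.0        # discontinuous at 0, where P sits

for n in (10, 100, 1000):
    print(n, image_integral(f, h_continuous, 1.0 / n), image_integral(f, h_jump, 1.0 / n))
print("limit", image_integral(f, h_continuous, 0.0), image_integral(f, h_jump, 0.0))
```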


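For example, the natural projection $\pi_k$ from $R^\infty$ to $R^k$ is continuous, so that $P_n \Rightarrow P$ implies $P_n\pi_k^{-1} \Rightarrow P\pi_k^{-1}$ for each $k$. Let us show that, conversely, if $P_n\pi_k^{-1} \Rightarrow P\pi_k^{-1}$ for each $k$, then $P_n \Rightarrow P$. From the continuity of $\pi_k$ it follows easily that $\partial\pi_k^{-1}H \subset \pi_k^{-1}\partial H$ for $H \subset R^k$. Using special properties of $\pi_k$, we shall prove that there is inclusion in the other direction. If $x \in \pi_k^{-1}\partial H$, so that $\pi_k x \in \partial H$, then there are points $\alpha^{(u)}$ in $H$ and points $\beta^{(u)}$ in $H^c$ such that $\alpha^{(u)} \to \pi_k x$ and $\beta^{(u)} \to \pi_k x$ as $u \to \infty$. The points $(\alpha_1^{(u)}, \ldots, \alpha_k^{(u)}, x_{k+1}, x_{k+2}, \ldots)$ lie in $\pi_k^{-1}H$ and converge to $x$. The points $(\beta_1^{(u)}, \ldots, \beta_k^{(u)}, x_{k+1}, x_{k+2}, \ldots)$ lie in $(\pi_k^{-1}H)^c$ and also converge to $x$. Hence $x \in \partial(\pi_k^{-1}H)$. Thus $\partial\pi_k^{-1}H = \pi_k^{-1}\partial H$. If $P_n\pi_k^{-1} \Rightarrow P\pi_k^{-1}$, then $P_n(A) \to P(A)$ for sets $A = \pi_k^{-1}H$ with $H \in \mathscr{R}^k$ and $P(\pi_k^{-1}\partial H) = 0$. Since $P(\pi_k^{-1}\partial H) = 0$ is equivalent to $P(\partial\pi_k^{-1}H) = 0$, $P_n(A) \to P(A)$ holds for all finite-dimensional $P$-continuity sets. Hence, since the finite-dimensional sets form a convergence-determining class, $P_n \Rightarrow P$.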

We call the $P\pi_k^{-1}$ the finite-dimensional distributions, or measures, corresponding to $P$. We have shown that probability measures on $(R^\infty, \mathscr{R}^\infty)$ converge weakly if and only if all the corresponding finite-dimensional distributions converge weakly.

The finite-dimensional distributions of a probability measure $P$ on $(C, \mathscr{C})$ we define as the various measures $P\pi_{t_1 \cdots t_k}^{-1}$, where the $\pi_{t_1 \cdots t_k}$ are the projections defined in Section 3. Since these projections are continuous, the weak convergence of probability measures on $(C, \mathscr{C})$ implies the weak convergence of the corresponding finite-dimensional distributions. But the converse fails, because, as was shown by counterexample, the class of finite-dimensional sets is not convergence-determining. Indeed, if $P$ [$P_n$] is a unit mass at $0$ [at the function $x_n$ of (3.5)], then $P_n$ does not converge weakly to $P$, even though $P_n\pi_{t_1 \cdots t_k}^{-1} \Rightarrow P\pi_{t_1 \cdots t_k}^{-1}$ holds for all sets $(t_1, \ldots, t_k)$. On the other hand, since the finite-dimensional sets form a determining class, a probability measure on $(C, \mathscr{C})$ is uniquely determined by its finite-dimensional distributions.

Main Theorem

We have seen that $P_n \Rightarrow P$ implies $P_nh^{-1} \Rightarrow Ph^{-1}$ if $h$ is a continuous mapping of $S$ into $S'$, but we can weaken the continuity assumption. Assume only that $h$ is measurable, and let $D_h$ be the set of discontinuities of $h$. Then $D_h \in \mathscr{S}$ (even if $h$ is not measurable; see p. 225).

THEOREM 5.1. If $P_n \Rightarrow P$ and $P(D_h) = 0$, then $P_nh^{-1} \Rightarrow Ph^{-1}$.

Proof. We shall show that, if $F$ is a closed subset of $S'$, then

$$\limsup_{n\to\infty} P_nh^{-1}(F) \le Ph^{-1}(F).$$
