
Probability for Statisticians

Galen R Shorack

Springer


There is a thin self-contained textbook within this larger presentation.

To be sure that this is well understood, I will describe later how I have used this text in the classroom at the University of Washington in Seattle.

Let me first indicate what is different about this book. As implied by the title, there is a difference. Not all the difference is based on inclusion of statistical material. (To begin, Chapters 1–6 provide the mathematical foundation for the rest of the text. Then Chapters 7–8 hone some tools geared to probability theory, while Chapter 9 provides a brief introduction to elementary probability theory right before the main emphasis of the presentation begins.)

The classical weak law of large numbers (WLLN) and strong law of large numbers (SLLN) as presented in Sections 10.2–10.4 are particularly complete, and they also emphasize the important role played by the behavior of the maximal summand. Presentation of good inequalities is emphasized in the entire text, and this chapter is a good example. Also, there is an (optional) extension of the WLLN in Section 10.6 that focuses on the behavior of the sample variance, even in very general situations.

Both the classical central limit theorem (CLT) and its Lindeberg and Liapunov generalizations are presented in two different chapters. They are first presented in Chapter 11 via Stein's method (with a new twist), and they are again presented in Chapter 14 using the characteristic function (chf) methods introduced in Chapter 13. The CLT proofs given in Chapter 11 are highly efficient. Conditions for both the weak bootstrap and the strong bootstrap are developed in Chapter 11, as is a universal bootstrap CLT based on light trimming of the sample. The approach emphasizes a statistical perspective. Much of Section 11.1 and most of Sections 11.2–11.5 are quite unusual. I particularly like this chapter. Stein's method is also used in the treatment of U-statistics and Hoeffding's combinatorial CLT (which applies to sampling from a finite population) in the optional Chapter 17. Also, the chf proofs in Section 14.2 have a slightly unusual starting point, and the approach to gamma approximations in the CLT in Section 14.4 is new.

Both distribution functions (dfs F(·)) and quantile functions (qfs K(·) ≡ F−1(·)) are emphasized throughout (quantile functions are important to statisticians). In Chapter 7 much general information about both dfs and qfs and the Winsorized variance is developed. The text includes presentations showing how to exploit the inverse transformation X ≡ K(ξ) with ξ ∼= Uniform(0, 1). In particular, Chapter 7 inequalities relating the qf and the Winsorized variance to some empirical process results of Chapter 12 are used in Chapter 16 to treat trimmed means and L-statistics, rank and permutation tests, sampling from finite populations, and bootstrapping. (Though I am very fond of Sections 7.6–7.11, their prominence is minimized in the subsequent parts of the text.)

At various points in the text choices can be voluntarily made that will offer the opportunity for a statistical example or foray. (Even if the instructor does not exercise a particular choice, a student can do so individually.) After the elementary introduction to probability theory in Section 9.1, many of the classical distributions


of statistics are introduced in Section 9.2, while useful linear algebra and the multivariate normal distribution are the subjects of Section 9.3 and Section 9.4. Following the CLT via Stein's method in Section 11.1, extensions in Sections 11.2–11.3, and application of these CLTs to the bootstrap in Sections 11.4–11.5, there is a large collection of statistical examples in Section 11.6. During presentation of the CLT via chfs in Chapter 14, statistical examples appear in Sections 14.1, 14.2, and 14.4. Statistical applications based on the empirical df appear in Sections 12.10 and 12.12. The highly statistical optional Chapters 16 and 17 were discussed briefly above. Also, the conditional probability Sections 8.5 and 8.6 emphasize statistics. Maximum likelihood ideas are presented in Section A.2 of Appendix A. Many useful statistical distributions contain parameters as an argument of the gamma function. For this reason, the gamma and digamma functions are first developed in Section A.1. Section A.3 develops cumulants, Fisher information, and other useful facts for a number of these distributions. Maximum likelihood proofs are in Section A.4.

It is my hope that even those well versed in probability theory will find some new things of interest.

I have learned much through my association with David Mason, and I would like to acknowledge that here. Especially (in the context of this text), Theorem 12.4.3 is a beautiful improvement on Theorem 12.4.2, in that it still has the potential for necessary and sufficient results. I really admire the work of Mason and his colleagues. It was while working with David that some of my present interests developed. In particular, a useful companion to Theorem 12.4.3 is knowledge of quantile functions. Sections 7.6–7.11 present what I have compiled and produced on that topic while working on various applications, partially with David.

Jon Wellner has taught from several versions of this text. In particular, he typed an earlier version and thus gave me a major critical boost. That head start is what turned my thoughts to writing a text for publication. Sections 8.6, 19.2, and the Hoffmann–Jørgensen inequalities came from him. He has also formulated a number of exercises, suggested various improvements, offered good suggestions and references regarding predictable processes, and pointed out some difficulties. My thanks to Jon for all of these contributions. (Obviously, whatever problems may remain lie with me.)

My thanks go to John Kimmel for his interest in this text, and for his help and guidance through the various steps and decisions. Thanks also to Lesley Poliner, David Kramer, and the rest at Springer-Verlag. It was a very pleasant experience. This is intended as a textbook, not as a research manuscript. Accordingly, the main body is lightly referenced. There is a section at the end that contains some discussion of the literature.


Contents

Use of this Text xiii

Chapter 1 Measures

1 Basic Properties of Measures 1

2 Construction and Extension of Measures 12

3 Lebesgue–Stieltjes Measures 18

Chapter 2 Measurable Functions and Convergence

1 Mappings and σ-Fields 21

2 Measurable Functions 24

3 Convergence 29

4 Probability, RVs, and Convergence in Law 33

5 Discussion of Sub σ-Fields 35

Chapter 3 Integration

1 The Lebesgue Integral 37

2 Fundamental Properties of Integrals 40

3 Evaluating and Differentiating Integrals 44

4 Inequalities 46

5 Modes of Convergence 51

Chapter 4 Derivatives via Signed Measures

1 Decomposition of Signed Measures 61

2 The Radon–Nikodym Theorem 66

3 Lebesgue’s Theorem 70

4 The Fundamental Theorem of Calculus 74

Chapter 5 Measures and Processes on Products

1 Finite-Dimensional Product Spaces 79

2 Random Vectors on (Ω, A, P ) 84

3 Countably Infinite Product Probability Spaces 86

4 Random Elements and Processes on (Ω, A, P ) 90

Chapter 6 General Topology and Hilbert Space

1 General Topology 95

2 Metric Spaces 101

3 Hilbert Space 104


Chapter 7 Distribution and Quantile Functions

1 Character of Distribution Functions 107

2 Properties of Distribution Functions 110

3 The Quantile Transformation 111

4 Integration by Parts Applied to Moments 115

5 Important Statistical Quantities 119

6 Infinite Variance 123

7 Slowly Varying Partial Variance 127

8 Specific Tail Relationships 134

9 Regularly Varying Functions 137

10 Some Winsorized Variance Comparisons 140

11 Inequalities for Winsorized Quantile Functions 147

Chapter 8 Independence and Conditional Distributions

1 Independence 151

2 The Tail σ-Field 155

3 Uncorrelated Random Variables 157

4 Basic Properties of Conditional Expectation 158

5 Regular Conditional Probability 168

6 Conditional Expectations as Projections 174

Chapter 9 Special Distributions

1 Elementary Probability 179

2 Distribution Theory for Statistics 187

3 Linear Algebra Applications 191

4 The Multivariate Normal Distribution 199

Chapter 10 WLLN, SLLN, LIL, and Series

0 Introduction 203

1 Borel–Cantelli and Kronecker Lemmas 204

2 Truncation, WLLN, and Review of Inequalities 206

3 Maximal Inequalities and Symmetrization 210

4 The Classical Laws of Large Numbers, LLNs 215

5 Applications of the Laws of Large Numbers 223

6 General Moment Estimation 226

7 Law of the Iterated Logarithm 235

8 Strong Markov Property for Sums of IID RVs 239

9 Convergence of Series of Independent RVs 241

10 Martingales 246

11 Maximal Inequalities, Some with  Boundaries 247

12 A Uniform SLLN 252


Chapter 11 Convergence in Distribution

1 Stein’s Method for CLTs 255

2 ˜Winsorization and ˇTruncation 264

3 Identically Distributed RVs 269

4 Bootstrapping 274

5 Bootstrapping with Slowly Increasing Trimming 276

6 Examples of Limiting Distributions 279

7 Classical Convergence in Distribution 288

8 Limit Determining Classes of Functions 292

Chapter 12 Brownian Motion and Empirical Processes

1 Special Spaces 295

2 Existence of Processes on (C, C) and (D, D) 298

3 Brownian Motion and Brownian Bridge 302

4 Stopping Times 305

5 Strong Markov Property 308

6 Embedding a RV in Brownian Motion 311

7 Barrier Crossing Probabilities 314

8 Embedding the Partial Sum Process 318

9 Other Properties of Brownian Motion 323

10 Various Empirical Processes 325

11 Inequalities for the Various Empirical Processes 333

12 Applications 338

Chapter 13 Characteristic Functions

1 Basic Results, with Derivation of Common Chfs 341

2 Uniqueness and Inversion 346

3 The Continuity Theorem 350

4 Elementary Complex and Fourier Analysis 352

5 Esseen’s Lemma 358

6 Distributions on Grids 361

7 Conditions for φ to Be a Characteristic Function 363

Chapter 14 CLTs via Characteristic Functions

0 Introduction 365

1 Basic Limit Theorems 366

2 Variations on the Classical CLT 371

3 Local Limit Theorems 380

4 Gamma Approximation 383

5 Edgeworth Expansions 390

6 Approximating the Distribution of h(X̄n) 396


Chapter 15 Infinitely Divisible and Stable Distributions

1 Infinitely Divisible Distributions 399

2 Stable Distributions 407

3 Characterizing Stable Laws 410

4 The Domain of Attraction of a Stable Law 412

Chapter 16 Asymptotics via Empirical Processes

0 Introduction 415

1 Trimmed and Winsorized Means 416

2 Linear Rank Statistics and Finite Sampling 426

Chapter 18 Martingales

1 Basic Technicalities for Martingales 467

2 Simple Optional Sampling Theorem 472

3 The Submartingale Convergence Theorem 473

4 Applications of the S-mg Convergence Theorem 481

5 Decomposition of a Submartingale Sequence 487

6 Optional Sampling 492

7 Applications of Optional Sampling 499

8 Introduction to Counting Process Martingales 501

9 Doob–Meyer Submartingale Decomposition 511

10 Predictable Processes and ∫₀· H dM Martingales 516

11 The Basic Censored Data Martingale 522

12 CLTs for Dependent RVs 529

Chapter 19 Convergence in Law on Metric Spaces

1 Convergence in Distribution on Metric Spaces 531

2 Metrics for Convergence in Distribution 540

Appendix A Distribution Summaries

1 The Gamma and Digamma Functions 546

2 Maximum Likelihood Estimators and Moments 551

3 Examples of Statistical Models 555

4 Asymptotics of Maximum Likelihood Estimation 563


The University of Washington is on the quarter system, so my description will reflect that. My thoughts are offered as a potential help, not as an essential recipe. The reader will note that the problems are interspersed with the text. It is important to read them as they are encountered.

Chapters 1–5 provide the measure-theoretic background that is necessary for the rest of the text. Many of our students have had at least some kind of an undergraduate exposure to part of this subject. Still, it is important that I present the key parts of this material rather carefully. I feel it is useful for all of them.

Chapter 1 (measures; 5 lectures)

Emphasized in my presentation are generators, the monotone property of measures, the Carathéodory extension theorem, completions, the approximation lemma, and the correspondence theorem. Presenting the correspondence theorem carefully is important, as this allows one the luxury of merely highlighting some proofs in Chapter 5. [The minimal monotone class theorem of Section 1.1, claim 8 of the Carathéodory extension theorem proof, and most of what follows the approximation lemma in Section 1.2 would never be presented in my lectures.] {I always assign Exercises 1.1.1 (generators), 1.2.1 (completions), and 1.2.3 (the approximation lemma). Other exercises are assigned, but they vary each time.}

Chapter 2 (measurable functions and convergence; 4 lectures)

I present most of Sections 2.1, 2.2, and 2.3. Highlights are preservation of σ-fields, measurability of both common functions and limits of simple functions, induced measures, convergence and divergence sets (especially), and relating →µ to →a.s. (especially, reducing the first to the second by going to subsequences). I then assign Section 2.4 as outside reading and Section 2.5 for exploring. [I never lecture on either Section 2.4 or 2.5.] {I always assign Exercises 2.2.1 (specific σ-fields), 2.3.1 (concerning →a.e.), 2.3.3 (a substantial proof), and 2.4.1 (Slutsky's theorem).}

Chapter 3 (integration; 7 lectures)

This is an important chapter. I present all of Sections 3.1 and 3.2 carefully, but Section 3.3 is left as reading, and some of the Section 3.4 inequalities (Cr, Hölder, Liapunov, Markov, and Jensen) are done carefully. I do Section 3.5 carefully as far as Vitali's theorem, and then assign the rest as outside reading. {I always assign Exercises 3.2.1–3.2.2 (only the zero function), 3.3.3 (differentiating under the integral sign), 3.5.1 (substantial theory), and 3.5.7 (the Scheffé theorem).}

Chapter 4 (Radon–Nikodym; 2 lectures)

I present ideas from Section 4.1, sketch the Jordan–Hahn decomposition proof, and then give the proofs of the Lebesgue decomposition, the Radon–Nikodym theorem, and the change of variable theorem. These final two topics are highlighted. The fundamental theorem of calculus of Section 4.4 is briefly discussed. [I would never present any of Section 4.3.] {I always assign Exercises 4.2.1 (manipulating Radon–Nikodym derivatives), 4.2.7 (mathematically substantial), and 4.4.1, 4.4.2, and 4.4.4 (so that the students must do some outside reading in Section 4.4 on their own).}


Chapter 5 (Fubini; 2 lectures)

The first lecture covers Sections 5.1 and 5.2. Proving Proposition 5.2.1 is a must, and I discuss/prove Theorems 5.1.2 (product measure) and 5.1.3 (Fubini). The remaining time is spent on Section 5.3. [I rarely lecture from Section 5.4, but I do assign it as outside reading.] {I always assign Exercises 5.3.1 (measurability in a countable number of dimensions) and 5.4.1 (the finite-dimensional field).}

Chapter 6 (topology and Hilbert space; 0 lectures)

[This chapter is presented only for reference. I do not lecture from it.]

The mathematical tools have now been developed. In the next three chapters we learn about some specialized probabilistic tools and then get a brief review of elementary probability. The presentation on the classic topics of probability theory then commences in Chapter 10.

Chapter 7 (distribution functions (dfs) and quantile functions (qfs); 4 lectures)

This chapter is quite important to this text. Skorokhod's theorem in Section 7.3 must be done carefully, and the rest of Sections 7.1–7.4 should be covered. Section 7.5 should be left as outside reading. [Lecturing from Sections 7.6–7.11 is purely optional, and I would not exceed one lecture.] {I always assign Exercises 7.1.1 (on continuity of dfs), 7.3.3 (F−1(·) is left continuous), 7.3.3 (change of variable), and 7.4.2 (for practice working with X ≡ K(ξ)). Consider lecturing on Theorem 7.6.1 (the infinite variance case).}
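The transformation X ≡ K(ξ) practiced in Exercise 7.4.2 is the usual inverse-transform construction; a minimal Python sketch, assuming an Exponential(1) df as an illustrative choice (this particular qf is not an example from the text):

```python
import math
import random

def exponential_qf(u):
    """Quantile function K(u) = F^{-1}(u) of the Exponential(1) df
    F(x) = 1 - exp(-x), for 0 <= u < 1."""
    return -math.log(1.0 - u)

# If xi ~ Uniform(0, 1), then X = K(xi) has df F.
random.seed(0)
sample = [exponential_qf(random.random()) for _ in range(100_000)]
mean = sum(sample) / len(sample)
# The sample mean should be close to the true mean, 1.
print(round(mean, 2))
```

The same recipe works for any df: only the qf K changes, which is why the text can treat dfs and qfs on an equal footing.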

Chapter 8 (conditional expectation; 2 lectures)

The first lecture covers Sections 8.1 and 8.2. It highlights Proposition 8.1.1 (on the preservation of independence), Theorem 8.1.2 (extending independence from π-systems), and Kolmogorov's 0-1 law. The other provides some discussion of the definition of conditional probability in Section 8.4, includes proofs of several parts of Theorem 8.4.1 (properties of conditional expectation), and discusses Definition 8.5.1 of regular conditional probability. [I never lecture on Sections 8.3, 8.5, or 8.6.] {I always assign Exercises 8.1.2 and 8.1.3 (they provide routine practice with the concepts), Exercise 8.4.1 (discrete conditional probability), Exercise 8.4.3 (repeated stepwise smoothing in a particular example), and part of Exercise 8.4.4 (proving additional parts of Theorem 8.4.1).}

Chapter 9 (elementary probability; 0 lectures)

Sections 9.1 and 9.2 were written to provide background reading for those graduate students in mathematics who lack an elementary probability background. Sections 9.3 and 9.4 allow graduate students in statistics to read some of the basic multivariate results in appropriate matrix notation. [I do not lecture from this chapter.] {But I do assign Exercises 9.1.8 (the Poisson process exists) and 9.2.1(ii) (so that the convolution formula is refreshed).}

Chapter 10 (laws of large numbers (LLNs) and inequalities; 3 lectures for now)

Since we are on the quarter system at the University of Washington, this leaves me 3 lectures to spend on the law of large numbers in Chapter 10 before the Christmas break at the end of the autumn quarter. In the first 3 lectures I do Sections 10.1 and 10.2 with Khinchine's weak law of large numbers (WLLN), Kolmogorov's inequality only from Section 10.3, and at this time I present Kolmogorov's strong law of large numbers (SLLN) only from Section 10.4. {I always assign Exercises 10.1.1 (Cesàro summability), 10.2.1 (it generates good ideas related to the proofs), 10.2.3 (as it practices the important Op(·) and op(·) notation), 10.4.4 (the substantial result of Marcinkiewicz and Zygmund), 10.4.7 (random sample size), and at least one of the alternative SLLN proofs contained in 10.4.8, 10.4.9, and 10.4.10.}

At this point at the beginning of the winter quarter the instructor will have his/her own opinions about what to cover. I devote the winter quarter to the weak law of large numbers (WLLN), an introduction to the law of the iterated logarithm (LIL), and various central limit theorems (CLTs). That is, the second term treats the material of Chapters 10–11 and 13–17. I will outline my choices for which parts to cover.

Chapter 10 (LLNs, inequalities, LIL, and series; 6 lectures)

My lectures cover Section 10.3 (symmetrization inequalities and Lévy's inequality for the WLLN, and the Ottaviani–Skorokhod inequality for series), Feller's WLLN from Section 10.4, the Glivenko–Cantelli theorem from Section 10.5, the LIL for normal rvs in Proposition 10.7.1, the strong Markov property of Theorem 10.8.1, and the two-series Theorem 10.9.2. [I do not lecture from any of Sections 10.6, 10.10, 10.11, or 10.12 at this time.] {I always assign Exercise 10.7.1 (Mills' ratio).}

Chapter 11 (CLTs via Stein's method; 3 lectures)

From Section 11.1 one can prove Stein's first lemma and discuss his second lemma, prove the Berry–Esseen theorem, and prove Lindeberg's CLT. Note that we have not yet introduced characteristic functions.

Chapter 13 (characteristic functions (chfs); 6 lectures)

I do Sections 13.1–13.5. {I always assign Exercises 13.1.1 and 13.1.3(a) (deriving specific chfs) and 13.4.1 (Taylor series expansions of the chf).}

Chapter 14 (CLTs via chfs; 6 lectures)

The classical CLT, the Poisson limit theorem, and the multivariate CLT make a nice lecture. The chi-square goodness of fit example and/or the median example (of Section 11.6) make a lecture of illustrations. Chf proofs of the usual CLTs are given in Section 14.2. (Section 13.5 could have been left until now.) If Lindeberg's theorem was proved in Chapter 11, one might do only Feller's converse now via chfs. Other examples from either Section 14.2 or 11.6 could now be chosen, and Example 11.6.4 (weighted sums of iid rvs) is my first choice. [The chi-square goodness of fit example could motivate a student to read from Sections 9.3 and 9.4.]

At this stage I still have at least 7 optional lectures at the end of the winter quarter and about 12 more at the start of the spring quarter. In my final 16 lectures of the spring quarter I feel it appropriate to consider Brownian motion in Chapter 12 and then martingales in Chapter 18 (in a fashion to be described below). Let me first describe some possibilities for the optional lectures, assuming that the above core was covered.

Chapter 17 (U-statistics and Hoeffding’s combinatorial CLT)

Sections 17.1 and 17.2 are independent of each other. The Berry–Esseen potential of Lemma 11.1.1 is required for Section 17.1. Either one or two lectures could then be presented on U-statistics from Section 17.1. The alternative Stein formulation of Motivation 11.1.1 is required for Section 17.2. Two additional lectures would give the Hoeffding combinatorial CLT and its corollary regarding sampling from finite populations.


Chapter 11 (statistical examples)

Sections 11.6, 14.2, and 14.6 contain appropriate examples and exercises.

Chapter 11 (bootstrap)

Both Sections 11.4 and 11.5 on the bootstrap require only Theorem 11.2.1.

Chapters 11 and 19 (convergence in distribution)

Convergence in distribution on the line is presented in Sections 11.7 and 11.8. [This is extended to metric spaces in Chapter 19, but I do not lecture from it.]

Chapter 11 (domain of normal attraction of the normal df)

The converse of the CLT in Theorem 11.3.2 requires the Giné–Zinn symmetrization inequality and the Khinchine inequality of Section 13.3 and the Paley–Zygmund inequality of Section 3.4.

Chapters 7, 10 and 11 (domain of attraction of the normal df)

Combining Sections 7.6–7.8, the Section 10.3 subsection on maximal inequalities of another ilk, Section 10.6, and Sections 11.2–11.3 makes a nice unit. Lévy's asymptotic normality condition (ANC) of (7.7.14) for a rv X has some prominence. In Chapter 7 purely geometric methods plus Cauchy–Schwarz are used to derive a multitude of equivalent conditions. In the process, quantile functions are carefully studied. In Section 10.6 the ANC is seen to be equivalent to a result akin to a WLLN for the rv X², and so in this context many additional equivalent conditions are again derived. Thus when one comes to the CLT in Sections 11.2 and 11.3, one already knows a great deal about the ANC.

Chapter 15 (infinitely divisible and stable laws)

First, Section 15.1 (infinitely divisible laws) is independent of the rest, including Section 15.2 (stable laws). The theorem stated in Section 15.4 (domain of attraction of stable laws) would require methods of Section 7.9 to prove, but the interesting exercises are accessible without this.

Chapter 14 (higher-order approximations)

The local limit theorem in Section 14.3 can be done immediately for continuous dfs, but it also requires Section 13.6 for discrete dfs. The expansions given in Sections 14.4 (gamma approximation) and 14.5 (Edgeworth approximation) also require Exercise 13.4.6.

Assorted topics suitable for individual reading

These include Section 8.6 (on alternating conditional expectations), Section 10.12 (a uniform SLLN), Section 16.4 (L-statistics), Sections 18.8–18.11 (counting process martingales), and Section 18.12 (martingale CLTs).

The primary topics for the spring quarter are Chapter 12 (Brownian motion and elementary empirical processes) and Chapter 18 (martingales). I have never covered Chapter 12 until the spring, but I placed it rather early in the text to make clear that it doesn't depend on any of the later material.

Chapter 12 (Brownian motion; 6 lectures)

I discuss Section 12.1, sketch the proof of Section 12.2 and carefully apply that result in Section 12.3, and treat Section 12.4 carefully (as I believe that at some point a lecture should be devoted to a few of the more subtle difficulties regarding measurability). I am a bit cavalier regarding Section 12.5 (strong Markov property), but I apply it carefully in Sections 12.6, 12.7, and 12.8. I assign Section 12.9 as outside reading. [I do not lecture on Theorem 12.8.2.] {I always assign Exercises


12.1.2 (on (C, C)), 12.3.1 (various transforms of Brownian motion), 12.3.3 (integrals of normal processes), 12.4.1 (properties of stopping times), 12.7.3(a) (related to embedding a rv in Brownian motion), and 12.8.2 (the LIL via embedding).}

At this point let me describe three additional optional topics that could now be pursued, based on the previous lectures from Chapter 12.

Chapter 12 (elementary empirical processes)

Uniform empirical and quantile processes are considered in Section 12.10. Straightforward applications to linear rank statistics and two-sample tests of fit are included. One could either lecture from Section 12.12 (directly) or 12.11 (with a preliminary lecture from Sections 10.10–10.11), or leave these for assigned reading.

Chapter 16 (trimmed means and/or simple linear rank statistics)

Both possibilities listed here require Section 12.10 as well as the quantile inequality of (7.11.3). Asymptotic normality of linear rank statistics and a finite sampling CLT are derived in Section 16.2, and the bootstrap is presented in Section 16.3. The general CLT (Theorem 16.1.1) and asymptotic normality of trimmed means (Theorem 16.1.2, but only present the β = 0 case) are derived in Section 16.1; this will also require stating/proving the equivalence of (16.1.3) and (7.6.4), which is shown in Theorem 7.1.1.

Chapter 18 (martingales; 10 lectures)

I cover most of the first seven sections. {I always assign Exercises 18.1.4 (a counting process martingale), 18.3.2 (a proof for continuous time mgs), 18.3.7, and 18.3.9 (on Lr-convergence).}

Appendix A (maximum likelihood estimation)

I see this as being of considerable interest in conjunction with statistical pursuits, rather than as a direct part of a course on probability theory.


Definition of Symbols

∼= means “is distributed as”

≡ means “is defined to be”

a = b ⊕ c means that |a − b| ≤ c

Un =a Vn means “asymptotically equal” in the sense that Un − Vn →p 0

X ∼= (µ, σ²) means that X has mean µ and variance σ²

X ∼= F(µ, σ²) means that X has df F with mean µ and variance σ²

X̄n is the “sample mean” and Ẍn is the “sample median”

(Ω, A, µ) and (Ω, A, P ) denote a measure space and a probability space

σ[ C] denotes the σ-field generated by the class of sets C

F(X) denotes X−1(B̄), for the Borel sets B and B̄ ≡ σ[B, {+∞}, {−∞}]

ξ will always refer to a Uniform(0, 1) rv

 means “nondecreasing” and ↑ means “strictly increasing”

1A(·) denotes the indicator function of the set A

“df” refers to a distribution function F(·)

“qf” refers to a quantile function K(·) ≡ F−1(·)

The “tilde” symbol denotes ˜Winsorization

The “h´aˇcek” symbol denotes ˇTruncation

λ(·) and λn(·) will refer to Lebesgue measure on the line R and on Rn

See page 119 for “dom(a, a′)”

Brownian motion S, Brownian bridge U, and the Poisson process N

The empirical df Fn and the empirical df Gn of Uniform(0, 1) rvs

 → is associated with convergence in the LIL (see page 235)

“mg” refers to a martingale

“smg” refers to a submartingale

The stacked symbol “>=” means “≥” for a submartingale and “=” for a martingale; it is paired with “s-mg” in this context.

Prominence

Important equations are labeled with numbers to give them prominence. Thus, equations within a proof that are also important outside the context of that proof are numbered. Though the typical equation in a proof is unworthy of a number, it may be labeled with a letter to help with the “bookkeeping.” Likewise, digressions or examples in the main body of the text may contain equations labeled with letters that decrease the prominence given to them.

Integral signs and summation signs in important equations (or sufficiently complicated equations) are large, while those in less important equations are small. It is a matter of assigned prominence. The most important theorems, definitions, and examples have been given titles in boldface type to assign prominence to them. The titles of somewhat less important results are not in boldface type. Routine references to theorem 10.4.1 or definition 7.3.1 do not contain capitalized initial letters. The author very specifically wishes to downgrade the prominence given to this routine use of these words. Starting new sections on new pages allowed me to carefully control the field of vision as the most important results were presented.


Motivation 1.1 (The Lebesgue integral) The Riemann integral of a continuous function f (we will restrict attention to f(x) ≥ 0 for a ≤ x ≤ b for convenience) is formed by subdividing the domain of f, forming approximating sums, and passing to the limit. Thus the mth Riemann sum for ∫ f(x) dx over [a, b] is

RSm ≡ Σi f(x*mi)(xmi − xm,i−1),

where a ≡ xm0 < xm1 < · · · < xmm ≡ b (with xm,i−1 ≤ x*mi ≤ xmi for all i) satisfy meshm ≡ max[xmi − xm,i−1] → 0. Note that xmi − xm,i−1 is the measure (or length) of the interval [xm,i−1, xmi], while f(x*mi) approximates the values of f(x) for all xm,i−1 ≤ x ≤ xmi (at least it does if f is continuous on [a, b]). Within the class C+ of all nonnegative continuous functions, this definition works reasonably well. But it has one major shortcoming: the conclusion ∫ fn(x) dx → ∫ f(x) dx of the limit theorems requires overly restrictive hypotheses, since the limit f of functions fn in C+ need not lie in C+, and its Riemann integral may then not even be well-defined.

A different approach is needed. (Note figure 1.1.)

The Lebesgue integral of a nonnegative function is formed by subdividing the range. Thus the mth Lebesgue sum for ∫ f(x) dx over [a, b] is

LSm ≡ Σk (k/2^m) · measure{x : k/2^m ≤ f(x) < (k + 1)/2^m},

and ∫ f(x) dx is defined to be the limit of the LSm sums as m → ∞. For what class M of functions f can this approach succeed? The members f of the class M will need to be such that the measure (or length) of all sets of the form

{x : k/2^m ≤ f(x) < (k + 1)/2^m}


can be specified. This approach leads to the concept of a σ-field A of subsets of [a, b] that are measurable (that is, we must be able to assign to these sets a number called their “length”), and this leads to the concept of the class M of measurable functions. This class M of measurable functions will be seen to be closed under passage to the limit and all the other operations that we are accustomed to performing on functions. Moreover, the desirable property ∫ fn(x) dx → ∫ f(x) dx will be seen to hold for convergent sequences in M under rather mild hypotheses.
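The contrast between the two subdivision schemes can be checked numerically; a small Python sketch (the integrand f(x) = x² on [0, 1] and the grid sizes are illustrative choices, with both sums converging to 1/3):

```python
from collections import Counter

def riemann_sum(f, a, b, m):
    """mth Riemann sum: subdivide the DOMAIN [a, b] into m equal cells,
    weighting f at the midpoint of each cell by the cell's length."""
    h = (b - a) / m
    return sum(f(a + (i + 0.5) * h) * h for i in range(m))

def lebesgue_sum(f, a, b, m, n_grid=200_000):
    """mth Lebesgue sum: subdivide the RANGE into cells [k/2^m, (k+1)/2^m),
    weighting k/2^m by the (grid-approximated) measure of the set
    {x in [a, b] : k/2^m <= f(x) < (k+1)/2^m}."""
    h = (b - a) / n_grid
    step = 2.0 ** (-m)
    counts = Counter(int(f(a + (i + 0.5) * h) / step) for i in range(n_grid))
    return sum((k * step) * (c * h) for k, c in counts.items())

f = lambda x: x * x
print(round(riemann_sum(f, 0.0, 1.0, 1000), 4))  # near 1/3
print(round(lebesgue_sum(f, 0.0, 1.0, 12), 4))   # near 1/3 (a slight underestimate)
```

The Lebesgue sum slightly underestimates the integral because each cell of the range is weighted by its lower endpoint k/2^m, exactly as in the text's construction.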


Definition 1.1 (Set theory) Consider a nonvoid class A of subsets A of a nonvoid set Ω. (For us, Ω will be the sample space of an experiment.)

(a) Let Ac denote the complement of A, let A ∪ B denote the union of A and B, let A ∩ B and AB both denote the intersection, let A \ B ≡ ABc denote the set difference, let A Δ B ≡ (AcB ∪ ABc) denote the symmetric difference, and let ∅ denote the empty set. The class of all subsets of Ω will be denoted by 2Ω. Sets A and B are called disjoint if AB = ∅, and sequences of sets An or classes of sets At are called disjoint if all pairs are disjoint. Writing A + B or Σ∞1 An will also denote a union, but will imply the disjointness of the sets in the union. As usual, A ⊂ B denotes that A is a subset of B. We call a sequence An increasing (and we will nearly always denote this fact by writing An ↑) when An ⊂ An+1 for all n ≥ 1. We call the sequence decreasing (denoted by An ↓) when An ⊃ An+1 for all n ≥ 1. We call the sequence monotone if it is either increasing or decreasing. Let ω denote a generic element of Ω. We will use 1A(·) to denote the indicator function of A, which equals 1 or 0 at ω according as ω ∈ A or ω ∉ A.

(b) A will be called a field if it is closed under complements and unions (That is,

A and B in A requires that A c and A ∪ B be in A.) [Note that both Ω and ∅ are

necessarily in A, as A was assumed to be nonvoid, with Ω = A ∪ A^c and ∅ = Ω^c.]
(c) A will be called a σ-field if it is closed under complements and countable unions. (That is, A, A_1, A_2, . . . in A requires that A^c and ∪_{n=1}^∞ A_n be in A.)

(d) A will be called a monotone class provided it contains ∪_{n=1}^∞ A_n for all increasing sequences A_n in A and contains ∩_{n=1}^∞ A_n for all decreasing sequences A_n in A.

(e) (Ω, A) will be called a measurable space provided A is a σ-field of subsets of Ω.

(f) A will be called a π-system provided AB is in A for all A and B in A; and A

will be called a ¯π-system when Ω in A is also guaranteed.

If A is a field (or a σ-field), then it is closed under intersections (under countable intersections), since AB = (A^c ∪ B^c)^c (and since ∩_{n=1}^∞ A_n = (∪_{n=1}^∞ A_n^c)^c). Likewise, we could have used "intersection" instead of "union" in our definitions by making use of A ∪ B = (A^c ∩ B^c)^c and ∪_{n=1}^∞ A_n = (∩_{n=1}^∞ A_n^c)^c.
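These set identities are easy to sanity-check on a small finite Ω. A quick sketch follows; Ω, A, and B are arbitrary toy choices, not from the text.

```python
Omega = set(range(10))                 # a ten-point "sample space"
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}

A_c = Omega - A                        # the complement A^c
sym = (A - B) | (B - A)                # A △ B = A B^c ∪ A^c B
one_A = lambda w: 1 if w in A else 0   # the indicator function 1_A(ω)

assert sym == A ^ B                    # Python's ^ on sets is exactly △
assert one_A(2) == 1 and one_A(7) == 0
# De Morgan, as used just above: A ∪ B = (A^c ∩ B^c)^c
assert (A | B) == Omega - ((Omega - A) & (Omega - B))
```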

Proposition 1.1 (Closure under intersections)

(a) Arbitrary intersections of fields, σ-fields, or monotone classes are fields, σ-fields,

or monotone classes, respectively

[For example,∩{F α:F αis a field under consideration} is a field.]

(b) There exists a minimal field, σ-field, or monotone class generated by (or, containing) any specified class C of subsets of Ω. We call C the generators. For example,
σ[C] ≡ ∩{F_α : F_α is a σ-field of subsets of Ω for which C ⊂ F_α} (4)
is the minimal σ-field generated by C (that is, containing C).

(c) A collection A of subsets of Ω is a σ-field if and only if it is both a field and a monotone class.

Proof. Consider (c). If A is both a field and a monotone class and A_1, A_2, . . . are in A, then B_n ≡ ∪_{k=1}^n A_k ∈ A, so that ∪_{n=1}^∞ A_n = ∪_{n=1}^∞ B_n ∈ A since the B_n are in A and are ↑. Everything else is even more trivial. □

Definition 1.2 (Measure) Let A be a σ-field of subsets of Ω. If a set function µ: A → [0, ∞] satisfies µ(∅) = 0 and the countable additivity (abbreviated c.a.) condition
µ(Σ_{n=1}^∞ A_n) = Σ_{n=1}^∞ µ(A_n) for all disjoint sequences A_n in A, (5)
then µ is called a measure (or, equivalently, a countably additive measure) on (Ω, A).

The triple (Ω, A, µ) is then called a measure space We call µ finite if µ(Ω) < ∞.

We call µ σ-finite if there exists a measurable decomposition of Ω as Ω =

1 Ωn

with Ωn ∈ A and µ(Ω n ) < ∞ for all n.

[IfA is not a σ-field, we will still call µ a measure on (Ω, A), provided that (5)

holds for all sequences A n for which ∪ ∞

1 A n is inA We will not, however, use the

term "measure space" to describe such a triple. We will consider below measures on fields, on certain π̄-systems, and on some other collections of sets.] A useful property of a collection of sets is that along with any sets A_1, . . . , A_k it also includes all sets that can be formed from them by complements, unions, and intersections. We call µ finitely additive (abbreviated f.a.) if
µ(Σ_{k=1}^n A_k) = Σ_{k=1}^n µ(A_k) (6)
for all disjoint sequences A_n in A for which Σ_{k=1}^n A_k is also in A.

Definition 1.3 (Outer measures) Consider a set function µ: 2Ω→ [0, ∞].

(a) Suppose that µ ∗ satisfies the following three properties

Null: µ*(∅) = 0.
Monotone: µ*(A) ≤ µ*(B) for all A ⊂ B.
Countable subadditivity: µ*(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ µ*(A_n) for all sequences A_n.
Then µ* is called an outer measure.

(b) An arbitrary subset A of Ω is called µ ∗ -measurable if

µ*(T) = µ*(TA) + µ*(TA^c) for all subsets T ⊂ Ω.

(7)

Sets T used in this capacity are called test sets.

(c) We letA ∗ denote the class of all µ ∗ -measurable sets, that is,

A* ≡ {A ∈ 2^Ω : A is µ*-measurable}.

(8)

[Note that A ∈ A ∗ if and only if µ ∗ (T ) ≥ µ ∗ (T A) + µ ∗ (T A c ) for all T ⊂ Ω, since

the other inequality is trivial by the subadditivity of µ ∗.]

Motivation 1.2 (Measure) In this paragraph we will consider only one possible

measure µ, namely the Lebesgue-measure generalization of length Let C I denote

the set of all intervals of the types (a, b], ( −∞, b], and (a, +∞) on the real line R,

and for each of these intervals I we assign a measure value µ(I) equal to its length,

thus∞, b − a, ∞ in the three special cases All is well until we manipulate the sets


inC I, as even the union of two elements in C I need not be in C I Thus,C I is not

a very rich collection of sets A natural extension is to letC F denote the collection

of all finite disjoint unions of sets inC I , where the measure µ(A) we assign to each such set A is just the sum of the measures (lengths) of all its disjoint pieces Now C F

is a field, and is thus closed under the elementary operations of union, intersection,and complementation Much can be done using onlyC F and letting “measure” bethe “exact length” But C F is not closed under passage to the limit, and it is thus

insufficient for many of our needs For this reason the concept of the smallest σ-field

containingC F, labeledB ≡ σ[C F], is introduced We callB the Borel sets But let

us work backwards Let us assign an outer measure value µ ∗ (A) to every subset A in

the class 2^R of all subsets of the real line R. In particular, to any subset A we assign the value µ*(A) that is the infimum of all possible numbers Σ_{n=1}^∞ µ(A_n), in which each A_n is in the field C_F (so that we know its measure) and in which the A_n's form a cover of A (in that A ⊂ ∪_{n=1}^∞ A_n). Thus each number Σ_{n=1}^∞ µ(A_n) is a natural upper bound to the measure (or generalized length) of the set A, and we will specify the infimum of such upper bounds to be the outer measure of A. Thus to each subset

A of the real line we assign a value µ ∗ (A) of generalized length This value seems

“reasonable”, but does it “perform correctly”? Let us say that a particular set A is

µ ∗ -measurable (that is, it “performs correctly”) if µ ∗ (T ) = µ ∗ (T A) + µ ∗ (T A c) for

all subsets of the real line R, that is, if the A versus A c division of the line divides

every subset T of the line into two pieces in a fashion that is µ ∗-additive This isundoubtedly a combination of reasonableness and fine technicality that took sometime to evolve in the mind of its creator, Carath´eodory, while he searched for acondition that “worked” In what sense does it “work”? The collection A ∗ of all

µ ∗ -measurable sets turns out to be a σ-field Thus the collection A ∗ is closed under

all operations that we are likely to perform; and it is big enough, in that it is a

σ-field that contains C F Thus we will work with the restriction µ ∗ |A ∗ of µ ∗to the

sets of A ∗ (here, the vertical line means “restricted to”) This is enough to meet

our needs

There are many measures other than length. For an ↑ (nondecreasing) and right-continuous function F on the real line (called a generalized df) we define the Stieltjes measure of an arbitrary interval (a, b] (with −∞ ≤ a < b ≤ ∞) in C_I by µ_F((a, b]) ≡ F(b) − F(a), and we extend it to sets in C_F by adding up the measure of the pieces.

Reapplying the previous paragraph, we can extend µ F to the µ ∗ F-measurable sets It

is the important Carath´eodory extension theorem that will establish that all Stieltjes

measures (including the case of ordinary length, where F (x) = x as considered in

the first paragraph) can be extended from C F to the Borel sets B That is, all

Borel sets are µ ∗-measurable for every Stieltjes measure One further extension ispossible, in that every measure can be “completed” (see the end of section 1.2) We

note here only that when the Stieltjes measure µ F associated with the generalized df

F is “completed”, its domain of definition is extended from the Borel sets B (which

all Stieltjes measures have in common) to a larger collection ˆB µ F that depends on

the particular F It is left to section 1.2 to simply state that this is as far as we

can go That is, except in trivial special cases, we find that ˆB µ F is a proper subset

of 2R (That is, it is typically impossible to try to define the measure of all subsets

of Ω in a suitable fashion.) 2
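For sets that are themselves finite unions of intervals, the infimum in the covering recipe is attained by simply merging overlaps, which makes λ* directly computable. The following is a sketch under that simplifying restriction; it is not a general outer-measure algorithm.

```python
def outer_measure(intervals):
    # λ*(A) for A a finite union of intervals, given as (a, b) pairs:
    # sort by left endpoint, merge overlapping or touching pieces,
    # and total the merged lengths.
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return sum(b - a for a, b in merged)

assert outer_measure([(0, 1), (0.5, 2), (3, 4)]) == 3.0   # merges to (0,2) ∪ (3,4)
```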


6 CHAPTER 1 MEASURES

Example 1.1 (Some examples of measures, informally)

(a) Lebesgue measure:

Let λ(A) denote the length of A.

(b) Counting measure:

Let #(A) denote the number of “points” in A (or the cardinality of A).

(c) Unit point mass:

Let δ ω0(A) ≡ 1 {ω0} (A), assigning measure 1 or 0 as ω0∈ A or not 2
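Informal sketches of these three measures, with sets represented in whatever finite form each measure can act on. The representations (interval lists, finite point sets, predicates) are illustrative conventions, not from the text.

```python
def lebesgue(intervals):          # λ: total length of disjoint (a, b) pieces
    return sum(b - a for a, b in intervals)

def counting(points):             # #: cardinality of a finite set
    return len(points)

def point_mass(w0):               # δ_{ω0}: measure 1 if ω0 ∈ A, else 0
    return lambda A: 1 if A(w0) else 0

assert lebesgue([(0, 1), (2, 4.5)]) == 3.5
assert counting({1, 3, 9}) == 3
assert point_mass(0.5)(lambda x: 0 <= x <= 1) == 1
```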

Example 1.2 (Borel sets)

(a) Let Ω = R and let C consist of all finite disjoint unions of intervals of the types

(a, b], ( −∞, b], and (a, +∞) Clearly, C is a field Then B ≡ σ[C] will be called the Borel sets (or the Borel subsets of R) Let µ(A) be defined to be the sum of the

lengths of the intervals composing A, for each A ∈ C Then µ is a c.a measure on

the field C, as will be seen in the proof of theorem 1.3.1 below.

(b) If (Ω, d) is a metric space and U ≡ {all d-open subsets of Ω}, then B ≡ σ[U]

will be called the Borel sets or the Borel σ-field

(c) If (Ω, d) is (R, |·|) for absolute value |·|, then σ[C] = σ[U] even though C ≠ U. [This claim is true, since C ⊂ σ[U] and U ⊂ σ[C] are clear. Then, just make a trivial appeal to exercise 1.1.]
(d) Let R̄ ≡ [−∞, +∞] denote the extended real line, and let B̄ ≡ σ[B, {−∞}, {+∞}]. □

Proposition 1.2 (Monotone property of measures) Let (Ω, A, µ) denote a

measure space Let (A1, A2, ) be in A.

(a) If A_n ⊂ A_{n+1} for all n, then µ(A_n) ↑ µ(∪_{n=1}^∞ A_n).
(b) If A_n ⊃ A_{n+1} for all n, and if µ(A_{n_0}) < ∞ for some n_0, then µ(A_n) ↓ µ(∩_{n=1}^∞ A_n).
[Letting Ω denote the real line R, letting A_n = [n, ∞), and letting µ denote either Lebesgue measure or counting measure, we see the need for some such requirement.]
(c) (Countable subadditivity) Whenever (A_1, A_2, . . .) and ∪_{n=1}^∞ A_n are all in A, then
µ(∪_{k=1}^∞ A_k) ≤ Σ_{k=1}^∞ µ(A_k);
and this also holds true for a measure on a field or on a π̄-system.

Proof. (a) Let B_1 ≡ A_1, and let B_k ≡ A_k \ A_{k−1} for k ≥ 2. These B_k are disjoint, with ∪_{k=1}^n B_k = A_n and ∪_{k=1}^∞ B_k = ∪_{k=1}^∞ A_k. Thus countable and then finite additivity give
µ(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ µ(B_k) = lim_n Σ_{k=1}^n µ(B_k) = lim_n µ(A_n).
(b) Without loss of generality, redefine A_1 = A_2 = · · · = A_{n_0}, so that µ(A_1) < ∞. Let B_n ≡ A_1 \ A_n, so that B_n ↑ with ∪_{n=1}^∞ B_n = A_1 \ ∩_{n=1}^∞ A_n. Thus, on the one hand, part (a) gives µ(B_n) ↑ µ(A_1) − µ(∩_{n=1}^∞ A_n); while on the other hand, µ(B_n) = µ(A_1) − µ(A_n). Since µ(A_1) < ∞, we may cancel it from both limits.
(c) Let B_1 ≡ A_1, and let B_k ≡ A_k \ ∪_{j=1}^{k−1} A_j for k ≥ 2. Then these newly defined sets B_k are disjoint, and ∪_{k=1}^n B_k = ∪_{k=1}^n A_k for each 1 ≤ n ≤ ∞. Thus µ(∪_{k=1}^n A_k) = Σ_{k=1}^n µ(B_k) ≤ Σ_{k=1}^n µ(A_k). Let n → ∞, and use part (a) to get the result. □
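Part (a), continuity from below, can be sanity-checked numerically for Lebesgue measure; the following is a toy illustration, not a proof.

```python
# A_n = (0, 1 - 1/n] increases to (0, 1), and λ(A_n) = 1 - 1/n ↑ 1.
lam = [1 - 1/n for n in range(1, 10_001)]    # λ(A_n) for n = 1, ..., 10000
assert lam == sorted(lam)                    # µ(A_n) is nondecreasing
assert abs(lam[-1] - 1.0) < 1e-3             # µ(A_n) ↑ µ(∪_n A_n) = λ((0, 1)) = 1
# For (b): A_n = [n, ∞) has λ(A_n) = ∞ for every n, yet ∩_n A_n = ∅;
# hence the need for a finiteness requirement in part (b).
```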

Definition 1.4 (Continuity of measures) A measure µ is continuous from below (above) if µ(lim A_n) = lim µ(A_n) for all A_n ↑ (for all A_n ↓, with at least one µ(A_n) finite). We call µ continuous in case it is continuous both from below and from above.

Proposition 1.3 (Continuity and countable additivity) A finitely additive measure µ on a field (or σ-field) A is countably additive if and only if it is continuous from below; and a finite f.a. measure µ is countably additive if and only if it is continuous from above at ∅.

Proof. For disjoint A_n, continuity from below applied to the partial unions ∪_{k=1}^n A_k, together with finite additivity, gives the required countable additivity. Suppose next that µ is finite and is also continuous from above at ∅. Then f.a. (even if A is only a field) gives, for disjoint A_n with ∪_{n=1}^∞ A_n in A,
µ(∪_{n=1}^∞ A_n) = Σ_{k=1}^n µ(A_k) + µ(∪_{k=n+1}^∞ A_k) → Σ_{k=1}^∞ µ(A_k),
since ∪_{k=n+1}^∞ A_k ↓ ∅. □

Definition 1.5 (liminf and limsup of sets) Let

lim inf A_n ≡ ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = {ω : ω is in all but finitely many A_n's} and
lim sup A_n ≡ ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = {ω : ω is in infinitely many A_n's} ≡ {ω : ω ∈ A_n i.o.},
where we use i.o. to abbreviate infinitely often.
[It is important to learn to read these two mathematical equations in a way that makes it clear that the verbal description is correct.] Note that we always have lim inf A_n ⊂ lim sup A_n. Define
lim A_n ≡ lim inf A_n whenever lim inf A_n = lim sup A_n. (14)
We also write lim A_n (with an underbar) for lim inf A_n and lim A_n (with an overbar) for lim sup A_n, giving us alternative notations.

Definition 1.6 (lim inf and lim sup of numbers) Recall that for real number sequences a_n one defines lim inf a_n and lim sup a_n by
lim inf_{n→∞} a_n ≡ lim_{n→∞} (inf_{k≥n} a_k) = sup_{n≥1} (inf_{k≥n} a_k) and
lim sup_{n→∞} a_n ≡ lim_{n→∞} (sup_{k≥n} a_k) = inf_{n≥1} (sup_{k≥n} a_k).
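These sup-of-infs and inf-of-sups can be approximated numerically for a concrete sequence. The truncations below are an illustrative device, not a proof; the alternating sequence a_n = (−1)^n (1 + 1/n) has lim inf a_n = −1 and lim sup a_n = +1.

```python
a = [(-1)**n * (1 + 1/n) for n in range(1, 2001)]
tail_infs = [min(a[n:]) for n in range(1000)]   # inf_{k >= n} a_k (truncated)
tail_sups = [max(a[n:]) for n in range(1000)]   # sup_{k >= n} a_k (truncated)
liminf = max(tail_infs)                         # sup_n inf_{k >= n} a_k
limsup = min(tail_sups)                         # inf_n sup_{k >= n} a_k
assert abs(liminf + 1) < 0.01 and abs(limsup - 1) < 0.01
assert all(x <= y for x, y in zip(tail_infs, tail_sups))   # inf <= sup always
```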

Definition 1.7 ("Little oh", "big oh", and "at most" ⊕) We write:
a_n = o(b_n) when a_n/b_n → 0;   a_n = O(b_n) when lim sup |a_n/b_n| < ∞;
and c = a ⊕ b when |c − a| ≤ |b| (read: "c equals a, give or take at most b").
This last notation allows us to string inequalities together linearly, instead of having to start a new inequality on a new line. (I use it often.)

Proposition 1.4 Clearly, lim A_n equals ∪_{n=1}^∞ A_n when A_n is an ↑ sequence, and lim A_n equals ∩_{n=1}^∞ A_n when A_n is a ↓ sequence.

Exercise 1.2 We always have µ(lim inf A n)≤ lim inf µ(A n), while the inequality

lim sup µ(A n)≤ µ(lim sup A n ) holds if µ(Ω) < ∞.

Exercise 1.3 (π-systems and λ-systems) A classD of subsets is called a λ-system

if it contains the space Ω and all proper differences (A \ B, for A ⊂ B with both

A, B ∈ D) and if it is closed under monotone increasing limits [Recall that a class

is called a π-system if it is closed under finite intersections, while π̄-systems also contain Ω.]
(a) Show that any σ-field is both a π-system and a λ-system.
(b) Show that a class that is both a π-system and a λ-system is a σ-field.
(c) Let C be a π-system and let D be a λ-system. Then C ⊂ D implies that σ[C] ⊂ D.

Proposition 1.5 (Dynkin’s π-λ theorem) Let µ and µ  be two measures on

the measurable space (Ω, A) Let C be a ¯π-system, where C ⊂ A Then

µ = µ  on the ¯π-system C implies µ = µ  on σ[ C].

(18)

Proof. We first show that

σ[ C] = λ[C] whenC is a π-system.

(19)

LetD ≡ λ[C] By the easy exercise 1.3(a)(b) it suffices to show that D is a π-system

(that is, that A, B ∈ D implies A ∩ B ∈ D) We first go just halfway; let

E ≡ {A ∈ D : AC ∈ D for all C ∈ C}.

(a)

Then C ⊂ E Also, for A, B ∈ E with B ⊂ A and for C ∈ C we have (since both

AC and BC are in D) that (A \ B)C = (AC \ BC) ∈ D, so that A \ B ∈ E Thus

E = D, since D was the smallest such class We have thus learned of D that

AC ∈ D for all C ∈ C, for each A ∈ D.

(b)


To go the rest of the way, we define

F ≡ {D ∈ D : AD ∈ D for all A ∈ D}.

(c)

ThenC ⊂ F, by (b) Also, for A, B ∈ F with B ⊂ A and for D ∈ D we have (since

both AD and BD are in D) that (A \ B)D = (AD \ BD) ∈ D, so that A \ B ∈ F.

ThusF = D, since D was the smallest such class We have thus learned of D that

AD ∈ D for all A ∈ D, for each D ∈ D.

(d)

That is,D is closed under intersections; and thus D is a π-system.

We will now demonstrate thatG ≡ {A ∈ A : µ(A) = µ  (A) } is a λ-system on Ω.

First, Ω∈ G, since Ω is in the ¯π-system C Second, when A ⊂ B are both in G we

have the equality

µ(B \ A) = µ(B) − µ(A) = µ  (B) − µ  (A) = µ  (B \ A),

(e)

giving B \ A ∈ G. Finally, let A_n ↑ A with all A_n's in G. Then proposition 1.2(a)

yields the result

µ(A) = lim µ(A n ) = lim µ  (A n ) = µ  (A),

(f)

so that A ∈ G Thus G is a λ-system.

Thus the set G on which µ = µ  is a λ-system that contains the ¯ π-system C.

Applying (19) shows that σ[ C] ⊂ G 2

The previous proposition is very useful in extending independence from smallclasses of sets to large ones The next proposition is used in proving the Carath´eodoryextension theorem, Fubini’s theorem, and the existence of a regular conditionalprobability distribution

Proposition 1.6 (Minimal monotone class; Halmos) The minimal monotoneclass M ≡ m[C] containing the field C and the minimal σ-field σ[C] generated

by the same fieldC satisfy

m[ C] = σ[C] whenC is a field.

(20)

Proof. Since σ-fields are monotone classes, we have that σ[ C] ⊃ M If we now

show thatM is a field, then proposition 1.1.1(c) will imply that σ[C] ⊂ M.

To show thatM is a field, it suffices to show that

A, B in M implies AB, A c B, AB c are inM.

(a)

Suppose that (a) has been established We will now show that (a) implies thatM

is a field

Complements: Let A ∈ M, and note that Ω ∈ M, since C ⊂ M. Then A, Ω ∈ M implies that A^c = A^c Ω ∈ M by (a).

Unions: Let A, B ∈ M Then A ∪ B = (A c ∩ B c)c ∈ M.

ThusM is indeed a field, provided that (a) is true It thus suffices to prove (a).

For each A ∈ M, let M A ≡ {B ∈ M : AB, A c B, AB c ∈ M} Note that it

suffices to prove that

M A=M for each fixed A ∈ M.

(b)

We first show that

M A is a monotone class

(c)

Let B n be monotone in M A , with limit set B Since B n is monotone in M A, it

is also monotone in M, and thus B ≡ lim n B n ∈ M Since B n ∈ M A, we have

AB n ∈ M, and since AB n is monotone inM, we have AB = lim n AB n ∈ M In

like fashion, A c B and AB c are in M Therefore, B ∈ M A, by definition of M A.That is, (c) holds

We next show that

M A=M for each fixed A ∈ C.

(d)

Let A ∈ C and let C ∈ C Then A ∈ M C, since C is a field But A ∈ M C if and

only if C ∈ M A, by the symmetry of the definition of M A Thus C ⊂ M A That

is, C ⊂ M A ⊂ M, and M A is a monotone class by (c) But M is the minimal

monotone class containingC, by the definition of M Thus (d) holds But in fact,

we shall now strengthen (d) to

M B =M for each fixed B ∈ M.

(e)

The conditions for membership inM imposed on pairs A, B are symmetric Thus

for A ∈ C, the statement established above in (d) that B ∈ M(= M A) is true if and

only if A ∈ M B ThusC ⊂ M B, whereM B is a monotone class ThusM B=M,

since (as was earlier noted)M is the smallest such monotone class Thus (e) (and

hence (a)) is established 2


Definition 2.1 (Outer extension) Let Ω be arbitrary. Let µ be a measure on a field C of subsets of Ω. For each A ∈ 2^Ω define
µ*(A) ≡ inf { Σ_{n=1}^∞ µ(A_n) : A ⊂ ∪_{n=1}^∞ A_n with all A_n ∈ C }. (1)
Any such sequence (A_n) is called a (Carathéodory) covering of A, and µ* is called the outer extension of µ.

Theorem 2.1 (Carath´ eodory extension theorem) A measure µ on a field C

can be extended to a measure on the σ-field σ[ C] generated by C, by defining

µ(A) ≡ µ ∗ (A) for each A in A ≡ σ[C].

(2)

If µ is σ-finite on C, then the extension is unique on A and is also σ-finite.

Proof. The proof proceeds by a series of claims

Claim 1: µ ∗ is an outer measure on (Ω, 2Ω)

Null: Now, µ ∗ ∅) = 0, since ∅, ∅, is a covering of ∅.

Monotone: Let A ⊂ B Then every covering of B is also a covering of A Thus

µ ∗ (A) ≤ µ ∗ (B).

Countably subadditive: Let all A_n ⊂ Ω, and let ε > 0. For each n there is a covering {A_{nk} : k ≥ 1} ⊂ C of A_n such that Σ_{k=1}^∞ µ(A_{nk}) ≤ µ*(A_n) + ε/2^n. Then {A_{nk} : n, k ≥ 1} is a covering of ∪_{n=1}^∞ A_n, so that µ*(∪_{n=1}^∞ A_n) ≤ Σ_n Σ_k µ(A_{nk}) ≤ Σ_n µ*(A_n) + ε. Now let ε → 0.

Claim 2: µ ∗ |C = µ (that is, µ ∗ (C) = µ(C) for all C ∈ C) and C ⊂ A ∗.

Let C ∈ C Then µ ∗ (C) ≤ µ(C), since C, ∅, ∅, is a covering of C For the other

direction, we let A_1, A_2, . . . be any covering of C. Since µ is c.a. on C, and since ∪_{n=1}^∞ (A_n ∩ C) = C ∈ C, we have from proposition 1.1.2(c) that
µ(C) = µ(∪_{n=1}^∞ (A_n ∩ C)) ≤ Σ_{n=1}^∞ µ(A_n ∩ C) ≤ Σ_{n=1}^∞ µ(A_n),
and thus µ(C) ≤ µ*(C). Thus µ(C) = µ*(C). We next show that any C ∈ C is

also in A*. Let C ∈ C, let T ⊂ Ω be an arbitrary test set, and let ε > 0. Choose a covering {A_n}_{n=1}^∞ ⊂ C of T such that Σ_{n=1}^∞ µ(A_n) ≤ µ*(T) + ε. Then
µ*(T) + ε ≥ Σ_{n=1}^∞ µ(A_n)   since µ*(T) is an infimum (a)
= Σ_{n=1}^∞ µ(CA_n) + Σ_{n=1}^∞ µ(C^c A_n)   since µ is c.a. on C with C and A_n in C
≥ µ*(CT) + µ*(C^c T)   since the CA_n cover CT and the C^c A_n cover C^c T. (b)
Letting ε → 0 gives µ*(T) ≥ µ*(CT) + µ*(C^c T), so that C ∈ A*. Thus C ⊂ A*.

Claim 3: The classA ∗ of µ ∗-measurable subsets of Ω is a field that containsC.

Now, A ∈ A ∗ implies that A c ∈ A ∗ : The definition of µ ∗-measurable is symmetric

in A and A^c. And A, B ∈ A* implies that AB ∈ A*: For any test set T ⊂ Ω we have the required inequality
µ*(T) = µ*(TA) + µ*(TA^c) = µ*(TAB) + µ*(TAB^c) + µ*(TA^c) ≥ µ*(TAB) + µ*(T(AB)^c),
since µ* is subadditive and TAB^c ∪ TA^c = T(AB)^c. (c)

Claim 4: µ ∗ is a f.a measure onA ∗.

Let A, B ∈ A ∗ be disjoint Finite additivity follows from

µ ∗ (A + B) = µ ∗ ((A + B)A) + µ ∗ ((A + B)A c)

since A ∈ A ∗ with test set A + B

= µ ∗ (A) + µ ∗ (B).

(d)

Trivially, µ*(A) ≥ 0 for all sets A. And µ*(∅) = 0, since ∅, ∅, . . . is a covering of ∅.

Claim 5: A* is a σ-field, and it contains σ[C].
We will show that A ≡ ∪_{n=1}^∞ A_n ∈ A* whenever all A_n ∈ A*. Since A* is a field, we may assume the A_n are disjoint. Measurability of each A_k and finite additivity of µ* on A* give, for any test set T and each n,
µ*(T) = Σ_{k=1}^n µ*(TA_k) + µ*(T(∪_{k=1}^n A_k)^c) ≥ Σ_{k=1}^n µ*(TA_k) + µ*(TA^c).
Letting n → ∞ and appealing to the countable subadditivity of µ* gives µ*(T) ≥ µ*(TA) + µ*(TA^c), so that A ∈ A*. Thus A* is a σ-field containing C, and hence σ[C] ⊂ A*.
Claim 6: µ* restricted to A* is countably additive, so that (Ω, A*, µ*|A*) is a measure space in its own right.

Claim 7: Uniqueness holds when µ is a finite measure.

Let µ1and µ2denote any two extensions of µ Let M ≡ {A ∈ σ[C] : µ1(A) = µ2(A) }

denote the class where they are equal We will first show that

M is a monotone class.

(h)

Let A_n be monotone in M. Then
µ_1(lim A_n) = lim µ_1(A_n) = lim µ_2(A_n) = µ_2(lim A_n)   by propositions 1.1.4 and 1.1.2,
so that lim A_n ∈ M.
Thus (h) holds. Since C ⊂ M, the minimal monotone class result of proposition 1.1.6

implies that σ[ C] ⊂ M Thus µ1= µ2on σ[ C] (and possibly on even more sets than

this) Thus the claimed uniqueness holds [Appeal to proposition 1.1.6 could be

replaced by appeal to Dynkin’s π-λ theorem of proposition 1.1.5.]

Claim 8: Uniqueness holds when µ is a σ-finite measure (label the sets of the

measurable partition as Ωn)

We must again demonstrate the uniqueness Fix n We will consider µ, µ1, µ2onC,

on σ[C] ∩ Ω_n, and on σ[C ∩ Ω_n]. We first note that σ[C] ∩ Ω_n = σ[C ∩ Ω_n], each side being the σ-field of subsets of Ω_n generated by the traces C ∩ Ω_n. Since µ(Ω_n) < ∞, Claim 7 then gives µ_1 = µ_2 on each σ[C] ∩ Ω_n. Finally, for any A ∈ σ[C], countable additivity gives µ_1(A) = Σ_n µ_1(A ∩ Ω_n) = Σ_n µ_2(A ∩ Ω_n) = µ_2(A), completing the proof. □

Question We extended our measure µ from the field C to a collection A* that is at least as big as the σ-field σ[C]. Have we actually gone beyond σ[C]? Can we go

further?


Definition 2.2 (Complete measures) Let (Ω, A, µ) denote a measure space.

If µ(A) = 0, then A is called a null set We call (Ω, A, µ) complete if whenever we

have B ⊂ (some A) ∈ A with µ(A) = 0, we necessarily also have B ∈ A [That is,

all subsets of sets of measure 0 are required to be measurable.]

Exercise 2.1 (Completion) Let (Ω, A, µ) denote a measure space. Define
ˆA_µ ≡ {A ∪ N : A ∈ A and N ⊂ (some B) ∈ A having µ(B) = 0}, with ˆµ(A ∪ N) ≡ µ(A) (7)
for all A ∈ A and for all N ⊂ (some B) ∈ A having µ(B) = 0. Show that (Ω, ˆA_µ, ˆµ) is a complete measure space for which ˆµ|A = µ. [Note: A proof must include a demonstration that definition (7) leads to a well-defined ˆµ. That is, whenever A_1 ∪ N_1 = A_2 ∪ N_2 we must have µ(A_1) = µ(A_2), so that ˆµ(A_1 ∪ N_1) = ˆµ(A_2 ∪ N_2).]

Definition 2.3 (Lebesgue sets) The completion of Lebesgue measure on (R, B, λ) is still called Lebesgue measure. The resulting completed σ-field ˆB_λ of the Borel sets B is called the Lebesgue sets.

Corollary 1 When we complete a measure µ on a σ-field A, this completed

measure ˆµ is the unique extension of µ to ˆ A µ [It is typical to denote the extension

by µ also (rather than ˆ µ).]

Corollary 2 Thus when we begin with a σ-finite measure µ on a field C, both

the extension to A ≡ σ[C] and the further extension to ˆ A µ ≡ ˆσ[C] µ are unique.Here, we note that all sets in ˆA µ = ˆσ[ C] µare in the classA ∗ of µ ∗-measurable sets.

Proof. Consider corollary 1 first Let ν denote any extension to ˆ A µ We willdemonstrate that

ν(A ∪ N) = µ(A) for all A ∈ A, and all null sets N

(a)

(that is, ν = ˆ µ) Assume not Then there exist sets A ∈ A and N ⊂ (some B) in A

with µ(B) = 0 such that ν(A ∪ N) > µ(A) [necessarily, ν(A ∪ N) ≥ ν(A) = µ(A)].

For this A and N we have

µ(A) < ν(A ∪ N) = ν(A ∪ (A^c N))   where A^c N ⊂ A^c B = (a null set)
= ν(A) + ν(A^c N) ≤ ν(A) + ν(B)   since ν is a measure on the completion (b)
= µ(A) + µ(B)   since ν is an extension of µ. (c)

Hence µ(B) > 0, which is a contradiction Thus the extension is unique.


We now turn to corollary 2 Only the final claim needs demonstrating Suppose

A is in ˆ σ[ C] µ Then A = A  ∪ N for some A  ∈ A and some N satisfying N ⊂ B

with µ(B) = 0 Since A ∗ is a σ-field, it suffices to show that any such N is in A ∗.

Since µ ∗ is subadditive and monotone, we have

µ ∗ (T ) ≤ µ ∗ (T N ) + µ ∗ (T N c ) = µ ∗ (T N c)≤ µ ∗ (T ),

(d)

because µ*(TN) = 0 follows from using B, ∅, ∅, . . . to cover TN. Thus equality holds

in this last equation, showing that N is µ ∗-measurable 2

Exercise 2.2 Let µ and ν be finite measures on (Ω, A).

(a) Show by example that ˆA µ and ˆA ν need not be equal

(b) Prove or disprove: ˆA µ= ˆA ν if and only if µ and ν have exactly the same sets

of measure zero

(c) Give an example of an LS-measure µ on R for which ˆ B µ= 2R

Exercise 2.3 (Approximation lemma; Halmos) Let the σ-finite measure µ

on the field C be extended to A = σ[C], and also refer to the extension as µ Then

for each A ∈ A (or in ˆA_µ) such that µ(A) < ∞ and for each ε > 0 we have
µ(A △ C) < ε for some set C ∈ C. (8)

[Hint Truncate the sum in (1.2.1) to define C, when A ∈ A.]

Definition 2.4 (Regular measures on metric spaces) Let d denote a metric on Ω,

let A denote the Borel sets, and let µ be a measure on (Ω, A). Suppose that for each set A in ˆA_µ and each ε > 0 there exist an open set O_ε and a closed set C_ε for which both C_ε ⊂ A ⊂ O_ε and µ(O_ε \ C_ε) < ε. When µ(A) < ∞, one then requires that the set C_ε be compact. Then µ is called a regular measure.

[Note exercise 1.3.1 below.]

Exercise 2.4 (Nonmeasurable sets) Let Ω consist of the sixteen values 1, , 16.

(Think of them arranged in four rows of four values.) Let

C1={1, 2, 3, 4, 5, 6, 7, 8}, C2={9, 10, 11, 12, 13, 14, 15, 16},

C3={1, 2, 5, 6, 9, 10, 13, 14}, C4={3, 4, 7, 8, 11, 12, 15, 16}.

LetC = {C1, C2, C3, C4}, and let A = σ[C].

(a) Show that A ≡ σ[C] ≠ 2^Ω.

(d) Illustrate proposition 2.2 below in the context of this exercise
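Part (a) can be brute-forced, since with Ω finite the generated σ-field is exactly the collection of all unions of the atoms determined by the generators. The atom construction below is a standard device, not spelled out in the exercise itself.

```python
from itertools import combinations

Omega = frozenset(range(1, 17))
C1 = frozenset(range(1, 9));                  C2 = Omega - C1
C3 = frozenset({1, 2, 5, 6, 9, 10, 13, 14}); C4 = Omega - C3

# The atom containing ω is the intersection of each generator or its complement.
atoms = set()
for w in Omega:
    atom = Omega
    for C in (C1, C2, C3, C4):
        atom &= C if w in C else Omega - C
    atoms.add(atom)

# σ[C] consists of all unions of atoms: 2^4 = 16 sets, far fewer than 2^16 = |2^Ω|.
sigma = {frozenset().union(*combo) for r in range(len(atoms) + 1)
         for combo in combinations(atoms, r)}
assert len(atoms) == 4 and len(sigma) == 16
```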

Proposition 2.1 (Not all sets are measurable) Let µ be a measure on A ≡ σ[C],

with C a field. If B ∉ ˆA_µ, then there are infinitely many measures on σ[ˆA_µ ∪ {B}]

that agree with µ on C [Thus the σ-field ˆ A µ is as far as we can go with the uniqueextension process.] (We merely state this observation for reference, without proof.)

[To exhibit a subset of R not in B requires the axiom of choice.]


Proposition 2.2 (Not all subsets are Lebesgue sets) There is a subset D of R

that is not in ˆB λ

Proof. Define the equivalence relation∼ on elements of [0, 1) by x ∼ y if x−y is a

rational number Use the axiom of choice to specify a set D that contains exactly one element from each equivalence class Now define D z ≡ {z + x (modulo 1) : x ∈ D}

for each rational z in [0, 1), so that [0, 1) = Σ_z D_z represents [0, 1) as a countable union of disjoint sets. Moreover, all D_z must have the same outer measure; call it a. Assume D = D_0 is measurable. Then each D_z is measurable, and 1 = λ([0, 1)) = Σ_z λ(D_z) = Σ_z a, which is impossible whether a = 0 or a > 0. Thus D is not measurable. □

Proposition 2.3 (Not all Lebesgue sets are Borel sets) There necessarily exists

a set A ∈ ˆ B λ \ B that is a Lebesgue set but not a Borel set.

Proof. This proof follows exercise 7.3.3 below 2

Exercise 2.5 Every subset A of Ω having µ ∗ (A) = 0 is a µ ∗-measurable set

Coverings

Earlier in this section we encountered Carath´eodory coverings

Exercise 2.6 (Vitali cover) (a) We say that a family V of intervals I is a Vitali cover of a set D if for each x ∈ D and each ε > 0 there is an interval I in V for which x ∈ I and 0 < λ(I) < ε.

(b) (Vitali covering theorem) Let D ⊂ R have λ*(D) < ∞, and let ε > 0. Let V be a Vitali cover of D. Then there exists a finite number of pairwise disjoint intervals (I_1, . . . , I_m) in V for which Lebesgue outer measure λ* satisfies
λ*(D \ ∪_{j=1}^m I_j) < ε. (9)

[Lebesgue measure λ will be formally shown to exist in the next section, and λ ∗

will be discussed more fully.] [Result (9) will be useful in establishing the Lebesgue

result that increasing functions on R necessarily have a derivative, except perhaps

on a set having Lebesgue measure zero.]

Exercise 2.7 (Heine–Borel) If {U t : t ∈ T } is an arbitrary collection of open

sets that covers a compact subset D of R, then there exists a finite number of them U1, , U m that also covers D [We are merely stating this well-known and

frequently used result in the disguise of an exercise so that the reader can easilycontrast it with the two other ideas of Carath´eodory covering and Vitali coveringintroduced in this chapter.]


At the moment we know only a few measures informally We now construct thelarge class of measures that lies at the heart of probability theory

Definition 3.1 (Lebesgue–Stieltjes measure) A measure µ on the real line

R assigning finite values to finite intervals is called a Lebesgue–Stieltjes measure.

[The measure µ on (R, 2 R ) whose value µ(A) for any set A equals the number of rationals in A is not a Lebesgue–Stieltjes measure.]

Definition 3.2 (gdf) A finite ↑ (that is, nondecreasing) function F on R that is right-continuous is called a generalized df (to be abbreviated gdf). Then F_−(·) ≡ lim_{y↑·} F(y) denotes the left-continuous version of F. The mass function of F is defined by
∆F(·) ≡ F(·) − F_−(·), while F(a, b] ≡ F(b) − F(a) for all a < b

is called the increment function of F We identify gdfs having the same increment function Only one member F of each equivalence class obtained by such identifi- cation satisfies F − (0) = 0, and this F can (and occasionally will) be used as the

representative member of the class (also to be called the representative gdf).

Example 3.1 We earlier defined three measures on (R, B) informally.

(a) For Lebesgue measure λ, a gdf is the identity function F (x) = x.

(b) For counting measure, a gdf is the greatest integer function F(x) = [x].
(c) For unit point mass at x_0, a gdf is F(x) = 1_{[x_0,∞)}(x). □

Theorem 3.1 (Correspondence theorem; Lo` eve) The relationship

µ((a, b]) = F (a, b] for all − ∞ ≤ a < b ≤ +∞

(1)

establishes a 1-to-1 correspondence between Lebesgue–Stieltjes measures µ on B

and the representative members of the equivalence classes of generalized dfs [Each

such µ extends uniquely to ˆ B µ.]
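The correspondence (1) can be sketched in code for the three gdfs of Example 3.1. This is a toy rendering that handles intervals only, since that is all (1) addresses directly.

```python
import math

def mu(F, a, b):                 # µ((a, b]) = F(b) - F(a), as in (1)
    return F(b) - F(a)

identity = lambda x: x                          # gdf of Lebesgue measure
greatest_int = lambda x: math.floor(x)          # gdf of counting measure
point_mass = lambda x: 1.0 if x >= 0 else 0.0   # gdf of unit mass at x0 = 0

assert mu(identity, 2, 5) == 3           # length of (2, 5]
assert mu(greatest_int, 0.5, 3.5) == 3   # (0.5, 3.5] contains the integers 1, 2, 3
assert mu(point_mass, -1, 1) == 1        # 0 ∈ (-1, 1]
```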

Notation 3.1 We formally establish some notation that will be used throughout.Important classes of sets include:

C_I ≡ {all intervals (a, b], (−∞, b], or (a, +∞) : −∞ < a < b < +∞},
C_F ≡ {all finite disjoint unions of intervals in C_I}, a field, and B ≡ σ[C_F], the Borel sets.

Proof. Given an LS-measure µ, define the increment function F(a, b] via (1). We clearly have 0 ≤ F(a, b] < ∞ for all finite a, b, and F(a, b] → 0 as b ↓ a, by proposition 1.1.2. Now specify F_−(0) ≡ 0, F(0) ≡ µ({0}), F(b) ≡ F(0) + F(0, b] for b > 0, and F(a) ≡ F(0) − F(a, 0] for a < 0. This F(·) is the representative gdf.

Given a representative gdf, we define µ on the collection I of all finite intervals

(a, b] via (1) We will now show that µ is a well-defined and c.a measure on this

collectionI.

Nonnegative: µ ≥ 0 for any (a, b], since F is ↑.

Null: µ( ∅) = 0, since ∅ = (a, a] and F (a, a] = 0.

Countably additive and well-defined: Suppose I ≡ (a, b] = Σ_{n=1}^∞ I_n ≡ Σ_{n=1}^∞ (a_n, b_n]. We must show that µ(Σ_{n=1}^∞ I_n) = Σ_{n=1}^∞ µ(I_n).
First, we will show that Σ_{n=1}^∞ µ(I_n) ≤ µ(I). Fix n. Then the disjoint subintervals I_1, . . . , I_n of I satisfy (since F is ↑)
Σ_{k=1}^n µ(I_k) ≤ F(b) − F(a) = µ(I). (a)
Letting n → ∞ in (a) gives the first claim.

Next, we will show that µ(I) ≤ Σ_{n=1}^∞ µ(I_n). Suppose −∞ < a < b < ∞ (the case a = b is trivial, as µ(∅) = 0, and infinite intervals can then be handled by taking limits). Fix ε > 0, and use the right continuity of F to choose ε' > 0 so small that F(a, a + ε'] < ε. For each n ≥ 1, use the right continuity of F to choose an ε_n > 0 so small that
F(b_n, b_n + ε_n] < ε/2^n, and define J_n ≡ (a_n, c_n) ≡ (a_n, b_n + ε_n). (b)
These J_n are open intervals covering the compact interval [a + ε', b], so by the Heine–Borel theorem finitely many of them cover it. Going through these intervals one at a time (and relabeling), choose (a_1, c_1) to contain b, choose (a_2, c_2) to contain a_1, choose (a_3, c_3) to contain a_2, . . . ; finally (for some K), choose (a_K, c_K) to contain a + ε'. Then
µ(I) ≤ ε + F(a + ε', b] ≤ ε + Σ_{k=1}^K F(a_k, c_k] ≤ ε + Σ_{n=1}^∞ [µ(I_n) + ε/2^n] = 2ε + Σ_{n=1}^∞ µ(I_n),
and letting ε → 0 gives the second claim.

If A = Σ_n I_n ∈ C_F with each I_n of type (a, b], then we define µ(A) ≡ Σ_n µ(I_n). If we also have another representation A = Σ_m I'_m of this set, then we must show (where the subscripts m and n could take on either a finite or a countably infinite number of values) that Σ_n µ(I_n) = Σ_m µ(I'_m). But I_n = Σ_m (I_n ∩ I'_m) and I'_m = Σ_n (I_n ∩ I'_m), so the countable additivity just established gives
Σ_n µ(I_n) = Σ_n Σ_m µ(I_n ∩ I'_m) = Σ_m Σ_n µ(I_n ∩ I'_m) = Σ_m µ(I'_m) = µ(A). (g)

Finally, a measure µ on C_F determines a unique measure on B, as is guaranteed by the Carathéodory extension theorem (theorem 1.2.1). □


Exercise 3.1 Show that all Lebesgue–Stieltjes measures on (R, B) are regular

measures

Probability Measures, Probability Spaces, and DFs

Definition 3.3 (Probability distributions P ( ·) and dfs F (·))

(a) In probability theory we think of Ω as the set of all possible outcomes of some

experiment, and we refer to it as the sample space The individual points ω in Ω are referred to as the elementary outcomes The measurable subsets A in the collection

A are referred to as events A measure of interest is now denoted by P ; it is called a probability measure, and must satisfy P (Ω) = 1 We refer to P (A) as the probability

of A, for each event A in ˆ A P The triple (Ω, A, P ) (or (Ω, ˆ A P , ˆ P ), if this is different)

is referred to as a probability space.

(b) An ↑ right-continuous function F on R having F(−∞) ≡ lim_{x→−∞} F(x) = 0

and F (+ ∞) ≡ lim x→+∞ F (x) = 1 is called a distribution function (which we will

abbreviate as df) [For probability measures, setting F ( −∞) = 0 is used to specify

the representative df.]

Corollary 1 (The correspondence theorem for dfs) Defining P ( ·) on all intervals

(a, b] via P ((a, b] ) ≡ F (b) − F (a) for all −∞ ≤ a < b ≤ +∞ establishes a 1-to-1

correspondence between all probability distributions P ( ·) on (R, B) and all dfs F (·)

on R.

Exercise 3.2 Prove the corollary
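The df correspondence of the corollary can be sketched with a concrete df; the exponential df below is an illustrative choice, not part of the corollary.

```python
import math

# F(x) = 1 - e^{-x} for x >= 0 (and 0 for x < 0) is a df: it is nondecreasing,
# right-continuous, with F(-∞) = 0 and F(+∞) = 1.
F = lambda x: 1 - math.exp(-x) if x >= 0 else 0.0

def P(a, b):                     # P((a, b]) = F(b) - F(a)
    return F(b) - F(a)

assert abs(P(-10, 50) - 1) < 1e-12                 # nearly all mass lies in (-10, 50]
assert math.isclose(P(0, 1) + P(1, 2), P(0, 2))    # additivity on adjacent intervals
assert F(-5) == 0.0                                # F(-∞) = 0 end behavior
```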


Measurable Functions and

Convergence

Notation 1.1 (Inverse images) Suppose X denotes a function mapping some

set Ω into the extended real line ¯R ≡ R ∪ {±∞}; we denote this by X : Ω → ¯ R.

Let X+and X − denote the positive part and the negative part of X, respectively:

X^+(ω) ≡ X(ω) if X(ω) ≥ 0, and ≡ 0 else;   X^−(ω) ≡ −X(ω) if X(ω) ≤ 0, and ≡ 0 else. (1)

We also use the following notation:

[ X = r ] ≡ X −1 (r) ≡ { ω : X(ω) = r } for all real r,
