Probability for Statisticians
Galen R. Shorack
Springer
There is a thin self-contained textbook within this larger presentation.
To be sure that this is well understood, I will describe later how I have used this text in the classroom at the University of Washington in Seattle.
Let me first indicate what is different about this book. As implied by the title, there is a difference. Not all the difference is based on inclusion of statistical material. (To begin, Chapters 1–6 provide the mathematical foundation for the rest of the text. Then Chapters 7–8 hone some tools geared to probability theory, while Chapter 9 provides a brief introduction to elementary probability theory right before the main emphasis of the presentation begins.)
The classical weak law of large numbers (WLLN) and strong law of large numbers (SLLN) as presented in Sections 10.2–10.4 are particularly complete, and they also emphasize the important role played by the behavior of the maximal summand. Presentation of good inequalities is emphasized in the entire text, and this chapter is a good example. Also, there is an (optional) extension of the WLLN in Section 10.6 that focuses on the behavior of the sample variance, even in very general situations.
Both the classical central limit theorem (CLT) and its Lindeberg and Liapunov generalizations are presented in two different chapters. They are first presented in Chapter 11 via Stein’s method (with a new twist), and they are again presented in Chapter 14 using the characteristic function (chf) methods introduced in Chapter 13. The CLT proofs given in Chapter 11 are highly efficient. Conditions for both the weak bootstrap and the strong bootstrap are developed in Chapter 11, as is a universal bootstrap CLT based on light trimming of the sample. The approach emphasizes a statistical perspective. Much of Section 11.1 and most of Sections 11.2–11.5 are quite unusual. I particularly like this chapter. Stein’s method is also used in the treatment of U-statistics and Hoeffding’s combinatorial CLT (which applies to sampling from a finite population) in the optional Chapter 17. Also, the chf proofs in Section 14.2 have a slightly unusual starting point, and the approach to gamma approximations in the CLT in Section 14.4 is new.
Both distribution functions (dfs F(·)) and quantile functions (qfs K(·) ≡ F⁻¹(·)) are emphasized throughout (quantile functions are important to statisticians). In Chapter 7 much general information about both dfs and qfs and the Winsorized variance is developed. The text includes presentations showing how to exploit the inverse transformation X ≡ K(ξ) with ξ ∼= Uniform(0, 1). In particular, Chapter 7 inequalities relating the qf and the Winsorized variance to some empirical process results of Chapter 12 are used in Chapter 16 to treat trimmed means and L-statistics, rank and permutation tests, sampling from finite populations, and bootstrapping. (Though I am very fond of Sections 7.6–7.11, their prominence is minimized in the subsequent parts of the text.)
At various points in the text choices can be voluntarily made that will offer the opportunity for a statistical example or foray. (Even if the instructor does not exercise a particular choice, a student can do so individually.) After the elementary introduction to probability theory in Section 9.1, many of the classical distributions
of statistics are introduced in Section 9.2, while useful linear algebra and the multivariate normal distribution are the subjects of Section 9.3 and Section 9.4. Following the CLT via Stein’s method in Section 11.1, extensions in Sections 11.2–11.3, and application of these CLTs to the bootstrap in Sections 11.4–11.5, there is a large collection of statistical examples in Section 11.6. During presentation of the CLT via chfs in Chapter 14, statistical examples appear in Sections 14.1, 14.2, and 14.4. Statistical applications based on the empirical df appear in Sections 12.10 and 12.12. The highly statistical optional Chapters 16 and 17 were discussed briefly above. Also, the conditional probability Sections 8.5 and 8.6 emphasize statistics. Maximum likelihood ideas are presented in Section A.2 of Appendix A. Many useful statistical distributions contain parameters as an argument of the gamma function. For this reason, the gamma and digamma functions are first developed in Section A.1. Section A.3 develops cumulants, Fisher information, and other useful facts for a number of these distributions. Maximum likelihood proofs are in Section A.4.
It is my hope that even those well versed in probability theory will find some new things of interest.
I have learned much through my association with David Mason, and I would like to acknowledge that here. Especially (in the context of this text), Theorem 12.4.3 is a beautiful improvement on Theorem 12.4.2, in that it still has the potential for necessary and sufficient results. I really admire the work of Mason and his colleagues. It was while working with David that some of my present interests developed. In particular, a useful companion to Theorem 12.4.3 is knowledge of quantile functions. Sections 7.6–7.11 present what I have compiled and produced on that topic while working on various applications, partially with David.
Jon Wellner has taught from several versions of this text. In particular, he typed an earlier version and thus gave me a major critical boost. That head start is what turned my thoughts to writing a text for publication. Sections 8.6, 19.2, and the Hoffmann–Jørgensen inequalities came from him. He has also formulated a number of exercises, suggested various improvements, offered good suggestions and references regarding predictable processes, and pointed out some difficulties. My thanks to Jon for all of these contributions. (Obviously, whatever problems may remain lie with me.)
My thanks go to John Kimmel for his interest in this text, and for his help and guidance through the various steps and decisions. Thanks also to Lesley Poliner, David Kramer, and the rest at Springer-Verlag. It was a very pleasant experience.
This is intended as a textbook, not as a research manuscript. Accordingly, the main body is lightly referenced. There is a section at the end that contains some discussion of the literature.
Use of this Text xiii
Chapter 1 Measures
1 Basic Properties of Measures 1
2 Construction and Extension of Measures 12
3 Lebesgue–Stieltjes Measures 18
Chapter 2 Measurable Functions and Convergence
1 Mappings and σ-Fields 21
2 Measurable Functions 24
3 Convergence 29
4 Probability, RVs, and Convergence in Law 33
5 Discussion of Sub σ-Fields 35
Chapter 3 Integration
1 The Lebesgue Integral 37
2 Fundamental Properties of Integrals 40
3 Evaluating and Differentiating Integrals 44
4 Inequalities 46
5 Modes of Convergence 51
Chapter 4 Derivatives via Signed Measures
1 Decomposition of Signed Measures 61
2 The Radon–Nikodym Theorem 66
3 Lebesgue’s Theorem 70
4 The Fundamental Theorem of Calculus 74
Chapter 5 Measures and Processes on Products
1 Finite-Dimensional Product Spaces 79
2 Random Vectors on (Ω, A, P ) 84
3 Countably Infinite Product Probability Spaces 86
4 Random Elements and Processes on (Ω, A, P ) 90
Chapter 6 General Topology and Hilbert Space
1 General Topology 95
2 Metric Spaces 101
3 Hilbert Space 104
Chapter 7 Distribution and Quantile Functions
1 Character of Distribution Functions 107
2 Properties of Distribution Functions 110
3 The Quantile Transformation 111
4 Integration by Parts Applied to Moments 115
5 Important Statistical Quantities 119
6 Infinite Variance 123
7 Slowly Varying Partial Variance 127
8 Specific Tail Relationships 134
9 Regularly Varying Functions 137
10 Some Winsorized Variance Comparisons 140
11 Inequalities for Winsorized Quantile Functions 147
Chapter 8 Independence and Conditional Distributions
1 Independence 151
2 The Tail σ-Field 155
3 Uncorrelated Random Variables 157
4 Basic Properties of Conditional Expectation 158
5 Regular Conditional Probability 168
6 Conditional Expectations as Projections 174
Chapter 9 Special Distributions
1 Elementary Probability 179
2 Distribution Theory for Statistics 187
3 Linear Algebra Applications 191
4 The Multivariate Normal Distribution 199
Chapter 10 WLLN, SLLN, LIL, and Series
0 Introduction 203
1 Borel–Cantelli and Kronecker Lemmas 204
2 Truncation, WLLN, and Review of Inequalities 206
3 Maximal Inequalities and Symmetrization 210
4 The Classical Laws of Large Numbers, LLNs 215
5 Applications of the Laws of Large Numbers 223
6 General Moment Estimation 226
7 Law of the Iterated Logarithm 235
8 Strong Markov Property for Sums of IID RVs 239
9 Convergence of Series of Independent RVs 241
10 Martingales 246
11 Maximal Inequalities, Some with Boundaries 247
12 A Uniform SLLN 252
Chapter 11 Convergence in Distribution
1 Stein’s Method for CLTs 255
2 Winsorization and Truncation 264
3 Identically Distributed RVs 269
4 Bootstrapping 274
5 Bootstrapping with Slowly Increasing Trimming 276
6 Examples of Limiting Distributions 279
7 Classical Convergence in Distribution 288
8 Limit Determining Classes of Functions 292
Chapter 12 Brownian Motion and Empirical Processes
1 Special Spaces 295
2 Existence of Processes on (C, C) and (D, D) 298
3 Brownian Motion and Brownian Bridge 302
4 Stopping Times 305
5 Strong Markov Property 308
6 Embedding a RV in Brownian Motion 311
7 Barrier Crossing Probabilities 314
8 Embedding the Partial Sum Process 318
9 Other Properties of Brownian Motion 323
10 Various Empirical Processes 325
11 Inequalities for the Various Empirical Processes 333
12 Applications 338
Chapter 13 Characteristic Functions
1 Basic Results, with Derivation of Common Chfs 341
2 Uniqueness and Inversion 346
3 The Continuity Theorem 350
4 Elementary Complex and Fourier Analysis 352
5 Esseen’s Lemma 358
6 Distributions on Grids 361
7 Conditions for φ to Be a Characteristic Function 363
Chapter 14 CLTs via Characteristic Functions
0 Introduction 365
1 Basic Limit Theorems 366
2 Variations on the Classical CLT 371
3 Local Limit Theorems 380
4 Gamma Approximation 383
5 Edgeworth Expansions 390
6 Approximating the Distribution of h(X̄n) 396
Chapter 15 Infinitely Divisible and Stable Distributions
1 Infinitely Divisible Distributions 399
2 Stable Distributions 407
3 Characterizing Stable Laws 410
4 The Domain of Attraction of a Stable Law 412
Chapter 16 Asymptotics via Empirical Processes
0 Introduction 415
1 Trimmed and Winsorized Means 416
2 Linear Rank Statistics and Finite Sampling 426
Chapter 18 Martingales
1 Basic Technicalities for Martingales 467
2 Simple Optional Sampling Theorem 472
3 The Submartingale Convergence Theorem 473
4 Applications of the S-mg Convergence Theorem 481
5 Decomposition of a Submartingale Sequence 487
6 Optional Sampling 492
7 Applications of Optional Sampling 499
8 Introduction to Counting Process Martingales 501
9 Doob–Meyer Submartingale Decomposition 511
10 Predictable Processes and ∫ H dM Martingales 516
11 The Basic Censored Data Martingale 522
12 CLTs for Dependent RVs 529
Chapter 19 Convergence in Law on Metric Spaces
1 Convergence in Distribution on Metric Spaces 531
2 Metrics for Convergence in Distribution 540
Appendix A Distribution Summaries
1 The Gamma and Digamma Functions 546
2 Maximum Likelihood Estimators and Moments 551
3 Examples of Statistical Models 555
4 Asymptotics of Maximum Likelihood Estimation 563
Use of this Text
The University of Washington is on the quarter system, so my description will reflect that. My thoughts are offered as a potential help, not as an essential recipe.
The reader will note that the problems are interspersed with the text. It is important to read them as they are encountered.
Chapters 1–5 provide the measure-theoretic background that is necessary for the rest of the text. Many of our students have had at least some kind of an undergraduate exposure to part of this subject. Still, it is important that I present the key parts of this material rather carefully. I feel it is useful for all of them.
Chapter 1 (measures; 5 lectures)
Emphasized in my presentation are generators, the monotone property of measures, the Carathéodory extension theorem, completions, the approximation lemma, and the correspondence theorem. Presenting the correspondence theorem carefully is important, as this allows one the luxury of merely highlighting some proofs in Chapter 5. [The minimal monotone class theorem of Section 1.1, claim 8 of the Carathéodory extension theorem proof, and most of what follows the approximation lemma in Section 1.2 would never be presented in my lectures.] {I always assign Exercises 1.1.1 (generators), 1.2.1 (completions), and 1.2.3 (the approximation lemma). Other exercises are assigned, but they vary each time.}
Chapter 2 (measurable functions and convergence; 4 lectures)
I present most of Sections 2.1, 2.2, and 2.3. Highlights are preservation of σ-fields, measurability of both common functions and limits of simple functions, induced measures, convergence and divergence sets (especially), and relating →µ to →a.s. (especially, reducing the first to the second by going to subsequences). I then assign Section 2.4 as outside reading and Section 2.5 for exploring. [I never lecture on either Section 2.4 or 2.5.] {I always assign Exercises 2.2.1 (specific σ-fields), 2.3.1 (concerning →a.e.), 2.3.3 (a substantial proof), and 2.4.1 (Slutsky’s theorem).}
Chapter 3 (integration; 7 lectures)
This is an important chapter. I present all of Sections 3.1 and 3.2 carefully, but Section 3.3 is left as reading, and some of the Section 3.4 inequalities (Cr, Hölder, Liapunov, Markov, and Jensen) are done carefully. I do Section 3.5 carefully as far as Vitali’s theorem, and then assign the rest as outside reading. {I always assign Exercises 3.2.1–3.2.2 (only the zero function), 3.3.3 (differentiating under the integral sign), 3.5.1 (substantial theory), and 3.5.7 (the Scheffé theorem).}
Chapter 4 (Radon–Nikodym; 2 lectures)
I present ideas from Section 4.1, sketch the Jordan–Hahn decomposition proof, and then give the proofs of the Lebesgue decomposition, the Radon–Nikodym theorem, and the change of variable theorem. These final two topics are highlighted. The fundamental theorem of calculus of Section 4.4 is briefly discussed. [I would never present any of Section 4.3.] {I always assign Exercises 4.2.1 (manipulating Radon–Nikodym derivatives), 4.2.7 (mathematically substantial), and 4.4.1, 4.4.2, and 4.4.4 (so that the students must do some outside reading in Section 4.4 on their own).}
Chapter 5 (Fubini; 2 lectures)
The first lecture covers Sections 5.1 and 5.2. Proving Proposition 5.2.1 is a must, and I discuss/prove Theorems 5.1.2 (product measure) and 5.1.3 (Fubini). The remaining time is spent on Section 5.3. [I rarely lecture from Section 5.4, but I do assign it as outside reading.] {I always assign Exercises 5.3.1 (measurability in a countable number of dimensions) and 5.4.1 (the finite-dimensional field).}
Chapter 6 (topology and Hilbert space; 0 lectures)
[This chapter is presented only for reference I do not lecture from it.]
The mathematical tools have now been developed. In the next three chapters we learn about some specialized probabilistic tools and then get a brief review of elementary probability. The presentation on the classic topics of probability theory then commences in Chapter 10.
Chapter 7 (distribution functions (dfs) and quantile functions (qfs); 4 lectures)
This chapter is quite important to this text. Skorokhod’s theorem in Section 7.3 must be done carefully, and the rest of Sections 7.1–7.4 should be covered. Section 7.5 should be left as outside reading. [Lecturing from Sections 7.6–7.11 is purely optional, and I would not exceed one lecture.] {I always assign Exercises 7.1.1 (on continuity of dfs), 7.3.3 (F⁻¹(·) is left continuous), 7.3.3 (change of variable), and 7.4.2 (for practice working with X ≡ K(ξ)). Consider lecturing on Theorem 7.6.1 (the infinite variance case).}
Chapter 8 (conditional expectation; 2 lectures)
The first lecture covers Sections 8.1 and 8.2. It highlights Proposition 8.1.1 (on the preservation of independence), Theorem 8.1.2 (extending independence from π-systems), and Kolmogorov’s 0-1 law. The other provides some discussion of the definition of conditional probability in Section 8.4, includes proofs of several parts of Theorem 8.4.1 (properties of conditional expectation), and discusses Definition 8.5.1 of regular conditional probability. [I never lecture on Sections 8.3, 8.5, or 8.6.] {I always assign Exercises 8.1.2 and 8.1.3 (they provide routine practice with the concepts), Exercise 8.4.1 (discrete conditional probability), Exercise 8.4.3 (repeated stepwise smoothing in a particular example), and part of Exercise 8.4.4 (proving additional parts of Theorem 8.4.1).}
Chapter 9 (elementary probability; 0 lectures)
Sections 9.1 and 9.2 were written to provide background reading for those graduate students in mathematics who lack an elementary probability background. Sections 9.3 and 9.4 allow graduate students in statistics to read some of the basic multivariate results in appropriate matrix notation. [I do not lecture from this chapter.] {But I do assign Exercises 9.1.8 (the Poisson process exists) and 9.2.1(ii) (so that the convolution formula is refreshed).}
Chapter 10 (laws of large numbers (LLNs) and inequalities; 3 lectures for now)
Since we are on the quarter system at the University of Washington, this leaves me 3 lectures to spend on the law of large numbers in Chapter 10 before the Christmas break at the end of the autumn quarter. In the first 3 lectures I do Sections 10.1 and 10.2 with Khinchine’s weak law of large numbers (WLLN), Kolmogorov’s inequality only from Section 10.3, and at this time I present Kolmogorov’s strong law of large numbers (SLLN) only from Section 10.4. {I always assign Exercises 10.1.1 (Cesàro summability), 10.2.1 (it generates good ideas related to the proofs), 10.2.3 (as it practices the important Op(·) and op(·) notation), 10.4.4 (the substantial result of Marcinkiewicz and Zygmund), 10.4.7 (random sample size), and at least one of the alternative SLLN proofs contained in 10.4.8, 10.4.9, and 10.4.10.}
At this point at the beginning of the winter quarter the instructor will have his/her own opinions about what to cover. I devote the winter quarter to the weak law of large numbers (WLLN), an introduction to the law of the iterated logarithm (LIL), and various central limit theorems (CLTs). That is, the second term treats the material of Chapters 10–11 and 13–17. I will outline my choices for which parts to cover.
Chapter 10 (LLNs, inequalities, LIL, and series; 6 lectures)
My lectures cover Section 10.3 (symmetrization inequalities and Lévy’s inequality for the WLLN, and the Ottaviani–Skorokhod inequality for series), Feller’s WLLN from Section 10.4, the Glivenko–Cantelli theorem from Section 10.5, the LIL for normal rvs in Proposition 10.7.1, the strong Markov property of Theorem 10.8.1, and the two-series Theorem 10.9.2. [I do not lecture from any of Sections 10.6, 10.10, 10.11, or 10.12 at this time.] {I always assign Exercise 10.7.1 (Mills’ ratio).}
Chapter 11 (CLTs via Stein’s method; 3 lectures)
From Section 11.1 one can prove Stein’s first lemma and discuss his second lemma, prove the Berry–Esseen theorem, and prove Lindeberg’s CLT. Note that we have not yet introduced characteristic functions.
Chapter 13 (characteristic functions (chfs); 6 lectures)
I do Sections 13.1–13.5. {I always assign Exercises 13.1.1 and 13.1.3(a) (deriving specific chfs) and 13.4.1 (Taylor series expansions of the chf).}
Chapter 14 (CLTs via chfs; 6 lectures)
The classical CLT, the Poisson limit theorem, and the multivariate CLT make a nice lecture. The chisquare goodness of fit example and/or the median example (of Section 11.6) make a lecture of illustrations. Chf proofs of the usual CLTs are given in Section 14.2. (Section 13.5 could have been left until now.) If Lindeberg’s theorem was proved in Chapter 11, one might do only Feller’s converse now via chfs. Other examples from either Section 14.2 or 11.6 could now be chosen, and Example 11.6.4 (weighted sums of iid rvs) is my first choice. [The chisquare goodness of fit example could motivate a student to read from Sections 9.3 and 9.4.]
At this stage I still have at least 7 optional lectures at the end of the winter quarter and about 12 more at the start of the spring quarter. In my final 16 lectures of the spring quarter I feel it appropriate to consider Brownian motion in Chapter 12 and then martingales in Chapter 18 (in a fashion to be described below). Let me first describe some possibilities for the optional lectures, assuming that the above core was covered.
Chapter 17 (U-statistics and Hoeffding’s combinatorial CLT)
Sections 17.1 and 17.2 are independent of each other. The Berry–Esseen potential of Lemma 11.1.1 is required for Section 17.1. Either one or two lectures could then be presented on U-statistics from Section 17.1. The alternative Stein formulation of Motivation 11.1.1 is required for Section 17.2. Two additional lectures would give the Hoeffding combinatorial CLT and its corollary regarding sampling from finite populations.
Chapter 11 (statistical examples)
Sections 11.6, 14.2, and 14.6 contain appropriate examples and exercises.
Chapter 11 (bootstrap)
Both Sections 11.4 and 11.5 on the bootstrap require only Theorem 11.2.1.
Chapters 11 and 19 (convergence in distribution)
Convergence in distribution on the line is presented in Sections 11.7 and 11.8. [This is extended to metric spaces in Chapter 19, but I do not lecture from it.]
Chapter 11 (domain of normal attraction of the normal df)
The converse of the CLT in Theorem 11.3.2 requires the Giné–Zinn symmetrization inequality and the Khinchine inequality of Section 13.3 and the Paley–Zygmund inequality of Section 3.4.
Chapters 7, 10 and 11 (domain of attraction of the normal df)
Combining Sections 7.6–7.8, the Section 10.3 subsection on maximal inequalities of another ilk, Section 10.6, and Sections 11.2–11.3 makes a nice unit. Lévy’s asymptotic normality condition (ANC) of (7.7.14) for a rv X has some prominence. In Chapter 7 purely geometric methods plus Cauchy–Schwarz are used to derive a multitude of equivalent conditions. In the process, quantile functions are carefully studied. In Section 10.6 the ANC is seen to be equivalent to a result akin to a WLLN for the rv X², and so in this context many additional equivalent conditions are again derived. Thus when one comes to the CLT in Sections 11.2 and 11.3, one already knows a great deal about the ANC.
Chapter 15 (infinitely divisible and stable laws)
First, Section 15.1 (infinitely divisible laws) is independent of the rest, including Section 15.2 (stable laws). The theorem stated in Section 15.4 (domain of attraction of stable laws) would require methods of Section 7.9 to prove, but the interesting exercises are accessible without this.
Chapter 14 (higher-order approximations)
The local limit theorem in Section 14.3 can be done immediately for continuous dfs, but it also requires Section 13.6 for discrete dfs. The expansions given in Sections 14.4 (gamma approximation) and 14.5 (Edgeworth approximation) also require Exercise 13.4.6.
Assorted topics suitable for individual reading
These include Section 8.6 (on alternating conditional expectations), Section 10.12 (a uniform SLLN), Section 16.4 (L-statistics), Sections 18.8–18.11 (counting process martingales), and Section 18.12 (martingale CLTs).
The primary topics for the spring quarter are Chapter 12 (Brownian motion and elementary empirical processes) and Chapter 18 (martingales). I have never covered Chapter 12 until the spring, but I placed it rather early in the text to make clear that it doesn’t depend on any of the later material.
Chapter 12 (Brownian motion; 6 lectures)
I discuss Section 12.1, sketch the proof of Section 12.2 and carefully apply that result in Section 12.3, and treat Section 12.4 carefully (as I believe that at some point a lecture should be devoted to a few of the more subtle difficulties regarding measurability). I am a bit cavalier regarding Section 12.5 (strong Markov property), but I apply it carefully in Sections 12.6, 12.7, and 12.8. I assign Section 12.9 as outside reading. [I do not lecture on Theorem 12.8.2.] {I always assign Exercises 12.1.2 (on (C, C)), 12.3.1 (various transforms of Brownian motion), 12.3.3 (integrals of normal processes), 12.4.1 (properties of stopping times), 12.7.3(a) (related to embedding a rv in Brownian motion), and 12.8.2 (the LIL via embedding).}
At this point let me describe three additional optional topics that could now be pursued, based on the previous lectures from Chapter 12.
Chapter 12 (elementary empirical processes)
Uniform empirical and quantile processes are considered in Section 12.10. Straightforward applications to linear rank statistics and two-sample tests of fit are included. One could either lecture from Section 12.12 (directly) or 12.11 (with a preliminary lecture from Sections 10.10–10.11), or leave these for assigned reading.
Chapter 16 (trimmed means and/or simple linear rank statistics)
Both possibilities listed here require Section 12.10 as well as the quantile inequality of (7.11.3). Asymptotic normality of linear rank statistics and a finite sampling CLT are derived in Section 16.2, and the bootstrap is presented in Section 16.3. The general CLT (Theorem 16.1.1) and asymptotic normality of trimmed means (Theorem 16.1.2, but only present the β = 0 case) are derived in Section 16.1; this will also require stating/proving the equivalence of (16.1.3) and (7.6.4), which is shown in Theorem 7.1.1.
Chapter 18 (martingales; 10 lectures)
I cover most of the first seven sections. {I always assign Exercises 18.1.4 (a counting process martingale), 18.3.2 (a proof for continuous time mgs), 18.3.7, and 18.3.9 (on Lr-convergence).}
Appendix A (maximum likelihood estimation)
I see this as being of considerable interest in conjunction with statistical pursuits, rather than as a direct part of a course on probability theory.
Definition of Symbols
∼= means “is distributed as”
≡ means “is defined to be”
a = b ⊕ c means that |a − b| ≤ c
Un =a Vn means “asymptotically equal” in the sense that Un − Vn →p 0
X ∼= (µ, σ²) means that X has mean µ and variance σ²
X ∼= F(µ, σ²) means that X has df F with mean µ and variance σ²
X̄n is the “sample mean” and Ẍn is the “sample median”
(Ω, A, µ) and (Ω, A, P ) denote a measure space and a probability space
σ[C] denotes the σ-field generated by the class of sets C
F(X) denotes X⁻¹(B̄), for the Borel sets B and B̄ ≡ σ[B, {+∞}, {−∞}]
ξ will always refer to a Uniform(0, 1) rv
means “nondecreasing” and ↑ means “strictly increasing”
1A(·) denotes the indicator function of the set A
“df” refers to a distribution function F(·)
“qf” refers to a quantile function K(·) ≡ F⁻¹(·)
The “tilde” symbol ( ˜ ) denotes Winsorization
The “haček” symbol ( ˇ ) denotes Truncation
λ(·) and λn(·) will refer to Lebesgue measure on the line R and on Rⁿ
See page 119 for “dom(a, a )”
Brownian motion S, Brownian bridge U, and the Poisson process N
The empirical df Fn and the empirical df Gn of Uniform(0, 1) rvs
→ is associated with convergence in the LIL (see page 235)
“mg” refers to a martingale
“smg” refers to a submartingale
>= means “≥” for a submartingale and “=” for a martingale
The symbol “>=” is paired with “s-mg” in this context.
Prominence
Important equations are labeled with numbers to give them prominence. Thus, equations within a proof that are also important outside the context of that proof are numbered. Though the typical equation in a proof is unworthy of a number, it may be labeled with a letter to help with the “bookkeeping.” Likewise, digressions or examples in the main body of the text may contain equations labeled with letters that decrease the prominence given to them.
Integral signs and summation signs in important equations (or sufficiently complicated equations) are large, while those in less important equations are small. It is a matter of assigned prominence. The most important theorems, definitions, and examples have been given titles in boldface type to assign prominence to them. The titles of somewhat less important results are not in boldface type. Routine references to theorem 10.4.1 or definition 7.3.1 do not contain capitalized initial letters. The author very specifically wishes to downgrade the prominence given to this routine use of these words. Starting new sections on new pages allowed me to carefully control the field of vision as the most important results were presented.
Motivation 1.1 (The Lebesgue integral) The Riemann integral of a continuous function f (we will restrict attention to f(x) ≥ 0 for a ≤ x ≤ b for convenience) is formed by subdividing the domain of f, forming approximating sums, and passing to the limit. Thus the mth Riemann sum for ∫ab f(x) dx is
RSm ≡ f(x∗m1)[xm1 − xm0] + · · · + f(x∗mm)[xmm − xm,m−1],
where a ≡ xm0 < xm1 < · · · < xmm ≡ b (with xm,i−1 ≤ x∗mi ≤ xmi for all i) satisfy meshm ≡ max[xmi − xm,i−1] → 0. Note that xmi − xm,i−1 is the measure (or length) of the interval [xm,i−1, xmi], while f(x∗mi) approximates the values of f(x) for all xm,i−1 ≤ x ≤ xmi (at least it does if f is continuous on [a, b]). Within the class C+ of all nonnegative continuous functions, this definition works reasonably well. But it has one major shortcoming: C+ is not closed under passage to the limit, so the conclusion ∫ab fm(x) dx → ∫ab f(x) dx can fail for a limit f of functions fm in C+; indeed, for such a limit the Riemann integral ∫ab f(x) dx may not even be well-defined.
A different approach is needed. (Note Figure 1.1.)
The Lebesgue integral of a nonnegative function is formed by subdividing the range. Thus the mth Lebesgue sum for ∫ab f(x) dx weights each level i/2^m by the measure of the set on which f takes values at that level,
LSm ≡ Σi (i/2^m) × (measure of {x in [a, b] : i/2^m ≤ f(x) < (i + 1)/2^m}),
and ∫ab f(x) dx is defined to be the limit of the LSm sums as m → ∞. For what class M of functions f can this approach succeed? The members f of the class M will need to be such that the measure (or length) of all sets of the form
{x in [a, b] : i/2^m ≤ f(x) < (i + 1)/2^m}
can be specified. This approach leads to the concept of a σ-field A of subsets of [a, b] that are measurable (that is, we must be able to assign to these sets a number called their “length”), and this leads to the concept of the class M of measurable functions. This class M of measurable functions will be seen to be closed under passage to the limit and all the other operations that we are accustomed to performing on functions. Moreover, the desirable property ∫ab fm(x) dx → ∫ab f(x) dx will be seen to hold for limits of functions in M under quite mild hypotheses.
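As a numeric illustration (a sketch of mine, not part of the text): for f(x) = x² on [0, 1] each level set {x : i/2^m ≤ x² < (i + 1)/2^m} is an interval whose length is computable exactly, so both approximating schemes can be evaluated directly; both converge to ∫ x² dx = 1/3.

```python
import math

def riemann_sum(f, a, b, m):
    """mth Riemann sum: subdivide the DOMAIN [a, b] into m equal pieces
    and evaluate f at the left endpoint of each piece."""
    h = (b - a) / m
    return sum(f(a + i * h) * h for i in range(m))

def lebesgue_sum_x_squared(m):
    """mth Lebesgue sum for f(x) = x^2 on [0, 1]: subdivide the RANGE into
    dyadic levels i/2^m; the set where i/2^m <= x^2 < (i+1)/2^m is the
    interval [sqrt(i/2^m), sqrt((i+1)/2^m)), so its measure is exact."""
    total = 0.0
    for i in range(2 ** m):
        level = i / 2 ** m
        measure = math.sqrt((i + 1) / 2 ** m) - math.sqrt(i / 2 ** m)
        total += level * measure
    return total

print(riemann_sum(lambda x: x * x, 0.0, 1.0, 10_000))  # close to 1/3
print(lebesgue_sum_x_squared(12))                      # close to 1/3, from below
```

Since each level i/2^m undershoots f on its level set, the Lebesgue sums here approximate the integral from below, within 2^−m of it; the point of this section is that the same range-subdivision scheme still makes sense for wildly discontinuous f, provided the level sets can be assigned a measure.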
Definition 1.1 (Set theory) Consider a nonvoid class A of subsets A of a nonvoid set Ω. (For us, Ω will be the sample space of an experiment.)
(a) Let Aᶜ denote the complement of A, let A ∪ B denote the union of A and B, let A ∩ B and AB both denote the intersection, let A \ B ≡ ABᶜ denote the set difference, let A △ B ≡ (AᶜB ∪ ABᶜ) denote the symmetric difference, and let ∅ denote the empty set. The class of all subsets of Ω will be denoted by 2^Ω. Sets A and B are called disjoint if AB = ∅, and sequences of sets An or classes of sets At are called disjoint if all pairs are disjoint. Writing A + B or Σ∞1 An will also denote a union, but will imply the disjointness of the sets in the union. As usual, A ⊂ B denotes that A is a subset of B. We call a sequence An increasing (and we will nearly always denote this fact by writing An ↑) when An ⊂ An+1 for all n ≥ 1. We call the sequence decreasing (denoted by An ↓) when An ⊃ An+1 for all n ≥ 1. We call the sequence monotone if it is either increasing or decreasing. Let ω denote a generic element of Ω. We will use 1A(·) to denote the indicator function of A, which equals 1 or 0 at ω according as ω ∈ A or ω ∉ A.
(b) A will be called a field if it is closed under complements and unions. (That is, A and B in A requires that A^c and A ∪ B be in A.) [Note that both Ω and ∅ are necessarily in A, as A was assumed to be nonvoid, with Ω = A ∪ A^c and ∅ = Ω^c.]
(c) A will be called a σ-field if it is closed under complements and countable unions. (That is, A, A_1, A_2, . . . in A requires that A^c and ∪_{n=1}^∞ A_n be in A.)
(d) A will be called a monotone class provided it contains ∪_{n=1}^∞ A_n for all increasing sequences A_n in A and contains ∩_{n=1}^∞ A_n for all decreasing sequences A_n in A.
(e) (Ω, A) will be called a measurable space provided A is a σ-field of subsets of Ω.
(f) A will be called a π-system provided AB is in A for all A and B in A; and A will be called a π̄-system when Ω ∈ A is also guaranteed.
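On a finite Ω the defining closure properties of Definition 1.1 can be checked by brute force. The following sketch (the helper name `is_field` is ours, not the book's) verifies part (b): a class is a field exactly when it is closed under complements and pairwise unions.

```python
from itertools import combinations

def is_field(omega, cls):
    """Check Definition 1.1(b): closure under complements and pairwise unions."""
    omega = frozenset(omega)
    cls = {frozenset(s) for s in cls}
    for a in cls:
        if omega - a not in cls:    # A^c must be in the class
            return False
    for a, b in combinations(cls, 2):
        if a | b not in cls:        # A ∪ B must be in the class
            return False
    return True

omega = {1, 2, 3, 4}
field = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}
# Fails closure: {1} ∪ {2} = {1, 2} is missing (and so is {2}^c = {1, 3, 4}).
not_field = {frozenset(), frozenset({1}), frozenset({2}),
             frozenset({2, 3, 4}), frozenset(omega)}
print(is_field(omega, field))      # True
print(is_field(omega, not_field))  # False
```

As the bracketed note in (b) observes, Ω and ∅ need not be checked separately: closure forces them into any nonvoid field, which is why the two example classes above include them from the start.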
If A is a field (or a σ-field), then it is closed under intersections (under countable intersections), since AB = (A^c ∪ B^c)^c (since ∩_{n=1}^∞ A_n = (∪_{n=1}^∞ A_n^c)^c). Likewise, we could have used "intersection" instead of "union" in our definitions, by making use of A ∪ B = (A^c ∩ B^c)^c and ∪_{n=1}^∞ A_n = (∩_{n=1}^∞ A_n^c)^c.
Proposition 1.1 (Closure under intersections)
(a) Arbitrary intersections of fields, σ-fields, or monotone classes are fields, σ-fields, or monotone classes, respectively.
[For example, ∩{F_α : F_α is a field under consideration} is a field.]
(b) There exists a minimal field, σ-field, or monotone class generated by (or, containing) any specified class C of subsets of Ω. We call C the generators. For example,
σ[C] ≡ ∩{F_α : F_α is a σ-field of subsets of Ω for which C ⊂ F_α}   (4)
is the minimal σ-field generated by C (that is, containing C).
(c) A collection A of subsets of Ω is a σ-field if and only if it is both a field and a monotone class.
Proof. Only the "if" half of (c) needs comment. Given A_1, A_2, . . . in the field A, the sets B_n ≡ ∪_{k=1}^n A_k form an increasing sequence in A, so that ∪_{n=1}^∞ A_n = ∪_{n=1}^∞ B_n ∈ A, since the B_n are in A and are ↑ and A is a monotone class. Everything else is even more trivial. □
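For a finite Ω, the minimal σ-field of proposition 1.1(b) can be computed directly: instead of intersecting all σ-fields containing C, one closes the generators under complements and unions until a fixed point is reached (on a finite space, finite closure suffices for countable closure). A small sketch with hypothetical names:

```python
def sigma_field(omega, generators):
    """Iteratively close a class of subsets of a finite omega under
    complements and pairwise unions; the fixed point is sigma[C]."""
    omega = frozenset(omega)
    cls = {frozenset(g) for g in generators} | {frozenset(), omega}
    while True:
        new = {omega - a for a in cls}
        new |= {a | b for a in cls for b in cls}
        if new <= cls:        # nothing new appeared: cls is closed
            return cls
        cls |= new

sf = sigma_field({1, 2, 3, 4}, [{1}])
# sigma[{{1}}] consists of the 4 sets: ∅, {1}, {2, 3, 4}, Ω
print(sorted(sorted(s) for s in sf))
```

Intersections come along for free, exactly as the De Morgan remark above the proposition explains.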
then µ is called a measure (or, equivalently, a countably additive measure) on (Ω, A). The triple (Ω, A, µ) is then called a measure space. We call µ finite if µ(Ω) < ∞. We call µ σ-finite if there exists a measurable decomposition of Ω as Ω = Σ_{n=1}^∞ Ω_n with Ω_n ∈ A and µ(Ω_n) < ∞ for all n.
[If A is not a σ-field, we will still call µ a measure on (Ω, A), provided that (5) holds for all sequences A_n for which ∪_{n=1}^∞ A_n is in A. We will not, however, use the term "measure space" to describe such a triple. We will consider below measures on fields, on certain π̄-systems, and on some other collections of sets. A useful property of such a collection of sets A is that along with any sets A_1, . . . , A_k it also includes all sets formed from them by the elementary set operations. Countable additivity on such a collection means that µ(Σ_{n=1}^∞ A_n) = Σ_{n=1}^∞ µ(A_n) for all disjoint sequences A_n in A for which Σ_{n=1}^∞ A_n is also in A.]
Definition 1.3 (Outer measures) Consider a set function µ* : 2^Ω → [0, ∞].
(a) Suppose that µ* satisfies the following three properties.
Null: µ*(∅) = 0.
Monotone: µ*(A) ≤ µ*(B) for all A ⊂ B.
Countable subadditivity: µ*(∪_{n=1}^∞ A_n) ≤ Σ_{n=1}^∞ µ*(A_n) for all A_n.
Then µ* is called an outer measure.
(b) An arbitrary subset A of Ω is called µ*-measurable if
µ*(T) = µ*(TA) + µ*(TA^c) for all subsets T ⊂ Ω.   (7)
Sets T used in this capacity are called test sets.
(c) We let A* denote the class of all µ*-measurable sets; that is,
A* ≡ {A ∈ 2^Ω : A is µ*-measurable}.   (8)
[Note that A ∈ A* if and only if µ*(T) ≥ µ*(TA) + µ*(TA^c) for all T ⊂ Ω, since the other inequality is trivial by the subadditivity of µ*.]
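The Carathéodory condition (7) can be explored numerically on a toy example (the setup and helper names below are ours, not the book's). Take a four-point Ω carrying two weighted "blocks" {1, 2} and {3, 4}, and let µ* of a set be the total weight of the blocks the set intersects; this is easily seen to be null, monotone, and subadditive. A set that respects the blocks, such as {1, 2}, passes (7) against every test set T, while a set that splits a block, such as {1}, fails it.

```python
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3, 4})
BLOCKS = {frozenset({1, 2}): 1.0, frozenset({3, 4}): 1.0}

def outer(a):
    """Toy outer measure: total weight of the blocks that A intersects."""
    return sum(w for blk, w in BLOCKS.items() if blk & a)

def caratheodory_measurable(a):
    """Check (7): mu*(T) = mu*(TA) + mu*(TA^c) for every test set T."""
    subsets = chain.from_iterable(combinations(OMEGA, r) for r in range(5))
    return all(outer(t) == outer(t & a) + outer(t - a)
               for t in map(frozenset, subsets))

print(caratheodory_measurable(frozenset({1, 2})))  # True: respects the blocks
print(caratheodory_measurable(frozenset({1})))     # False: T = {1, 2} detects it
```

For A = {1} the test set T = {1, 2} gives µ*(T) = 1 but µ*(TA) + µ*(TA^c) = 1 + 1 = 2, exhibiting the failure of additivity that (7) is designed to rule out.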
Motivation 1.2 (Measure) In this paragraph we will consider only one possible measure µ, namely the Lebesgue-measure generalization of length. Let C_I denote the set of all intervals of the types (a, b], (−∞, b], and (a, +∞) on the real line R, and to each of these intervals I we assign a measure value µ(I) equal to its length, thus ∞, b − a, and ∞ in the three special cases. All is well until we manipulate the sets in C_I, as even the union of two elements in C_I need not be in C_I. Thus C_I is not a very rich collection of sets. A natural extension is to let C_F denote the collection of all finite disjoint unions of sets in C_I, where the measure µ(A) we assign to each such set A is just the sum of the measures (lengths) of all its disjoint pieces. Now C_F is a field, and is thus closed under the elementary operations of union, intersection, and complementation. Much can be done using only C_F and letting "measure" be the "exact length." But C_F is not closed under passage to the limit, and it is thus insufficient for many of our needs. For this reason the concept of the smallest σ-field containing C_F, labeled B ≡ σ[C_F], is introduced. We call B the Borel sets. But let us work backwards. Let us assign an outer measure value µ*(A) to every subset A in the class 2^R of all subsets of the real line R. In particular, to any subset A we assign the value µ*(A) that is the infimum of all possible numbers Σ_{n=1}^∞ µ(A_n), in which each A_n is in the field C_F (so that we know its measure) and in which the A_n's form a cover of A (in that A ⊂ ∪_{n=1}^∞ A_n). Thus each number Σ_{n=1}^∞ µ(A_n) is a natural upper bound for the measure (or generalized length) of the set A, and we specify the infimum of such upper bounds to be the outer measure of A. Thus to each subset A of the real line we assign a value µ*(A) of generalized length. This value seems "reasonable," but does it "perform correctly"? Let us say that a particular set A is µ*-measurable (that is, it "performs correctly") if µ*(T) = µ*(TA) + µ*(TA^c) for all subsets T of the real line R; that is, if the A versus A^c division of the line divides every subset T of the line into two pieces in a fashion that is µ*-additive. This is undoubtedly a combination of reasonableness and fine technicality that took some time to evolve in the mind of its creator, Carathéodory, while he searched for a condition that "worked." In what sense does it "work"? The collection A* of all µ*-measurable sets turns out to be a σ-field. Thus the collection A* is closed under all operations that we are likely to perform; and it is big enough, in that it is a σ-field that contains C_F. Thus we will work with the restriction µ*|A* of µ* to the sets of A* (here, the vertical line means "restricted to"). This is enough to meet our needs.
There are many measures other than length. For an increasing and right-continuous function F on the real line (called a generalized df) we define the Stieltjes measure of an arbitrary interval (a, b] (with −∞ ≤ a < b ≤ ∞) in C_I by µ_F((a, b]) = F(b) − F(a), and we extend it to sets in C_F by adding up the measures of the pieces. Reapplying the previous paragraph, we can extend µ_F to the µ*_F-measurable sets. It is the important Carathéodory extension theorem that will establish that all Stieltjes measures (including the case of ordinary length, where F(x) = x as considered in the first paragraph) can be extended from C_F to the Borel sets B. That is, all Borel sets are µ*-measurable for every Stieltjes measure. One further extension is possible, in that every measure can be "completed" (see the end of Section 1.2). We note here only that when the Stieltjes measure µ_F associated with the generalized df F is "completed," its domain of definition is extended from the Borel sets B (which all Stieltjes measures have in common) to a larger collection B̂_{µ_F} that depends on the particular F. It is left to Section 1.2 to simply state that this is as far as we can go. That is, except in trivial special cases, we find that B̂_{µ_F} is a proper subset of 2^R. (That is, it is typically impossible to define the measure of all subsets of Ω in a suitable fashion.) □
Example 1.1 (Some examples of measures, informally)
(a) Lebesgue measure:
Let λ(A) denote the length of A.
(b) Counting measure:
Let #(A) denote the number of “points” in A (or the cardinality of A).
(c) Unit point mass:
Let δ_{ω_0}(A) ≡ 1_A(ω_0), assigning measure 1 or 0 as ω_0 ∈ A or not. □
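Two of the three measures of Example 1.1 are easy to realize as set functions on a finite sample space (counting measure and the unit point mass exactly; Lebesgue measure has no finite analogue, so it is omitted). A minimal sketch, with names of our own choosing:

```python
def counting(a):
    """#(A): the number of points in A."""
    return len(a)

def point_mass(omega0):
    """delta_{omega0}(A) = 1 if omega0 is in A, else 0."""
    return lambda a: 1 if omega0 in a else 0

A = {2, 3, 5, 7}
delta = point_mass(3)
print(counting(A))           # 4
print(delta(A), delta({1}))  # 1 0
```

Both set functions are countably additive on disjoint sets, which is the defining property (5) of a measure.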
Example 1.2 (Borel sets)
(a) Let Ω = R and let C consist of all finite disjoint unions of intervals of the types (a, b], (−∞, b], and (a, +∞). Clearly, C is a field. Then B ≡ σ[C] will be called the Borel sets (or the Borel subsets of R). Let µ(A) be defined to be the sum of the lengths of the intervals composing A, for each A ∈ C. Then µ is a c.a. measure on the field C, as will be seen in the proof of theorem 1.3.1 below.
(b) If (Ω, d) is a metric space and U ≡ {all d-open subsets of Ω}, then B ≡ σ[U] will be called the Borel sets or the Borel σ-field.
(c) If (Ω, d) is (R, | · |) for absolute value | · |, then σ[C] = σ[U], even though C ≠ U. [This claim is true, since C ⊂ σ[U] and U ⊂ σ[C] are clear. Then, just make a trivial appeal to exercise 1.1.]
(d) Let R̄ ≡ [−∞, +∞] denote the extended real line, and let B̄ ≡ σ[B, {−∞}, {+∞}]. □
Proposition 1.2 (Monotone property of measures) Let (Ω, A, µ) denote a measure space. Let (A_1, A_2, . . .) be in A.
(a) If A_n ⊂ A_{n+1} for all n (that is, A_n ↑), then µ(A_n) ↗ µ(∪_{n=1}^∞ A_n).
(b) If A_n ⊃ A_{n+1} for all n (that is, A_n ↓), and if µ(A_{n_0}) < ∞ for some value n_0, then µ(A_n) ↘ µ(∩_{n=1}^∞ A_n). [Letting Ω denote the real line R, letting A_n = [n, ∞), and letting µ denote either Lebesgue measure or counting measure, we see the need for some such requirement.]
(c) (Countable subadditivity) Whenever (A_1, A_2, . . .) and ∪_{n=1}^∞ A_n are all in A, then
µ(∪_{k=1}^∞ A_k) ≤ Σ_{k=1}^∞ µ(A_k);
and this also holds true for a measure on a field or on a π̄-system.
Proof. (a) Now, let B_1 ≡ A_1 and B_k ≡ A_k \ A_{k−1} for k ≥ 2; these sets are disjoint, with ∪_{k=1}^n B_k = A_n and ∪_{k=1}^∞ B_k = ∪_{n=1}^∞ A_n. Countable additivity thus gives µ(∪_{n=1}^∞ A_n) = Σ_{k=1}^∞ µ(B_k) = lim_n Σ_{k=1}^n µ(B_k) = lim_n µ(A_n).
(b) Without loss of generality, redefine A_1 = A_2 = · · · = A_{n_0}. Let B_n ≡ A_1 \ A_n, so that B_n ↑ A_1 \ ∩_{k=1}^∞ A_k. Thus, on the one hand, part (a) gives µ(B_n) ↗ µ(A_1) − µ(∩_{k=1}^∞ A_k), while on the other hand µ(B_n) = µ(A_1) − µ(A_n); since µ(A_1) < ∞, the result follows.
(c) Redefine B_1 ≡ A_1 and B_k ≡ A_k \ (A_1 ∪ · · · ∪ A_{k−1}) for k ≥ 2. Then these newly defined sets B_k are disjoint, and ∪_{k=1}^n B_k = ∪_{k=1}^n A_k with µ(B_k) ≤ µ(A_k). Let n → ∞, and use part (a) to get the result. □
Definition 1.4 (Continuity of measures) A measure µ is continuous from below (above) if µ(lim A_n) = lim µ(A_n) for all A_n ↑ (for all A_n ↓, with at least one µ(A_n) finite). We call µ continuous in case it is continuous both from below and from above.
. . . giving the required countable additivity. Suppose next that µ is finite and is also continuous from above at ∅. Then finite additivity (even if A is only a field) gives . . .
Definition 1.5 (liminf and limsup of sets) Let
lim inf A_n ≡ ∪_{n=1}^∞ ∩_{k=n}^∞ A_k = {ω : ω is in all but finitely many A_n's},
lim sup A_n ≡ ∩_{n=1}^∞ ∪_{k=n}^∞ A_k = {ω : ω is in infinitely many A_n's} = {ω : ω ∈ A_n i.o.},
where we use i.o. to abbreviate infinitely often.
[It is important to learn to read these two mathematical equations in a way that makes it clear that the verbal description is correct.] Note that we always have lim inf A_n ⊂ lim sup A_n. Define
lim A_n ≡ lim inf A_n whenever lim inf A_n = lim sup A_n.   (14)
(The underlined and overlined symbols lim A_n are also used for lim inf A_n and lim sup A_n, giving us alternative notations.)
Definition 1.6 (lim inf and lim sup of numbers) Recall that for real number sequences a_n one defines lim a_n ≡ lim inf a_n and lim̄ a_n ≡ lim sup a_n by
lim inf_{n→∞} a_n ≡ lim_{n→∞} (inf_{k≥n} a_k) = sup_{n≥1} (inf_{k≥n} a_k), and
lim sup_{n→∞} a_n ≡ lim_{n→∞} (sup_{k≥n} a_k) = inf_{n≥1} (sup_{k≥n} a_k).
Definition 1.7 ("Little oh," "big oh," and "at most" ⊕) We write a_n = o(b_n) when a_n/b_n → 0, we write a_n = O(b_n) when |a_n/b_n| stays bounded, and we write c = a ⊕ b (read "a plus at most b") when |c − a| ≤ b. This last notation allows us to string inequalities together linearly, instead of having to start a new inequality on a new line. (I use it often.)
Proposition 1.4 Clearly, lim A_n equals ∪_{n=1}^∞ A_n when A_n is an increasing sequence, and lim A_n equals ∩_{n=1}^∞ A_n when A_n is a decreasing sequence.
Exercise 1.2 We always have µ(lim inf A_n) ≤ lim inf µ(A_n), while the inequality lim sup µ(A_n) ≤ µ(lim sup A_n) holds if µ(Ω) < ∞.
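Definition 1.5 can be checked on a concrete alternating sequence. Taking A_n = {0} for even n and {1} for odd n (a toy example of our own), lim sup A_n = {0, 1} since each point occurs infinitely often, while lim inf A_n = ∅ since no point is eventually always present. The sketch below truncates the infinite unions and intersections at a finite horizon, which is exact here because the sequence is periodic: n is only allowed to range over the first half, while each tail runs to the end.

```python
def liminf_sets(seq):
    """lim inf A_n = union over n of the intersection of the tail A_n, A_{n+1}, ..."""
    N = len(seq)
    result = set()
    for n in range(N // 2):
        tail = set(seq[n])
        for k in range(n + 1, N):
            tail &= seq[k]
        result |= tail
    return result

def limsup_sets(seq):
    """lim sup A_n = intersection over n of the union of the tail A_n, A_{n+1}, ..."""
    N = len(seq)
    result = set().union(*seq)
    for n in range(N // 2):
        tail = set()
        for k in range(n, N):
            tail |= seq[k]
        result &= tail
    return result

A = [{0} if n % 2 == 0 else {1} for n in range(12)]  # {0}, {1}, {0}, {1}, ...
print(liminf_sets(A))  # set(): no point is in all but finitely many A_n
print(limsup_sets(A))  # {0, 1}: both points occur infinitely often
```

The strict inclusion lim inf A_n ⊂ lim sup A_n noted after Definition 1.5 is exactly what this example exhibits.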
Exercise 1.3 (π-systems and λ-systems) A class D of subsets is called a λ-system if it contains the space Ω and all proper differences (A \ B, for A ⊂ B with both A, B ∈ D) and if it is closed under monotone increasing limits. [Recall that a class is called a π-system if it is closed under finite intersections, while π̄-systems also contain Ω.]
(a) Show that a class that is both a π-system and a λ-system is a σ-field.
(b) Show that a minimal λ-system λ[C] generated by any class C of subsets of Ω exists.
(c) Let C be a π-system and let D be a λ-system. Then C ⊂ D implies that σ[C] ⊂ D.
Proposition 1.5 (Dynkin's π-λ theorem) Let µ and µ′ be two measures on the measurable space (Ω, A). Let C be a π̄-system, where C ⊂ A. Then
µ = µ′ on the π̄-system C implies µ = µ′ on σ[C].   (18)
Proof. We first show that
σ[C] = λ[C] when C is a π-system.   (19)
Let D ≡ λ[C]. By the easy exercise 1.3(a)(b), it suffices to show that D is a π-system (that is, that A, B ∈ D implies A ∩ B ∈ D). We first go just halfway; let
E ≡ {A ∈ D : AC ∈ D for all C ∈ C}.   (a)
Then C ⊂ E. Also, for A, B ∈ E with B ⊂ A and for C ∈ C we have (since both AC and BC are in D) that (A \ B)C = (AC \ BC) ∈ D, so that A \ B ∈ E. Likewise E contains Ω and is closed under monotone increasing limits, so E is a λ-system containing C. Thus E = D, since D was the smallest such class. We have thus learned of D that
AC ∈ D for all C ∈ C, for each A ∈ D.   (b)
To go the rest of the way, we define
F ≡ {D ∈ D : AD ∈ D for all A ∈ D}.   (c)
Then C ⊂ F, by (b). Also, for A, B ∈ F with B ⊂ A and for D ∈ D we have (since both AD and BD are in D) that (A \ B)D = (AD \ BD) ∈ D, so that A \ B ∈ F. Thus F = D, since D was the smallest such class. We have thus learned of D that
AD ∈ D for all A ∈ D, for each D ∈ D.   (d)
That is, D is closed under intersections; and thus D is a π-system.
We will now demonstrate that G ≡ {A ∈ A : µ(A) = µ′(A)} is a λ-system on Ω. First, Ω ∈ G, since Ω is in the π̄-system C. Second, when A ⊂ B are both in G we have the equality
µ(B \ A) = µ(B) − µ(A) = µ′(B) − µ′(A) = µ′(B \ A),   (e)
giving B \ A ∈ G. Finally, let A_n ↑ A with all A_n's in G. Then proposition 1.2(a) yields the result
µ(A) = lim µ(A_n) = lim µ′(A_n) = µ′(A),   (f)
so that A ∈ G. Thus G is a λ-system.
Thus the set G on which µ = µ′ is a λ-system that contains the π̄-system C. Applying (19) shows that σ[C] ⊂ G. □
The previous proposition is very useful in extending independence from small classes of sets to large ones. The next proposition is used in proving the Carathéodory extension theorem, Fubini's theorem, and the existence of a regular conditional probability distribution.
Proposition 1.6 (Minimal monotone class; Halmos) The minimal monotone class M ≡ m[C] containing the field C and the minimal σ-field σ[C] generated by the same field C satisfy
m[C] = σ[C] when C is a field.   (20)
Proof. Since σ-fields are monotone classes, we have that σ[C] ⊃ M. If we now show that M is a field, then proposition 1.1.1(c) will imply that σ[C] ⊂ M.
To show that M is a field, it suffices to show that
A, B in M implies AB, A^cB, AB^c are in M.   (a)
Suppose that (a) has been established. We will now show that (a) implies that M is a field.
Complements: Let A ∈ M, and note that Ω ∈ M, since C ⊂ M. Then A, Ω ∈ M implies that A^c = A^cΩ ∈ M by (a).
Unions: Let A, B ∈ M. Then A ∪ B = (A^c ∩ B^c)^c ∈ M.
Thus M is indeed a field, provided that (a) is true. It thus suffices to prove (a).
For each A ∈ M, let M_A ≡ {B ∈ M : AB, A^cB, AB^c ∈ M}. Note that it suffices to prove that
M_A = M for each fixed A ∈ M.   (b)
We first show that
M_A is a monotone class.   (c)
Let B_n be monotone in M_A, with limit set B. Since B_n is monotone in M_A, it is also monotone in M, and thus B ≡ lim_n B_n ∈ M. Since B_n ∈ M_A, we have AB_n ∈ M, and since AB_n is monotone in M, we have AB = lim_n AB_n ∈ M. In like fashion, A^cB and AB^c are in M. Therefore, B ∈ M_A, by definition of M_A. That is, (c) holds.
We next show that
M_A = M for each fixed A ∈ C.   (d)
Let A ∈ C and let C ∈ C. Then A ∈ M_C, since C is a field. But A ∈ M_C if and only if C ∈ M_A, by the symmetry of the definition of M_A. Thus C ⊂ M_A. That is, C ⊂ M_A ⊂ M, and M_A is a monotone class by (c). But M is the minimal monotone class containing C, by the definition of M. Thus (d) holds. But in fact, we shall now strengthen (d) to
M_B = M for each fixed B ∈ M.   (e)
The conditions for membership in M_A imposed on pairs A, B are symmetric. Thus for A ∈ C, the statement established above in (d) that B ∈ M (= M_A) is true if and only if A ∈ M_B. Thus C ⊂ M_B, where M_B is a monotone class. Thus M_B = M, since (as was earlier noted) M is the smallest such monotone class. Thus (e) (and hence (a)) is established. □
Definition 2.1 (Outer extension) Let Ω be arbitrary. Let µ be a measure on a field C of subsets of Ω. For each A ∈ 2^Ω define
µ*(A) ≡ inf {Σ_{n=1}^∞ µ(A_n) : all A_n ∈ C, and A ⊂ ∪_{n=1}^∞ A_n}.   (1)
(Sequences {A_n : n ≥ 1} ⊂ C with A ⊂ ∪_{n=1}^∞ A_n are called Carathéodory coverings of A.)
Theorem 2.1 (Carathéodory extension theorem) A measure µ on a field C can be extended to a measure on the σ-field σ[C] generated by C, by defining
µ(A) ≡ µ*(A) for each A in A ≡ σ[C].   (2)
If µ is σ-finite on C, then the extension is unique on A and is also σ-finite.
Proof. The proof proceeds by a series of claims.
Claim 1: µ* is an outer measure on (Ω, 2^Ω).
Null: Now, µ*(∅) = 0, since ∅, ∅, . . . is a covering of ∅.
Monotone: Let A ⊂ B. Then every covering of B is also a covering of A. Thus µ*(A) ≤ µ*(B).
Countably subadditive: Let A ⊂ ∪_{n=1}^∞ A_n, and fix ε > 0. For each n there is a covering {A_{nk} : k ≥ 1} ⊂ C of A_n such that Σ_{k=1}^∞ µ(A_{nk}) ≤ µ*(A_n) + ε/2^n. The combined collection {A_{nk} : n, k ≥ 1} is then a covering of A, so that µ*(A) ≤ Σ_n Σ_k µ(A_{nk}) ≤ Σ_n µ*(A_n) + ε. Letting ε ↓ 0 and taking A = ∪_{n=1}^∞ A_n gives the claim.
Claim 2: µ*|C = µ (that is, µ*(C) = µ(C) for all C ∈ C) and C ⊂ A*.
Let C ∈ C. Then µ*(C) ≤ µ(C), since C, ∅, ∅, . . . is a covering of C. For the other direction, we let A_1, A_2, . . . be any covering of C. Since µ is c.a. on C, and since ∪_{n=1}^∞ (A_n ∩ C) = C ∈ C, we have from proposition 1.1.2(c) that
µ(C) = µ(∪_{n=1}^∞ (A_n ∩ C)) ≤ Σ_{n=1}^∞ µ(A_n ∩ C) ≤ Σ_{n=1}^∞ µ(A_n),
and thus µ(C) ≤ µ*(C). Thus µ(C) = µ*(C). We next show that any C ∈ C is also in A*. Let C ∈ C and let T be any test set. Fix ε > 0, and choose a covering {A_n}_{n=1}^∞ ⊂ C of T such that
µ*(T) + ε ≥ Σ_{n=1}^∞ µ(A_n)   since µ*(T) is an infimum   (a)
= Σ_{n=1}^∞ µ(CA_n) + Σ_{n=1}^∞ µ(C^cA_n)   since µ is c.a. on C with C and A_n in C
≥ µ*(CT) + µ*(C^cT)   since the CA_n cover CT and the C^cA_n cover C^cT.   (b)
Letting ε ↓ 0 shows that C ∈ A*. Thus C ⊂ A*.
Claim 3: The class A* of µ*-measurable subsets of Ω is a field that contains C.
Now, A ∈ A* implies that A^c ∈ A*: the definition of µ*-measurable is symmetric in A and A^c. And A, B ∈ A* implies that AB ∈ A*: for any test set T ⊂ Ω we have the required inequality
µ*(T) = µ*(TA) + µ*(TA^c) = µ*(TAB) + µ*(TAB^c) + µ*(TA^c) ≥ µ*(TAB) + µ*(T(AB)^c),   (c)
where the final step uses subadditivity and TAB^c ∪ TA^c = T(AB)^c.
Claim 4: µ* is a f.a. measure on A*.
Let A, B ∈ A* be disjoint. Finite additivity follows from
µ*(A + B) = µ*((A + B)A) + µ*((A + B)A^c)   since A ∈ A* with test set A + B
= µ*(A) + µ*(B).   (d)
Trivially, µ*(A) ≥ 0 for all sets A. And µ*(∅) = 0, since ∅, ∅, . . . is a covering of ∅.
Claim 5: A* is a σ-field, and it contains σ[C].
We will show that A ≡ ∪_{n=1}^∞ A_n ∈ A* whenever all A_n ∈ A*; with Claim 3 this makes A* a σ-field containing C, and hence σ[C] ⊂ A*. Together with Claim 6, that µ*|A* is in fact countably additive, this makes (Ω, A*, µ*|A*) a measure space in its own right.
Claim 7: Uniqueness holds when µ is a finite measure.
Let µ_1 and µ_2 denote any two extensions of µ. Let M ≡ {A ∈ σ[C] : µ_1(A) = µ_2(A)} denote the class where they are equal. We will first show that
M is a monotone class.   (h)
Let A_n be monotone in M. Then, by propositions 1.1.4 and 1.1.2,
µ_1(lim A_n) = lim µ_1(A_n) = lim µ_2(A_n) = µ_2(lim A_n),
so that lim A_n ∈ M. Thus (h) holds. Since C ⊂ M, the minimal monotone class result of proposition 1.1.6 implies that σ[C] ⊂ M. Thus µ_1 = µ_2 on σ[C] (and possibly on even more sets than this). Thus the claimed uniqueness holds. [Appeal to proposition 1.1.6 could be replaced by appeal to Dynkin's π-λ theorem of proposition 1.1.5.]
Claim 8: Uniqueness holds when µ is a σ-finite measure (label the sets of the measurable partition as Ω_n).
We must again demonstrate the uniqueness. Fix n. We will consider µ, µ_1, µ_2 on C, on σ[C] ∩ Ω_n, and on σ[C ∩ Ω_n]. Arguing as in Claim 7 within each Ω_n (where µ is finite), and then summing over n via countable additivity, shows that µ_1 = µ_2 on σ[C], completing the proof. □
Question We extended our measure µ from the field C to a collection A* that is at least as big as the σ-field σ[C]. Have we actually gone beyond σ[C]? Can we go further?
Definition 2.2 (Complete measures) Let (Ω, A, µ) denote a measure space. If µ(A) = 0, then A is called a null set. We call (Ω, A, µ) complete if whenever we have B ⊂ (some A) ∈ A with µ(A) = 0, we necessarily also have B ∈ A. [That is, all subsets of sets of measure 0 are required to be measurable.]
Exercise 2.1 (Completion) Let (Ω, A, µ) denote a measure space. Define
Â_µ ≡ {A ∪ N : A ∈ A and N ⊂ (some B) ∈ A having µ(B) = 0}, with µ̂(A ∪ N) ≡ µ(A)   (7)
for all A ∈ A and for all N ⊂ (some B) ∈ A having µ(B) = 0. Show that (Ω, Â_µ, µ̂) is a complete measure space for which µ̂|A = µ. [Note: A proof must include a demonstration that definition (7) leads to a well-defined µ̂. That is, whenever A_1 ∪ N_1 = A_2 ∪ N_2 we must have µ(A_1) = µ(A_2), so that µ̂(A_1 ∪ N_1) = µ̂(A_2 ∪ N_2).]
Definition 2.3 (Lebesgue sets) The completion of Lebesgue measure on (R, B, λ) is still called Lebesgue measure. The resulting completed σ-field B̂_λ of the Borel sets B is called the Lebesgue sets.
Corollary 1 When we complete a measure µ on a σ-field A, this completed measure µ̂ is the unique extension of µ to Â_µ. [It is typical to denote the extension by µ also (rather than µ̂).]
Corollary 2 Thus when we begin with a σ-finite measure µ on a field C, both the extension to A ≡ σ[C] and the further extension to Â_µ ≡ σ̂[C]_µ are unique. Here, we note that all sets in Â_µ = σ̂[C]_µ are in the class A* of µ*-measurable sets.
Proof. Consider corollary 1 first. Let ν denote any extension to Â_µ. We will demonstrate that
ν(A ∪ N) = µ(A) for all A ∈ A and all null sets N   (a)
(that is, ν = µ̂). Assume not. Then there exist sets A ∈ A and N ⊂ (some B) in A with µ(B) = 0 such that ν(A ∪ N) > µ(A) [necessarily, ν(A ∪ N) ≥ ν(A) = µ(A)]. For this A and N we have
µ(A) < ν(A ∪ N) = ν(A ∪ (A^cN))   where A^cN ⊂ A^cB = (a null set)
= ν(A) + ν(A^cN) ≤ ν(A) + ν(B)   since ν is a measure on the completion   (b)
= µ(A) + µ(B)   since ν is an extension of µ.   (c)
Hence µ(B) > 0, which is a contradiction. Thus the extension is unique.
We now turn to corollary 2. Only the final claim needs demonstrating. Suppose A′ is in σ̂[C]_µ. Then A′ = A ∪ N for some A ∈ A and some N satisfying N ⊂ B with µ(B) = 0. Since A* is a σ-field, it suffices to show that any such N is in A*. Since µ* is subadditive and monotone, we have
µ*(T) ≤ µ*(TN) + µ*(TN^c) = µ*(TN^c) ≤ µ*(T),   (d)
because µ*(TN) = 0 follows from using B, ∅, ∅, . . . to cover TN. Thus equality holds in this last equation, showing that N is µ*-measurable. □
Exercise 2.2 Let µ and ν be finite measures on (Ω, A).
(a) Show by example that Â_µ and Â_ν need not be equal.
(b) Prove or disprove: Â_µ = Â_ν if and only if µ and ν have exactly the same sets of measure zero.
(c) Give an example of an LS-measure µ on R for which B̂_µ = 2^R.
Exercise 2.3 (Approximation lemma; Halmos) Let the σ-finite measure µ on the field C be extended to A = σ[C], and also refer to the extension as µ. Then for each A ∈ A (or in Â_µ) such that µ(A) < ∞, and for each ε > 0, we have
µ(A △ C) < ε for some set C ∈ C.   (8)
[Hint: Truncate the sum in (1.2.1) to define C, when A ∈ A.]
Definition 2.4 (Regular measures on metric spaces) Let d denote a metric on Ω, let A denote the Borel sets, and let µ be a measure on (Ω, A). Suppose that for each set A in Â_µ and each ε > 0 there exist an open set O and a closed set C for which both C ⊂ A ⊂ O and µ(O \ C) < ε; when µ(A) < ∞, one then requires that the set C be compact. Then µ is called a regular measure. [Note exercise 1.3.1 below.]
Exercise 2.4 (Nonmeasurable sets) Let Ω consist of the sixteen values 1, . . . , 16. (Think of them arranged in four rows of four values.) Let
C_1 = {1, 2, 3, 4, 5, 6, 7, 8},   C_2 = {9, 10, 11, 12, 13, 14, 15, 16},
C_3 = {1, 2, 5, 6, 9, 10, 13, 14},   C_4 = {3, 4, 7, 8, 11, 12, 15, 16}.
Let C = {C_1, C_2, C_3, C_4}, and let A = σ[C].
(a) Show that A ≡ σ[C] ≠ 2^Ω.
(d) Illustrate proposition 2.2 below in the context of this exercise.
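Part (a) can also be verified by machine. Closing C under complements and unions by brute force (this sketch and its names are ours, not part of the exercise) produces a σ-field with only 16 member sets, far fewer than the 2^16 = 65536 subsets of Ω; in particular, singletons such as {1} are not measurable.

```python
def close_under_ops(omega, gens):
    """Close a class of subsets of a finite omega under complements and
    pairwise unions; for a finite omega this closure equals sigma[C]."""
    omega = frozenset(omega)
    cls = {frozenset(g) for g in gens} | {frozenset(), omega}
    while True:
        new = {omega - a for a in cls} | {a | b for a in cls for b in cls}
        if new <= cls:
            return cls
        cls |= new

OMEGA = range(1, 17)
C1 = {1, 2, 3, 4, 5, 6, 7, 8};    C2 = {9, 10, 11, 12, 13, 14, 15, 16}
C3 = {1, 2, 5, 6, 9, 10, 13, 14}; C4 = {3, 4, 7, 8, 11, 12, 15, 16}
A = close_under_ops(OMEGA, [C1, C2, C3, C4])
print(len(A))               # 16, versus 2**16 = 65536 subsets of Omega
print(frozenset({1}) in A)  # False: the singleton {1} is not measurable
```

The reason is visible in the output: the four intersections C_1C_3, C_1C_4, C_2C_3, C_2C_4 form a partition of Ω into four atoms of size four, and every member of σ[C] is a union of these atoms, giving exactly 2^4 = 16 sets.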
Proposition 2.1 (Not all sets are measurable) Let µ be a measure on A ≡ σ[C], with C a field. If B ∉ Â_µ, then there are infinitely many measures on σ[Â_µ ∪ {B}] that agree with µ on C. [Thus the σ-field Â_µ is as far as we can go with the unique extension process.] (We merely state this observation for reference, without proof.) [To exhibit a subset of R not in B requires the axiom of choice.]
Proposition 2.2 (Not all subsets are Lebesgue sets) There is a subset D of R that is not in B̂_λ.
Proof. Define the equivalence relation ∼ on elements of [0, 1) by x ∼ y if x − y is a rational number. Use the axiom of choice to specify a set D that contains exactly one element from each equivalence class. Now define D_z ≡ {z + x (modulo 1) : x ∈ D} for each rational z in [0, 1), so that [0, 1) = ∪_z D_z represents [0, 1) as a countable union of disjoint sets. Moreover, all D_z must have the same outer measure; call it a. Assume D = D_0 is measurable. But then each D_z is measurable with λ(D_z) = a, so that 1 = λ([0, 1)) = Σ_z λ(D_z), which equals 0 if a = 0 and equals ∞ if a > 0. This contradiction shows that D is not measurable. □
Proposition 2.3 (Not all Lebesgue sets are Borel sets) There necessarily exists a set A ∈ B̂_λ \ B that is a Lebesgue set but not a Borel set.
Proof. This proof follows exercise 7.3.3 below. □
Exercise 2.5 Every subset A of Ω having µ*(A) = 0 is a µ*-measurable set.
Coverings
Earlier in this section we encountered Carathéodory coverings.
Exercise 2.6 (Vitali cover) (a) We say that a family V of intervals I is a Vitali cover of a set D if for each x ∈ D and each ε > 0 there is an interval I ∈ V for which x ∈ I and 0 < λ(I) < ε.
(b) (Vitali covering theorem) Let D ⊂ R be arbitrary, with λ*(D) < ∞. Let V be a Vitali cover of D. Then for each ε > 0 there exists a finite number of pairwise disjoint intervals (I_1, . . . , I_m) in V for which Lebesgue outer measure λ* satisfies
λ*(D \ ∪_{j=1}^m I_j) < ε.   (9)
[Lebesgue measure λ will be formally shown to exist in the next section, and λ* will be discussed more fully.] [Result (9) will be useful in establishing the Lebesgue result that increasing functions on R necessarily have a derivative, except perhaps on a set having Lebesgue measure zero.]
Exercise 2.7 (Heine–Borel) If {U_t : t ∈ T} is an arbitrary collection of open sets that covers a compact subset D of R, then there exists a finite number of them, U_1, . . . , U_m, that also covers D. [We are merely stating this well-known and frequently used result in the disguise of an exercise so that the reader can easily contrast it with the two other ideas of Carathéodory covering and Vitali covering introduced in this chapter.]
At the moment we know only a few measures informally. We now construct the large class of measures that lies at the heart of probability theory.
Definition 3.1 (Lebesgue–Stieltjes measure) A measure µ on the real line R assigning finite values to finite intervals is called a Lebesgue–Stieltjes measure. [The measure µ on (R, 2^R) whose value µ(A) for any set A equals the number of rationals in A is not a Lebesgue–Stieltjes measure.]
Definition 3.2 (gdf) A finite function F on R that is increasing and right-continuous is called a generalized df (to be abbreviated gdf). Then F_−(·) ≡ lim_{y↑·} F(y) denotes the left-continuous version of F. The mass function of F is defined by
∆F(·) ≡ F(·) − F_−(·), while F(a, b] ≡ F(b) − F(a) for all a < b
is called the increment function of F. We identify gdfs having the same increment function. Only one member F of each equivalence class obtained by such identification satisfies F_−(0) = 0, and this F can (and occasionally will) be used as the representative member of the class (also to be called the representative gdf).
Example 3.1 We earlier defined three measures on (R, B) informally.
(a) For Lebesgue measure λ, a gdf is the identity function F(x) = x.
(b) For counting measure, a gdf is the greatest integer function F(x) = [x].
(c) For unit point mass at x_0, a gdf is F(x) = 1_{[x_0,∞)}(x). □
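The correspondence µ((a, b]) = F(b) − F(a) of theorem 3.1 below is easy to sanity-check numerically for the three gdfs of Example 3.1. A small sketch of ours (the names are hypothetical, and the point mass is placed at 0.5 for illustration):

```python
import math

def mu_of_interval(F, a, b):
    """Stieltjes measure of (a, b] from a gdf: mu((a, b]) = F(b) - F(a)."""
    return F(b) - F(a)

identity = lambda x: x          # gdf of Lebesgue measure (length)
floor = math.floor              # greatest integer function: gdf of counting measure
unit_mass = lambda x, x0=0.5: 1.0 if x >= x0 else 0.0  # gdf of point mass at 0.5

print(mu_of_interval(identity, 2.0, 5.5))   # 3.5, the length of (2, 5.5]
print(mu_of_interval(floor, 0.5, 3.5))      # 3: the integers 1, 2, 3 lie in (0.5, 3.5]
print(mu_of_interval(unit_mass, 0.0, 1.0))  # 1.0, since 0.5 lies in (0, 1]
print(mu_of_interval(unit_mass, 0.5, 1.0))  # 0.0, since (0.5, 1] excludes 0.5
```

The last two lines show why the right-continuity convention matters: with F = 1_{[x_0,∞)}, the half-open interval (a, b] picks up the mass at x_0 exactly when a < x_0 ≤ b.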
Theorem 3.1 (Correspondence theorem; Loève) The relationship
µ((a, b]) = F(a, b] for all −∞ ≤ a < b ≤ +∞   (1)
establishes a 1-to-1 correspondence between Lebesgue–Stieltjes measures µ on B and the representative members of the equivalence classes of generalized dfs. [Each such µ extends uniquely to B̂_µ.]
Notation 3.1 We formally establish some notation that will be used throughout. Important classes of sets include:
C_I ≡ {all intervals (a, b], (−∞, b], or (a, +∞) : −∞ < a < b < +∞}.
Proof. Given an LS-measure µ, define the increment function F(a, b] via (1). We clearly have 0 ≤ F(a, b] < ∞ for all finite a, b, and F(a, b] → 0 as b ↓ a, by proposition 1.1.2. Now specify F_−(0) ≡ 0, F(0) ≡ µ({0}), F(b) ≡ F(0) + F(0, b] for b > 0, and F(a) ≡ F(0) − F(a, 0] for a < 0. This F(·) is the representative gdf.
Given a representative gdf, we define µ on the collection I of all finite intervals (a, b] via (1). We will now show that µ is a well-defined and c.a. measure on this collection I.
Nonnegative: µ ≥ 0 for any (a, b], since F is increasing.
Null: µ(∅) = 0, since ∅ = (a, a] and F(a, a] = 0.
Countably additive and well-defined: Suppose I ≡ (a, b] = Σ_{n=1}^∞ I_n ≡ Σ_{n=1}^∞ (a_n, b_n]. We must show that µ(Σ_{n=1}^∞ I_n) = Σ_{n=1}^∞ µ(I_n).
First, we will show that Σ_{n=1}^∞ µ(I_n) ≤ µ(I). Fix n. Then
Σ_{k=1}^n µ(I_k) ≤ µ(I),   (a)
since I_1, . . . , I_n are disjoint subintervals of I and F is increasing. Letting n → ∞ in (a) gives the first claim.
Next, we will show that µ(I) ≤ Σ_{n=1}^∞ µ(I_n). Suppose a < b (the case a = b is trivial, as µ(∅) = 0). Fix ε > 0. For each n ≥ 1, use the right continuity of F to choose an ε_n > 0 so small that
F(b_n, b_n + ε_n] < ε/2^n, and define J_n ≡ (a_n, c_n) ≡ (a_n, b_n + ε_n).   (b)
These open J_n cover the compact interval [a′, b], where a < a′ < b is chosen (by right continuity again) so that F(a, a′] < ε; by the Heine–Borel theorem, finitely many of them already cover [a′, b]. Passing through these intervals one at a time, choose (a_1, c_1) to contain b, choose (a_2, c_2) to contain a_1, choose (a_3, c_3) to contain a_2, . . . ; finally (for some K), choose (a_K, c_K) to contain a′. Summing the F-increments along this chain shows that F(a′, b] ≤ Σ_{n=1}^∞ µ(I_n) + ε, and letting ε ↓ 0 gives the second claim.
Well-defined: If A = Σ_n I_n ∈ C_F with each I_n of type (a, b], then we define µ(A) ≡ Σ_n µ(I_n). If we also have another representation A = Σ_m I′_m of this set, then we must show (where the subscripts m and n could take on either a finite or a countably infinite number of values) that Σ_n µ(I_n) = Σ_m µ(I′_m). But countable additivity on the intervals gives
Σ_n µ(I_n) = Σ_n Σ_m µ(I_n ∩ I′_m) = Σ_m Σ_n µ(I_n ∩ I′_m) = Σ_m µ(I′_m) = µ(A).   (g)
Finally, a measure µ on C_F determines a unique measure on B, as is guaranteed by the Carathéodory extension of theorem 1.2.1. □
Exercise 3.1 Show that all Lebesgue–Stieltjes measures on (R, B) are regular measures.
Probability Measures, Probability Spaces, and DFs
Definition 3.3 (Probability distributions P(·) and dfs F(·))
(a) In probability theory we think of Ω as the set of all possible outcomes of some experiment, and we refer to it as the sample space. The individual points ω in Ω are referred to as the elementary outcomes. The measurable subsets A in the collection A are referred to as events. A measure of interest is now denoted by P; it is called a probability measure, and must satisfy P(Ω) = 1. We refer to P(A) as the probability of A, for each event A in Â_P. The triple (Ω, A, P) (or (Ω, Â_P, P̂), if this is different) is referred to as a probability space.
(b) An increasing and right-continuous function F on R having F(−∞) ≡ lim_{x→−∞} F(x) = 0 and F(+∞) ≡ lim_{x→+∞} F(x) = 1 is called a distribution function (which we will abbreviate as df). [For probability measures, setting F(−∞) = 0 is used to specify the representative df.]
Corollary 1 (The correspondence theorem for dfs) Defining P(·) on all intervals (a, b] via P((a, b]) ≡ F(b) − F(a) for all −∞ ≤ a < b ≤ +∞ establishes a 1-to-1 correspondence between all probability distributions P(·) on (R, B) and all dfs F(·) on R.
Exercise 3.2 Prove the corollary.
Measurable Functions and Convergence
Notation 1.1 (Inverse images) Suppose X denotes a function mapping some set Ω into the extended real line R̄ ≡ R ∪ {±∞}; we denote this by X : Ω → R̄. Let X^+ and X^− denote the positive part and the negative part of X, respectively:
X^+(ω) ≡ X(ω) if X(ω) ≥ 0, and ≡ 0 else;   X^−(ω) ≡ −X(ω) if X(ω) ≤ 0, and ≡ 0 else,   (1)
so that X = X^+ − X^− and |X| = X^+ + X^−.
We also use the following notation:
[X = r] ≡ X^{−1}(r) ≡ {ω : X(ω) = r} for all real r,