The modern theory of discrete state-space Markov chains actually started in the1930s with the work well ahead of its time of Doeblin 1938, 1940, and most of thetheory classification of st
Trang 1Springer Series in Operations Research
and Financial Engineering
Randal Douc · Eric Moulines Pierre Priouret · Philippe Soulier Markov
Chains
Trang 2Springer Series in Operations Research and Financial Engineering
Series editors
Thomas V Mikosch
Sidney I Resnick
Stephen M Robinson
Trang 4Randal Douc • Eric Moulines
Markov Chains
123
Trang 5Springer Series in Operations Research and Financial Engineering
https://doi.org/10.1007/978-3-319-97704-1
Library of Congress Control Number: 2018950197
Mathematics Subject Classification (2010): 60J05, 60-02, 60B10, 60J10, 60J22, 60F05
© Springer Nature Switzerland AG 2018
This work is subject to copyright All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, speci fically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on micro films or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made The publisher remains neutral with regard to jurisdictional claims in published maps and institutional af filiations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Trang 6Markov chains are a class of stochastic processes very commonly used to modelrandom dynamical systems Applications of Markov chains can be found in manyfields, from statistical physics to financial time series Examples of successfulapplications abound Markov chains are routinely used in signal processing andcontrol theory Markov chains for storage and queueing models are at the heart ofmany operational research problems Markov chain Monte Carlo methods and alltheir derivatives play an essential role in computational statistics and Bayesianinference
The modern theory of discrete state-space Markov chains actually started in the1930s with the work well ahead of its time of Doeblin (1938, 1940), and most of thetheory (classification of states, existence of an invariant probability, rates of con-vergence to equilibrium, etc.) was already known by the end of the 1950s Ofcourse, there have been many specialized developments of discrete-state-spaceMarkov chains since then, see for example Levin et al (2009), but these devel-opments are only taught in very specialized courses Many books cover the classicaltheory of discrete-state-space Markov chains, from the most theoretical to the mostpractical With few exceptions, they deal with almost the same concepts and differonly by the level of mathematical sophistication and the organization of the ideas.This book deals with the theory of Markov chains on general state spaces Thefoundations of general state-space Markov chains were laid in the 1940s, especiallyunder the impulse of the Russian school (Yinnik, Yaglom, et al.) A summary
of these early efforts can be found in Doob (1953) During the sixties and theseventies, some very significant results were obtained such as the extension of thenotion of irreducibility, recurrence/transience classification, the existence ofinvariant measures, and limit theorems The books by Orey (1971) and Foguel(1969) summarize these results
Neveu (1972) brought many significant additions to the theory by introducingthe taboo potential a function instead of a set This approach is no longer widelyused today in applied probability and will not be developed in this book (see,however, Chapter4) The taboo potential approach was later expanded in the book
v
Trang 7by Revuz (1975) This latter book contains much more and essentially summarizesall that was known in the mid seventies.
A breakthrough was achieved in the works of Nummelin (1978) and Athreyaand Ney (1978), which introduce the notion of the split chain and embeddedrenewal process These methods allow one to reduce the study to the case ofMarkov chains that possess an atom, that is, a set in which a regeneration occurs.The theory of such chains can be developed in complete analogy with discrete statespace The renewal approach leads to many important results, such as geometricergodicity of recurrent Markov chains (Nummelin and Tweedie 1978; Nummelinand Tuominen 1982, 1983) and limit theorems (central limit theorems, law ofiterated logarithms) This program was completed in the book Nummelin (1984),which contains a considerable number of results but is admittedly difficult to read.This preface would be incomplete if we did not quote Meyn and Tweedie(1993b), referred to as the bible of Markov chains by P Glynn in his prologue tothe second edition of this book (Meyn and Tweedie 2009) Indeed, it must beacknowledged that this book has had a profound impact on the Markov chaincommunity and on the authors Three of us learned the theory of Markov chainsfrom Meyn and Tweedie (1993b), which has therefore shaped and biased ourunderstanding of this topic
Meyn and Tweedie (1993b) quickly became a classic in applied probability and
is praised by both theoretically inclined researchers and practitioners This bookoffers a self-contained introduction to general state-space Markov chains, based onthe split chain and embedded renewal techniques The book recognizes theimportance of Foster–Lyapunov drift criteria to assess recurrence or transience of aset and to obtain bounds for the return time or hitting time to a set It also provides,for positive Markov chains, necessary and sufficient conditions for geometricconvergence to stationarity
The reason we thought it would be useful to write a new book is to survey some
of the developments made during the 25 years that have elapsed since the cation of Meyn and Tweedie (1993b) To save space while remainingself-contained, this also implied presenting the classical theory of generalstate-space Markov chains in a more concise way, eliminating some developmentsthat we thought are more peripheral
publi-Since the publication of Meyn and Tweedie (1993b), thefield of Markov chainshas remained very active New applications have emerged such as Markov chainMonte Carlo (MCMC), which now plays a central role in computational statisticsand applied probability Theoretical development did not lag behind Triggered bythe advent of MCMC algorithms, the topic of quantitative bounds of convergencebecame a central issue Much progress has been achieved in thisfield, using eithercoupling techniques or operator-theoretic methods This is one of the main themes
of several chapters of this book and still an active field of research Meyn andTweedie (1993b) deals only with geometric ergodicity and the associated Foster–Lyapunov drift conditions Many works have been devoted to subgeometric rates ofconvergence to stationarity, following the pioneering paper of Tuominen andTweedie (1994), which appeared shortly after the first version of Meyn and
Trang 8Tweedie (1993b) These results were later sharpened in a series of works of Jarnerand Roberts (2002) and Douc et al (2004a), where a new drift condition wasintroduced There has also been substantial activity on sample paths, limit theorems,and concentration inequalities For example, Maxwell and Woodroofe (2000) andRio (2017) obtained conditions for the central limit theorems for additive functions
of Markov chains that are close to optimal
Meyn and Tweedie (1993b) considered exclusively irreducible Markov chainsand total variation convergence There are, of course, many practically importantsituations in which the irreducibility assumption fails to hold, whereas it is stillpossible to prove the existence of a unique stationary probability and convergence
to stationarity in distances weaker than the total variation This quickly became animportantfield of research
Of course, there are significant omissions in this book, which is already muchlonger than we initially thought it would be We do not cover large deviationstheory for additive functionals of Markov chains despite the recent advances made
in this field in the work of Balaji and Meyn (2000) and Kontoyiannis and Meyn(2005) Similarly, significant progress has been made in the theory of moderatedeviations for additive functionals of Markov chains in a series of Chen (1999),Guillin (2001), Djellout and Guillin (2001), and Chen and Guillin (2004) Theseefforts are not reported in this book We do not address the theory offluid limitintroduced in Dai (1995) and later refined in Dai and Meyn (1995), Dai and Weiss(1996) and Fort et al (2006), despite its importance in analyzing the stability ofMarkov chains and its success in analyzing storage systems (such as networks ofqueues) There are other significant omissions, and in many chapters we wereobliged sometimes to make difficult decisions
The book is divided into four parts In Part I, we give the foundations of Markovchain theory All the results presented in these chapters are very classical There aretwo highlights in this part: Kac’s construction of the invariant probability inChapter3 and the ergodic theorems in Chapter 5 (where we also present a shortproof of Birkhoff’s theorem)
In Part II, we present the core theory of irreducible Markov chains, which is asubset of Meyn and Tweedie (1993b) We use the regeneration approach to derivemost results Our presentation nevertheless differs from that of Meyn and Tweedie(1993b) Wefirst focus on the theory of atomic chains in Chapter6 We show thatthe atoms are either recurrent or transient, establish solidarity properties for atoms,and then discuss the existence of an invariant measure In Chapter7, we apply theseresults to discrete state spaces We would like to stress that this book can be readwithout any prior knowledge of discrete-state-space Markov chains: all the resultsare established as a special case of atomic chains In Chapter8, we present the keyelements of discrete time-renewal theory We use the results obtained fordiscrete-state-space Markov chains to provide a proof of Blackwell and Kendall’stheorems, which are central to discrete-time renewal theory As afirst application,
we obtain a version of Harris’s theorem for atomic Markov chains (based on thefirst-entrance last-exit decomposition) as well as geometric and polynomial rates ofconvergence to stationarity
Trang 9For Markov chains on general state spaces, the existence of an atom is more theexception than the rule The splitting method consists in extending the state space toconstruct a Markov chain that contains the original Markov chain (as its firstmarginal) and has an atom Such a construction requires that one havefirst definedsmall sets and petite sets, which are introduced in Chapter9 We have adopted a
definition of irreducibility that differs from the more common usage This avoidsthe delicate theorem of Jain and Jamison (1967) (which is, however, proved in theappendix of this chapter for completeness but is not used) and allows us to defineirreducibility on arbitrary state spaces (whereas the classical assumption requiresthe use of a countably generatedr-algebra) In Chapter10, we discuss recurrence,Harris recurrence, and transience of general state-space Markov chains In Chapter
11, we present the splitting construction and show how the results obtained in theatomic framework can be translated for general state-space Markov chains The lastchapter of this part, Chapter12, deals with Markov chains on complete separablemetric spaces We introduce the notions of Feller, strong-Feller, andT-chains andshow how the notions of small and petite sets can be related in such cases tocompact sets This is a very short presentation of the theory of Feller chains, whichare treated in much greater detail in Meyn and Tweedie (1993b) and Borovkov(1998)
Thefirst two parts of the book can be used as a text for a one-semester course,providing the essence of the theory of Markov chains but avoiding difficult tech-nical developments The mathematical prerequisites are a course in probability,stochastic processes, and measure theory at no deeper level than, for instance,Billingsley (1986) and Taylor (1997) All the measure-theoretic results that we useare recalled in the appendix with precise references We also occasionally use someresults from martingale theory (mainly the martingale convergence theorem), whichare also recalled in the appendix Familiarity with Williams (1991) or thefirst threechapters of Neveu (1975) is therefore highly recommended We also occasionallyneed some topology and functional analysis results for which we mainly refer to thebooks Royden (1988) and Rudin (1987) Again, the results we use are recalled inthe appendix
Part III presents more advanced results for irreducible Markov chains In Chapter
13, we complement the results that we obtained in Chapter8 for atomic Markovchains In particular, we cover subgeometric rates of convergence The proofspresented in this chapter are partly original In Chapter14we discuss the geometricregularity of a Markov chain and obtain the equivalence of geometric regularitywith a Foster–Lyapunov drift condition We use these results to establish geometricrates of convergence in Chapter 15 We also establish necessary and sufficientconditions for geometric ergodicity These results are already reported in Meyn andTweedie (2009) In Chapter16, we discuss subgeometric regularity and obtain theequivalence of subgeometric regularity with a family of drift conditions Most
of the arguments are taken from Tuominen and Tweedie (1994) We then discussthe more practical subgeometric drift conditions proposed in Douc et al (2004a),which are the counterpart of the Foster–Lyapunov conditions for geometric
Trang 10regularity In Chapter17we discuss the subgeometric rate of convergence to tionarity, using the splitting method.
sta-In the last two chapters of this part, we reestablish the rates of convergence bytwo different types of methods that do not use the splitting technique
In Chapter18 we derive explicit geometric rates of convergence by means ofoperator-theoretic arguments and the fixed-point theorem We introduce the uni-form Doeblin condition and show that it is equivalent to uniform ergodicity, that is,convergence to the invariant distribution at the same geometric rate from everypoint of the state space As a by-product, this result provides an alternative proof
of the existence of an invariant measure for an irreducible recurrent kernel that doesnot use the splitting construction We then prove nonuniform geometric rates ofconvergence by the operator method, using the ideas introduced in Hairer andMattingly (2011)
In the last chapter of this part, Chapter19, we discuss coupling methods thatallow us to easily obtain quantitative convergence results as well as short andelegant proofs of several important results We introduce different notions ofcoupling, starting almost from scratch: exact coupling, distributional coupling, andmaximal coupling This part owes much to the excellent treatises on couplingmethods Lindvall and (1979) and Thorisson (2000), which of course cover muchmore than this chapter We then show how exact coupling allows us to obtainexplicit rates of convergence in the geometric and subgeometric cases The use ofcoupling to obtain geometric rates was introduced in the pioneering work ofRosenthal (1995b) (some improvements were later supplied by Douc et al (2004b)
We also illustrate the use of the exact coupling method to derive subgeometric rates
of convergence; we follow here the work of Douc et al (2006, 2007) Although thecontent of this part is more advanced, part of it can be used in a graduate course onMarkov chains The presentation of the operator-theoretic approach of Hairer andMattingly (2011), which is both useful and simple, is of course a must I also think
it interesting to introduce the coupling methods, because they are both useful andelegant
In Part IV we focus especially on four topics The choice we made was a difficultone, because there have been many new developments in Markov chain theory overthe last two decades There is, therefore, a great deal of arbitrariness in thesechoices and important omissions In Chapter20, we assume that the state space is acomplete separable metric space, but we no longer assume that the Markov chain isirreducible Since it is no longer possible to construct an embedded regenerativeprocess, the techniques of proof are completely different; the essential difference isthat convergence in total variation distance may no longer hold, and it must bereplaced by Wasserstein distances We recall the main properties of these distancesand in particular the duality theorem, which allows us to use coupling methods Wehave essentially followed Hairer et al (2011) in the geometric case and Butkovsky(2014) and Durmus et al (2016) for the subgeometric case However, the methods
of proof and some of the results appear to be original Chapter 21covers centrallimit theorems of additive functions of Markov chains The most direct approach is
to use a martingale decomposition (with a remainder term) of the additive
Trang 11functionals by introducing solutions of the Poisson equation The approach isstraightforward, and Poisson solutions exist under minimal technical assumptions(see Glynn and Meyn 1996), yet this method does not yield conditions close tooptimal Afirst approach to weaken these technical conditions was introduced inKipnis and Varadhan (1985) and further developed by Maxwell and Woodroofe(2000): it keeps the martingale decomposition with remainder but replaces Poisson
by resolvent solutions and uses tightness arguments It yields conditions that arecloser to being sufficient A second approach, due to Gordin and Lifšic (1978) andlater refined by many authors (see Rio 2017), uses another martingale decompo-sition and yields closely related (but nevertheless different) sets of conditions Wealso discuss different expressions for the asymptotic variance, following Häggströmand Rosenthal (2007) In Chapter22, we discuss the spectral property of a MarkovkernelP seen as an operator on an appropriately defined Banach space of complexfunctions and complex measures We study the convergence to the stationarydistribution using the particular structure of the spectrum of this operator; deepresults can be obtained when the Markov kernelP is reversible (i.e., self-adjoint), asshown, for example, in Roberts and Tweedie (2001) and Kontoyiannis and Meyn(2012) We also introduce the notion of conductance and prove geometric con-vergence using conductance thorough Cheeger’s inequalities, following Lawler andSokal (1988) and Jarner and Yuen (2004) Finally, in Chapter 23 we give anintroduction to sub-Gaussian concentration inequalities for Markov chains Wefirstshow how McDiarmid’s inequality can be extended to uniformly ergodic Markovkernels following Rio (2000a) We then discuss the equivalence betweenMcDiarmid-typesub-Gaussian concentration inequalities and geometric ergodicity,using a result established in Dedecker and Gouëzel (2015) We finally obtainextensions of these inequalities for separately Lipschitz functions, followingDjellout et al (2004) and Joulin and Ollivier (2010)
We have chosen to illustrate the main results with simple examples Moresubstantial examples are considered in the exercises at the end of each chapter; thesolutions of a majority of these exercises are provided The reader is invited usethese exercises (which are mostly fairly direct applications of the material) to testtheir understanding of the theory We have selected examples from differentfields,including signal processing and automatic control, time-series analysis and Markovchain Monte Carlo simulation algorithms
We do not cite bibliographical references in the body of the chapters, but wehave added at the end of each chapter bibliographical indications We give precisebibliographical indications for the most recent developments For former results, we
do not necessarily seek to attribute authorship to the original results Meyn andTweedie (1993b) covers in much greater detail the genesis of the earlier works.The authors would like to thank the large number of people who at timescontributed to this book Alain Durmus, Gersende Fort, and François Roueff gave
us valuable advice and helped us to clarify some of the derivations Their butions were essential Christophe Andrieu, Gareth Roberts, Jeffrey Rosenthal andAlexander Veretennikov also deserve special thanks They have been a veryvaluable source of inspiration for years
Trang 12contri-We also benefited from the work of many colleagues who carefully reviewedparts of this book and helped us to correct errors and suggested improvements in thepresentation: Yves Atchadé, David Barrera, Nicolas Brosse, Arnaud Doucet,Sylvain Le Corff, Matthieu Lerasle, Jimmy Olsson, Christian Robert, ClaudeSaint-Cricq, and Amandine Schreck.
We are also very grateful to all the students who for years helped us to polishwhat was at the beginning a set of rough lecture notes Their questions and sug-gestions greatly helped us to improve the presentation and correct errors
13 14 15 16 17 18 19
20 21 22 23 24
Fig 1 Suggestion of playback order with respect to the different chapters of the book The red arrows correspond to a possible path for a reader eager to focus only on the most fundamental results The skipped chapters can then be investigated on a second reading The blue arrows provide a fast track for a proof of the existence of an invariant measure and geometric rates of convergence for irreducible chains without the splitting technique The chapters in the last part of the book are almost independent and can be read in any order.
Trang 13Part I Foundations
1 Markov Chains: Basic Definitions 3
1.1 Markov Chains 3
1.2 Kernels 6
1.3 Homogeneous Markov Chains 12
1.4 Invariant Measures and Stationarity 16
1.5 Reversibility 18
1.6 Markov Kernels on Lp(p) 20
1.7 Exercises 21
1.8 Bibliographical Notes 25
2 Examples of Markov Chains 27
2.1 Random Iterative Functions 27
2.2 Observation-Driven Models 35
2.3 Markov Chain Monte Carlo Algorithms 38
2.4 Exercises 49
2.5 Bibliographical Notes 51
3 Stopping Times and the Strong Markov Property 53
3.1 The Canonical Chain 54
3.2 Stopping Times 58
3.3 The Strong Markov Property 60
3.4 First-Entrance, Last-Exit Decomposition 64
3.5 Accessible and Attractive Sets 66
3.6 Return Times and Invariant Measures 67
3.7 Exercises 73
3.8 Bibliographical Notes 74
xiii
Trang 144 Martingales, Harmonic Functions and Poisson–Dirichlet
Problems 75
4.1 Harmonic and Superharmonic Functions 75
4.2 The Potential Kernel 77
4.3 The Comparison Theorem 81
4.4 The Dirichlet and Poisson Problems 85
4.5 Time-Inhomogeneous Poisson–Dirichlet Problems 88
4.6 Exercises 89
4.7 Bibliographical Notes 95
5 Ergodic Theory for Markov Chains 97
5.1 Dynamical Systems 97
5.2 Markov Chain Ergodicity 104
5.3 Exercises 111
5.4 Bibliographical Notes 115
Part II Irreducible Chains: Basics 6 Atomic Chains 119
6.1 Atoms 119
6.2 Recurrence and Transience 121
6.3 Period of an Atom 126
6.4 Subinvariant and Invariant Measures 128
6.5 Independence of the Excursions 134
6.6 Ratio Limit Theorems 135
6.7 The Central Limit Theorem 137
6.8 Exercises 140
6.9 Bibliographical Notes 144
7 Markov Chains on a Discrete State Space 145
7.1 Irreducibility, Recurrence, and Transience 145
7.2 Invariant Measures, Positive and Null Recurrence 146
7.3 Communication 148
7.4 Period 150
7.5 Drift Conditions for Recurrence and Transience 151
7.6 Convergence to the Invariant Probability 154
7.7 Exercises 159
7.8 Bibliographical Notes 164
8 Convergence of Atomic Markov Chains 165
8.1 Discrete-Time Renewal Theory 165
8.2 Renewal Theory and Atomic Markov Chains 175
8.3 Coupling Inequalities for Atomic Markov Chains 180
8.4 Exercises 187
8.5 Bibliographical Notes 189
Trang 159 Small Sets, Irreducibility, and Aperiodicity 191
9.1 Small Sets 191
9.2 Irreducibility 194
9.3 Periodicity and Aperiodicity 201
9.4 Petite Sets 206
9.5 Exercises 211
9.6 Bibliographical Notes 215
9.A Proof of Theorem 9.2.6 215
10 Transience, Recurrence, and Harris Recurrence 221
10.1 Recurrence and Transience 221
10.2 Harris Recurrence 228
10.3 Exercises 236
10.4 Bibliographical Notes 239
11 Splitting Construction and Invariant Measures 241
11.1 The Splitting Construction 241
11.2 Existence of Invariant Measures 247
11.3 Convergence in Total Variation to the Stationary Distribution 251
11.4 Geometric Convergence in Total Variation Distance 253
11.5 Exercises 258
11.6 Bibliographical Notes 259
11.A Another Proof of the Convergence of Harris Recurrent Kernels 259
12 Feller andT-Kernels 265
12.1 Feller Kernels 265
12.2 T-Kernels 270
12.3 Existence of an Invariant Probability 274
12.4 Topological Recurrence 277
12.5 Exercises 279
12.6 Bibliographical Notes 285
12.A Linear Control Systems 285
Part III Irreducible Chains: Advanced Topics 13 Rates of Convergence for Atomic Markov Chains 289
13.1 Subgeometric Sequences 289
13.2 Coupling Inequalities for Atomic Markov Chains 291
13.3 Rates of Convergence in Total Variation Distance 303
13.4 Rates of Convergence inf -Norm 305
13.5 Exercises 311
13.6 Bibliographical Notes 312
Trang 1614 Geometric Recurrence and Regularity 313
14.1 f -Geometric Recurrence and Drift Conditions 313
14.2 f -Geometric Regularity 321
14.3 f -Geometric Regularity of the Skeletons 327
14.4 f -Geometric Regularity of the Split Kernel 332
14.5 Exercises 334
14.6 Bibliographical Notes 337
15 Geometric Rates of Convergence 339
15.1 Geometric Ergodicity 339
15.2 V-Uniform Geometric Ergodicity 349
15.3 Uniform Ergodicity 353
15.4 Exercises 356
15.5 Bibliographical Notes 358
16 (f , r)-Recurrence and Regularity 361
16.1 (f , r)-Recurrence and Drift Conditions 361
16.2 (f , r)-Regularity 370
16.3 (f , r)-Regularity of the Skeletons 377
16.4 (f , r)-Regularity of the Split Kernel 381
16.5 Exercises 382
16.6 Bibliographical Notes 383
17 Subgeometric Rates of Convergence 385
17.1 (f , r)-Ergodicity 385
17.2 Drift Conditions 392
17.3 Bibliographical Notes 399
17.A Young Functions 399
18 Uniform andV-Geometric Ergodicity by Operator Methods 401
18.1 The Fixed-Point Theorem 401
18.2 Dobrushin Coefficient and Uniform Ergodicity 403
18.3 V-Dobrushin Coefficient 409
18.4 V-Uniformly Geometrically Ergodic Markov Kernel 412
18.5 Application of Uniform Ergodicity to the Existence of an Invariant Measure 415
18.6 Exercises 417
18.7 Bibliographical Notes 419
19 Coupling for Irreducible Kernels 421
19.1 Coupling 422
19.2 The Coupling Inequality 432
19.3 Distributional, Exact, and Maximal Coupling 435
19.4 A Coupling Proof ofV-Geometric Ergodicity 441
19.5 A Coupling Proof of Subgeometric Ergodicity 444
Trang 1719.6 Exercises 449
19.7 Bibliographical Notes 451
Part IV Selected Topics 20 Convergence in the Wasserstein Distance 455
20.1 The Wasserstein Distance 456
20.2 Existence and Uniqueness of the Invariant Probability Measure 462
20.3 Uniform Convergence in the Wasserstein Distance 465
20.4 Nonuniform Geometric Convergence 471
20.5 Subgeometric Rates of Convergence for the Wasserstein Distance 476
20.6 Exercises 480
20.7 Bibliographical Notes 485
20.A Complements on the Wasserstein Distance 486
21 Central Limit Theorems 489
21.1 Preliminaries 490
21.2 The Poisson Equation 495
21.3 The Resolvent Equation 503
21.4 A Martingale Coboundary Decomposition 508
21.5 Exercises 517
21.6 Bibliographical Notes 519
21.A A Covariance Inequality 520
22 Spectral Theory 523
22.1 Spectrum 523
22.2 Geometric and Exponential Convergence in L2(p) 530
22.3 Lp(p)-Exponential Convergence 538
22.4 Cheeger’s Inequality 545
22.5 Variance Bounds for Additive Functionals and the Central Limit Theorem for Reversible Markov Chains 553
22.6 Exercises 560
22.7 Bibliographical Notes 562
22.A Operators on Banach and Hilbert Spaces 563
22.B Spectral Measure 572
23 Concentration Inequalities 575
23.1 Concentration Inequality for Independent Random Variables 576
23.2 Concentration Inequality for Uniformly Ergodic Markov Chains 581
23.3 Sub-Gaussian Concentration Inequalities forV-Geometrically Ergodic Markov Chains 587
Trang 1823.4 Exponential Concentration Inequalities Under Wasserstein
Contraction 594
23.5 Exercises 599
23.6 Bibliographical Notes 601
Appendices 603
A Notations 605
B Topology, Measure and Probability 609
B.1 Topology 609
B.2 Measures 612
B.3 Probability 618
C Weak Convergence 625
C.1 Convergence on Locally Compact Metric Spaces 625
C.2 Tightness 626
D Total and V-Total Variation Distances 629
D.1 Signed Measures 629
D.2 Total Variation Distance 631
D.3 V-Total Variation 635
E Martingales 637
E.1 Generalized Positive Supermartingales 637
E.2 Martingales 638
E.3 Martingale Convergence Theorems 639
E.4 Central Limit Theorems 641
F Mixing Coefficients 645
F.1 Definitions 645
F.2 Properties 646
F.3 Mixing Coefficients of Markov Chains 653
G Solutions to Selected Exercises 657
References 733
Index 753
Trang 19Foundations
Trang 20Chapter 1
Markov Chains: Basic Definitions
Heuristically, a discrete-time stochastic process has the Markov property if the pastand future are independent given the present In this introductory chapter, we givethe formal definition of a Markov chain and of the main objects related to this type
of stochastic process and establish basic results In particular, we will introduce inSection 1.2the essential notion of a Markov kernel, which gives the distribution
of the next state given the current state In Section1.3, we will restrict attention totime-homogeneous Markov chains and establish that a fundamental consequence ofthe Markov property is that the entire distribution of a Markov chain is characterized
by the distribution of its initial state and a Markov kernel In Section1.4, we willintroduce the notion of invariant measures, which play a key role in the study of thelong-term behavior of a Markov chain Finally, in Sections1.5and1.6, which can beskipped on a first reading, we will introduce the notion of reversibility, which is veryconvenient and is satisfied by many Markov chains, and some further properties ofkernels seen as operators and certain spaces of functions
1.1 Markov Chains
Let(Ω,F ,P) be a probability space, (X,X ) a measurable space, and T a set A
family ofX-valued random variables indexed by T is called an X-valued stochastic process indexed by T
Throughout this chapter, we consider only the cases T = N and T = Z.
A filtration of a measurable space(Ω,F ) is an increasing sequence {F k , k ∈ T}
of sub-σ-fields ofF A filtered probability space (Ω, F ,{F k , k ∈ T},P) is a
probability space endowed with a filtration
A stochastic process{X k , k ∈ T} is said to be adapted to the filtration {F k , k ∈
T } if for each k ∈ T, X kisF k-measurable The notation{(X k ,F k ), k ∈ T} will be
used to indicate that the process{X k , k ∈ T} is adapted to the filtration {F k , k ∈ T}.
Theσ-fieldF k can be thought of as the information available at time k Requiring
© Springer Nature Switzerland AG 2018
R Douc et al., Markov Chains, Springer Series in Operations Research
and Financial Engineering, https://doi.org/10.1007/978-3-319-97704-1 1
3
Trang 21the process to be adapted means that the probability of events related to X k can be
computed using solely the information available at time k.
The natural filtration of a stochastic process{X k , k ∈ T} defined on a probability
space(Ω,F ,P) is the filtration {F X
P(X k+1∈ A|F k ) = P(X k+1∈ A|X k) P − a.s. (1.1.1)
Condition (1.1.1) is equivalent to the following condition: for all f ∈ F+(X) ∪
Fb(X),
E [ f (X k+1)|F k ] = E [ f (X k+1)|X k] P − a.s. (1.1.2)Let {G k , k ∈ T} denote another filtration such that for all k ∈ T, G k ⊂ F k If
{(X k ,F k ), k ∈ T} is a Markov chain and {X k , k ∈ T} is adapted to the filtration {G k , k ∈ T}, then {(X k ,G k ), k ∈ T} is also a Markov chain In particular, a Markov
chain is always a Markov chain with respect to its natural filtration
We now give other characterizations of a Markov chain
Theorem 1.1.2 Let (Ω,F ,{F k , k ∈ T},P) be a filtered probability space and {(X k ,F k ), k ∈ T} an adapted stochastic process The following properties are
equivalent.
(i) {(X k ,F k ), k ∈ T} is a Markov chain.
(ii) For every k ∈ T and boundedσ(X j , j ≥ k)-measurable random variable Y,
E [Y|F k ] = E [Y|X k] P − a.s. (1.1.3)
(iii) For every k ∈ T, bounded σ(X j , j ≥ k)-measurable random variable Y, and bounded F X
k -measurable random variable Z,
E [YZ|X k ] = E [Y|X k ]E [Z|X k] P − a.s. (1.1.4)
Proof. (i)⇒(ii) Fix k ∈ T and consider the following property (where F b(X) isthe set of bounded measurable functions):
Trang 221.1 Markov Chains 5
(P n): (1.1.3) holds for all Y = ∏n
j=0g j (X k + j ), where g j ∈ F b (X) for all j ≥ 0 (P0) is true Assume that (P n ) holds and let {g j , j ∈ N} be a sequence of functions
inFb(X) The Markov property (1.1.2) yields
which proves(P n+1) Therefore, (P n ) is true for all n ∈ N.
Consider the set
H =Y ∈σ(X j , j ≥ k) : E [Y|F k ] = E [Y|X k ] P − a.s..
It is easily seen thatH is a vector space In addition, if {Y n , n ∈ N} is an increasing
sequence of nonnegative random variables inH and if Y = lim n→∞ Y nis bounded,then by the monotone convergence theorem for conditional expectations,
E [Y|F k] = limn→∞ E [Y n |F k] = limn→∞ E [Y n |X k ] = E [Y|X k] P − a.s.
By TheoremB.2.4, the spaceH contains allσ(X j , j ≥ k)-measurable random
vari-ables
(ii)⇒(iii) If Y is a boundedσ(X j , j ≥ k)-measurable random variable and Z is
a boundedF k-measurable random variable, an application of(ii)yields
E [YZ|F k ] = ZE [Y|F k ] = ZE [Y|X k] P − a.s.
Trang 23This proves(i).
2
Heuristically, Condition (1.1.4) means that the future of a Markov chain is ditionally independent of its past, given its present state
con-An important caveat must be made; the Markov property is not hereditary If
{(X k ,F k ), k ∈ T} is a Markov chain on X and f is a measurable function from (X,X ) to (Y,Y ), then, unless f is one-to-one, {( f (X k ),F k ),k ∈ T} need not be a
Markov chain In particular, ifX = X1×X2is a product space and{(X k ,F k ), k ∈ T}
is a Markov chain with X k = (X1,k ,X2,k ) then the sequence {(X1,k ,F k ),k ∈ T} may
fail to be a Markov chain
1.2 Kernels
We now introduce Markov kernels, which will be the core of the theory
Definition 1.2.1 Let (X,X ) and (Y,Y ) be two measurable spaces A kernel N on
X × Y is a mapping N : X × Y → [0,∞] satisfying the following conditions:
(i) for every x ∈ X, the mapping N(x,·) : A → N(x, A) is a measure on Y ; (ii) for every A ∈ Y , the mapping N(·,A) : x → N(x,A) is a measurable function from (X,X ) to ([0,∞],B ([0,∞]).
• N is said to be bounded if sup x∈X N (x,Y) < ∞.
• N is called a Markov kernel if N(x,Y) = 1, for all x ∈ X.
• N is said to be sub-Markovian if N(x,Y) ≤ 1, for all x ∈ X.
Example 1.2.2 (Discrete state space kernel) Assume that X and Y are
count-able sets Each element x ∈ X is then called a state A kernel N on X × P(Y),
where P(Y) is the set of all subsets of Y, is a (possibly doubly infinite) matrix
N = (N(x,y) : x,y ∈ X×Y) with nonnegative entries Each row {N(x,y) : y ∈ Y} is
a measure on(Y,P(Y)) defined by
N (x,A) = ∑
y∈A
N (x,y) ,
for A ⊂ Y The matrix N is said to be Markovian if every row {N(x,y) : y ∈ Y} is a
probability on(Y,P(Y)), i.e., ∑ y∈Y N (x,y) = 1 for all x ∈ X The associated kernel
Example 1.2.3 (Measure seen as a kernel) A σ-finite measure ν on a space
(Y,Y ) can be seen as a kernel on X × Y by defining N(x,A) =ν(A) for all x ∈ X and A ∈ Y It is a Markov kernel ifνis a probability measure
Trang 24is a kernel The function n is called the density of the kernel N with respect to the
measure λ The kernel N is Markovian if and only if
Yn (x,y)λ(dy) = 1 for all
Let N be a kernel on X×X and f ∈ F+(Y) A function N f : X → R+is defined
by setting, for x ∈ X,
N f (x) = N(x,dy) f (y) For all functions f ofF(Y) (where F(Y) stands for the set of measurable functions
on(Y,Y )) such that N f+and N f − are not both infinite, we define N f = N f+−
N f − We will also use the notation N (x, f ) for N f (x) and, for A ∈ X , N(x,1 A) or
N1A (x) for N(x,A).
Proposition 1.2.5 Let N be a kernel on X × Y For all f ∈ F+(Y), N f ∈
F+(X) Moreover, if N is a Markov kernel, then |N f |∞≤ | f |∞.
Proof Assume first that f is a simple nonnegative function, i.e., f = ∑i∈Iβi1B i
for a finite collection of nonnegative numbers βi and sets B i ∈ Y Then for x ∈
X, N f (x) = ∑ i∈Iβi N (x,B i), and by property (ii)of Definition 1.2.1, the function
N f is measurable Recall that every function f ∈ F+(X) is a pointwise limit of anincreasing sequence of measurable nonnegative simple functions{ f n , n ∈ N}, i.e.,
limn →∞ ↑ f n (y) = f (y) for all y ∈ Y Then by the monotone convergence theorem, for all x ∈ X,
N f (x) = lim n→∞ N f n (x) Therefore, N f is the pointwise limit of a sequence of nonnegative measurable func- tions, hence is measurable If, moreover, N is a Markov kernel on X × Y and
f ∈ F b (Y), then for all x ∈ X,
N f (x) =
Yf (y)N(x,dy) ≤ | f |∞
YN (x,dy) = | f |∞N (x,Y) = | f |∞.
With a slight abuse of notation, we will use the same symbol N for the kernel and the associated operator N :F+(Y) → F+(X), f → N f This operator is additive and positively homogeneous: for all f ,g ∈ F+(Y) andα∈ R+, one has N ( f + g) =
N f + Ng and N(αf) =αN f The monotone convergence theorem shows that if
Trang 25{ f n , n ∈ N} ⊂ F+(Y) is an increasing sequence of functions, then limn →∞ ↑ N f n=
N(limn →∞ ↑ f n) The following result establishes a converse
Proposition 1.2.6 Let M :F+(Y) → F+(X) be an additive and positively
homogeneous operator such that lim n →∞ M ( f n ) = M(lim n →∞ f n ) for every
increasing sequence { f n , n ∈ N} of functions in F+(Y) Then
(i) the function N defined on X×Y by N(x,A) = M(1 A )(x), x ∈ X, A ∈ Y , is
a kernel;
(ii) M ( f ) = N f for all f ∈ F+(Y).
Proof (i) Since M is additive, for each x ∈ X, the function A → N(x,A) is
addi-tive Indeed, for n ∈ N ∗ and mutually disjoint sets A
1, ,A n ∈ Y , we obtain N
Let{A i , i ∈ N} ⊂ Y be a sequence of mutually disjoints sets Then, by additivity
and the monotone convergence property of M, we get, for all x ∈ X,
This proves that for all x ∈ X, A → N(x,A) is a measure on (Y,Y ) For all A ∈ X ,
x → N(x,A) = M(1 A )(x) belongs to F+(X) Then N is a kernel on X × Y
(ii) If f = ∑i ∈Iβi1B i for a finite collection of nonnegative numbersβi and sets
B i ∈ Y , then the additivity and positive homogeneity of M shows that
M ( f ) =∑i∈Iβi M(1B i) =∑i∈Iβi N1B i = N f Let now f ∈ F+(Y) (where F+(Y) is the set of measurable nonnegative functions)and let{ f n , n ∈ N} be an increasing sequence of nonnegative simple functions such
that limn→∞ f n (y) = f (y) for all y ∈ Y Since M( f ) = lim n→∞ M ( f n) and by the
monotone convergence theorem N f = limn→∞ N f n , we obtain M( f ) = N f
2
Kernels also act on measures Let μ∈ M+(X ), where M+(X ) is the set of
(nonnegative) measures on(X,X ) For A ∈ Y , define
μN (A) =
Xμ(dx) N(x, A)
Trang 261.2 Kernels 9
Proposition 1.2.7 Let N be a kernel on X × Y andμ∈ M+(X ) ThenμN ∈
M+(Y ) If N is a Markov kernel, thenμN(Y) =μ(X).
Proof Note first thatμN (A) ≥ 0 for all A ∈ Y andμN (/0) = 0, since N(x, /0) = 0 for all x ∈ X Therefore, it suffices to establish the countable additivity ofμN Let {A i , i ∈ N} ⊂ Y be a sequence of mutually disjoint sets For all x ∈ X, N(x,·)
is a measure on(Y,Y ); thus the countable additivity implies that N(x, ∞i=1A i) =
∑∞i=1N (x,A i ) Moreover, the function x → N(x,A i) is nonnegative and measurable
for all i ∈ N; thus the monotone convergence theorem yields
Proposition 1.2.8 (Composition of kernels) Let (X,X ), (Y,Y ), (Z,Z ) be
three measurable sets and let M and N be two kernels on X × Y and Y × Z
There exists a kernel on X × Z , called the composition or the product of M
and N, denoted by MN, such that for all x ∈ X, A ∈ Z and f ∈ F+(Z),
MN (x,A) =
Furthermore, the composition of kernels is associative.
Proof The kernels M and N define two additive and positively homogeneous
oper-ators on F+(X) Let ◦ denote the usual composition of operators Then M ◦ N
is positively homogeneous, and for every nondecreasing sequence of functions
{ f n , n ∈ N} in F+(Z), by the monotone convergence theorem, limn→∞ M ◦ N( f n) =limn→∞ M (N f n ) = M ◦N(lim n→∞ f n) Therefore, by Proposition1.2.6, there exists a
kernel, denoted by MN, such that for all x ∈ X and f ∈ F+(Z),
M ◦ N( f )(x) = M(N f )(x) = MN (x,dz) f (z) Hence for all x ∈ X and A ∈ Z , we get
Trang 27In the case of a discrete state space X, a kernel N can be seen as a matrix with
nonnegative entries indexed by X Then the kth power of the kernel N k defined
in (1.2.3) is simply the kth power of the matrix N The Chapman–Kolmogorov tion becomes, for all x ,y ∈ X,
equa-N n +k (x,y) =∑
z∈X
N n (x,z)N k (z,y) (1.2.5)
1.2.2 Tensor Products of Kernels
Proposition 1.2.9 Let (X,X ), (Y,Y ), and (Z,Z ) be three measurable spaces,
and let M be a kernel on X × Y and N a kernel on Y × Z Then there exists
a kernel on X × (Y ⊗ Z ), called the tensor product of M and N, denoted by
M ⊗ N, such that for all f ∈ F+(Y × Z,Y ⊗ Z ),
M ⊗ N f (x) =
YM (x,dy)
Zf (y,z)N(y,dz) (1.2.6)
• If the kernels M and N are both bounded, then M ⊗N is a bounded kernel.
• If M and N are both Markov kernels, then M ⊗ N is a Markov kernel.
• If (U,U ) is a measurable space and P is a kernel on Z × U , then (M ⊗
N ) ⊗ P = M ⊗ (N ⊗ P); i.e., the tensor product of kernels is associative.
Proof Define the mapping I :F+(Y ⊗ Z) → F+(X) by
I f (x) =
YM (x,dy)
Zf (y,z)N(y,dz) The mapping I is additive and positively homogeneous Since I[lim n→∞ f n] =limn→∞ I ( f n ) for every increasing sequence { f n , n ∈ N}, by the monotone conver-
Trang 281.2 Kernels 11gence theorem, Proposition1.2.6shows that (1.2.6) defines a kernel onX × (Y ⊗
Z ) The proofs of the other properties are left as exercises 2
For n ≥ 1, the nth tensorial power P ⊗n of a kernel P on X × Y is the kernel on
(X,X ⊗n) defined by
P ⊗n f (x) =
Xn f (x1, ,x n )P(x,dx1)P(x1,dx2)···P(x n−1 ,dx n ) (1.2.7)
Ifνis aσ-finite measure on(X,X ) and N is a kernel on X × Y , then we can also
define the tensor product of ν and N, denoted byν⊗ N, which is a measure on
(X × Y,X ⊗ Y ) defined by
ν⊗ N(A × B) =
1.2.3 Sampled Kernel, m-Skeleton, and Resolvent
Definition 1.2.10 (Sampled kernel, m-skeleton, resolvent kernel) Let a be a
prob-ability on N, that is, a sequence {a(n), n ∈ N} such that a(n) ≥ 0 for all n ∈ N and
∑∞k=0a (k) = 1 Let P be a Markov kernel on X × X The sampled kernel K a is defined by
K a=∑∞
n=0
(i) For m ∈ N ∗ and a=δm , Kδm = P m is called the m-skeleton.
(ii) Ifε∈ (0,1) and aεis the geometric distribution, i.e.,
aε(n) = (1 −ε)εn , n ∈ N , (1.2.10)
then K aε is called the resolvent kernel.
Let{a(n), n ∈ N} and {b(n), n ∈ N} be two sequences of real numbers We denote
by{a∗b(n), n ∈ N} the convolution of the sequences a and b defined, for n ∈ N, by
a ∗ b(n) = ∑n
k=0
a (k)b(n − k)
Lemma 1.2.11 If a and b are probabilities on N, then the sampled kernels K a and
K b satisfy the generalized Chapman–Kolmogorov equation
Trang 29Proof Applying the definition of the sampled kernel and the Chapman–Kolmogorov
equation (1.2.4) yields (note that all the terms in the sum below are nonnegative)
We can now define the main object of this book Let T = N or T = Z.
Definition 1.3.1 (Homogeneous Markov chain) Let (X,X ) be a measurable
space and let P be a Markov kernel on X × X Let (Ω,F ,{F k , k ∈ T},P) be a filtered probability space An adapted stochastic process {(X k ,F k ), k ∈ T} is called
a homogeneous Markov chain with kernel P if for all A ∈ X and k ∈ T,
P(X k+1∈ A|F k ) = P(X k ,A) P − a.s. (1.3.1)
If T = N, the distribution of X0is called the initial distribution.
Remark 1.3.2 Condition (1.3.1) is equivalent to E [ f (X k+1)|F k ] = P f (X k ) P −
a.s for all f ∈ F+(X) ∪ F b(X)
Remark 1.3.3 Let {(X k ,F k ), k ∈ T} be a homogeneous Markov chain Then
{(X k ,F X
k ), k ∈ T} is also a homogeneous Markov chain Unless specified
other-wise, we will always consider the natural filtration, and we will simply write that
{X k , k ∈ T} is a homogeneous Markov chain.
From now on, unless otherwise specified, we will consider T= N The most tant property of a Markov chain is that its finite-dimensional distributions areentirely determined by the initial distribution and its kernel
impor-Theorem 1.3.4 Let P be a Markov kernel on X × X , andν a probability sure on (X,X ) An X-valued stochastic process {X , k ∈ N} is a homogeneous
Trang 30mea-1.3 Homogeneous Markov Chains 13
Markov chain with kernel P and initial distributionνif and only if the distribution
of (X0, ,X k ) isν⊗ P ⊗k for all k ∈ N.
Proof Fix k ≥ 0 Let H kbe the subspaceFb(Xk+1,X ⊗(k+1)) of measurable
func-tions f such that
E[ f (X0, ,X k)] =ν⊗ P ⊗k ( f ) (1.3.2)Let{ f n , n ∈ N} be an increasing sequence of nonnegative functions in H ksuch thatlimn→∞ f n = f with f bounded By the monotone convergence theorem, f belongs
toH k By TheoremB.2.4, the proof will be concluded if we moreover check that
H kcontains the functions of the form
tion and the direct part of the proof
Conversely, assume that (1.3.2) holds This obviously implies thatν is the
dis-tribution of X0 We must prove that for each k ≥ 1, f ∈ F+(X), and each F X
k−1measurable random variable Y ,
-E[ f (X k )Y] = E[P f (X k −1 )Y] (1.3.4)LetG kbe the set ofF X
k −1 -measurable random variables Y satisfying (1.3.4) Then
G k is a vector space, and if{Y n , n ∈ N} is an increasing sequence of nonnegative
random variables such that Y = limn→∞ Y n is bounded, then Y ∈ G k by the tone convergence theorem The property (1.3.2) implies (1.3.4) for Y= ∏k−1
mono-i=0 f i (X i),
where for j ≥ 0, f j ∈ F b(X) The proof is concluded as previously by applying
Trang 31Corollary 1.3.5 Let P be a Markov kernel on X × X and letν be a bility measure on (X,X ) Let {X k , k ∈ N} be a homogeneous Markov chain
proba-on X with kernel P and initial distributionν Then for all n,k ≥ 0, the bution of (X n , ,X n +k ) isνP n ⊗ P ⊗k , and for all n,m,k ≥ 0 and all bounded measurable functions f defined onXk ,
sequence, i.e., X k+1= f (X k ,Z k+1), where {Z k , k ∈ N} is a sequence of i.i.d random
variables with values in a measurable space(Z,Z ), X0is independent of{Z k , k ∈
N}, and f is a measurable function from (X × Z,X ⊗ Z ) into (X,X ).
This can be easily proved for a real-valued Markov chain{X k , k ∈ N} with initial
distributionν and Markov kernel P Let X be a real-valued random variable and let F(x) = P(X ≤ x) be the cumulative distribution function of X Let F −1 be thequantile function, defined as the generalized inverse of F by
F −1 (u) = inf{x ∈ R : F(x) ≥ u} (1.3.5)
The right continuity of F implies that u ≤ F(x) ⇔ F −1 (u) ≤ x Therefore, if Z is
uniformly distributed on [0,1], then F −1 (Z) has the same distribution as X, since P(F −1 (Z) ≤ t) = P(Z ≤ F(t)) = F(t) = P(X ≤ t).
Define F0(t) =ν((−∞,t]) and g = F −1
0 Consider the function F from R × R to [0,1] defined by F(x,x ) = P(x,(−∞,x ]) Then for each x ∈ R, F(x,·) is a cumula- tive distribution function Let the associated quantile function f (x,·) be defined by
f (x,u) = infx ∈ R : F(x,x ) ≥ u. (1.3.6)The function(x,u) → f (x,u) is Borel measurable, since (x,x ) → F(x,x ) is itself a
Borel measurable function If Z is uniformly distributed on [0,1], then for all x ∈ R and A ∈ B(R), we obtain
P( f (x,Z) ∈ A) = P(x,A)
Let{Z k , k ∈ N} be a sequence of i.i.d random variables, uniformly distributed on
[0,1] Define a sequence of random variables {X k , k ∈ N} by X0= g(Z0), and for
k ≥ 0,
X k+1= f (X k ,Z k+1)
Trang 321.3 Homogeneous Markov Chains 15Then{X k , k ∈ N} is a Markov chain with Markov kernel P and initial distributionν.
We state without proof a general result for reference only, since it will not beneeded in the sequel
Theorem 1.3.6 Let (X,X ) be a measurable space and assume that X is countably
generated Let P be a Markov kernel on X×X and letνbe a probability on (X,X ).
Let {Z k , k ∈ N} be a sequence of i.i.d random variables uniformly distributed on
[0,1] There exist a measurable function g from ([0,1],B([0,1])) to (X,X ) and
a measurable function f from (X × [0,1],X ⊗ B([0,1])) to (X,X ) such that the
sequence {X k , k ∈ N} defined by X0= g(Z0) and X k+1= f (X k ,Z k+1) for k ≥ 0 is a
Markov chain with initial distributionνand Markov kernel P.
From now on, we shall deal almost exclusively with homogeneous Markovchains, and for simplicity, we shall omit mentioning “homogeneous” in the state-ments
Definition 1.3.7 (Markov chain of order p) Let p ≥ 1 be an integer Let (X,X ) be
a measurable space Let(Ω,F ,{F k , k ∈ N},P) be a filtered probability space An adapted stochastic process {(X k ,F k ), k ∈ N} is called a Markov chain of order p
if the process {(X k , ,X k +p−1 ),k ∈ N} is a Markov chain with values in X p
Let{X k , k ∈ N} be a Markov chain of order p ≥ 2 and let K pbe the kernel of thechain{X k , k ∈ N} with X k = (X k , ,X k +p−1), that is,
PX1∈ A1× ··· × A pX0= (x0, ,x p−1)
= K p ((x0, ,x p−1 ),A1× ··· × A p )
Since X0and X1have p − 1 common components, the kernel K phas a particular
form More precisely, defining the kernel K onXp × X by
We thus see that an equivalent definition of a homogeneous Markov chain of order p
is the existence of a kernel K onXp × X such that for all n ≥ 0,
E X n +p ∈ A F X
n +p−1 = K((X n , ,X n +p−1 ),A)
Trang 331.4 Invariant Measures and Stationarity
Definition 1.4.1 (Invariant measure) Let P be a Markov kernel on X × X
• A nonzero measureμis said to be subinvariant ifμisσ-finite andμP ≤μ.
• A nonzero measureμis said to be invariant if it isσ-finite andμP=μ.
• A nonzero signed measureμis said to be invariant ifμP=μ.
A Markov kernel P is said to be positive if it admits an invariant probability measure.
A Markov kernel may admit one or more than one invariant measure, or none if
X is not finite Consider the kernel P on N such that P(x,x + 1) = 1 Then P does
not admit an invariant measure Considered as a kernel onZ, P admits the counting measure as its unique invariant measure The kernel P on Z such that P(x,x+2) = 1
admits two invariant measures with disjoint supports: the counting measure on theeven integers and the counting measure on the odd integers
It must be noted that an invariant measure is σ-finite by definition Consider
again the kernel P defined by P(x,x + 1) = 1, now as a kernel on R The counting
measure onR satisfiesμP=μ, but it is notσ-finite We will provide in Section3.6
a criterion that ensures that a measureμthat satisfiesμ=μP isσ-finite
If an invariant measure is finite, it may be normalized to an invariant probabilitymeasure The fundamental role of an invariant probability measure is illustrated
by the following result Recall that a stochastic process{X k , k ∈ N} defined on a
probability space(Ω,F ,P) is said to be stationary if for all integers k, p ≥ 0, the
distribution of the random vector(X k , ,X k +p ) does not depend on k.
Theorem 1.4.2 Let(Ω,F ,{F k , k ∈ N},P) be a filtered probability space and let P
be a Markov kernel on a measurable space (X,X ).AMarkovchain{(X k ,F k ), k ∈ N}
defined on(Ω,F ,{F k , k ∈ N},P) with kernel P is a stationary process if and only if its initial distribution is invariant with respect to P.
Proof Let π denote the initial distribution If the chain {X k } is stationary, then
the marginal distribution is constant In particular, the distribution of X1 is equal
to the distribution of X0, which means precisely thatπP=π Thusπ is invariant.Conversely, ifπP=π, thenπP h=πfor all h ≥ 1 Then for all integers h and n, by
Corollary1.3.5, the distribution of(X h , ,X n +h) isπP h ⊗ P ⊗n=π⊗ P ⊗n. 2
For a finite signed measureξ on(X,X ), we denote byξ+andξ−the positive
and negative parts ofξ (see TheoremD.1.3) Recall thatξ+andξ−are two
mutu-ally singular measures such thatξ=ξ+−ξ− A set S such thatξ+(S c) =ξ− (S) = 0
is called a Jordan set forξ
Trang 341.4 Invariant Measures and Stationarity 17
Lemma 1.4.3 Let P be a Markov kernel andλ an invariant signed measure Then
Since P(x,X) = 1 for all x ∈ X, it follows that λ+ (X) =λ+(X) This and the
Definition 1.4.4 (Absorbing set) A set B ∈ X is called absorbing if P(x,B)=1 for all x ∈ B.
This definition subsumes that the empty set is absorbing Of course, the ing absorbing sets are nonempty
interest-Proposition 1.4.5 Let P be a Markov kernel on X×X admitting an invariant
probability measureπ If B ∈ X is an absorbing set, thenπB=π(B ∩ ·) is
an invariant finite measure Moreover, if the invariant probability measure is unique, thenπ(B) ∈ {0,1}.
Proof Let B be an absorbing set Using thatπB ≤π,πP=π, and B is absorbing,
we get that for all C ∈ X ,
πB P (C) =πB P (C ∩ B) +πB P (C ∩ B c ) ≤πP (C ∩ B) +πB P (B c) =π(C ∩ B) =πB (C) ReplacingC byC cand noting thatπB P(X) =πB (X) < ∞, it follows thatπBis an invari-
ant finite measure To complete the proof, assume that P has a unique invariant
prob-ability measure Ifπ(B) > 0, thenπB /π(B) is an invariant probability measure and is
therefore equal toπ SinceπB (B c) = 0, we getπ(B c) = 0 Thusπ(B) ∈ {0,1} 2
Theorem 1.4.6 Let P be a Markov kernel on X × X Then
(i) The set of invariant probability measures for P is a convex subset ofM+(X ).
(ii) Foreverytwodistinctinvariantprobabilitymeasuresπ,π forP,thefinitemeasures
(π−π)+and(π−π)− are nontrivial, mutually singular, and invariant for P.
Trang 35Proof (i) P is an additive and positively homogeneous operator on M+(X ).
Therefore, ifπ,π are two invariant probability measures for P, then for every scalar
a ∈ [0,1], using first the linearity and then the invariance,
(aπ+ (1 − a)π )P = aπP + (1 − a)π P = aπ+ (1 − a)π
(ii) We apply Lemma1.4.3to the nonzero signed measureλ=π−π The
mea-sures(π−π)+and(π−π)−are singular, invariant, and nontrivial, since
(π−π)+(X) = (π−π)−(X) =1
2|π−π |(X) > 0
2
We will see in the forthcoming chapters that it is sometimes more convenient
to study one iterate P k of a Markov kernel than P itself However, if P kadmits an
invariant probability measure, then so does P.
Lemma 1.4.7 Let P be a Markov kernel For every k ≥ 1, P k admits an invariant probability measure if and only if P admits an invariant probability measure Proof Ifπ is invariant for P, then it is obviously invariant for P k for every k ≥ 1.
Conversely, if ˜π is invariant for P k, set π= k −1∑k−1 i=0π˜P i Thenπ is an invariant
probability measure for P Indeed, since ˜π= ˜πP k, we obtain
ξ⊗ P(A × B) =ξ⊗ P(B × A) , (1.5.1)
whereξ⊗ P is defined in (1.2.8).
Equivalently, reversibility means that for all bounded measurable functions f defined
on(X × X,X ⊗ X ),
Trang 361.5 Reversibility 19
X×Xξ(dx)P(x,dx ) f (x,x ) =
X×Xξ(dx)P(x,dx ) f (x ,x) (1.5.2)
If X is a countable state space, a (finite orσ-finite) measureξ is reversible with
respect to P if and only if for all (x,x ) ∈ X × X,
ξ(x)P(x,x ) =ξ(x )P(x ,x) , (1.5.3)
a condition often referred to as the detailed balance condition
If {X k , k ∈ N} is a Markov chain with kernel P and initial distributionξ, thereversibility condition (1.5.1) means precisely that(X0,X1) and (X1,X0) have the
same distribution, i.e., for all f ∈ F b (X × X,X ⊗ X ),
Eξ[ f (X0,X1)] = ξ(dx0)P(x0,dx1) f (x0,x1) (1.5.4)
= ξ(dx0)P(x0,dx1) f (x1,x0) = Eξ[ f (X1,X0)] This implies in particular that the distribution of X1is the same as that of X0, andthis means thatξ is P-invariant: reversibility implies invariance This property can
be extended to all finite-dimensional distributions
Proposition 1.5.2 Let P be a Markov kernel on X × X andξ ∈ M1(X ),
whereM1(X ) is the set of probability measures on X Ifξ is reversible with respect to P, then
(i)ξ is P-invariant;
(ii) the homogeneous Markov chain {X k , k ∈ N} with Markov kernel P and initial distribution ξ is reversible, i.e., for all n ∈ N, (X0, ,X n ) and (X n , ,X0) have the same distribution.
Proof. (i) Using (1.5.1) with A = X and B ∈ X, we get

ξP(B) = ξ ⊗ P(X × B) = ξ ⊗ P(B × X) = ∫_X ξ(dx) 1_B(x) P(x, X) = ξ(B) .

(ii) The proof is by induction. For n = 1, (1.5.4) shows that (X_0, X_1) and (X_1, X_0) have the same distribution. Assume that such is the case for some n ≥ 1. By the Markov property, X_0 and (X_2, …, X_{n+1}) are conditionally independent given X_1, and X_{n+1} and (X_{n−1}, …, X_0) are conditionally independent given X_n. Moreover, by stationarity and reversibility, (X_{n+1}, X_n) has the same distribution as (X_0, X_1), and by the induction assumption, (X_1, …, X_{n+1}) and (X_n, …, X_0) have the same distribution. This proves that (X_0, …, X_{n+1}) and (X_{n+1}, …, X_0) have the same distribution. □
1.6 Markov Kernels on L^p(π)
Let (X, X) be a measurable space and π ∈ M_1(X). For p ∈ [1, ∞) and f a measurable function on (X, X), we set

‖f‖_{L^p(π)} = ( ∫_X |f|^p dπ )^{1/p} ,

and we denote by L^p(π) the set of measurable functions f for which ‖f‖_{L^p(π)} < ∞.
Remark 1.6.1. The maps ‖·‖_{L^p(π)} are not norms but only seminorms, since ‖f‖_{L^p(π)} = 0 implies π(f = 0) = 1 but not f ≡ 0. Define the relation ∼_π by f ∼_π g if and only if π(f ≠ g) = 0. Then the quotient spaces L^p(π)/∼_π are Banach spaces, but the elements of these spaces are no longer functions, but equivalence classes of functions. For the sake of simplicity, as is customary, this distinction will be tacitly understood, and we will identify L^p(π) with its quotient by the relation ∼_π and treat the elements of L^p(π) as functions.
If f ∈ L^p(π) and g ∈ L^q(π), with 1/p + 1/q = 1, then fg ∈ L^1(π), since by Hölder's inequality,

‖fg‖_{L^1(π)} ≤ ‖f‖_{L^p(π)} ‖g‖_{L^q(π)} . (1.6.1)
Lemma 1.6.2. Let P be a Markov kernel on X × X that admits an invariant probability measure π.

(i) Let f, g ∈ F_+(X) ∪ F_b(X). If f = g π-a.e., then Pf = Pg π-a.e.
(ii) Let p ∈ [1, ∞) and f ∈ F_+(X) ∪ F_b(X). If f ∈ L^p(π), then Pf ∈ L^p(π) and ‖Pf‖_{L^p(π)} ≤ ‖f‖_{L^p(π)}.

Proof. (i) Write N = {x ∈ X : f(x) ≠ g(x)}. By assumption, π(N) = 0, and since ∫_X π(dx) P(x, N) = π(N) = 0, it is also the case that P(x, N) = 0 for all x in a subset X_0 such that π(X_0) = 1. Then for all x ∈ X_0, we have

Pf(x) = ∫ P(x, dy) f(y) = ∫_{N^c} P(x, dy) f(y) = ∫_{N^c} P(x, dy) g(y) = ∫ P(x, dy) g(y) = Pg(x) .

This proves (i).
(ii) Applying Jensen's inequality and then Fubini's theorem, we obtain

π(|Pf|^p) = ∫_X | ∫_X f(y) P(x, dy) |^p π(dx) ≤ ∫_X ∫_X |f(y)|^p P(x, dy) π(dx) = π(|f|^p) . □
The next proposition then allows us to consider P as a bounded linear operator on the spaces L^p(π), where p ∈ [1, ∞].

Proposition 1.6.3. Let P be a Markov kernel on X × X with invariant probability measure π. For every p ∈ [1, ∞], P can be extended to a bounded linear operator on L^p(π), and

‖P‖_{L^p(π)} = 1 . (1.6.2)
Proof. For f ∈ L^1(π), define

A_f = {x ∈ X : P|f|(x) < ∞} = {x ∈ X : f ∈ L^1(P(x, ·))} . (1.6.3)

Since π(P|f|) = π(|f|) < ∞, we have π(A_f) = 1, and we may therefore define Pf on the whole space X by setting

Pf(x) = 1_{A_f}(x) ∫_X P(x, dy) f(y) . (1.6.4)

By Lemma 1.6.2, Pf thus defined belongs to the space L^1(π). It is easily seen that for all f, g ∈ L^1(π) and t ∈ R, P(tf) = tPf and P(f + g) = Pf + Pg, and we have just shown that ‖Pf‖_{L^1(π)} ≤ ‖f‖_{L^1(π)} < ∞. Therefore, the relation (1.6.4) defines a bounded operator on the Banach space L^1(π).
Let p ∈ [1, ∞) and f ∈ L^p(π). Then f ∈ L^1(π), and thus we can define Pf. Applying Lemma 1.6.2 (ii) to |f| proves that ‖P‖_{L^p(π)} ≤ 1 for p < ∞. For f ∈ L^∞(π), one has ‖f‖_{L^∞(π)} = lim_{p→∞} ‖f‖_{L^p(π)}, and so ‖Pf‖_{L^∞(π)} ≤ ‖f‖_{L^∞(π)}; thus it is also the case that ‖P‖_{L^∞(π)} ≤ 1. Finally, since P1_X = 1_X, the norm is attained, and ‖P‖_{L^p(π)} = 1 for every p ∈ [1, ∞]. □
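In matrix terms, the contraction property behind Proposition 1.6.3 reads ∑_x π(x)|(Pf)(x)|^p ≤ ∑_x π(x)|f(x)|^p for a row-stochastic matrix P with invariant probability π, with equality attained at constant functions. A minimal numerical sketch (an added illustration, assuming NumPy; the kernel and the test function f are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Row-stochastic kernel and its invariant probability pi.
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

def lp_norm(f, p):
    # ||f||_{L^p(pi)} = (sum_x pi(x) |f(x)|^p)^{1/p}
    return (pi @ np.abs(f) ** p) ** (1 / p)

f = rng.standard_normal(4)
for p in (1, 2, 4):
    assert lp_norm(P @ f, p) <= lp_norm(f, p) + 1e-12  # contraction

# The norm is attained: P maps the constant function 1 to itself.
assert np.allclose(P @ np.ones(4), np.ones(4))
```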
1.7 Exercises
1.1. Let (X, X) be a measurable space, μ a σ-finite measure, and n : X × X → R_+ a nonnegative function. For x ∈ X and A ∈ X, define N(x, A) = ∫_A n(x, y) μ(dy). Show that for every k ∈ N*, the kernel N^k has a density with respect to μ.
1.2. Let {Z_n, n ∈ N} be an i.i.d. sequence of random variables independent of X_0. Define recursively X_n = φX_{n−1} + Z_n.

1. Show that {X_n, n ∈ N} defines a time-homogeneous Markov chain.
2. Write its Markov kernel in the cases (i) Z_1 is a Bernoulli random variable with probability of success 1/2 and (ii) the law of Z_1 has density q with respect to the Lebesgue measure.
3. Assume that Z_1 is Gaussian with zero mean and variance σ² and that X_0 is Gaussian with zero mean and variance σ_0². Compute the law of X_k for every k ∈ N. Show that if |φ| < 1, then there exists at least one invariant probability measure (see the simulation sketch below).
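As a numerical complement to question 3 (an added sketch with arbitrary parameter values, assuming NumPy): when |φ| < 1, iterating the recursion drives the law of X_k toward N(0, σ²/(1 − φ²)), which one can check is invariant for this Gaussian chain.

```python
import numpy as np

rng = np.random.default_rng(3)
phi, sigma, n_paths, n_steps = 0.8, 1.0, 100_000, 200

# Simulate many independent copies of X_k = phi * X_{k-1} + Z_k, X_0 = 0.
x = np.zeros(n_paths)
for _ in range(n_steps):
    x = phi * x + sigma * rng.standard_normal(n_paths)

# Empirical variance vs. the invariant variance sigma^2 / (1 - phi^2).
print(x.var(), sigma**2 / (1 - phi**2))  # both close to 2.78
```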
1.3. Let (Ω, F, P) be a probability space and {Z_k, k ∈ N*} an i.i.d. sequence of real-valued random variables defined on (Ω, F, P). Let U be a real-valued random variable independent of {Z_k, k ∈ N*} and consider the sequence defined recursively by X_0 = U and, for k ≥ 1, X_k = X_{k−1} + Z_k.

1. Show that {X_k, k ∈ N} is a homogeneous Markov chain.

Assume that the law of Z_1 has a density with respect to the Lebesgue measure.

2. Show that the kernel of this Markov chain has a density.

Consider now the sequence defined by Y_0 = U_+ and, for k ≥ 1, Y_k = (Y_{k−1} + Z_k)_+.

3. Show that {Y_k, k ∈ N} is a Markov chain.
4. Write the associated kernel (see the simulation sketch below).
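The contrast between questions 1–2 (the free random walk) and questions 3–4 (the walk reflected at zero, the Lindley recursion of queueing theory) shows up clearly in simulation. A short sketch (an added illustration, assuming NumPy; the increment law, a Gaussian with negative mean so that the reflected walk is stable, and the choice U = 0 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n_steps = 10_000

# Increments Z_k with negative mean, so the reflected walk {Y_k} is stable.
z = rng.standard_normal(n_steps) - 0.5

# X_k = X_{k-1} + Z_k drifts to -infinity; Y_k = (Y_{k-1} + Z_k)_+ does not.
x = np.cumsum(z)
y = np.zeros(n_steps + 1)
for k, zk in enumerate(z):
    y[k + 1] = max(y[k] + zk, 0.0)

print(x[-1], y.mean())  # X_n is far below 0; Y_k keeps fluctuating near 0
```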
1.4. In Section 1.2.3, the sampled kernel was introduced. We will see in this exercise how this kernel is related to a Markov chain sampled at random time instants. Let (Ω_0, F, {F_n, n ∈ N}, P) be a filtered probability space and {(X_n, F_n), n ∈ N} a homogeneous Markov chain with Markov kernel P and initial distribution ν ∈ M_1(X). Let (Ω_1, G, Q) be a probability space and {Z_n, n ∈ N*} a sequence of independent and identically distributed (i.i.d.) integer-valued random variables distributed according to a = {a(k), k ∈ N}, i.e., for every n ∈ N* and k ∈ N, Q(Z_n = k) = a(k). Set S_0 = 0, and for n ≥ 1, define recursively S_n = S_{n−1} + Z_n. Put Ω = Ω_0 × Ω_1, H = F ⊗ G, and for every n ∈ N,

H_n = σ(A × {S_j = k}, A ∈ F_k, k ∈ N, j ≤ n) .

1. Show that {H_n, n ∈ N} is a filtration.

Put P̄ = P ⊗ Q and consider the filtered probability space (Ω, H, {H_n, n ∈ N}, P̄), where H = ⋁_{n=0}^∞ H_n. For every n ∈ N, set Y_n = X_{S_n}.

2. Show that for every k, n ∈ N, f ∈ F_+(X), and A ∈ F_k,

Ē[1_{A×{S_n=k}} f(Y_{n+1})] = Ē[1_{A×{S_n=k}} K_a f(Y_n)] ,

where K_a is the sampled kernel defined in Definition 1.2.10.

3. Show that {(Y_n, H_n), n ∈ N} is a homogeneous Markov chain with initial distribution ν and transition kernel K_a (see the simulation sketch below).
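One way to build intuition for question 3 is to simulate both sides on a finite state space: run the chain {X_n}, subsample it at the random times S_n, and compare the empirical one-step transitions of Y_n = X_{S_n} with the sampled kernel K_a = ∑_k a(k) P^k. A rough sketch (an added illustration, assuming NumPy; the 3-state kernel and the distribution a on {0, 1, 2, 3} are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# A 3-state kernel P and a sampling distribution a on {0, 1, 2, 3}.
P = rng.random((3, 3))
P /= P.sum(axis=1, keepdims=True)
a = np.array([0.1, 0.4, 0.3, 0.2])

# Sampled kernel K_a = sum_k a(k) P^k.
K = sum(a[k] * np.linalg.matrix_power(P, k) for k in range(len(a)))

def step(x):
    # One step of Y: draw Z ~ a, then take Z steps of the original chain.
    for _ in range(rng.choice(len(a), p=a)):
        x = rng.choice(3, p=P[x])
    return x

counts = np.zeros((3, 3))
y = 0
for _ in range(200_000):
    y_next = step(y)
    counts[y, y_next] += 1
    y = y_next

# Empirical transition frequencies of {Y_n} should approach K_a.
print(np.abs(counts / counts.sum(axis=1, keepdims=True) - K).max())
```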
1.5. Let (X, X) be a measurable space, μ ∈ M_+(X) a σ-finite measure, and p ∈ F_+(X², X^⊗2) a positive function (p(x, y) > 0 for all (x, y) ∈ X × X) such that for all x ∈ X, ∫_X p(x, y) μ(dy) = 1. For all x ∈ X and A ∈ X, set P(x, A) = ∫_A p(x, y) μ(dy).

1. Let π be an invariant probability measure. Show that for all f ∈ F_+(X), π(f) = ∫_X f(y) q(y) μ(dy) with q(y) = ∫_X p(x, y) π(dx).
2. Deduce that every invariant probability measure is equivalent to μ.
3. Show that P admits at most one invariant probability measure. [Hint: use Theorem 1.4.6 (ii).]
1.6. Let P be a Markov kernel on X × X. Let π be an invariant probability measure and X_1 ⊂ X with π(X_1) = 1. We will show that there exists B ⊂ X_1 such that π(B) = 1 and P(x, B) = 1 for all x ∈ B (i.e., B is absorbing for P).

1. Show that there exists a decreasing sequence {X_i, i ≥ 1} of sets X_i ∈ X such that π(X_i) = 1 for all i = 1, 2, …, and P(x, X_i) = 1 for all x ∈ X_{i+1}.
2. Define B = ⋂_{i=1}^∞ X_i ∈ X. Show that B is not empty.
3. Show that B is absorbing and conclude.
1.7. Consider a Markov chain whose state space X = (0, 1) is the open unit interval. If the chain is at x, then pick one of the two intervals (0, x) and (x, 1) with equal probability 1/2 and move to a point y according to the uniform distribution on the chosen interval. Formally, let {U_k, k ∈ N} be a sequence of i.i.d. random variables uniformly distributed on (0, 1); let {ε_k, k ∈ N} be a sequence of i.i.d. Bernoulli random variables with probability of success 1/2, independent of {U_k, k ∈ N}; and
let X_0 be independent of {(U_k, ε_k), k ∈ N} with distribution ξ on (0, 1). Define the chain by

X_k = ε_k U_k X_{k−1} + (1 − ε_k){X_{k−1} + (1 − X_{k−1})U_k} , k ≥ 1 .

1. Show that any invariant probability measure of this chain has a density with respect to Lebesgue measure, which will be denoted by p.
2. Show that p must satisfy the following equation (a simulation sketch follows):

2p(y) = ∫_0^y p(x)/(1 − x) dx + ∫_y^1 p(x)/x dx , y ∈ (0, 1) .
... homogeneous Markov chain {X k , k ∈ N} with Markov kernel P and initial distribution ξ is reversible, i.e., for all n ∈ N, (X0, ,X n ) and (X... be tacitlyunderstood, and we will identify Lp(π) and its quotient by the relation πand treatIf f ∈ L p(π) and g ∈ L q(π),... time-homogeneous Markov chain.
2 Write its Markov kernel in the cases (i) Z1is a Bernoulli random variable withprobability of success 1/2 and (ii) the law
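For intuition, one can simulate the chain and compare its empirical distribution function with that of the arcsine law: the density y ↦ 1/(π√(y(1 − y))) can be checked to satisfy the balance equation above, and the simulation suggests it is indeed the invariant law. A minimal sketch (an added illustration, assuming NumPy; the comparison points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
n_steps = 500_000

# X_k = eps*U*X_{k-1} + (1 - eps)*(X_{k-1} + (1 - X_{k-1})*U)
x = 0.5
samples = np.empty(n_steps)
for k in range(n_steps):
    u, eps = rng.random(), rng.random() < 0.5
    x = u * x if eps else x + (1 - x) * u
    samples[k] = x

# Empirical CDF vs. arcsine CDF F(y) = (2 / pi) * arcsin(sqrt(y)).
for y in (0.1, 0.3, 0.5, 0.7, 0.9):
    print((samples <= y).mean(), 2 / np.pi * np.arcsin(np.sqrt(y)))
```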