CHAPTER 9
Limit theorems
9.1 The early limit theorems
The term ‘limit theorems’ refers to several theorems in probability theory under the generic names ‘law of large numbers’ (LLN) and ‘central limit theorem’ (CLT). These limit theorems constitute one of the most important and elegant chapters of probability theory and play a crucial role in statistical inference. The origins of these theorems go back to the seventeenth-century result proved by James Bernoulli.
Bernoulli's theorem
Let $S_n$ be the number of occurrences of an event $A$ in $n$ independent trials of a random experiment $\mathscr{E}$, and let $p = P(A)$ be the probability of occurrence of $A$ in each of the trials. Then for any $\varepsilon > 0$

$$\lim_{n\to\infty} \Pr\left(\left|\frac{S_n}{n} - p\right| < \varepsilon\right) = 1, \qquad (9.1)$$

i.e. the probability of the event $|(S_n/n) - p| < \varepsilon$ approaches one as the number of trials goes to infinity.
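To see the theorem ‘in action’, the following simulation (a minimal sketch in Python with numpy, illustrative only and not part of the original text; the values of $p$ and $\varepsilon$ are arbitrary choices) estimates the probability in (9.1) for a growing number of trials:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.3      # the probability P(A) of the event A (an arbitrary choice)
eps = 0.05   # the epsilon of (9.1)

for n in [10, 100, 1_000, 10_000]:
    reps = 5_000                                 # replications of the n-trial experiment
    S_n = rng.binomial(n, p, size=reps)          # S_n = number of occurrences of A
    prob = np.mean(np.abs(S_n / n - p) < eps)    # estimate of Pr(|S_n/n - p| < eps)
    print(f"n = {n:6d}:  Pr(|S_n/n - p| < {eps}) = {prob:.4f}")
```

As $n$ grows the estimated probability climbs towards one, exactly as (9.1) asserts.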
Shortly after the publication of Bernoulli’s result, De Moivre and Laplace, in their attempt to provide an easier way to calculate binomial probabilities, proved that when $[(S_n/n) - p]$ is multiplied by a factor equal to the inverse of its standard error the resulting quantity has a distribution which approaches the normal as $n \to \infty$, i.e.

$$\lim_{n\to\infty} \Pr\left(\frac{(S_n/n) - p}{\sqrt{p(1-p)/n}} \le z\right) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}u^2\}\, du. \qquad (9.2)$$
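A short simulation can also illustrate (9.2): standardise the relative frequency and compare its empirical distribution function with the normal one. This is a hedged sketch (Python/numpy; the sample sizes and grid of $z$-values are my own choices, not from the text):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(seed=0)
p, n, reps = 0.3, 2_000, 100_000

S_n = rng.binomial(n, p, size=reps)
Z = (S_n / n - p) / sqrt(p * (1 - p) / n)          # the standardised quantity in (9.2)

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal DF
for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"z = {z:+.1f}:  empirical {np.mean(Z <= z):.4f}   Phi(z) {Phi(z):.4f}")
```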
These two results gave rise to a voluminous literature related to the various ramifications and extensions of the Bernoulli and De Moivre–Laplace theorems, known today as ‘the’ LLN and ‘the’ CLT respectively. The purpose of this chapter is to consider some of the extensions of the Bernoulli and De Moivre–Laplace results. In the discussion which follows emphasis is placed on the intuitive understanding of the conclusions as well as the crucial assumptions underlying the various limit theorems. The discussion is semi-historical in a conscious attempt to motivate the various extensions and the weakening of the underlying assumptions giving rise to the results.
The main conditions underlying the Bernoulli and De Moivre–Laplace results are the following:
(LT1) $S_n = \sum_{i=1}^{n} X_i$, that is, $S_n$ is defined as the sum of $n$ random variables (r.v.’s);
(LT2) $X_i = 1$ if $A$ occurs and $X_i = 0$ otherwise, $i = 1, 2, \ldots, n$, i.e. the $X_i$’s are Bernoulli r.v.’s and hence $S_n$ is a binomially distributed r.v.;
(LT3) $X_1, X_2, \ldots, X_n$ are independent r.v.’s;
(LT4) $f(x_1) = f(x_2) = \cdots = f(x_n)$, i.e. $X_1, X_2, \ldots, X_n$ are identically distributed, with $\Pr(X_i = 1) = p$, $\Pr(X_i = 0) = 1 - p$ for $i = 1, 2, \ldots, n$;
(LT5) $E(S_n/n) = p$, i.e. we consider the event defined by the difference between a r.v. and its expected value.
The main difference between the Bernoulli and De Moivre–Laplace theorems lies in their notion of convergence, the former referring to the convergence of the probability associated with the sequence of events $|(S_n/n) - p| < \varepsilon$, and the latter to the convergence of the probability associated with a very specific sequence of events, that is, events of the form $(Z_n \le z)$ which define the distribution function $F(z)$. In order to discriminate between them we call the former ‘convergence in probability’ and the latter ‘convergence in distribution’.
Definition 1
A sequence of r.v.’s $\{Y_n, n \ge 1\}$ is said to converge in probability to a r.v. (or constant) $Y$ if for every $\varepsilon > 0$

$$\lim_{n\to\infty} \Pr(|Y_n - Y| < \varepsilon) = 1.$$

We denote this by $Y_n \xrightarrow{P} Y$.
Definition 2
A sequence of r.v.’s $\{Y_n, n \ge 1\}$ with distribution functions $\{F_n(y), n \ge 1\}$ is said to converge in distribution to a r.v. $Y$ with distribution function $F(y)$ if

$$\lim_{n\to\infty} F_n(y) = F(y)$$

at all points of continuity of $F(y)$; denoted by $Y_n \xrightarrow{D} Y$.
It should be emphasised that neither of the above types of convergence tells us anything about any convergence of the sequence $\{Y_n\}$ to $Y$ in the sense used in mathematical analysis, such as: for each $\varepsilon > 0$ and $s \in S$, there exists an $N = N(\varepsilon, s)$ such that

$$|Y_n(s) - Y(s)| < \varepsilon \quad \text{for all } n \ge N(\varepsilon, s).$$
Both convergence types refer only to convergence of probabilities or functions associated with probabilities. On the other hand, the definition of a r.v. has nothing to do with probabilities, and the above convergence of $Y_n$ to $Y$ on $S$ is convergence of real-valued functions defined on $S$. The type of stochastic convergence which comes closer to the above mathematical convergence is known as ‘almost sure’ convergence.
Definition 3
A sequence of r.v.’s $\{Y_n, n \ge 1\}$ converges to $Y$ (a r.v. or a constant) almost surely (or with probability one) if

$$\Pr\left(\lim_{n\to\infty} Y_n = Y\right) = 1; \quad \text{denoted by } Y_n \xrightarrow{a.s.} Y, \qquad (9.6)$$

or, equivalently, if for any $\varepsilon > 0$

$$\lim_{n\to\infty} \Pr\left(\sup_{m \ge n} |Y_m - Y| > \varepsilon\right) = 0.$$
This is a much stronger mode of convergence than either convergence in probability or convergence in distribution. For a more extensive discussion of these modes of convergence and their interrelationships see Chapter 10.
The limit theorems associated with almost sure convergence are appropriately called ‘strong laws of large numbers’ (SLLN). The term is used to emphasise the distinction with the ‘weak law of large numbers’ (WLLN) associated with convergence in probability.
In the next section the law of large numbers is used as an example of the developments the various limit theorems have undergone since Bernoulli. For this reason the discussion is intentionally rather long, in an attempt to motivate a deeper understanding of the crucial assumptions giving rise to all the limit theorems considered in the sequel.
9.2 The law of large numbers
(1) The weak law of large numbers (WLLN)
Early in the nineteenth century Poisson realised that the condition LT4, asserting identical distributions for $X_1, \ldots, X_n$, was not necessary for the result to go through.
Poisson's theorem
Let $\{X_n, n \ge 1\}$ be a sequence of independent Bernoulli r.v.’s with $\Pr(X_i = 1) = p_i$ and $\Pr(X_i = 0) = 1 - p_i$, $i = 1, 2, \ldots, n$; then, for any $\varepsilon > 0$,

$$\lim_{n\to\infty} \Pr\left(\left|\frac{S_n}{n} - \frac{1}{n}\sum_{i=1}^{n} p_i\right| < \varepsilon\right) = 1.$$
The important breakthrough in relation to the WLLN was made by Chebyshev, who realised that not only LT4 but also LT2 was unnecessary for the result to follow. That is, the fact that $X_1, \ldots, X_n$ were Bernoulli r.v.’s was not contributing to the result in any essential way. What was crucially important was the fact that we considered the summation of $n$ r.v.’s to form $S_n = \sum_{i=1}^{n} X_i$ and compared it with its mean.
Chebyshev’s theorem
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.’s such that $E(X_i) = \mu_i$, $\operatorname{Var}(X_i) = \sigma_i^2 < c < \infty$, $i = 1, 2, \ldots, n$; then for any $\varepsilon > 0$,

$$\lim_{n\to\infty} \Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n}\mu_i\right| < \varepsilon\right) = 1.$$
In order to see how these conditions ensure the result, let us prove Chebyshev’s theorem.

Proof: Since the $X_i$’s are independent,

$$\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2 < \frac{c}{n}.$$

Using Chebyshev’s inequality for $(1/n)\sum_{i=1}^{n} X_i$ we get

$$\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n}\mu_i\right| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2 n^2}\sum_{i=1}^{n}\sigma_i^2,$$

and the result follows since

$$\lim_{n\to\infty}\frac{1}{\varepsilon^2 n^2}\sum_{i=1}^{n}\sigma_i^2 \le \lim_{n\to\infty}\frac{c}{\varepsilon^2 n} = 0.$$
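The proof rests on just two facts: the variance of the sample mean decays like $1/n$, and Chebyshev’s inequality turns this into a probability bound. The sketch below (Python/numpy; the normal distribution and the particular $\mu_i$, $\sigma_i$ are illustrative assumptions, not from the text) checks both numerically for independent but non-identically distributed r.v.’s:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
c, eps = 4.0, 0.05   # c bounds the variances, as in Chebyshev's theorem

for n in [100, 1_000, 10_000]:
    reps = 2_000
    mu = np.linspace(0.0, 1.0, n)                  # heterogeneous means mu_i
    sigma = rng.uniform(0.5, np.sqrt(c), n)        # sigma_i^2 < c for every i
    X = rng.normal(mu, sigma, size=(reps, n))      # independent, non-identical
    dev = np.abs(X.mean(axis=1) - mu.mean())       # |(1/n) sum X_i - (1/n) sum mu_i|
    print(f"n = {n:6d}:  Pr(dev >= {eps}) = {np.mean(dev >= eps):.4f}   "
          f"bound c/(n*eps^2) = {c / (n * eps**2):.2f}")
```

The Chebyshev bound is crude (it can exceed one for small $n$), but both it and the empirical probability shrink towards zero as $n$ grows.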
Markov, a student of Chebyshev’s, noticed that in the proof of Chebyshev’s theorem the fact that $X_1, X_2, \ldots, X_n$ are independent played only a minor role, in enabling us to deduce that $\operatorname{Var}(S_n/n) = (1/n^2)\sum_{i=1}^{n}\sigma_i^2$. The above proof goes through provided that $(1/n^2)\operatorname{Var}(\sum_{i=1}^{n} X_i) \to 0$ as $n \to \infty$. Since

$$\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n}\operatorname{Var}(X_i) + \sum_{i \ne j}\sum \operatorname{Cov}(X_i, X_j),\dagger \qquad (9.10)$$

we need to assume that $\operatorname{Var}(\sum_{i=1}^{n} X_i)$ is of smaller order of magnitude (see Chapter 10) than $n^2$ for the result to follow. Hence LT3 is not a crucial condition.
Markov’s theorem
Let $\{X_n, n \ge 1\}$ be a sequence of r.v.’s such that

$$\lim_{n\to\infty}\frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = 0;$$

then, for any $\varepsilon > 0$,

$$\lim_{n\to\infty}\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n} E(X_i)\right| < \varepsilon\right) = 1.$$
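Markov’s condition admits dependent sequences, so long as the covariances do not accumulate too fast. As an illustration (a sketch, not from the original text; the AR(1) model and its parameters are my own choices), a stationary autoregressive process has geometrically decaying covariances, so $\operatorname{Var}(S_n) = O(n) = o(n^2)$ and the WLLN still applies:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
phi = 0.8   # AR(1) coefficient; Cov(X_t, X_s) decays like phi^|t-s|

def ar1_sample_means(n, reps):
    """Sample means of `reps` independent stationary AR(1) paths with E(X_t) = 0."""
    e = rng.normal(0.0, 1.0, size=(reps, n))
    x = np.empty((reps, n))
    x[:, 0] = e[:, 0] / np.sqrt(1.0 - phi**2)   # draw the stationary start
    for t in range(1, n):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x.mean(axis=1)

for n in [100, 1_000, 10_000]:
    m = ar1_sample_means(n, reps=500)
    print(f"n = {n:6d}:  Pr(|sample mean| >= 0.1) = {np.mean(np.abs(m) >= 0.1):.3f}")
```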
Khinchin, a student of Markov’s, realised that, in the case of independent and identically distributed (IID) r.v.’s, Markov’s condition was not a necessary condition. In fact, in the IID case no restriction on the nature of the variances is needed.
Khinchin’s theorem
Let $\{X_n, n \ge 1\}$ be a sequence of IID r.v.’s; then the existence of $E(X_i) = \mu$ for all $i$ is sufficient to imply that for any $\varepsilon > 0$

$$\lim_{n\to\infty}\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| < \varepsilon\right) = 1.$$
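Note what Khinchin’s theorem does not require: a finite variance. A quick check (a hedged sketch; the Pareto distribution with tail index 1.5 is my illustrative choice) uses IID r.v.’s whose mean exists but whose variance is infinite; the sample mean still settles down, albeit slowly:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
alpha = 1.5                           # E(X) = alpha/(alpha - 1) = 3 exists,
true_mean = alpha / (alpha - 1.0)     # but Var(X) is infinite for alpha <= 2

for n in [10**3, 10**5, 10**7]:
    X = rng.pareto(alpha, size=n) + 1.0    # classical Pareto on [1, infinity)
    print(f"n = {n:8d}:  sample mean = {X.mean():.4f}   (E(X) = {true_mean:.4f})")
```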
Kolmogorov (1926) settled the issue by providing both necessary as well as sufficient conditions for the WLLN.
Kolmogorov’s theorem 1
The sequence of r.v.’s $\{X_n, n \ge 1\}$ obeys the WLLN if and only if

$$\lim_{n\to\infty} E\left\{\frac{Z_n^2}{1+Z_n^2}\right\} = 0, \quad \text{where } Z_n = \frac{1}{n}\left[S_n - E(S_n)\right].$$
† There are $n^2$ terms in (9.10), and thus if all of them are bounded, $\operatorname{Var}(\sum_{i=1}^{n} X_i)$ can be at most of the same order as $n^2$.
(2) The strong law of large numbers (SLLN)
The first result relating to the almost sure convergence of $S_n/n$ in the case of Bernoulli distributed r.v.’s was proved by Borel in 1909.
Borel’s theorem
Let $\{X_n\}$ be a sequence of IID Bernoulli r.v.’s with $\Pr(X_i = 1) = p$ and $\Pr(X_i = 0) = 1 - p$ for all $i$; then

$$\Pr\left(\lim_{n\to\infty}\frac{S_n}{n} = p\right) = 1, \quad \text{i.e. } \frac{S_n}{n} \xrightarrow{a.s.} p.$$
In other words, the event defined by $\{s: \lim_{n\to\infty} [S_n(s)]/n = p,\ s \in S\}$ has probability one, $S$ being the sample space. An equivalent way to express this is

$$\lim_{n\to\infty}\Pr\left(\max_{m \ge n}\left|\frac{S_m}{m} - p\right| < \varepsilon\right) = 1.$$

This brings out the relationship between the SLLN and the WLLN, since the former refers to the simultaneous realisation of the inequalities $|S_m/m - p| < \varepsilon$ for all $m \ge n$, and

$$\left|\frac{S_n}{n} - p\right| \le \max_{m \ge n}\left|\frac{S_m}{m} - p\right|.$$

This implies that ‘$\xrightarrow{a.s.}$’ entails ‘$\xrightarrow{P}$’.
Kolmogorov, by replacing the Markov condition

$$\lim_{n\to\infty}\frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = 0 \qquad (9.18)$$

for the WLLN in the case of independent r.v.’s with the stronger condition

$$\sum_{k=1}^{\infty}\frac{\operatorname{Var}(X_k)}{k^2} < \infty, \qquad (9.19)$$

proved the first SLLN for a general sequence of independent r.v.’s.
Kolmogorov’s theorem 2
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.’s such that $E(X_i)$ and $\operatorname{Var}(X_i)$ exist for all $i = 1, 2, \ldots$; then, if they satisfy condition (9.19), we can deduce that

$$\Pr\left(\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\left[X_i - E(X_i)\right] = 0\right) = 1.$$
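Condition (9.19) allows the individual variances to grow, provided they are summable against $k^2$. The single-path simulation below (a sketch, not from the original text; the choice $\operatorname{Var}(X_k) = \sqrt{k}$ is mine, made so that $\sum_k \operatorname{Var}(X_k)/k^2 < \infty$) traces one realisation of the centred average, which should settle at zero:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
N = 200_000
k = np.arange(1, N + 1)
sigma = k**0.25                  # Var(X_k) = sqrt(k); sum Var(X_k)/k^2 converges
X = rng.normal(0.0, sigma)       # independent, E(X_k) = 0, growing variances

partial_means = np.cumsum(X) / k      # one path of (1/n) * sum_{i<=n} X_i
for n in [10**3, 10**4, 10**5, N]:
    print(f"n = {n:7d}:  (1/n) S_n = {partial_means[n - 1]:+.5f}")
```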
This SLLN is analogous to Chebyshev’s WLLN, and in the same way we can prove it using an inequality. The inequality used in this context is
Kolmogorov’s inequality: if $X_1, X_2, \ldots, X_n$ are independent r.v.’s such that $\operatorname{Var}(X_i) = \sigma_i^2 < \infty$, $i = 1, 2, \ldots, n$, then for any $\varepsilon > 0$

$$\Pr\left(\max_{1\le k\le n}\left|\sum_{i=1}^{k}\left[X_i - E(X_i)\right]\right| \ge \varepsilon\right) \le \frac{1}{\varepsilon^2}\sum_{i=1}^{n}\sigma_i^2.$$
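The inequality bounds the maximum of all $n$ partial sums at the price of only the total variance, which is what makes almost sure statements possible. The simulation below (an illustrative sketch; the distributions and constants are my own choices) compares the two sides:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n, reps, eps = 50, 20_000, 15.0
sigma = rng.uniform(0.5, 1.5, n)                  # heterogeneous sigma_i

X = rng.normal(0.0, sigma, size=(reps, n))        # centred independent r.v.'s
max_abs_partial = np.max(np.abs(np.cumsum(X, axis=1)), axis=1)

lhs = np.mean(max_abs_partial >= eps)             # Pr(max_k |S_k| >= eps)
rhs = np.sum(sigma**2) / eps**2                   # (1/eps^2) * sum sigma_i^2
print(f"empirical probability {lhs:.4f}  <=  Kolmogorov bound {rhs:.4f}")
```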
Kolmogorov went on to prove that in the case where $\{X_n, n \ge 1\}$ is a sequence of IID r.v.’s such that $E|X_i| < \infty$, the truncated r.v.’s $X_k^* = X_k \cdot \mathbf{1}(|X_k| \le k)$ satisfy

$$\sum_{k=1}^{\infty}\frac{\operatorname{Var}(X_k^*)}{k^2} < \infty,$$

which implies that for such a sequence the existence of the expectation is a necessary as well as sufficient condition for the SLLN.
Having argued that some of the conditions of the Bernoulli theorem did not contribute (in any essential way) to the result, the question that arises naturally is: ‘what are the important elements giving rise to the “law of large numbers” (SLLN, WLLN)?’ The Markov condition (9.18) for the WLLN and Kolmogorov’s condition (9.19) for the SLLN hold the key to the answer of this question. It is clear from these two conditions that the most important ingredient is the restriction on the variance of the partial sums $S_n$; that is, we need $\operatorname{Var}(S_n)$ to increase at most as quickly as $n$. More formally, we need $\operatorname{Var}(S_n)$ to be at most of order $n$, and we write $\operatorname{Var}(S_n) = O(n)$. In order to see this let us consider some of the cases discussed above.
In the IID case, if $\operatorname{Var}(X_i) = \sigma^2$ for all $i$, then $\operatorname{Var}(S_n) = n\sigma^2 = O(n)$. In the case of independent r.v.’s with $\operatorname{Var}(X_i) = \sigma_i^2 < c < \infty$, $i = 1, 2, \ldots$, then

$$\operatorname{Var}(S_n) = \sum_{i=1}^{n}\sigma_i^2 = O(n). \qquad (9.22)$$
Moreover, the Markov condition can be written as $\operatorname{Var}(S_n) = o(n^2)$, where small ‘o’ reads ‘of smaller order than’; this achieves the same effect, since $\operatorname{Var}(S_n) = O(n) \Rightarrow \operatorname{Var}(S_n) = o(n^2)$ (see Chapter 10). The Kolmogorov condition is a more restrictive form of the Markov condition, requiring the variance of the partial sums to be uniformly of at most order $n$. This being the case, it becomes obvious that the conditions LT3 and LT4, assuming independent and identically distributed r.v.’s, are not fundamental ingredients. Indeed, if we drop the identically distributed condition altogether and weaken independence to martingale orthogonality, the above limit theorems go through with minor modifications. We say that a sequence of r.v.’s $\{X_n, n \ge 1\}$ is martingale orthogonal if $E(X_n/\sigma(X_{n-1}, \ldots, X_1)) = 0$, $n > 1$. It should come as no surprise to learn that both important tools in proving the WLLN and SLLN, the Chebyshev and Kolmogorov inequalities, hold true for orthogonal r.v.’s. This enables us to prove the WLLN and SLLN under much weaker conditions than the ones discussed above. The most useful of these results are the ones related to martingales, because they can be seen as direct extensions of the ‘independent’ case and the results are general enough to cover most types of dependencies we are interested in.
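A martingale orthogonal sequence can be strongly dependent and yet still average out to zero. The sketch below (Python/numpy; the ARCH-type recursion is my illustrative choice, not the book’s) builds $X_n = \varepsilon_n\sqrt{a_0 + a_1 X_{n-1}^2}$, for which $E(X_n/\sigma(X_{n-1}, \ldots, X_1)) = 0$ even though the $X_n$’s are dependent, and tracks the sample mean:

```python
import numpy as np

rng = np.random.default_rng(seed=6)

def arch1_mean(n, a0=0.2, a1=0.5):
    """Sample mean of a dependent martingale-orthogonal (ARCH-type) sequence."""
    eps = rng.normal(0.0, 1.0, n)
    x = np.empty(n)
    x[0] = eps[0] * np.sqrt(a0 / (1.0 - a1))      # start near stationarity
    for t in range(1, n):
        x[t] = eps[t] * np.sqrt(a0 + a1 * x[t - 1] ** 2)   # E(x_t | past) = 0
    return x.mean()

for n in [10**3, 10**4, 10**5]:
    print(f"n = {n:6d}:  sample mean = {arch1_mean(n):+.5f}")
```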
(3) The law of large numbers for martingales
Let $\{S_n, \mathscr{F}_n, n \in \mathbb{N}\}$ be a martingale such that $E(S_n) = 0$ for all $n$, and define $Y_n = S_n - S_{n-1}$, $n \ge 1$ ($S_0 = 0$). As discussed in Section 8.4, if $S_n$ defines a martingale with respect to $\mathscr{F}_n$, then by construction $Y_n$ defines an orthogonal process, and thus, assuming a bounded variance for $Y_n$, the above limit theorems can go through with minor modifications.
WLLN for martingales
Let $\{X_n, n \ge 1\}$ be a sequence of r.v.’s defined with respect to the increasing sequence of $\sigma$-fields $\{\mathscr{F}_n, n \ge 1\}$ such that $E(|X_n|) < \infty$ and $\Pr(|X_n| > x) \le c\Pr(|X| > x)$ for $x > 0$ and $n \ge 1$, $c$ a constant (i.e. all the $X_n$’s are bounded by some r.v. $X$). Then

$$\frac{1}{n} S_n \xrightarrow{P} 0,$$

where $S_n = \sum_{i=1}^{n} Y_i$, with $Y_i = X_i - E(X_i/\mathscr{F}_{i-1})$, is a martingale with respect to $\mathscr{F}_n$, $n \ge 1$. An equivalent way to state the WLLN is

$$\lim_{n\to\infty}\Pr\left(\left|\frac{1}{n}\sum_{i=1}^{n}\left[X_i - E(X_i/\mathscr{F}_{i-1})\right]\right| \ge \varepsilon\right) = 0 \quad \text{for any } \varepsilon > 0.$$
SLLN for martingales
For the martingale $\{X_n, \mathscr{F}_n, n \ge 1\}$ satisfying the assumptions of the WLLN, if the sequences $\{X_n, n \ge 1\}$ and $\{E(X_n/\mathscr{F}_{n-1}), n \ge 1\}$ are stationary, then

$$\frac{1}{n}\sum_{i=1}^{n}\left[X_i - E(X_i/\mathscr{F}_{i-1})\right] \xrightarrow{a.s.} 0. \qquad (9.26)$$
This result shows clearly how the assumption of stationarity of $\{X_n, n \ge 1\}$ and $\{E(X_n/\mathscr{F}_{n-1}), n \ge 1\}$ (see Chapter 8) can strengthen the WLLN result to that of the SLLN.
The above discussion suggests that the most important ingredients of the Bernoulli theorem are that:
(i) we consider the probabilistic behaviour of centred r.v.’s of the form $Z_n = S_n - np = \sum_{i=1}^{n}[X_i - E(X_i)]$;
(ii) $\operatorname{Var}(S_n) = O(n)$; and
(iii) for $Y_i = X_i - E(X_i)$, the sequence $\{Y_n, n \ge 1\}$ is a martingale difference, i.e. $E(Y_n/\sigma(Y_{n-1}, \ldots, Y_1)) = 0$, $n \ge 1$.
This suggests that martingales provide a very convenient framework for these limit theorems because by definition they are r.v.’s with respect to an increasing sequence of $\sigma$-fields, and under some general conditions they converge to some r.v. as $n \to \infty$; the latter is of great importance when convergence to a non-degenerate r.v. is needed. Moreover, for any martingale sequence $\{X_n, \mathscr{F}_n, n \ge 1\}$ the martingale differences sequence $\{Y_n, n \ge 1\}$ defines a martingale orthogonal sequence of r.v.’s which can help us ensure (ii) above.
Remark: the SLLN is sometimes credited with providing a mathematical foundation for the frequency approach to probability. This is, however, erroneous, because the definition is rendered circular given that we need a notion of probability to define the SLLN in the first place.
9.3 The central limit theorem
As with the WLLN and SLLN, it was realised that LT2 was not contributing in any essential way to the De Moivre–Laplace theorem, and the literature considered sequences of r.v.’s with restrictions on the first few moments. Let $\{X_n, n \ge 1\}$ be a sequence of r.v.’s and $S_n = \sum_{i=1}^{n} X_i$; the CLT considers the limiting behaviour of

$$Y_n = \frac{S_n - E(S_n)}{\sqrt{\operatorname{Var}(S_n)}},$$

which is a normalised version of $S_n - E(S_n)$, the subject matter of the WLLN and SLLN.
Lindeberg–Levy theorem
Let $\{X_n, n \ge 1\}$ be a sequence of IID r.v.’s such that $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2 < \infty$ for all $i$. Then, for $F_n(y)$ the DF of $Y_n$,

$$\lim_{n\to\infty} F_n(y) = \lim_{n\to\infty}\Pr(Y_n \le y) = \int_{-\infty}^{y}\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}u^2\}\, du. \qquad (9.28)$$
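To visualise the Lindeberg–Levy theorem numerically, one can standardise sums of a markedly non-normal distribution and watch the empirical DF approach the normal DF. A hedged sketch (exponential r.v.’s and the evaluation grid are my own choices, not from the text):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(seed=7)
reps, mu, sigma = 50_000, 1.0, 1.0        # exponential(1): mean 1, variance 1
Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))

for n in [5, 50, 500]:
    X = rng.exponential(1.0, size=(reps, n))
    Y = (X.sum(axis=1) - n * mu) / (sigma * sqrt(n))   # the Y_n behind (9.28)
    grid = np.linspace(-3.0, 3.0, 61)
    gap = max(abs(np.mean(Y <= z) - Phi(z)) for z in grid)
    print(f"n = {n:4d}:  max |F_n(y) - Phi(y)| over the grid = {gap:.4f}")
```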
Liapunov’s theorem
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.’s with

$$E(X_i) = \mu_i, \quad \operatorname{Var}(X_i) = \sigma_i^2 < \infty, \quad E(|X_i|^{2+\delta}) < \infty, \quad \delta > 0.$$

Define $c_n = \left(\sum_{i=1}^{n}\sigma_i^2\right)^{1/2}$. If

$$\lim_{n\to\infty}\frac{1}{c_n^{2+\delta}}\sum_{i=1}^{n} E\left(|X_i - \mu_i|^{2+\delta}\right) = 0, \qquad (9.29)$$

then

$$\lim_{n\to\infty} F_n(y) = \int_{-\infty}^{y}\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}u^2\}\, du. \qquad (9.30)$$
Liapunov’s theorem is rather restrictive because it requires the existence of moments higher than the second. A more satisfactory result, providing both necessary and sufficient conditions, is the next theorem; Lindeberg in 1923 established the ‘if’ part and Feller in 1935 the ‘only if’ part.
Lindeberg–Feller theorem
Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.’s with distribution functions $\{F_n(x), n \ge 1\}$ such that
(i) $E(X_i) = \mu_i$;
(ii) $\operatorname{Var}(X_i) = \sigma_i^2 < \infty$, $i = 1, 2, \ldots \qquad (9.31)$

Then the relations

(a) $\displaystyle\lim_{n\to\infty}\max_{1\le i\le n}\frac{\sigma_i}{c_n} = 0$, where $c_n = \left(\sum_{i=1}^{n}\sigma_i^2\right)^{1/2}$; $\qquad (9.32)$

(b) $\displaystyle\lim_{n\to\infty} F_n(y) = \int_{-\infty}^{y}\frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}u^2\}\, du \qquad (9.33)$

hold true, if and only if,

$$\lim_{n\to\infty}\frac{1}{c_n^2}\sum_{i=1}^{n}\int_{|x-\mu_i|>\varepsilon c_n}(x-\mu_i)^2\, dF_i(x) = 0, \qquad (9.34)$$

i.e.

$$\sum_{i=1}^{n}\int_{|x-\mu_i|>\varepsilon c_n}(x-\mu_i)^2\, dF_i(x) = o(c_n^2) \quad \text{for all } \varepsilon > 0. \qquad (9.35)$$
The necessary and sufficient condition is known as the Lindeberg condition
and provides an intuitive insight into ‘what really gives rise to the result’.
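The Lindeberg condition can be evaluated in closed form for simple distributions, which makes its meaning concrete: no single summand’s tail may contribute non-negligibly to the total variance. The following sketch (my illustrative choice of uniform r.v.’s with spreading supports, not an example from the text) computes the ratio in (9.34) exactly:

```python
import numpy as np

def lindeberg_ratio(n, eps):
    """Exact value of (1/c_n^2) * sum_i integral_{|x|>eps*c_n} x^2 dF_i(x)
    for independent X_i ~ Uniform(-a_i, a_i) with a_i = sqrt(i), mu_i = 0."""
    a = np.sqrt(np.arange(1, n + 1))
    c_n = np.sqrt(np.sum(a**2 / 3.0))        # sigma_i^2 = a_i^2 / 3
    t = eps * c_n
    # For the uniform density 1/(2a): integral_{|x|>t} x^2 dF = (a^3 - t^3)/(3a)
    # whenever t < a, and zero otherwise (the support lies inside the threshold).
    tails = np.where(a > t, (a**3 - t**3) / (3.0 * a), 0.0)
    return np.sum(tails) / c_n**2

for n in [10, 100, 1_000, 10_000]:
    print(f"n = {n:6d}:  Lindeberg ratio (eps = 0.1) = {lindeberg_ratio(n, 0.1):.6f}")
```

Once $\max_i a_i \le \varepsilon c_n$, every truncated integral vanishes and the ratio is exactly zero, so the condition holds and the CLT applies.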