Chapter 6
LIMIT THEOREMS FOR LARGE DEVIATIONS
§ 1. Introduction and examples
In this and succeeding chapters we shall examine the simplest problems in the theory of large deviations. Let $X_1, X_2, \ldots$ be independent, identically distributed random variables, with
$$E(X_1) = 0 , \qquad V(X_1) = \sigma^2 ,$$
and let $Z_n$ denote the normalised sum
$$Z_n = (X_1 + X_2 + \cdots + X_n)/\sigma n^{\frac{1}{2}} .$$
Then, for any fixed $x_0 > 0$,
$$P(Z_n < x) - (2\pi)^{-\frac{1}{2}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\, dt \to 0 \qquad (6.1.2)$$
as $n \to \infty$, uniformly in $|x| < x_0$. If the $X_j$ have a probability density $p(x)$, then the results of § 4.3 show that, under weak conditions, the density $p_n(x)$ of $Z_n$ satisfies
$$p_n(x) - (2\pi)^{-\frac{1}{2}} e^{-\frac{1}{2}x^2} \to 0$$
as $n \to \infty$, uniformly in $|x| < x_0$.
In many problems encountered in such different branches of science as mathematical statistics [18], [24], information theory [185], the statistical physics of polymers [181] and even the analytic arithmetic of hypercomplex numbers [103], more precise information about the distribution of $Z_n$ is required than is contained in the classical theorems. In particular, such problems require the estimation of
$$P(Z_n > x) , \qquad P(Z_n < -x) \qquad (6.1.4)$$
when both $n$ and $x$ are large. Such problems constitute the theory of large deviations.
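To see why the classical statements are not enough, consider the simplest case, a Bernoulli scheme (introduced formally just below): the uniform $o(1)$ error bound of the central limit theorem says nothing about the relative error in the tails. The following sketch is my own illustration, with $n = 100$, $p = 0.3$ chosen arbitrarily.

```python
import math

def exact_tail(x, n, p):
    """P(Z_n > x) for normalised sums of centred Bernoulli(p) variables."""
    q = 1 - p
    lo = math.floor(n * p + x * math.sqrt(n * p * q)) + 1   # smallest m with x_m > x
    return sum(math.comb(n, m) * p**m * q**(n - m) for m in range(lo, n + 1))

def normal_tail(x):
    """Tail of the standard normal law."""
    return 0.5 * math.erfc(x / math.sqrt(2))

n, p = 100, 0.3
for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    e, g = exact_tail(x, n, p), normal_tail(x)
    # the absolute error stays small, but the ratio drifts away from 1 as x grows
    print(f"x={x:3.1f}  exact={e:.3e}  normal={g:.3e}  abs.err={abs(e - g):.1e}  ratio={e / g:.2f}")
```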
Since the probability (6.1.4) will in general be small, the usual methods of establishing limit theorems (via characteristic functions and partial differential equations) are too crude for the derivation of sufficiently general results, and most of the theorems about large deviations are proved under very stringent conditions. Before formulating the problem in general, we consider some simple but characteristic special results.

Consider a Bernoulli scheme of $n$ independent trials, with a probability
$p > 0$ of success. Write $Y_j = 1$ if the $j$th trial results in a success, and $Y_j = 0$ otherwise. If
$$b(m, n, p) = P\Bigl(\sum_{j=1}^{n} Y_j = m\Bigr),$$
then of course
$$b(m, n, p) = \frac{n!}{m!\,(n-m)!}\, p^m q^{n-m} , \qquad (6.1.5)$$
where $q = 1 - p$. If $X_j = Y_j - p$, then
$$E(X_j) = 0 , \qquad V(X_j) = pq ,$$
and $Z_n$ takes only the values
$$x_m = (m - np)/(npq)^{\frac{1}{2}}$$
$(m = 0, 1, 2, \ldots, n)$, with respective probabilities $b(m, n, p)$. If we apply Stirling's formula to (6.1.5), we obtain the following local limit theorem:
if $x_m = o(n^{\frac{1}{2}})$ as $n \to \infty$, then
$$b(m, n, p) \sim (2\pi npq)^{-\frac{1}{2}} \exp\bigl\{ -\tfrac{1}{2}x_m^2 - \lambda(x_m) \bigr\} , \qquad (6.1.6)$$
where
$$\lambda(x) = npq \sum_{v=3}^{\infty} \frac{p^{v-1} - (-q)^{v-1}}{v(v-1)} \cdot \frac{x^v}{(npq)^{v/2}} . \qquad (6.1.7)$$
We remark that the asymptotic formula (6.1.6) can be very useful, and is often much easier to compute than the exact expression (6.1.5).
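As an illustration of this remark (my own sketch, not part of the text), the following compares the exact probability (6.1.5) with the approximation (6.1.6); the series (6.1.7) is simply truncated after a fixed number of terms, and the values $n = 1000$, $p = 0.3$ are arbitrary.

```python
import math

def b_exact(m, n, p):
    """Exact binomial probability (6.1.5)."""
    return math.comb(n, m) * p**m * (1 - p)**(n - m)

def lam(x, n, p, terms=60):
    """The series lambda(x) of (6.1.7), truncated; it converges for |x| < (npq)^(1/2)."""
    q = 1 - p
    s = sum((p**(v - 1) - (-q)**(v - 1)) / (v * (v - 1)) * x**v / (n * p * q)**(v / 2)
            for v in range(3, terms + 3))
    return n * p * q * s

def b_approx(m, n, p):
    """Local large-deviation approximation (6.1.6)."""
    q = 1 - p
    x = (m - n * p) / math.sqrt(n * p * q)
    return math.exp(-0.5 * x * x - lam(x, n, p)) / math.sqrt(2 * math.pi * n * p * q)

n, p = 1000, 0.3
for m in (300, 330, 360, 390):   # x_m roughly 0, 2.1, 4.1, 6.2
    x = (m - n * p) / math.sqrt(n * p * (1 - p))
    print(f"m={m:4d}  x_m={x:5.2f}  exact={b_exact(m, n, p):.3e}  (6.1.6)={b_approx(m, n, p):.3e}")
```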
Suppose that the random variables $X_j$ introduced at the beginning of the section satisfy Cramér's condition that, for some $a > 0$,
$$E\, e^{a|X_1|} < \infty .$$
Then the following theorem will be proved later.

Theorem 6.1.1. If $x \ge 0$ and $x = o(n^{\frac{1}{2}})$ as $n \to \infty$, then
$$\frac{P(Z_n > x)}{1 - G(x)} = \exp\Bigl\{ \frac{x^3}{\sqrt{n}}\, \lambda\Bigl(\frac{x}{\sqrt{n}}\Bigr) \Bigr\} \Bigl[ 1 + O\Bigl(\frac{x+1}{\sqrt{n}}\Bigr) \Bigr] \qquad (6.1.9)$$
and
$$\frac{P(Z_n < -x)}{G(-x)} = \exp\Bigl\{ -\frac{x^3}{\sqrt{n}}\, \lambda\Bigl(-\frac{x}{\sqrt{n}}\Bigr) \Bigr\} \Bigl[ 1 + O\Bigl(\frac{x+1}{\sqrt{n}}\Bigr) \Bigr] . \qquad (6.1.10)$$
Here $\lambda(z)$ is a power series constructed by means of the cumulants of the $X_j$, converging in a neighbourhood of $z = 0$, and which, conversely, determines the distribution of the $X_j$; and
$$G(x) = (2\pi)^{-\frac{1}{2}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\, dt .$$
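For orientation, here is a numerical check of Theorem 6.1.1 in the Bernoulli case (again my own sketch, not the book's): comparing (6.1.6) with (6.1.9) shows that in this case the correction exponent $(x^3/\sqrt{n})\,\lambda(x/\sqrt{n})$ equals $-\lambda(x)$ with $\lambda$ as in (6.1.7), so the corrected tail can be computed from that explicit series. The parameters below are the same arbitrary ones as above.

```python
import math

def lam(x, n, p, terms=60):
    """Series (6.1.7), truncated (valid for |x| well below (npq)^(1/2))."""
    q = 1 - p
    s = sum((p**(v - 1) - (-q)**(v - 1)) / (v * (v - 1)) * x**v / (n * p * q)**(v / 2)
            for v in range(3, terms + 3))
    return n * p * q * s

def exact_tail(x, n, p):
    """P(Z_n > x) computed from the exact binomial distribution."""
    q = 1 - p
    lo = math.floor(n * p + x * math.sqrt(n * p * q)) + 1   # smallest m with x_m > x
    return sum(math.comb(n, m) * p**m * q**(n - m) for m in range(lo, n + 1))

def normal_tail(x):
    """1 - G(x) for the standard normal law."""
    return 0.5 * math.erfc(x / math.sqrt(2))

n, p = 1000, 0.3
for x in (1.0, 2.0, 3.0, 4.0, 5.0):
    t = exact_tail(x, n, p)
    g = normal_tail(x)
    corrected = g * math.exp(-lam(x, n, p))   # (6.1.9) specialised to the Bernoulli case
    print(f"x={x:3.1f}  exact={t:.3e}  normal={g:.3e}  corrected={corrected:.3e}")
```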
This theorem displays an important characteristic property. In it $x$ is only restricted to the range $[0, o(n^{\frac{1}{2}})]$, but suppose we restrict it to the narrower interval $[0, n^{\alpha}]$, where $\alpha < \frac{1}{2}$. Then it is unnecessary to include in (6.1.9) the whole power series
$$\lambda(z) = \sum_{k=0}^{\infty} \lambda_k z^k ,$$
since the truncated form
$$\sum_{k=0}^{s} \lambda_k z^k$$
gives the same asymptotic formula, where $s$ is the smallest integer satisfying
$$(s+4)\alpha - \tfrac{1}{2}(s+2) < 0 .$$
Now it will be seen that the coefficients $\lambda_k$ $(k \le s)$ are determined by the cumulants of the $X_j$ up to order $(s+3)$. Thus if we have two sequences $\{X_j\}$ and $\{X_j'\}$, both satisfying Cramér's condition, whose moments agree up to order $(s+3)$, and $Z_n$ and $Z_n'$ are the corresponding normalised sums, then for $|x| \le n^{\alpha}$,
$$\frac{P(Z_n > x)}{P(Z_n' > x)} \to 1 , \qquad \frac{P(Z_n < -x)}{P(Z_n' < -x)} \to 1 \qquad (6.1.14)$$
as $n \to \infty$.
Thus the asymptotic behaviour of the tails of the distribution of $Z_n$, in the range $|x| \le n^{\alpha}$ $(\alpha < \frac{1}{2})$, is determined, for distributions satisfying Cramér's condition, by a finite number of parameters, the first $(s+3)$ moments of the $X_j$. This situation is analogous to the classical case, in which a whole class of distributions is attracted to the same stable law. It is however in sharp contrast to the case $\alpha = \frac{1}{2}$, in which the whole function $\lambda(z)$ enters, since two different distributions have different functions $\lambda(z)$. Theorems
of the former type we will describe as having a "collective" character.
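This collective behaviour is easy to observe numerically. The experiment below is my own illustration, with two arbitrarily chosen laws: both are centred with unit variance and vanishing third moment, so their moments agree up to order three, and for moderate $x$ the tails of the corresponding normalised sums, computed exactly by convolution, are in a ratio close to one.

```python
import numpy as np

# Two centred lattice laws with equal variance (= 1) and equal third moment (= 0),
# but different fourth moments: the moments agree up to order 3.
pmf_a = {-1: 0.5, 1: 0.5}                  # X  takes the values -1, +1
pmf_b = {-2: 0.125, 0: 0.75, 2: 0.125}     # X' takes the values -2, 0, +2

def tail_of_sum(pmf, n, x):
    """Exact P(S_n / sqrt(n) > x) by repeated convolution on the integer lattice (sigma = 1)."""
    lo, hi = min(pmf), max(pmf)
    base = np.zeros(hi - lo + 1)
    for k, pr in pmf.items():
        base[k - lo] = pr
    dist, offset = np.array([1.0]), 0      # distribution of S_0 = 0
    for _ in range(n):
        dist = np.convolve(dist, base)
        offset += lo                        # lattice value of the first cell of `dist`
    values = offset + np.arange(len(dist))
    return dist[values > x * np.sqrt(n)].sum()

n = 400
for x in (1.5, 2.0, 2.5):
    ta = tail_of_sum(pmf_a, n, x)
    tb = tail_of_sum(pmf_b, n, x)
    print(f"x={x:3.1f}  P(Z_n>x)={ta:.4e}  P(Z_n'>x)={tb:.4e}  ratio={ta / tb:.3f}")
```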
In the range $x = o(n^{\frac{1}{2}})$ the asymptotic expressions (6.1.9) and (6.1.10) are less valuable, since they are not collective. They only have a computational value if it is easier to compute $\lambda(z)$ than to calculate the convolutions directly. At the same time, these expressions can have a role in the approximate estimation of the probabilities of large deviations (cf. [18], [185]). Sometimes it is necessary to give bounds for such probabilities in wider ranges $x = O(n^{\frac{1}{2}})$, in which the case of the Bernoulli scheme shows that we can have $P(Z_n > x) = 0$. For such cases Bernstein's inequality (7.5) gives an upper bound of wide applicability.
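As a point of reference (my addition; the statement referred to above may differ in form), one standard version of Bernstein's inequality for centred summands bounded by $|X_j| \le c$ reads $P(Z_n > x) \le \exp\{-x^2/[2(1 + cx/(3\sigma\sqrt{n}))]\}$ for all $x \ge 0$. A quick check against exact Bernoulli tails, with $n$ and $p$ chosen arbitrarily:

```python
import math

def bernstein_bound(x, n, sigma, c):
    """Bernstein's inequality for Z_n, with |X_j| <= c, E X_j = 0, Var X_j = sigma^2."""
    return math.exp(-x * x / (2.0 * (1.0 + c * x / (3.0 * sigma * math.sqrt(n)))))

def exact_tail(x, n, p):
    """P(Z_n > x) for centred Bernoulli(p) summands."""
    q = 1 - p
    lo = math.floor(n * p + x * math.sqrt(n * p * q)) + 1
    return sum(math.comb(n, m) * p**m * q**(n - m) for m in range(lo, n + 1))

n, p = 200, 0.5
sigma, c = math.sqrt(p * (1 - p)), max(p, 1 - p)   # X_j = Y_j - p is bounded by max(p, q)
for x in (2.0, 5.0, 10.0, 15.0):                   # x of order sqrt(n) towards the end
    # beyond the maximal value of Z_n the exact tail is 0, while the bound stays positive
    print(f"x={x:5.1f}  exact={exact_tail(x, n, p):.3e}  Bernstein={bernstein_bound(x, n, sigma, c):.3e}")
```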
Let us remark that the study of the very large deviations $x = O(n^{\frac{1}{2}})$ gives rise to an expression involving the entropy of a certain system of events (Sanov [166]). We illustrate this by a simple example involving the multinomial distribution.
Suppose we require to test two alternative hypotheses $H_0$, $H_1$ by means of a series of $n$ independent trials with possible outcomes $A_1, A_2, \ldots, A_r$. According to $H_0$ the respective probabilities of these outcomes are $p_1, p_2, \ldots, p_r$; according to $H_1$ they are all equal to $1/r$. The likelihood ratio test accepts $H_0$ if
$$\frac{L(H_0)}{L(H_1)} = r^n p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r} > c , \qquad (6.1.15)$$
where $m_i$ is the number of trials resulting in the outcome $A_i$,
$$L(H_0) = \frac{n!}{m_1!\, m_2! \cdots m_r!}\, p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}$$
is the likelihood of $H_0$, and $L(H_1)$ is similarly defined. Now (6.1.15) can be thrown into the form
$$m_1 \log p_1 + m_2 \log p_2 + \cdots + m_r \log p_r > \log c - n \log r , \qquad (6.1.16)$$
and the expectation of the left-hand side, under $H_0$, is $n \sum_i p_i \log p_i$, that is, $-n$ times the entropy of the scheme $A_1, A_2, \ldots, A_r$ under this hypothesis. Now suppose that $H_1$ is true, so that $H_0$ is false, and the observations $m_1, m_2, \ldots, m_r$ represent
large deviations from $np_1, np_2, \ldots, np_r$. Were this not so, we could apply the well-known Laplace approximation
$$L(H_0) \sim (2\pi n)^{-\frac{1}{2}(r-1)} (p_1 p_2 \cdots p_r)^{-\frac{1}{2}} \exp\Bigl\{ -\tfrac{1}{2} \sum_{i=1}^{r} q_i x_i^2 \Bigr\} , \qquad (6.1.17)$$
where
$$q_i = 1 - p_i , \qquad x_i = (m_i - np_i)/(np_i q_i)^{\frac{1}{2}} .$$
Using this approximation, the likelihood ratio criterion would give a quadratic rather than a linear form (6.1.16). The reason for the discrepancy is that (6.1.17) does not hold when the $x_i$ are of order $n^{\frac{1}{2}}$; the correct asymptotic expression includes terms of entropy type.
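To see the entropy terms emerge, one can compare the exact value of $\log L(H_0)$ with the logarithm of the quadratic approximation (6.1.17) and with the entropy-type expression $-n\sum_i \nu_i \log(\nu_i/p_i)$, $\nu_i = m_i/n$, which Stirling's formula gives up to terms of order $\log n$. The sketch below is my own, with arbitrary $p_i$, $n$ and counts; the quadratic form fails badly precisely when the counts represent a large deviation of order $n$.

```python
import math

def log_L0_exact(m, p):
    """Exact log-likelihood of H0 for multinomial counts m."""
    n = sum(m)
    out = math.lgamma(n + 1) - sum(math.lgamma(mi + 1) for mi in m)
    return out + sum(mi * math.log(pi) for mi, pi in zip(m, p))

def log_L0_quadratic(m, p):
    """Logarithm of the Laplace approximation (6.1.17)."""
    n, r = sum(m), len(p)
    quad = sum((1 - pi) * ((mi - n * pi) / math.sqrt(n * pi * (1 - pi)))**2 for mi, pi in zip(m, p))
    return -0.5 * (r - 1) * math.log(2 * math.pi * n) - 0.5 * sum(math.log(pi) for pi in p) - 0.5 * quad

def log_L0_entropy(m, p):
    """Entropy-type (Sanov) expression: -n * sum nu_i log(nu_i / p_i); leading term only."""
    n = sum(m)
    return -sum(mi * math.log((mi / n) / pi) for mi, pi in zip(m, p) if mi > 0)

p = [0.7, 0.2, 0.1]
n = 600
m_typical = [int(round(n * pi)) for pi in p]        # counts near n p_i (H0-like)
m_far     = [n // 3, n // 3, n - 2 * (n // 3)]      # counts near n/r (H1-like: a large deviation)

for label, m in (("near np_i", m_typical), ("near n/r ", m_far)):
    print(f"{label}: exact={log_L0_exact(m, p):9.2f}  "
          f"quadratic={log_L0_quadratic(m, p):9.2f}  entropy={log_L0_entropy(m, p):9.2f}")
```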
§ 2. Statement of the problem
For the variables $X_j$ introduced at the beginning of the chapter, we examine the behaviour of the tail probabilities
$$P(Z_n > x) , \qquad P(Z_n < -x)$$
as $n \to \infty$, for $x$ in the range $[0, \psi(n)]$, $\psi(n)$ being a function tending monotonically to infinity. We shall seek theorems which imply that, for all $x \in [0, \psi(n)]$, as $n \to \infty$,
$$P(Z_n > x)/\Phi(x, a_1, a_2, \ldots, a_k, n) \to 1 , \qquad (6.2.2)$$
$$P(Z_n < -x)/\Phi(-x, b_1, b_2, \ldots, b_l, n) \to 1 , \qquad (6.2.3)$$
where the parameters $a_1, \ldots, a_k$, $b_1, \ldots, b_l$ are linear functionals of the distribution $F$ of the variables $X_j$. Such a limit theorem will have a collective character, since it will show that all distributions for which these linear functionals have given values have the same limiting behaviour. To put it another way, we can speak of the "domain of attraction" of the "limiting tails" $\Phi$. The problem of discovering the possible forms of the limiting tails is closely analogous to the classical problem of characterising the possible limit laws for centralised and normalised sums of independent variables, i.e. the stable laws. And of course there is a corresponding problem of local limit theorems.
In the following chapters, several systems of limiting tails are considered. When the $X_j$ have finite variance, the appropriate system is due to Cramér: the $a_i$, $b_j$ are moments of the $X_j$, and collective theorems hold for $\psi(n) = n^{\alpha}$ $(\alpha < \frac{1}{2})$. If not all the moments of the $X_j$ exist, limit theorems may still be valid, the $a_i$, $b_j$ being "pseudo-moments" defined by analytic continuation.
In these theorems $\psi(n)$ can be arbitrarily large.

There is one property of limit theorems for large deviations which should be remarked: the local theorems are usually easier to prove than the corresponding integral theorems. This is because, although the former are stronger, they are naturally stated under stronger conditions, and these considerably ease the proofs. Because of this, we begin with local limit theorems.