The first moment method allows one to control the order of magnitude of a random variable $X$ by its expectation $E(X)$. In many cases, this control is insufficient, and one also needs to establish that $X$ usually does not deviate too greatly from its expected value. These types of estimates are known as \emph{large deviation inequalities}, and are a fundamental set of tools in the subject. They can be significantly more powerful than the first moment method, but often require some assumptions concerning independence or approximate independence.
The simplest such large deviation inequality is \emph{Chebyshev's inequality}, which controls the deviation in terms of the variance $\operatorname{Var}(X)$:
Theorem 1.5 (Chebyshev's inequality) Let $X$ be a random variable. Then for any positive $\lambda$,
$$P\big(|X - E(X)| > \lambda \operatorname{Var}(X)^{1/2}\big) \le \frac{1}{\lambda^2}. \qquad (1.8)$$
Proof We may assume $\operatorname{Var}(X) > 0$, as the case $\operatorname{Var}(X) = 0$ is trivial. From Markov's inequality we have
$$P\big(|X - E(X)|^2 > \lambda^2 \operatorname{Var}(X)\big) \le \frac{E(|X - E(X)|^2)}{\lambda^2 \operatorname{Var}(X)} = \frac{1}{\lambda^2},$$
and the claim follows.
Thus Chebyshev's inequality asserts that $X = E(X) + O(\operatorname{Var}(X)^{1/2})$ with high probability, while in the converse direction it is clear that $|X - E(X)| \ge \operatorname{Var}(X)^{1/2}$ with positive probability. The application of these facts is referred to as the \emph{second moment method}. Note that Chebyshev's inequality provides both upper tail and lower tail bounds on $X$, with the tail decaying like $1/\lambda^2$ rather than $1/\lambda$. Thus the second moment method tends to give better distributional control than the first moment method. The downside is that the second moment method requires computing the variance, which is often trickier than computing the expectation.
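This trade-off can be seen numerically. The following is a minimal Python sketch (our own illustration, not part of the text; the sample size, success probability, number of trials, and thresholds are arbitrary choices) comparing the Chebyshev bound (1.8) and the one-sided Markov bound with the empirical two-sided tail of a sum of independent indicator variables.

# Minimal numerical sketch (illustrative only): compare the Chebyshev bound (1.8)
# and the one-sided Markov bound with the empirical tail of X = X_1 + ... + X_n
# for independent indicator variables X_i.
import random

n, p, trials = 1000, 0.01, 5000       # arbitrary illustrative parameters
mean = n * p                          # E(X)
var = n * p * (1 - p)                 # Var(X), using pairwise independence as in (1.9)

samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

for lam in (2.0, 3.0, 5.0):
    dev = lam * var ** 0.5
    empirical = sum(abs(x - mean) > dev for x in samples) / trials
    chebyshev = 1.0 / lam ** 2                 # two-sided bound from (1.8)
    markov = mean / (mean + dev)               # Markov: P(X > E(X) + dev) <= E(X)/(E(X) + dev)
    print(f"lambda={lam}: empirical tail {empirical:.4f}, "
          f"Chebyshev {chebyshev:.3f}, Markov (upper tail only) {markov:.3f}")

In such experiments the empirical tail is typically far smaller than either bound; the point of (1.8) is only that its $1/\lambda^2$ decay already beats the $1/\lambda$ decay available from the first moment method.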
Assume that $X = X_1 + \cdots + X_n$, where the $X_i$ are random variables. In view of (1.3), one might wonder whether
$$\operatorname{Var}(X) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n). \qquad (1.9)$$
This equality holds in the special case when the $X_i$ are pairwise independent (and in particular when they are jointly independent), but does not hold in general. For arbitrary $X_i$, we instead have
$$\operatorname{Var}(X) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum_{i,j \in [1,n]:\, i \ne j} \operatorname{Cov}(X_i, X_j), \qquad (1.10)$$
where the \emph{covariance} $\operatorname{Cov}(X_i, X_j)$ is defined as
$$\operatorname{Cov}(X_i, X_j) := E\big((X_i - E(X_i))(X_j - E(X_j))\big) = E(X_i X_j) - E(X_i) E(X_j).$$
Applying (1.9) to the special case when $X = |B|$, where $B$ is some randomly generated subset of a set $A$, we see from (1.1) that if the events $a \in B$ are pairwise independent for all $a \in A$, then
$$\operatorname{Var}(|B|) = \sum_{a \in A} \big(P(a \in B) - P(a \in B)^2\big), \qquad (1.11)$$
and in particular we see from (1.4) that
$$\operatorname{Var}(|B|) \le E(|B|). \qquad (1.12)$$
In the case when the events $a \in B$ are not pairwise independent, we must replace (1.11) by the more complicated identity
$$\operatorname{Var}(|B|) = \sum_{a \in A} \big(P(a \in B) - P(a \in B)^2\big) + \sum_{a, a' \in A:\, a \ne a'} \operatorname{Cov}\big(I(a \in B), I(a' \in B)\big). \qquad (1.13)$$
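As a concrete check of (1.11) and (1.12), here is a minimal Monte Carlo sketch in Python (our own illustration; the membership probabilities and the number of trials are arbitrary hypothetical choices) for a random subset $B$ of a five-element set $A$ whose membership events are independent.

# Minimal sketch (illustrative only): empirical check of (1.11) and (1.12)
# for a random subset B of A in which the events "a in B" are independent.
import random

probs = [0.1, 0.3, 0.5, 0.7, 0.25]       # hypothetical values of P(a in B) for a in A
trials = 100000

sizes = [sum(random.random() < p for p in probs) for _ in range(trials)]

mean = sum(sizes) / trials                               # estimates E(|B|)
var = sum((s - mean) ** 2 for s in sizes) / trials       # estimates Var(|B|)

predicted_var = sum(p - p * p for p in probs)            # right-hand side of (1.11)
predicted_mean = sum(probs)                              # E(|B|), by linearity (1.4)

print(f"Var(|B|): empirical {var:.4f}, predicted by (1.11) {predicted_var:.4f}")
print(f"E(|B|):   empirical {mean:.4f}, predicted {predicted_mean:.4f}; note Var <= E, as in (1.12)")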
1.2.1 The number of prime divisors
Now we present a nice application of the second moment method to classical number theory. To this end, let¹
$$\nu(n) := \sum_{p \le n} I(p \mid n)$$
denote the number of prime divisors of $n$. This function is among the most studied objects in classical number theory. Hardy and Ramanujan in the 1920s showed that "almost" all $n$ have about $\log\log n$ prime divisors. We give a very simple proof of this result, found by Turán in 1934 [369].

¹ We shall adopt the convention that whenever a summation is over the index $p$, then $p$ is understood to be prime.
Theorem 1.6 Let $\omega(n)$ tend to infinity arbitrarily slowly. Then
$$\big|\{x \in [1,n] : |\nu(x) - \log\log x| > \omega(n)\sqrt{\log\log n}\}\big| = o(n). \qquad (1.14)$$
Informally speaking, this result asserts that for a "generic" integer $x$, we have $\nu(x) = \log\log x + O(\sqrt{\log\log x})$ with high probability.
Proof Let $x$ be chosen uniformly at random from the interval $\{1, 2, \ldots, n\}$. Our task is now to show that
$$P\big(|\nu(x) - \log\log x| > \omega(n)\sqrt{\log\log n}\big) = o(1).$$
For technical reasons, instead of $\nu(x)$ we shall consider the related quantity $|B|$, where
$$B := \big\{p \text{ prime} : p \le n^{1/10},\ p \mid x\big\}.$$
Since $x \le n$ cannot have $10$ different prime divisors larger than $n^{1/10}$, it follows that $|B| \le \nu(x) \le |B| + 10$. Thus, to prove (1.14), it suffices to show
$$P\Big(\big||B| - \log\log n\big| \ge \tfrac{1}{2}\omega(n)\sqrt{\log\log n}\Big) = o(1).$$
Note that $\log\log x = \log\log n + O(1)$ with probability $1 - o(1)$, which justifies this reduction since $\omega(n)\sqrt{\log\log n} \to \infty$. In light of Chebyshev's inequality, the desired bound will then follow from the following expectation and variance estimates:
$$E(|B|) = \log\log n + O(1), \qquad \operatorname{Var}(|B|) = \log\log n + O(1).$$
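To spell out how the estimates combine with Chebyshev's inequality (the constant $\tfrac{1}{4}$ below is merely one convenient choice of bookkeeping): since $E(|B|) = \log\log n + O(1)$ and $\operatorname{Var}(|B|) = \log\log n + O(1)$, for $n$ sufficiently large the event $\big||B| - \log\log n\big| \ge \tfrac{1}{2}\omega(n)\sqrt{\log\log n}$ implies $\big||B| - E(|B|)\big| > \tfrac{1}{4}\omega(n)\operatorname{Var}(|B|)^{1/2}$, and hence by (1.8)
$$P\Big(\big||B| - \log\log n\big| \ge \tfrac{1}{2}\omega(n)\sqrt{\log\log n}\Big) \le \frac{16}{\omega(n)^2} = o(1).$$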
It remains to verify the expectation and variance estimates. From linearity of expectation (1.4) we have
$$E(|B|) = \sum_{p \le n^{1/10}} P(p \mid x),$$
while from the variance identity (1.13) we have
$$\operatorname{Var}(|B|) = \sum_{p \le n^{1/10}} \big(P(p \mid x) - P(p \mid x)^2\big) + \sum_{p, q \le n^{1/10}:\, p \ne q} \operatorname{Cov}\big(I(p \mid x), I(q \mid x)\big).$$
Observe that $I(p \mid x) I(q \mid x) = I(pq \mid x)$ for distinct primes $p, q$. Since $P(d \mid x) = \frac{1}{d} + O(\frac{1}{n})$ for any $d \ge 1$, we conclude that
$$P(p \mid x) = \frac{1}{p} + O\Big(\frac{1}{n}\Big)$$
and
$$\operatorname{Cov}\big(I(p \mid x), I(q \mid x)\big) = \frac{1}{pq} + O\Big(\frac{1}{n}\Big) - \Big(\frac{1}{p} + O\Big(\frac{1}{n}\Big)\Big)\Big(\frac{1}{q} + O\Big(\frac{1}{n}\Big)\Big) = O\Big(\frac{1}{n}\Big).$$
We thus conclude that
$$E(|B|) = \sum_{p \le n^{1/10}} \frac{1}{p} + O(n^{-9/10})$$
and
$$\operatorname{Var}(|B|) = \sum_{p \le n^{1/10}} \Big(\frac{1}{p} - \frac{1}{p^2}\Big) + O(n^{-8/10}).$$
The expectation and variance estimates now follow from Mertens' theorem (see Proposition 1.51) and the convergence of the sum $\sum_k \frac{1}{k^2}$.
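As a numerical aside (our own illustration, not part of the proof; the cutoff $N = 10^6$ and the thresholds $K = 1, 2, 3$ are arbitrary choices), the following Python sketch tabulates $\nu(x)$ for all $x \le N$ with a simple sieve and reports the fraction of $x$ for which $\nu(x)$ deviates from $\log\log N$ by more than $K\sqrt{\log\log N}$, in the spirit of Theorem 1.6.

# Minimal sketch (illustrative only): empirical concentration of nu(x), the number
# of distinct prime divisors of x, around log log N.
import math

N = 10 ** 6                      # arbitrary illustrative cutoff
nu = [0] * (N + 1)               # nu[x] will hold the number of distinct prime divisors of x
for p in range(2, N + 1):
    if nu[p] == 0:               # no smaller prime divides p, so p is prime
        for multiple in range(p, N + 1, p):
            nu[multiple] += 1

center = math.log(math.log(N))   # log log N (natural logarithms)
spread = math.sqrt(center)       # sqrt(log log N)
for K in (1, 2, 3):
    outside = sum(1 for x in range(2, N + 1)       # x = 1 omitted; it has nu(x) = 0
                  if abs(nu[x] - center) > K * spread)
    print(f"K={K}: fraction of x <= N with |nu(x) - log log N| > K*sqrt(log log N): "
          f"{outside / N:.4f}")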
Exercises
1.2.1 When does equality hold in Chebyshev’s inequality?
1.2.2 If $X$ and $Y$ are two random variables, verify the Cauchy–Schwarz inequality $|\operatorname{Cov}(X,Y)| \le \operatorname{Var}(X)^{1/2} \operatorname{Var}(Y)^{1/2}$ and the \emph{triangle inequality} $\operatorname{Var}(X+Y)^{1/2} \le \operatorname{Var}(X)^{1/2} + \operatorname{Var}(Y)^{1/2}$. When does equality occur?
1.2.3 Prove (1.10).
1.2.4 If $\varphi : \mathbf{R} \to \mathbf{R}$ is a convex function and $X$ is a random variable, verify Jensen's inequality $E(\varphi(X)) \ge \varphi(E(X))$. If $\varphi$ is strictly convex, when does equality occur?
1.2.5 Generalize Chebyshev's inequality using higher moments $E(|X - E(X)|^p)$ instead of the variance.
1.2.6 By obtaining an upper bound on the fourth moment, improve Theorem 1.6 to
$$\frac{1}{N}\big|\{x \in [1,N] : |\nu(x) - \log\log N| > K\sqrt{\log\log N}\}\big| = O(K^{-4}).$$
Can you generalize this to obtain a bound of $O_m(K^{-m})$ for any even integer $m \ge 2$, where the constant in the $O()$ notation is allowed to depend on $m$?