LARGE DIMENSIONAL RANDOM MATRICES
WANG XIAOYING
NATIONAL UNIVERSITY OF SINGAPORE
2009
STATISTICS FOR LARGE DIMENSIONAL RANDOM MATRICES
WANG XIAOYING (B.Sc., Northeast Normal University, China)
A THESIS SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF STATISTICS AND APPLIED PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2009
I would like to express my deep and sincere gratitude to my supervisors, Professor Bai Zhidong and Associate Professor Zhou Wang. Their valuable guidance and continuous support have been crucial to the completion of this thesis. I deeply appreciate all the time and effort they have spent helping me solve the problems I encountered. I have learned many things from them, especially regarding academic research and character building.
Special acknowledgements are also due to Assistant Professor Pan Guangming and Mr Wang Xiping for discussions on various topics of large dimensional random matrix theory.
It is a great pleasure to record my thanks to my dear friends Ms Zhao Wanting, Ms Zhao Jingyuan, Ms Zhang Rongli, Ms Li Hua, Ms Zhang Xiaoe, Ms Li Xiang, Mr Khang Tsung Fei, Mr Li Mengxin, Mr Deng Niantao, Mr Su Yue, Mr Wang Daqing, and Mr Loke Chok Kang, who have given me much help not only in my study but also in my daily life. Sincere thanks to all my friends who helped me in one way or another, for their friendship and encouragement.
On a personal note, I thank my parents, husband, sisters and brother for their endless love and continuous support during the entire period of my PhD programme. I also thank my baby for giving me many happy times and a sense of responsibility.
Finally, I would also like to attribute the completion of this thesis to the other members and staff of the department, who helped in various ways and provided such a pleasant studying environment. I also wish to express my gratitude to the university and the department for supporting me through an NUS research scholarship.
CONTENTS

1 Introduction
1.1 Large Dimensional Random Matrices
1.2 Spectral Analysis of LDRM
1.3 Methodologies
1.3.1 Moment Method
1.3.2 Stieltjes Transform
1.3.3 Orthogonal Polynomial Decomposition
1.4 Organization of the Thesis
2 Literature Review
2.1 Limiting Spectral Distribution (LSD) of LDRM
2.1.1 Wigner Matrix
2.1.2 Sample Covariance Matrix
2.1.3 Product of Two Random Matrices
2.2 Limits of Extreme Eigenvalues
2.3 Convergence Rate of ESD
2.4 CLT of Linear Spectral Statistics (LSS)
3 CLT of LSS for Wigner Matrices
3.1 Introduction and Main Result
3.2 Bernstein Polynomial Approximation
3.3 Truncation and Preliminary Formulae
3.3.1 Simplification by Truncation
3.3.2 Preliminary Formulae
3.4 The Mean Function of LSS
3.5 Convergence of ∆ − E∆
4 CLT of LSS for Sample Covariance Matrices
4.1 Introduction and Main Result
4.2 Bernstein Polynomial Approximations
4.3 Simplification by Truncation and Normalization
4.4 Convergence of ∆ − E∆
4.5 The Mean Function of LSS
5 Conclusion and Further Research
5.1 Conclusion and Discussion
5.2 Future Research
With the rapid development of computer science, large dimensional data have become increasingly common in various disciplines. These data resist conventional multivariate analysis, which relies on large sample theory, since the number of variables for each observation can be very large and comparable to the sample size. Classical multivariate analysis incurs intolerable errors when dealing with large dimensional data. Consequently, a new approach, large dimensional random matrices (LDRM) theory, has been proposed to replace the classical large sample theory.
Spectral analysis of LDRM plays an important role in large dimensional data analysis. After finding the limiting spectral distribution (LSD) of the empirical spectral distribution (ESD, i.e. the empirical distribution of the eigenvalues) of LDRM, one can easily derive the limit of the corresponding linear spectral statistics (LSS). Then, in order to conduct further statistical inference, it is important to find the limiting distribution of the LSS of LDRM.
A general conjecture about the convergence rate of the ESD to the LSD puts it at the order of O(n^{-1}). If this is true, then it seems natural to consider the asymptotic properties of the empirical process G_n(x) = n(F_n(x) − F(x)). Unfortunately, many lines of evidence show that the process G_n(x) cannot converge in any metric space. As an alternative, we turn to finding the limiting distribution of the LSS of the process G_n(x). In this thesis, using the Bernstein polynomial approximation and the Stieltjes transform method, under suitable moment conditions, we prove the central limit theorem (CLT) of the LSS for a generalized regular class C^4 of kernel functions, for large dimensional Wigner matrices and sample covariance matrices. These asymptotic properties of the LSS suggest that more efficient statistical inferences, such as hypothesis testing and constructing confidence intervals or regions for a class of population parameters, are possible. The improved criteria on the constraint conditions of the kernel functions in our results should also provide a better understanding of the asymptotic properties of the ESD of the corresponding LDRM.
W_n(k)    the submatrix extracted from W_n by removing its k-th row and k-th column
γ_m    the contour formed by the boundary of the rectangle with vertices (±a ± i/√m)
γ_mh, γ_mv    the union of the two horizontal/vertical parts of γ_m
Chapter 1
Introduction
1.1 Large Dimensional Random Matrices
In classical multivariate analysis, large sample theory assumes that the data dimension p is very small and fixed, while the number of observations, the sample size n, is large or tends to infinity. However, with the rapid development of computer science over the recent four or five decades, this is no longer always the case. For some contemporary data, the dimension p is also large and comparable to the sample size n, and in some cases even larger than the sample size. Such phenomena are commonplace in various fields, such as finance, genetics, bioinformatics, wireless communications, signal processing, and environmental science. Hence, the new features of contemporary data bring a series of new tasks to statisticians: for example, how to properly describe these new features of the data, and whether the classical limiting theory is suitable for analyzing large dimensional data and, if not, how to amend it.
In the 1972 Wald Memorial Lecture, Huber (1973) proposed a new, more reasonable asymptotic setup for large sample theory. After summarizing and analyzing several possibilities for the behavior of p as n tends to infinity, he strongly suggested studying the situation where the dimension p increases together with n in linear regression analysis. For LDRM, it is the convention to exploit a simple asymptotic setup in which p tends to infinity proportionally to n, that is, p/n → y ∈ (0, ∞). This leads to two kinds of limiting results: classical limiting theory (for fixed dimension p) and large dimensional limiting theory (for large dimension p, also called LDRM theory). Therefore, it is natural to ask which one is closer to reality, that is, which kind of limiting theory should be applied to a particular problem.
Bai and Saranadasa (1996) encouraged statisticians to reexamine classical statistical approaches when dealing with high dimensional data. As an example of the effect of high dimension, a two-sample problem was investigated. They showed that both Dempster's non-exact test (Dempster, 1958) and their asymptotically normal test have higher power than the classical Hotelling test when the data dimension is proportionally close to the within-sample degrees of freedom. Another example was presented in Bai and Silverstein (2004). When p increases proportionally to n, an important statistic in multivariate analysis, L_n = ln(det S_n), behaves in a completely different manner than it does for data of low dimension with large sample size. (Here S_n is the sample covariance matrix of n samples from a p-dimensional mean-zero random vector with population covariance matrix I.) Thus, when p is large, any test that assumes asymptotic normality of L_n, i.e. employs the classical limiting theory, will result in a serious error.
These examples show that some classical limiting theory may no longer be suitable for dealing with large dimensional data. LDRM theory offers one possible way to analyze large dimensional data and has attracted much interest from statisticians. At the international conference "Mathematical Challenges of the 21st Century", Donoho (2000) stated: "we can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet."
1.2 Spectral Analysis of LDRM
A major part of LDRM theory is the spectral analysis of LDRM. Suppose A is an n × n matrix with eigenvalues λ_j, j = 1, 2, · · · , n. If all these eigenvalues are real, e.g., if A is symmetric, or Hermitian for complex entries, we can define a one-dimensional distribution function

F^A(x) = (1/n) #{ j ≤ n : λ_j ≤ x },

called the empirical spectral distribution (ESD) of the matrix A. Otherwise, we define the two-dimensional ESD of the matrix A:

F^A(x, y) = (1/n) #{ j ≤ n : Re(λ_j) ≤ x, Im(λ_j) ≤ y } = (1/n) Σ_{j=1}^{n} I{Re(λ_j) ≤ x, Im(λ_j) ≤ y},
where Re(λ_j) and Im(λ_j) denote the real and imaginary parts of the complex number λ_j, respectively.
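As a concrete illustration, the ESD is simply the empirical distribution of the eigenvalues. The sketch below (pure Python; the 2 × 2 symmetric example matrix is a hypothetical choice whose eigenvalues are available in closed form) evaluates F^A(x):

```python
import math

def esd(eigenvalues, x):
    """Empirical spectral distribution F^A(x): fraction of eigenvalues <= x."""
    return sum(1 for lam in eigenvalues if lam <= x) / len(eigenvalues)

def sym2x2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]]; they are real."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return [mean - radius, mean + radius]

eigs = sym2x2_eigenvalues(0.0, 1.0, 0.0)   # eigenvalues -1 and 1
# F^A is a step function: 0 below -1, 1/2 on [-1, 1), and 1 from 1 onward.
print(esd(eigs, -2.0), esd(eigs, 0.0), esd(eigs, 1.0))  # -> 0.0 0.5 1.0
```

For an n × n matrix, F^A is a step function with jumps of size 1/n (larger at repeated eigenvalues).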
One of the original motivations for the spectral analysis of LDRM arose in nuclear physics during the 1950's. In a quantum mechanical system, the energy levels cannot be observed directly, but they can be represented by the eigenvalues of a matrix of physical measurements or observations. Furthermore, most nuclei have thousands of energy levels, which are too complex to be described exactly. Since the 1950's, a large number of physicists and statisticians have shown keen interest in the spectral analysis of LDRM and have obtained many results. The theorems and applications of LDRM theory in quantum mechanics and other related areas were well summarized by Mehta (1990).

In multivariate statistical inference, the motivation for the spectral analysis of LDRM is the fact that many important statistics can be expressed as functionals of the spectral distribution of some random matrices. For concrete applications, the reader may refer to Bai (1999).
Two key problems in the spectral analysis of LDRM theory are to investigate the convergence, and its rate, of the sequence of empirical spectral distributions F^{A_n} for a given sequence of random matrices {A_n}. The limiting distribution F (possibly defective), which is usually non-random, is called the limiting spectral distribution (LSD) of the matrix sequence {A_n}. The following section reviews three main methodologies for finding the limits and establishing the convergence rates.
1.3 Methodologies
It is known that the eigenvalues of a matrix are continuous functions of its elements. However, the explicit forms of the eigenvalues are too complex to calculate when the matrix dimension is larger than 4. In order to investigate the spectral distributions of LDRM, three primary methods have been employed in the literature: the moment method, the Stieltjes transform, and the orthogonal polynomial decomposition of the exact density of the eigenvalues.

1.3.1 Moment Method
The moment method is one of the most popular methods in LDRM theory. It is based on the Fréchet-Shohat moment convergence theorem (MCT, see Loève, 1977). Suppose {F_n} denotes a sequence of distribution functions with finite moments of all orders. The MCT investigates under what conditions the convergence of the moments of all fixed orders implies the weak convergence of the sequence of distributions F_n. Sufficient conditions are precisely described in the following three lemmas. One may refer to Bai and Silverstein (2006, Appendix B) for their detailed proofs.
Let

β_{nk} = β_k(F_n) := ∫ x^k dF_n(x)

be the k-th moment of the distribution F_n.
Lemma 1.3.1 (Unique Limit) A sequence of distribution functions {F_n} converges weakly to a limit if the following conditions are satisfied:

(1) Each F_n has finite moments of all orders.
(2) For each fixed integer k ≥ 0, β_{nk} converges to a finite limit β_k as n → ∞.
(3) If two right-continuous nondecreasing functions F, G have the same moment sequence {β_k}, then F = G + constant.
One can prove Lemma 1.3.1 by using Helly's selection theorem and the properties of distribution functions. When applying this lemma, besides verifying conditions (1) and (2), we need to check condition (3). The following two lemmas give sufficient conditions which imply condition (3) of Lemma 1.3.1.
Lemma 1.3.2 (Carleman) Let {β_k = β_k(F)} be the sequence of moments of the distribution function F. If the following Carleman condition is satisfied:

Σ_{k=1}^{∞} β_{2k}^{-1/(2k)} = ∞,

then F is uniquely determined by the moment sequence {β_k, k = 0, 1, · · · }.
The following Lemma 1.3.3 is a corollary of the Carleman condition in Lemma 1.3.2. The Riesz condition (1.1) is easy to check and is powerful enough for the spectral analysis of LDRM.
Lemma 1.3.3 (M. Riesz) Let {β_k} be the sequence of moments of the distribution function F. If

lim inf_{k→∞} (1/k) β_{2k}^{1/(2k)} < ∞,    (1.1)

then F is uniquely determined by the moment sequence {β_k, k = 0, 1, · · · }.
In LDRM theory, the k-th moment of F^A can be written as

β_k(F^A) = (1/n) Σ_{j=1}^{n} λ_j^k = (1/n) tr(A^k),

which turns the study of the moments of the ESD into the study of traces of powers of the random matrix.
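The trace identity can be checked directly. A minimal sketch (pure Python; the 2 × 2 symmetric matrix is a hypothetical example whose eigenvalues are known in closed form) verifies that (1/n) tr(A^k) equals the k-th moment of the ESD:

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

A = [[2.0, 1.0], [1.0, 3.0]]              # symmetric, so the eigenvalues are real
mean = (A[0][0] + A[1][1]) / 2.0
radius = math.sqrt(((A[0][0] - A[1][1]) / 2.0) ** 2 + A[0][1] ** 2)
eigs = [mean - radius, mean + radius]     # closed form for a 2x2 symmetric matrix

Ak = A
for k in range(1, 5):
    beta_trace = trace(Ak) / 2.0                      # (1/n) tr(A^k), n = 2
    beta_esd = sum(lam ** k for lam in eigs) / 2.0    # k-th moment of the ESD
    assert abs(beta_trace - beta_esd) < 1e-9
    Ak = matmul(Ak, A)
print("(1/n) tr(A^k) matches the k-th ESD moment for k = 1..4")
```

The same identity is what makes the moment method tractable for random matrices: E tr(A^k) expands into a combinatorial sum over index paths.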
1.3.2 Stieltjes Transform

For a function G of bounded variation on the real line, its Stieltjes transform is defined by

s_G(z) = ∫ 1/(x − z) dG(x),    z ∈ C^+ := {z : Im(z) > 0}.

One of the important advantages of the Stieltjes transform is that it always exists for all functions of bounded variation defined on the real line. The following lemmas on the properties and inequalities of the Stieltjes transform are well summarized and proved in Bai and Silverstein (2006, Appendix B).
Lemma 1.3.4 (Inversion formula) If G is a distribution function, then for any continuity points a < b of G,

G(b) − G(a) = lim_{v↓0} (1/π) ∫_a^b Im(s_G(z)) du,    where z = u + iv.
This lemma provides a one-to-one correspondence between distribution functions and their Stieltjes transforms. Furthermore, it offers an easy way to recover a distribution function when its Stieltjes transform is known.
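The inversion formula can be illustrated with the semicircular law, whose Stieltjes transform is the root of s² + z s + 1 = 0 with positive imaginary part in C^+ (a standard fact; the evaluation points and tolerances below are arbitrary choices). Letting v ↓ 0 in (1/π) Im s(x + iv) recovers the density (1/2π)√(4 − x²):

```python
import cmath
import math

def s_semicircle(z):
    """Stieltjes transform of the semicircular law: the root of
    s^2 + z*s + 1 = 0 with positive imaginary part when Im z > 0."""
    r = cmath.sqrt(z * z - 4)
    s1, s2 = (-z + r) / 2, (-z - r) / 2
    return s1 if s1.imag > 0 else s2

def density_via_inversion(x, v=1e-6):
    """(1/pi) Im s(x + iv) for small v, per the inversion formula."""
    return s_semicircle(complex(x, v)).imag / math.pi

for x in (0.0, 1.0, 1.9):
    exact = math.sqrt(4 - x * x) / (2 * math.pi)
    print(f"x = {x}: inversion ~ {density_via_inversion(x):.6f}, density {exact:.6f}")
```

Selecting the root with positive imaginary part sidesteps the branch-cut bookkeeping of the complex square root.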
Lemma 1.3.5 (Continuity) Assume that {G_n} is a sequence of functions of bounded variation with G_n(−∞) = 0 for all n. Then

lim_{n→∞} s_{G_n}(z) = s(z)    ∀z ∈ C^+,

if and only if there is a function of bounded variation G with G(−∞) = 0 and Stieltjes transform s(z) such that G_n → G vaguely.
This lemma describes the continuity correspondence between the family of functions of bounded variation and the family of their Stieltjes transforms.
Lemma 1.3.6 (Differentiability) Let G be a function of bounded variation and x_0 ∈ R. Suppose that Im s_G(x_0) = lim_{z∈C^+→x_0} Im s_G(z) exists. Then G is differentiable at x_0, and its derivative is (1/π) Im s_G(x_0).
From this lemma, one can see another important advantage of the Stieltjes transform: the density function of a distribution can be obtained via its Stieltjes transform.
In LDRM theory, if A is an n × n matrix with real eigenvalues, the Stieltjes transform of F^A has the expression

s_{F^A}(z) = (1/n) tr(A − zI)^{-1},

where I is the identity matrix. Applying the inverse matrix formula, we have

s_{F^A}(z) = (1/n) Σ_{k=1}^{n} 1 / ( a_{kk} − z − α_k^* (A_k − zI)^{-1} α_k ),

where A_k is the submatrix of A obtained by removing its k-th row and column, and α_k is the k-th column of A with the k-th element removed.
Trang 21The following three lemmas describe the distance between distributions in term oftheir Stielfjes transforms and pave way for estimating convergence rates of ESD ofLDRM to its LSD.
Lemma 1.3.7 Let F be a distribution function and let G be a function of bounded variation satisfying ∫ |F(x) − G(x)| dx < ∞. Denote their Stieltjes transforms by f(z) and g(z), respectively. Then we have

‖F − G‖ ≤ (1/(π(2γ − 1))) [ ∫ |f(z) − g(z)| du + (1/v) sup_x ∫_{|y|≤2va} |G(x + y) − G(x)| dy ],

where ‖·‖ denotes the supremum norm, z = u + iv, v > 0, and a and γ are constants related to each other by

γ = (1/π) ∫_{|u|<a} 1/(u² + 1) du > 1/2.
Lemma 1.3.8 Under the assumptions of Lemma 1.3.7, we have

‖F − G‖ ≤ (1/(π(1 − κ)(2γ − 1))) [ ∫_{−A}^{A} |f(z) − g(z)| du + (2π/v) ∫_{|x|>B} |F(x) − G(x)| dx + (1/v) sup_x ∫_{|y|≤2va} |G(x + y) − G(x)| dy ],

where A and B are positive constants such that A > B and κ = 4B/(π(A − B)(2γ − 1)) < 1.
The following Lemma 1.3.9 is an immediate corollary of Lemma 1.3.8
Lemma 1.3.9 In addition to the assumptions of Lemma 1.3.8, assume further that, for some constant B > 0, F([−B, B]) = 1 and |G|((−∞, −B)) = |G|((B, ∞)) = 0, where |G|((a, b)) denotes the total variation of the signed measure G on the interval (a, b). Then we have

‖F − G‖ ≤ (1/(π(1 − κ)(2γ − 1))) [ ∫_{−A}^{A} |f(z) − g(z)| du + (1/v) sup_x ∫_{|y|≤2va} |G(x + y) − G(x)| dy ],

where A, B and κ are defined as in Lemma 1.3.8.
1.3.3 Orthogonal Polynomial Decomposition

If the elements of the matrix A have a joint density p_n(A) = H(λ_1, . . . , λ_n), then the joint density of the eigenvalues is given by

f(λ_1, . . . , λ_n) = c J(λ_1, . . . , λ_n) H(λ_1, . . . , λ_n),

where J is the integral of the Jacobian of the transform from the matrix space to its eigenvalue-eigenvector space.
Generally, it is assumed that H has the form H(λ_1, . . . , λ_n) = Π_{k=1}^{n} g(λ_k) and J has the form J(λ_1, . . . , λ_n) = Π_{i<j} (λ_i − λ_j)^β Π_{k=1}^{n} h_n(λ_k). For example, β = 1 and h_n ≡ 1 for the real Gaussian matrix, β = 2 and h_n ≡ 1 for the complex Gaussian matrix, β = 4 and h_n ≡ 1 for the quaternion Gaussian matrix, and β = 1 and h_n(x) = x^{n−p} for the real Wishart matrix with n ≥ p.
Note that the orthogonal polynomial decomposition can only be applied under the assumption that the exact density of the eigenvalues is known. In this thesis, however, we do not assume the existence of density functions, which would be too restrictive. Instead, we consider a general situation in which the underlying distribution of the matrix elements could even be discrete. Hence a detailed discussion of the orthogonal polynomial decomposition is beyond the scope of this study.
1.4 Organization of the Thesis
This thesis consists of five chapters and is organized as follows. In this chapter, Chapter 1, we have provided a general introduction to the motivation of LDRM theory and its spectral analysis, as well as the three main methodologies in this field.
In Chapter 2, we present a detailed review of the spectral analysis of LDRM.
Chapter 3 and Chapter 4 are the main parts of this thesis, in which we prove our main results, Theorem 3.1.1 and Theorem 4.1.1.
In the last chapter, Chapter 5, we discuss some applications and possible future research.
Chapter 2

Literature Review

2.1 Limiting Spectral Distribution (LSD) of LDRM

2.1.1 Wigner Matrix

A Wigner matrix is a square random matrix which is symmetric (Hermitian in the complex case), and whose entries on or above the diagonal are independent.
The Wigner matrix is named after the famous physicist Eugene Wigner, and it plays an important role in nuclear physics (see Mehta (1990)). It also has strong statistical meaning in multivariate analysis, as it is the limit of the normalized Wishart matrix.
The study of the spectral analysis of large dimensional Wigner matrices dates back to Eugene Wigner's (1955, 1958) famous semicircular law. He proved that the expected ESD of an n × n standard Gaussian matrix W_n, normalized by 1/√n, converges to the semicircular law F with the density

F'(x) = (1/2π) √(4 − x²), if |x| ≤ 2; and 0 otherwise.    (2.1)
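A quick Monte Carlo sanity check of (2.1) is possible without any eigenvalue routine, since tr(A²) = Σ_{i,j} a_ij² and tr(A⁴) = ‖A²‖_F². The sketch below (assuming standard Gaussian entries on and above the diagonal; the matrix size is an arbitrary choice) compares the second and fourth moments of the ESD of n^{-1/2}W_n with the semicircle moments 1 and 2 (the Catalan numbers):

```python
import random
import math

random.seed(0)
n = 100
# Symmetric matrix with independent N(0, 1) entries on and above the diagonal
# (an assumption for this sketch; the semicircular law requires much less).
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        W[i][j] = W[j][i] = random.gauss(0.0, 1.0)

s = 1.0 / math.sqrt(n)
A = [[W[i][j] * s for j in range(n)] for i in range(n)]

# tr(A^2) is the sum of squared entries; tr(A^4) is the squared Frobenius
# norm of A^2, so no eigenvalue computation is needed.
m2 = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n
A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
m4 = sum(A2[i][j] ** 2 for i in range(n) for j in range(n)) / n

print(f"beta_2 ~ {m2:.3f} (semicircle value 1), beta_4 ~ {m4:.3f} (semicircle value 2)")
```

The agreement already at n = 100 reflects the concentration of linear spectral statistics discussed later in this thesis.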
Grenander (1963) proved that ‖F^{(1/√n)W_n} − F‖ → 0 in probability. This was further generalized by Arnold (1967, 1971) in various aspects. Bai (1999) derived the almost sure version using both the moment method and the Stieltjes transform method. This result is presented in the following theorem.
Theorem 2.1.1 Suppose W_n = (x_ij) is an n × n generalized Wigner matrix whose entries above the diagonal are i.i.d. complex random variables with variance σ², and whose diagonal entries are i.i.d. real random variables (without any moment requirement). Then, as n → ∞, with probability 1, the ESD F^{(1/√n)W_n} tends to the semicircular law with scale parameter σ, whose density is given by

F'(x) = (1/2πσ²) √(4σ² − x²), if |x| ≤ 2σ; and 0 otherwise.    (2.2)
The following theorem is the generalization to the non-i.i.d. case proved by Bai and Silverstein (2006, page 23).

Theorem 2.1.2 Suppose that W_n is a Wigner matrix whose entries on or above the diagonal are independent, but may depend on n and need not be identically distributed. Assume that all the entries of W_n have mean zero and variance 1 and satisfy the following condition: for any constant η > 0,

lim_{n→∞} (1/(η²n²)) Σ_{i,j} E[ |x_ij|² I{|x_ij| ≥ η√n} ] = 0.

Then the ESD of (1/√n)W_n converges to the semicircular law almost surely.
Definition 2.1.2 (Sample Covariance Matrix) Let X_n = (x_ij)_{p×n}, 1 ≤ i ≤ p, 1 ≤ j ≤ n, be an observation matrix of size n from a certain p-dimensional population distribution, and let x_j = (x_{1j}, · · · , x_{pj})^t be the j-th column of X_n. Then the sample covariance matrix is

S_n = (1/(n − 1)) Σ_{j=1}^{n} (x_j − x̄)(x_j − x̄)^*,

where x̄ = (1/n) Σ_{j=1}^{n} x_j and A^* denotes the complex conjugate transpose of a matrix A.
In the spectral analysis of large dimensional sample covariance matrices, it is usual to study the simplified sample covariance matrix

B_n = (1/n) X_n X_n^*.

The first success in finding the LSD of the sample covariance matrix is attributed to Marčenko and Pastur (1967), who found the limiting distribution, now known as the Marčenko-Pastur law (MP law). Subsequent work was done in Grenander and Silverstein (1977), Jonsson (1982), Silverstein (1995), Wachter (1978) and Yin (1986). The following theorem, from Bai (1999) for the complex case, is a generalized version of Yin (1986), where the real case was studied.
Theorem 2.1.3 Suppose that {x_ij, 1 ≤ i ≤ p, 1 ≤ j ≤ n} is a double array of i.i.d. complex random variables with mean zero and variance σ², and p/n → y ∈ (0, ∞). Then, with probability 1, the ESD of B_n tends to a limiting distribution with the density

p(x) = (1/(2πxyσ²)) √((b − x)(x − a)), if a ≤ x ≤ b; and 0 otherwise,

and a point mass 1 − 1/y at the origin if y > 1, where a = σ²(1 − √y)² and b = σ²(1 + √y)²; the constant y is the dimension-to-sample-size ratio index.

The limiting distribution in Theorem 2.1.3 is called the Marčenko-Pastur law with ratio index y and scale parameter σ². If σ² = 1, the MP law is known as the standard MP law.
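Two properties of the MP density can be checked numerically (a sketch with the arbitrary choice y = 1/2 and σ² = 1): its total mass is 1 when y < 1, since there is then no atom at the origin, and its first moment is σ²:

```python
import math

def mp_density(x, y, sigma2=1.0):
    """Marcenko-Pastur density on [a, b] for ratio index y in (0, 1)."""
    a = sigma2 * (1 - math.sqrt(y)) ** 2
    b = sigma2 * (1 + math.sqrt(y)) ** 2
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2 * math.pi * x * y * sigma2)

y = 0.5
a = (1 - math.sqrt(y)) ** 2
b = (1 + math.sqrt(y)) ** 2
m = 100_000
h = (b - a) / m
xs = [a + (k + 0.5) * h for k in range(m)]        # midpoint rule over [a, b]
total = sum(mp_density(x, y) for x in xs) * h     # should be 1 when y < 1
mean = sum(x * mp_density(x, y) for x in xs) * h  # first moment, should be sigma^2
print(f"mass ~ {total:.4f}, mean ~ {mean:.4f}")
```

The midpoint rule handles the square-root behavior at the edges a and b gracefully, since the integrand vanishes continuously there.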
The following theorem, from Bai and Silverstein (2006, page 46), extends the above result to the non-i.i.d. case for sample covariance matrices.

Theorem 2.1.4 Suppose that for each n, the entries of X_n are independent complex variables with a common mean µ and variance σ². Assume that p/n → y and that for any constant η > 0,

lim_{n→∞} (1/(η²np)) Σ_{i,j} E[ |x_ij − µ|² I{|x_ij − µ| ≥ η√n} ] = 0.

Then, with probability one, the ESD of the sample covariance matrices F^{B_n} converges to the MP law with ratio index y and scale parameter σ².
2.1.3 Product of Two Random Matrices

The study of a product of two random matrices originates from two areas. The first is the investigation of the LSD of a sample covariance matrix ST when the population covariance matrix T is not a multiple of the identity matrix. The second is the study of the LSD of a multivariate F-matrix F = S_1 S_2^{-1}, which is the product of a sample covariance matrix and the inverse of another sample covariance matrix, the two being independent of each other.

Yin and Krishnaiah (1983) investigated the limiting distribution of the product of a Wishart matrix S and a positive definite matrix T. Other variations of the product were considered by Bai, Yin and Krishnaiah (1986). Silverstein and Bai (1995) showed the existence of the LSD of the generalized version B = A + (1/n) X^* T X. The setup of the matrix B originated from nuclear physics, but it is also encountered in multivariate statistics.
As for the F-matrix, pioneering work was done by Wachter (1980), who considered the LSD of F when S_1 and S_2 are independent Wishart matrices. Yin, Bai and Krishnaiah (1983) also showed the existence of the LSD of the multivariate F-matrix. The explicit form of the LSD of multivariate F-matrices was derived in Bai, Yin and Krishnaiah (1987) and Silverstein (1985a). Under the same structure, Bai, Yin and Krishnaiah (1986) established the existence of the LSD when the underlying distribution of S is isotropic.
2.2 Limits of Extreme Eigenvalues
In multivariate analysis, many statistics generated from a random matrix can be written as functions of integrals with respect to the ESD of the random matrix. When the LSD is known, the approximate values of such statistics can be obtained by the Helly-Bray theorem (see Loève (1977), p. 184-186), which is, however, not applicable unless we can prove that the extreme eigenvalues of the random matrix remain in certain bounded intervals.

The investigation of the limits of extreme eigenvalues is important not only in the above respect, but also in many other areas, such as signal processing, pattern recognition, edge detection, and numerical analysis.
The first work in this direction is attributed to Geman (1980), who proved that the largest eigenvalue of the large dimensional sample covariance matrix tends to b = (1 + √y)² as p/n → y ∈ (0, ∞), under a restriction on the growth rate of the moments of the underlying distribution:

E|X_11|^k ≤ M k^{αk},

for some M > 0, α > 0 and for all k ≥ 3. This result was further generalized by Yin, Bai and Krishnaiah (1988) under the assumption of the existence of the fourth moment of the underlying distribution. The fourth moment condition was further proved to be necessary in Bai, Silverstein and Yin (1988). Silverstein (1989) showed that the necessary and sufficient conditions for the weak convergence of the largest eigenvalue of a sample covariance matrix are EX_11 = 0 and n² P(|X_11| ≥ √n) → 0. In Bai and Yin (1988), the necessary and sufficient condition for the almost sure convergence of the largest eigenvalue of the Wigner matrix was obtained. Jiang (2004) proved that the almost sure limit of the largest eigenvalue of the sample correlation matrix is the same as that of the largest eigenvalue of the sample covariance matrix.
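The limit b = (1 + √y)² for the largest eigenvalue can be checked by simulation. The sketch below uses power iteration on a small Gaussian sample covariance matrix (pure Python; the dimensions, seed and iteration count are arbitrary choices):

```python
import random
import math

random.seed(1)
p, n = 60, 120                        # ratio y = p/n = 0.5 (arbitrary choice)
X = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]

# Simplified sample covariance matrix B = (1/n) X X^T (p x p, symmetric).
B = [[sum(X[i][k] * X[j][k] for k in range(n)) / n for j in range(p)] for i in range(p)]

# Power iteration for the largest eigenvalue of B.
v = [1.0] * p
for _ in range(500):
    w = [sum(B[i][j] * v[j] for j in range(p)) for i in range(p)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]
lam_max = sum(v[i] * sum(B[i][j] * v[j] for j in range(p)) for i in range(p))

b = (1 + math.sqrt(p / n)) ** 2       # limit (1 + sqrt(y))^2, about 2.914
print(f"largest eigenvalue ~ {lam_max:.3f}, limit b ~ {b:.3f}")
```

At these modest dimensions the simulated value sits near b, with fluctuations of the finite-n edge still visible.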
A relatively difficult problem is to find the limit of the smallest eigenvalue of a sample covariance matrix. Yin, Bai and Krishnaiah (1983) proved that the lower limit of the smallest eigenvalue of a Wishart matrix has a positive lower bound if p/n → y ∈ (0, 1/2). Silverstein (1984) extended this work to y ∈ (0, 1). He later (1985b) proved that the smallest eigenvalue of a standard Wishart matrix tends to a = (1 − √y)² if p/n → y ∈ (0, 1). However, it is hard to use his approach to obtain a general result, as his method depends heavily on the normality assumption. A breakthrough was made in Bai and Yin (1993), who used a unified approach to establish the strong limits of both the largest and the smallest eigenvalues of the sample covariance matrix simultaneously, under the existence of the fourth moment of the underlying distribution. In particular, the strong limit of the smallest eigenvalue was proven to be a = (1 − √y)².

2.3 Convergence Rate of ESD
The convergence rate of the ESD is of practical interest, but it remained an open problem for decades because there were no suitable tools. The first great breakthrough in estimating the convergence rate was made in Bai (1993a), in which a Berry-Esseen type inequality for the difference of two ESDs was established in terms of their Stieltjes transforms. Through this tool, Bai offered a way to establish convergence rates and proved that the convergence rate of the expected ESD of a large dimensional Wigner matrix is O(n^{-1/4}). Applying this inequality, Bai, Miao and Tsay (1997) first showed that the ESD itself converges to the Wigner semicircular law in probability with the rate O(n^{-1/4}), under the assumption of a finite fourth moment. Later, Bai, Miao and Tsay (1999) improved the rate to O_p(n^{-1/4}). In 2002, they further derived that, under an eighth moment condition, the convergence rate of the expected ESD is O(n^{-1/2}) and that of the ESD itself is O_p(n^{-2/5}).

For large dimensional sample covariance matrices, under a finite fourth moment condition, Bai (1993b) showed that the convergence rate of the expected ESD is O(n^{-1/4}) if the ratio of the dimension to the degrees of freedom stays away from 1, and O(n^{-5/48}) if the ratio is close to 1. Bai, Miao and Tsay (1997) proved the same rates of convergence in probability for the ESD itself.

Using the Stieltjes transform, Bai, Miao and Yao (2003) proved that the expected spectral distribution converges to the Marčenko-Pastur law with the rate O(n^{-1/2}) if the ratio of dimension to sample size y_n = p/n stays away from 0 and 1, under the assumption that the entries have a finite eighth moment. Furthermore, the rates for convergence in probability and for almost sure convergence are shown to be O_p(n^{-2/5}) and o_{a.s.}(n^{-2/5+η}), respectively, when y is away from 1, while the rate in all senses becomes O(n^{-1/8}) when y is close to 1. However, the exact convergence rates and the optimal conditions for convergence for Wigner and sample covariance matrices remain open.
2.4 CLT of Linear Spectral Statistics (LSS)
As mentioned in the introduction, many important statistics in multivariate analysis can be expressed as functionals of the ESD of some random matrices. Indeed, a parameter θ of the population can often be expressed as

θ = ∫ f(x) dF(x),

where F is the LSD of the matrix sequence under study, with the corresponding linear spectral statistic

θ̂ = ∫ f(x) dF_n(x)

as its natural estimator. To conduct statistical inference on θ, we need to know the limiting distribution of

G_n(f) = α_n(θ̂ − θ) = ∫ f(x) dG_n(x),

where G_n(x) = α_n(F_n(x) − F(x)) and α_n → ∞ is a suitably chosen normalizer such that G_n(f) tends to a non-degenerate distribution.
It seems natural to pursue the properties of linear functionals by considering the asymptotics of the empirical process G_n(x) = α_n(F_n(x) − F(x)), viewed as a random element of the C space or the D space, the metric spaces of functions equipped with the Skorokhod metric. If, for some choice of α_n, G_n(x) tended to a limiting process G(x), then the limiting distributions of all LSS could be derived. Unfortunately, many lines of evidence show that G_n(x) cannot tend to a limiting process in any metric space. The work of Bai and Silverstein (2004) showed that G_n(x) cannot converge weakly to any nontrivial process for any choice of α_n. This phenomenon appears in other random matrix ensembles as well. When F_n is the empirical distribution of the angles of the eigenvalues of an n × n Haar matrix, Diaconis and Evans (2001) proved that all finite dimensional distributions of G_n(x) converge in distribution to independent Gaussian variables when α_n = n/√(log n). This shows that when α_n = n/√(log n), the process G_n cannot be tight in the D space.
Therefore, we have to abandon the attempt to find the limiting process of G_n(x). Instead, we consider the convergence of the empirical process G_n(f) with suitable α_n and f.
The first work in this direction was done by Jonsson (1982), who proved the CLT for the centralized sum of the r-th powers of the eigenvalues of a normalized Wishart matrix. Similar work for the Wigner matrix was obtained in Sinai and Soshnikov (1998). Later, Johansson (1998) proved the CLT of linear spectral statistics of the Wigner matrix under a density assumption.
In Bai and Silverstein (2004), the normalization constant α_n for large dimensional sample covariance matrices was found to be n, by showing that the limiting distribution of G_n(f) = n ∫ f(x) d(F_n(x) − F(x)) is Gaussian under certain assumptions, where f is any function analytic on a certain open set including the support of the MP law. Bai and Yao (2005) considered the Wigner matrix case and proved, under a fourth moment condition, that G_n(f) converges to a Gaussian limit. For the CLT for other types of matrices, one can refer to Anderson and Zeitouni (2006).
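The choice α_n = n can be illustrated for the simplest test function f(x) = x². For a symmetric W_n, G_n(f) = tr((n^{-1/2}W_n)²) − n·β₂ with β₂ = 1 the semicircle second moment, and tr(W²) is just the sum of squared entries, so no eigenvalue computation is needed. The sketch below (assuming standard N(0, 1) entries; the sizes and replication counts are arbitrary) shows that the fluctuations of G_n(f) stay of order one as n grows, i.e. no further normalization is required:

```python
import random
import statistics

def lss_fluctuation(n, reps, rng):
    """Samples of G_n(f) for f(x) = x^2, i.e. n * Int x^2 d(F_n - F).
    For symmetric W with N(0,1) entries, tr(W^2) = sum_i w_ii^2 + 2 sum_{i<j} w_ij^2,
    n * Int x^2 dF_n = tr(W^2)/n, and n * Int x^2 dF = n (beta_2 = 1)."""
    samples = []
    for _ in range(reps):
        diag = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
        upper = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n * (n - 1) // 2))
        samples.append((diag + 2.0 * upper) / n - n)
    return samples

rng = random.Random(2)
for n in (20, 80):
    var = statistics.variance(lss_fluctuation(n, 300, rng))
    print(f"n = {n}: sample variance of G_n(x^2) ~ {var:.2f} (stays O(1))")
```

That the variance does not grow with n is exactly the phenomenon the CLT of LSS formalizes.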
In Bai and Silverstein (2004) and Bai and Yao (2005), the test functions f are analytic on an open set including the support of the corresponding limiting distribution. However, the condition that the functions be analytic is stringent, since some of the functions encountered in real-life situations do not satisfy it. As such, it is useful to relax this condition.
The aim of this thesis is to relax this condition. We only require that the test functions have continuous fourth-order derivatives on an open interval including the support of the corresponding limiting spectral distribution. We prove that the LSS for sample covariance matrices and Wigner matrices converge weakly to Gaussian processes under certain moment conditions. We also provide explicit formulae for the mean and covariance functions of the limiting Gaussian processes.
Chapter 3
CLT of LSS for Wigner Matrices
3.1 Introduction and Main Result
A real Wigner matrix of size n is a real symmetric matrix W_n = (x_ij)_{1≤i,j≤n} whose upper-triangular entries (x_ij)_{1≤i≤j≤n} are independent, zero-mean, real-valued random variables satisfying the following moment conditions:

(1) ∀i, E|x_ii|² = σ² > 0;  (2) ∀i < j, E|x_ij|² = 1.

The set of these real Wigner matrices is called the Real Wigner Ensemble (RWE).
A complex Wigner matrix of size n is a Hermitian matrix W_n = (x_ij)_{1≤i,j≤n} whose upper-triangular entries (x_ij)_{1≤i≤j≤n} are independent, zero-mean, complex-valued random variables satisfying the following moment conditions:

(1) ∀i, E|x_ii|² = σ² > 0;  (2) ∀i < j, E|x_ij|² = 1 and E x_ij² = 0.

The set of these complex Wigner matrices is called the Complex Wigner Ensemble (CWE).
The empirical distribution F_n generated by the n eigenvalues of the normalized Wigner matrix n^{-1/2}W_n is called the empirical spectral distribution (ESD) of the Wigner matrix. The semicircular law states that F_n converges a.s. to the distribution F with the density

F'(x) = (1/2π) √(4 − x²), x ∈ [−2, 2].

Its various modes of convergence were later investigated.
Clearly, as stated in the introduction, one method of refining the above approximation is to establish the rate of convergence, which was studied in Bai (1993a), Costin and Lebowitz (1995), Johansson (1998), Khorunzhy, Khoruzhenko and Pastur (1996), Sinai and Soshnikov (1998) and Bai, Miao and Tsay (1997, 1999, 2002). The convergence rate was improved gradually from O(n^{-1/4}) to O_p(n^{-2/5}). Although the exact convergence rate remains unknown for Wigner matrices, Bai and Yao (2005) proved that the LSS of Wigner matrices, indexed by a set of functions analytic on an open domain of the complex plane including the support of the semicircular law, converge to a Gaussian process at rate n, under a finite fourth moment and a Lindeberg type condition.
Naturally, one may ask whether it is possible to derive the convergence of the LSS of Wigner matrices indexed by a larger class of functions. In other words, can we relax the analyticity condition on the test functions?
In this thesis, we consider the LSS of Wigner matrices indexed by a set of functions with continuous fourth-order derivatives on an open interval of the real line including the support of the semicircular law. More precisely, let C^4(U) denote the set of functions f : U → C which have continuous fourth-order derivatives, where the open set U of the real line includes the interval [−2, 2], the support of F(x). The empirical process G_n := {G_n(f)} indexed by C^4(U) is given by

G_n(f) = n ∫ f(x) d(F_n(x) − F(x)),    f ∈ C^4(U).
Theorem 3.1.1 Suppose

E|x_ij|^6 ≤ M for all i, j.    (3.1)

Then the empirical process G_n = {G_n(f) : f ∈ C^4(U)} converges weakly in finite dimensions to a Gaussian process G := {G(f) : f ∈ C^4(U)} with mean function

EG(f) = ((κ − 1)/4) [f(2) + f(−2)] − ((κ − 1)/2) τ_0(f) + (σ² − κ) τ_2(f) + β τ_4(f)

and covariance function

c(f, g) := E[{G(f) − EG(f)}{G(g) − EG(g)}] = (1/4π²) ∫_{−2}^{2} ∫_{−2}^{2} f'(t) g'(s) V(t, s) dt ds,

where f, g ∈ C^4(U),

V(t, s) = (σ² − κ + (1/2) β t s) √((4 − t²)(4 − s²)) + κ log( (4 − ts + √((4 − t²)(4 − s²))) / (4 − ts − √((4 − t²)(4 − s²))) ).

Here {T_l, l ≥ 0} is the family of Chebyshev polynomials.
The strategy of the proof is to use Bernstein polynomials to approximate the functions in C^4(U). This is done in Section 3.2, after which the problem reduces to the analytic case. The truncation and renormalization steps are given in Section 3.3. We derive the mean function of the limiting process in Section 3.4. The convergence of the empirical processes is proved in Section 3.5.
3.2 Bernstein Polynomial Approximation
It is well known that if f̃(y) is a continuous function on the interval [0, 1], the Bernstein polynomials

B_m(y) = Σ_{k=0}^{m} C(m, k) y^k (1 − y)^{m−k} f̃(k/m)
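The construction can be sketched directly (pure Python; the test function exp and the evaluation grid are arbitrary choices), and the uniform error on [0, 1] indeed shrinks as m grows:

```python
import math

def bernstein(f, m, y):
    """Bernstein polynomial B_m(y) = sum_{k=0}^m C(m, k) y^k (1-y)^(m-k) f(k/m)."""
    return sum(math.comb(m, k) * y ** k * (1 - y) ** (m - k) * f(k / m)
               for k in range(m + 1))

f = math.exp                         # an arbitrary continuous test function on [0, 1]
for m in (10, 50, 200):
    err = max(abs(bernstein(f, m, j / 100) - f(j / 100)) for j in range(101))
    print(f"m = {m:4d}: max error on [0, 1] ~ {err:.5f}")
```

For twice-differentiable f the error is of order f''·y(1 − y)/(2m), which is what makes the approximation convenient when smoothness beyond continuity, as in C^4(U), is available.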