WAVELETS FOR THE ANALYSIS,
ESTIMATION, AND SYNTHESIS OF
SCALING DATA
P. ABRY AND P. FLANDRIN
CNRS UMR 5672, École Normale Supérieure de Lyon, Laboratoire de Physique,
69 364 Lyon Cedex 07, France
M. S. TAQQU
Department of Mathematics, Boston University, Boston, MA 02215-2411
D. VEITCH
Software Engineering Research Centre, Carlton, Victoria 3053, Australia
2.1 THE SCALING PHENOMENA
2.1.1 Scaling Issues in Traffic
The presence of scaling behavior in telecommunications traffic is striking not only in its ubiquity, appearing in almost every kind of packet data, but also in the wide range of scales over which the scaling holds (e.g., see Beran et al. [18], Leland et al. [43], and Willinger et al. [78]). It is rare indeed that a physical phenomenon obeys a consistent law over so many orders of magnitude. This may well extend further, as increases in network bandwidth over time progressively "reveal" higher scales.

While the presence of scaling is now well established, its impact on teletraffic issues and network performance is still the subject of some confusion and uncertainty. Why is scaling in traffic important for networking? It is clear, as far as modeling of the traffic itself is concerned, that a feature as prominent as scaling should be built into models at a fundamental level, if these are to be both accurate and parsimonious. Scaling, therefore, has immediate implications for the choice of classes of traffic models, and consequently on the choice, and subsequent estimation, of model parameters. Such estimation is required for initial model verification, for fitting purposes, as well as for traffic monitoring.

Self-Similar Network Traffic and Performance Evaluation, Edited by Kihong Park and Walter Willinger. Copyright © 2000 by John Wiley & Sons, Inc. Print ISBN 0-471-31974-0, Electronic ISBN 0-471-20644-X.
Traffic modeling, however, does not occur in isolation but in the context of performance issues. Depending on the performance metric of interest, and the model of the network element in question, the impact and therefore the relevance of scaling behavior will vary. As a simple example, it is known that, in certain infinite buffer fluid queues fed by long-range-dependent (LRD) on/off sources, the stationary queueing distribution has infinite mean, a radically nonclassical result. Such infinite moments disappear, however, if the buffer is finite, intuitively because a finite reservoir cannot "hold" long memory. The long-range dependence of the input stream will strongly affect the overflow loss process but cannot seriously exacerbate the conditional delay experienced by packets that are not lost, as this is bounded by the size of the buffer. The importance of scaling in the performance sense, apart from being as yet unknown in a great many cases, is therefore context dependent.
We focus here on the fundamental issues of detection, identification, and measurement of scaling behavior. These cannot be ignored even if one is interested in performance questions that are not directly related to scaling. This is because scaling induces nonclassical statistical properties that affect the estimation of all parameters, not merely those that describe scaling. This, in turn, affects the predictive abilities of performance models and therefore their usefulness in practice.
The reliable detection of scaling should thus be our first concern. By detecting the absence or presence of scaling, one will know whether the data need to be analyzed by using traditional statistics or by using special statistical techniques that take the presence of scaling into account. Here it is vital to be able to distinguish artifacts due to nonstationarities, which can have the appearance of scaling, from true scaling behavior. Identification is necessary since more than one kind of scaling exists, with differing interpretations and implications for model choice. Finally, should scaling of a given kind be present, an accurate determination of the parameters that describe it must be made. These parameters will control the statistical properties of estimates made of all other quantities, such as the parameters needed in traffic modeling or quality of service metrics.
As a simple yet powerful example of the above, consider a second-order process $X(t)$, which we know to be stationary, and whose mean $\mu_X$ we wish to estimate from a given data set of length $n$. For this purpose the simple sample mean estimator is a reasonable choice. The classical result is that, asymptotically for large $n$, the sample mean follows a normal distribution with expectation equal to $\mu_X$ and variance $\sigma_X^2/n$, where $\sigma_X^2$ is the variance of $X$. In the case where $X$ is LRD, the sample mean is also asymptotically normally distributed with mean $\mu_X$; however, its variance is asymptotically given by

$$\frac{2c_r\, n^{\alpha}}{\alpha(1+\alpha)} \cdot \frac{1}{n},$$

where $\alpha \in (0,1)$ and $c_r \in (0,\infty)$ are the parameters describing the long-range dependence [17, p. 160]. This expression reveals that the variance of the sample mean decreases with the sample size $n$ at a rate that is slower than in the classical case. Noting that the ratio of the size of the LRD-based variance to the classical one grows to infinity with $n$, it becomes apparent that confidence intervals based on traditional assumptions, even for a quantity as simple as the sample mean, can lead to serious errors when in fact the data are LRD.
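As a numerical illustration of this point, the sketch below computes the exact variance of the sample mean directly from an autocovariance function. Purely as a concrete LRD example, it assumes the autocovariance of discrete fractional Gaussian noise (FGN, introduced in Section 2.2.2), $r(k) = \frac{\sigma^2}{2}(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})$; the function names are illustrative. For FGN the double sum of covariances equals $\sigma^2 n^{2H}$, so the ratio to the classical $\sigma^2/n$ is $n^{2H-1} = n^{\alpha}$, which grows without bound when $H > 1/2$.

```python
import math


def fgn_autocov(k: int, hurst: float, sigma2: float = 1.0) -> float:
    """Autocovariance at lag k of discrete FGN (assumed example of an LRD series)."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def var_sample_mean(autocov, n: int) -> float:
    """Exact variance of the mean of n consecutive samples of a stationary
    sequence whose autocovariance at lag k is autocov(k)."""
    total = n * autocov(0)                    # diagonal terms, i == j
    for k in range(1, n):
        total += 2.0 * (n - k) * autocov(k)   # n - k pairs at lag +k and at lag -k
    return total / n ** 2


n, hurst = 1000, 0.8
lrd = var_sample_mean(lambda k: fgn_autocov(k, hurst), n)
classical = 1.0 / n                           # sigma_X^2 / n with sigma_X^2 = 1
print(lrd / classical)                        # about n ** (2 * hurst - 1), roughly 63
print(math.isclose(lrd, n ** (2 * hurst - 2), rel_tol=1e-6))  # True
```

With $H = 1/2$ the FGN covariances vanish at all nonzero lags and the function returns exactly the classical $1/n$.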
We focus here on how a wavelet-based approach allows the threefold objective of the detection, identification, and measurement of scaling to be efficiently achieved. Fundamentally, this is due to the nontrivial fact that the analyzing wavelet family itself possesses a scale-invariant feature, a property not shared by other analysis methods. A key advantage is that quite different kinds of scaling can be analyzed by the same technique, indeed by the same set of computations. The semiparametric estimators of the scaling parameters that follow from the approach have excellent properties (negligible bias and low variance) and in many cases compare well even against parametric alternatives. The computational advantages, based on the use of the discrete wavelet transform (DWT), are very substantial and allow the analysis of data of arbitrary length. Finally, there are very valuable robustness advantages inherent in the method, particularly with respect to the elimination of superposed smooth trends (deterministic functions).
Another important issue connected with modeling and performance studies concerns the generation of time series for use in simulations. Such simulations can be particularly time consuming for long memory processes, where the past exerts a strong influence on the future, disallowing simple approximations based on truncation. Wavelets offer in principle a parsimonious and natural way to generate good approximations to sample paths of scaling processes, which benefit from the same DWT-based computational advantages enjoyed by the analysis method. This area is less well developed than is the case for analysis, however.
2.1.2 Mapping the Land of Scaling and Wavelets
The remainder of the chapter is organized as follows.
Section 2.2, Wavelets and Scaling: Theory, discusses in detail the key properties of the wavelet coefficients of scaling processes. It starts with a brief, yet precise, introduction to the continuous and discrete wavelet transforms, to the multiresolution analysis theory underlying the latter, and to the low complexity decomposition algorithm made possible by it. It recalls concisely the definitions of two of the main paradigms of scaling: self-similarity and long-range dependence. The properties of the wavelet coefficients of self-similar, long-range-dependent, and fractal processes are then given, and it is shown how the analysis of these various kinds of scaling can be gathered into a single framework within the wavelet representation. Extensions to more general classes of scaling processes requiring a collection of scaling exponents, such as multifractals, are also discussed.
The aim of Section 2.3, Wavelets and Scaling: Estimation, is to indicate how and why this wavelet framework enables the efficient analysis of scaling processes. This is achieved through the introduction of the logscale diagram, where the key analysis tasks (detection of scaling, interpretation of the nature of scaling, and estimation of scaling parameters) can be performed. Practical issues in the use of the logscale diagram are addressed, with references to examples from real traffic data and artificially generated traces. Definitions, statistical performance, and pertinent features of the estimators for scaling parameters are then studied in detail. The logscale diagram, first defined with respect to second-order statistical quantities, is then extended to statistics of other orders. It is also indicated how the tool allows for and deals with situations or processes departing from pure scaling, such as superimposed deterministic nonstationarities. Finally, clear connections between the wavelet tool and a number of more classical statistical tools dedicated to the analysis of scaling are drawn, showing how the latter can be profitably generalized in their wavelet incarnations.
Section 2.4, Wavelets and Scaling: Synthesis, proposes a wavelet-based synthesis of the fractional Brownian motion. It shows how this process can be naturally and efficiently expanded in a wavelet basis, allowing, provided that the wavelets are suitably designed, its accurate and computationally efficient implementation. Finally, in Section 2.5, Wavelets and Scaling: Perspectives, a brief indication is given of what may lie ahead in the broad land of scaling and wavelets.
2.2 WAVELET AND SCALING: THEORY
2.2.1 Wavelet Analysis: A Brief Introduction
2.2.1.1 The (Continuous) Wavelet Transform The continuous wavelet decomposition (CWT) consists of the collection of coefficients

$$\{T_X(a,t) = \langle X, \psi_{a,t} \rangle,\ a \in \mathbb{R}^+,\ t \in \mathbb{R}\}$$

that compares (by means of inner products) the signal $X$ to be analyzed with a set of analyzing functions
the scale of time (or, equivalently, the range of frequencies) over which it will be observed. The quantity $|T_X(a,t)|^2$, referred to as a "scalogram," can therefore be interpreted as the energy content of $X$ around time $t$ within a given range of frequencies controlled by $a$. In addition to being well localized in both time and frequency, the mother wavelet is required to satisfy the admissibility condition, whose weak form is

$$\int \psi_0(u)\, du = 0,$$
which shows it is a bandpass or oscillating function, hence the name "wavelet." Wavelets that are often used in practice include the Haar wavelet, the Daubechies wavelets, indexed by a parameter $N = 1, 2, \ldots$, and the Meyer wavelets. The Haar wavelet $\psi_0(u)$ is discontinuous; it equals $1$ for $0 \le u < 1/2$, $-1$ for $1/2 \le u < 1$, and $0$ otherwise. The Daubechies wavelet with $N = 1$ is in fact the Haar wavelet, but the other Daubechies wavelets with $N > 1$ are continuous with bounded support and have $N$ vanishing moments (i.e., they satisfy Eq. (2.5)). The Meyer wavelets do not have bounded support in the time domain (their Fourier transforms, on the other hand, are compactly supported), but all their moments vanish and they belong to the Schwartz space; that is, they are infinitely differentiable and decrease very rapidly to 0 as $|u|$ tends to $\infty$.
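The Haar wavelet just described is simple enough to check these properties numerically. The sketch below (function names are illustrative) evaluates the moments $\int t^k \psi_0(t)\,dt$ with a midpoint rule: the zeroth moment vanishes, which is the weak admissibility condition, while the first moment equals $-1/4$, so the Haar wavelet has exactly $N = 1$ vanishing moment.

```python
def haar(u: float) -> float:
    """Haar mother wavelet psi_0: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def moment(psi, k: int, lo: float = -1.0, hi: float = 2.0, m: int = 30000) -> float:
    """Midpoint-rule approximation of the k-th moment int_lo^hi t**k * psi(t) dt."""
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        t = lo + (i + 0.5) * h
        total += t ** k * psi(t)
    return total * h


print(round(moment(haar, 0), 6))  # 0.0   (zero mean: weak admissibility)
print(round(moment(haar, 1), 6))  # -0.25 (first moment nonzero, so N = 1)
```

The integration interval and grid size are arbitrary choices; the grid is aligned so that the jumps of the Haar wavelet fall on cell boundaries, which makes the midpoint rule exact for these low-order moments.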
On the condition that the wavelet be admissible, the transform can be inverted:

$$X(t) = C_\psi \iint T_X(a,\tau)\, \psi_{a,\tau}(t)\, \frac{da\, d\tau}{a^2},$$

where $C_\psi$ is a constant depending on $\psi_0$. This reconstruction formula expresses $X$ in terms of a weighted integral of wavelets (acting as elementary atoms) located around given times and frequencies, thereby constituting quanta of information in the time–frequency plane. For a more general presentation of the wavelet analysis see, for example, Daubechies [24].
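To make the definition concrete, the sketch below evaluates a single CWT coefficient $T_X(a,t) = \langle X, \psi_{a,t} \rangle$ by numerical integration with the Haar wavelet. It assumes the $L^2$-normalized family $\psi_{a,t}(u) = a^{-1/2}\psi_0((u-t)/a)$, which is consistent with the dyadic templates $\psi_{j,k}$ and the relation $d_X(j,k) = T_X(2^j, 2^j k)$ used later in this section; the function names are hypothetical. For the deterministic signal $X(u) = u$, which satisfies $X(cu) = cX(u)$, a short calculation gives $T_X(a,t) = -a^{3/2}/4$, so dilating both scale and time by $c$ multiplies the coefficient by $c^{3/2}$.

```python
import math


def haar(u: float) -> float:
    """Haar mother wavelet psi_0: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def cwt_coefficient(x, a: float, t: float, m: int = 20000) -> float:
    """Approximate T_X(a, t) = int x(u) * a**-0.5 * psi_0((u - t) / a) du.

    For the Haar wavelet, psi_{a,t} is supported on [t, t + a), so a midpoint
    rule with m cells on that interval is enough.
    """
    h = a / m
    total = 0.0
    for i in range(m):
        u = t + (i + 0.5) * h
        total += x(u) * haar((u - t) / a)
    return total * h / math.sqrt(a)


def ramp(u: float) -> float:
    """Deterministic test signal X(u) = u, which satisfies X(cu) = c X(u)."""
    return u


print(cwt_coefficient(ramp, 4.0, 1.0))  # approximately -2.0, i.e. -(4 ** 1.5) / 4
print(cwt_coefficient(ramp, 8.0, 2.0) / cwt_coefficient(ramp, 4.0, 1.0))  # approximately 2 ** 1.5
print(cwt_coefficient(lambda u: 3.0, 5.0, 0.0))  # 0.0: constants are not seen
```

The last line shows the zero-mean property at work: the wavelet ignores constant offsets, whatever the scale.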
Because the wavelet transform represents in a plane (i.e., a two-dimensional (2D) space) the information contained in a signal (i.e., a one-dimensional (1D) space), it is a redundant transform, which means that neighboring coefficients in the time–scale plane share a certain amount of information. A mathematical theory, the multiresolution analysis (MRA), proves that it is possible to critically sample the time–scale plane, that is, to keep, among the $\{T_X(a,t), a \in \mathbb{R}^+, t \in \mathbb{R}\}$, only a discrete set of coefficients while still retaining the total information in $X$. That procedure defines the so-called discrete (or nonredundant) wavelet transform.
2.2.1.2 Multiresolution Analysis and Discrete Wavelet Transform A multiresolution analysis (MRA) consists of a collection of nested subspaces $\{V_j\}_{j \in \mathbb{Z}}$, satisfying the following set of properties [24]:
$L^2(\mathbb{R})$.

Property 4 expresses the fact that the set of shifted scaling functions $\{\varphi_0(t - k), k \in \mathbb{Z}\}$ forms a "Riesz basis" for $V_0$; that is, they are linearly independent and span the space $V_0$, but they are not necessarily orthogonal, nor do they have to be of unit length. Finding such a function $\varphi_0(t)$ is hard, but many candidates for $\varphi_0(t)$ are known in the literature.
Similarly, Properties 3 and 4 together imply that the scaled and shifted functions

$$\{\varphi_{j,k}(t) = 2^{-j/2}\varphi_0(2^{-j}t - k),\ k \in \mathbb{Z}\}$$

constitute a Riesz basis for the space $V_j$. The multiresolution analysis involves successively projecting the signal $X$ to be studied into each of the approximation subspaces $V_j$:
$$\mathrm{approx}_j(t) = \mathrm{Proj}_{V_j} X(t) = \sum_k a_X(j,k)\, \varphi_{j,k}(t).$$
Since, from Property 2, $V_j \subset V_{j-1}$, $\mathrm{approx}_j$ is a coarser approximation of $X$ than is $\mathrm{approx}_{j-1}$. (Note that some authors use the opposite convention and set $V_j \subset V_{j+1}$.) Property 1 moreover indicates that in the limit $j \to \infty$, all information is removed from the signal. The key idea of the MRA, therefore, consists of studying a signal by examining its coarser and coarser approximations, by canceling more and more high frequencies or details from the data.
The information that is removed when going from one approximation to the next, coarser one is called the detail:

$$\mathrm{detail}_j(t) = \mathrm{approx}_{j-1}(t) - \mathrm{approx}_j(t).$$
The MRA shows that the detail signals $\mathrm{detail}_j$ can be obtained directly from projections of $X$ onto a collection of subspaces, the $W_j$, defined by $V_{j-1} = V_j \oplus W_j$ and called the wavelet subspaces. Moreover, the MRA theory shows that there exists a function $\psi_0$, called the mother wavelet, to be derived from $\varphi_0$, such that its templates

$$\{\psi_{j,k}(t) = 2^{-j/2}\psi_0(2^{-j}t - k),\ k \in \mathbb{Z}\}$$

constitute a Riesz basis for $W_j$:
$$\mathrm{detail}_j(t) = \mathrm{Proj}_{W_j} X(t) = \sum_k d_X(j,k)\, \psi_{j,k}(t).$$
For example, if the scaling function $\varphi_0(t)$ is the function that equals 1 if $0 \le t < 1$ and 0 otherwise, then the corresponding mother wavelet $\psi_0(u)$ is the Haar wavelet.
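Theoretically, this projection procedure can be performed from $j = -\infty$ up to $j = +\infty$. In practice, one limits the range of indices $j$ to $j = 0, \ldots, J$ and thus only considers

$$V_J \subset V_{J-1} \subset \cdots \subset V_0.$$

This means that we restrict the analysis of $X$ to that of its (orthogonal) projection $\mathrm{approx}_0(t)$ onto the reference space $V_0$, labeled as zero by convention, and rewrite this fine scale approximation as a collection of details at different resolutions together with a final low-resolution approximation that belongs to $V_J$:

$$\mathrm{approx}_0(t) = \mathrm{approx}_J(t) + \sum_{j=1}^{J} \mathrm{detail}_j(t) = \sum_k a_X(J,k)\, \varphi_{J,k}(t) + \sum_{j=1}^{J} \sum_k d_X(j,k)\, \psi_{j,k}(t).$$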
to band-limit a process prior to sampling. Note, however, that there is no additional information loss after the initial projection. Varying $J$ simply means deciding if more or less information is written in details as opposed to the final approximation $\mathrm{approx}_J$.
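For the Haar scaling function (the indicator of $[0,1)$), the projection onto $V_j$ of a signal that is piecewise constant on unit intervals simply replaces each dyadic block of $2^j$ samples by its mean. The sketch below assumes such a piecewise-constant $\mathrm{approx}_0$, given by its sample values, and uses illustrative function names. It builds $\mathrm{approx}_j$ and $\mathrm{detail}_j = \mathrm{approx}_{j-1} - \mathrm{approx}_j$, checks that $\mathrm{approx}_J$ plus all the details recovers $\mathrm{approx}_0$ (no information loss), and checks that the pieces are mutually orthogonal, as expected from $V_{j-1} = V_j \oplus W_j$.

```python
def haar_approx(x, j):
    """Haar projection onto V_j of the piecewise-constant signal x (one value per
    unit interval): each dyadic block of length 2**j is replaced by its mean.
    The result is returned on the original unit grid."""
    size = 2 ** j
    if len(x) % size:
        raise ValueError("len(x) must be a multiple of 2**j")
    out = []
    for start in range(0, len(x), size):
        mean = sum(x[start:start + size]) / size
        out.extend([mean] * size)
    return out


def haar_mra(x, J):
    """Return approx_J and [detail_1, ..., detail_J] for the Haar MRA of x.
    Because the V_j are nested, projecting x directly onto V_j gives approx_j."""
    approx = [list(x)] + [haar_approx(x, j) for j in range(1, J + 1)]
    details = [[p - q for p, q in zip(approx[j - 1], approx[j])] for j in range(1, J + 1)]
    return approx[J], details


def dot(u, v):
    """Discrete inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx_J, details = haar_mra(x, 3)
rebuilt = [a + sum(d[i] for d in details) for i, a in enumerate(approx_J)]
print(max(abs(r - v) for r, v in zip(rebuilt, x)))             # 0.0: nothing lost
print(dot(approx_J, details[0]), dot(details[0], details[1]))  # orthogonal pieces
```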
Since the $\mathrm{approx}_j$ are essentially coarser and coarser approximations of $X$, $\varphi_0$ needs to be a lowpass function. The $\mathrm{detail}_j$, being an information "differential," indicates rather that $\psi_0$ is a bandpass function, and therefore a small wave, a wavelet. More precisely, the MRA shows that the mother wavelet must satisfy $\int \psi_0(t)\, dt = 0$ [24].
Given a scaling function $\varphi_0$ and a mother wavelet $\psi_0$, the discrete (or nonredundant) wavelet transform (DWT) consists of the collection of coefficients

$$X(t) \to \big\{\{a_X(J,k),\ k \in \mathbb{Z}\},\ \{d_X(j,k),\ j = 1, \ldots, J,\ k \in \mathbb{Z}\}\big\}. \qquad (2.3)$$

These coefficients are defined through inner products of $X$ with two sets of functions:
$$a_X(j,k) = \langle X, \tilde{\varphi}_{j,k} \rangle, \qquad d_X(j,k) = \langle X, \tilde{\psi}_{j,k} \rangle, \qquad (2.4)$$

where $\tilde{\psi}_{j,k}$ (resp., $\tilde{\varphi}_{j,k}$) are shifted and dilated templates of $\tilde{\psi}_0$ (resp., $\tilde{\varphi}_0$), called the dual mother wavelet (resp., the dual scaling function), and whose definition depends on whether one chooses to use an orthogonal, semiorthogonal, or biorthogonal DWT (e.g., see Daubechies [24]). In Eqs. (2.2) and (2.4), the role of the wavelet and its dual can arbitrarily be exchanged, and similarly for the scaling function and its dual.
In what follows this exchange is performed for simplicity of notation. The $d_X(j,k)$ constitute a subsample of the $\{T_X(a,t), a \in \mathbb{R}^+, t \in \mathbb{R}\}$, located on the so-called dyadic grid,

$$d_X(j,k) = T_X(2^j, 2^j k).$$
The logarithm (base 2) of the scale, $\log_2 a = \log_2 2^j = j$, is called the octave $j$, and a scale will often be referred to by its corresponding octave. For the sake of clarity, we henceforth restrict our presentation to the DWT (characterized by the $d_X(j,k)$), which brings with it considerable computational advantages. However, the fundamental results based on the wavelet approach also hold for the CWT; see Abry et al. [3, 4].
2.2.1.3 Key Features of the Wavelet Transform In the study of the scaling processes analyzed below, the following two features of the wavelet transform play key roles:

F1: The wavelet basis is constructed from the dilation (change of scale) operator, so that the analyzing family itself exhibits a scale-invariance feature.

F2: $\psi_0$ has a number $N \ge 1$ of vanishing moments:

$$\int t^k \psi_0(t)\, dt = 0, \quad k = 0, 1, 2, \ldots, N - 1. \qquad (2.5)$$

The value of $N$ can freely be chosen by selecting the mother wavelet $\psi_0$ accordingly. The Fourier transform $\Psi_0(\nu)$ of $\psi_0$ satisfies $|\Psi_0(\nu)| \approx |\nu|^N$, $|\nu| \to 0$ [24].
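For the compactly supported orthonormal wavelets of Daubechies, feature F2 can be read off the discrete filters $h_1$ and $g_1$ of the pyramidal algorithm (Section 2.2.1.4): $N$ vanishing moments of $\psi_0$ correspond to the vanishing of the discrete moments $\sum_k k^m g[k]$, $m = 0, \ldots, N-1$, of the highpass filter. The sketch below checks this for the Haar ($N = 1$) and the Daubechies $N = 2$ filters. It assumes the common convention $g[k] = (-1)^k h[L-1-k]$ for building the highpass filter from the lowpass one (other sign and ordering conventions exist), and the names are illustrative.

```python
import math


def highpass_from_lowpass(h):
    """Wavelet (highpass) filter g[k] = (-1)**k * h[L-1-k] (one common convention)."""
    L = len(h)
    return [(-1) ** k * h[L - 1 - k] for k in range(L)]


def discrete_moments(g, m_max):
    """Discrete moments sum_k k**m * g[k] for m = 0, ..., m_max."""
    return [sum((k ** m) * gk for k, gk in enumerate(g)) for m in range(m_max + 1)]


S2, S3 = math.sqrt(2.0), math.sqrt(3.0)
HAAR = [1 / S2, 1 / S2]                                   # N = 1
DB2 = [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2),
       (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)]          # Daubechies, N = 2

print(discrete_moments(highpass_from_lowpass(HAAR), 1))  # [0.0, -0.707...]
print(discrete_moments(highpass_from_lowpass(DB2), 2))   # [~0, ~0, -1.2247...]
```

In each case the first nonvanishing moment appears exactly at order $N$, which is what controls the behavior $|\Psi_0(\nu)| \approx |\nu|^N$ near the origin.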
2.2.1.4 Fast Pyramidal Algorithm In all of what follows, we always assume that we are dealing with continuous time stochastic processes, and therefore that the wavelet (and approximation) coefficients are defined through continuous time inner products (Eq. (2.4)). One major consequence of the nested structure of the MRA consists in the fact that the $d_X(j,k)$ and the $a_X(j,k)$ can actually be computed through a discrete time convolution involving the sequence $a_X(j-1,k)$ and two discrete time filters $h_1$ and $g_1$. The DWT can therefore be implemented using a recursive filter-bank-based pyramidal algorithm, as sketched in Fig. 2.1, which has a lower computational cost than that of a fast Fourier transform (FFT) [24]. The coefficients of the filters $h_1$ and $g_1$ are to be derived from $\varphi_0$ and $\psi_0$ [24]. The use of the discrete time algorithm to compute the continuous time inner products $d_X(j,k) = \langle X, \psi_{j,k} \rangle$ requires an initialization procedure. It amounts to computing an initial discrete time sequence to feed the algorithm (see Fig. 2.1): $a_X(0,k) = \langle X, \varphi_{0,k} \rangle$, which corresponds to the coefficients of the expansion of the projection of $X$ on $V_0$. From a practical point of view, one deals with sampled
versions of $X$, which implies that the initialization stage has to be approximated. More details can be found in Delbeke and Abry [27] and Veitch and Abry [75]. The fast pyramidal algorithm is not only scalable because of its linear complexity, $O(n)$ for data of length $n$, but is simple enough to implement on-line and in real time in high-speed packet networks. An on-line wavelet-based estimation method for the scaling parameter with small memory requirements is given by Roughan et al. [62].
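The recursion of Fig. 2.1 is short enough to write out in full for the Haar case, where the filters reduce to $h_1 = (1, 1)/\sqrt{2}$ and $g_1 = (1, -1)/\sqrt{2}$. These filter values, and the use of the samples themselves as $a_X(0,k)$, are assumptions of this sketch rather than the general initialization discussed above; the names are illustrative. Each stage halves the sequence length, so the total work is at most $n + n/2 + n/4 + \cdots < 2n$ operations, the $O(n)$ cost quoted in the text. Orthonormality of the transform shows up as preservation of energy.

```python
import math


def haar_pyramid(a0, J):
    """Haar DWT by the fast pyramidal algorithm.

    a0 holds the initial sequence a_X(0, k); its length must be a multiple of
    2**J. Returns (a_X(J, .), [d_X(1, .), ..., d_X(J, .)]).
    """
    if J < 1 or len(a0) % (2 ** J):
        raise ValueError("need J >= 1 and len(a0) divisible by 2**J")
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    approx, details = list(a0), []
    for _ in range(J):
        # Convolving with h_1 / g_1 and then decimating by 2 amounts to
        # combining each pair (a[2k], a[2k+1]).
        even, odd = approx[0::2], approx[1::2]
        details.append([(p - q) * inv_sqrt2 for p, q in zip(even, odd)])
        approx = [(p + q) * inv_sqrt2 for p, q in zip(even, odd)]
    return approx, details


aJ, d = haar_pyramid([1.0, 2.0, 3.0, 4.0], 2)
print(aJ, d)  # approximately [5.0] [[-0.7071, -0.7071], [-2.0]]

x = [math.sin(0.3 * i) for i in range(64)]
aJ, d = haar_pyramid(x, 4)
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in aJ) + sum(v * v for level in d for v in level)
print(abs(energy_in - energy_out) < 1e-9)  # True: the transform is orthonormal
```

Longer filters (e.g., Daubechies with $N > 1$) follow the same structure, with the pairwise combination replaced by a genuine convolution followed by decimation.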
Fig. 2.1 Fast filter-bank-based pyramidal algorithm. The DWT can be computed using a fast pyramidal algorithm: that is, given that we have approximation $a_X(j-1,k)$ at level $j-1$, we obtain approximation $a_X(j,k)$ and detail $d_X(j,k)$ at level $j$ by convolving with $h_1$ and $g_1$, respectively, and decimating. The coefficients of the filters $h_1$ and $g_1$ are derived from the chosen scaling function and wavelet, $\varphi_0$ and $\psi_0$. The downarrow stands for a decimation-by-2 operation: one drops the odd coefficients. An initialization step is required to go from the process $X$ to the approximation of order 0: $a_X(0,k)$.

2.2.2 Scaling Processes: Self-Similarity and Long-Range Dependence

We can define scaling behavior broadly as a property of scale invariance, that is, when there is no controlling characteristic scale or, equivalently, when all scales have equal importance. There is no one simple definition that can capture all systems or processes with this property; rather there is a set of known classes open to expansion. In this section we briefly introduce the most well known of these, namely, self-similar, self-similar with stationary increments, and long-range-dependent processes. Please note that throughout this chapter we will use the following convention: $f(x) \sim g(x)$ as $x \to a$ means that $\lim_{x \to a} f(x)/g(x) = 1$, and $f(x) \approx g(x)$ as $x \to a$ means that $\lim_{x \to a} f(x)/g(x) = C$, where $C$ is some finite constant.
Recall that a process $X = \{X(t), t \in \mathbb{R}\}$ is self-similar with parameter $H > 0$ ($H$-ss) if $X(0) = 0$ and, for all $c > 0$, $\{X(ct), t \in \mathbb{R}\}$ and $\{c^H X(t), t \in \mathbb{R}\}$ have the same finite-dimensional distributions. Such a process, obviously, cannot be stationary. The process $X$ is $H$-sssi if it is $H$-ss and if, in addition, it has stationary increments, that is, if the finite-dimensional distributions of its increments $\{X(t+h) - X(h), t \in \mathbb{R}\}$ do not depend on $h$. An $H$-sssi process with $H < 1$ has zero mean and a variance that behaves as $E X^2(t) = \sigma^2 |t|^{2H}$. The fractional Brownian motion (FBM), for example, is the (unique) Gaussian $H$-sssi process, which is simply Brownian motion for $H = 1/2$.

$1/2 < H < 1$, with $H$ and $\alpha$ related through $\alpha = 2H - 1$.
In particular, fractional Gaussian noise (FGN), which is the increment process of fractional Brownian motion³ (FBM) [50] with $1/2 < H < 1$, has long-range dependence¹. FGN is close to an "ideal" model because its spectral density is close to $|\nu|^{1-2H} = |\nu|^{-\alpha}$ for a large range of frequencies $\nu$ in the interval $[0, 1/2]$, and because its correlation function,

$$r(k) = \frac{1}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right),$$

is invariant under aggregation (see Section 2.3.5.1).

¹ Long-range dependence is sometimes referred to as "long memory" or "second-order asymptotic self-similarity."

² The index $f$ indicates that this constant is in force in the frequency domain. The corresponding constant appearing in the autocovariance is denoted $c_r$. One can also replace these constants by slowly varying functions, but for the sake of simplicity, we will not do this here.

³ Discrete standard FGN is the time series $X_j = B_H(j+1) - B_H(j)$, $j = 0, 1, \ldots$, where $B_H$ is FBM. Its spectral density satisfies $\Gamma_X(\nu) = \Gamma_X(-\nu)$, and because it is a discrete-time sequence, $\Gamma_X(\nu)$ is concentrated on the interval $[-1/2, 1/2)$.
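The aggregation invariance of the FGN correlation can be checked directly. The sketch below (names are illustrative) forms sums over blocks of $m$ consecutive FGN samples, computes their correlation at lag $k$ purely from $r(k)$, and compares it with $r(k)$ itself. Since the block sums are increments of FBM over intervals of length $m$, self-similarity predicts that the two agree exactly, for every block size $m$.

```python
def fgn_corr(k: int, hurst: float) -> float:
    """Correlation r(k) of discrete FGN with Hurst parameter `hurst`."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def aggregated_corr(k: int, m: int, hurst: float) -> float:
    """Correlation at lag k between sums of m consecutive FGN samples
    (block 0 covers samples 0..m-1, block k covers k*m..k*m+m-1)."""
    def block_cov(lag: int) -> float:
        return sum(fgn_corr(lag * m + i - l, hurst) for i in range(m) for l in range(m))
    return block_cov(k) / block_cov(0)


for k in range(1, 6):
    print(k, round(fgn_corr(k, 0.8), 6), round(aggregated_corr(k, 10, 0.8), 6))  # columns agree
```

Block means give the same correlations as block sums, since correlation is unaffected by the factor $1/m$.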
We now recall the properties of the wavelet coefficients of $H$-sssi processes (such as FBM) and LRD processes (such as FGN) and show that they can be gathered into a unified framework. We subsequently show that other stochastic processes exhibiting scaling behavior also fit into this framework, opening up the prospect of a single approach covering diverse forms of scaling.

2.2.3 Wavelet Transform of Scaling Processes
2.2.3.1 Discrete Wavelet Transform of Stochastic Processes Whereas wavelet theory was first established for deterministic finite-energy functions, it has clearly been demonstrated in the literature that the wavelet transform can be applied to stochastic processes; for example, see Cambanis and Houdré [20] and Masry [49]. More specifically, for the second-order random processes of interest here, it is well known that the wavelet transform is a second-order random field, on the condition that the scaling function $\varphi_0$ (and hence the wavelet $\psi_0$) satisfies certain mild conditions [20, 49] related to the covariance structure of the analyzed process. We will assume hereafter that the scaling functions and wavelets decay at least exponentially fast in the time domain, so that the second-order statistics of the wavelet transform exist for all of the random processes we discuss here.

2.2.3.2 Wavelet Transform (WT) of H-ss and H-sssi Processes Let $X$ be an $H$-ss process. Its wavelet coefficients $d_X(j,k)$ exactly reproduce the self-similarity through the following central scaling property; see Delbeke [25] and Delbeke and Abry [26] or Pesquet-Popescu [57]:
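P0 SS: For the DWT, $d_X(j,k) = \langle X, \psi_{j,k} \rangle$, so that

$$\big(d_X(j,0), d_X(j,1), \ldots, d_X(j, N_j - 1)\big) \overset{d}{=} 2^{j(H+1/2)} \big(d_X(0,0), d_X(0,1), \ldots, d_X(0, N_j - 1)\big).$$

For the CWT, $T_X(a,t) = \langle X, \psi_{a,t} \rangle$, and hence

$$\big(T_X(ca, ct_1), \ldots, T_X(ca, ct_n)\big) \overset{d}{=} c^{H+1/2} \big(T_X(a,t_1), \ldots, T_X(a,t_n)\big), \quad \forall c > 0.$$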
These equations mimic the self-similarity of the process. Let us emphasize that this, nontrivially, results from the fact that the analyzing wavelet basis is designed from the dilation operator and is therefore, by nature, scale invariant (F1). For second-order processes, a direct consequence of Eq. (2.10) is

$$E\, d_X(j,k)^2 = 2^{j(2H+1)}\, E\, d_X(0,k)^2. \qquad (2.11)$$

Moreover, if we add the requirement that $X$ has stationary increments (i.e., $X$ is $H$-sssi), ingredients F1 and F2 combine, resulting in:
P1 SS: The wavelet coefficients with fixed scale index, $\{d_X(j,k), k \in \mathbb{Z}\}$, form a stationary process. This follows from the stationary increments property of the analyzed processes [20, 25, 49]. This property is not trivial, given that self-similar processes are nonstationary processes, and is a consequence of $N \ge 1$ (F2). In this case, Eq. (2.11) reduces to the fundamental result:

$$E\, d_X(j,k)^2 = 2^{j(2H+1)}\, C(H,\psi_0)\, \sigma^2, \quad \forall k, \qquad (2.12)$$

with $C(H,\psi_0) = -\frac{1}{2}\int |t|^{2H} \left(\int \psi_0(u)\,\psi_0(u-t)\, du\right) dt$ and $\sigma^2 = E X(1)^2$.
P2 SS: Using the specific covariance structure of an $H$-sssi process $X(t)$, namely,

$$E X(t)X(s) = \frac{\sigma^2}{2}\left\{|t|^{2H} + |s|^{2H} - |t-s|^{2H}\right\}, \qquad (2.13)$$

it can be shown [32, 73] that the correlations between wavelet coefficients located at different positions are extremely small as soon as $N > H + 1/2$, and that their decay can be controlled by increasing $N$:

$$E\, d_X(j,k)\, d_X(j',k') \approx |2^j k - 2^{j'} k'|^{2H - 2N}, \quad |2^j k - 2^{j'} k'| \to \infty. \qquad (2.14)$$

These two results have been obtained and illustrated originally in the case of the FBM [31–34] (see also Tewfik and Kim [73]) and have been stated in more general contexts [20, 25, 26, 49].
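For the Haar wavelet the constant in Eq. (2.12) can be obtained in closed form. Inserting Eq. (2.13) into $E\,d_X(0,k)^2$ and using $\int \psi_0 = 0$ gives $E\,d_X(0,k)^2 = -\frac{\sigma^2}{2}\iint \psi_0(u)\psi_0(v)|u-v|^{2H}\,du\,dv$, and a short calculation for the Haar wavelet then yields $C(H,\psi_0) = (1 - 2^{-2H})/((2H+1)(2H+2))$, which equals $1/12$ for Brownian motion ($H = 1/2$). This closed form is derived here only as an illustration, not taken from the references; the sketch below (illustrative names) compares it with a direct midpoint-rule evaluation of the double integral.

```python
def haar(u: float) -> float:
    """Haar mother wavelet psi_0: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= u < 0.5:
        return 1.0
    if 0.5 <= u < 1.0:
        return -1.0
    return 0.0


def c_haar_closed_form(hurst: float) -> float:
    """C(H, psi_0) for the Haar wavelet (closed form derived from Eq. (2.13))."""
    return (1.0 - 2.0 ** (-2.0 * hurst)) / ((2.0 * hurst + 1.0) * (2.0 * hurst + 2.0))


def c_numeric(psi, hurst: float, m: int = 300) -> float:
    """Midpoint rule for -1/2 * int int psi(u) psi(v) |u - v|**(2H) du dv on [0, 1)^2.
    m should be even so that the Haar jump at 1/2 falls on a cell boundary."""
    h = 1.0 / m
    grid = [(i + 0.5) * h for i in range(m)]
    vals = [psi(u) for u in grid]
    two_h = 2.0 * hurst
    total = 0.0
    for u, pu in zip(grid, vals):
        for v, pv in zip(grid, vals):
            total += pu * pv * abs(u - v) ** two_h
    return -0.5 * total * h * h


for H in (0.5, 0.7, 0.9):
    print(H, c_haar_closed_form(H), c_numeric(haar, H))  # last two columns close
```

The double loop costs $O(m^2)$ evaluations, which is acceptable for a one-off check but not meant as an efficient method.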
2.2.3.3 WT of LRD Processes Let $X$ be a second-order stationary process. Its wavelet coefficients $d_X(j,k)$ satisfy the following:

P0 LRD:

$$E\, d_X(j,k)^2 = \int \Gamma_X(\nu)\, 2^j |\Psi_0(2^j \nu)|^2\, d\nu, \qquad (2.15)$$

where $\Gamma_X(\nu)$ and $\Psi_0(\nu)$ stand for the power spectrum of $X$ and the Fourier transform of $\psi_0$, respectively. This can be understood as the classical interference formula of linear filter theory and receives a spectral estimation interpretation: $E\, d_X(j,k)^2$ is a measure of $\Gamma_X$ at frequency $\nu_j = 2^{-j}\nu_0$ ($\nu_0$ depends on $\psi_0$) through the constant relative bandwidth wavelet filter [1–3, 34].
In the specific context of LRD processes, F1 and F2 together yield the two following key properties:

P1 LRD: Using $\Gamma_X(\nu) \sim c_f |\nu|^{-\alpha}$, $\nu \to 0$, in Eq. (2.15), we obtain

$$E\, d_X(j,k)^2 \sim 2^{j\alpha} c_f\, C(\alpha,\psi_0), \quad j \to +\infty, \qquad (2.16)$$

where $C(\alpha,\psi_0) = \int |\nu|^{-\alpha} |\Psi_0(\nu)|^2\, d\nu$, $\alpha \in (0,1)$. The case $\alpha = 0$ is well defined, corresponding to trivial scaling at large scales, leaving only short-range dependence at small scales. Again, this asymptotic recovering of the underlying power law is not a trivial result. It would not, for instance, be obtained with periodogram-based estimates [3] and is due to F1.

P2 LRD: It can also be shown [3] that the covariance function of any two wavelet coefficients is controlled by $N$ and therefore can decay much faster than that of the LRD process itself, and is no longer LRD as soon as $N > \alpha/2$. Since $\alpha \in (0,1)$, this is in fact always satisfied:
$$E\, d_X(j,k)\, d_X(j',k') \approx |2^j k - 2^{j'} k'|^{\alpha - 1 - 2N}, \quad |2^j k - 2^{j'} k'| \to \infty. \qquad (2.17)$$

Observe that the exponents in P1 LRD and P2 LRD are different from those in P1 SS and P2 SS, respectively.
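The frequency-domain constant $C(\alpha,\psi_0)$ of Eq. (2.16) can also be evaluated numerically. Assuming the convention $\Psi_0(\nu) = \int \psi_0(t)e^{-i2\pi\nu t}\,dt$, the Haar wavelet has $|\Psi_0(\nu)|^2 = \sin^4(\pi\nu/2)/(\pi\nu/2)^2$. The sketch below (illustrative names; the truncation of the integral at a finite frequency is a simplification) checks two things. For $\alpha = 0$, Parseval's identity gives $C(0,\psi_0) = \int\psi_0^2 = 1$. For standard Brownian motion, whose generalized spectrum under this convention is $1/(4\pi^2\nu^2)$, i.e., $\alpha = 2$ and $c_f = 1/(4\pi^2)$, the product $c_f\,C(2,\psi_0)$ should equal the variance of the Haar coefficient $d_X(0,k) = \int_k^{k+1/2}X - \int_{k+1/2}^{k+1}X$, which a direct computation shows to be $1/12$.

```python
import math


def haar_spectrum_sq(nu: float) -> float:
    """|Psi_0(nu)|**2 for the Haar wavelet, with Psi_0(nu) = int psi_0(t) exp(-2i pi nu t) dt."""
    x = 0.5 * math.pi * nu
    if x == 0.0:
        return 0.0
    return math.sin(x) ** 4 / x ** 2


def c_alpha_haar(alpha: float, nu_max: float = 1000.0, step: float = 0.01) -> float:
    """Midpoint-rule approximation of C(alpha, psi_0) = int |nu|**(-alpha) |Psi_0(nu)|**2 dnu.
    The integrand is even, so twice the integral over (0, nu_max) is used;
    the tail beyond nu_max is neglected."""
    n_cells = int(round(nu_max / step))
    total = 0.0
    for i in range(n_cells):
        nu = (i + 0.5) * step
        total += nu ** (-alpha) * haar_spectrum_sq(nu)
    return 2.0 * total * step


print(c_alpha_haar(0.0))                          # close to 1 (Parseval)
print(c_alpha_haar(2.0) / (4.0 * math.pi ** 2))   # close to 1/12 = 0.08333...
```

The slowly decaying tail of $|\Psi_0(\nu)|^2 \approx \nu^{-2}$ limits the accuracy for $\alpha = 0$ to a few parts in $10^4$ at this cutoff; for larger $\alpha$ the tail is negligible.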
2.2.3.4 WT of Generalized Scaling Processes The results above can be generalized in a straightforward manner to processes that are neither strictly $H$-sssi nor LRD but whose wavelet coefficients share equivalent scaling properties. Some important cases are detailed here.

Start with an $H$-sssi process $X$, and define $Y$ as
p > 0 increment process of Y if Z t Y p 1 t 1 Y p 1 t and
because an $H$-sssi process (i.e., with $0 < H < 1$) is not differentiable, whereas its integrals are). Then, properties P1 SS and P2 SS still hold, replacing $H$ by $H_Y$. The condition for P1 SS becomes $N \ge p + 1$ [10] and can be rewritten as $N > H_Y$ [10]. We hereafter say that $X$ is an $H$-sssi($p$) process if it is $H$-ss and has stationary increments of order $p + 1$. Note that with this definition $H$-sssi($p = 0$) and $H$-sssi are equivalent.
Let $X$ be a second-order stationary $1/f$-type process; that is, $\Gamma_X(\nu) \approx c_f |\nu|^{-\alpha}$, $\nu_1 \le |\nu| \le \nu_2$, $\alpha > 0$. Note that the term $1/f$ implicitly implies the physicist point of view, where the power-law behavior is supposed to hold for a wide range of frequencies, that is, $\nu_1 \ll \nu_2$. Recall that the mother wavelet is a bandpass function whose frequency content is essentially concentrated between $\nu_A$ and $\nu_B$ and negligible elsewhere, if nonzero. In the case of $1/f$ processes, it is therefore assumed that $|\nu_2 - \nu_1| \gg |\nu_B - \nu_A|$. We henceforth have

is finite, but it is generally valid to an excellent approximation. $1/f$-type processes with $\alpha < 1$ and $\nu_1 \to 0$ can be seen as a special case of LRD processes. Note that the definition of $1/f$ processes naturally extends to include $\alpha < 0$.
Let $X$ be such that $\Gamma_X(\nu) \sim c_f |\nu|^{-\alpha}$, $\nu \to 0$, $\alpha > 0$. For $\alpha \ge 1$, the variance does not exist (the integral of the spectrum diverges). $X$ can, however, be seen as a generalized second-order stationary $1/f$-type process, in the sense that the variance of the wavelet coefficients remains finite,

$$E\, d_X(j,k)^2 = \int \Gamma_X(\nu)\, 2^j |\Psi_0(2^j\nu)|^2\, d\nu \sim 2^{j\alpha} c_f \int |\nu|^{-\alpha} |\Psi_0(\nu)|^2\, d\nu < \infty, \quad j \to +\infty,$$

on condition that $N > (\alpha - 1)/2$. This is possible as the power-law decrease of the spectrum of the wavelet at the origin, $|\Psi_0(\nu)| \approx |\nu|^N$, $|\nu| \to 0$, balances the divergence of $\Gamma_X(\nu)$ (see Abry et al. [3, 4] for details). Then, just as before, we have $E\, d_X(j,k)^2 \sim 2^{j\alpha} c_f\, C(\alpha,\psi_0)$, $j \to +\infty$.
Let $X$ be such that $\Gamma_X(\nu) \sim c_f |\nu|^{-\alpha}$, $|\nu| \to \infty$, $\alpha > 1$ (i.e., $\nu_2 = \infty$). Its autocovariance function reads $E X(t)X(t+\tau) \sim \sigma^2(1 - C|\tau|^{2h})$, $\tau \to 0$, with $h = (\alpha - 1)/2$. Equivalently, it implies that $E(X(t+\tau) - X(t))^2 \approx |\tau|^{2h}$, $\tau \to 0$. If $X$ is moreover Gaussian, this implies that the sample path of each realization of the process is fractal, with fractal dimension (strictly speaking, Hausdorff dimension) $D = (5 - \alpha)/2$ [28]. This means that the local regularity of the sample path of the process or, equivalently, its local correlation structure exhibits scaling behavior. Such processes are called fractal.
Fractality is reproduced in the wavelet domain (generalization of P1) through $E\, d_X(j,k)^2 \approx 2^{j(2h+1)}$, $j \to -\infty$, or equivalently for the CWT: $E|T_X(a,t)|^2 \approx a^{2h+1}$, $a \to 0$ [35, 26], which allows an estimation of the fractal dimension through that of the scaling exponent $\alpha = 2h + 1 = 5 - 2D$.
2.2.3.5 Summary for Scaling Processes Let $X$ be either an $H$-sssi($p$) process, or an LRD process, or a (possibly generalized) second-order stationary $1/f$-type process, or a fractal process. Then the wavelet coefficients, due to the combined effects of F1 and F2, will exhibit the two following properties, which will play a key role in the estimation of the scaling exponent presented below:
P1: The process $\{d_X(j,k), k \in \mathbb{Z}\}$ is stationary if $N > (\alpha - 1)/2$, and the variance of the $d_X(j,k)$ accurately reproduces, within a given range of octaves $j_1 \le j \le j_2$, the underlying scaling behavior of the data:

$$E\, d_X(j,k)^2 \approx 2^{j\alpha} c_f\, C(\alpha,\psi_0), \qquad (2.18)$$

where
(i) in the case of an $H$-sssi($p$) process, $\alpha = 2H + 1$, $C(\alpha,\psi_0)$ is to be identified from Eq. (2.12), and $j_1 = -\infty$ and $j_2 = +\infty$;
(ii) in the case of an LRD process, $\alpha$ is defined as in Eq. (2.6), $C(\alpha,\psi_0)$ is to be identified from Eq. (2.16), and $j_2 = +\infty$ and $j_1$ is to be identified from the data;
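(iii) in the case of a (generalized) second-order stationary $1/f$-type process, $\alpha$ is defined from $\Gamma_X(\nu) \approx c_f |\nu|^{-\alpha}$, $\nu_1 \le |\nu| \le \nu_2$, $C(\alpha,\psi_0) = \int |\nu|^{-\alpha} |\Psi_0(\nu)|^2\, d\nu$, and $j_1, j_2$ are to be derived from $\nu_1, \nu_2$;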
(iv) in the case of a fractal process, $\alpha = 2h + 1$, expressions for $C(\alpha,\psi_0)$ can be found in Flandrin and Gonçalvès [35, 36], and $j_1 = -\infty$ and $j_2$ is to be identified from the data.
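P2: $\{d_X(j,k), k \in \mathbb{Z}\}$ is stationary and no longer exhibits long-range statistical dependences but only short-term residual correlations; that is, it is short-range dependent (SRD) and not LRD, on condition that $N > \alpha/2$. Moreover, the higher $N$, the shorter the correlation:

$$E\, d_X(j,k)\, d_X(j',k') \approx |2^j k - 2^{j'} k'|^{\alpha - 1 - 2N}, \quad |2^j k - 2^{j'} k'| \to \infty.$$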
The relevance of this idealization has already been illustrated by, for instance, Abry et al. [3], Abry and Veitch [5], and Flandrin [32, 33], and will play a key role in the next section.
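Property P1 is what makes estimation possible: by Eq. (2.18), $\log_2$ of the average of $d_X(j,k)^2$ over $k$ is, within the scaling range, an affine function of the octave $j$ with slope $\alpha$. The sketch below fits that slope by unweighted least squares, using a Haar pyramid initialized with the samples themselves; both the unweighted fit and the crude initialization are simplifying assumptions, and the estimators studied in Section 2.3 are more careful. The names are illustrative. For white Gaussian noise (no scaling, $\alpha = 0$) the fit should return a value near $0$; for a random walk, which is a sampled Brownian motion and hence $H$-sssi with $H = 1/2$, it should return a value near $\alpha = 2H + 1 = 2$, from which $H$ is recovered as $(\alpha - 1)/2$.

```python
import itertools
import math
import random


def haar_details(x, J):
    """[d_X(1, .), ..., d_X(J, .)] from a Haar pyramid started at a_X(0, k) = x[k]."""
    if len(x) % (2 ** J):
        raise ValueError("len(x) must be a multiple of 2**J")
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    approx, details = list(x), []
    for _ in range(J):
        even, odd = approx[0::2], approx[1::2]
        details.append([(p - q) * inv_sqrt2 for p, q in zip(even, odd)])
        approx = [(p + q) * inv_sqrt2 for p, q in zip(even, odd)]
    return details


def estimate_alpha(x, j1, j2):
    """Least-squares slope of log2(mean_k d_X(j,k)**2) versus j over j1 <= j <= j2."""
    details = haar_details(x, j2)
    js = list(range(j1, j2 + 1))
    ys = [math.log2(sum(v * v for v in details[j - 1]) / len(details[j - 1])) for j in js]
    j_bar, y_bar = sum(js) / len(js), sum(ys) / len(ys)
    num = sum((j - j_bar) * (y - y_bar) for j, y in zip(js, ys))
    den = sum((j - j_bar) ** 2 for j in js)
    return num / den


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
walk = list(itertools.accumulate(noise))      # sampled Brownian motion, H = 1/2

alpha_noise = estimate_alpha(noise, 1, 8)     # expected near 0
alpha_walk = estimate_alpha(walk, 3, 8)       # expected near 2H + 1 = 2
print(alpha_noise, alpha_walk, (alpha_walk - 1.0) / 2.0)
```

The finest octaves are left out for the random walk because, at those scales, the discrete sampling makes the variance depart slightly from the pure power law; choosing $j_1$ in this way is exactly the kind of range selection that (2.18) calls for.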
2.2.3.6 Multiple Exponents, Multifractal Processes Property P1 (wavelet reproduction of the power law) extends further to classes of generalized scaling processes whose behavior cannot be described by a single scaling exponent, but which require a collection, even an infinite collection, of exponents. We briefly describe three classes of examples.
The first example is in the spirit of the simple fractal processes described in Section 2.2.3.4. Consider a generalization where the exponent $h$, which describes the statistics of local scaling properties, is no longer constant in time: $E|X(t+\tau) - X(t)|^2 \sim |\tau|^{2h(t)}$, $\tau \to 0$. One consequence is that the local regularity of sample paths is no longer uniform but depends on $t$. A class of processes called multifractional Brownian motion has been proposed [56] that satisfies such a property, with $h$ a continuous function of $t$. As detailed in Flandrin and Gonçalvès [35, 36], the time evolution of $h$ can be traced through an analysis of the continuous wavelet transform coefficients at small scales: $E|T_X(a,t)|^2 \sim a^{2h(t)+1}$,
$a \to 0$. This relation is to be understood as a time-dependent generalization of P1.

The second class, multifractal processes, allows an extremely rich scaling structure at small scales, in general far richer than simply fractal. There is not the space here to give precise definitions of such processes, nor of the related multifractal formalism. We aim rather to give some intuition of their relation to wavelets, and refer the reader to Riedi [59], Riedi et al. [60], Chapter 20 of the present volume, and references therein for a thorough presentation. For multifractal processes, the local regularity of almost every (i.e., with probability one) sample path, which we write as $|X(\omega, t+\tau) - X(\omega, t)| \sim |\tau|^{h(\omega,t)}$, $\tau \to 0$ (where $\omega$ denotes an element of the probability space underlying the process), exhibits an extraordinary variability over time; indeed, it is itself fractal-like. One therefore abandons the idea of following the time variations of $h$, since these are realization dependent and in any case too complex, and instead studies them statistically. Classically this has been done through the Hausdorff multifractal spectrum $D(h)$, which consists of the Hausdorff dimension of the set of points where $h(\omega, t) = h$. The same multifractal spectrum is obtained for almost all realizations and is therefore a useful invariant describing the scaling properties of the process.
A classical tool to obtain the multifractal spectrum is to calculate, from any typical sample path, the structure functions or partition functions: $S_q(\tau)$
because it is far more numerically accessible. The connection between multifractals and wavelets arises from the fact that the increments involved in the study of the local regularity of a sample path can be seen as simple examples of wavelet coefficients [52]. It has therefore been proposed heuristically [52] to replace increments by wavelet coefficients in the partition functions, and it has been shown theoretically that, in some cases, the multifractal formalism can be based directly on wavelet coefficients [16, 42, 60]. For the Legendre multifractal spectrum, this amounts to using wavelet-based partition functions that exhibit power-law behavior at small scales: $\int |T_{X(\omega)}(a,t)|^q\, dt \sim a^{\zeta(\omega,q) + q/2}$, $a \to 0$. This last relation can be thought of as a generalization of P1 to statistics of order both above and below 2. In addition, it is important to understand that even though the relation describes a property of a single (typical) realization, it deals directly with the object $\zeta(\omega, q)$ central to the description of the scaling, and not with an estimator of it. This is in contrast to self-similar processes, for example, and to the fractal class of the previous paragraph, where the fundamental scaling relations and exponents are defined at the level of the ensemble. Such a change of perspective is meaningful for multifractals, as almost all realizations yield a common function $\zeta(q)$. Finally, let us note that more refined wavelet-based partition functions have been proposed to overcome various difficulties arising in signal processing; the reader is referred to Bacry et al. [16] and Muzy et al. [52].
The third example is that of multiplicative cascades, a paradigm introduced by Mandelbrot [51] in 1974. It involves a recursive procedure whereby an initial mass is progressively subdivided according to a geometric rule and assigned to subsets of an initial set, typically an interval. It provides a powerful tool to define multifractal processes and was originally considered a natural synthesis procedure for them. Indeed, cascade-based methods of generating multifractals have been the preferred option thus far in teletraffic applications (see Chapter 15). However, the infinitely divisible model proposed by Castaing et al. [21] shows that multiplicative cascade processes can also model scaling phenomena very effectively in other cases, even where the scaling is barely observable in the time domain. Again, the wavelet tool has proved useful for the analysis of such situations, as comprehensively detailed by Arnéodo et al. [14, 15]. This tool has been applied, for instance, in the study of turbulence [22, 63].
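A conservative random binomial cascade is perhaps the simplest instance of this recursive mass-splitting idea. The sketch below is an illustrative assumption, not the construction of [51] or of Chapter 15: at each stage every dyadic interval hands a fraction $p$ of its mass to one randomly chosen half and $1 - p$ to the other, so total mass is preserved exactly. The function name `binomial_cascade` and the value $p = 0.3$ are hypothetical choices.

```python
import numpy as np

def binomial_cascade(n_stages, p=0.3, rng=None):
    """Conservative random binomial cascade on [0, 1] (illustrative sketch).

    At each stage every dyadic interval splits into two halves; one half gets
    a fraction p of the parent's mass and the other 1 - p, with the side chosen
    at random.  Total mass is conserved at every stage.
    """
    rng = np.random.default_rng() if rng is None else rng
    mass = np.array([1.0])
    for _ in range(n_stages):
        left_gets_p = rng.random(mass.size) < 0.5
        w_left = np.where(left_gets_p, p, 1.0 - p)
        children = np.empty(2 * mass.size)
        children[0::2] = mass * w_left
        children[1::2] = mass * (1.0 - w_left)
        mass = children
    return mass  # mass[i] is the measure of [i / 2**n_stages, (i + 1) / 2**n_stages)

measure = binomial_cascade(12, p=0.3, rng=np.random.default_rng(1))
print(measure.size, measure.sum())   # 4096 cells; the sum equals 1 up to rounding
```

Each cell's mass is a product of 12 weights drawn from $\{p, 1-p\}$, which is what produces the highly irregular, multifractal-looking measure.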
2.2.3.7 Processes with Infinite Second-Order Statistics: α-Stable Processes. The existence of the wavelet coefficients and the extensions of P0$_\mathrm{SS}$, P1$_\mathrm{SS}$, and P2$_\mathrm{SS}$ to H-sssi processes without second-order statistics, such as α-stable processes, have recently been obtained [25, 26, 58] (see also Pesquet-Popescu [57]) but will not be detailed here.
2.3 WAVELETS AND SCALING: ESTIMATION
In this section it is shown in detail how the statistical properties of the wavelet detail coefficients, summarized in the previous section in the form of properties P1 and P2, can be applied to the related tasks of the detection, identification, and measurement of scaling. The estimation of scaling exponents, ``magnitude of scaling'' parameters, and the multifractal spectrum are discussed. Practical issues in the use of the estimators are addressed, and comparisons are made with other estimation methods. Robustness of different kinds is also discussed. It is shown how wavelet methods allow statistics other than second order to be analyzed, with applications in the identification of self-similar and multifractal processes. It is explained how the wavelet framework allows a reinterpretation and a fruitful extension of the natural idea of aggregation in the study of scaling. It is shown how the Allan variance, an effective time domain estimator of scaling, in fact belongs to this framework. Finally, it is shown how the same analysis methods can be applied to the measurement of generalized forms of the Fano factor, a well-known descriptor of the burstiness of point processes.
2.3.1 An Analysis Tool: The Logscale Diagram
2.3.1.1 The Legacy of P1 and P2. Property P2 is the key to the statistical advantages of analysis in the wavelet domain. In sharp contrast to the problematic statistical environment in the time domain due to the long-range dependence, nonstationarity, or fractality of the original process $X(t)$, in the wavelet domain we need only deal with the stationary, short-range-dependent (SRD) processes $d_X(j, \cdot)$ for each $j$. (Owing to the admissibility condition of the mother wavelet, these processes each have zero mean.) The stationarity allows us to meaningfully average across ``time'' within each process to reduce variability. The short-range dependence results in these average statistics having small variance. An example of central importance here is given by
$$\mu_j = \frac{1}{n_j} \sum_{k=1}^{n_j} d_X(j,k)^2, \qquad (2.19)$$
where $n_j$ is the number of coefficients at octave $j$ available to be analyzed. The random variable $\mu_j$ is a nonparametric, unbiased estimator of the variance of the process $d_X(j, \cdot)$. Despite its simplicity, because of the short-range dependence the variance of $\mu_j$ decreases as $1/n_j$, and it is in fact asymptotically efficient (of minimal variance). The variable $\mu_j$ can therefore be thought of as a near-optimal way of concentrating the gross second-order behavior of $X$ at octave $j$. Furthermore, again from P2, the $\mu_j$ are themselves only weakly dependent, so the analysis of each scale is largely decoupled from that at other scales. To analyze the second-order dependence of $X(t)$ on scale, therefore, we are naturally led to study $\mu_j$ as a function of $j$.
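To make the $1/n_j$ claim concrete, the sketch below computes $\mu_j$ as in Eq. (2.19) and estimates its variance by Monte Carlo in the simplest idealized setting, where the $n_j$ details are independent $N(0, 1)$ variables; there $\operatorname{Var} \mu_j = 2/n_j$ exactly, because $\operatorname{Var}(Z^2) = 2$ for a standard Gaussian $Z$. Real details are only weakly correlated (P2), which changes the constant but not the $1/n_j$ rate. The function names, trial count, and seed are illustrative assumptions.

```python
import numpy as np

def mu_j(details, axis=-1):
    """Eq. (2.19): mean of the squared detail coefficients at one octave."""
    return np.mean(np.square(details), axis=axis)

def mc_variance_of_mu(n_j, n_trials=4000, seed=0):
    """Monte Carlo variance of mu_j for n_j i.i.d. N(0, 1) details."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_trials, n_j))   # one row = one octave's details
    return np.var(mu_j(draws, axis=1))

for n_j in (64, 256, 1024):
    # empirical variance next to the exact value 2 / n_j
    print(n_j, mc_variance_of_mu(n_j), 2.0 / n_j)
```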
Property P1 now enters by showing explicitly, in the case of scaling, the underlying power-law dependence on $j$ of the variance (second moment) of the processes at each scale, of which the $\mu_j$ are estimates. The importance of P1 is that its pure power-law form suggests that the scaling exponent $\alpha$ could be extracted
simply by considering the slope in a plot of $\log_2 \mu_j$ against $j$. Here it is essential to understand that, although log-log plots are a natural and familiar tool whenever exponents of power laws are at issue, using them as a basis for semiparametric estimation of the exponent is only effective statistically if properties equivalent to P1 and P2 hold. This is typically not the case. For example, in the correlogram (a time domain semiparametric estimator [17] based on direct estimation of the covariance function), covariance estimates at fixed lag are biased, resulting in bias in the exponent estimate. Furthermore, across lags the covariance estimates are strongly correlated, resulting in misleadingly impressive ``straight lines'' in the log-log plot, which in reality are symptomatic of high variance in the resulting estimates. In addition to these issues, the complication that in general $E \log(\cdot) \ne \log E(\cdot)$ is overlooked in the correlogram and in many other estimators based on log-log plots. For simplicity of presentation we set $y_j = \log_2 \mu_j$ for the moment, but address this refinement in the estimation section below. We now introduce a wavelet-based analysis tool, the logscale diagram, which exploits the key properties P1 and P2 and serves as an effective and intuitive central starting point for the analysis of scaling.

Definition 2.3.1 The (second-order) logscale diagram (LD) consists of the graph of $y_j$ against $j$, together with confidence intervals about the $y_j$.
Examples of logscale diagrams analyzing synthesized scaling data are given in Fig. 2.2, where the plot on the left is of an LRD series and that on the right of a self-similar series. It follows from the nature of the dilation operator generating the wavelet basis that the number $n_j$ of detail coefficients at octave $j$ halves with each increase in $j$ (in practice the presence of border effects results in slightly lower values). Confidence intervals about the $y_j$ therefore widen monotonically with $j$ as one moves to larger and larger scales, as seen in each of the diagrams in Fig. 2.2. The exact sizes of these intervals depend on details of the process and in practice are calculated using additional distributional and quasi-decorrelation assumptions. If necessary, they could also be estimated from data.
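The sketch below assembles a bare-bones second-order logscale diagram from Haar details: for each octave it returns $j$, $n_j$, $y_j = \log_2 \mu_j$, and a confidence half-width. The half-width uses the large-$n_j$ Gaussian approximation $\operatorname{Var}(y_j) \approx 2/(n_j \ln^2 2)$, which holds for uncorrelated Gaussian details; this approximation, the Haar choice, and all names are our assumptions, not the chapter's procedure. Run on white noise, it shows $n_j$ halving and the intervals widening with $j$, as described above.

```python
import numpy as np
from statistics import NormalDist

def haar_details(x, n_octaves):
    """Orthonormal Haar DWT details {j: d_X(j, .)}, raw samples as finest approximation."""
    approx = np.asarray(x, dtype=float)
    details = {}
    for j in range(1, n_octaves + 1):
        m = len(approx) // 2 * 2
        even, odd = approx[0:m:2], approx[1:m:2]
        details[j] = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
    return details

def logscale_diagram(x, n_octaves, level=0.95):
    """Return (j, n_j, y_j, half_width) with y_j = log2(mu_j) (no bias correction)."""
    d = haar_details(x, n_octaves)
    js = np.arange(1, n_octaves + 1)
    n_j = np.array([d[j].size for j in js])
    y = np.array([np.log2(np.mean(d[j] ** 2)) for j in js])
    z = NormalDist().inv_cdf(0.5 + level / 2.0)        # about 1.96 for level = 0.95
    half_width = z * np.sqrt(2.0 / n_j) / np.log(2.0)  # Gaussian approx. to sd of y_j
    return js, n_j, y, half_width

rng = np.random.default_rng(2)
js, n_j, y, hw = logscale_diagram(rng.standard_normal(2 ** 14), n_octaves=10)
for row in zip(js, n_j, np.round(y, 3), np.round(hw, 3)):
    print(*row)
```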
Generalizations to qth-order logscale diagrams can be defined, $q > 0$, where the second moment of the details in Eq. (2.19) is replaced by the qth. Here we mainly concentrate on the second-order logscale diagram, or simply ``logscale diagram,'' both as an illustrative example and because it is the most important special case: it is central for LRD and $1/f$ processes by definition, definitive for Gaussian processes, and sufficient for exactly self-similar processes. Like any second-order approach, it is of course insufficient for processes whose second moments do not determine all the properties of interest. We discuss this further in Section 2.4.3 in the particular context of multifractals.
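For the qth-order generalization, the sketch below replaces the squared details in Eq. (2.19) by $|d_X(j,k)|^q$ and fits a slope in $j$ for several orders $q$. The Haar wavelet, the names, and the parameters are all our assumptions. For a Gaussian random walk ($\alpha = 2$) the slopes come out close to $q\alpha/2 = q$, that is, linear in $q$, as expected when the second moment determines everything; systematic curvature in $q$ would be one hint that second-order analysis alone is not sufficient.

```python
import numpy as np

def haar_details(x, n_octaves):
    """Orthonormal Haar DWT details {j: d_X(j, .)}, raw samples as finest approximation."""
    approx = np.asarray(x, dtype=float)
    details = {}
    for j in range(1, n_octaves + 1):
        m = len(approx) // 2 * 2
        even, odd = approx[0:m:2], approx[1:m:2]
        details[j] = (even - odd) / np.sqrt(2.0)
        approx = (even + odd) / np.sqrt(2.0)
    return details

def qth_order_slopes(x, qs, j1, j2):
    """Slope in j of log2((1/n_j) * sum_k |d_X(j,k)|^q), for each order q in qs."""
    d = haar_details(x, j2)
    js = np.arange(j1, j2 + 1)
    slopes = {}
    for q in qs:
        y_q = [np.log2(np.mean(np.abs(d[j]) ** q)) for j in js]
        slopes[q] = np.polyfit(js, y_q, 1)[0]
    return slopes

rng = np.random.default_rng(4)
walk = np.cumsum(rng.standard_normal(2 ** 18))            # Gaussian, alpha = 2
print(qth_order_slopes(walk, qs=(1, 2, 3), j1=3, j2=9))   # roughly q for each q
```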
The logscale diagram is first of all a means to visualize the scale dependence of data with a minimum of preconceptions. Scaling behavior is not assumed but detected, through the region(s) of alignment, if any, observed in the log-log plot. By an alignment region we mean a range of scales where, up to statistical variation, the $y_j$ fall on a straight line. Estimation of scaling parameters, if relevant, can then be effectively performed through weighted linear regression over the region(s). Finally,
the identification of the kind of scaling is made by interpreting the estimated value in the context of the observed range. These different aspects of the aims and use of the logscale diagram are expanded upon next.
2.3.1.2 The Detection of Scaling. A priori it is not known over which scales, if any, a scale-invariant property may exist. By the detection of scaling in the logscale diagram we mean the identification of region(s) of alignment and the determination of their lower and upper cutoff octaves, $j_1$ and $j_2$ respectively, which are taken to correspond to scaling regimes. In a sense this is an insoluble problem, as scaling often occurs asymptotically or has an asymptotic definition, with no clear way to define where a scaling range begins or ends. Nonetheless, experience shows that good estimates are possible. Note the semantic difference between the term scaling region or range, a theoretical concept that refers to where scaling is truly present (an unknown in real data), and alignment region or range, an estimation concept corresponding to what is actually observed in the logscale diagram for a given set of data.
The first essential point here is that the concept of alignment is relative to the confidence intervals for the $y_j$, and not to a close alignment of the $y_j$ themselves. Indeed, an undue alignment of the actual estimates $y_j$ indicates strong correlations between them, a highly undesirable feature typical of time domain log-log based methods such as variance-time plots. As mentioned earlier, the $\mu_j$, and hence the $y_j$, are weakly dependent, resulting in a natural and desirable variation around the calculated regression line, as seen, for example, in Fig. 2.2. Using weighted regression incorporates the varying confidence intervals into the estimation phase; however, the selection of the range of scales defining the alignment region is prior to this, and great care is required to avoid poor decisions.

Fig. 2.2 Logscale diagrams. Left: An example of the $y_j$ against $j$ plot and regression line for an LRD process with strong short-range dependence. The vertical bars at each octave give 95% confidence intervals for the $y_j$. The series is simulated FARIMA(0, d, 2) with $d = 0.25$ and second-order moving average operator $C(B) = 1 + 2B + B^2$, implying $(\alpha, c_f) = (0.50, 6.38)$. Alignment is observed over scales $(j_1, j_2) = (4, 10)$, and a weighted regression over this range allows an accurate estimation despite the strong short-range dependence: $\hat\alpha = 0.55 \pm 0.07$, $\hat c_f = 6.0$ with $4.5 < \hat c_f < 7.8$. The scaling can be identified as LRD as the value is in the correct range, $\hat\alpha \in (0, 1)$, and the alignment region includes the largest scales in the data. Right: Alignment is observed over the full range of scales with $\hat\alpha = 2.57$, corresponding to $\hat H = 0.79$, consistent with the self-similarity of the simulated FBM ($H = 0.8$) series analyzed.
We now discuss the selection, in practice, of the cutoff scales $j_1$ and $j_2$. A preliminary comment is that at least two scales are required for the regression to be well defined, and three for a Chi-squared goodness of fit test; in practice four are needed before any estimate can be taken seriously, since it is simply too easy for three points to align fortuitously if the confidence intervals are not very small. A useful heuristic in the selection of a range is that the regression line should cut, or nearly cut, each of the confidence intervals within it. This can help avoid two errors: (1) the nondetection of an alignment region due to the apparently wild variation of the $y_j$, when in fact, to within the confidence intervals, the alignment is good (this typically occurs when the slope is small, as in the right-hand plot in Fig. 2.4, because the vertical range of the plot is then reduced, which magnifies the apparent size of the variations); and (2) the erroneous inclusion of extra scales to the left of an alignment region because to the eye they appear to continue a linear trend accurately, whereas in fact the small confidence intervals about the $y_j$ for small $j$ reveal that they depart significantly from it.
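The ``regression line should cut each confidence interval'' heuristic is easy to automate. The sketch below, with hypothetical helper names, fits a weighted line (weights $1/\sigma_j^2$) over a candidate range and flags each octave whose interval $y_j \pm$ half-width misses the fitted value. The synthetic diagram (a perfect line with a departure at $j = 1$, where the intervals are narrowest) mimics error (2) above and is an assumption made for illustration.

```python
import numpy as np

def wls_line(j, y, var):
    """Weighted least-squares fit y ~ slope * j + intercept with weights 1 / var."""
    j = np.asarray(j, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(var, dtype=float)
    s, sj, sjj = w.sum(), w @ j, w @ (j * j)
    sy, sjy = w @ y, w @ (j * y)
    slope = (s * sjy - sj * sy) / (s * sjj - sj ** 2)
    return slope, (sy - slope * sj) / s

def line_cuts_intervals(j, y, half_width, slope, intercept):
    """Per octave: does [y_j - hw_j, y_j + hw_j] contain the fitted value?"""
    fitted = slope * np.asarray(j, dtype=float) + intercept
    return np.abs(np.asarray(y, dtype=float) - fitted) <= np.asarray(half_width)

j = np.arange(1, 9)
half_width = 0.05 * 2.0 ** ((j - 1) / 2)   # intervals widen by sqrt(2) per octave
var = (half_width / 1.96) ** 2             # matching variances for 95% intervals
y = 0.5 * j + 1.0
y[0] += 0.3                                # octave 1 departs from the line
for j1 in (1, 2):
    sel = j >= j1
    slope, intercept = wls_line(j[sel], y[sel], var[sel])
    print(j1, line_cuts_intervals(j[sel], y[sel], half_width[sel], slope, intercept))
```

Including octave 1 drags the weighted line away from the well-aligned octaves, so some intervals are missed; starting at $j_1 = 2$, every interval is cut.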
The above heuristic can be formalized somewhat by a Chi-squared goodness of fit test [9], where the critical level of the goodness of fit statistic is monitored as a function of the endpoints of the alignment range. At least in the case of the lower scale, this can make a very clear and relatively objective choice of cutoff possible, eliminating the error of type (2) above. An example of this is afforded by the left-hand plot of Fig. 2.2, where the octave $j = 3$, if included, results in a drop of the Chi-squared goodness of fit of several orders of magnitude! An even subtler example is that of the right-hand plot in Fig. 2.3, where $j = 2$ was excluded from the alignment region for the same reason, whereas in the left-hand plot in the same figure it is clear even to the eye that, given the small size of the confidence interval about octave $j = 8$, it should not be included. Further work is required to develop reliable automated methods of cutoff scale determination. This is especially true for upper cutoff scales, where the difficulties are compounded by a lack of data. On the other hand, at smaller scales the technical assumptions used in the calculation of the confidence intervals (see below) may be less reliable, whereas at large scales the data are highly aggregated and therefore Gaussian approximations are reasonable.
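The goodness-of-fit monitoring can likewise be sketched. Under the idealizations used for the confidence intervals, the weighted residual sum $\sum_j (y_j - \hat{y}_j)^2/\sigma_j^2$ over a range of $m$ octaves is approximately Chi-squared with $m - 2$ degrees of freedom. The block below converts it to a p-value with the Wilson-Hilferty normal approximation so that it needs only NumPy and `math` (`scipy.stats.chi2.sf` would give the exact tail if SciPy is available). The names and the synthetic data (a perfect line except for a departure at $j = 1$) are illustrative assumptions; scanning $j_1$ shows the abrupt drop in p-value when a nonaligned octave is included.

```python
import math
import numpy as np

def wls_line(j, y, var):
    """Weighted least-squares fit y ~ slope * j + intercept with weights 1 / var."""
    w = 1.0 / var
    s, sj, sjj = w.sum(), w @ j, w @ (j * j)
    sy, sjy = w @ y, w @ (j * y)
    slope = (s * sjy - sj * sy) / (s * sjj - sj ** 2)
    return slope, (sy - slope * sj) / s

def chi2_sf_wilson_hilferty(x, k):
    """Approximate P(chi2_k > x) via the Wilson-Hilferty cube-root transform."""
    if x <= 0.0:
        return 1.0
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * k))) / math.sqrt(2.0 / (9.0 * k))
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper normal tail, accurate far out

def gof_pvalue(j, y, var, j1, j2):
    """Approximate goodness-of-fit p-value of a weighted line fitted over [j1, j2]."""
    sel = (j >= j1) & (j <= j2)
    slope, intercept = wls_line(j[sel], y[sel], var[sel])
    stat = float(np.sum((y[sel] - slope * j[sel] - intercept) ** 2 / var[sel]))
    return chi2_sf_wilson_hilferty(stat, int(sel.sum()) - 2)

j = np.arange(1, 11).astype(float)
var = 0.001 * 2.0 ** j        # variance roughly doubles per octave as n_j halves
y = 0.6 * j + 2.0
y[0] += 1.0                   # octave 1 is not on the line
for j1 in (1, 2, 3):
    print(j1, gof_pvalue(j, y, var, j1, 10))
```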
2.3.1.3 The Interpretation of Scaling. By the interpretation of scaling we mean the identification of the kind of underlying scaling phenomenon (LRD, H-ss, and so on) generating the observed alignment in the logscale diagram. The task is the meaningful interpretation of the estimated value of $\alpha$ in the context of the range of scales defining the alignment region, informed where possible by other known or assumed properties of the time series, such as stationarity. It is in fact partly a question of model choice, and there may be no unique solution. We now consider, nonexhaustively, a number of important cases.
If an estimate of the scaling exponent $\alpha$ is found to lie in (0, 1), and the range of scales is from some initial value $j_1$ up to the largest one present in the data, then the scaling could be said to correspond to long-range dependence with a scaling exponent that is simply the measured $\alpha$. Examples are afforded by the left-hand plots in each of Figs. 2.2 and 2.3. If there were a priori physical reasons to believe that the data were stationary, then long-range dependence would be an especially relevant conclusion. This applies to the left-hand plot in Fig. 2.3, as the series corresponds to successive interarrival times of Ethernet packets, which under steady traffic conditions one would expect to be stationary (the Ethernet data in Fig. 2.3 are from the ``pAug'' Bellcore trace [43]).
Another key example, illustrated in the right-hand plots in Figs. 2.2 and 2.3, is a value of $\alpha$ greater than 1 but also measured over a range including the largest scales. Such a value precludes long-range dependence and may indicate that a self-similar or asymptotically self-similar model is required, implying that the data are nonstationary. The exponent would then be reexpressed as $H = (\alpha - 1)/2$, the Hurst parameter. Again, conclusions should be compared with a priori physical reasoning. The right-hand plot in Fig. 2.3 is the analysis of a cumulative work process for Ethernet traffic, that is, the total number of bytes having arrived by time $t$. Such a series is intrinsically nonstationary, though under steady traffic conditions one would expect it to have stationary increments. Thus a conclusion of nonstationarity is a natural one, and the estimated value $\hat H = 0.80$, being in (0, 1), indeed corresponds to an H-sssi process. It would have been problematic, however, if underlying physical reasons had indicated that in fact stationarity was to be expected. Such an apparent paradox could be resolved in one of two ways. It may be that the underlying process is indeed stationary and exhibits $1/f$ noise over a wide range of scales, but that the data set is simply not long enough to include the upper cutoff scale. The alternative is to accept that empirical evidence has shown the physical reasoning concluding stationarity to be invalid.

Fig. 2.3 LRD and H-sssi behavior in Ethernet traffic data. Left: Logscale diagram for the discrete series of successive interarrival times, showing a range of alignment and an $\alpha$ estimate consistent with long-range dependence. Right: Logscale diagram for the cumulative work process (bytes up to time $t$), consistent with an asymptotically self-similar (close to exactly self-similar) process with stationary increments.
If, on the other hand, the scaling is concentrated at the lowest scales (high frequencies), that is, $j_1 = 1$ with some upper cutoff $j_2$, then the scaling may best be understood as indicating the fractal nature of the sample path. The observed $\alpha$ should then be reexpressed as $h = (\alpha - 1)/2$, the local regularity parameter. Values of $h$ in the range (0, 1), for example, would then be interpreted as indicating continuous but nondifferentiable sample paths (under Gaussian assumptions [28]), as observed in the leftmost alignment region of the Internet delay data in the left plot of Fig. 2.4. The stationarity or otherwise of the data may then not be relevant. Note that $j = 1$ has been excluded from the leftmost alignment region in each plot in Fig. 2.4. This is not in contradiction with interpretations of fractality, as it is known that the details at $j = 1$ can be considerably polluted by errors in the initialization of the multiresolution algorithm (see Section 2.2.1.4).
If scaling with $\alpha > 1$ is found over all or almost all of the scales in the data, as in the right-hand plot in Fig. 2.2, then exact self-similarity could be chosen as a model, again with $H = (\alpha - 1)/2$ being the relevant exponent. However, in this case one could equally well use the local regularity parameter $h = (\alpha - 1)/2$, with the interpretation that the fractal behavior at small scales is constant over time and happens to extend right up to the largest scales in the data.
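The case analysis above can be summarized as a small decision helper. This is a hypothetical sketch, not a procedure given in the chapter: it ignores the a priori physical reasoning and the ambiguities the text stresses (for example, when $\alpha > 1$ over all scales it returns the self-similar reading, although the local regularity reading $h = (\alpha - 1)/2$ is equally valid there). It does show the conversions $H = (\alpha - 1)/2$ and $h = (\alpha - 1)/2$ applied to the values quoted for Fig. 2.2.

```python
def interpret_alignment(alpha_hat, reaches_largest_scales, starts_at_finest_scales):
    """Rough reading of one alignment region, following Section 2.3.1.3.

    Returns (kind, exponent).  A priori physical knowledge (e.g., expected
    stationarity) must still be weighed separately.
    """
    if reaches_largest_scales and 0.0 < alpha_hat < 1.0:
        return "LRD", alpha_hat                      # exponent is alpha itself
    if reaches_largest_scales and alpha_hat > 1.0:
        return "H-sssi", (alpha_hat - 1.0) / 2.0     # Hurst parameter H
    if starts_at_finest_scales:
        return "fractal", (alpha_hat - 1.0) / 2.0    # local regularity h
    return "undetermined", None

print(interpret_alignment(0.55, True, False))   # Fig. 2.2, left
print(interpret_alignment(2.57, True, True))    # Fig. 2.2, right: H close to 0.79
```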
Finally, more than one alignment region is certainly possible within a single logscale diagram, a phenomenon that we refer to as biscaling. One could imagine, for example, fractal characteristics leading to an alignment at small scales with one exponent, and long-range dependence resulting in alignment at large scales with a separate scaling exponent. Examples of this phenomenon are shown in Fig. 2.4 in the context of delays (left) and losses (right) experienced by consecutive user datagram protocol (UDP) packets sent over the Internet in a regular stream (see Andren et al. [12] for details of such data sets). In both figures the alignment at large scales corresponds to long-range dependence, whereas at small scales it is associated with highly irregular sample paths. Note that when second-order properties are insufficient to fully describe the scaling nature of the process (an extreme example is afforded by multifractals), the correct interpretation of each branch of the biscaling will require the examination of logscale diagrams across a range of orders, as discussed in Section 2.3.3.

Fig. 2.4 Logscale diagrams with biscaling. Internet UDP packet data displaying two scaling regimes. Left: Delay series. Regime I on the left (small $j$'s) is related to continuous but nondifferentiable sample paths, $h \in (0, 1)$, and regime II (large $j$'s) to long-range dependence. Right: Loss series (1 for lost packets, else 0). Regime I corresponds to discontinuous sample paths, $h < 0$. In regime II there is trivial white noise scaling, $\alpha = 0$, indicating stationary SRD behavior.
2.3.2 Estimation Within the Logscale Diagram
In this subsection it is assumed that a scaling range $j \in [j_1, j_2]$ has been correctly identified. Sums over $j$, and regressions, are always taken over this range. The estimators to be defined are semiparametric, as they depend on the range of scales $j \in [j_1, j_2]$ where scaling is deemed to be present, and on the scaling property P1 being valid there, but not on any tightly specified parametric model.
2.3.2.1 Estimating the Scaling Exponent $\alpha$. Because of property P1, the measurement of $\alpha$ is reduced to the determination of the slope over the alignment region in the logscale diagram. A natural way to achieve this in a statistical estimation context is through linear regression, the defining hypothesis of which is $E y_j = \alpha j + a$, where $a$ is a real constant. Because in general $E \log(\cdot) \ne \log E(\cdot)$, however, this condition is not exactly satisfied. We therefore introduce small corrective deterministic factors $g(j)$, discussed below, and redefine the $y_j$ as $y_j = \log_2 \mu_j - g(j)$, so that $E y_j = \alpha j + a$ by definition.
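Since $E y_j = \alpha j + a$, $\alpha$ is a regression slope. The sketch below gives the closed-form weighted least-squares slope with weights $1/\sigma_j^2$, together with its variance, which is exact only if the $y_j$ are mutually uncorrelated (the idealization ID1b introduced below). The function name and test values are illustrative assumptions; supplying equal variances reduces it to ordinary least squares.

```python
import numpy as np

def wls_slope(j, y, var):
    """Weighted least-squares slope of y_j on j with weights 1 / var_j.

    Returns (alpha_hat, intercept, var_alpha_hat); var_alpha_hat assumes the
    y_j are uncorrelated with the given variances.
    """
    j = np.asarray(j, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(var, dtype=float)
    s, sj, sjj = w.sum(), w @ j, w @ (j * j)
    sy, sjy = w @ y, w @ (j * y)
    delta = s * sjj - sj ** 2
    alpha_hat = (s * sjy - sj * sy) / delta
    intercept = (sy - alpha_hat * sj) / s
    return alpha_hat, intercept, s / delta

j = np.arange(3, 11)
print(wls_slope(j, 0.8 * j - 1.0, 2.0 ** (j - 3)))  # recovers slope 0.8, intercept -1.0
```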
Any kind of linear regression of $y_j$ on $j$ constitutes an unbiased estimator of $\alpha$, as the lack of bias does not require decorrelation between the $y_j$, nor knowledge of their variances or distributions. A weighted regression where the weights are related to the variances $\sigma_j^2$ of the $y_j$ is preferable, however, as this is the minimum variance unbiased estimator (MVUE) for the regression problem [48]. Intuitively this refinement is significant, as we know that the $\sigma_j^2$ are far from equal. To exploit the optimality, however, the correction factors $g(j)$ and the variances $\sigma_j^2$ must be calculated, which is a difficult task. They can nonetheless be well approximated, provided simplifying idealized properties are adopted. The particular idealizations chosen here are the following:
ID1a: For each fixed $j$, the $d_X(j, \cdot)$ are stationary sequences of uncorrelated variables.
ID1b: The processes $d_X(j, \cdot)$ and $d_X(j', \cdot)$, $j \ne j'$, are uncorrelated.
ID2: The process $X$ and, hence, the processes $d_X(j, \cdot)$ are Gaussian.
The above conditions may appear unduly restrictive at first; however, the underlying effectiveness of the method is based on P1 and P2, with ID1 and ID2 added mainly to extend the quantitative analysis. Robustness with respect to departures from these idealizations is discussed in detail below. Note that ID1a and ID1b together make ID1, the idealization of complete decorrelation. It is split here to highlight the fact that ID1b, independence between scales, is not needed for the key results.
Under ID1a and ID2 it can be shown [75] that $g(j)$ is a negative, increasing function of $n_j$ only, given by

$$g(j) = \psi(n_j/2)/\ln 2 - \log_2(n_j/2), \qquad (2.20)$$

where $\psi(z) = \Gamma'(z)/\Gamma(z)$ is the so-called Psi function and $\Gamma(z)$ the Gamma function. This function can easily be calculated for all values of $n_j$.
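Eq. (2.20) and the matching variance of a log Chi-squared variable are straightforward to evaluate. To stay dependency-free, the sketch below implements the Psi (digamma) function and its derivative (trigamma) from the recurrences $\psi(x) = \psi(x+1) - 1/x$ and $\psi'(x) = \psi'(x+1) + 1/x^2$ plus the standard asymptotic series; `scipy.special.digamma` and `scipy.special.polygamma(1, x)` are drop-in alternatives. The variance $\psi'(n_j/2)/\ln^2 2$ is the standard variance of $\log_2$ of a scaled $\chi^2_{n_j}$ variable under ID1a and ID2. The names and the Monte Carlo parameters are illustrative assumptions.

```python
import math
import numpy as np

def digamma(x):
    """psi(x) = Gamma'(x) / Gamma(x) for x > 0."""
    acc = 0.0
    while x < 6.0:              # psi(x) = psi(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) for x > 0."""
    acc = 0.0
    while x < 6.0:              # psi'(x) = psi'(x + 1) + 1/x^2
        acc += 1.0 / (x * x)
        x += 1.0
    inv, inv2 = 1.0 / x, 1.0 / (x * x)
    series = 1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30))
    return acc + inv + 0.5 * inv2 + inv * inv2 * series

def g(n_j):
    """Bias correction of Eq. (2.20): E log2(mu_j) - log2(E mu_j)."""
    return digamma(n_j / 2) / math.log(2) - math.log2(n_j / 2)

def sigma2(n_j):
    """Var(log2 mu_j) for n_j i.i.d. Gaussian details (ID1a, ID2)."""
    return trigamma(n_j / 2) / math.log(2) ** 2

# Monte Carlo check with n_j = 8 unit-variance Gaussian details
rng = np.random.default_rng(3)
logs = np.log2(np.mean(rng.standard_normal((20000, 8)) ** 2, axis=1))
print(g(8), logs.mean())       # both near -0.188
print(sigma2(8), logs.var())   # both near 0.59
```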
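Under ID1a and ID2, with $g(j)$ as above, the variables $y_j = \log_2 \mu_j - g(j)$ are scaled and shifted logarithms of Chi-squared variables, satisfying

$$E y_j = \alpha j + \log_2\big(c_f\, C(\alpha, \psi_0)\big), \qquad \sigma_j^2 = \operatorname{Var} y_j = \psi'(n_j/2)/\ln^2 2.$$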
The estimator $\hat\alpha$ of $\alpha$ is the slope of a weighted linear regression of $y_j$ on $j$, given by