Since the statistical analysis of Ethernet local-area network (LAN) traces in Leland et al. [20], there has been significant progress in developing appropriate mathematical and statistical techniques that provide a physically based, networking-related understanding of the observed fractal-like or self-similar scaling behavior of measured data traffic over time scales ranging from hundreds of milliseconds to seconds and beyond. These techniques explain, describe, and validate the reported large-time scaling phenomenon in aggregate network traffic at the packet level in terms of more elementary properties of the traffic patterns generated by the individual users and/or applications. They have impacted our understanding of actual network traffic, to the point where we now know why aggregate data traffic exhibits fractal scaling behavior over time scales from a few hundreds of milliseconds onward. In fact, a measure of the success of this new understanding is that the corresponding mathematical arguments are at the same time rigorous and simple, are in full agreement with the networking researchers' intuition and with measured data, and can be explained readily to a non-networking expert. These developments have helped immensely in demystifying fractal-based traffic modeling and have given rise to new insights and physical understanding of the effects of large-time scaling properties in measured network traffic on the design, management, and performance of high-speed networks.

Self-Similar Network Traffic and Performance Evaluation, Edited by Kihong Park and Walter Willinger. ISBN 0-471-31974-0 (print), 0-471-20644-X (electronic). Copyright © 2000 by John Wiley & Sons, Inc.
However, to provide a complete description of data network traffic, the same kind of understanding is necessary with respect to the dynamic nature of traffic over small time scales, from a few hundreds of milliseconds downward. Because the predominant protocols and end-to-end congestion control mechanisms play a central role in modern-day data networks and determine the flow of packets over those fine time scales and at the different layers in the TCP/IP protocol hierarchy, studying the fine-time scale behavior or local characteristics of data traffic is intimately related to understanding the complex interactions that exist in data networks such as the Internet: between the different connections, across the different layers in the protocol hierarchy, and over time as well as in space. In this chapter, we first summarize the results that provide a unifying and consistent picture of the large-time scaling behavior of data traffic, and we discuss the appropriateness of self-similar processes such as fractional Gaussian noise for modeling the fluctuations of the traffic rate process around its mean and for providing a complete description of the traffic on individual links within the network. Then we report on recent progress in studying the small-time scaling behavior in data network traffic and outline a number of challenging open problems that stand in the way of providing an understanding of the local traffic characteristics that is as plausible, intuitive, appealing, and relevant as the one that has been found for the global or large-time scaling properties of data traffic.
20.2 THE LARGE-TIME SCALING BEHAVIOR OF NETWORK TRAFFIC
In this section, we demonstrate why the empirically observed large-time scaling behavior or (asymptotic) self-similarity of aggregate network traffic is an additive property, with the additional requirement that the individual component processes that generate the total traffic exhibit certain high-variability or heavy-tailed characteristics.
20.2.1 Additive Structure and Gaussianity
When viewed over large enough time scales, the number of packets or bytes per time unit collected off a link in a network originates from all those connections that were active during the measurement period, utilized this link, and actively generated traffic during this time. In other words, if for "time scales" or "levels of resolution" $m \ge 1$, $X^{(m)} = (X^{(m)}(k): k \ge 0)$ denotes the overall traffic rate process, that is, the total number of packets or bytes per time unit (measured at time scale $m$) generated by all connections, then we can write

$$X^{(m)}(k) = \sum_i X_i^{(m)}(k), \qquad k \ge 0, \qquad (20.1)$$

where the sum is over all connections $i$ that are active at time $k$ and where $X_i^{(m)} = (X_i^{(m)}(k): k \ge 0)$ represents the total number of packets or bytes per time unit (again measured at time scale $m$) generated by connection $i$.¹ Thus, Eq. (20.1) captures the additive nature of aggregate network traffic by expressing the overall traffic rate process $X^{(m)}$ as a superposition of the traffic rate processes $X_i^{(m)}$ of the individual connections.
If we assume for simplicity that the individual traffic rate processes $X_i^{(m)}$ are independent of one another and identically distributed, then under weak regularity conditions on the marginal distribution of the $X_i^{(m)}$ (including, e.g., the existence of second moments), Eq. (20.1) guarantees that the overall traffic rate process (or its deviation from its mean) exhibits Gaussian marginals as soon as the traffic is generated by a sufficiently large number of individual connections.
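The Gaussian limit can be seen in a small simulation. The sketch below assumes a hypothetical slot-level model in which each connection independently sends at rate 1 with probability `p_on` in every slot; it ignores temporal dependence entirely and speaks only to the marginals. For a sum of $N$ independent Bernoulli($p$) variables the skewness is $(1 - 2p)/\sqrt{Np(1-p)}$, so the sample skewness of the aggregate should shrink roughly like $1/\sqrt{N}$ as connections are added.

```python
import random


def skewness(xs):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def aggregate_rates(num_connections, num_slots, p_on, rng):
    """Total rate per slot from i.i.d. on/off connections (hypothetical model).

    In every slot each connection independently sends at rate 1 with
    probability p_on and is silent otherwise, so slots are independent too.
    """
    return [
        sum(1 for _ in range(num_connections) if rng.random() < p_on)
        for _ in range(num_slots)
    ]


rng = random.Random(2000)
for n_conn in (1, 10, 200):
    x = aggregate_rates(n_conn, 5000, p_on=0.1, rng=rng)
    print(n_conn, round(skewness(x), 3))
```

A single connection is strongly skewed; with a few hundred connections the aggregate's skewness is already close to the Gaussian value of 0.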
20.2.2 Self-Similarity Through Heavy-Tailed Connections
Focusing on the temporal dynamics of the individual traffic rate processes $X_i^{(m)}$, suppose for simplicity that connection $i$ sends packets or bytes at a constant rate (say, rate 1) for some time (the "active" or "on" period) and does not send any packets or bytes during the "idle" or "off" period; we will return to the challenging problem of allowing for more realistic "within-connection" packet dynamics in Section 20.3. For example, in a LAN environment, a connection corresponds to an individual host-to-host or source–destination pair, and the corresponding traffic patterns have been shown in Willinger et al. [38] to conform to an alternating renewal process where the successive pairs of on and off periods define the inter-renewal intervals. On the other hand, in the context of wide-area networks or WANs such as the Internet, we associate individual connections with "sessions," where a session starts at some random point in time, generates packets or bytes at a constant rate (say, rate 1) during the lifetime of the connection, and then stops transmitting packets or bytes. Here a session can be an FTP application, a TELNET connection, a Web session, sending e-mail, reading Network News, and so on, or any imaginable combination thereof. In fact, over periods of half an hour to one hour, session arrivals on Internet links have been shown to be consistent with a homogeneous Poisson process; for example, see Paxson and Floyd [25] for FTP and TELNET sessions, and see Feldmann et al. [12] for Web sessions. Note that in the present setting, only global connection characteristics (e.g., session arrivals, lifetimes of sessions, durations of the on/off periods) play a role, while the details of how the packets arrive within a connection or within an on period have been conveniently modeled away by assuming that the packets within a connection are generated at a constant rate.

¹ Note that the processes $X^{(m)}$ and $X_i^{(m)}$ are defined by averaging $X$ and $X_i$ over nonoverlapping blocks of size $m$.
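The WAN session model can be sketched as follows, assuming Poisson session arrivals with rate `arrival_rate` and a caller-supplied `lifetime` sampler (both names are illustrative, not taken from the chapter). Each session contributes rate 1 for its lifetime, the traffic in a unit time slot is the total overlap of all sessions with that slot, and `aggregate` forms the block averages $X^{(m)}$ of footnote 1.

```python
import random


def session_traffic(arrival_rate, lifetime, horizon, rng):
    """Per-slot traffic X(k), k = 0, ..., horizon - 1, from Poisson sessions.

    Sessions start at the points of a Poisson process with the given rate,
    send at rate 1 during [start, start + lifetime(rng)), then stop. Slot k
    covers [k, k + 1), so X(k) is the total overlap of sessions with that slot.
    Sessions that would have started before time 0 are not simulated, so the
    first few slots form a warm-up period.
    """
    x = [0.0] * horizon
    start = rng.expovariate(arrival_rate)
    while start < horizon:
        end = start + lifetime(rng)
        k = int(start)
        while k < horizon and k < end:
            x[k] += min(end, k + 1) - max(start, k)
            k += 1
        start += rng.expovariate(arrival_rate)
    return x


def aggregate(x, m):
    """X^(m): averages of x over nonoverlapping blocks of size m (footnote 1)."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]


rng = random.Random(1)
x = session_traffic(2.0, lambda r: 5.0, horizon=2000, rng=rng)
steady = x[100:1900]
print(sum(steady) / len(steady))   # theoretical steady-state mean: 2.0 * 5.0 = 10
print(aggregate(steady, 100)[:5])  # X^(100) over the first few blocks
```

With a constant lifetime the per-slot mean should settle near arrival_rate × mean lifetime after the warm-up; passing a heavy-tailed `lifetime` sampler instead gives the setting discussed next.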
To describe the stochastic nature of the overall traffic rate process $X^{(m)}$, the only stochastic elements that have not yet been specified are the distributions of the lengths of the on/off periods (in the case of the LAN example) or the distribution of the session durations (for the WAN case) associated with the individual traffic rate processes $X_i^{(m)}$. Based on measured on/off periods of individual host-to-host pairs in a LAN environment (e.g., see Willinger et al. [38]) and measured session durations from different WAN sites (e.g., see Feldmann et al. [12], Paxson and Floyd [25], and Willinger et al. [37]), we choose these distributions to be heavy-tailed with infinite variance. Here, a positive random variable $U$ (or the corresponding distribution function $F$) is called heavy-tailed with tail index $\alpha > 0$ if it satisfies

$$1 - F(y) = P[U > y] \sim c\, y^{-\alpha}, \qquad \text{as } y \to \infty, \qquad (20.2)$$

where $c > 0$ is a finite constant that does not depend on $y$. Such distributions are also called hyperbolic or power-law distributions and include, among others, the well-known class of Pareto distributions. The case $1 < \alpha < 2$ is of special interest and concerns heavy-tailed distributions with finite mean but infinite variance. Intuitively, infinite variance distributions allow random variables to take values that vary over a wide range of scales and can be exceptionally large with nonnegligible probabilities. Hence, heavy-tailed distributions with infinite variance allow for compact descriptions of the empirically observed high-variability phenomena that dominate traffic-related measurements at all layers in the networking hierarchy; for example, see Feldmann et al. [12].
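Pareto variables that satisfy (20.2) exactly beyond a threshold are easy to draw by inverse-transform sampling. The sketch assumes $P[U > y] = (y/x_{\min})^{-\alpha}$ for $y \ge x_{\min}$, which is (20.2) with $c = x_{\min}^{\alpha}$; `pareto_sample` and `empirical_tail` are illustrative helpers.

```python
import random


def pareto_sample(alpha, rng, x_min=1.0):
    """Draw U with P[U > y] = (y / x_min) ** -alpha for y >= x_min.

    Inverse-transform sampling: if V is uniform on (0, 1], then
    x_min * V ** (-1 / alpha) has exactly this Pareto tail.
    """
    v = 1.0 - rng.random()  # uniform on (0, 1], never zero
    return x_min * v ** (-1.0 / alpha)


def empirical_tail(samples, y):
    """Fraction of samples strictly greater than y."""
    return sum(1 for u in samples if u > y) / len(samples)


rng = random.Random(42)
alpha = 1.5  # finite mean (alpha > 1), infinite variance (alpha < 2)
u = [pareto_sample(alpha, rng) for _ in range(200_000)]
for y in (10.0, 100.0):
    print(y, empirical_tail(u, y), y ** -alpha)  # empirical vs. exact tail
```

With $\alpha = 1.5$ the mean exists (it equals $\alpha x_{\min}/(\alpha - 1) = 3$ here), but the sample variance does not settle down as more draws are added, which is the high-variability behavior described above.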
Mathematically, the heavy-tailed property of, for example, the durations during which individual connections actively generate packets implies that the temporal correlations of the stationary versions of the individual traffic rate processes $X_i^{(m)}$ and, because of the additivity property (20.1), of the overall traffic rate process $X^{(m)}$ decay hyperbolically slowly; that is, they exhibit long-range dependence. More precisely, if $r^{(m)} = (r^{(m)}(k): k \ge 0)$ denotes the autocorrelation function of the stationary version of the overall traffic rate process $X^{(m)}$, then property (20.2) can be shown to imply long-range dependence (e.g., see Cox [4] and Willinger et al. [38]; for similar results obtained in the context of a fluid queueing system under heavy traffic, see Chapter 5 in this volume). That is, for all $m \ge 1$, $r^{(m)}$ satisfies

$$r^{(m)}(k) \sim c\, k^{2H-2}, \qquad \text{as } k \to \infty, \quad 0.5 < H < 1, \qquad (20.3)$$

where the parameter $H$ is called the Hurst parameter and measures the degree of long-range dependence in $X^{(m)}$; in terms of the tail index $1 < \alpha < 2$ that measures the degree of "heavy-tailedness" in Eq. (20.2), $H$ is given by $H = (3 - \alpha)/2$. Intuitively, long-range dependence results in periods of sustained greater-than-average or lower-than-average traffic rates, irrespective of the time scale over which the rate is measured. In fact, for a zero-mean covariance-stationary process, Eq. (20.3) implies (and is implied by) asymptotic (second-order) self-similarity; that is, after appropriate rescaling, the overall traffic rate processes $X^{(m)}$ have identical second-order statistical characteristics and "look similar" for all sufficiently large time scales $m$. In other words, Eq. (20.3) holds if and only if for all sufficiently large time scales $m_1$ and $m_2$, we have

$$m_1^{1-H} X^{(m_1)} \approx m_2^{1-H} X^{(m_2)}. \qquad (20.4)$$
flow of packets in modern data networks). In fact, for the self-similarity property of data traffic over large time scales to hold, all that is needed is that the number of packets or bytes per connection is heavy-tailed with infinite variance; the precise nature of how the individual packets within a session or connection are sent over the network is largely irrelevant.
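One common way to check (20.4) on data is the variance-time (aggregated variance) method: if $m^{1-H}X^{(m)}$ has the same second-order statistics for all large $m$, then $\mathrm{Var}(X^{(m)}) \propto m^{2H-2}$, so the slope of $\log_2 \mathrm{Var}(X^{(m)})$ against $\log_2 m$ is $2H - 2$. The sketch below is that standard estimator, not a method described in this section; all names are illustrative, and the conversion $H = (3 - \alpha)/2$ from the tail index is included for comparison. On i.i.d. Gaussian input the estimate should come out near 0.5.

```python
import math
import random


def hurst_from_tail_index(alpha):
    """H = (3 - alpha) / 2 for on/off or session durations with 1 < alpha < 2."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("expected 1 < alpha < 2")
    return (3.0 - alpha) / 2.0


def aggregate(x, m):
    """Block means of x over nonoverlapping blocks of size m, i.e. X^(m)."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]


def variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((v - mean) ** 2 for v in xs) / (len(xs) - 1)


def variance_time_hurst(x, levels):
    """Estimate H from Var(X^(m)) ~ m^(2H - 2) by least squares.

    Regresses log2 Var(X^(m)) on log2 m; a fitted slope s gives H = 1 + s / 2.
    """
    pts = [(math.log2(m), math.log2(variance(aggregate(x, m)))) for m in levels]
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    sxy = sum((a - mx) * (b - my) for a, b in pts)
    sxx = sum((a - mx) ** 2 for a, _ in pts)
    return 1.0 + (sxy / sxx) / 2.0


rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]  # short-range dependent
print(variance_time_hurst(iid, [2 ** j for j in range(9)]))
print(hurst_from_tail_index(1.5))
```

The estimator is crude (large block sizes leave few blocks, and trends or nonstationarity bias the slope), so it is best read as a quick diagnostic rather than a precise measurement of $H$.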
Note that this understanding of data traffic started with an extensive analysis of measured aggregate traffic traces, followed by the statistically well-grounded conclusion of their self-similar or fractal characteristics, and triggered the curiosity of networking researchers who wanted to know: "Why self-similar or fractal?" In turn, this search for a physical explanation of the large-time scaling behavior of measured data traffic resulted in findings about data traffic at the connection level that are, at the same time, mathematically rigorous, agree with the networking researchers' experience, are consistent with data, and are intuitive and simple to explain in the networking context. In this sense, the progression of results proceeded in the opposite direction from how traffic modeling has traditionally been done in this area, that is, by first analyzing in great detail the dynamics of packet flows within individual connections and then appealing to some mathematical limiting result that allowed for a simple approximation of the complex and generally overparameterized aggregate traffic stream. In contrast, the self-similarity work has demonstrated that novel insights into, and a new and unprecedented understanding of, the nature of actual data traffic can be gained by a careful statistical analysis of measured traffic at the aggregate level and by explaining aggregate traffic characteristics in terms of more elementary properties that are exhibited by measured data traffic at the connection level.
20.2.3 Self-Similar Gaussian Processes as Workload Models
Note that in the Gaussian setting discussed in Section 20.2.1, the self-similarity property (20.4) implies that for $\frac12 < H < 1$ and for all sufficiently large time scales $m$, the traffic rate process $X^{(m)}$ (or, more precisely, its deviation from its mean) satisfies

$$X^{(m)} = m^{H-1} X, \qquad (20.5)$$

where in this case the equality is understood in the sense of finite-dimensional distributions, and where $X = (X_k: k \ge 1)$ denotes fractional Gaussian noise (FGN), the only stationary (zero-mean) Gaussian process that is (exactly) self-similar in the sense that Eq. (20.5) holds for all $m \ge 1$. Equivalently, FGN is uniquely characterized as the stationary (zero-mean) Gaussian process with autocorrelation function

$$r(k) = \tfrac12\left[(k+1)^{2H} - 2k^{2H} + (k-1)^{2H}\right], \qquad k \ge 1, \quad \tfrac12 < H < 1.$$
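The FGN autocorrelation is easy to tabulate, and a second-order Taylor expansion of $(k \pm 1)^{2H}$ shows that $r(k) \sim H(2H-1)k^{2H-2}$, which is the hyperbolic decay of Eq. (20.3) with $c = H(2H-1)$. The helper names below are illustrative.

```python
def fgn_autocorr(k, hurst):
    """Autocorrelation r(k) of fractional Gaussian noise (r(0) = 1)."""
    k = abs(k)
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + abs(k - 1) ** h2)


def fgn_autocorr_asymptote(k, hurst):
    """Leading-order tail H(2H - 1) k^(2H - 2), i.e. the form of Eq. (20.3)."""
    return hurst * (2.0 * hurst - 1.0) * k ** (2.0 * hurst - 2.0)


for k in (1, 10, 100, 1000):
    exact = fgn_autocorr(k, 0.8)
    print(k, exact, exact / fgn_autocorr_asymptote(k, 0.8))
```

For $H = 1/2$ every $r(k)$ with $k \ge 1$ vanishes (white noise), and for $H > 1/2$ the printed ratios approach 1 as $k$ grows.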
For the purpose of modeling the dynamics of actual data traffic over a link within a network, FGN has the advantage of providing a complete description of the resulting traffic rate process; that is, specifying its mean, variance, and Hurst parameter $H$ suffices to completely characterize the traffic. Given this advantage over other (typically incomplete) descriptions of network traffic dynamics, it is important to know under what conditions FGN is an adequate and accurate model for the deviations around the mean of actual data traffic. To this end, Erramilli et al. [8] note that the FGN model can be expected to be an appropriate model for data traffic provided (1) the traffic is aggregated over a large number of independent and not too wildly fluctuating connections (i.e., ensuring Gaussianity of expression (20.1)), (2) the effects of flow control on any one connection are negligible (i.e., requiring, in fact, that we consider the traffic only over sufficiently large time scales where Eq. (20.4) holds), and (3) the time scales of interest for the performance problem at hand coincide with the scaling region (i.e., where Eq. (20.5) holds). In practice, these conditions are often satisfied in the backbone (i.e., at high levels of aggregation) and for time scales that are larger than the typical round-trip time of a packet in the network.
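Because FGN is fully specified by its mean, variance, and $H$, a sample path can be generated from those three numbers alone. The sketch below uses a direct Cholesky factorization of the $n \times n$ covariance matrix; this is exact but costs $O(n^3)$ operations, so it is only practical for short paths (circulant-embedding methods such as Davies–Harte are the usual faster alternative). The function names are illustrative.

```python
import math
import random


def fgn_autocorr(k, hurst):
    """Autocorrelation of FGN at lag k >= 0."""
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + abs(k - 1) ** h2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def fgn_path(n, mean, variance, hurst, rng):
    """n steps of mean + FGN, specified only by (mean, variance, H)."""
    cov = [[variance * fgn_autocorr(abs(i - j), hurst) for j in range(n)]
           for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # i.i.d. standard normals
    # Correlate the normals: path_i = mean + sum_p L[i][p] * z[p].
    return [mean + sum(low[i][p] * z[p] for p in range(i + 1)) for i in range(n)]


path = fgn_path(256, mean=100.0, variance=25.0, hurst=0.8, rng=random.Random(3))
print(min(path), max(path))
```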
20.2.4 Toward Self-Similar Non-Gaussian Workload Models?
One of the conditions mentioned above that justify the use of FGN as an adequate and accurate description of actual data traffic traversing individual links in a network states that the traffic over a specific link is made up of a large number of (more or less) independent connections, where each connection's own traffic rate cannot fluctuate too wildly; that is, $X_i^{(m)}$ is chosen from a distribution with finite variance. While this condition is generally applicable in many legacy LAN and WAN environments and can often be validated against measured traffic, it can no longer be taken for granted in today's networks because of changes in networking technologies, applications, and user behavior. For example, advanced networking technologies such as 100-Mb/s Ethernet or gigabit Ethernet can be expected (despite the presence of TCP, for example) to allow the traffic rates of individual connections to vary over many orders of magnitude, from kilobits/second to megabits/second and beyond, depending on the networking conditions. Thus, for understanding modern-day network traffic, processes that combine heavy tails in time and space (i.e., the distributions of the durations as well as of the rates at which individual connections emit packets are heavy-tailed with infinite variance) may become relevant in practice and may see genuine applications in the networking area in the near future.
To illustrate, let $X_i^{(m)}$ denote an on/off-type connection as described earlier, where in addition to the duration of the on/off periods, the rate at which the connection emits packets during the on period is also heavy-tailed with infinite variance (with tail index $\beta$, say). Focusing on this modification of the renewal model investigated by Mandelbrot [22] and Taqqu and Levy [34], Levy and Taqqu [21] recently showed that when studying the overall traffic rate process $X^{(m)}$ defined in Eq. (20.1), that is, when aggregating many such independent connections, one can obtain a dependent, stationary process that has a stable marginal distribution with infinite variance and that is self-similar as in Eq. (20.5) with self-similarity parameter $H$ given by

$$H = \frac{\beta - \alpha + 1}{\beta}. \qquad (20.6)$$

Here $\beta$ denotes the index characterizing the heaviness of the tail of the traffic rate of the individual connections, and $\alpha$ denotes the tail index associated with the distributions of the durations of on and off periods, which we assume for simplicity to be identical. Observe that in the finite variance case $\beta = 2$, relation (20.6) reduces to the familiar $H = (3 - \alpha)/2 \in (\frac12, 1)$, which appears in connection with fractional Gaussian noise considered earlier. However, in contrast to FGN, the superposition process obtained under the assumption of heavy tails with infinite variance on the durations and rates is not Gaussian but has heavy-tailed marginals instead, implying that there is a much higher probability than in the Gaussian case that the overall traffic rate can differ greatly from the average value and that it can take extreme values (a phenomenon also known as intermittency). Being non-Gaussian, one of the obstacles at this stage for using these kinds of stable superposition processes in the context of modeling data traffic is that their statistical parameters $\beta$ (which specifies the marginals) and $H$ (Eq. (20.5)) do not define them completely; there exist a number of different dependent, stationary increment processes with stable marginals with the same $\beta$ and the same self-similarity parameter $H$ (see, for example, Samorodnitsky and Taqqu [33]). This is in stark contrast to FGN, where knowing the second-order statistical characteristics (i.e., variance and Hurst parameter $H$) uniquely defines the process, due to Gaussianity.
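Eq. (20.6) can be evaluated directly; the sketch below (with illustrative function names) checks that it collapses to $H = (3 - \alpha)/2$ at $\beta = 2$. It only evaluates the formula and does not check the parameter conditions under which Levy and Taqqu's limit theorem holds, which are not restated in this section.

```python
def levy_taqqu_hurst(alpha, beta):
    """Self-similarity parameter of Eq. (20.6): H = (beta - alpha + 1) / beta.

    alpha: tail index of the on/off durations.
    beta:  tail index of the per-connection transmission rates.
    """
    return (beta - alpha + 1.0) / beta


def finite_variance_hurst(alpha):
    """The familiar FGN case, H = (3 - alpha) / 2."""
    return (3.0 - alpha) / 2.0


for alpha, beta in [(1.2, 2.0), (1.2, 1.8), (1.2, 1.5), (1.4, 1.8)]:
    print(alpha, beta, round(levy_taqqu_hurst(alpha, beta), 4))
```

Writing (20.6) as $H = 1 - (\alpha - 1)/\beta$ shows that, for fixed $\alpha$, a smaller $\beta$ gives a smaller $H$.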
20.3 THE SMALL-TIME SCALING BEHAVIOR OF NETWORK TRAFFIC
The analysis of measured network traffic and the resulting understanding of some of its underlying structure outlined in Section 20.2 have led to the realization that while wide-area traffic is consistent with asymptotic self-similarity or large-time scaling behavior, its small-time scaling features are very different from those observed over large time scales. Thus, to provide an adequate and more complete description of actual network traffic, it is necessary to deal with these small-time scaling features and ultimately to understand their causes and effects. To this end, we summarize in this section our current understanding of this very recent development in network traffic analysis and modeling by introducing concepts that are novel to the networking area, for example, multifractals, conservative cascades, and multiplicative structure, and we illustrate their relevance to networking.
20.3.1 Multifractals
From a networking perspective, it comes as no surprise that protocol-specific mechanisms and end-to-end congestion control algorithms operating on small time scales and at the different layers in the hierarchical structure of modern data networks give rise to structural properties that are drastically different from the large-time scaling behavior, which has been shown earlier to be mainly due to global user and/or session characteristics. Since these networking mechanisms largely determine the actual flow of packets across the networks, they are likely to cause the traffic to exhibit pronounced local variations and irregularities which, per se, cannot be expected to have any obvious connection to the self-similar behavior of the traffic over large time scales.
To quantify these local variations in measured traffic at a particular point in time $t_0$, let $Y = (Y(t): 0 \le t \le 1)$ denote the process representing the total number of packets or bytes sent over a link up to time $t$, and for some $n > 0$, consider the traffic rate process $Y((k_n+1)2^{-n}) - Y(k_n 2^{-n})$, $k_n = 0, 1, \ldots, 2^n - 1$, that is, the total number of packets or bytes seen on the link during nonoverlapping intervals of the form $[k_n 2^{-n}, (k_n+1)2^{-n})$. We say that the traffic has a local scaling exponent $\alpha(t_0)$ at time $t_0$ if the traffic rate process behaves like $2^{-n\alpha(t_0)}$ as $k_n 2^{-n} \to t_0$ ($n \to \infty$). Note that $\alpha(t_0) > 1$ corresponds to instants with low intensity levels or small local variations ($Y$ has derivative zero at $t_0$), while $\alpha(t_0) < 1$ is found in regions with high levels of burstiness or local irregularities. Informally, we call traffic with the same scaling exponent at all instants $t_0$ monofractal (this includes exactly self-similar traffic, for which $\alpha(t_0) = H$ for all $t_0$), while traffic with a nonconstant scaling exponent $\alpha(t_0)$ is called multifractal.
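A deterministic binomial cascade, a standard textbook multifractal used here purely as an illustrative assumption (it is not a model of measured traffic), makes the definition concrete. Each dyadic interval passes a fraction $p$ of its mass to its left half and $1 - p$ to its right half, so the total $Y(1) = 1$ is conserved. With $p = 1/2$ every coarse exponent equals 1 (monofractal, like $Y(t) = t$); with $p = 0.3$ they spread between $-\log_2 0.7 \approx 0.515$ and $-\log_2 0.3 \approx 1.737$.

```python
import math


def binomial_cascade(p, n):
    """Masses of the 2**n dyadic intervals under a deterministic binomial cascade.

    Starting from total mass 1 on [0, 1], every interval passes a fraction p of
    its mass to its left half and 1 - p to its right half (mass is conserved).
    The returned list holds the increments Y((k+1)2^-n) - Y(k 2^-n).
    """
    masses = [1.0]
    for _ in range(n):
        masses = [w for m in masses for w in (m * p, m * (1.0 - p))]
    return masses


def coarse_exponents(increments, n):
    """alpha_k^n = -(1/n) log2 |Y((k+1)2^-n) - Y(k 2^-n)| for each k."""
    return [-math.log2(abs(d)) / n for d in increments]


n = 12
for p in (0.5, 0.3):
    alphas = coarse_exponents(binomial_cascade(p, n), n)
    print(p, round(min(alphas), 4), round(max(alphas), 4))
```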
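More formally, the degree of local irregularity of a signal $Y$, or its singularity structure at a given point in time $t_0$, can be characterized to a first approximation by comparison with an algebraic function; that is, $\alpha(t_0)$ is the best (i.e., largest) $\alpha$ such that $|Y(t') - Y(t_0)| \le C\,|t' - t_0|^{\alpha}$ for all $t'$ sufficiently close to $t_0$. Since our process $Y$ has positive increments, this singularity exponent can be approximated through the somewhat simpler quantity

$$\alpha(t) = \lim_{n\to\infty} \alpha_n(t), \qquad \alpha_n(t) := \alpha^n_{k_n} := -\frac{1}{n}\log_2\left|Y\big((k_n+1)2^{-n}\big) - Y\big(k_n 2^{-n}\big)\right|, \qquad k_n 2^{-n} \to t. \qquad (20.8)$$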
The aim of multifractal analysis (MFA) is to provide information about these singularity exponents in a given signal and to come up with a compact description of the overall singularity structure of signals in geometrical or in statistical terms. Before describing in more detail some of the commonly used MFA methods, we note that since wavelet decompositions contain information about the degree of local irregularity of a signal, it should come as no surprise that the singularity exponent $\alpha(t)$ is related to the decay of the wavelet coefficients $w_{j,k} = \int Y(s)\,\psi_{j,k}(s)\,ds$ around the point $t$, where $\psi$ is a bandpass wavelet function and where $\psi_{j,k}(s) := 2^{j/2}\psi(2^j s - k)$ (e.g., in the case of the well-known Haar wavelet, $\psi(s)$ equals $1$ for $0 \le s < 1$, $-1$ for $1 \le s < 2$, and $0$ for all other $s$; for a general overview of wavelets, we refer to Daubechies [5]). Indeed, assuming only that $\int \psi(s)\,ds = 0$, one can show as in Jaffard [18] that
$$2^{n/2}\,|w_{n,k_n}| \le C\,2^{-n\alpha(t)}, \qquad \text{as } k_n 2^{-n} \to t. \qquad (20.9)$$

Moreover, it is known that under some regularity conditions (for a precise statement see Jaffard [18] or Daubechies [5, Theorem 9.2]), relation (20.9) characterizes the degree of local irregularity of the signal at the point $t$. This suggests defining $\tilde\alpha(t)$ as in Eq. (20.8) but with $\alpha_n(t)$ replaced by $\tilde\alpha_n(t)$, where

$$\tilde\alpha_n(t) := \tilde\alpha^n_{k_n} := -\frac{1}{n}\log_2\left(2^{n/2}\,|w_{n,k_n}|\right). \qquad (20.10)$$

In general, this may give a different but nevertheless useful description of the singularity structure of $Y$, particularly for nonmonotonic processes (for an example, see Gilbert et al. [13]). Using wavelets may also have numerical advantages. The remainder of this section remains true if $\alpha(t)$ is replaced by $\tilde\alpha(t)$ and Eq. (20.8) by (20.10), that is, increments by normalized wavelet coefficients.

Conceptually, the geometrical formulation of MFA in the time domain is the most obvious one. Its objective is to quantify what values of the limiting scaling exponent $\alpha(t)$ appear in a signal and how often one will encounter the different values. In other words, the focus here is on the "size" of the sets of the form

$$K_\alpha := \{t : \alpha(t) = \alpha\}. \qquad (20.11)$$
One such description involves the notion of the coarse Hölder exponents (20.8). To illustrate, fix a path of $Y$ and consider a histogram of the $\alpha_k^n$ ($k = 0, \ldots, 2^n - 1$) taken at some finite level $n$. It will show a nontrivial distribution of values but is bound to concentrate more and more around the expected value as a result of the law of large numbers (LLN): values other than the expected value must occur less and less often. To quantify the frequency with which values other than the mean value occur, we make extensive use of the theory of large deviations. Generalizing the Chernoff–Cramér bound, the large deviation principle (LDP) states that probabilities of rare events (e.g., the occurrence of values that deviate from the mean) decay exponentially fast. To make this more precise, consider a sequence of independent, identically distributed (i.i.d.) random variables $W, W_1, W_2, \ldots$ and set $V_n := W_1 + \cdots + W_n$. Using Chebyshev's inequality and independence, we find, for any $q > 0$,
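$$P\left[\tfrac{1}{n}V_n \ge a\right] = P\left[2^{qV_n} \ge 2^{nqa}\right] \le E\left[2^{-nqa}\,2^{qV_n}\right] = \left(E\left[2^{qW}\right]2^{-qa}\right)^n. \qquad (20.12)$$

Since $q > 0$ is arbitrary, we can replace the right-hand side in Eq. (20.12) by its infimum over $q > 0$. A symmetry argument shows that $P\left[b \ge \tfrac{1}{n}V_n\right] \le \left(E\left[2^{qW}\right]2^{-qb}\right)^n$ for all $q < 0$. Combining all this yields the following two upper bounds:

$$\frac{1}{n}\log_2 P\left[b \ge \tfrac{1}{n}V_n \ge a\right] \le \begin{cases} \inf_{q>0}\left\{\log_2 E\left[2^{qW}\right] - qa\right\}, \\ \inf_{q<0}\left\{\log_2 E\left[2^{qW}\right] - qb\right\}. \end{cases} \qquad (20.13)$$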
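The bound in (20.13) can be checked against exact tail probabilities in a case where both are computable. The sketch below assumes $W$ is a fair coin flip ($W \in \{0, 1\}$), so $V_n$ is Binomial($n$, 1/2); the infimum over $q$ is approximated on a grid, which can only make the computed bound larger, so it remains a valid upper bound. For $a = 0.75$ the exact infimum is $h(0.75) - 1 \approx -0.1887$, where $h$ is the binary entropy in bits. The function names are illustrative.

```python
import math


def chernoff_exponent(log2_mgf, a, q_grid):
    """inf over q > 0 of log2 E[2^(qW)] - q a, evaluated on a finite grid."""
    return min(log2_mgf(q) - q * a for q in q_grid if q > 0)


def fair_coin_log2_mgf(q):
    """log2 E[2^(qW)] for W in {0, 1} with P[W = 1] = 1/2."""
    return math.log2((1.0 + 2.0 ** q) / 2.0)


def exact_upper_tail(n, a):
    """P[(1/n) V_n >= a] for V_n ~ Binomial(n, 1/2), computed exactly."""
    k0 = math.ceil(n * a)
    return sum(math.comb(n, k) for k in range(k0, n + 1)) / 2 ** n


a = 0.75
q_grid = [i / 1000 for i in range(1, 10001)]
bound = chernoff_exponent(fair_coin_log2_mgf, a, q_grid)
for n in (20, 80, 320):
    print(n, math.log2(exact_upper_tail(n, a)) / n, bound)
```

The exact exponents $(1/n)\log_2 P$ stay below the bound and creep toward it as $n$ grows, which is the exponential decay the LDP describes.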
To apply the LDP approach to our situation, we fix a realization of $Y$ and consider the location $t$, encoded by $k_n$ via $t \in [k_n 2^{-n}, (k_n+1)2^{-n})$, as the only randomness relevant for the LDP. Since $k_n$ can take only $2^n$ different values, which we will assume to be all equally likely, the relevant probability measure for $t$ is the counting measure $P_t$. The sequence of interest for our purpose is

$$V_n := \log_2\left|Y\big((k_n+1)2^{-n}\big) - Y\big(k_n 2^{-n}\big)\right| = -n\,\alpha^n_{k_n}.$$

Trying to obtain more precise information about the singularity behavior and aiming at simplifying Eq. (20.13), we not only let $n$ tend to $\infty$ but also let the interval $[a, b]$ shrink down to a single point $\alpha$, which unifies the two bounds in the limit. All this suggests that the following limiting "rate function" $f$ will exist under mild conditions (see Riedi [27, Theorem 7]):

$$f(\alpha) := \lim_{\varepsilon \to 0}\,\lim_{n\to\infty} \frac{1}{n}\log_2 \#\left\{k = 0, \ldots, 2^n - 1 : \left|\alpha_k^n - \alpha\right| < \varepsilon\right\}. \qquad (20.14)$$
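For intuition, take a deterministic binomial cascade (an illustrative assumption, not traffic data): every dyadic interval passes a fraction $p$ of its mass to its left half and $1 - p$ to its right half. Its coarse exponents take only $n + 1$ distinct values, so the inner quantity of (20.14) can be evaluated at a fixed $n$ by counting how many intervals share each exponent. The sketch below (illustrative names) does exactly that and does not attempt the double limit. The counts are binomial coefficients, so $(1/n)\log_2\binom{n}{z}$ appears as the estimate; its maximum stays below 1 at finite $n$ and approaches 1 only slowly.

```python
import math


def binomial_cascade(p, n):
    """Dyadic-interval masses: each interval sends p of its mass left, 1 - p right."""
    masses = [1.0]
    for _ in range(n):
        masses = [w for m in masses for w in (m * p, m * (1.0 - p))]
    return masses


def coarse_spectrum(alphas, n, tol=1e-9):
    """Finite-n counting version of Eq. (20.14).

    Sorts the coarse exponents, merges values closer than tol (a stand-in for
    the epsilon-window), and returns (alpha, (1/n) log2 count) per group.
    """
    values = sorted(alphas)
    groups = [[values[0]]]
    for v in values[1:]:
        if v - groups[-1][-1] < tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [(sum(g) / len(g), math.log2(len(g)) / n) for g in groups]


n, p = 14, 0.3
alphas = [-math.log2(m) / n for m in binomial_cascade(p, n)]
# With p < 1/2, sorting by alpha orders the groups by z, the number of
# times the smaller weight p was applied, so each count is comb(n, z).
for z, (alpha, f_n) in enumerate(coarse_spectrum(alphas, n)):
    print(z, round(alpha, 4), round(f_n, 4), round(math.log2(math.comb(n, z)) / n, 4))
```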
at least a considerable part of the $\alpha_k^n$ are approximately equal to $\alpha$, that is, $\#\{k : |\alpha_k^n - \alpha| < \varepsilon\} \simeq 2^n$. Such is the case for FGN with $\alpha = H$; but we also have $f(\alpha) = 1$ if only a certain constant fraction of the $\alpha_k^n$ values equals $\alpha$, as is the case with the concatenation of FGNs described earlier [36]. Only if certain values of $\alpha^n$ are considerably more spurious than others will we observe $f(\alpha) < 1$. In fact, it can be shown [28, 29] that the rate function $f$ relates to $\dim K_\alpha$ and that we have

$$\dim K_\alpha \le f(\alpha).$$
It is in this sense that $f$ provides information on the occurrence of the various "fractal" exponents, and it has been termed the multifractal spectrum. Also, note that the rate function $f$ is a random element because it is defined for every path of $Y$. Although $f$ can, in principle, be computed in practice, it is a very delicate and highly sensitive object, mainly because of its definition in terms of a double limit (see Eq. (20.14)). Fortunately, the LDP result suggests using the right-hand side of Eq. (20.13), with $E[2^{qW}]$ replaced by $\left(E[2^{qV_n}]\right)^{1/n}$ as in Eq. (20.12), as an alternative method for estimating $f$ that avoids double-limit operations and is generally more robust because it involves averages. In fact, consider the partition function $\tau(q)$ defined by
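$$S_n(q) := \sum_{k=0}^{2^n-1}\left|Y\big((k+1)2^{-n}\big) - Y\big(k2^{-n}\big)\right|^q = \sum_{k=0}^{2^n-1} 2^{-qn\alpha_k^n},$$

$$\tau(q) := \liminf_{n\to\infty} -\frac{1}{n}\log_2 S_n(q).$$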
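For a deterministic binomial cascade (an illustrative assumption: every dyadic interval passes a fraction $p$ of its mass to its left half and $1 - p$ to its right half), the partition sum has the closed form $S_n(q) = (p^q + (1-p)^q)^n$, so $\tau(q) = -\log_2(p^q + (1-p)^q)$ at every level $n$. The sketch below computes the finite-$n$ estimate $-(1/n)\log_2 S_n(q)$ from the increments and compares it with that closed form; $\tau(1) = 0$ reflects conservation of mass and $\tau(0) = -1$ reflects the $2^n$ intervals. Skipping zero increments is an assumption made so that $q \le 0$ stays finite; the names are illustrative.

```python
import math


def binomial_cascade(p, n):
    """Dyadic-interval masses: each interval sends p of its mass left, 1 - p right."""
    masses = [1.0]
    for _ in range(n):
        masses = [w for m in masses for w in (m * p, m * (1.0 - p))]
    return masses


def partition_sum(increments, q):
    """S_n(q): sum of |increment| ** q, skipping zero increments."""
    return sum(abs(d) ** q for d in increments if d != 0)


def tau_n(increments, n, q):
    """Finite-n estimate -(1/n) log2 S_n(q) of the partition function tau(q)."""
    return -math.log2(partition_sum(increments, q)) / n


def tau_binomial(p, q):
    """Closed form for the binomial cascade: -log2(p^q + (1 - p)^q)."""
    return -math.log2(p ** q + (1.0 - p) ** q)


n, p = 14, 0.3
y = binomial_cascade(p, n)
for q in (-2.0, 0.0, 1.0, 2.0, 4.0):
    print(q, round(tau_n(y, n, q), 6), round(tau_binomial(p, q), 6))
```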
According to the theory of LDP, we will have equality in Eq. (20.13) under mild conditions, at least in the limit as $n \to \infty$ and $b \to a$. Appealing to such results, it is possible to establish conditions under which $f(\alpha) = \inf_q\left(q\alpha - \tau(q)\right)$. In fact, collecting the terms $k$ in $S_n(q)$ with $\alpha_k^n$ approximately equal to some given value, say, $\alpha$, for varying $\alpha$, and noting that there are about $2^{nf(\alpha)}$ such terms, yields

$$f(\alpha) \le f^{**}(\alpha) = \tau^*(\alpha) := \inf_q\left(q\alpha - \tau(q)\right),$$
where $f^{**}$ is the concave hull of $f$ (compare Eq. (20.13)).² The questions are when and for which $\alpha$ the equality $f^{**}(\alpha) = f(\alpha)$ holds. A simple application of the LDP theorem of Gärtner–Ellis [7] provides an answer to these questions, under the assumption that $\tau(q)$ is differentiable everywhere (see Riedi [27]). In this particular case, we obtain the appealing formula

$$f(\alpha) = \tau^*(\alpha) = \inf_q\left(q\alpha - \tau(q)\right).$$
$$S_n(q) \simeq \sum_{k=0}^{2^n-1} E\left|Y\big((k+1)2^{-n}\big) - Y\big(k2^{-n}\big)\right|^q = 2^n\,2^{-nqH}\,E\left|Y(1)\right|^q.$$
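The last display gives $\tau(q) = qH - 1$ for a self-similar $Y$ with stationary increments and $E|Y(1)|^q$ finite, whose Legendre transform $\inf_q(q\alpha - \tau(q))$ equals 1 at $\alpha = H$ and $-\infty$ elsewhere: a monofractal. The sketch below approximates $\tau^*(\alpha)$ on a finite $q$ grid for that case and, for contrast, for a deterministic binomial cascade with $\tau(q) = -\log_2(p^q + (1-p)^q)$ (an illustrative assumption). On a finite grid the $-\infty$ shows up as large negative values that grow with the grid range. For the cascade the maximum value 1 is attained at $\alpha = \tau'(0) = -(\log_2 p + \log_2(1-p))/2$. The names are illustrative.

```python
import math


def legendre(tau, alpha, q_grid):
    """tau*(alpha) = inf_q (q * alpha - tau(q)), approximated on a finite q grid."""
    return min(q * alpha - tau(q) for q in q_grid)


def tau_self_similar(hurst):
    """tau(q) = qH - 1, read off from S_n(q) ~ 2^n 2^(-nqH) E|Y(1)|^q."""
    return lambda q: q * hurst - 1.0


def tau_binomial(p):
    """tau(q) = -log2(p^q + (1 - p)^q) for the binomial cascade."""
    return lambda q: -math.log2(p ** q + (1.0 - p) ** q)


q_grid = [i / 100 for i in range(-2000, 2001)]  # q in [-20, 20], includes 0
mono = tau_self_similar(0.7)
multi = tau_binomial(0.3)
for alpha in (0.6, 0.7, 0.8, 1.0, 1.2, 1.6):
    print(alpha, round(legendre(mono, alpha, q_grid), 3),
          round(legendre(multi, alpha, q_grid), 3))
```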
² The factors $2^{-n}$ appearing in $f$ and $\tau(q)$ are for convenience. The sign of $\tau(q)$ is chosen such as to render Eqs. (20.20) and (20.19) symmetrical. The signs of $q$ in (20.13) and (20.20) are opposite to each other.