Cornell University, School of Operations Research and Industrial Engineering
Self-Similar Network Traffic and Performance Evaluation, edited by Kihong Park and Walter Willinger. Copyright © 2000 by John Wiley & Sons, Inc. Print ISBN 0-471-31974-0; Electronic ISBN 0-471-20644-X.
New features in the teletraffic data discussed in recent studies suggest several issues for study and discussion:
Statistical. How can statistical models be fit to such data? Finite variance black box time series modeling has traditionally been dominated by ARMA or Box-Jenkins models. These models can be adapted to heavy-tailed data and work very well on simulated data. However, for real nonsimulated data exhibiting dependencies, such ARMA models provide unacceptable fits and do not capture the correct dependence structure. For discussion see Davis and Resnick [15], Resnick [38, 39], Resnick et al. [42], and Resnick and van den Berg [43].
Probabilistic. What probability models explain observed features in the data such as long-range dependence and heavy tails?
Consequences. Do the new features revealed by current teletraffic data studies mean we have to give up Poisson-derived models and exponentially bounded tails and the highly linear models of time series? Various bits of evidence emphasize the deficiencies of classical modeling. There are simulation studies [34] and the experimental queueing analysis of Erramilli, Narayan, and Willinger [18]. An analytic example [40] shows that for a simple G/M/1 queue, a stationary input with long-range dependence can induce heavy tails for the waiting time distribution and for the distribution of the number in the system.
Connections between long-range dependence and heavy tails need to be more systematically explored, but it is clear that in certain circumstances, long-range dependent (LRD) inputs can cause heavy-tailed outputs and (as we discuss here) heavy tails can cause long-range dependence. We discuss three models where heavy tails induce long-range dependence:
1. A single channel on/off source feeding a single server working at constant rate $r > 0$. Transmission or on periods have heavy-tailed distributions.
2. A multisource system where a single server working at constant rate $r > 0$ is fed by $J > 1$ on/off sources. Transmission periods have heavy-tailed distributions.
3. An infinite source model feeding a single server working at constant rate $r > 0$. At Poisson time points, nodes or sources commence transmitting. Transmission times have heavy-tailed distributions.
In each of the three cases, our basic descriptor of system performance is the time for buffer content to reach a critical level. Such a measure of performance is path based and makes sense without regard to stability of the model, existence of moments of input variables, or properties of steady-state quantities.
7.2 A SINGLE CHANNEL ON/OFF COMMUNICATION MODEL
7.2.1 Basic Setup
We consider first communication between a single source and a single destination server. The source transmits for random on periods alternating with random off periods when the source is silent. During the on periods, transmission is at unit rate. Let $\{X_{on}, X_n, n \ge 1\}$ be i.i.d. nonnegative random variables representing on periods. The common distribution is $F_{on}$. Similarly, $\{Y_{off}, Y_n, n \ge 1\}$ are i.i.d. nonnegative random variables, independent of $\{X_{on}, X_n, n \ge 1\}$, representing off periods, and these have common distribution $F_{off}$. The means are $\mu_{on} = E(X_{on})$ and $\mu_{off} = E(Y_{off})$, both assumed finite.
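1. The interarrival distribution is $F_{on} * F_{off}$ and the mean interarrival time is $\mu = \mu_{on} + \mu_{off}$.
2. The renewal times are
$$\Bigl\{0, \sum_{i=1}^{n} (X_i + Y_i), n \ge 1\Bigr\}.$$
Because of the finiteness of the means, the renewal process has a stationary version
$$\Bigl\{D, D + \sum_{i=1}^{n} (X_i + Y_i), n \ge 1\Bigr\},$$
where $D$ is a delay random variable satisfying
$$P[D > x] = \int_x^\infty \frac{P[X_{on} + Y_{off} > s]}{\mu}\,ds = \int_x^\infty \frac{1 - F_{on} * F_{off}(s)}{\mu}\,ds.$$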
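However, making the process stationary in this manner has the disadvantage that the initial delay period $D$ does not decompose into an on and an off period the way subsequent inter-renewal periods do, and the following procedure is preferable for generating the stationary alternating renewal process. Define independent random variables $B$, $X_{on}^{(0)}$, $Y_{off}^{(0)}$, which are assumed independent of $\{X_{on}, X_n, n \ge 1\}$ and $\{Y_{off}, Y_n, n \ge 1\}$. Here $B$ is a Bernoulli random variable with $P[B = 1] = \mu_{on}/\mu = 1 - P[B = 0]$, and
$$P[X_{on}^{(0)} > x] = \int_x^\infty \frac{1 - F_{on}(s)}{\mu_{on}}\,ds =: 1 - F_{on}^{(0)}(x),$$
$$P[Y_{off}^{(0)} > x] = \int_x^\infty \frac{1 - F_{off}(s)}{\mu_{off}}\,ds =: 1 - F_{off}^{(0)}(x).$$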
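The delay random variable $D^{(0)}$ is defined by
$$D^{(0)} = B\,(X_{on}^{(0)} + Y_{off}) + (1 - B)\,Y_{off}^{(0)},$$
and the delayed renewal sequence
$$\{S_n, n \ge 0\} := \Bigl\{D^{(0)}, D^{(0)} + \sum_{i=1}^{n} (X_i + Y_i), n \ge 1\Bigr\}$$
is a stationary renewal process.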
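The equilibrium (residual) distributions used for the initial on and off periods can be written out explicitly in simple cases. The following is a minimal sketch, assuming a Pareto on-period law with tail $x^{-\alpha}$ for $x \ge 1$ and $\alpha > 1$; this particular law and the function names are illustrative assumptions, not taken from the text.

```python
def pareto_tail(x, alpha):
    """Tail 1 - F(x) of the assumed Pareto law: x**(-alpha) for x >= 1, else 1."""
    return x ** (-alpha) if x >= 1.0 else 1.0


def pareto_mean(alpha):
    """Mean of the assumed Pareto law (finite only for alpha > 1)."""
    return alpha / (alpha - 1.0)


def equilibrium_tail(x, alpha):
    """1 - F^(0)(x) = (1 / mean) * integral from x to infinity of (1 - F(s)) ds."""
    mean = pareto_mean(alpha)
    if x >= 1.0:
        # Integral of s**(-alpha) over [x, infinity).
        integral = x ** (1.0 - alpha) / (alpha - 1.0)
    else:
        # The tail equals 1 on [x, 1), then s**(-alpha) on [1, infinity).
        integral = (1.0 - x) + 1.0 / (alpha - 1.0)
    return integral / mean


for x in (1.0, 4.0, 16.0, 64.0):
    print(x, pareto_tail(x, 1.5), equilibrium_tail(x, 1.5))
```

Under this assumption the equilibrium tail decays like $x^{1-\alpha}$, one power slower than the original tail, which is one way to see why the initial period of the stationary version tends to be longer than a typical period.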
7.2.2 High Variability Induces Long-Range Dependence
Consider the indicator process $\{Z_t\}$, which is 1 if and only if $t$ is in an on period. A standard renewal argument gives the following result [22].
Proposition 7.2.1. $\{Z_t, t \ge 0\}$ is strictly stationary and
$$P[Z_t = 1] = \frac{\mu_{on}}{\mu}.$$
Conditional on $Z_t = 1$, the subsequent sequence of on/off periods is the same as seen from time 0 in the stationary process with $B = 1$.
It is easiest to express long-range dependence in terms of slow decay of covariance functions, so we consider the second-order properties of the stationary process $\{Z_t\}$. (See Heath et al. [22].) The basis for the next result is a renewal theory argument.
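Theorem 7.2.2. The covariance function
$$\gamma(s) = \mathrm{Cov}(Z_t, Z_{t+s})$$
of the stationary process $\{Z_t, t \ge 0\}$ is expressed through the renewal integral
$$\int_0^s z(s - w)\,U(dw),$$
where
$$U = \sum_{n=0}^{\infty} (F_{on} * F_{off})^{n*}$$
is the renewal function.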
So $\gamma(t)$ decreases like a constant times $t\,\bar F_{on}(t)$, where $\bar F_{on} = 1 - F_{on}$. Such a slow decay of $\gamma(t)$ at an algebraic rate is characteristic of long-range dependence. One way to think about this result is that, with heavy-tailed on periods, there is a significant probability that a very long on period can cover both the time points $s$ and $t + s$, thereby inducing strong correlation between these two time points.
Taqqu et al. [48] use the long-range dependence of the on/off process and superimpose many such processes. This superposition is approximately a fractional Brownian motion, giving one explanation of the observed self-similarity of Ethernet traffic. See also Leland et al. [33]. Other limiting procedures leading to Lévy motion with heavy tails are possible and are also briefly discussed in Leland et al. [33]. See also Konstantopoulos and Lin [32].
7.2.3 Single Channel Fluid Queues with Constant Service Rates
Suppose work enters a communication system according to the on/off process. The server works off the load at constant rate $r$, assuming there is load to work on. Here are the formal model ingredients for the single source model.
1. The Input Process.
$$A(t) := \int_0^t Z_v\,dv.$$
Since $A(t) \sim t\mu_{on}/\mu$, the long-term input rate is $\mu_{on}/\mu$.
2. The Output Process. There is a release rate function: the release rate from the system when contents are at level $x$ is
$$r(x) = r \ \text{if } x > 0, \qquad r(x) = 0 \ \text{if } x = 0.$$
3. Stability. We assume the long-term input rate is less than the service rate: $\mu_{on}/\mu < r$.
4. The contents process $\{X(t)\}$ satisfies the storage equation
$$dX(t) = dA(t) - r(X(t))\,dt.$$
Note that during an on period, the net input rate is $1 - r$ since work is inputted at unit rate but the server works at rate $r$. During an off period, the release rate is $r$ (provided there is liquid to release). This means the paths of $X$ are sawtooth shaped.
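The sawtooth shape can be made concrete by tracing the storage equation through given on and off periods. The following is a minimal sketch, assuming $0 < r < 1$ and a path that starts with an on period; the function name `contents_breakpoints` and the sample periods are hypothetical.

```python
def contents_breakpoints(on_periods, off_periods, r, x0=0.0):
    """Return (time, content) breakpoints of the piecewise-linear contents path."""
    t, x = 0.0, x0
    path = [(t, x)]
    for on, off in zip(on_periods, off_periods):
        # On period: input at unit rate, service at rate r, so net rate 1 - r.
        t += on
        x += (1.0 - r) * on
        path.append((t, x))
        # Off period: content drains at rate r until the buffer is empty.
        drain_time = x / r
        if drain_time < off:
            path.append((t + drain_time, 0.0))
            x = 0.0
        else:
            x -= r * off
        t += off
        path.append((t, x))
    return path


for point in contents_breakpoints([2.0, 1.0], [1.0, 4.0], r=0.5):
    print(point)
```

Each on period contributes an upward segment of slope $1 - r$, and each off period a downward segment of slope $-r$ that is cut off at zero, which is the sawtooth described above.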
7.2.3.1 Regeneration Times. Recall that the stationary alternating renewal process is
$$S_n = D^{(0)} + \sum_{i=1}^{n} (X_i + Y_i), \quad n \ge 0.$$
Since the contents process is stable, we can define regeneration times
$$\{C_n\} := \{S_n : X(S_n) = 0\},$$
which are times when a dry period ends and input commences. So the standard limit theorems due to Smith for regenerative processes [45] guarantee that limit distributions exist in discrete time (for $\{X(S_n)\}$) and in continuous time (for $\{X(t)\}$); these are denoted $W$ and $V$, respectively. The random walk steps are
$$\xi_{n+1} = (1 - r)X_{n+1} - rY_{n+1},$$
and $\{\xi_j\}$ are i.i.d. It is important to distinguish between the random walk with steps $\{\xi_n\}$ and the random walk with steps $\{X_n + Y_n\}$.
The tail of $W$ is heavy and comparable to the integral of the tail of $F_{on}$. The definition of $\rho$ is
$$\rho = \frac{\mu_{on}}{\mu_{off}} \cdot \frac{1 - r}{r} < 1.$$
As expected from the sawtooth shape of the paths, the tail of $V(x)$ is heavier because of a bigger multiplicative factor.
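As a short worked check, the requirement $\rho < 1$ can be related to the other quantities in this section. The sketch below assumes $0 < r < 1$ and that stability of the single-source model means the long-term input rate $\mu_{on}/\mu$ is below the service rate $r$ (an assumption made by analogy with the conditions stated for the multisource and infinite source models).

```latex
\frac{\mu_{on}}{\mu_{on} + \mu_{off}} < r
\iff (1 - r)\,\mu_{on} < r\,\mu_{off}
\iff \rho = \frac{\mu_{on}}{\mu_{off}} \cdot \frac{1 - r}{r} < 1
\iff E\,\xi_1 = (1 - r)\,\mu_{on} - r\,\mu_{off} < 0 .
```

So, under these assumptions, $\rho < 1$ is the same as negative drift for the random walk with steps $\{\xi_n\}$.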
7.2.4 Extremes, Level Crossings, and Buffer Overflow
The distributions $W(x)$ and $V(x)$ are standard queueing quantities and convey some performance information. Another performance measure, one that is less dependent on notions of stability and existence of moments, is the time until buffer overflow. We formulate the time to buffer overflow as the hitting time of a high level $L$:
$$\tau(L) := \inf\{t \ge 0 : X(t) \ge L\}.$$
Define $M(t) = \bigvee_{s=0}^{t} X(s) = \max\{X(s) : 0 \le s \le t\}$. The reason for interest in the maximum content up to $t$ is that, as a process, it is the inverse of the $\tau$ process, since
$$\tau(L) = \inf\{s > 0 : X(s) \ge L\} = \inf\{s > 0 : M(s) \ge L\} =: M^{\leftarrow}(L),$$
where $M^{\leftarrow}$ is the right continuous inverse of the monotone function $M$. If we understand the asymptotic behavior of $M$, then we will understand the asymptotic behavior of $\tau$. To understand the behavior of $M$ we first study the extremes of $\{X(S_n)\}$ and then fill in the behavior between the discrete points $\{S_n\}$. We study the maxima of the random walk generated by $\{\xi_n\}$ over cycles and then knit cycles together. Recall
$$\xi_{n+1} = (1 - r)X_{n+1} - rY_{n+1}.$$
Consider the first downgoing ladder epoch of $\{\sum_{i=0}^{n} \xi_i, n \ge 0\}$. ($D_l[0, \infty)$ denotes the left continuous functions with finite right limits.)
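Theorem 7.2.6. Assume $\{X(t)\}$ is stable and that $1 - F_{on}$ is regularly varying with index $-\alpha$. Then
$$\lim_{L \to \infty} P\bigl[(1 - r)^{\alpha} \mu^{-1} (1 - F_{on}(L))\,\tau(L) \le x\bigr] = P[E(1) \le x] = 1 - e^{-x}, \quad x > 0,$$
where $E(1)$ is a unit exponential random variable. Also, as $x \to \infty$,
$$(1 - F_{on}(x))\,E(\tau(x)) \to \mu (1 - r)^{-\alpha}.$$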
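The mean hitting time limit suggests the large-level approximation $E\,\tau(L) \approx \mu (1 - r)^{-\alpha} / (1 - F_{on}(L))$. The following minimal sketch evaluates it, assuming a Pareto on-period tail $x^{-\alpha}$ for $x \ge 1$ and, purely for illustration, $\mu = 6$ (Pareto mean 3 plus constant off time 3); the function name and these parameter choices are assumptions, not values confirmed by the text for this computation.

```python
def mean_hitting_time_approx(level, alpha, r, mu):
    """Heavy-tail approximation mu * (1 - r)**(-alpha) / (1 - F_on(level))."""
    tail = level ** (-alpha)  # assumed Pareto tail, valid for level >= 1
    return mu * (1.0 - r) ** (-alpha) / tail


levels = [2, 5, 10, 22, 46, 100, 215, 464]
for L in levels:
    print(L, round(mean_hitting_time_approx(L, alpha=1.5, r=0.53, mu=6.0), 1))
```

With a Pareto tail the approximation grows like $L^{\alpha}$, that is, algebraically in the level, and it is only expected to be accurate for large $L$.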
Simulations were run with $F_{off}$ either the same Pareto distribution or constant off times equal to 3. The number of replications was 500 and the levels $L$ were 2, 5, 10, 22, 46, 100, 215, 464. The simulation with the constant off times shows our approximation is surprisingly effective but, as expected, having variability in the off distribution makes the approximation less accurate. See Fig. 7.1.
Fig. 7.1 Approximation versus simulated mean times.
As a final experiment, we decided to test the correctness of the intuition that a high level crossing by the content level was due to a single very long on period, rather than due to gradual buildup. We ran 1000 simulation runs of the system. Each simulation consisted of running the system until level $L = 64$ was crossed and keeping track of the length of the on period that was in progress when level $L$ was crossed. The histogram in Fig. 7.2 is for the ratios
$$\frac{(1 - r) \times (\text{length of the on period resulting in level exceedance})}{L} \wedge 1,$$
where $r = 0.53$ and $\alpha = 1.5$. The wedge symbol $\wedge$ means minimum. The off distribution was concentrated at 3 and the on distribution was Pareto. The histogram shows that 85% of the time, the contents process crosses level $L$ due to a very long on period and not due to gradual buildup: for all but 150 out of 1000 of the runs, the ratio was 1.
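An experiment of this kind can be sketched in a few lines. The code below is a hedged illustration, not the authors' simulation: it assumes the Pareto on-period law has tail $x^{-\alpha}$ for $x \ge 1$, starts each run from an empty buffer at the beginning of an on period, and uses hypothetical names (`pareto`, `overflow_run`). The ratio is taken as $(1 - r) \times$ (crossing on period) divided by $L$, capped at 1.

```python
import random


def pareto(rng, alpha):
    """Sample from the assumed Pareto law P[X > x] = x**(-alpha), x >= 1."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def overflow_run(rng, level, r, alpha, off):
    """Run the on/off fluid queue from empty until the content reaches `level`.

    Returns (hitting_time, ratio), where ratio = min((1 - r) * X / level, 1)
    and X is the on period in progress at the crossing.
    """
    t, x = 0.0, 0.0
    while True:
        on = pareto(rng, alpha)
        gain = (1.0 - r) * on
        if x + gain >= level:
            # The level is crossed during this on period, at net rate 1 - r.
            t += (level - x) / (1.0 - r)
            return t, min(gain / level, 1.0)
        # Otherwise finish the on period, then drain at rate r for `off` units.
        x = max(x + gain - r * off, 0.0)
        t += on + off


rng = random.Random(0)
ratios = [overflow_run(rng, 64.0, 0.53, 1.5, 3.0)[1] for _ in range(200)]
share = sum(ratio == 1.0 for ratio in ratios) / len(ratios)
print(f"share of runs with ratio 1: {share:.2f}")
```

A ratio equal to 1 means the crossing on period was long enough to reach the level from an empty buffer by itself, which is the "single very long on period" scenario.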
7.2.5 Contrast with Exponential Tails
From results of Iglehart [27] we may contrast the heavy-tailed case with the case of exponentially decreasing tails. We make the following assumptions.
1. There exists $\gamma > 0$ such that $E e^{\gamma \xi_1} = 1$. This implies that the right tail of $\xi_1$ is bounded by $\exp\{-\gamma x\}$, $x > 0$. Note that $F_{off}$ helps determine the growth rate $\gamma$.
2. For this value of $\gamma$, $E \xi_1 e^{\gamma \xi_1} =: \mu_\gamma \in (0, \infty)$.
3. The random variable $\xi_1$ has a nonlattice distribution.
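To see how the first assumption pins down $\gamma$, consider the special case (an assumption made only for this illustration) of exponential on periods with rate `on_rate` and exponential off periods with rate `off_rate`. Then $E e^{\gamma \xi_1}$ is available in closed form and the positive root can be found by bisection; the function names below are hypothetical.

```python
import math


def log_mgf(gamma, on_rate, off_rate, r):
    """log E exp(gamma * xi), xi = (1 - r) X - r Y, X ~ Exp(on_rate), Y ~ Exp(off_rate)."""
    return (-math.log1p(-gamma * (1.0 - r) / on_rate)
            - math.log1p(gamma * r / off_rate))


def cramer_root(on_rate, off_rate, r, iterations=200):
    """Bisection for the positive gamma with E exp(gamma * xi) = 1.

    Assumes negative drift, (1 - r) / on_rate < r / off_rate, so that the
    log transform is negative just above 0 and positive near its pole.
    """
    upper = on_rate / (1.0 - r)  # the transform is finite only below this value
    lo, hi = 0.0, upper * (1.0 - 1e-12)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if log_mgf(mid, on_rate, off_rate, r) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(cramer_root(on_rate=2.0, off_rate=1.0, r=0.6))
```

In this exponential case the root also has the closed form $\gamma = \texttt{on\_rate}/(1 - r) - \texttt{off\_rate}/r$, which shows directly how the off distribution enters the growth rate.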
Fig 7.2 Histogram giving effect of long on periods
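Define $N$ as in Eq. (7.2) and let $a(0)$ be the constant defined in terms of $\bigl(1 - E \exp\{\gamma \sum_{i=1}^{N} \xi_i\}\bigr)^2$. Then a suitably normalized hitting time $\tau(u)$ converges in distribution, as $u \to \infty$, to $E(1)$, where $E(1)$ is a unit exponential random variable. Also, $E\tau(s)$ grows like a constant, depending on $\mu$, $E(N)$, and $a(0)$, times $e^{\gamma s}$ as $s \to \infty$.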
So in this case, the time to hit a high level grows exponentially fast in the level (better), as opposed to algebraically fast (worse) in the level for the heavy-tailed case.
7.3 BUFFER OVERFLOW IN A MULTISOURCE MODEL
Suppose a single server is fed by $J$ i.i.d. on/off sources. As before we assume the server works at constant rate $r$. For $i = 1, \ldots, J$ let $Z_i(t)$ be the indicator of the $i$th on/off source describing when this alternating renewal source is on and transmitting. Define
$$Z^{(J)}(t) = \sum_{i=1}^{J} Z_i(t), \quad t > 0,$$
to be the total input rate at $t$; that is, the number of sources transmitting at $t$. The contents process corresponding to this input is
$$dX^{(J)}(t) = Z^{(J)}(t)\,dt - r(X^{(J)}(t))\,dt.$$
To guarantee stability, we assume the condition $J\mu_{on}/\mu < r$.
As in the single channel model, we focus on time to buffer overflow as a meaningful performance measure and thus consider
$$\tau(L) = \inf\{t \ge 0 : X^{(J)}(t) \ge L\}.$$
Because the superposition of renewal processes is no longer a renewal or regenerative process, this $J$-channel model is significantly more complex to analyze. We are only able to analyze the expected value of $\tau(L)$ and only in the case $r < 1$.
For related finite buffer results, see Jelenkovic [29] and Zwart [50].
7.4 AN INFINITE SOURCE MODEL
The difficulty in analyzing the $J$-channel on/off model of the previous section suggests letting $J \to \infty$ in order to achieve additional tractability. So suppose now there are an infinite number of nodes in a network. Sources turn on and initiate sessions at Poisson time points $\{\Gamma_k, k \ge 1\}$. The rate of the Poisson process is $\lambda$. The lengths of sessions are i.i.d. random variables $\{X_n\}$, which have a common distribution $F_{on}$, and we assume $E(X_1) = \mu_{on}$. During a session, an active source transmits at rate 1 and, whenever there is work, the single server works at rate $r$.
An important quantity is
$$N(t) = \text{number of sessions in progress at } t = \sum_{k=1}^{\infty} 1_{[\Gamma_k < t \le \Gamma_k + X_k]},$$
which is the number of busy servers in the M/G/$\infty$ queue. Traditional queueing theory recognizes $N(t)$ as the number of busy trunk lines in an infinite line telephone model.
It is well known and easy to derive the following facts.
1. $N(t)$ is Poisson distributed. At equilibrium the mean is $\lambda\mu_{on}$.
2. At equilibrium, the stationary version of $N$ has covariance function at lag $s$ equal to $\lambda \int_s^\infty (1 - F_{on}(u))\,du$. So if $F_{on}$ is heavy tailed, there will be slow decay of the covariance function and long-range dependence will be present.
Both facts follow readily from the fact that the point process with points $\{(\Gamma_k, X_k)\}$ is a two-dimensional Poisson process with mean measure $\lambda\,ds \times F_{on}(dy)$.
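The covariance in the second fact can be evaluated in closed form for simple session laws. The following minimal sketch assumes Pareto session lengths with tail $x^{-\alpha}$ for $x \ge 1$ and $1 < \alpha < 2$; the law, the parameter values, and the function name are illustrative assumptions.

```python
def mg_inf_covariance(lag, lam, alpha):
    """lam * integral from lag to infinity of (1 - F_on(u)) du, Pareto sessions."""
    if lag >= 1.0:
        return lam * lag ** (1.0 - alpha) / (alpha - 1.0)
    # The tail equals 1 on [lag, 1), then u**(-alpha) on [1, infinity).
    return lam * ((1.0 - lag) + 1.0 / (alpha - 1.0))


lam, alpha = 2.0, 1.5
# At lag 0 the covariance is the Poisson variance lam * mu_on.
print(mg_inf_covariance(0.0, lam, alpha))
# Partial sums of the covariances keep growing: the slow decay is not summable.
for n in (10, 100, 1000, 10000):
    print(n, sum(mg_inf_covariance(float(k), lam, alpha) for k in range(1, n + 1)))
```

Under this assumption the covariance decays like $s^{1-\alpha}$, so its partial sums grow without bound, which is the slow decay referred to above.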
We define the contents process $X$ by
$$dX(t) = N(t)\,dt - r\,1_{[X(t) > 0]}\,dt,$$
so the input rate is random, depending on the number of active sources at a given time. To ensure stability we assume
$$\text{long-term input rate} = \lambda\mu_{on} < r.$$
For additional discussion of this model, see Boxma [8], Brichet et al. [10], Konstantopoulos and Lin [32], and Resnick and Rootzen [46]. A discrete time analog of this model is discussed in Chapter 9.
7.4.1 Activity Periods
Following Jelenkovic and Lazar [30, 31], we define an activity period to be a busy period in the corresponding M/G/$\infty$ queue, so if at time 0 the initial conditions are that $N(0) = 0$ and a node turns on, then the length of the initial activity period is $\inf\{t > 0 : N(t) = 0\}$. Note that an activity period could end with nonzero content; it just depends on there being no active sources. The length of an activity period is a random variable whose distribution is known by means of a Laplace transform formula as given in Takács [47] or Hall [20]. Based on this, we may derive another interesting quantity.
Let $\nu$ be the number of sessions in an activity period.
This allows us (see Resnick and Samorodnitsky [41] for the proof) to generalize a result of Jelenkovic and Lazar [28]. See also Chapter 10. Our result weakens assumptions on $F_{on}$ and weakens the assumption $r < 1$.
Theorem 7.4.1. Let $I$ be the change in the buffer content during the activity period. Assume that the stability condition holds, and that
Assume the session length distribution $F_{on}$ has a regularly varying (or only dominatedly varying) tail with index $\alpha > 1$. Then
$\lim_{x \to \infty}$
$1 + \lambda\mu_{on} > r$ (since the overall rate is $\lambda\mu_{on}$ and the long session provides added input at rate 1) and therefore content increases by amount
$$(1 + \lambda\mu_{on} - r) \times (\text{length of long session}).$$
Thus, if we have a long session whose length exceeds
7.4.2 Extremes, First Passages, and Buffer Overflow
Suppose the added restriction that