a set of packets. Within the center of the network, or for non-TCP traffic, a related connection abstraction is provided by an IP flow, a group of related IP packets that are close in time.
Examples of network resource allocation problems that arise on a per-connection level include signaling to reserve buffers and bandwidth. Signaling could be initiated by the end system to achieve explicit quality of service (QoS), using protocols such as RSVP. Alternatively, signaling could be initiated by a network element to achieve better load balancing or to provide QoS for certain types of traffic. In this approach, edge routers detect flows of related packets and implicitly establish dedicated connections through the network to carry this traffic. Finally, individual network
Self-Similar Network Traffic and Performance Evaluation, Edited by Kihong Park and Walter Willinger. ISBN 0-471-31974-0. Copyright © 2000 by John Wiley & Sons, Inc.
Copyright 2000 John Wiley & Sons, Inc. ISBNs: 0-471-31974-0 (Hardback); 0-471-20644-X (Electronic)
routers could allocate resources for a flow to improve routing and forwarding performance. For example, a router could cache next-hop routes for common destination addresses to reduce the number of routing computations and to forward related packets along a single route. The router could even establish dedicated connections through the switching fabric for long-lived flows to avoid software processing of subsequent packets.
Each of these network mechanisms performs operations and allocates resources on the time scale of connection arrivals. Therefore, the burstiness of the connection arrival process affects two separate provisioning tasks: the central processing unit (CPU) resources necessary to perform the algorithm and the network resources required to achieve a desired level of blocking. The burstier the arrival process, the more CPU resources are necessary to execute the algorithm and the more network resources are needed to maintain a given level of blocking.
In this chapter we demonstrate that the TCP connection arrival process is bursty. We show that the arrival process is asymptotically self-similar. Self-similarity of the TCP connection arrival process implies that the use of standard models in evaluating the performance of resource allocation methods can yield misleading results. Therefore, we characterize TCP connection interarrival times¹ using heavy-tailed distributions. We present statistical evidence that such distributions, especially the Weibull distributions, yield a better model for the interarrival times of TCP connections than exponential models. Intuitively, a heavy-tailed interarrival time means that if no connection arrived for some time, it becomes more and more unlikely that one will arrive soon. This holds even if the underlying arrival process is nonstationary. Finally, based on a simple resource allocation problem, we show that there are advantages to using Weibull distributions to model TCP connection interarrival times over a nonstationary Poisson process.
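The intuition about heavy-tailed interarrival times can be made concrete with a small sketch. It assumes a Weibull distribution with tail P[T > t] = exp(-(t/a)^c); the function names and the parameter values (a = 1, c = 0.5) are illustrative choices, not values taken from the traces.

```python
import math


def weibull_survival(t, a, c):
    """P[T > t] for a Weibull distribution with scale a and shape c."""
    return math.exp(-((t / a) ** c))


def prob_arrival_soon(waited, window, a, c):
    """P[T <= waited + window | T > waited]: chance of an arrival in the next window."""
    return 1.0 - weibull_survival(waited + window, a, c) / weibull_survival(waited, a, c)


if __name__ == "__main__":
    # c = 1 is the exponential (memoryless) case; c < 1 gives a heavier tail.
    for waited in (0.0, 1.0, 5.0, 20.0):
        heavy = prob_arrival_soon(waited, 1.0, a=1.0, c=0.5)
        expo = prob_arrival_soon(waited, 1.0, a=1.0, c=1.0)
        print(f"waited {waited:5.1f}: Weibull(c=0.5) {heavy:.3f}  exponential {expo:.3f}")
```

For the exponential case the probability does not depend on how long one has already waited, whereas for shape c < 1 it shrinks as the waiting time grows.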
Our results are based on extensive analyses of multiple traces collected at Carnegie Mellon University in 1995, at AT&T Bell Laboratories in 1995 and 1996, and at AT&T Labs-Research in 1996. In addition, we augment our results with the analysis of traces collected at Lawrence Berkeley Laboratories (LBL) in 1993 and 1995, at Digital Equipment Corporation (DEC) in 1995, and within AT&T WorldNet in 1997 and 1998.
The rest of the chapter is organized as follows. In Section 15.2 we present, on an intuitive level, why TCP connection arrivals are bursty and what impact this may have on resource allocation problems. A more detailed description of the TCP/IP traces on which we base our traffic characterizations is given in Section 15.3. The results in Section 15.4 indicate the self-similar nature of the arrival process. Section 15.5 outlines the methods used to analyze the interarrival time distribution and then shows that the Weibull distribution yields a good fit for connection interarrival times. Section 15.6 contrasts the use of the Weibull model to that of a nonstationary Poisson process using an example application. Finally, we conclude with a brief summary.
¹ Given two consecutive timestamps of observed TCP connections from a TCP connection arrival process, the interarrival time is the time difference between the two timestamps.
15.2 BURSTINESS OF TCP CONNECTION ARRIVAL PROCESSES AND ITS IMPLICATIONS
In this section we discuss, on an intuitive level, the self-similarity of arrival processes and the reason why TCP connection arrival processes are self-similar. The fact that TCP connection arrivals are bursty leads to the question of how to characterize the distribution of time between arrivals of TCP connections, or the TCP connection interarrival time distribution. Finally, we discuss the implications of the burstiness on resource allocation problems.
15.2.1 Self-Similarity of Packet Arrival Processes
In the context of the Internet, the most studied arrival process is the packet arrival process. Prior to the work by Leland et al. [27], the Poisson model was the most commonly used model for network traffic. Leland and co-workers point out that on a local-area network (LAN) the packet arrival process shows self-similar behavior. An obvious pitfall of the Poisson model, compared to a self-similar model, is that its aggregation behavior differs substantially from the self-similar model; aggregated self-similar traffic stays burstier than aggregated Poisson traffic. If one assumes a Poisson arrival process of packets, the amount of buffering needed in an ATM switch should be fairly small. Yet, performance of ATM switches, in terms of loss rate, improved significantly when adding larger buffers [19, 38]. Willinger et al. [42] explain the self-similar nature of Ethernet traffic observed on the packet level as the result of a superposition of many on/off sources (also referred to as packet trains [25]), where the lengths of the on and off periods are drawn from heavy-tailed distributions.
Paxson and Floyd [35] show that wide-area network (WAN) traffic at the packet level is of asymptotically self-similar nature, that is, self-similar behavior over large time scales. They propose a structural model to explain the asymptotic self-similarity in terms of the characteristics of the main applications. Their model is based on an M/G/∞ model originally due to Cox [9]: session arrivals are assumed to be Poisson or, more generally, of renewal type; session duration (in seconds) or session size (in bytes) is required to be heavy tailed. Accordingly, traffic can be seen as the superposition of user activities that arrive according to a Poisson process but carry an amount of work that is drawn from a heavy-tailed distribution. The authors find that for wide-area traffic the arrival process of sessions is consistent with Poisson processes and that their durations are heavy tailed. Park et al. [36] and Crovella and Bestavros [8] showed a possible causal relationship between heavy-tailed transfer durations and file sizes on UNIX file systems.
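The structural model described above (Poisson session arrivals, heavy-tailed session durations) can be sketched as a small simulation. This is only an illustration under stated assumptions: durations are drawn from a Pareto distribution, time is divided into unit slots, and the function name and all parameter values are hypothetical rather than taken from the cited work.

```python
import random


def simulate_sessions(rate, alpha, min_duration, horizon, seed=0):
    """Count the sessions active in each unit time slot of [0, horizon).

    Sessions arrive as a Poisson process with the given rate. Each session
    lasts min_duration times a Pareto(alpha) variate, which is heavy tailed
    (infinite variance) for alpha < 2.
    """
    rng = random.Random(seed)
    active = [0] * horizon
    t = rng.expovariate(rate)
    while t < horizon:
        duration = min_duration * rng.paretovariate(alpha)
        first = int(t)
        last = min(horizon - 1, int(t + duration))
        for slot in range(first, last + 1):
            active[slot] += 1
        t += rng.expovariate(rate)
    return active


if __name__ == "__main__":
    counts = simulate_sessions(rate=2.0, alpha=1.4, min_duration=1.0, horizon=1000)
    print(min(counts), max(counts), sum(counts) / len(counts))
```

The per-slot counts from such a simulation can then be examined at several aggregation levels to see whether the variability persists across time scales.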
15.2.2 Self-Similarity of Connection Arrival Processes
To be able to study the arrival process of sessions, Paxson and Floyd equate sessions with TCP connections of applications such as TELNET and FTP connections, which, at the time of the study (1995), were among the most popular applications on the Internet. They point out that other arrival processes, including FTP DATA connection arrivals and SMTP connection arrivals, are not consistent with Poisson processes.
A Poisson process arises when users initiate actions more or less independently. While this may have been the case for FTP and TELNET sessions in 1995, the Internet has changed since then. It has grown beyond expectation so that an enormous amount of information is available, and its dominant application is the World Wide Web (or the Web). The Web contributes more than half of the packets and more than two-thirds of all TCP connections to the traffic of commercial ISPs such as AT&T's WorldNet. During a Web user session, a user is likely to download not just one single Web object, since most Web pages consist of 3 to 4 embedded images [4, 18], but also different Web pages. Indeed, he/she is very likely to download a set of Web pages from different Web sites or to initiate multiple FTP downloads. Therefore, equating a user session that includes Web browsing with a TCP connection is questionable. Indeed, with the growth of the Internet, a person using the Internet is more likely to initiate more than one operation during a session (independent of the kind of application: FTP, TELNET, or Web). This breaks the Poisson paradigm and suggests a new look at the burstiness of the TCP connection arrival process. In fact, we show that WEB connection arrivals show self-similar behavior. This change is an illustration of how the characteristics of the Internet can change over time.
While it is still true that users arrive in a more or less memoryless fashion, this does not translate into TCP connections arriving in a memoryless way. Feldmann et al. [16] show that Cox's model is still applicable but should be applied to a higher level abstraction of a user session. Feldmann et al. [16] explain that the connection arrival process is self-similar using Cox's model by showing that the number of TCP connections initiated by a user session (in this case a modem call) is heavy tailed.
15.2.3 Distribution of TCP Connection Interarrival Times
TCP connection interarrival times are derived from the TCP arrival process by considering the distribution of the times between consecutive arrivals of TCP connections. The fact that the TCP connection arrival process is self-similar indicates that distributions with heavy tails such as Weibull, Pareto, and lognormal distributions may yield better models for TCP connection interarrival times than exponential models. We augment previous results with statistical evidence that this is indeed the case. We show that the Weibull distribution yields a good fit for the connection interarrival times of Web connections over time periods of an hour or two. Deng [11], independent of our results, also found that the Weibull distribution gives a good fit.
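As a minimal sketch of how interarrival times follow from a list of connection arrival timestamps, the helpers below compute the consecutive differences and an empirical tail probability. The function names are hypothetical, and the timestamps are assumed to be numbers in a common unit such as seconds.

```python
def interarrival_times(timestamps):
    """Differences between consecutive connection arrival times."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def empirical_tail(samples, x):
    """Fraction of samples strictly larger than x, an estimate of P[T > x]."""
    return sum(1 for s in samples if s > x) / len(samples)


if __name__ == "__main__":
    gaps = interarrival_times([3.0, 1.0, 1.5, 10.0])
    print(gaps, empirical_tail(gaps, 1.0))
```

Plotting the empirical tail on logarithmic axes against the tail of a fitted model is one simple way to see whether long gaps are more frequent than an exponential model would predict.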
Indeed, the Weibull distribution gives a good fit not just for Web connections but for all connections of a specific type (such as FTP and TELNET) that arrived within a time period of an hour, two hours, or even days. The same holds true even if we include connections of all applications when calculating the interarrival times over a specific time period, or if we consider all connections in a given trace. While the Weibull distribution gives a good fit when the set of connections is enlarged to include more connections, it also gives a good fit if the set of connections is reduced. Indeed, if we isolate traffic sources (e.g., machines that serve a number of users, such as computer servers) or consider only connections from a subset of the sources, we observe that the Weibull distribution still results in better fits than the other models.
A possible explanation for this somewhat surprisingly good fit is that the Weibull distribution is a rather flexible distribution that seems to capture small to large interarrival times. Applications such as the Web create interarrival times from very short to very long. Very short interarrival times are created when a Web page has multiple embedded images. Currently, each of these images will be downloaded using a separate TCP connection, and typical browsers open up to four parallel TCP connections. Persistent connections [37] may reduce the number of TCP connections from four to two and may transfer multiple Web objects over the same TCP connections. Still, the Web protocol is responsible for creating some very short interarrival times. Once a user starts browsing the Web, he/she usually visits a set of different Web pages. This browsing and reading of Web pages generates interarrival times that are of intermediate size. Once the browsing session is done, a user is likely to take a long break, let's say for lunch, dinner, or even a vacation, thus creating long interarrival times. As such, it is no surprise that Web connection interarrival times are heavy tailed.
We are by no means claiming that the arrival process is a stationary process that can be completely described by an independent and identically distributed (i.i.d.) arrival process. The traces show a substantial dependence on the time of day. Yet, depending on the application, it might be more important to understand the small time scale behavior than to match a longer time scale behavior. To this extent we present evidence that using an i.i.d. Weibull model can yield more accurate performance prediction than a nonstationary Poisson process that matches the number of arrivals over time periods as small as minutes. Our example application is signaling on a single edge network.
15.2.4 Implications of the Burstiness of TCP Connections
The burstiness of connection interarrival times has implications on the design and evaluation of algorithms such as signaling and dynamic routing as well as resource allocation for Web servers.
Signaling. Signaling consists of two subproblems: connection-admission control and routing. Connection-admission control decides which connections to accept or reject, while routing chooses along which (multihop) path through the network to reserve resources. The connection arrival process affects not only the CPU resources necessary to perform the algorithms, but also the network resources required to achieve a desired level of blocking.
Depending on a router's CPU resources, it may not be feasible to do an implicit setup for every connection. In this case one has to find the most beneficial subset and, in addition, a different way of grouping the packets [18]. While different groupings may reduce the burstiness, they do not eliminate it. Connection-level simulations [14] show that an increase in burstiness of the connection arrival stream increases the level of blocking in connection-admission control. This can be even more severe for multihop paths if several links on the path are affected.
Dynamic Routing. Currently, routing in the Internet is completely static. Dynamic routing (e.g., on a per-connection level) is a candidate for taking advantage of shifts in the traffic matrix to improve traffic engineering. Yet, if arrivals are too bursty, this may not lead to much improvement [40].
Benchmarking of Web Servers. Web servers often allocate resources, such as processes, on a per TCP connection basis. SURGE [3] is a realistic Web workload generation tool that mimics a set of real users accessing a Web server. It creates a bursty connection arrival process, which exercises the resource allocation policies of Web servers. In this way it is able to detect significant performance problems that would have gone unnoticed using other benchmarks that do not generate a bursty arrival process.
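The effect of burstiness on blocking can be illustrated with a toy admission-control simulation. The sketch below is not the simulation of [14]; it assumes a single link with a fixed number of connection slots, exponentially distributed holding times, and compares exponential interarrival times with Weibull interarrival times of the same mean. All names and parameter values are hypothetical.

```python
import heapq
import math
import random


def blocking_fraction(next_gap, n_arrivals, capacity, mean_hold, rng):
    """Fraction of arrivals rejected by a loss system with `capacity` slots."""
    departures = []  # min-heap of departure times of accepted connections
    now, blocked = 0.0, 0
    for _ in range(n_arrivals):
        now += next_gap()
        while departures and departures[0] <= now:
            heapq.heappop(departures)  # free slots of finished connections
        if len(departures) >= capacity:
            blocked += 1
        else:
            heapq.heappush(departures, now + rng.expovariate(1.0 / mean_hold))
    return blocked / n_arrivals


def compare(shape=0.5, mean_gap=1.0, capacity=10, mean_hold=5.0, n=20000, seed=1):
    """Blocking under exponential versus Weibull gaps with identical mean."""
    rng = random.Random(seed)
    # Weibull mean is scale * Gamma(1 + 1/shape); pick scale to match mean_gap.
    scale = mean_gap / math.gamma(1.0 + 1.0 / shape)
    poisson = blocking_fraction(lambda: rng.expovariate(1.0 / mean_gap),
                                n, capacity, mean_hold, rng)
    weibull = blocking_fraction(lambda: rng.weibullvariate(scale, shape),
                                n, capacity, mean_hold, rng)
    return poisson, weibull


if __name__ == "__main__":
    print(compare())
```

With the same average arrival rate, the Weibull stream with shape below 1 clusters arrivals more tightly, so more of them find all slots occupied.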
15.3 OVERVIEW OF NETWORK TRAFFIC TRACES
Our analysis of TCP connections is based on trace analysis of transmission control protocol/Internet protocol (TCP/IP) internetwork traffic. The traces were collected on three different Ethernet segments at Carnegie Mellon University (CMU), AT&T Bell Laboratories, and AT&T Labs-Research using the tcpdump packet capture tool [24] running the Berkeley Packet Filter [28]. The number of packets that tcpdump reported dropped by the kernel was negligible.
The first set of traffic data was collected on an Ethernet segment of the School of Computer Science at CMU that is one of a total of 18 Ethernet segments that are bridged through a backbone with an aggregate 0.5 Gbit/s throughput. The workstation we used for the trace collection (DEC Alpha 400/300 with 64 Mbytes RAM) is connected to a segment that connects approximately 120 systems, including UNIX workstations, Macintoshes, and PCs. The second traffic data set was collected on an Ethernet segment at AT&T Bell Laboratories. The third traffic data set was collected on the same Ethernet segment shortly after the split of AT&T Bell Laboratories into AT&T Labs-Research and Lucent Bell Laboratories. The workstation we used for the second set of traces (SGI with one 134 MHz MIPS R4600 processor and 64 Mbytes RAM) was connected to AT&T Bell Laboratories' internal network, while the workstation for the third set of traces (SGI with one 100 MHz MIPS R4000 processor and 64 Mbytes RAM) was connected to an Ethernet segment outside AT&T's firewall. During some trace collection periods all TCP connections (including all World Wide Web access) to the external Internet passed this Ethernet segment. We refer to the second set of traces as the internal AT&T traces and to the third set as the external AT&T traces.
Traces were restricted to TCP traffic only; all user datagram protocol (UDP) traffic was discarded. Thus roughly one-third of all traffic at CMU and on the internal AT&T network was discarded but almost none on the external AT&T network. Note that none of these networks carried any substantial amount of MBONE traffic.
To be able to collect data over a reasonable time period, only those TCP packets that are involved in the TCP connection establishment handshakes between the source and destination pair were collected, that is, those packets that either have their synchronize sequence numbers flag (SYN) or their finish flag (FIN) set in the TCP header (ignoring RST packets).
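The packet selection rule can be sketched as a test on the TCP flags byte. The bit values for FIN, SYN, and RST are those of the standard TCP header; the function name is hypothetical, and the actual capture filter used for the traces is not stated in the text.

```python
FIN, SYN, RST = 0x01, 0x02, 0x04  # standard TCP header flag bits


def keep_packet(tcp_flags):
    """Keep packets with SYN or FIN set; packets with neither (e.g. RST only) are dropped."""
    return bool(tcp_flags & (SYN | FIN))


if __name__ == "__main__":
    for flags in (0x02, 0x12, 0x11, 0x10, 0x04):
        print(hex(flags), keep_packet(flags))
```

Restricting the capture to these packets keeps one or two packets per direction of each connection, which is what makes collection over many hours feasible.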
From the traces one can derive the arrival time of TCP connections, their source and destination, their application, their durations, and the number of bytes transferred (excluding TCP/IP overhead) using the tcp-conn tool [22]. The CMU datasets cover nearly 161 hours over 8 different days and the AT&T datasets more than 1290 hours over 52 days.
The packets with SYN and FIN flags are classified according to the application that generated them. We distinguished the traffic classes shown in Table 15.1. This classification to applications is based on the port numbers of the packets. All packets with port numbers that fit none of these applications are collected in a separate class.
Table 15.2 presents a breakdown of the observed TCP connection arrivals according to application classes. First we note that the traffic volume and traffic mix are highly dependent on the kind of network we were monitoring and when we were monitoring them. The observed change in the CMU traffic mix between December 1994 and June 1995 reflects the increasing popularity of the World Wide
TABLE 15.1 Traffic Classes
HTTP: Packets generated by World Wide Web applications such as Netscape and Mosaic.
X: Packets generated by the X11 window system, for example, by xterms.
RFS: CMU uses three remote file systems: the Brunhoff remote file system, the Andrew file system (AFS), and the network file system (NFS). Of these three the Brunhoff remote file system is the only one that uses TCP connections.
SMTP: Packets generated by the simple mail transfer protocol (smtp).
FTP: Packets generated by the file transfer protocol (ftp). This includes ftp-control connections as well as ftp-data connections. Sometimes it is desirable to distinguish between these two types of connections. In this case we refer to them as FTP.CONTROL and FTP.DATA.
POP: Packets generated by the post office protocol.
TELNET: Packets generated by the remote terminal protocol.
NNTP: Packets generated by the network news transfer protocol.
FINGER: Packets associated with the user information query application.
CUSTOM: Packets generated by custom, CMU, or AT&T-specific applications.
PRINTER: Packets generated by spooling files to networked printers.
PROTO: Packets generated by a collection of protocols, including domain name service protocol and echo protocol.
Web and the fact that CMU is phasing out RFS. The trace collected on the internal network at AT&T Bell Laboratories shows a very low overall volume but contains a very high percentage of X connections. Given that most machines connected to this Ethernet are X-terminals and that most people work on the machine to which their X-terminal is connected, this is not surprising. Comparing the three external AT&T traces one can see the effects of certain network changes. Most traffic observed in the period between 18 November and 8 December are accesses to AT&T Bell Laboratories' World Wide Web server that is connected to this Ethernet segment. While the January dataset still contains such HTTP accesses, it is dominated by HTTP traffic that originates at AT&T Labs-Research but accesses other machines in the Internet. The final dataset also contains all traffic between Lucent Bell Laboratories and AT&T Labs-Research's Web server and between AT&T and the Lucent Web server (using a different port number). After we accounted for this, the percentage of HTTP traffic is 65.83% and the percentage of SMTP traffic is 28.55%. This explains the huge difference in the number of collected packets.
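The port-based classification into the traffic classes of Table 15.1 can be sketched as a lookup on well-known port numbers. The port assignments below are the standard registered ones and are an assumption here, since the text does not list the ports the authors actually used; the RFS and CUSTOM classes depend on site-specific ports and are left out. The function and table names are hypothetical.

```python
# Assumed mapping from well-known ports to the classes of Table 15.1.
WELL_KNOWN_PORTS = {
    7: "PROTO",        # echo
    20: "FTP.DATA",
    21: "FTP.CONTROL",
    23: "TELNET",
    25: "SMTP",
    53: "PROTO",       # domain name service
    79: "FINGER",
    80: "HTTP",
    110: "POP",
    119: "NNTP",
    515: "PRINTER",
}


def classify(src_port, dst_port):
    """Return the traffic class of a SYN/FIN packet, or OTHER if no port matches."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
        if 6000 <= port <= 6063:  # X11 display ports
            return "X"
    return "OTHER"


if __name__ == "__main__":
    print(classify(34567, 80), classify(20, 40000), classify(50000, 6001))
```

Checking both ports matters because either endpoint of a connection may be the one using the well-known port, as with active-mode FTP data connections that originate from port 20.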
TABLE 15.2 Breakdown of TCP Connection Packets According to Application Type for a Subset of the Traces
Trace: CMU; CMU; Internal AT&T; External AT&T; External AT&T; External AT&T
Start: 9 Dec 94, 12 a.m.; 29 June 95, 12 a.m.; 8 Dec 95, 12 a.m.; 18 Nov 95, 12 a.m.; 15 Jan 96, 9 p.m.; 15 Mar 96, 1:30 p.m.
End: 10 Dec 94, 3 p.m.; 31 June 95, 12 a.m.; 23 Dec 95, 12 a.m.; 8 Dec 95, 12 a.m.; 19 Jan 96, 10 a.m.; 31 Mar 96, 2:30 p.m.
All in all, the traffic mixes observed on three Ethernets (two carrying mostly local-area traffic, and one, at AT&T, carrying mostly wide-area traffic) reflect different, but not uncommon, TCP connection usage patterns.
We have applied the same analysis techniques to some of the LBL and DEC traces collected prior to 1996 that were used for the study by Paxson and Floyd [35] (available through the Internet Traffic Archive [21]). Most of the results are similar to the results we get for the CMU and AT&T datasets. In the cases where the results differ we will briefly comment on how and why they differ. The key to understanding the difference is that the use of the network has changed, with more and more people using the Web and better and better wide-area networking connectivity. The main difference between the traces is that the amount of HTTP traffic in some of the traces considered here is substantially larger than in the LBL and DEC traces we considered.
In addition, we selectively analyze more recent traces collected from a commercial ISP, AT&T's WorldNet. For these traces the Web is the dominant application. HTTP transfers are responsible for more than 85% of all of the TCP connections and more than 50% of all the packets. The difference between the percentages is because Web downloads on average involve a smaller number of packets than some of the other applications.
15.4 SELF-SIMILARITY OF CONNECTION ARRIVAL PROCESS
Intuitively, an arrival process is consistent with a Poisson process if it is a superposition of many independent sources whose activity is more or less memoryless. Before the Web became the dominant application of the Internet, this was mostly true for the then dominant applications: TELNET and FTP CONTROL. Just because a user had initiated one telnet session did not necessarily imply that he/she was going to be more likely to initiate another telnet session. The Web changed this principle. Once a user starts browsing, he/she is much more likely to download another set of Web pages than to stop after just one Web page. After all, finding the "right" information on the Web is not always easy. This implies that the arrival process may not be consistent with a Poisson process anymore. Indeed, the arrival process of TCP connections shows self-similar behavior.
To support this claim, Fig. 15.1 shows a series of plots of the arrival process (i.e., number of connection arrivals per time unit) for HTTP connection arrivals for three different choices of time units for the external AT&T trace from 18 November to 8 December 1995. Starting with a time unit of 312.5 minutes (Fig. 15.1(a)), the subsequent plot is obtained from the previous one by decreasing the time unit by a factor of 25, increasing the time resolution by the same factor, and concentrating on a random subinterval of the previous plot. All plots look very "similar" to each other regardless of the chosen time scale. The burstiness of the connection arrival does not seem to decrease as the time resolution is decreased. Rather, we see the same degree of variability for all time resolutions. Similar plots can be obtained for other datasets as well as other applications or shorter time intervals.
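The count series behind such plots can be sketched as a simple binning of arrival timestamps. The function name and the half-open interval convention are assumptions made for illustration; with timestamps in minutes, the three time units of the figure would be 312.5, 12.5, and 0.5.

```python
def counts_per_unit(arrival_times, unit, start, end):
    """Number of arrivals in each consecutive interval of length `unit` in [start, end)."""
    n_bins = int((end - start) / unit)
    counts = [0] * n_bins
    for t in arrival_times:
        if start <= t < end:
            idx = int((t - start) / unit)
            if idx < n_bins:  # ignore a trailing partial interval
                counts[idx] += 1
    return counts


if __name__ == "__main__":
    arrivals = [0.1, 0.2, 1.5, 2.9, 3.0]
    print(counts_per_unit(arrivals, unit=1.0, start=0.0, end=3.0))
```

Calling the function repeatedly with the unit divided by 25 and with start and end narrowed to a subinterval reproduces the zooming procedure described above.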
Fig. 15.1 Pictorial indication of self-similarity: number of connection arrivals over time for traffic class HTTP on three different time scales.
The degree of burstiness over different time scales or the extent of self-similarity can be expressed with just one single parameter, the Hurst parameter [5]. For self-similar processes its value is between 0.5 and 1, and the degree of self-similarity increases as the Hurst parameter approaches 1. More formally, a covariance-stationary process $X = (X_k : k \ge 1)$ is called asymptotically self-similar (with self-similarity parameter $H$, $0 < H < 1$), if for all large enough $m$,

$$X \sim m^{1-H} X^{(m)},$$

where $X^{(m)} = (X^{(m)}_k : k \ge 1)$ is the aggregated process of order $m$, given by

$$X^{(m)}_k = m^{-1}\left(X_{(k-1)m+1} + \cdots + X_{km}\right), \quad k \ge 1.$$
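The aggregated process lends itself to a simple estimate of $H$. If $X$ and $m^{1-H} X^{(m)}$ have the same second-order structure, then $\mathrm{Var}(X^{(m)}) = m^{2H-2}\,\mathrm{Var}(X)$, so the logarithm of the variance of the aggregated series plotted against $\log m$ has slope $2H - 2$. The sketch below fits that slope by least squares; the function names and block sizes are illustrative, and this plain fit is only one of several ways to read a variance-time plot.

```python
import math
import random
import statistics


def aggregate(x, m):
    """Aggregated process X^(m): means of non-overlapping blocks of length m."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]


def variance_time_hurst(x, block_sizes):
    """Estimate H from the slope of log Var(X^(m)) versus log m (slope = 2H - 2)."""
    us = [math.log(m) for m in block_sizes]
    vs = [math.log(statistics.pvariance(aggregate(x, m))) for m in block_sizes]
    mean_u, mean_v = sum(us) / len(us), sum(vs) / len(vs)
    slope = (sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
             / sum((u - mean_u) ** 2 for u in us))
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    # Independent samples should give an estimate near 0.5.
    print(variance_time_hurst(noise, [1, 2, 4, 8, 16, 32, 64]))
```

Applied to a series of connection counts per time unit, an estimate clearly above 0.5 indicates that the variance of the aggregated series decays more slowly than it would for independent arrivals.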
The process under consideration is the number of connection arrivals per time unit.
In the past [5, 7, 20, 27, 35, 41] various graphical tools, such as "variance-time plots," "pox plots of R/S," and "periodogram plots," and statistical tools, such as "periodogram-based MLE estimate," have been used to estimate the Hurst parameter. Using these tools on the HTTP connection arrival from the busy hour of the external AT&T dataset from March, the estimates are $\hat{H} = 0.749$ for the variance-time plot; $\hat{H} = 0.764$ for the pox plot of R/S; $\hat{H} = 0.796$ for the periodogram plot (not restricted to the lower frequencies); and $\hat{H} = 0.737$ with a 95% confidence interval from 0.715 to 0.758 for the MLE Whittle estimate based on the periodogram. Similar estimates have been derived for other subsets of the traces.
More recently, Abry and Veitch [1, 2] proposed a wavelet-based technique for analyzing long-range dependent data and for estimating the associated Hurst parameter. Yet, more important, their method allows the identification of scaling regions, breakpoints, and nonscaling behavior and yields an unbiased estimator for the Hurst parameter (see Feldmann et al. [16] for examples). Abry and Veitch's method utilizes the ability of wavelets to "localize" a signal in both time and scale (see Kaiser [26] for an introduction to wavelets, and Daubechies [10] for a more mathematical treatment of the subject).
Given a process $X$, the discrete wavelet transform of the process will result in a set of wavelet coefficients $d_{j,k}$ that capture the contribution of the process at scale $j$ and time $2^j k$. If $X$ is a self-similar process with Hurst parameter $H \in [\tfrac{1}{2}, 1)$, then Abry and Veitch [1] have shown that the expectation of the energy $E_j$ that lies within a given bandwidth $2^{-j}$ around frequency $2^{-j}\lambda_0$ is given by
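The display belonging to the sentence above is not reproduced in this text. As an assumption drawn from the general wavelet literature on long-range dependence, the relation has the form $E[E_j] = c\,|2^{-j}\lambda_0|^{1-2H}$ with $E_j = \frac{1}{N_j}\sum_k |d_{j,k}|^2$, where $N_j$ is the number of coefficients at scale $j$ and $c$ does not depend on $j$. Under that assumption $\log_2 E_j$ grows linearly in $j$ with slope $2H - 1$. The sketch below uses the Haar wavelet, which has a single vanishing moment, purely to keep the code short; it is not the Daubechies wavelet used for the chapter's analysis, and all function names are hypothetical.

```python
import math
import random


def haar_energies(x, max_scale):
    """Average squared Haar detail coefficient at scales 1..max_scale."""
    approx = list(x)
    energies = []
    for _ in range(max_scale):
        pairs = len(approx) // 2
        if pairs < 2:
            break
        detail = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        energies.append(sum(d * d for d in detail) / pairs)
    return energies


def hurst_from_energies(energies, first_scale=1):
    """Fit log2(E_j) = (2H - 1) j + const over scales j >= first_scale."""
    pts = [(j, math.log2(e)) for j, e in enumerate(energies, start=1) if j >= first_scale]
    mean_j = sum(j for j, _ in pts) / len(pts)
    mean_e = sum(e for _, e in pts) / len(pts)
    slope = (sum((j - mean_j) * (e - mean_e) for j, e in pts)
             / sum((j - mean_j) ** 2 for j, _ in pts))
    return (slope + 1.0) / 2.0


if __name__ == "__main__":
    rng = random.Random(7)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    print(hurst_from_energies(haar_energies(noise, 6)))
```

Plotting $\log_2 E_j$ against $j$ before fitting is what makes scaling regions and breakpoints visible; the `first_scale` argument restricts the fit to the larger scales once such a region has been identified by eye.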
If we apply the wavelet technique to our datasets, we confirm our suspicion that the HTTP connection arrival process is asymptotically self-similar. Figure 15.2 shows the scaling plots for some of the busier two-hour periods of the datasets. All plots show a clear nonhorizontal scaling region from about scale 4–6 to scale 12, verifying the assumption of self-similar nature of the arrival process. The corresponding Hurst parameter estimation results, based on the larger scales, are all around $\hat{H} = 0.7$. To estimate simple trends (e.g., linear and $x^2$ trends) we used Daubechies wavelets [10] with three vanishing moments.
To contrast these results with the analysis of Paxson and Floyd [35], Fig. 15.3 shows the results of the scaling analysis for two subsets of the DEC-PKT-4 dataset. The first dataset contains all FTP CONTROL connection arrivals while the second one contains all HTTP connection arrivals. In line with the analysis results reported by Paxson and Floyd [35], the arrival process of FTP CONTROL connections gives a Hurst parameter estimation of 0.5, which is consistent with a Poisson process. Yet, even in this dataset, the process that counts the number of HTTP connections is consistent with a self-similar process with Hurst parameter of $\hat{H} = 0.65$.
To underline the observation that the number of Web connection arrivals is consistent with asymptotically self-similar behavior, Fig. 15.4 shows the results of the scaling analysis for data collected on AT&T WorldNet's network. The first dataset was collected on 14 August 1998 at 10:30 p.m. on a T3 backbone link, while the second dataset was collected on 23 July 1997 at 7 p.m. on an FDDI ring carrying traffic from roughly 420 modems used by WorldNet dialup customers (for more details see Feldmann et al. [13, 15]). In the first dataset 87.6% of the TCP connections are Web connections, of which there are a total of 1,187,866. In the dial dataset at least 85.8% of all TCP connections are Web connections, of which there are a total of 51,027. Again, both datasets as well as many others show that the arrival process is consistent with self-similarity with a Hurst parameter of $\hat{H} = 0.7$.
15.5 CHARACTERIZATION OF CONNECTION INTERARRIVAL TIMES
In this section we fit distributions with heavy tails to the empirical distribution of TCP connection interarrival times and explore their goodness of fit. To this end we first outline the methods used to analyze the interarrival time distribution. Next, we show that the Weibull distribution in particular gives a good fit for TCP connection interarrival times and finish by discussing recent results of using heavy-tailed distributions in traffic modeling.
15.5.1 Modeling of Empirical Distribution
Until recently [33–35, 42] traffic has almost exclusively been modeled by exponential distributions. In the last few years it has been shown that this is insufficient and that heavy-tailed distributions are more appropriate and can be used to explain self-similar or long-range dependent behavior. Given a random variable $X$ with distribution function $F(x) = P[X \le x]$, its distribution function $F(x)$ is called heavy tailed if its tail $1 - F(x) = P[X > x]$ decreases subexponentially for large values of $x$. This means that if a random variable represents waiting time, then for a random variable with a heavy-tailed distribution, the longer the already accumulated waiting time is, the lower is the likelihood of the next arrival within the following time interval.
The models we consider are based on the exponential, the Weibull, the Pareto, or the lognormal probability distributions. Past work has concentrated on the exponential, the lognormal, and the Pareto distribution, but not the Weibull distribution. Often, if the conditions of "strict randomness" of the exponential distribution are not satisfied, the Weibull distribution is a suitable alternative [23]. The Weibull distribution is a generalization of the exponential distribution in the sense that a variable $x$ has a Weibull distribution if $y = (x/a)^c$ has an exponential distribution with probability density function $p(y) = e^{-y}$. As the value of $c$ decreases, the probability of longer as well as shorter values increases, and the burstiness of the traffic increases. The definitions and the maximum likelihood estimators for all the probability distributions can be found in Tables 15.3 and 15.4. For more details see Johnson and Kotz [23]. Note that all but the exponential distributions are two-parameter distributions and can have heavy tails.
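With $y = (x/a)^c$ exponentially distributed, the Weibull distribution function is $F(x) = 1 - e^{-(x/a)^c}$. The sketch below fits $a$ and $c$ by maximum likelihood using the textbook equations for the two-parameter Weibull, stated here as an assumption rather than copied from Table 15.4: $\hat{c}$ solves $\sum_i x_i^{c}\log x_i / \sum_i x_i^{c} - 1/c - \frac{1}{n}\sum_i \log x_i = 0$, and $\hat{a} = (\frac{1}{n}\sum_i x_i^{\hat{c}})^{1/\hat{c}}$. The function name and the bisection bounds are illustrative choices.

```python
import math
import random


def weibull_mle(samples, iterations=80):
    """Maximum likelihood estimates (a, c) for positive samples."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    log_max = max(logs)
    mean_log = sum(logs) / n

    def score(c):
        # Weights x_i^c are rescaled by max(x)^c to avoid overflow.
        weights = [math.exp(c * (lx - log_max)) for lx in logs]
        weighted = sum(w * lx for w, lx in zip(weights, logs)) / sum(weights)
        return weighted - 1.0 / c - mean_log

    lo, hi = 1e-3, 1e3  # assumed search range for the shape parameter
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    c = math.sqrt(lo * hi)
    mean_weight = sum(math.exp(c * (lx - log_max)) for lx in logs) / n
    a = math.exp(log_max) * mean_weight ** (1.0 / c)
    return a, c


if __name__ == "__main__":
    rng = random.Random(3)
    data = [rng.weibullvariate(2.0, 0.6) for _ in range(5000)]
    print(weibull_mle(data))
```

A fitted shape $\hat{c}$ below 1 corresponds to the bursty regime described above, and $\hat{c} = 1$ recovers the exponential distribution with mean $\hat{a}$.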
For the analysis of the traffic traces described above we follow the approach of using goodness-of-fit measures suggested by Paxson [34, 35] for the analysis of wide-area TCP connections. To judge if the fit of one model is better than the other we use a discrepancy estimate $\hat{\lambda}^2$, which has an estimated variance of $\hat{v}(\hat{\lambda}^2)$ [31, 34], and say that one model is better than another if the value of its estimated discrepancy minus its variance is larger than the other model's estimated discrepancy plus its variance.
The discrepancy $\hat{\lambda}^2$ is computed by estimating the values of the experimental cumulative probability for a set of bins and comparing it to the values of the actual cumulative probability function. Therefore, the computation depends on the choice of a number of bins and the spacing of the bins. Following suggestions by Scott [39], Mann and Wald [32], and Paxson [34], we space the bins logarithmically, where the number of bins is $w = 3.49\,\hat{\sigma}\,n^{-4/9}$ if $n$ is the number of observations. Adjacent bins are combined if the number of observations in the bins is less than five.
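The binning step can be sketched as follows. The code takes the number of bins as a parameter rather than deriving it, and the way a sparse trailing bin is folded into its neighbor is an assumption; the function names are hypothetical and the discrepancy measure itself is not implemented here.

```python
import bisect


def log_bin_edges(samples, n_bins):
    """Edges of n_bins logarithmically spaced bins covering positive samples."""
    lo, hi = min(samples), max(samples)
    ratio = (hi / lo) ** (1.0 / n_bins)
    return [lo * ratio ** i for i in range(n_bins)] + [hi]


def binned_counts(samples, edges):
    """Number of samples per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        idx = bisect.bisect_right(edges, x) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    return counts


def merge_sparse_bins(edges, counts, minimum=5):
    """Combine adjacent bins until each holds at least `minimum` observations."""
    new_edges, new_counts, running = [edges[0]], [], 0
    for right, count in zip(edges[1:], counts):
        running += count
        if running >= minimum:
            new_edges.append(right)
            new_counts.append(running)
            running = 0
    if running:  # a sparse tail joins the last kept bin
        if new_counts:
            new_counts[-1] += running
            new_edges[-1] = edges[-1]
        else:
            new_counts.append(running)
            new_edges.append(edges[-1])
    return new_edges, new_counts


if __name__ == "__main__":
    print(merge_sparse_bins([1, 2, 4, 8, 16, 32], [6, 2, 3, 7, 1]))
```

The merged counts can then be compared with the probabilities a fitted model assigns to the same intervals.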
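TABLE 15.3 Definition of Several Probability Distributions
Lognormal (parameters $\zeta$, $\sigma$): density $\frac{1}{x\sqrt{2\pi}\,\sigma}\, e^{-(\log x - \zeta)^2 / 2\sigma^2}$; distribution function: no closed form; mean $e^{\zeta + \sigma^2/2}$.
[Entries for the exponential, Weibull, and Pareto distributions are not recoverable from the source.]

TABLE 15.4 Maximum Likelihood Estimators for Several Probability Distributions
Probability Distribution: Maximum Likelihood Estimator
Exponential: $\frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$
Weibull: $\hat{a} = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{\hat{c}}\right)^{1/\hat{c}}$
[Remaining entries are not recoverable from the source.]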
15.5.2 Modeling Interarrival Times of TCP Connections
The first indication that the Weibull distribution might be a good model is obtained by plotting the standardized skewness versus the coefficient of variation for both models and datasets and observing that the points of the datasets are clustered around the Weibull distribution. Figure 15.5(a) demonstrates this for eight complete datasets, Fig. 15.5(b) for eight-hour periods of the external AT&T dataset from 18 November to 8 December 1995, and Fig. 15.5(c) for one-hour periods of all Web requests of the external AT&T dataset from 18 November to 8 December 1995.
The arrival processes are by no means stationary. Indeed, Fig. 15.6(a) shows how the number of Web connections changes over time for the CMU trace from 29 June. Each point represents the number of Web connections that arrived within a 15 minute interval. For the longer external AT&T trace from 18 November, Fig. 15.6(b) shows how many Web connections arrived within each four-hour time period. These graphs show clear time of day and day of week dependencies.
Nevertheless, we will show in this section that the Weibull distribution provides a good fit for the connection interarrival times of TCP connections over all different measurement periods. We start by first examining the distribution of all connection
(c) 18 Nov AT&T, one hour of Web data
Fig. 15.5 Skewness
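The comparison of skewness against coefficient of variation can be sketched with two small functions: one computes both statistics from a sample of interarrival times, the other gives the corresponding curve for a Weibull distribution as a function of its shape parameter $c$ (the scale parameter cancels). Taking "standardized skewness" to mean the third central moment divided by the cubed standard deviation is an assumption, and the function names are hypothetical.

```python
import math


def cv_and_skewness(samples):
    """Sample coefficient of variation and third standardized moment."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    return math.sqrt(m2) / mean, m3 / m2 ** 1.5


def weibull_cv_and_skewness(c):
    """Theoretical values for a Weibull distribution with shape c."""
    g1, g2, g3 = (math.gamma(1.0 + k / c) for k in (1, 2, 3))
    variance = g2 - g1 ** 2
    cv = math.sqrt(variance) / g1
    skewness = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / variance ** 1.5
    return cv, skewness


if __name__ == "__main__":
    for shape in (0.4, 0.6, 0.8, 1.0):
        print(shape, weibull_cv_and_skewness(shape))
```

Evaluating the second function over a range of shapes traces the Weibull curve in the plane, and the points returned by the first function for each dataset can be plotted against it.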