This section demonstrates how simulation data can be managed to account for the peculiarities of the simulation experiment. In particular, we consider the effects of lack of independence, absence of normality, and lack of stationarity.
Figure 4.2 Plot of standard deviation versus simulation time (Streams 1, 2, and 3).
4.3.1 Normality and Independence
In Section 4.2, we indicated that although the sample mean X̄ is an unbiased estimate of μ, correlation of the time series data may yield a biased estimate of the variance, S². In addition, the individual data points are not usually drawn from a normal population. As a result, the raw data points X1, X2, …, Xn should not be used as sample observations in a statistical test. To alleviate this problem, a single simulation observation is defined in the following manner, based on whether the measurement variable is observation-based or time-based (see Section 3.4.1). Let Xj be the jth value of the observation-based variable and suppose that X(t) is the value of the time-based variable at time t. Suppose further that X(t) is observed for the time interval (0, T) and that n values of Xj have been collected. Then, for statistical analysis in simulation, a single observation is defined as
$$
Y =
\begin{cases}
\dfrac{1}{n}\displaystyle\sum_{j=1}^{n} X_j, & X_j \text{ is observation-based}\\[2ex]
\dfrac{1}{T}\displaystyle\int_{0}^{T} X(t)\,dt, & X(t) \text{ is time-based}
\end{cases}
$$
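For concreteness, the two cases of this definition can be sketched in code. The sketch below assumes a time-based variable recorded as a piecewise-constant function (event times paired with the value taken at each time); the function names and sample data are illustrative, not from the text:

```python
import numpy as np

def batch_observation_obs(x):
    """Batch average for an observation-based variable:
    Y = (1/n) * (X_1 + X_2 + ... + X_n)."""
    return float(np.mean(np.asarray(x, dtype=float)))

def batch_observation_time(times, values, T):
    """Batch average for a time-based variable observed on (0, T):
    Y = (1/T) * integral of X(t) dt, assuming X(t) is piecewise
    constant: it takes values[i] starting at times[i], and the last
    value persists until T."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    durations = np.diff(np.append(times, T))  # how long each value holds
    return float(np.dot(values, durations) / T)

# Observation-based: five sampled waiting times
y_obs = batch_observation_obs([2.0, 3.0, 1.0, 4.0, 5.0])  # -> 3.0

# Time-based: queue length X(t) = 0 on [0,2), 2 on [2,5), 1 on [5,10)
y_time = batch_observation_time([0.0, 2.0, 5.0], [0, 2, 1], T=10.0)
# -> (0*2 + 2*3 + 1*5)/10 = 1.1
```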
The definitions above are in effect batch averages. These averages possess two important properties provided that n (or T) is sufficiently large:

1. Successive observations Y1, Y2, …, YN from the same run, though not completely independent, are reasonably uncorrelated.
2. The sample averages of Y1, Y2, …, YN are asymptotically normal.

Keep in mind that the two results depend on having n (or T) "sufficiently" large. Although procedures exist for the determination of large n (or T), these procedures cannot be implemented easily in a simulation experiment (see Ref. [2] for details).
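The decorrelating effect of batching can be demonstrated empirically. The sketch below uses an AR(1) time series as a hypothetical stand-in for correlated raw simulation output (the model and its parameters are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated raw output X_1, X_2, ...: an AR(1) series with phi = 0.9
phi, n_raw = 0.9, 200_000
x = np.empty(n_raw)
x[0] = rng.normal()
for t in range(1, n_raw):
    x[t] = phi * x[t - 1] + rng.normal()

def lag1_corr(a):
    """Sample correlation between successive elements of a series."""
    return float(np.corrcoef(a[:-1], a[1:])[0, 1])

# Batch into observations Y_1, ..., Y_N with batch size n = 1000
n = 1000
y = x.reshape(-1, n).mean(axis=1)

print(lag1_corr(x))  # close to phi = 0.9: raw data strongly correlated
print(lag1_corr(y))  # near zero: batch means are "reasonably uncorrelated"
```

The raw series is strongly autocorrelated, yet the batch means are nearly uncorrelated, which is exactly the first property claimed above.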
The above discussion may give the impression that the suggested definition of Y is a guaranteed remedy to the problems of independence and normality. However, the definition of the simulation observation as a batch average raises a problem of its own. Batch averages as defined above are unbiased estimates so long as the batch size (n in observation-based variables and T in time-based variables) is fixed for all Yi, i = 1, 2, …, N. A typical simulation experiment usually includes both observation- and time-based variables. If the batch size T is fixed to accommodate time-based variables, then the batch size n (corresponding to T) of the associated observation-based variable will necessarily be a random variable. In a similar manner, T becomes a random variable if n is fixed. Therefore, either the observation- or the time-based batch average is biased.
One suggestion for solving this problem is to fix the value of T and count n within T. When n reaches a prespecified fixed size, data collection for the observation-based variable is terminated. The difficulty with this approach is that there is no guarantee that every batch of size T will contain (at least) n observation-based data points.
Another suggestion is to define the batch size for each variable (observation- or time-based) independently of all other variables. To implement this procedure, the time intervals defining the batch sizes for the different variables will likely overlap. Additionally, it will not be possible to fix the simulation run length in advance, because the length will be determined by the largest sum of all batch sizes needed to generate N batch averages among all variables. This requirement in itself is a major disadvantage because one would not be able to decide in advance whether the simulation should execute for one minute or one day before termination. Additionally, in some situations it may be necessary to maintain a very long run simply to generate a desired number of batches for just one variable, even though all the other variables may have reached their allotted target very early during the run.
In practice, the problem of random variability of n or T is all but ignored, because it has no satisfactory solution. As we will show in Section 4.4, the batch size is usually decided by fixing the time-based interval T. Then n will assume whatever value the observation-based variable accumulates during T. By selecting T sufficiently large, it is hoped that the variation in n will not be pronounced, particularly when the data are collected during the steady-state response.
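The randomness of n under a fixed T can be made visible with a small experiment. The sketch below assumes the observation-based variable is generated by a Poisson arrival process (the rate and interval length are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(7)

# Fix the time-based batch interval T. For an observation-based
# variable fed by hypothetical Poisson arrivals at rate lam, the
# count n falling in each interval is then a random variable.
T, lam, n_batches = 100.0, 0.5, 10

# Generate arrival instants over the whole run via exponential gaps
arrivals = []
t, run_end = 0.0, T * n_batches
while True:
    t += rng.exponential(1.0 / lam)
    if t >= run_end:
        break
    arrivals.append(t)
arrivals = np.asarray(arrivals)

# Count n inside each fixed-length batch interval
counts = [int(((arrivals >= k * T) & (arrivals < (k + 1) * T)).sum())
          for k in range(n_batches)]

print(counts)  # the counts fluctuate around lam * T = 50 from batch to batch
```

Each fixed interval of length T captures a different n, which is precisely why the observation-based batch averages cannot all share a common fixed batch size.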
4.3.2 Transient Conditions
The above definition of a simulation observation Y "reasonably" settles the questions of normality and independence in simulation experiments. The remaining issue is transient system response conditions. In practice, the effect of the transient state is accounted for simply by collecting data after an initial warm-up period. The warm-up period length is of course model dependent. For example, in the single-server model presented in Section 4.2, the transient period is expected to increase as the value of the arrival rate approaches that of the service rate. Indeed, the steady state is never achieved if the two rates are equal. The length of the transient period also depends on the initial conditions with which the simulation run is started. For example, in the single-server model, experimentation has shown that the best way to reach the steady state "quickly" is to start the run with the queue empty and the server idle. This rule, however, does not produce similarly favorable results for more complex models. Some modelers call for initializing complex systems with conditions that are similar to steady state. The problem here is that such conditions are usually not known a priori. Otherwise, why do we need the simulation?
Unfortunately, methods for detecting the end of the transient state are heuristics and as such do not yield consistent results. Besides, the practical implementation of these procedures within the context of a simulation experiment is usually quite difficult. It is not surprising that simulation languages do not offer these procedures as a routine feature. Indeed, it is commonly advised in practice to "observe" the output of the simulation at equally spaced time intervals as a means for judiciously identifying the approximate starting point of the steady state. Steady state is supposed to begin when the output ceases to exhibit excessive variations. However, this approach, not being well defined, is not likely to produce dependable results in practice.
In this section, we suggest an approach for estimating the end of the transient state. However, before doing so, we discuss the general ideas governing the development of currently available heuristics. These ideas are the basis for our procedure.
A heuristic by Conway calls for collecting observations (in the form of Yi as defined above) starting at the first batch that is neither a maximum nor a minimum among all future batches. Gafarian et al. [4] propose a procedure that appears to take the opposite view of Conway's heuristic. It calls for deleting the first n raw data points if Xn is neither the maximum nor the minimum of X1, X2, …, Xn. Schriber [6] suggests that the transient period ends approximately at the point in time when the batch means of the K most recent Yi's fall within a preset interval of length L. Another heuristic, advanced by Fishman [2], calls for collecting observations after the raw data time series X1, X2, …, Xn has oscillated above and below the cumulative average (X1 + X2 + … + Xn)/n a specified number of times.
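The oscillation-counting idea behind Fishman's heuristic can be sketched as follows. This is a simplified illustration of the crossing-count principle, not Fishman's exact rule, and the crossing threshold is a user choice:

```python
import numpy as np

def transient_end_by_crossings(x, required=25):
    """Return the first index at which the raw series X_1, X_2, ...
    has crossed its running cumulative average (X_1 + ... + X_k)/k
    `required` times, or None if it never does. A sketch of the
    oscillation-counting principle, not Fishman's exact procedure."""
    x = np.asarray(x, dtype=float)
    cum_mean = np.cumsum(x) / np.arange(1, len(x) + 1)
    sign = np.sign(x - cum_mean)
    crossings, prev = 0, 0.0
    for i, s in enumerate(sign):
        if s != 0:                       # ignore exact ties
            if prev != 0 and s != prev:  # series crossed the cumulative mean
                crossings += 1
                if crossings >= required:
                    return i
            prev = s
    return None

# Hypothetical output with a decaying transient followed by noise
rng = np.random.default_rng(0)
t = np.arange(5000)
x = 5.0 * np.exp(-t / 200) + rng.normal(size=t.size)
cut = transient_end_by_crossings(x, required=25)  # estimated truncation point
```

During the transient the series sits persistently on one side of its cumulative average, so crossings accumulate slowly; once the output stabilizes, crossings occur frequently and the threshold is soon met.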
These heuristics are all based, more or less, on the following general strategy: the transient state ends when oscillations (variations) in the batch means Yi remain within "controllable" limits. Another way of looking at this strategy is to consider the following two measures for each simulation output variable:
1. The cumulative mean from the start of the simulation up to times t1, t2, …, tN, where t1, t2, …, tN are equally spaced points over the simulation run whose length is T = tN.
2. The cumulative standard deviations associated with these means.
These definitions are essentially the same as those developed in Section 4.2 for the single-server queueing model. These two quantities are plotted for the points t1, t2, …, tN. We then say that a steady state has prevailed when both measures have approximately reached constant (stable) values. At that point the output of the simulation process has likely become stationary (i.e., time independent).
The proposed procedure can be implemented as follows: perform an "exploratory" run and plot both the mean and the standard deviation of the variable as a function of simulation time. From a visual examination of these plots, we estimate the length of the transient period. A fresh run is then made, and we begin collecting statistics after the estimated transient period.
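The two curves needed for the exploratory run can be computed in one pass. The sketch below assumes equally spaced output observations; the sample series is hypothetical:

```python
import numpy as np

def cumulative_mean_and_std(x):
    """Cumulative mean and cumulative standard deviation of an output
    series at each observation point -- the two curves the exploratory
    run plots against simulation time."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, len(x) + 1)
    cmean = np.cumsum(x) / k
    # cumulative (population) variance via E[X^2] - (E[X])^2,
    # clipped at zero to guard against round-off
    cvar = np.cumsum(x ** 2) / k - cmean ** 2
    return cmean, np.sqrt(np.maximum(cvar, 0.0))

# Hypothetical exploratory-run output with a decaying transient
rng = np.random.default_rng(3)
t = np.arange(2000)
out = 4.0 * np.exp(-t / 150) + rng.normal(size=t.size)
cmean, cstd = cumulative_mean_and_std(out)

# Plotting cmean and cstd against t (e.g., with matplotlib:
# plt.plot(t, cmean); plt.plot(t, cstd)) and eyeballing where both
# curves flatten out gives the estimated warm-up length.
```

Both curves drift while the transient dominates and then level off; the visual rule is to truncate at the point where both have become flat.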
The proposed "visual" procedure, though not sophisticated, is based on sound principles that are consistent with available heuristics. It has the advantage of being easy to implement. The utility of this method is further enhanced by plotting all performance variables in a single graph. The user can then determine the start of the steady state by viewing the plots of all statistical variables of the model. This is particularly important because, in general, variables of the same model may reach steady state at different points in time. In this case, the length of the transient period must be taken sufficiently long to account for the last variable to reach the steady state.
A typical illustration of the use of the proposed procedure is depicted in Figure 4.3. The figure shows the changes in system time and facility utilization, together with their standard deviations, for a single-server queue. Steady state begins when all four quantities exhibit "reasonable" stability.