Suppose now that X_1, …, X_n are independent random variables having a common distribution function F, and suppose we are interested in using them to estimate some parameter θ(F) of the distribution F. For example, θ(F) could be (as in the previous sections of this chapter) the mean of F, or it could be the median or the variance of F, or any other parameter of F. Suppose further that an estimator of θ(F)—call it g(X_1, …, X_n)—has been proposed, and in order to judge its worth as an estimator of θ(F) we are interested in estimating its mean square error. That is, we are interested in estimating the value of
MSE(F) ≡ E_F[(g(X_1, …, X_n) − θ(F))^2]
[where our choice of notation MSE(F) suppresses the dependence on the estimator g, and where we have used the notation E_F to indicate that the expectation is to be taken under the assumption that the random variables all have distribution F]. Now whereas there is an immediate estimator of the above MSE—namely, S^2/n—when θ(F) = E[X_i] and g(X_1, …, X_n) = X̄, it is not at all apparent how it can be estimated otherwise. We now present a useful technique, known as the bootstrap technique, for estimating this mean square error.
To begin, note that if the distribution function F were known then we could theoretically compute the expected square of the difference between θ and its estimator; that is, we could compute the mean square error. However, after we observe the values of the n data points, we have a pretty good idea what the underlying distribution looks like. Indeed, suppose that the observed values of the data are X_i = x_i, i = 1, …, n. We can now estimate the underlying distribution function F by the so-called empirical distribution function F_e, where F_e(x), the estimate of F(x), the probability that a datum value is less than or equal to x, is just the proportion of the n data values that are less than or equal to x. That is,
F_e(x) = (number of i: x_i ⩽ x)/n
Another way of thinking about F_e is that it is the distribution function of a random variable X_e which is equally likely to take on any of the n values x_i, i = 1, …, n.
(If the values x_i are not all distinct, then the above is to be interpreted to mean that X_e will equal the value x_i with a probability equal to the number of j such that x_j = x_i divided by n; that is, if n = 3 and x_1 = x_2 = 1, x_3 = 2, then X_e is a random variable that takes on the value 1 with probability 2/3 and 2 with probability 1/3.)
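To make the definition concrete, here is a minimal Python sketch (not from the text; the data values are arbitrary) of the empirical distribution function and of generating the random variable X_e:

```python
import random

def empirical_cdf(data, x):
    """F_e(x): the proportion of the n data values that are <= x."""
    return sum(1 for xi in data if xi <= x) / len(data)

def sample_Xe(data):
    """X_e: equally likely to be any of the n data values; repeated
    values automatically receive proportionally larger probability."""
    return random.choice(data)

data = [1, 1, 2]                  # the n = 3 example above
print(empirical_cdf(data, 1))     # 2/3
print(empirical_cdf(data, 1.5))   # 2/3
print(empirical_cdf(data, 2))     # 1.0
```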
Now if F_e is “close” to F, as it should be when n is large [indeed, the strong law of large numbers implies that with probability 1, F_e(x) converges to F(x) as n → ∞, and another result, known as the Glivenko–Cantelli theorem, states that this convergence will, with probability 1, be uniform in x], then θ(F_e) will probably be close to θ(F)—assuming that θ is, in some sense, a continuous function of the distribution—and MSE(F) should approximately be equal to
MSE(F_e) = E_{F_e}[(g(X_1, …, X_n) − θ(F_e))^2]
In the above expression the X_i are to be regarded as being independent random variables having distribution function F_e. The quantity MSE(F_e) is called the bootstrap approximation to the mean square error MSE(F).
To obtain a feel for the effectiveness of the bootstrap approximation to the mean square error, let us consider the one case where its use is not necessary—namely, when estimating the mean of a distribution by the sample mean X̄. (Its use is not necessary in this case because there already is an effective way of estimating the mean square error E[(X̄ − θ)^2] = σ^2/n—namely, by using the observed value of S^2/n.)
Example 8d Suppose we are interested in estimating θ(F) = E[X] by using the sample mean X̄ = ∑_{i=1}^n X_i/n. If the observed data are x_i, i = 1, …, n, then the empirical distribution F_e puts weight 1/n on each of the points x_1, …, x_n (combining weights if the x_i are not all distinct). Hence the mean of F_e is θ(F_e) = x̄ = ∑_{i=1}^n x_i/n, and thus the bootstrap estimate of the mean square error—call it MSE(F_e)—is given by

MSE(F_e) = E_{F_e}[(∑_{i=1}^n X_i/n − x̄)^2]
where X_1, …, X_n are independent random variables each distributed according to F_e. Since
E_{F_e}[∑_{i=1}^n X_i/n] = E_{F_e}[X] = x̄

it follows that
MSE(F_e) = Var_{F_e}(∑_{i=1}^n X_i/n) = Var_{F_e}(X)/n

Now
Var_{F_e}(X) = E_{F_e}[(X − E_{F_e}[X])^2]
= E_{F_e}[(X − x̄)^2]
= (1/n) ∑_{i=1}^n (x_i − x̄)^2
and so
MSE(F_e) = ∑_{i=1}^n (x_i − x̄)^2 / n^2
which compares quite nicely with S^2/n, the usual estimate of the mean square error. Indeed, because the observed value of S^2/n is ∑_{i=1}^n (x_i − x̄)^2/[n(n − 1)], the bootstrap approximation is almost identical.
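As a quick numerical check of this comparison (a sketch only; the data values below are made up), one can compute both quantities and note that they differ only by the factor (n − 1)/n:

```python
# Compare the bootstrap estimate sum (x_i - xbar)^2 / n^2 with the
# usual estimate S^2/n = sum (x_i - xbar)^2 / [n(n - 1)].
data = [2.1, 3.4, 1.7, 4.0, 2.8, 3.1]
n = len(data)
xbar = sum(data) / n
ss = sum((x - xbar) ** 2 for x in data)

bootstrap_mse = ss / n ** 2            # MSE(F_e)
usual_mse = ss / (n * (n - 1))         # observed value of S^2/n

print(bootstrap_mse, usual_mse)        # they differ by the factor (n - 1)/n
```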
If the data values are X_i = x_i, i = 1, …, n, then, as the empirical distribution function F_e puts weight 1/n on each of the points x_i, it is usually easy to compute the value of θ(F_e): for example, if the parameter of interest θ(F) was the variance of the distribution F, then θ(F_e) = Var_{F_e}(X) = ∑_{i=1}^n (x_i − x̄)^2/n. To determine the bootstrap approximation to the mean square error we then have to compute
MSE(F_e) = E_{F_e}[(g(X_1, …, X_n) − θ(F_e))^2]
However, since the above expectation is to be computed under the assumption that X_1, …, X_n are independent random variables distributed according to F_e, it follows that the vector (X_1, …, X_n) is equally likely to take on any of the n^n possible values (x_{i_1}, x_{i_2}, …, x_{i_n}), i_j ∈ {1, 2, …, n}, j = 1, …, n. Therefore,
MSE(F_e) = ∑_{i_n} ⋯ ∑_{i_1} [g(x_{i_1}, …, x_{i_n}) − θ(F_e)]^2 / n^n
where each i_j goes from 1 to n, and so the computation of MSE(F_e) requires, in general, summing n^n terms—an impossible task when n is large.
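For very small n the n^n-term sum can in fact be evaluated directly; the sketch below (with a hypothetical three-point data set, and g taken to be the sample mean) does so, mainly to make the formula concrete:

```python
from itertools import product

def exact_bootstrap_mse(data, g, theta_Fe):
    """Exact MSE(F_e): average of [g(x_{i_1},...,x_{i_n}) - theta(F_e)]^2
    over all n**n equally likely index vectors.  Feasible only for tiny n."""
    n = len(data)
    total = 0.0
    for idx in product(range(n), repeat=n):          # n**n terms
        resample = [data[i] for i in idx]
        total += (g(resample) - theta_Fe) ** 2
    return total / n ** n

mean = lambda xs: sum(xs) / len(xs)
data = [1.0, 1.0, 2.0]                               # n = 3: only 27 terms
print(exact_bootstrap_mse(data, mean, mean(data)))   # equals 2/27 here
```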
However, as we know, there is an effective way to approximate the average of a large number of terms, namely, by using simulation. Indeed, we could generate a set of n independent random variables X_1^1, …, X_n^1 each having distribution function F_e and then set

Y_1 = [g(X_1^1, …, X_n^1) − θ(F_e)]^2
Next, we generate a second set X_1^2, …, X_n^2 and compute

Y_2 = [g(X_1^2, …, X_n^2) − θ(F_e)]^2
and so on, until we have collected the variables Y_1, Y_2, …, Y_r. Because these Y_i are independent random variables having mean MSE(F_e), it follows that we can use their average ∑_{i=1}^r Y_i/r as an estimate of MSE(F_e).
Remarks
1. It is quite easy to generate a random variable X having distribution F_e. Because such a random variable should be equally likely to be x_1, …, x_n, just generate a random number U and set X = x_I, where I = Int(nU) + 1. (It is easy to check that this will still work even when the x_i are not all distinct.)
2. The above simulation allows us to approximate MSE(F_e), which is itself an approximation to the desired MSE(F). As such, it has been reported that roughly 100 simulation runs—that is, choosing r = 100—is usually sufficient.
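Putting the two remarks together, a minimal sketch of the whole procedure for an arbitrary estimator g might look as follows (the data set and the choice of g, here the sample median, are hypothetical):

```python
import random

def bootstrap_mse(data, g, theta_Fe, r=100):
    """Simulation approximation of MSE(F_e) = E_Fe[(g(X_1,...,X_n) - theta(F_e))^2]."""
    n = len(data)
    total = 0.0
    for _ in range(r):
        # Remark 1: each X equals x_I with I = Int(nU) + 1 (0-based indexing here)
        resample = [data[int(n * random.random())] for _ in range(n)]
        total += (g(resample) - theta_Fe) ** 2
    return total / r

def median(xs):
    s = sorted(xs)
    m = len(s) // 2
    return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

data = [3.1, 4.7, 2.2, 5.0, 3.8, 4.1, 2.9, 3.3]    # made-up data values
# for the median, theta(F_e) is just the sample median of the observed data
print(bootstrap_mse(data, median, median(data)))    # r = 100 runs (Remark 2)
```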
The following example illustrates the use of the bootstrap in analyzing the output of a queueing simulation.
Example 8e Suppose in Example 8a that we are interested in estimating the long-run average amount of time a customer spends in the system. That is, letting W_i be the amount of time the ith entering customer spends in the system, i ⩾ 1, we are interested in
θ ≡ lim_{n→∞} (W_1 + W_2 + ⋯ + W_n)/n
To show that the above limit does indeed exist (note that the random variables W_i are neither independent nor identically distributed), let N_i denote the number of customers that arrive on day i, and let
D_1 = W_1 + ⋯ + W_{N_1}
D_2 = W_{N_1+1} + ⋯ + W_{N_1+N_2}

and, in general, for i > 2,

D_i = W_{N_1+⋯+N_{i−1}+1} + ⋯ + W_{N_1+⋯+N_i}
In words, D_i is the sum of the times in the system of all arrivals on day i. We can now express θ as
θ = lim_{m→∞} (D_1 + D_2 + ⋯ + D_m)/(N_1 + N_2 + ⋯ + N_m)
where the above follows because the ratio is just the average time in the system of all customers arriving in the first m days. Upon dividing numerator and denominator by m, we obtain
θ = lim_{m→∞} [(D_1 + ⋯ + D_m)/m] / [(N_1 + ⋯ + N_m)/m]
Now as each day follows the same probability law, it follows that the random variables D_1, …, D_m are all independent and identically distributed, as are the random variables N_1, …, N_m. Hence, by the strong law of large numbers, it follows that the average of the first m of the D_i will, with probability 1, converge to their common expectation, with a similar statement being true for the N_i. Therefore, we see that
θ = E[D]/E[N]
where E[N] is the expected number of customers to arrive in a day, and E[D] is the expected sum of the times those customers spend in the system.
To estimate θ we can thus simulate the system over k days, collecting on the ith run the data N_i, D_i, where N_i is the number of customers arriving on day i and D_i is the sum of the times they spend in the system, i = 1, …, k. Because the quantity E[D] can then be estimated by
D̄ = (D_1 + D_2 + ⋯ + D_k)/k

and E[N] by

N̄ = (N_1 + N_2 + ⋯ + N_k)/k

it follows that θ = E[D]/E[N] can be estimated by

Estimate of θ = D̄/N̄ = (D_1 + ⋯ + D_k)/(N_1 + ⋯ + N_k)
which, it should be noted, is just the average time in the system of all arrivals during the first k days.
To estimate

MSE = E[((∑_{i=1}^k D_i)/(∑_{i=1}^k N_i) − θ)^2]
we employ the bootstrap approach. Suppose the observed value of D_i, N_i is d_i, n_i, i = 1, …, k. That is, suppose that the simulation resulted in n_i arrivals on day i spending a total time d_i in the system. Thus, the empirical joint distribution function of the random vector D, N puts equal weight on the k pairs d_i, n_i, i = 1, …, k. That is, under the empirical distribution function we have
P_{F_e}{D = d_i, N = n_i} = 1/k,  i = 1, …, k

Hence,
E_{F_e}[D] = d̄ = ∑_{i=1}^k d_i/k,  E_{F_e}[N] = n̄ = ∑_{i=1}^k n_i/k

and thus,
θ(F_e) = d̄/n̄

Hence,
MSE(F_e) = E_{F_e}[((∑_{i=1}^k D_i)/(∑_{i=1}^k N_i) − d̄/n̄)^2]
where the above is to be computed under the assumption that the k pairs of random vectors D_i, N_i are independently distributed according to F_e.
Since an exact computation of MSE(F_e) would require computing the sum of k^k terms, we now perform a simulation experiment to approximate it. We generate k independent pairs of random vectors D_i^1, N_i^1, i = 1, …, k, according to the empirical distribution function F_e, and then compute
Y_1 = ((∑_{i=1}^k D_i^1)/(∑_{i=1}^k N_i^1) − d̄/n̄)^2
We then generate a second set D_i^2, N_i^2 and compute the corresponding Y_2. This continues until we have generated the r values Y_1, …, Y_r (where r = 100 should suffice). The average of these r values, ∑_{i=1}^r Y_i/r, is then used to estimate MSE(F_e), which is itself our estimate of MSE, the mean square error of our estimate of the average amount of time a customer spends in the system.
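Assuming the k daily pairs (d_i, n_i) have already been produced by the simulation of Example 8a, the bootstrap step just described might be coded roughly as follows (a sketch; the pairs shown are made up for illustration):

```python
import random

def bootstrap_mse_ratio(pairs, r=100):
    """Bootstrap approximation to the MSE of the estimator sum(D_i)/sum(N_i),
    where pairs = [(d_1, n_1), ..., (d_k, n_k)] are the observed daily values."""
    k = len(pairs)
    theta_Fe = sum(d for d, _ in pairs) / sum(n for _, n in pairs)  # = d-bar / n-bar
    total = 0.0
    for _ in range(r):
        # k pairs drawn independently from the empirical joint distribution F_e
        resample = [random.choice(pairs) for _ in range(k)]
        est = sum(d for d, _ in resample) / sum(n for _, n in resample)
        total += (est - theta_Fe) ** 2
    return total / r

# Hypothetical output of a k = 5 day simulation: (total time in system, arrivals)
pairs = [(62.0, 20), (48.5, 17), (75.2, 24), (55.1, 19), (66.3, 22)]
print(bootstrap_mse_ratio(pairs))
```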
Remark The Regenerative Approach The foregoing analysis assumed that each day independently followed the same probability law. In certain applications, the same probability law describes the system not over days of fixed lengths but rather over cycles whose lengths are random. For example, consider a queueing system in which customers arrive in accordance with a Poisson process, and suppose that the first customer arrives at time 0. If the random time T represents the next time that an arrival finds the system empty, then we say that the time from 0 to T constitutes the first cycle. The second cycle would be the time from T until the first time point after T that an arrival finds the system empty, and so on.
It is easy to see, in most models, that the movements of the process over each cycle are independent and identically distributed. Hence, if we regard a cycle as being a “day,” then all of the preceding analysis remains valid. For example, θ, the long-run average amount of time that a customer spends in the system, is given by θ = E[D]/E[N], where D is the sum of the times in the system of all arrivals in a cycle and N is the number of such arrivals. If we now generate k cycles, our estimate of θ is still (∑_{i=1}^k D_i)/(∑_{i=1}^k N_i). In addition, the mean square error of this estimate can be approximated by using the bootstrap approach exactly as above.
The technique of analyzing a system by simulating “cycles,” that is, random intervals during which the process follows the same probability law, is called the regenerative approach.
Exercises
1. For any set of numbers x_1, …, x_n, prove algebraically that

∑_{i=1}^n (x_i − x̄)^2 = ∑_{i=1}^n x_i^2 − n x̄^2

where x̄ = ∑_{i=1}^n x_i/n.
2. Give a probabilistic proof of the result of Exercise 1, by letting X denote a random variable that is equally likely to take on any of the values x_1, …, x_n, and then by applying the identity Var(X) = E[X^2] − (E[X])^2.
3. Write a program that uses the recursions given by Equations (8.6) and (8.7) to calculate the sample mean and sample variance of a data set.
4. Continue to generate standard normal random variables until you have generated n of them, where n ⩾ 100 is such that S/√n < 0.1, where S is the sample standard deviation of the n data values.
(a) How many normals do you think will be generated?
(b) How many normals did you generate?
(c) What is the sample mean of all the normals generated?
(d) What is the sample variance?
(e) Comment on the results of (c) and (d). Were they surprising?
5. Repeat Exercise 4 with the exception that you now continue generating standard normals until S/√n < 0.01.
6. Estimate ∫_0^1 exp(x^2) dx by generating random numbers. Generate at least 100 values and stop when the standard deviation of your estimator is less than 0.01.
7. To estimate E[X], X_1, …, X_16 have been simulated with the following values resulting: 10, 11, 10.5, 11.5, 14, 8, 13, 6, 15, 10, 11.5, 10.5, 12, 8, 16, 5. Based on these data, if we want the standard deviation of the estimator of E[X] to be less than 0.1, roughly how many additional simulation runs will be needed?
Exercises 8 and 9 are concerned with estimating e.
8. It can be shown that if we add random numbers until their sum exceeds 1, then the expected number added is equal to e. That is, if

N = min{n : ∑_{i=1}^n U_i > 1}

then E[N] = e.
(a) Use the preceding to estimate e, using 1000 simulation runs.
(b) Estimate the variance of the estimator in (a) and give a 95 percent confidence interval estimate ofe.
9. Consider a sequence of random numbers and let M denote the first one that is less than its predecessor. That is,

M = min{n : U_1 ⩽ U_2 ⩽ ⋯ ⩽ U_{n−1} > U_n}

(a) Argue that P{M > n} = 1/n!, n ⩾ 0.
(b) Use the identity E[M] = ∑_{n=0}^∞ P{M > n} to show that E[M] = e.
(c) Use part (b) to estimatee, using 1000 simulation runs.
(d) Estimate the variance of the estimator in (c) and give a 95 percent confidence interval estimate ofe.
10. Use the approach that is presented in Example 3a of Chapter 3 to obtain an interval of size less than 0.1, which we can assert, with 95 percent confidence, contains π. How many runs were necessary?
11. Repeat Exercise 10 when we want the interval to be no greater than 0.01.
12. To estimate θ, we generated 20 independent values having mean θ. If the successive values obtained were

102, 112, 131, 107, 114, 95, 133, 145, 139, 117
93, 111, 124, 122, 136, 141, 119, 122, 151, 143

how many additional random variables do you think we will have to generate if we want to be 99 percent certain that our final estimate of θ is correct to within ±0.5?
13. Let X_1, …, X_n be independent and identically distributed random variables having unknown mean μ. For given constants a < b, we are interested in estimating p = P{a < ∑_{i=1}^n X_i/n − μ < b}.
(a) Explain how we can use the bootstrap approach to estimate p.
(b) Estimate p if n = 10 and the values of the X_i are 56, 101, 78, 67, 93, 87, 64, 72, 80, and 69. Take a = −5, b = 5.
In the following three exercises X_1, …, X_n is a sample from a distribution whose variance is (the unknown) σ^2. We are planning to estimate σ^2 by the sample variance S^2 = ∑_{i=1}^n (X_i − X̄)^2/(n − 1), and we want to use the bootstrap technique to estimate Var(S^2).
14. If n = 2 and X_1 = 1 and X_2 = 3, what is the bootstrap estimate of Var(S^2)?

15. If n = 15 and the data are

5, 4, 9, 6, 21, 17, 11, 20, 7, 10, 21, 15, 13, 16, 8

approximate (by a simulation) the bootstrap estimate of Var(S^2).
16. Consider a single-server system in which potential customers arrive in accordance with a Poisson process having rate 4.0. A potential customer will only enter if there are three or fewer other customers in the system when he or she arrives. The service time of a customer is exponential with rate 4.2.
No additional customers are allowed in after time T = 8. (All time units are per hour.) Develop a simulation study to estimate the average amount of time that an entering customer spends in the system. Using the bootstrap approach, estimate the mean square error of your estimator.
Bibliography
Bratley, P., B. L. Fox, and L. E. Schrage, A Guide to Simulation, 2nd ed. Springer-Verlag, New York, 1988.
Crane, M. A., and A. J. Lemoine, An Introduction to the Regenerative Method for Simulation Analysis. Springer-Verlag, New York, 1977.
Efron, B., and R. Tibshirani, An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.
Kleijnen, J. P. C., Statistical Techniques in Simulation, Parts 1 and 2. Marcel Dekker, New York, 1974/1975.
Law, A. M., and W. D. Kelton, Simulation Modeling and Analysis, 3rd ed. McGraw-Hill, New York, 1997.
9 Variance Reduction Techniques

Introduction
In a typical scenario for a simulation study, one is interested in determining θ, a parameter connected with some stochastic model. To estimate θ, the model is simulated to obtain, among other things, the output datum X which is such that θ = E[X]. Repeated simulation runs, the ith one yielding the output variable X_i, are performed. The simulation study is then terminated when n runs have been performed and the estimate of θ is given by X̄ = ∑_{i=1}^n X_i/n. Because this results in an unbiased estimate of θ, it follows that its mean square error is equal to its variance. That is,
MSE = E[(X̄ − θ)^2] = Var(X̄) = Var(X)/n
Hence, if we can obtain a different unbiased estimate of θ having a smaller variance than does X̄, we would obtain an improved estimator.
In this chapter we present a variety of different methods that one can attempt to use so as to reduce the variance of the (so-called raw) simulation estimate X̄.
However, before presenting these variance reduction techniques, let us illustrate the potential pitfalls, even in quite simple models, of using the raw simulation estimator.
Example 9a Quality Control Consider a process that produces items sequentially. Suppose that these items have measurable values attached to them and that when the process is “in control” these values (suitably normalized) come from a standard normal distribution. Suppose further that when the process goes
“out of control” the distribution of these values changes from the standard normal to some other distribution.
To help detect when the process goes out of control the following type of procedure, called an exponentially weighted moving-average control rule, is often used. Let X_1, X_2, … denote the sequence of data values. For a fixed value α, 0 ⩽ α ⩽ 1, define the sequence S_n, n ⩾ 0, by

S_0 = 0
S_n = αS_{n−1} + (1 − α)X_n,  n ⩾ 1
Now when the process is in control, all the X_n have mean 0, and thus it is easy to verify that, under this condition, the exponentially weighted moving-average values S_n also have mean 0. The moving-average control rule is to fix a constant B, along with the value of α, and then to declare the process “out of control” when |S_n| exceeds B. That is, the process is declared out of control at the random time N, where

N = min{n : |S_n| > B}
Now it is clear that eventually |S_n| will exceed B and so the process will be declared out of control even if it is still working properly—that is, even when the data values are being generated by a standard normal distribution. To make sure that this does not occur too frequently, it is prudent to choose α and B so that, when the X_n, n ⩾ 1, are indeed coming from a standard normal distribution, E[N] is large. Suppose that it has been decided that, under these conditions, a value for E[N] of 800 is acceptable. Suppose further that it is claimed that the values α = 0.9 and B = 0.8 achieve a value of E[N] of around 800. How can we check this claim?
One way of verifying the above claim is by simulation. Namely, we can generate standard normals X_n, n ⩾ 1, until |S_n| exceeds 0.8 (where α = 0.9 in the defining equation for S_n). If N_1 denotes the number of normals needed until this occurs, then, for our first simulation run, we have the output variable N_1. We then generate other runs, and our estimate of E[N] is the average value of the output data obtained over all runs.
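A sketch of this simulation (assuming nothing beyond standard normal generation; the number of runs used here is deliberately modest and purely illustrative) might read:

```python
import random

def run_length(alpha=0.9, B=0.8):
    """One run: generate standard normals X_1, X_2, ... and return
    N = min{n : |S_n| > B}, where S_n = alpha*S_{n-1} + (1 - alpha)*X_n, S_0 = 0."""
    s, n = 0.0, 0
    while abs(s) <= B:
        n += 1
        s = alpha * s + (1 - alpha) * random.gauss(0.0, 1.0)
    return n

runs = 1000   # illustrative only; the text discusses how many runs precision demands
avg = sum(run_length() for _ in range(runs)) / runs
print(avg)
```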
However, let us suppose that we want to be 99 percent confident that our estimate of E[N], under the in-control assumption, is accurate to within ±0.1. Hence, since 99 percent of the time a normal random variable is within ±2.58 standard deviations of its mean (i.e., z_{.005} = 2.58), it follows that the number of runs needed—call it n—is such that

2.58σ_n/√n ≈ 0.1
where σ_n is the sample standard deviation based on the first n data values. Now σ_n will approximately equal σ(N), the standard deviation of N, and we now argue that this is approximately equal to E[N]. The argument runs as follows: Since we are assuming that the process remains in control throughout, most of the time the value of the exponentially weighted moving average is near the origin. Occasionally, by chance, it gets large and approaches, in absolute value, B. At such times it may go beyond B and the run ends, or there may be a string of normal data values which, after a short time, wipe out the effect of the moving average having been large (this is so because the old values of S_i are continually multiplied by 0.9 and so lose their effect). Hence, if we know that the process has not yet gone out of control by some fixed time k, then, no matter what the value of k, it would seem that the value of S_k is around the origin. In other words, it intuitively appears that the distribution of time until the moving average exceeds the control limits is approximately memoryless; that is, it is approximately an exponential random variable. But for an exponential random variable Y, Var(Y) = (E[Y])^2. Since the standard deviation is the square root of the variance, it thus seems intuitive that, when in control throughout, σ(N) ≈ E[N]. Hence, if the original claim that E[N] ≈ 800 is correct, the number of runs needed is such that
√n ≈ 25.8 × 800

or

n ≈ (25.8 × 800)^2 ≈ 4.26 × 10^8
In addition, because each run requires approximately 800 normal random variables (again assuming the claim is roughly correct), we see that to do this simulation would require approximately 800 × 4.26 × 10^8 ≈ 3.41 × 10^11 normal random variables—a formidable task.