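$$\frac{\sigma^2}{N} = \frac{1}{N}\left[\sum_{i=1}^{M} p_i(\mu_i-\mu)^2 + \sum_{i=1}^{M} p_i(\sigma_i-\bar\sigma)^2 + \Bigl(\sum_{i=1}^{M} p_i\sigma_i\Bigr)^2\right] \qquad (5.24)$$
where $\mu_i$ and $\sigma_i^2$ are the mean and variance of $Y$ within stratum $i$, and $\bar\sigma = \sum_{i=1}^{M} p_i\sigma_i$.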
The three terms on the right-hand side of Equation (5.24) are, respectively, the variance removed by using the proportional $n_i$ rather than the naive estimator, the variance removed by using the optimal $n_i$ rather than the proportional $n_i$, and the residual variance.
Now imagine that very fine stratification is employed (i.e. $M \to \infty$). Then the outcome $X \in S_i$ is replaced by the actual value of $X$, and so from Equation (5.21)
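$$\operatorname{Var}(\hat\theta) = \frac{1}{N}\Bigl\{\operatorname{Var}_X\bigl[E(Y\mid X)\bigr] + E_X\bigl[\sigma^2_{Y\mid X}\bigr]\Bigr\} \qquad (5.25)$$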
Figure 5.2 An example where X is a good stratification but poor control variable (scatter plot of Y against X)
The first term on the right-hand side of Equation (5.25) is the amount of variance removed from the naive estimator by proportional sampling. The second term is the residual variance after doing so. If proportional sampling is used (it is often more convenient than optimum sampling, which requires estimation of the stratum variances $\sigma_i^2$ through some pilot runs), then we choose a stratification variable that tends to minimize the residual variance or, equivalently, one that tends to maximize $\operatorname{Var}_X[E(Y\mid X)]$.
Equation (5.25) shows that with a fine enough proportional stratification, all the variation in $Y$ that is due to the variation in $E(Y\mid X)$ can be removed, leaving only the residual variation $E_X[\sigma^2_{Y\mid X}]$. This is shown in Figure 5.2, where a scatter plot of 500 realizations of $(X, Y)$ demonstrates that most of the variability in $Y$ will be removed through fine stratification. It is important to note that it is not just variation in the linear part of $E(Y\mid X)$ that is removed, but all of it.
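Suppose we wish to estimate
$$\theta = E\bigl[(W_1+W_2)^{5/4}\bigr],$$
where $W_1$ and $W_2$ are independently distributed Weibull variates with density
$$f(x) = \tfrac{3}{2}\,x^{1/2}\exp\bigl(-x^{3/2}\bigr)$$
on support $(0,\infty)$. Given two uniform random numbers $R_1$ and $R_2$,
$$W_1 = (-\ln R_1)^{2/3}, \qquad W_2 = (-\ln R_2)^{2/3},$$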
and so a naive Monte Carlo estimate of $\theta$ is
$$Y = \bigl[(-\ln R_1)^{2/3} + (-\ln R_2)^{2/3}\bigr]^{5/4}.$$
A naive simulation procedure, 'weibullnostrat' in Appendix 5.3.1, was called to generate 20 000 independent realizations of $Y$ (seed = 639 156), with the result $\hat\theta = 2.15843$, reported together with its estimated standard error (e.s.e.).
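The naive procedure can be sketched in a few lines. The following is a minimal Python illustration, not the Maple procedure 'weibullnostrat' itself; the function name `naive_weibull_estimate` and the seed are assumptions made for this sketch, so the numbers it produces will differ slightly from those quoted above.

```python
import math
import random


def naive_weibull_estimate(n, seed=12345):
    """Naive Monte Carlo estimate of E[(W1 + W2)^(5/4)] and its e.s.e.

    W = (-ln R)^(2/3) is a Weibull variate obtained by inversion.
    The name and the default seed are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        r1 = 1.0 - rng.random()
        r2 = 1.0 - rng.random()
        y = ((-math.log(r1)) ** (2.0 / 3.0)
             + (-math.log(r2)) ** (2.0 / 3.0)) ** 1.25
        total += y
        total_sq += y * y
    mean = total / n
    sample_var = (total_sq - n * mean * mean) / (n - 1)
    # Estimated standard error of the sample mean.
    return mean, math.sqrt(sample_var / n)


theta_hat, ese = naive_weibull_estimate(20_000)
print(theta_hat, ese)
```

The estimated standard error returned here is the usual $s/\sqrt{N}$, which is the benchmark against which the stratified estimators are compared.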
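For a stratified Monte Carlo, note that $Y$ is monotonic in both $R_1$ and $R_2$, so a reasonable choice for a stratification variable is
$$X = R_1R_2.$$
This is confirmed by the scatter plot (Figure 5.2) of 500 random pairs of $(X, Y)$. The joint density of $X$ and $R_2$ is
$$f_{X,R_2}(x, r_2) = f_{R_1,R_2}\!\left(\frac{x}{r_2},\, r_2\right)\left|\frac{\partial r_1}{\partial x}\right| = \frac{1}{r_2}$$
on $0 < x < r_2 < 1$. Integrating out $r_2$ gives the marginal density $f_X(x) = -\ln x$, and hence the cumulative distribution function $F_X(x) = x - x\ln x$ on $(0,1)$, while the conditional distribution function of $R_2$ given $X = x$ is $1 - \ln r_2/\ln x$ on $(x, 1)$.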
$N$ realizations of $(X, Y)$ will be generated with $N$ strata, where $p_i = 1/N$ for $i = 1, \dots, N$. With this design and under proportional stratified sampling there is exactly one pair $(X, Y)$ for which $X \in S_i$ for each $i$. Let $U_i, V_i$ be independently distributed as $U(0,1)$. Using Equation (5.27) we generate $X_i$ from the $i$th stratum through
$$X_i - X_i\ln X_i = \frac{i-1+U_i}{N}. \qquad (5.29)$$
Inverting the conditional distribution function of $R_2$ given $X_i$ gives
$$R_2^{(i)} = X_i^{V_i}.$$
Therefore
$$R_1^{(i)} = \frac{X_i}{R_2^{(i)}}.$$
Note that Equation (5.29) will need to be solved numerically, but this can be made more efficient by observing that $X_i \in (X_{i-1}, 1)$. The $i$th response is
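$$Y_i = \Bigl[\bigl(-\ln R_1^{(i)}\bigr)^{2/3} + \bigl(-\ln R_2^{(i)}\bigr)^{2/3}\Bigr]^{5/4}$$
and the estimate is
$$\hat\theta_{PS} = \sum_{i=1}^{N} p_iY_i = \frac{1}{N}\sum_{i=1}^{N} Y_i.$$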
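The stratified scheme can be sketched as follows. This is a Python illustration under stated assumptions, not the Maple procedure 'weibullstrat': the distribution function $F_X(x) = x - x\ln x$ is inverted by plain bisection (a choice made here for simplicity), the search for $X_i$ starts from $X_{i-1}$ as suggested above, and all function names and the seed are hypothetical.

```python
import math
import random
import statistics


def cdf_x(x):
    """CDF of X = R1*R2 on (0, 1): F(x) = x - x*ln(x)."""
    return x - x * math.log(x)


def solve_cdf(target, lo, hi=1.0, iters=50):
    """Bisection solution of F(x) = target on (lo, hi); F is increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf_x(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def response(r1, r2):
    """Y = [(-ln r1)^(2/3) + (-ln r2)^(2/3)]^(5/4)."""
    return ((-math.log(r1)) ** (2.0 / 3.0)
            + (-math.log(r2)) ** (2.0 / 3.0)) ** 1.25


def stratified_estimate(n, rng):
    """One proportional stratified estimate: one observation per stratum."""
    total = 0.0
    x_prev = 0.0
    for i in range(1, n + 1):
        u, v = rng.random(), rng.random()
        x = solve_cdf((i - 1 + u) / n, lo=x_prev)  # X_i lies above X_{i-1}
        x_prev = x
        r2 = x ** v                 # R2 given X = x, by inversion
        r1 = min(1.0, x / r2)       # guard against rounding above 1
        total += response(r1, r2)
    return total / n


def naive_estimate(n, rng):
    """Mean of n naive responses, for comparison."""
    return sum(response(1.0 - rng.random(), 1.0 - rng.random())
               for _ in range(n)) / n


rng = random.Random(2024)
K, N = 200, 100
strat = [stratified_estimate(N, rng) for _ in range(K)]
naive = [naive_estimate(N, rng) for _ in range(K)]
theta_ps = statistics.fmean(strat)
vrr = statistics.variance(naive) / statistics.variance(strat)
print(theta_ps, vrr)
```

The variance reduction ratio is estimated here from $K$ independent replications of each method, so it is itself subject to sampling error.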
Using procedure 'weibullstrat' in Appendix 5.3.2 with $N = 100$ and $K = 200$ (and with the same seed as in the naive simulation), the results were $\hat\theta_{PS} = 2.16644$ and an estimated variance reduction ratio of approximately 47.71 (result (5.30)).
The efficiency must take account of both the variance reduction ratio and the relative computer processing times. In this case stratified sampling took 110 seconds and naive sampling 21 seconds, so
$$\text{Efficiency} = \frac{21 \times 47.71}{110} \approx 9.$$
Three points from this example are worthy of comment:
(i) The efficiency would be higher were it not for the time-consuming numerical solution of Equation (5.29). Problem 5 addresses this.
(ii) A more obvious design is to employ two stratification variables, $R_1$ and $R_2$. Accordingly, the procedure 'grid' in Appendix 5.3.3 uses 100 equiprobable strata on a $10 \times 10$ grid on $[0,1]^2$, with exactly one observation in each stratum. Using $N = 200$ replications (total sample size = 20 000 as before) and the same random number stream as before, this gave an efficiency of approximately 13. Compared with the improved stratification method suggested in point (i), this would not be competitive. Moreover, this approach is very limited, as the number of strata increases exponentially with the dimension of an integral.
(iii) In the example it was fortuitous that it was easy to sample both from the distribution of the stratification variable $X$ and from the conditional distribution of $Y$ given $X$. In fact, this is rarely the case. However, the following method of post stratification avoids these problems.
5.3.2 Post stratification
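This refers to a design in which the number of observations in each stratum is counted after naive sampling has been performed. In this case $n_i$ will be replaced by $N_i$ to emphasize that the $N_i$ are now random variables (with expectation $Np_i$). A naive estimator is
$$\hat\theta = \sum_{i=1}^{M} \frac{N_i}{N}\,\bar Y_i,$$
where $\bar Y_i$ is the mean response in stratum $i$. The post stratified estimator $\hat\theta_{AS} = \sum_{i=1}^{M} p_i\bar Y_i$ instead weights the stratum means by the known $p_i$. Provided the expected stratum counts $Np_i$ are not too small, the variance of $\hat\theta_{AS}$ differs little from that of $\hat\theta_{PS}$ obtained through proportional stratification with fixed $n_i = Np_i$. Of course, the advantage of post stratification is that there is no need to sample from the conditional distribution of $Y$ given $X$, nor indeed from the marginal distribution of $X$. Implementing post stratification requires only that cumulative probabilities for $X$ can be calculated. Given there are $M$ equiprobable strata, this is needed to calculate $j = \lfloor MF_X(x) \rfloor + 1$, which is the stratum number in which a pair $(x, y)$ falls.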
This will now be illustrated by estimating
$$\theta = E\bigl[(W_1+W_2+W_3+W_4)^{3/2}\bigr],$$
where $W_1, \dots, W_4$ are independent Weibull random variables with cumulative distribution functions $1-\exp(-x^2)$, $1-\exp(-x^3)$, $1-\exp(-x^4)$, and $1-\exp(-x^5)$ respectively on support $(0,\infty)$. Bearing in mind that a stratification variable is a function of other random variables, that it should have a high degree of dependence upon the response $Y = (W_1+W_2+W_3+W_4)^{3/2}$, and that it should have easily computed cumulative probabilities, it will be made a linear combination of standard normal random variables. Accordingly, define $z_i$ by
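$$F_{W_i}(w_i) = \Phi(z_i)$$
for $i = 1, \dots, 4$, where $\Phi$ is the cumulative normal distribution function. Then
$$\begin{aligned}
\theta &= \int_{(0,\infty)^4}\Bigl(\sum_{i=1}^{4} w_i\Bigr)^{3/2}\prod_{i=1}^{4} f_{W_i}(w_i)\,dw_i \\
&= \int_{(-\infty,\infty)^4}\Bigl[\sum_{i=1}^{4} F_{W_i}^{-1}\bigl(\Phi(z_i)\bigr)\Bigr]^{3/2}\prod_{i=1}^{4}\phi(z_i)\,dz_i \\
&= E_{Z\sim N(0,I)}\Bigl\{\Bigl[\sum_{i=1}^{4} F_{W_i}^{-1}\bigl(\Phi(Z_i)\bigr)\Bigr]^{3/2}\Bigr\},
\end{aligned}$$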
where $\phi$ is the standard normal density. Note that an unbiased estimator is
$$\Bigl[\sum_{i=1}^{4} F_{W_i}^{-1}\bigl(\Phi(Z_i)\bigr)\Bigr]^{3/2},$$
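where the $Z_i$ are independently $N(0,1)$, that is, the vector $Z \sim N(0, I)$, where the covariance matrix is the identity matrix $I$. Now
$$\sum_{i=1}^{4} F_{W_i}^{-1}\bigl(\Phi(Z_i)\bigr) = \sum_{i=1}^{4}\bigl[-\ln\bigl(1-\Phi(Z_i)\bigr)\bigr]^{1/\alpha_i} \qquad (5.31)$$
where $\alpha_1 = 2$, $\alpha_2 = 3$, $\alpha_3 = 4$, $\alpha_4 = 5$. Using Maple, a linear approximation to Equation (5.31) is found by expanding as a Taylor series about $z = 0$. It is a linear combination $X' = a_0 + \sum_{i=1}^{4} a_iZ_i$ of the $Z_i$, which is standardized to give the stratification variable
$$X = \frac{\sum_{i=1}^{4} a_iZ_i}{\sqrt{\sum_{i=1}^{4} a_i^2}} \sim N(0,1).$$
Since $X'$ is monotonic in $X$, the same variance reduction will be achieved with $X$ as with $X'$. The algorithm simulates $K$ independent realizations, each comprising $N$ samples of $(X, Y)$.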
Figure 5.3 An example where X is both a good stratification and control variable (scatter plot of Y against X)
Using $K = 50$, $N = 400$, $M = 20$ (seed = 566309), it is found that the estimated variance reduction ratio is approximately 24 (result (5.33)).
A scatter plot of 500 random pairs of $(X, Y)$, shown in Figure 5.3, illustrates the small variation about the regression curve $E(Y\mid X)$. This explains the effectiveness of the method.
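A post stratified estimator for this four-Weibull example can be sketched as below. This is an illustrative Python version, not the book's Maple code. The Taylor coefficients are an assumption of this sketch: they are taken as the derivative of $[-\ln(1-\Phi(z))]^{1/\alpha_i}$ at $z = 0$, namely $a_i = \sqrt{2/\pi}\,(\ln 2)^{1/\alpha_i - 1}/\alpha_i$. The function names and the seed are hypothetical.

```python
import math
import random
import statistics

ALPHAS = (2.0, 3.0, 4.0, 5.0)
# Assumed first-order Taylor coefficients about z = 0 (derived for this sketch).
SLOPES = [math.sqrt(2.0 / math.pi) * math.log(2.0) ** (1.0 / a - 1.0) / a
          for a in ALPHAS]
NORM = math.sqrt(sum(c * c for c in SLOPES))


def std_normal_cdf(z):
    """Phi(z) via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def weibull_from_normal(z, alpha):
    """F_W^{-1}(Phi(z)) for F_W(w) = 1 - exp(-w**alpha)."""
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)
    return (-math.log(upper_tail)) ** (1.0 / alpha)


def one_replication(n, m, rng):
    """Return (naive mean, post stratified estimate) from n samples, m strata."""
    sums = [0.0] * m
    counts = [0] * m
    total = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in ALPHAS]
        y = sum(weibull_from_normal(zi, a) for zi, a in zip(z, ALPHAS)) ** 1.5
        x = sum(c * zi for c, zi in zip(SLOPES, z)) / NORM  # X ~ N(0, 1)
        j = min(m - 1, int(m * std_normal_cdf(x)))  # zero-based stratum index
        sums[j] += y
        counts[j] += 1
        total += y
    if min(counts) == 0:
        raise ValueError("empty stratum: increase n or reduce m")
    # Equiprobable strata, so p_i = 1/m for every stratum.
    post = sum(s / c for s, c in zip(sums, counts)) / m
    return total / n, post


rng = random.Random(566309)
K, N, M = 50, 400, 20
results = [one_replication(N, M, rng) for _ in range(K)]
naive_means = [r[0] for r in results]
post_means = [r[1] for r in results]
theta_as = statistics.fmean(post_means)
vrr = statistics.variance(naive_means) / statistics.variance(post_means)
print(theta_as, vrr)
```

Only cumulative probabilities of $X$ are needed to allocate each pair to a stratum; no sampling from the conditional distribution of $Y$ given $X$ takes place.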
Control variates
… simulation and we wish to estimate $\theta = E(Y)$, where $\sigma^2 = \operatorname{Var}(Y)$. Now suppose that in the same simulation we collect additional statistics $X = (X_1, \dots, X_d)'$ …
where $R^2$ is the proportion of variance removed from the naive estimator $\hat\theta = \bar Y$. In practice the information will not be available to calculate Equation (5.34), so it may be estimated as follows. Typically, there will be a sample of independent realizations of $(X_k, Y_k)$. Let $X_{ik}$ denote the $i$th element of column vector $X_k$. Then an unbiased estimator of $b^*$ is …
can be used. Since $\hat b^*$ is a function of the data, $E\bigl[\hat b^{*\prime}(\bar X - \mu_X)\bigr] \neq 0$, where $\mu_X = E(X)$, and so the estimator is biased. Fortunately, the bias is $O(1/N)$. Given that the standard error is $O(1/\sqrt{N})$, the bias can be neglected provided $N$ is large enough. If this method is not suitable, another approach is to obtain $\hat b^*$ from a shorter pilot run (it is not critical that it deviates slightly from the unknown $b^*$) and then to use this in a longer independent simulation run to obtain $\hat\theta_{\hat b^*}$. This is unbiased for all $N$. It is worth noting that if $E(Y\mid X)$ is linear then there is no question of any bias when there is no separate pilot run, even for small sample sizes.
A nice feature of the control variate method is its connection with linear regression. A regression of $Y$ upon $X$ takes the form
$$Y_k = \beta_0 + \beta'X_k + \varepsilon_k,$$
where the $\varepsilon_k$ are identically and independently distributed with zero mean. The predicted value (in regression terminology) at $X^*$ is the (unbiased) estimate of $E(Y\mid X^*)$ and is given by
$$\hat Y^* = \bar Y + \hat\beta'(X^* - \bar X),$$
where
$$\hat\beta = S_{XX}^{-1}S_{XY}.$$
However, this is just $\hat b^*$. This means that variance reduction can be implemented using multiple controls with standard regression packages. Given $(X_k, Y_k)$, $k = 1, \dots, N$, the control variate estimator is obtained by comparing Equations (5.35) and (5.36). It follows that $\hat\theta_{\hat b^*}$ is the predicted value $\hat Y^*$ at $X^* = \mu_X$, where $\mu_X = E(X)$.
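The regression connection can be checked numerically. The sketch below uses two controls, $X_1 = R_1$ and $X_2 = R_2$ with known means $1/2$, for the two-Weibull response of Section 5.3.1; this choice of controls is an assumption made purely for illustration and is not taken from the text. The normal equations are solved directly for $d = 2$, and the fitted regression evaluated at the known means is compared with the control variate estimator.

```python
import math
import random


def two_control_estimate(n, seed=7):
    """Control variate estimate with controls R1, R2 (illustrative choice)."""
    rng = random.Random(seed)
    x1, x2, ys = [], [], []
    for _ in range(n):
        r1 = 1.0 - rng.random()   # in (0, 1], keeps the logarithm finite
        r2 = 1.0 - rng.random()
        x1.append(r1)
        x2.append(r2)
        ys.append(((-math.log(r1)) ** (2.0 / 3.0)
                   + (-math.log(r2)) ** (2.0 / 3.0)) ** 1.25)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(ys) / n
    # Entries of S_XX and S_XY (common divisor cancels in the solution).
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (y - my) for a, y in zip(x1, ys))
    s2y = sum((b - m2) * (y - my) for b, y in zip(x2, ys))
    det = s11 * s22 - s12 * s12
    beta1 = (s22 * s1y - s12 * s2y) / det   # beta_hat = S_XX^{-1} S_XY
    beta2 = (s11 * s2y - s12 * s1y) / det
    beta0 = my - beta1 * m1 - beta2 * m2    # fitted intercept
    predicted_at_mean = beta0 + 0.5 * beta1 + 0.5 * beta2
    theta_cv = my + beta1 * (0.5 - m1) + beta2 * (0.5 - m2)
    return my, theta_cv, predicted_at_mean


y_bar, theta_cv, predicted_at_mean = two_control_estimate(20_000)
print(y_bar, theta_cv, predicted_at_mean)
```

The last two returned values agree up to rounding, which is the identity stated above: the control variate estimator is the regression prediction at the known mean of the controls.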
Let us investigate how the theory may be applied to the simple case where there is just one control variable $X$ $(d = 1)$. In this case
$$\hat b^* = \frac{\sum_{k=1}^{N}(x_k-\bar x)(y_k-\bar y)}{\sum_{k=1}^{N}(x_k-\bar x)^2}.$$
For the example of Section 5.3.1, with $X = R_1R_2$ and $E(X) = 1/4$, a control variate estimator is accordingly
$$\hat\theta_{\hat b^*} = \bar Y + \hat b^*\Bigl(\frac{1}{4} - \bar X\Bigr).$$
The effectiveness of this is given by $R^2$, which is simply the squared correlation between $X$ and $Y$. A sample of 500 pairs $(X, Y)$ produced the scatter plot in Figure 5.2 and gave a sample correlation of $-0.8369$. So $R^2 = (-0.8369)^2 = 0.700$. Therefore, the proportion of variance that is removed through the use of this control variable is approximately 0.7 and the variance reduction ratio is approximately $(1-0.7)^{-1} = 3.3$. Although this is a useful reduction in variance, it does not compare well with the estimated variance reduction ratio of 48 given in result (5.30), obtained through fine stratification of $X$. The reason for this lies in the scatter plot (Figure 5.2), which shows that the regression $E(Y\mid X)$ is highly nonlinear. A control variable removes only the linear part of the variation in $Y$.
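The single-control calculation is short enough to sketch directly. The following Python illustration computes $\hat b^*$, the control variate estimate with $X = R_1R_2$ and known mean $1/4$, and the sample $R^2$; the function name and seed are assumptions of this sketch, and $\hat b^*$ is estimated from the same sample, so the small $O(1/N)$ bias discussed earlier applies.

```python
import math
import random


def single_control_estimate(n, seed=1):
    """Control variate estimate of E[(W1 + W2)^(5/4)] using X = R1*R2."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        r1 = 1.0 - rng.random()   # in (0, 1]
        r2 = 1.0 - rng.random()
        xs.append(r1 * r2)
        ys.append(((-math.log(r1)) ** (2.0 / 3.0)
                   + (-math.log(r2)) ** (2.0 / 3.0)) ** 1.25)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b_hat = sxy / sxx                          # estimated coefficient
    theta_cv = y_bar + b_hat * (0.25 - x_bar)  # E(X) = 1/4 is known
    r_squared = sxy * sxy / (sxx * syy)        # proportion of variance removed
    return y_bar, theta_cv, b_hat, r_squared


y_bar, theta_cv, b_hat, r_squared = single_control_estimate(20_000)
print(theta_cv, b_hat, r_squared, 1.0 / (1.0 - r_squared))
```

The final printed quantity, $(1-R^2)^{-1}$, is the estimated variance reduction ratio for this control.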
In contrast, using the stratification variable $X$ as a control variate in the post stratification example considered in Section 5.3.2 will produce a variance reduction ratio of approximately $(1-0.9872^2)^{-1} = 40$, 0.9872 being the sample correlation of 500 pairs of $(X, Y)$. Now compare this with the estimated variance reduction ratio of 24 given in result (5.33) using stratification. The control variate method is expected to perform well in view of the near linear dependence of $Y$ upon $X$ (Figure 5.3). However, the apparently superior performance of the control variate seems anomalous, given that fine stratification of $X$ will always be better than using it as a control variate. One possible reason is that $M = 20$ may not equate to fine stratification. Another is that $K = 50$ is a small sample as far as estimating the standard error is concerned, which induces a large estimated standard error on the variance reduction ratio. This does not detract from the main point emerging from this example: if there is strong linear dependence between $Y$ and $X$, little efficiency is likely to be lost in using a control variate in preference to stratification.
5.5 Conditional Monte Carlo
Conditional Monte Carlo works by performing as much as possible of a multivariate integration by analytical means, before resorting to actual sampling. Suppose we wish to estimate $\theta$ where
$$\theta = E_{g(x,y)}\bigl[f(x,y)\bigr],$$
where $g$ is a multivariate probability density function for a random vector that can be partitioned as the row vector $(X', Y')$. Suppose, in addition, that by analytical means the
Accordingly, a conditional Monte Carlo estimate of $\theta$ is given by sampling $n$ variates $y$ from $h$ in the algorithm below:
… is the expected cost $C$ of delay? A naive simulation would follow the algorithm (note that $(X-K)^+ = \max(0, X-K)$):
C
=
$
%n i=1
dv
1 Consider the following single server queue. The interarrival times for customers are independently distributed as $U(0,1)$. On arrival, a customer either commences service if the server is free or waits in the queue until the server is free and then commences service. Service times are independently distributed as $U(0,1)$. Let $A_i$, $S_i$ denote the interarrival time between the $(i-1)$th and $i$th customers and the service time of the $i$th customer respectively. Let $W_i$ denote the waiting time (excluding service time) in the queue for the $i$th customer. The initial condition is that the first customer in the system has just arrived at time zero. Then
$$W_i = \max(0,\, W_{i-1} + S_{i-1} - A_i)$$
for $i = 2, \dots, 5$, where $W_1 = 0$. Write a procedure to simulate 5000 realizations of the total waiting time in the queue for the first five customers, together with 5000 antithetic realizations.
(a) Using a combined estimator from the primary and antithetic realizations, estimate the expectation of the waiting time of the five customers and its estimated standard error. Estimate the variance reduction ratio.
(b) Now repeat the experiment when the service duration is $U(0,2)$. Why is the variance reduction achieved here much better than that in (a)?
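2 In order to estimate $\theta = \int_b^\infty x^{\alpha-1}e^{-x}\,dx$ where $\alpha \le 1$ and $b > 0$, an importance sampling density $g(x) = e^{-(x-b)}\mathbf{1}_{x>b}$ is used. (The case $\alpha > 1$ is considered in Section 5.2.) Given $R \sim U(0,1)$, show that an unbiased estimator is $\hat\theta = X^{\alpha-1}e^{-b}$ where $X = b - \ln R$, and that
$$\operatorname{Var}(\hat\theta) < b^{\alpha-1}e^{-b}\theta - \theta^2.$$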
3 (This is a more difficult problem.) If $X \sim N(\mu, \sigma^2)$ then $Y = \exp(X)$ is lognormally distributed with mean $\exp(\mu + \sigma^2/2)$ and variance $\exp(2\mu + \sigma^2)\bigl[\exp(\sigma^2) - 1\bigr]$. It is required to estimate the probability that the sum of $n$ such identically and independently lognormal distributed random variables exceeds $a$. A similar type of problem arises when considering Asian financial options (see Chapter 6). Use an importance sampling density that shifts the lognormal such that $X \sim N(\mu', \sigma^2)$ where $\mu' > \mu$. (Refer to Section 5.2.1, which describes the i.i.d. beta distributed case.)
(a) Show that when $a > n\exp(\mu)$ the upper bound on variance developed in result (5.13) is minimized when $\mu' = \ln(a/n)$.
(b) Now suppose the problem is to estimate
$$\theta = E_f\Bigl[\Bigl(\sum_{i=1}^{n} e^{X_i} - a\Bigr)^{+}\Bigr],$$
where $f$ is the multivariate normal density $N(\mu, \sigma^2 I)$ and $x^+ = \max(0, x)$. Show that the corresponding value of $\mu'$ … $\sigma^2/n$.
4 (This is a more difficult problem.) Where it exists, the moment generating function of a random variable having probability density function $f(x)$ is given by
$$M(t) = \int e^{tx}f(x)\,dx,$$
and the corresponding tilted density is $e^{tx}f(x)/M(t)$.
(a) Consider the estimation of $\theta = P\bigl(\sum_{i=1}^{n}\exp(X_i) > a\bigr)$ where the $X_i$ are independently $N(\mu, \sigma^2)$. Show that the tilted distribution is $N(\mu + \sigma^2 t, \sigma^2)$. Show that when $a > n\exp(\mu)$ the value of $t$ that minimizes the bound on variance given in result (5.13) is
$$t = \frac{\ln(a/n) - \mu}{\sigma^2}$$
and that therefore the method is identical to that described in Problem 3(a).
(b) Consider the estimation of $\theta = P\bigl(\sum_{i=1}^{n} X_i > a\bigr)$ where the $X_i$ are independent and follow a beta distribution with shape parameters $\alpha > 1$ and $\beta > 1$ on support $(0,1)$. Show that the corresponding value of $t$ here is the one that minimizes
$$e^{-at/n}M(t).$$
(i) Use symbolic integration and differentiation within Maple to find this value of $t$ when $n = 12$, $a = 6.2$, $\alpha = 1.5$, and $\beta = 2.5$.
(ii) Write a Maple procedure that estimates $\theta$ for any $\alpha > 1$, $\beta > 1$, $n$, and $a > 0$. Run your simulation for the parameter values shown in Table 5.1 of Section 5.2.1 and verify that the variance reduction achieved is of the same order as shown there.
5 In Section 5.3.1, Equation (5.29) shows how to generate from a cumulative distribution function $x - x\ln x$ on support $(0,1)$, subject to $x$ lying in the $i$th of $N$ equiprobable strata. This equation has to be solved numerically, which accounts for the stratified version taking approximately four times longer than the naive Monte Carlo version. Derive an efficient envelope rejection method that is faster than this inversion of the distribution function. Use this to modify the procedure 'weibullstrat' in Appendix 5.3.2. Run the program to determine the improvement in efficiency.
6 Write a Maple procedure for the post stratification algorithm in Section 5.3.2. Compare your estimate with the one obtained in result (5.32).
7 Suggest a suitable stratification variable for the queue simulation in Problem 1. Write a Maple program and investigate the variance reduction achieved for different parameter values.
8 Write procedures for naive and conditional Monte Carlo simulations to estimate the expected cost for the example in Section 5.5. How good is the variance reduction?
9 Revisit Problem 4(b). Suggest and implement a variance reduction scheme that combines the tilted importance sampling with post stratification.
10 Use Monte Carlo to estimate
confidence interval for the integral