for j = 1, ..., m. The optimal vector λ* = (λ*_1, ..., λ*_m) can be found by solving (1.90) numerically. Note that if the primal program has a nonempty interior optimal solution, then the dual program has an optimal solution λ*.

4. Finally, substitute λ = λ* and β = β(λ*) back into (1.84) to obtain the solution to the original MinxEnt program.
It is important to note that we do not need to explicitly impose the conditions p_i ≥ 0, i = 1, ..., n, because the quantities {p_i} in (1.84) are automatically strictly positive. This is a crucial property of the CE distance; see also [2]. It is instructive (see Problem 1.37) to verify how adding the nonnegativity constraints affects the above procedure.
When inequality constraints E_p[S_i(X)] ≥ γ_i are used in (1.80) instead of equality constraints, the solution procedure remains almost the same. The only difference is that the Lagrange multiplier vector λ must now be nonnegative. It follows that the dual program becomes

max_λ D(λ)
subject to: λ ≥ 0,

with D(λ) given in (1.88).
A further generalization is to replace the above discrete optimization problem with a functional optimization problem. This topic will be discussed in Chapter 9. In particular, Section 9.5 deals with the MinxEnt method, which involves a functional MinxEnt problem.
PROBLEMS
Probability Theory
1.1 Prove the following results, using the properties of the probability measure in Definition 1.1.1 (here A and B are events):
a) P(A^c) = 1 − P(A).
b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

1.2 Prove the product rule (1.4) for the case of three events.
1.3 We draw three balls consecutively from a bowl containing exactly five white and five black balls, without putting them back. What is the probability that all drawn balls will be black?
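One way to check the answer numerically is a quick Monte Carlo sketch in Matlab (the sample size and the ball encoding are our own choices):

% Estimate P(all three drawn balls are black) without replacement
trials = 1e5; count = 0;
for t = 1:trials
    bowl = [ones(1,5), zeros(1,5)];   % 1 = black, 0 = white
    draw = bowl(randperm(10));        % shuffle the ten balls
    count = count + all(draw(1:3) == 1);
end
count / trials   % compare with (5/10)*(4/9)*(3/8) = 1/12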
1.4 Consider the random experiment where we toss a biased coin until heads comes up. Suppose the probability of heads on any one toss is p. Let X be the number of tosses required. Show that X ∼ G(p).
1.5 In a room with many people, we ask each person his/her birthday, for example, May 5. Let N be the number of people queried until we get a "duplicate" birthday.
a) Calculate P(N > n), n = 0, 1, 2, ....
b) For which n do we have P(N ≤ n) ≥ 1/2?
c) Use a computer to calculate E[N].
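For part c), one possible Matlab computation (assuming 365 equally likely birthdays) uses the identity E[N] = Σ_{n≥0} P(N > n) together with the answer to part a):

% E[N] for the birthday problem
p = 1; EN = 1;                 % p = P(N > 0) = 1 starts the running sum
for n = 1:365
    p = p * (1 - (n-1)/365);   % update p to P(N > n)
    EN = EN + p;
end
EN                             % approximately 24.6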
1.6 Let X and Y be independent standard normal random variables, and let U and V be random variables that are derived from X and Y via the linear transformation

U = X cos α + Y sin α,   V = −X sin α + Y cos α.

a) Derive the joint pdf of U and V.
b) Show that U and V are independent and standard normally distributed.
1.7 Let X ∼ Exp(λ). Show that the memoryless property holds: for all s, t ≥ 0,

P(X > t + s | X > t) = P(X > s).
1.8 Let X_1, X_2, X_3 be independent Bernoulli random variables with success probabilities 1/2, 1/3, and 1/4, respectively. Give their conditional joint pdf, given that X_1 + X_2 + X_3 = 2.
1.9 Verify the expectations and variances in Table 1.3.

1.10 Let X and Y have joint density f given by

1.12 Verify the properties of variance and covariance in Table 1.4.

1.13 Show that the correlation coefficient always lies between −1 and 1. [Hint: use the fact that the variance of aX + Y is always nonnegative, for any a.]

1.14 Consider Examples 1.1 and 1.2. Define X as the function that assigns the number x_1 + ⋯ + x_n to each outcome ω = (x_1, ..., x_n). The event that there are exactly k heads in n throws can be written as

{ω ∈ Ω : X(ω) = k}.

If we abbreviate this to {X = k}, and further abbreviate P({X = k}) to P(X = k), then we obtain exactly (1.7). Verify that one can always view random variables in this way, that is, as real-valued functions on Ω, and that probabilities such as P(X ≤ x) should be interpreted as P({ω ∈ Ω : X(ω) ≤ x}).
1.15 Show that
1.16 Let Σ be the covariance matrix of a random column vector X. Write Y = X − μ, where μ is the expectation vector of X. Hence, Σ = E[YYᵀ]. Show that Σ is positive semidefinite; that is, for any vector u, we have uᵀΣu ≥ 0.
1.17 Suppose Y ∼ Gamma(n, λ). Show that for all x ≥ 0

P(Y ≤ x) = 1 − Σ_{k=0}^{n−1} e^{−λx} (λx)^k / k! .   (1.91)
1.18 Consider the random experiment where we draw uniformly and independently n numbers, X_1, ..., X_n, from the interval [0,1].
a) Let M be the smallest of the n numbers. Express M in terms of X_1, ..., X_n.
b) Determine the pdf of M.
1.19 Let Y = e^X, where X ∼ N(0, 1).
a) Determine the pdf of Y.
b) Determine the expected value of Y.
1.20 We select a point (X, Y) from the triangle (0,0)–(1,0)–(1,1) in such a way that X has a uniform distribution on (0,1) and the conditional distribution of Y given X = x is uniform on (0, x).
a) Determine the joint pdf of X and Y.
b) Determine the pdf of Y.
c) Determine the conditional pdf of X given Y = y for all y ∈ (0,1).
d) Calculate E[X | Y = y] for all y ∈ (0,1).
e) Determine the expectations of X and Y.
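A simulation sketch in Matlab for checking the answers to part e):

% Sample from the triangle: X ~ U(0,1), then Y | X = x ~ U(0,x)
N = 1e5;
X = rand(N,1);
Y = X .* rand(N,1);
[mean(X), mean(Y)]   % compare with the exact expectations from part e)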
(Hint: write out the binomial coefficient and use the fact that lim_{n→∞} (1 − λt/n)^n = e^{−λt}.)
1.23 Consider the Bernoulli approximation in Section 1.11. Let U_1, U_2, ... denote the times of success for the Bernoulli process X.
a) Verify that the "intersuccess" times U_1, U_2 − U_1, ... are independent and have a geometric distribution with parameter p = λh.
b) For small h and n = ⌊t/h⌋, show that the relationship P(A_1 > t) ≈ P(U_1 > n) leads in the limit, as n → ∞, to

P(A_1 > t) = e^{−λt}.
Determine the (discrete) pdf of each X_n, n = 0, 1, 2, ..., for the random walk in
Let {X_n, n ∈ ℕ} be a Markov chain with state space {0, 1, 2}, transition matrix

    ( ···  ···  ··· )
P = ( 0.3  0.1  0.6 )
    ( 0.1  0.7  0.2 )

and initial distribution π = (0.2, 0.5, 0.3). Determine
1.27 Consider two dogs harboring a total number of m fleas. Spot initially has b fleas, and Lassie has the remaining m − b. The fleas have agreed on the following immigration policy: at every time n = 1, 2, ... a flea is selected at random from the total population, and that flea will jump from one dog to the other. Describe the flea population on Spot as a Markov chain and find its stationary distribution.
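As a numerical sketch (our own encoding of the chain: state i = number of fleas on Spot, so the selected flea belongs to Spot with probability i/m):

% Stationary distribution of the flea chain for, say, m = 10
m = 10;
P = zeros(m+1);                          % states 0, 1, ..., m
for i = 0:m
    if i > 0, P(i+1, i) = i/m; end       % a Spot flea jumps to Lassie
    if i < m, P(i+1, i+2) = (m-i)/m; end % a Lassie flea jumps to Spot
end
[V, D] = eig(P');                        % left eigenvector for eigenvalue 1
[~, k] = max(real(diag(D)));
pi_stat = real(V(:,k)) / sum(real(V(:,k)));
binom = arrayfun(@(j) nchoosek(m,j), 0:m)' / 2^m;
[pi_stat, binom]                         % compare with Bin(m, 1/2)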
1.28 Classify the states of the Markov chain with the following transition matrix:

( 0.0  0.3  0.6  0.0  0.1 )
( 0.0  0.3  0.0  0.7  0.0 )
( 0.0  0.1  0.0  0.9  0.0 )
( 0.1  0.1  0.2  0.0  0.6 )
( ···  ···  ···  ···  ··· )
1.29 Consider the following snakes-and-ladders game. Let N be the number of tosses required to reach the finish using a fair die. Calculate the expectation of N using a computer.
(Figure: the snakes-and-ladders game board, from start to finish.)
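Since the board layout is specified in the figure, the sketch below uses a placeholder jump table; filling in the true snakes and ladders from the figure gives a Monte Carlo estimate of E[N]:

% Estimate E[N] for a snakes-and-ladders board by simulation
jump = 1:36;                 % placeholder 6 x 6 board: no snakes or ladders
jump(4) = 14; jump(30) = 7;  % hypothetical ladder and snake; replace these
trials = 1e5; total = 0;     % with the jumps shown in the figure
for t = 1:trials
    pos = 0; n = 0;
    while pos < 36
        pos = min(pos + ceil(6*rand), 36);   % toss a fair die
        pos = jump(pos);                     % follow a snake or ladder
        n = n + 1;
    end
    total = total + n;
end
total / trials               % estimate of E[N]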
1.30 Ms. Ella Brum walks back and forth between her home and her office every day. She owns three umbrellas, which are distributed over two umbrella stands (one at home and one at work). When it is not raining, Ms. Brum walks without an umbrella. When it is raining, she takes one umbrella from the stand at the place of her departure, provided there is one available. Suppose the probability that it is raining at the time of any departure is p. Let X_n denote the number of umbrellas available at the place where Ella arrives after walk number n, n = 1, 2, ..., including the one that she possibly brings with her. Calculate the limiting probability that it rains and no umbrella is available.
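One possible numerical sketch (the chain construction below is our own reading of the problem, for a specific p; state j = number of umbrellas at the place of departure):

% Limiting probability that it rains and no umbrella is available
p = 0.4;
P = zeros(4);                  % states 0, 1, 2, 3 (index = state + 1)
P(1,4) = 1;                    % no umbrella to take: arrive where all 3 are
for j = 1:3
    P(j+1, 3-j+1) = P(j+1, 3-j+1) + 1 - p;  % dry: walk without an umbrella
    P(j+1, 4-j+1) = P(j+1, 4-j+1) + p;      % rain: bring one umbrella along
end
[V, D] = eig(P');
[~, k] = max(real(diag(D)));
pi_stat = real(V(:,k)) / sum(real(V(:,k)));
pi_stat(1) * p                 % it rains while the departure stand is empty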
1.31 A mouse is let loose in the maze of Figure 1.9. From each compartment the mouse chooses one of the adjacent compartments with equal probability, independent of the past. The mouse spends an exponentially distributed amount of time in each compartment. The mean time spent in each of the compartments 1, 3, and 4 is two seconds; the mean time spent in compartments 2, 5, and 6 is four seconds. Let {X_t, t ≥ 0} be the Markov jump process that describes the position of the mouse for times t ≥ 0. Assume that the mouse starts in compartment 1 at time t = 0.
Figure 1.9 A maze
What are the probabilities that the mouse will be found in each of the compartments 1, 2, ..., 6 at some time t far away in the future?
1.32 In an M/M/∞ queueing system, customers arrive according to a Poisson process with rate a. Every customer who enters is immediately served by one of an infinite number of servers; hence, there is no queue. The service times are exponentially distributed, with mean 1/b. All service and interarrival times are independent. Let X_t be the number of customers in the system at time t. Show that the limiting distribution of X_t, as t → ∞, is Poisson with parameter a/b.
1.33 Let a and x be n-dimensional column vectors. Show that ∇_x aᵀx = a.

1.34 Let A be a symmetric n × n matrix and x an n-dimensional column vector. Show that ∇_x xᵀAx = 2Ax.

1.35 Show that the optimal distribution p* in Example 1.17 is given by the uniform distribution.

1.36 Derive the program (1.78).

1.37 Consider the MinxEnt program

min_p Σ_{i=1}^n p_i ln(p_i/q_i)
subject to: Ap = b, Σ_i p_i = 1, p ≥ 0,

where p and q are probability distribution vectors and A is an m × n matrix.
a) Show that the Lagrangian for this problem is of the form

L(p, λ, β, μ) = Σ_i p_i ln(p_i/q_i) + λᵀ(b − Ap) + β(Σ_i p_i − 1) − μᵀp.

b) Show that p_i = q_i exp(−β − 1 + μ_i + Σ_{j=1}^m λ_j a_{ji}), for i = 1, ..., n.
c) Explain why, as a result of the KKT conditions, the optimal μ* must be equal to the zero vector.
d) Show that the solution to this MinxEnt program is exactly the same as for the program where the nonnegativity constraints are omitted.
Further Reading

An easy introduction to probability theory with many examples is [14], and a more detailed textbook is [9]. A classical reference is [7]. An accurate and accessible treatment of various stochastic processes is given in [4]. For convex optimization we refer to [3] and [8].
REFERENCES
1. S. Asmussen and R. Y. Rubinstein. Complexity properties of steady-state rare-events simulation in queueing models. In J. H. Dshalalow, editor, Advances in Queueing: Theory, Methods and Open Problems, pages 429-462, New York, 1995. CRC Press.

2. Z. I. Botev, D. P. Kroese, and T. Taimre. Generalized cross-entropy methods for rare-event simulation and optimization. Simulation: Transactions of the Society for Modeling and Simulation International, 2007. In press.

3. S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.

4. E. Çinlar. Introduction to Stochastic Processes. Prentice Hall, Englewood Cliffs, NJ, 1975.

5. T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.

6. C. W. Curtis. Linear Algebra: An Introductory Approach. Springer-Verlag, New York, 1984.

7. W. Feller. An Introduction to Probability Theory and Its Applications, volume I. John Wiley & Sons, New York, 2nd edition, 1970.

8. R. Fletcher. Practical Methods of Optimization. John Wiley & Sons, New York, 1987.

9. G. R. Grimmett and D. R. Stirzaker. Probability and Random Processes. Oxford University Press, Oxford, 3rd edition, 2001.

10. J. N. Kapur and H. K. Kesavan. Entropy Optimization Principles with Applications. Academic Press, New York, 1992.

11. A. I. Khinchin. Information Theory. Dover Publications, New York, 1957.

12. V. Kriman and R. Y. Rubinstein. Polynomial time algorithms for estimation of rare events in queueing models. In J. Dshalalow, editor, Frontiers in Queueing: Models and Applications in Science and Engineering, pages 421-448, New York, 1995. CRC Press.

13. E. L. Lehmann. Testing Statistical Hypotheses. Springer-Verlag, New York, 1997.

14. S. M. Ross. A First Course in Probability. Prentice Hall, Englewood Cliffs, NJ, 7th edition, 2005.

15. R. Y. Rubinstein and B. Melamed. Modern Simulation and Modeling. John Wiley & Sons, New York, 1998.
Section 2.2 discusses the generation of uniform random variables. Section 2.3 discusses general methods for generating one-dimensional random variables. Section 2.4 presents specific algorithms for generating variables from commonly used continuous and discrete distributions. In Section 2.5 we discuss the generation of random vectors. Sections 2.6 and 2.7 treat the generation of Poisson processes, Markov chains, and Markov jump processes. Finally, Section 2.8 deals with the generation of random permutations.
2.2 RANDOM NUMBER GENERATION
In the early days of simulation, randomness was generated by manual techniques, such as coin flipping, dice rolling, card shuffling, and roulette spinning. Later on, physical devices, such as noise diodes and Geiger counters, were attached to computers for the same purpose. The prevailing belief held that only mechanical or electronic devices could produce truly random sequences. Although mechanical devices are still widely used in gambling
and lotteries, these methods were abandoned by the computer-simulation community for several reasons: (a) mechanical methods were too slow for general use; (b) the generated sequences cannot be reproduced; and (c) it has been found that the generated numbers exhibit both bias and dependence. Although certain modern physical generation methods are fast and would pass most statistical tests for randomness (for example, those based on the universal background radiation or the noise of a PC chip), their main drawback remains their lack of repeatability. Most of today's random number generators are not based on physical devices, but on simple algorithms that can be easily implemented on a computer. They are fast, require little storage space, and can readily reproduce a given sequence of random numbers. Importantly, a good random number generator captures all the important statistical properties of true random sequences, even though the sequence is generated by a deterministic algorithm. For this reason, these generators are sometimes called pseudorandom.
The most common methods for generating pseudorandom sequences use the so-called linear congruential generators, introduced in [6]. These generate a deterministic sequence of numbers by means of the recursive formula

X_{i+1} = a X_i + c (mod m),   (2.1)

where the initial value, X_0, is called the seed and a, c, and m (all positive integers) are called the multiplier, the increment, and the modulus, respectively. Note that applying the modulo-m operator in (2.1) means that a X_i + c is divided by m, and the remainder is taken as the value for X_{i+1}. Thus, each X_i can only assume a value from the set {0, 1, ..., m − 1}, and the quantities

U_i = X_i / m,   (2.2)

called pseudorandom numbers, constitute approximations to a true sequence of uniform random variables.
Note that the sequence X_0, X_1, X_2, ... will repeat itself after at most m steps and will therefore be periodic, with period not exceeding m. For example, let a = c = X_0 = 3 and m = 5. Then the sequence obtained from the recursive formula X_{i+1} = 3 X_i + 3 (mod 5) is 3, 2, 4, 0, 3, ..., which has period 4.
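In Matlab, this small example can be reproduced with a few lines (a sketch, not a generator of practical quality):

% Linear congruential generator (2.1) with a = c = X0 = 3, m = 5
a = 3; c = 3; m = 5; X = 3;
for i = 1:6
    X = mod(a*X + c, m);   % next state of the generator
    fprintf('%d ', X)      % prints 2 4 0 3 2 4 - period 4
end
fprintf('\n')
U = X/m                    % a pseudorandom number, as in (2.2)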
In the special case where c = 0, (2.1) simply reduces to

X_{i+1} = a X_i (mod m).   (2.3)

Such a generator is called a multiplicative congruential generator. It is readily seen that an arbitrary choice of X_0, a, c, and m will not lead to a pseudorandom sequence with good statistical properties. In fact, number theory has been used to show that only a few combinations of these produce satisfactory results. In computer implementations, m is selected as a large prime number that can be accommodated by the computer word size. For example, in a binary 32-bit word computer, statistically acceptable generators can be obtained by choosing m = 2^31 − 1 and a = 7^5, provided that the first bit is a sign bit. A 64-bit or 128-bit word computer will naturally yield better statistical results.
Formulas (2.1), (2.2), and (2.3) can be readily extended to pseudorandom vector generation. For example, the n-dimensional versions of (2.3) and (2.2) can be written as

X_{i+1} = A X_i (mod m)   (2.4)

and

U_i = M^{−1} X_i,   (2.5)

respectively, where A is a nonsingular n × n matrix, M and X_i are n-dimensional vectors, and M^{−1} X_i is the n-dimensional vector with components M_1^{−1} X_{1i}, ..., M_n^{−1} X_{ni}.

Besides the linear congruential generators, other classes have been proposed to achieve longer periods and better statistical properties (see [5]).
Most computer languages already contain a built-in pseudorandom number generator. The user is typically requested only to input the initial seed, X_0, and upon invocation the random number generator produces a sequence of independent, uniform (0,1) random variables. We therefore assume in this book the availability of such a "black box" that is capable of producing a stream of pseudorandom numbers. In Matlab, for example, this is provided by the rand function.
EXAMPLE 2.1 Generating Uniform Random Variables in Matlab

This example illustrates the use of the rand function in Matlab to generate samples from the U(0,1) distribution. For clarity we have omitted the "ans = " output in the Matlab session below.

rand                 % generate a uniform random number
rand                 % generate another uniform random number
rand('state',1234)   % set the seed to 1234
rand                 % generate a uniform random number
rand('state',1234)   % reset the seed to 1234
rand                 % the previous outcome is repeated
2.3 RANDOM VARIABLE GENERATION

In this section we discuss various general methods for generating one-dimensional random variables from a prescribed distribution. We consider the inverse-transform method, the alias method, the composition method, and the acceptance-rejection method.
2.3.1 Inverse-Transform Method
Let X be a random variable with cdf F. Since F is a nondecreasing function, the inverse function F^{−1} may be defined as

F^{−1}(y) = inf{x : F(x) ≥ y},  0 ≤ y ≤ 1.   (2.6)

(Readers not acquainted with the notion inf should read min.) It is easy to show that if U ∼ U(0,1), then

X = F^{−1}(U)   (2.7)

has cdf F. Namely, since F is invertible and P(U ≤ u) = u, we have

P(X ≤ x) = P(F^{−1}(U) ≤ x) = P(U ≤ F(x)) = F(x).   (2.8)
Thus, to generate a random variable X with cdf F, draw U ∼ U(0,1) and set X = F^{−1}(U). Figure 2.1 illustrates the inverse-transform method, given by the following algorithm.

Algorithm 2.3.1 (The Inverse-Transform Method)

1. Generate U from U(0,1).
2. Return X = F^{−1}(U).
EXAMPLE 2.2

Generate a random variable X from the pdf

f(x) = 2x,  0 ≤ x ≤ 1.   (2.9)

The cdf is F(x) = x² on [0,1], so that F^{−1}(u) = √u. Therefore, to generate a random variable X from the pdf (2.9), first generate a random variable U from U(0,1) and then take its square root.
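In Matlab, a minimal sketch of this example reads:

% Inverse-transform method for f(x) = 2x on [0,1]
N = 1e5;
U = rand(N,1);
X = sqrt(U);    % F^{-1}(u) = sqrt(u)
hist(X, 50)     % the histogram grows linearly, like f(x) = 2x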
EXAMPLE 2.3 Order Statistics
Let X_1, ..., X_n be iid random variables with cdf F. We wish to generate random variables X_(n) and X_(1) that are distributed according to the order statistics max(X_1, ..., X_n) and min(X_1, ..., X_n), respectively. From Example 1.7 we see that the cdfs of X_(n) and X_(1) are F_n(x) = [F(x)]^n and F_1(x) = 1 − [1 − F(x)]^n, respectively. Applying (2.7), we get

X_(n) = F^{−1}(U^{1/n})

and, since 1 − U is also from U(0,1),

X_(1) = F^{−1}(1 − U^{1/n}).

In the special case where F(x) = x, that is, X_i ∼ U(0,1), we have

X_(n) = U^{1/n} and X_(1) = 1 − U^{1/n}.
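A Matlab sketch for the uniform special case, generating each extreme from a single uniform draw:

% Max and min of n iid U(0,1) variables, each from one uniform number
n = 10; N = 1e5;
U = rand(N,1);
Xmax = U.^(1/n);       % distributed as max(X_1,...,X_n)
Xmin = 1 - U.^(1/n);   % distributed as min(X_1,...,X_n)
mean(Xmax)             % compare with n/(n+1)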
EXAMPLE 2.4 Drawing from a Discrete Distribution
Let X be a discrete random variable with P(X = x_i) = p_i, i = 1, 2, ..., with Σ_i p_i = 1 and x_1 < x_2 < ⋯. The cdf F of X is given by F(x) = Σ_{i: x_i ≤ x} p_i, and is illustrated in Figure 2.2.
Figure 2.2 Inverse-transform method for a discrete random variable
Hence, the algorithm for generating a random variable from F can be written as follows.

Algorithm 2.3.2 (Inverse-Transform Method for a Discrete Distribution)

1. Generate U ∼ U(0,1).
2. Find the smallest positive integer k such that U ≤ F(x_k), and return X = x_k.
Much of the execution time in Algorithm 2.3.2 is spent in making the comparisons of Step 2. This time can be reduced by using efficient search techniques (see [2]).
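A direct Matlab sketch of Algorithm 2.3.2, with a small example distribution of our own:

% Inverse-transform method for a discrete distribution
x = [1 3 4 7]; p = [0.2 0.1 0.4 0.3];   % support points and probabilities
F = cumsum(p);                          % cdf values F(x_k)
U = rand;
k = find(U <= F, 1, 'first');           % smallest k with U <= F(x_k)
X = x(k)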
In general, the inverse-transform method requires that the underlying cdf, F, exists in a form for which the corresponding inverse function F^{−1} can be found analytically or algorithmically. Applicable distributions are, for example, the exponential, uniform, Weibull, logistic, and Cauchy distributions. Unfortunately, for many other probability distributions, it is either impossible or difficult to find the inverse transform, that is, to solve

F(x) = ∫_{−∞}^{x} f(t) dt = u

with respect to x. Even in the case where F^{−1} exists in an explicit form, the inverse-transform method may not necessarily be the most efficient random variable generation method (see [2]).
2.3.2 Alias Method

An alternative to the inverse-transform method for discrete distributions is the alias method. It relies on the fact that an arbitrary discrete pdf f on n points, with f(x_i) = P(X = x_i), i = 1, ..., n, can be represented as an equally weighted mixture of n − 1 pdfs q^(k), k = 1, ..., n − 1, each having at most two nonzero components. That is, any n-point pdf f can be represented as

f(x) = (1/(n − 1)) Σ_{k=1}^{n−1} q^(k)(x),   (2.10)

for suitably defined two-point pdfs q^(k), k = 1, ..., n − 1; see [11].
The alias method is rather general and efficient but requires an initial setup and extra storage for the n − 1 pdfs q^(k). A procedure for computing these two-point pdfs can be found in [2]. Once the representation (2.10) has been established, generation from f is simple and can be written as follows:
Algorithm 2.3.3 (Alias Method)

1. Generate K uniformly from {1, ..., n − 1}.
2. Draw X from the two-point pdf q^(K).
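As a self-contained Matlab sketch, take the three-point pdf f = (1/2, 1/3, 1/6); one valid representation (2.10) uses the two-point pdfs q^(1) = (2/3, 0, 1/3) and q^(2) = (1/3, 2/3, 0), whose equally weighted mixture is f (this decomposition is our own; the setup procedure of [2] produces one automatically):

% Alias-style generation from f = (1/2, 1/3, 1/6) on the points 1, 2, 3
q = [2/3 0 1/3; 1/3 2/3 0];   % rows are the two-point pdfs q^(1), q^(2)
N = 1e5; X = zeros(N,1);
for t = 1:N
    K = ceil(2*rand);                         % pick q^(K) uniformly
    X(t) = find(rand <= cumsum(q(K,:)), 1);   % draw from q^(K)
end
hist(X, 1:3)   % frequencies should approximate (1/2, 1/3, 1/6)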
2.3.3 Composition Method

Let X_i ∼ G_i, and let Y be a discrete random variable with P(Y = i) = p_i, independent of the X_i, for 1 ≤ i ≤ m. Then a random variable X with cdf

F(x) = Σ_{i=1}^{m} p_i G_i(x)

can be represented as X = X_Y. It follows that in order to generate X from F, we must first generate the discrete random variable Y and then, given Y = i, generate X_i from G_i. We thus have the following method.
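In Matlab, the two-step generation can be sketched as follows for an example mixture of our own, with equal weights on an Exp(1) and an N(4,1) component:

% Composition method for F = 0.5 G_1 + 0.5 G_2
p = [0.5 0.5];
N = 1e5; X = zeros(N,1);
for t = 1:N
    Y = find(rand <= cumsum(p), 1);   % step 1: draw the component index
    if Y == 1
        X(t) = -log(rand);            % step 2a: Exp(1) by inverse transform
    else
        X(t) = 4 + randn;             % step 2b: N(4,1)
    end
end
hist(X, 50)   % bimodal histogram of the mixture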