Suppose that X can be generated via the composition method. Thus, we assume that there exists a random variable Y taking values in {1, …, m}, say, with known probabilities {p_i, i = 1, …, m}, and we assume that it is easy to sample from the conditional distribution of X given Y. The events {Y = i}, i = 1, …, m form disjoint subregions, or strata (singular: stratum), of the sample space Ω, hence the name stratification. Using the conditioning formula (1.11), we can write

$$\ell = \mathbb{E}[H(\mathbf{X})] = \sum_{i=1}^{m} p_i \, \mathbb{E}[H(\mathbf{X}) \mid Y = i] ,$$

which suggests the stratified sampling estimator

$$\hat{\ell}^{\,s} = \sum_{i=1}^{m} p_i \, \frac{1}{N_i} \sum_{j=1}^{N_i} H(\mathbf{X}_{ij}) ,$$

where X_{ij} is the j-th observation from the conditional distribution of X given Y = i. Here N_i is the sample size assigned to the i-th stratum. The variance of the stratified sampling estimator is given by

$$\operatorname{Var}\big(\hat{\ell}^{\,s}\big) = \sum_{i=1}^{m} \frac{p_i^2 \, \sigma_i^2}{N_i} , \tag{5.35}$$

where σ_i² = Var(H(X) | Y = i).
How the strata should be chosen depends very much on the problem at hand. However, for a given particular choice of the strata, the sample sizes {N_i} can be obtained in an optimal manner, as given in the next theorem.
Theorem 5.5.1 (Stratified Sampling) Assuming that a maximum number of N samples can be collected, that is, ∑_{i=1}^m N_i = N, the optimal value of N_i is given by

$$N_i^* = N \, \frac{p_i \sigma_i}{\sum_{j=1}^{m} p_j \sigma_j} , \tag{5.36}$$

which gives a minimal variance of

$$\operatorname{Var}\big(\hat{\ell}^{\,s}\big) = \frac{1}{N} \left( \sum_{i=1}^{m} p_i \sigma_i \right)^{2} . \tag{5.37}$$
Proof: The theorem is straightforwardly proved using Lagrange multipliers and is left as an exercise.
Theorem 5.5.1 asserts that the minimal variance of ℓ̂^s is attained for sample sizes N_i that are proportional to p_i σ_i. A difficulty is that although the probabilities p_i are assumed to be known, the standard deviations {σ_i} are usually unknown. In practice, one would estimate the {σ_i} from "pilot" runs and then proceed to estimate the optimal sample sizes, N_i^*, from (5.36).

A simple stratification procedure, which can achieve variance reduction without requiring prior knowledge of σ_i² and H(X), is presented next.
Proposition 5.5.1 Let the sample sizes N_i be proportional to p_i, that is, N_i = p_i N, i = 1, …, m. Then

$$\operatorname{Var}\big(\hat{\ell}^{\,s}\big) \leqslant \operatorname{Var}\big(\hat{\ell}\big) ,$$

where ℓ̂ denotes the CMC estimator.

Proof: Substituting N_i = p_i N in (5.35) yields Var(ℓ̂^s) = (1/N) ∑_i p_i σ_i². The result now follows from

$$N \operatorname{Var}\big(\hat{\ell}\big) = \operatorname{Var}(H(\mathbf{X})) \geqslant \mathbb{E}\big[\operatorname{Var}(H(\mathbf{X}) \mid Y)\big] = \sum_{i=1}^{m} p_i \sigma_i^2 = N \operatorname{Var}\big(\hat{\ell}^{\,s}\big) ,$$

where the inequality uses the decomposition Var(H(X)) = E[Var(H(X) | Y)] + Var(E[H(X) | Y]).
Proposition 5.5.1 states that the stratified estimator with proportional allocation is more accurate than the CMC estimator. It effects stratification by favoring those events {Y = i} whose probabilities p_i are largest. Intuitively, this cannot, in general, be an optimal assignment, since information on σ_i² and H(X) is not used.
5.6 IMPORTANCE SAMPLING

Suppose we want to estimate

$$\ell = \mathbb{E}_f[H(\mathbf{X})] = \int H(\mathbf{x}) \, f(\mathbf{x}) \, d\mathbf{x} , \tag{5.39}$$

where H is the sample performance and f is the probability density of X. For reasons that will become clear shortly, we add a subscript f to the expectation to indicate that it is taken with respect to the density f.
Let g be another probability density such that H f is dominated by g. That is, g(x) = 0 ⇒ H(x) f(x) = 0. Using the density g, we can represent ℓ as

$$\ell = \int H(\mathbf{x}) \, \frac{f(\mathbf{x})}{g(\mathbf{x})} \, g(\mathbf{x}) \, d\mathbf{x} = \mathbb{E}_g\!\left[ H(\mathbf{X}) \, \frac{f(\mathbf{X})}{g(\mathbf{X})} \right] , \tag{5.40}$$

where the subscript g means that the expectation is taken with respect to g. Such a density is called the importance sampling density, proposal density, or instrumental density (as we use g as an instrument to obtain information about ℓ). Consequently, if X_1, …, X_N is a random sample from g, that is, X_1, …, X_N are iid random vectors with density g, then

$$\hat{\ell} = \frac{1}{N} \sum_{k=1}^{N} H(\mathbf{X}_k) \, \frac{f(\mathbf{X}_k)}{g(\mathbf{X}_k)} \tag{5.41}$$
is an unbiased estimator of ℓ. This estimator is called the importance sampling estimator. The ratio of densities,

$$W(\mathbf{x}) = \frac{f(\mathbf{x})}{g(\mathbf{x})} , \tag{5.42}$$
is called the likelihood ratio. For this reason the importance sampling estimator is also called the likelihood ratio estimator. In the particular case where there is no change of measure, that is, g = f, we have W = 1, and the likelihood ratio estimator in (5.41) reduces to the usual CMC estimator.
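The estimator (5.41) takes only a few lines of code. The following sketch estimates ℓ = P(X ≥ γ) for an exponential X, using an exponential proposal with a larger mean; the particular parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

u, v, gamma, N = 1.0, 3.0, 4.0, 100_000   # nominal mean, proposal mean, level

x = rng.exponential(v, size=N)            # X_1, ..., X_N iid from g = Exp(1/v)
W = (np.exp(-x / u) / u) / (np.exp(-x / v) / v)   # likelihood ratio W = f/g
ell_hat = np.mean((x >= gamma) * W)       # importance sampling estimator (5.41)

print(ell_hat, np.exp(-gamma / u))        # estimate vs. exact value e^{-gamma/u}
```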
5.6.1 Weighted Samples
The likelihood ratios need only be known up to a constant, that is, W(x) = c w(x) for some known function w(·). Since E_g[W(X)] = 1, we can write ℓ = E_g[H(X) W(X)] as

$$\ell = \frac{\mathbb{E}_g[H(\mathbf{X}) \, w(\mathbf{X})]}{\mathbb{E}_g[w(\mathbf{X})]} .$$
This suggests, as an alternative to the standard likelihood ratio estimator (5.41), the following weighted sample estimator:

$$\hat{\ell}_w = \frac{\sum_{k=1}^{N} H(\mathbf{X}_k) \, w_k}{\sum_{k=1}^{N} w_k} . \tag{5.43}$$

Here the {w_k}, with w_k = w(X_k), are interpreted as weights of the random sample {X_k}, and the sequence {(X_k, w_k)} is called a weighted (random) sample from g(x).
Similar to the regenerative ratio estimator in Chapter 4, the weighted sample estimator (5.43) introduces some bias, which tends to 0 as N increases. Loosely speaking, we may view the weighted sample {(X_k, w_k)} as a representation of f(x) in the sense that ℓ = E_f[H(X)] ≈ ℓ̂_w for any function H(·).
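A short sketch of the weighted sample estimator (5.43), assuming a target density known only up to its normalizing constant (the exponential setup and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Target known up to a constant: f(x) proportional to exp(-x/u) on x >= 0,
# so the computable weight is w = c W with c unknown.
u, v, N = 1.0, 2.0, 100_000

x = rng.exponential(v, size=N)                # weighted sample from proposal g
w = np.exp(-x / u) / (np.exp(-x / v) / v)     # unnormalized likelihood ratio

# Weighted sample estimator (5.43) of ell = E_f[X]: the unknown c cancels.
ell_w = np.sum(x * w) / np.sum(w)
print(ell_w)                                  # close to E_f[X] = u = 1
```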
5.6.2 The Variance Minimization Method
Since the choice of the importance sampling density g is crucially linked to the variance of the estimator ℓ̂ in (5.41), we consider next the problem of minimizing the variance of ℓ̂ with respect to g, that is,

$$\min_{g} \operatorname{Var}_g\!\left( H(\mathbf{X}) \, \frac{f(\mathbf{X})}{g(\mathbf{X})} \right) . \tag{5.44}$$
It is not difficult to prove (see, for example, Rubinstein and Melamed [31] and Problem 5.13) that the solution of the problem (5.44) is

$$g^*(\mathbf{x}) = \frac{|H(\mathbf{x})| \, f(\mathbf{x})}{\int |H(\mathbf{x})| \, f(\mathbf{x}) \, d\mathbf{x}} . \tag{5.45}$$

In particular, if H(x) ≥ 0, which we will assume from now on, then

$$g^*(\mathbf{x}) = \frac{H(\mathbf{x}) \, f(\mathbf{x})}{\ell} \tag{5.46}$$

and

$$\operatorname{Var}_{g^*}\big(\hat{\ell}\big) = \operatorname{Var}_{g^*}\big( H(\mathbf{X}) \, W(\mathbf{X}) \big) = \operatorname{Var}_{g^*}(\ell) = 0 .$$

The density g* as per (5.45) and (5.46) is called the optimal importance sampling density.
EXAMPLE 5.8

Let X ∼ Exp(u⁻¹) and H(X) = I_{{X ≥ γ}} for some γ > 0. Let f denote the pdf of X. Consider the estimation of

$$\ell = \mathbb{E}_f\big[ I_{\{X \geqslant \gamma\}} \big] = \mathbb{P}(X \geqslant \gamma) = \mathrm{e}^{-\gamma u^{-1}} .$$

We have

$$g^*(x) = \frac{H(x) \, f(x)}{\ell} = \frac{u^{-1} \, \mathrm{e}^{-x u^{-1}} \, I_{\{x \geqslant \gamma\}}}{\mathrm{e}^{-\gamma u^{-1}}} = u^{-1} \, \mathrm{e}^{-(x - \gamma) u^{-1}} \, I_{\{x \geqslant \gamma\}} .$$

Thus, the optimal importance sampling distribution of X is the shifted exponential distribution. Note that H f is dominated by g* but f itself is not dominated by g*. Since g* is optimal, the likelihood ratio estimator ℓ̂ is constant. Namely, with N = 1,

$$\hat{\ell} = H(X) \, \frac{f(X)}{g^*(X)} = \mathrm{e}^{-\gamma u^{-1}} = \ell .$$
It is important to realize that, although (5.41) is an unbiased estimator for any pdf g dominating H f, not all such pdfs are appropriate. One of the main rules for choosing a good importance sampling pdf is that the estimator (5.41) should have finite variance. This is equivalent to the requirement that

$$\mathbb{E}_g\!\left[ H^2(\mathbf{X}) \, \frac{f^2(\mathbf{X})}{g^2(\mathbf{X})} \right] = \mathbb{E}_f\!\left[ H^2(\mathbf{X}) \, \frac{f(\mathbf{X})}{g(\mathbf{X})} \right] < \infty . \tag{5.47}$$

This suggests that g should not have a "lighter tail" than f and that, preferably, the likelihood ratio f/g should be bounded.
In general, implementation of the optimal importance sampling density g* as per (5.45) and (5.46) is problematic. The main difficulty lies in the fact that to derive g*(x) one needs to know ℓ. But ℓ is precisely the quantity we want to estimate from the simulation!
In most simulation studies the situation is even worse, since the analytical expression for the sample performance H is unknown in advance. To overcome this difficulty, one can perform a pilot run with the underlying model, obtain a sample H(X_1), …, H(X_N), and then use it to estimate g*. It is important to note that sampling from such an artificially constructed density may be a very complicated and time-consuming task, especially when g is a high-dimensional density.
Remark 5.6.1 (Degeneracy of the Likelihood Ratio Estimator) The likelihood ratio estimator ℓ̂ in (5.41) suffers from a form of degeneracy in the sense that the distribution of W(X) under the importance sampling density g may become increasingly skewed as the dimensionality n of X increases. That is, W(X) may take values close to 0 with high probability, but may also take very large values with a small but significant probability. As a consequence, the variance of W(X) under g may become very large for large n. As an example of this degeneracy, assume for simplicity that the components in X are iid, under both f and g. Hence, both f(x) and g(x) are the products of their marginal pdfs. Suppose the marginal pdfs of each component X_i are f_1 and g_1, respectively. We can then write W(X) as

$$W(\mathbf{X}) = \prod_{i=1}^{n} \frac{f_1(X_i)}{g_1(X_i)} = \exp\!\left( \sum_{i=1}^{n} \ln \frac{f_1(X_i)}{g_1(X_i)} \right) . \tag{5.48}$$
Using the law of large numbers, the random variable ∑_{i=1}^n ln(f_1(X_i)/g_1(X_i)) is approximately equal to n E_{g_1}[ln(f_1(X)/g_1(X))] for large n. Hence, since E_{g_1}[ln(f_1(X)/g_1(X))] is strictly negative whenever g_1 ≠ f_1 (it equals minus the cross-entropy distance between g_1 and f_1 defined in Section 5.6.3 below), W(X) tends to 0 exponentially fast as n grows, even though its expectation under g is 1. Restricting the change of measure to only a few important components of X can therefore serve as a dimension-reduction technique.
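The skewness of W(X) is easy to observe numerically. In the sketch below (an assumed setup with iid exponential components, f_1 with mean 1 and g_1 with mean 2), the sample mean of W stays near 1 while its median collapses to 0 and its maximum explodes as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)

u, v, N = 1.0, 2.0, 10_000        # f1 = Exp(mean 1), g1 = Exp(mean 2)

for n in [1, 10, 50, 100]:
    x = rng.exponential(v, size=(N, n))               # iid components under g
    logW = n * np.log(v / u) + np.sum(-x / u + x / v, axis=1)
    W = np.exp(logW)                                  # W(X) as in (5.48)
    print(n, W.mean(), np.median(W), W.max())
```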
When the pdf f belongs to some parametric family of distributions, it is often convenient to choose the importance sampling distribution from the same family. In particular, suppose that f(·) = f(·; u) belongs to the family

$$\mathscr{F} = \{ f(\cdot; \mathbf{v}), \ \mathbf{v} \in \mathscr{V} \} .$$

Then the problem of finding an optimal importance sampling density in this class reduces to the following parametric minimization problem:

$$\min_{\mathbf{v} \in \mathscr{V}} \operatorname{Var}_{\mathbf{v}}\big( H(\mathbf{X}) \, W(\mathbf{X}; \mathbf{u}, \mathbf{v}) \big) , \tag{5.50}$$

where W(X; u, v) = f(X; u)/f(X; v) is the likelihood ratio of f(·; u) with respect to f(·; v). Since ℓ does not depend on v, (5.50) is equivalent to minimizing the second moment:

$$\min_{\mathbf{v} \in \mathscr{V}} V(\mathbf{v}) = \min_{\mathbf{v} \in \mathscr{V}} \mathbb{E}_{\mathbf{v}}\big[ H^2(\mathbf{X}) \, W^2(\mathbf{X}; \mathbf{u}, \mathbf{v}) \big] \tag{5.51}$$

or, rewriting the expectation with respect to f(·; u),

$$\min_{\mathbf{v} \in \mathscr{V}} V(\mathbf{v}) = \min_{\mathbf{v} \in \mathscr{V}} \mathbb{E}_{\mathbf{u}}\big[ H^2(\mathbf{X}) \, W(\mathbf{X}; \mathbf{u}, \mathbf{v}) \big] . \tag{5.52}$$

We shall call either of the equivalent problems (5.50) and (5.51) the variance minimization (VM) problem, and we shall call the parameter vector *v that minimizes programs (5.50)-(5.51) the optimal VM reference parameter vector. We refer to u as the nominal parameter.

The sample average version of (5.51)-(5.52) is

$$\min_{\mathbf{v} \in \mathscr{V}} \hat{V}(\mathbf{v}) = \min_{\mathbf{v} \in \mathscr{V}} \frac{1}{N} \sum_{k=1}^{N} H^2(\mathbf{X}_k) \, W(\mathbf{X}_k; \mathbf{u}, \mathbf{v}) , \tag{5.53}$$

where

$$W(\mathbf{X}_k; \mathbf{u}, \mathbf{v}) = \frac{f(\mathbf{X}_k; \mathbf{u})}{f(\mathbf{X}_k; \mathbf{v})} \tag{5.54}$$

and the sample X_1, …, X_N is from f(x; u). Note that as soon as the sample X_1, …, X_N is available, the function V̂(v) becomes a deterministic one.
Since in typical applications both functions V(v) and V̂(v) are convex and differentiable with respect to v, and since one can typically interchange the expectation and differentiation operators (see Rubinstein and Shapiro [32]), the solutions of programs (5.51)-(5.52) and (5.53) may be obtained by setting the corresponding gradients with respect to v equal to zero and solving the resulting systems of equations.
EXAMPLE 5.9
Consider estimating ℓ = E[X], where X ∼ Exp(u⁻¹). Choosing f(x; v) = v⁻¹ exp(-x v⁻¹), x ≥ 0, as the importance sampling pdf, the program (5.51) reduces to

$$\min_{v > u/2} V(v) = \min_{v > u/2} \mathbb{E}_u\big[ X^2 \, W(X; u, v) \big] = \min_{v > u/2} \frac{2 v}{u^2 \, (2/u - 1/v)^3} , \tag{5.55}$$

where the restriction v > u/2 ensures that the expectation is finite; the corresponding sample average version is

$$\min_{v} \hat{V}(v) = \min_{v} \frac{1}{N} \sum_{k=1}^{N} X_k^2 \, W(X_k; u, v) . \tag{5.56}$$

The optimal reference parameter *v, obtained by differentiating (5.55) with respect to v, is given by

$$ {}^{*}v = 2u .$$

We see that *v is exactly two times larger than u. Solving the sample average version (5.56) (numerically), one should find that, for large N, its optimal solution *v̂ will be close to the true parameter *v = 2u.
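The sample average version (5.56) can be solved numerically with any one-dimensional optimizer. A sketch, assuming SciPy is available (the sample size and search interval are illustrative choices):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)

u, N = 1.0, 100_000
x = rng.exponential(u, size=N)                 # X_1, ..., X_N from f(.; u)

def V_hat(v):
    # sample version of V(v) = E_u[X^2 W(X; u, v)], with W = f(x; u)/f(x; v)
    W = (v / u) * np.exp(-x / u + x / v)
    return np.mean(x**2 * W)

res = minimize_scalar(V_hat, bounds=(0.6, 10.0), method="bounded")
print(res.x)                                   # close to *v = 2u = 2
```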
EXAMPLE 5.10 Example 5.8 (Continued)
Consider again estimating ℓ = P_u(X ≥ γ) = exp(-γ u⁻¹). In this case, using the family {f(x; v), v > 0} defined by f(x; v) = v⁻¹ exp(-x v⁻¹), x ≥ 0, the program (5.51) reduces to

$$\min_{v > u/2} V(v) = \min_{v > u/2} \mathbb{E}_u\big[ I_{\{X \geqslant \gamma\}} \, W(X; u, v) \big] = \min_{v > u/2} \frac{v}{u^2} \, \frac{\mathrm{e}^{-\gamma (2/u - 1/v)}}{2/u - 1/v} .$$

The optimal reference parameter *v is given by

$$ {}^{*}v = \tfrac{1}{2} \left( \gamma + u + \sqrt{\gamma^2 + u^2} \right) = \gamma + \frac{u}{2} + \mathcal{O}\big( (u/\gamma)^2 \big) ,$$

where O(z²) is a function of z such that

$$\lim_{z \to 0} \frac{\mathcal{O}(z^2)}{z^2} = \text{constant} .$$

We see that for γ ≫ u, *v is approximately equal to γ.
It is important to note that in this case the sample version (5.56) (or (5.53)-(5.54)) is meaningful only for small γ, in particular for those γ for which ℓ is not a rare-event probability. For very small ℓ, a tremendously large sample size N is needed (because of the indicator function I_{{X ≥ γ}}), and thus the importance sampling estimator ℓ̂ is useless. We shall discuss the estimation of rare-event probabilities in more detail in Chapter 8.
Observe that the VM problem (5.51) can also be written as

$$\min_{\mathbf{v} \in \mathscr{V}} V(\mathbf{v}) = \min_{\mathbf{v} \in \mathscr{V}} \mathbb{E}_{\mathbf{w}}\big[ H^2(\mathbf{X}) \, W(\mathbf{X}; \mathbf{u}, \mathbf{v}) \, W(\mathbf{X}; \mathbf{u}, \mathbf{w}) \big] , \tag{5.57}$$

where w is an arbitrary reference parameter. Note that (5.57) is obtained from (5.52) by multiplying and dividing the integrand by f(x; w). We now replace the expected value in (5.57) by its sample (stochastic) counterpart and then take the optimal solution of the associated Monte Carlo program as an estimator of *v. Specifically, the stochastic counterpart of (5.57) is
$$\min_{\mathbf{v} \in \mathscr{V}} \hat{V}(\mathbf{v}) = \min_{\mathbf{v} \in \mathscr{V}} \frac{1}{N} \sum_{k=1}^{N} H^2(\mathbf{X}_k) \, W(\mathbf{X}_k; \mathbf{u}, \mathbf{v}) \, W(\mathbf{X}_k; \mathbf{u}, \mathbf{w}) , \tag{5.58}$$

where X_1, …, X_N is an iid sample from f(·; w) and w is an appropriately chosen trial parameter. Solving the stochastic program (5.58) thus yields an estimate, say *v̂, of *v. In some cases it may be useful to iterate this procedure, that is, to use *v̂ as a trial vector w in (5.58), and so obtain a better estimate.
Once the reference parameter v = *v̂ is determined, ℓ is estimated via the likelihood ratio estimator

$$\hat{\ell} = \frac{1}{N_1} \sum_{k=1}^{N_1} H(\mathbf{X}_k) \, W(\mathbf{X}_k; \mathbf{u}, \mathbf{v}) , \tag{5.59}$$

where X_1, …, X_{N_1} is a random sample from f(·; v). Typically, the sample size N_1 in (5.59) is larger than that used for estimating the reference parameter. We call (5.59) the standard likelihood ratio (SLR) estimator.
5.6.3 The Cross-Entropy Method
An alternative approach for choosing an "optimal" reference parameter vector in (5.59) is based on the Kullback-Leibler cross-entropy, or simply cross-entropy (CE), mentioned in (1.59). For clarity we repeat that the CE distance between two pdfs g and h is given (in the continuous case) by

$$\mathcal{D}(g, h) = \mathbb{E}_g\!\left[ \ln \frac{g(\mathbf{X})}{h(\mathbf{X})} \right] = \int g(\mathbf{x}) \ln \frac{g(\mathbf{x})}{h(\mathbf{x})} \, d\mathbf{x} .$$

Recall that D(g, h) ≥ 0, with equality if and only if g = h.
The general idea is to choose the importance sampling density, say h, such that the CE distance between the optimal importance sampling density g* in (5.45) and h is minimal. We call this the CE optimal pdf. Thus, this pdf solves the following functional optimization program:

$$\min_{h} \ \mathcal{D}\big( g^*, h \big) . \tag{5.60}$$

If we optimized over all densities h, the solution would trivially be h = g*, which is of no practical use. Instead, we restrict h to the parametric family {f(·; v), v ∈ 𝒱}. Writing out D(g*, f(·; v)) with g* as in (5.46) and discarding the terms that do not depend on v, one finds that minimizing the CE distance is equivalent to the maximization program

$$\max_{\mathbf{v} \in \mathscr{V}} D(\mathbf{v}) = \max_{\mathbf{v} \in \mathscr{V}} \mathbb{E}_{\mathbf{u}}\big[ H(\mathbf{X}) \ln f(\mathbf{X}; \mathbf{v}) \big] . \tag{5.61}$$

Since typically D(v) is concave and differentiable with respect to v (see Rubinstein and Shapiro [32]), the solution to (5.61) may be obtained by solving

$$\mathbb{E}_{\mathbf{u}}\big[ H(\mathbf{X}) \, \nabla \ln f(\mathbf{X}; \mathbf{v}) \big] = \mathbf{0} , \tag{5.62}$$

provided that the expectation and differentiation operators can be interchanged. The sample counterpart of (5.62) is

$$\frac{1}{N} \sum_{k=1}^{N} H(\mathbf{X}_k) \, \nabla \ln f(\mathbf{X}_k; \mathbf{v}) = \mathbf{0} . \tag{5.63}$$
By analogy to the VM program (5.51), we call (5.61) the CE program, and we call the parameter vector v* that maximizes (5.61) the optimal CE reference parameter vector.
Arguing as in (5.57), it is readily seen that (5.61) is equivalent to the following program:

$$\max_{\mathbf{v}} D(\mathbf{v}) = \max_{\mathbf{v}} \mathbb{E}_{\mathbf{w}}\big[ H(\mathbf{X}) \, W(\mathbf{X}; \mathbf{u}, \mathbf{w}) \ln f(\mathbf{X}; \mathbf{v}) \big] , \tag{5.64}$$

where W(X; u, w) is again the likelihood ratio and w is an arbitrary tilting parameter.
Similar to (5.58), we can estimate v* as the solution of the stochastic program

$$\max_{\mathbf{v}} \hat{D}(\mathbf{v}) = \max_{\mathbf{v}} \frac{1}{N} \sum_{k=1}^{N} H(\mathbf{X}_k) \, W(\mathbf{X}_k; \mathbf{u}, \mathbf{w}) \ln f(\mathbf{X}_k; \mathbf{v}) , \tag{5.65}$$

where X_1, …, X_N is a random sample from f(·; w). As in the VM case, we mention the possibility of iterating this procedure, that is, using the solution of (5.65) as a trial parameter for the next iteration.
Since in typical applications the function D̂ in (5.65) is concave and differentiable with respect to v (see [32]), the solution of (5.65) may be obtained by solving (with respect to v) the following system of equations:

$$\frac{1}{N} \sum_{k=1}^{N} H(\mathbf{X}_k) \, W(\mathbf{X}_k; \mathbf{u}, \mathbf{w}) \, \nabla \ln f(\mathbf{X}_k; \mathbf{v}) = \mathbf{0} , \tag{5.66}$$

where the gradient is with respect to v.
Our extensive numerical studies show that for moderate dimensions n, say n ≤ 50, the optimal solutions of the CE programs (5.64) and (5.65) (or (5.66)) and their VM counterparts (5.57) and (5.58) are typically nearly the same. However, for high-dimensional problems (n > 50), we found numerically that the importance sampling estimator ℓ̂ in (5.59) based on VM updating of v outperforms its CE counterpart in both variance and bias. The latter is caused by the degeneracy of W, to which, we found, CE is more sensitive.
The advantage of the CE program is that it can often be solved analytically. In particular, this happens when the distribution of X belongs to an exponential family of distributions; see Section A.3 of the Appendix. Specifically (see (A.16)), for a one-dimensional exponential family parameterized by the mean, the CE optimal parameter is always

$$v^* = \frac{\mathbb{E}_u[H(X) \, X]}{\mathbb{E}_u[H(X)]} \tag{5.67}$$

or, for an arbitrary tilting parameter w,

$$v^* = \frac{\mathbb{E}_w[H(X) \, W(X; u, w) \, X]}{\mathbb{E}_w[H(X) \, W(X; u, w)]} , \tag{5.68}$$

and the corresponding sample-based updating formula is

$$\hat{v} = \frac{\sum_{k=1}^{N} H(X_k) \, W(X_k; u, w) \, X_k}{\sum_{k=1}^{N} H(X_k) \, W(X_k; u, w)} , \tag{5.69}$$

whose particular case for w = u, in which all likelihood ratios are 1, reads

$$\hat{v} = \frac{\sum_{k=1}^{N} H(X_k) \, X_k}{\sum_{k=1}^{N} H(X_k)} . \tag{5.70}$$

Observe also that, because of the degeneracy of W, one would always prefer the estimator (5.70) to (5.69), especially for high-dimensional problems. But as we shall see below, this is not always feasible, particularly when estimating rare-event probabilities in Chapter 8.
EXAMPLE 5.11 Example 5.9 (Continued)

Consider again the estimation of ℓ = E[X], where X ∼ Exp(u⁻¹) and f(x; v) = v⁻¹ exp(-x v⁻¹), x ≥ 0. Solving (5.62), we find that the optimal reference parameter v* is equal to

$$v^* = \frac{\mathbb{E}_u[X \cdot X]}{\mathbb{E}_u[X]} = \frac{2 u^2}{u} = 2u .$$

Thus, v* is exactly the same as *v. For the sample average version of (5.62), we should find that for large N its optimal solution v̂* is close to the optimal parameter v* = 2u.
EXAMPLE 5.12 Example 5.10 (Continued)
Consider again the estimation of ℓ = P_u(X ≥ γ) = exp(-γ u⁻¹). In this case, we readily find from (5.67) that the optimal reference parameter is v* = γ + u. Note that, similar to the VM case, for γ ≫ u, the optimal reference parameter is approximately γ.
Note that in the above example, similar to the VM problem, the CE sample version (5.66) is meaningful only when γ is chosen such that ℓ is not a rare-event probability. In Chapter 8 we present a general procedure for estimating rare-event probabilities of the form ℓ = P_u(S(X) ≥ γ) for an arbitrary function S(x) and level γ.
EXAMPLE 5.13 Finite Support Discrete Distributions
Let X be a discrete random variable with finite support, that is, X can only take a finite number of values, say a_1, …, a_m. Let u_i = P(X = a_i), i = 1, …, m, and define u = (u_1, …, u_m). The distribution of X is thus trivially parameterized by the vector u. We can write the density of X as

$$f(x; \mathbf{u}) = \sum_{i=1}^{m} u_i \, I_{\{x = a_i\}} .$$

From the discussion at the beginning of this section we know that the optimal CE and VM parameters coincide, since we optimize over all densities on {a_1, …, a_m}. By (5.45) the VM (and CE) optimal density is given by

$$g^*(x) = \frac{H(x) \, f(x; \mathbf{u})}{\mathbb{E}_{\mathbf{u}}[H(X)]} ,$$

so that

$$v_i^* = \frac{\mathbb{E}_{\mathbf{u}}\big[ H(X) \, I_{\{X = a_i\}} \big]}{\mathbb{E}_{\mathbf{u}}[H(X)]} = \frac{\mathbb{E}_{\mathbf{w}}\big[ H(X) \, W(X; \mathbf{u}, \mathbf{w}) \, I_{\{X = a_i\}} \big]}{\mathbb{E}_{\mathbf{w}}\big[ H(X) \, W(X; \mathbf{u}, \mathbf{w}) \big]} \tag{5.71}$$

for any reference parameter w, provided that E_w[H(X) W(X; u, w)] > 0. The vector v* can be estimated from the stochastic counterpart of (5.71), that is, as

$$\hat{v}_i = \frac{\sum_{k=1}^{N} H(X_k) \, W(X_k; \mathbf{u}, \mathbf{w}) \, I_{\{X_k = a_i\}}}{\sum_{k=1}^{N} H(X_k) \, W(X_k; \mathbf{u}, \mathbf{w})} , \tag{5.72}$$

where X_1, …, X_N is an iid sample from the density f(·; w).
A similar result holds for a random vector X = (X_1, …, X_n), where X_1, …, X_n are independent discrete random variables with finite support, characterized by the parameter vectors u_1, …, u_n. Because of the independence assumption, the CE problem (5.64) separates into n subproblems of the form above, and all the components of the optimal CE reference parameter v* = (v_1*, …, v_n*), which is now a vector of vectors, follow from (5.72). Note that in this case the optimal VM and CE reference parameters are usually not equal, since we are not optimizing the CE over all densities. See, however, Proposition 4.2 in Rubinstein and Kroese [29] for an important case where they do coincide and yield a zero-variance likelihood ratio estimator.

The updating rule (5.72), which involves discrete finite support distributions, and in particular the Bernoulli distribution, will be extensively used for combinatorial optimization problems later on in the book.
EXAMPLE 5.14

Consider the bridge network in Figure 5.1, and let

$$S(\mathbf{X}) = \min( X_1 + X_4, \ X_1 + X_3 + X_5, \ X_2 + X_3 + X_4, \ X_2 + X_5 ) .$$

Suppose we wish to estimate the probability that the shortest path from node A to node B has a length of at least γ; that is, with H(x) = I_{{S(x) ≥ γ}}, we want to estimate

$$\ell = \mathbb{E}_{\mathbf{u}}[H(\mathbf{X})] = \mathbb{P}_{\mathbf{u}}(S(\mathbf{X}) \geqslant \gamma) = \mathbb{E}_{\mathbf{u}}\big[ I_{\{S(\mathbf{X}) \geqslant \gamma\}} \big] .$$
We assume that the components {X_i} are independent, that X_i ∼ Exp(u_i⁻¹), i = 1, …, 5, and that γ is chosen such that ℓ is not a rare-event probability. Thus, here the CE updating formula (5.69) and its particular case (5.70) (with w = u) apply. We shall show that this yields substantial variance reduction. The likelihood ratio in this case is

$$W(\mathbf{x}; \mathbf{u}, \mathbf{v}) = \prod_{i=1}^{5} \frac{u_i^{-1} \, \mathrm{e}^{-x_i / u_i}}{v_i^{-1} \, \mathrm{e}^{-x_i / v_i}} = \exp\!\left( - \sum_{i=1}^{5} x_i \left( \frac{1}{u_i} - \frac{1}{v_i} \right) \right) \prod_{i=1}^{5} \frac{v_i}{u_i} .$$
As a concrete example, let the nominal parameter vector u be equal to (1, 1, 0.3, 0.2, 0.1) and let γ = 1.5. We will see that this probability ℓ is approximately 0.06. Note that the typical length of the shortest path from A to B is smaller than γ = 1.5; hence, using importance sampling instead of CMC should be beneficial. The idea is to estimate the optimal parameter vector v* without using likelihood ratios, that is, using (5.70), since likelihood ratios, as in (5.69) (with quite arbitrary w, say a guessed initial trial vector), would typically make the estimator of v* unstable, especially for high-dimensional problems.
Denote by v̂_1 the CE estimator of v* obtained from (5.70). We can iterate (repeat) this procedure, say for T iterations, using (5.69) and starting with w = v̂_1. Once the final reference vector v̂_T is obtained, we estimate ℓ via a larger sample from f(x; v̂_T), say of size N_1, using the SLR estimator (5.59). Note, however, that for high-dimensional problems, iterating in this way could lead to an unstable final estimator v̂_T. In short, a single iteration with (5.70) might often be the best alternative.
Table 5.1 presents the performance of the estimator (5.59), starting from w = u = (1, 1, 0.3, 0.2, 0.1) and then iterating (5.69) three times. Note again that in the first iteration we generate a sample X_1, …, X_N from f(x; u) and then apply (5.70) to obtain an estimate v̂ = (v̂_1, …, v̂_5) of the CE optimal reference parameter vector v*. The sample sizes for updating v̂ and calculating the estimator ℓ̂ were N = 10³ and N_1 = 10⁵, respectively. In the table, RE denotes the estimated relative error.
Table 5.1 Iterating the five-dimensional parameter vector v̂.

 t    v̂_1      v̂_2      v̂_3      v̂_4      v̂_5      ℓ̂        RE
 0    1.0000   1.0000   0.3000   0.2000   0.1000     -       0.0121
 1    2.4450   2.3274   0.2462   0.2113   0.1030   0.0631    0.0082
 2    2.3850   2.3894   0.3136   0.2349   0.1034   0.0644    0.0079
 3    2.3559   2.3902   0.3472   0.2322   0.1047   0.0646    0.0080
Note that v̂ already converged after the first step, so using likelihood ratios in Steps 2 and 3 did not add anything to the quality of v̂. It also follows from the results of Table 5.1 that CE outperforms CMC (compare the relative errors 0.008 and 0.0121 for CE and CMC, respectively). To obtain a similar relative error of 0.008 with CMC would require a sample size of approximately 2.5 × 10⁵ instead of 10⁵; we thus obtained a reduction by a factor of 2.5 when using the CE estimation procedure. As we shall see in Chapter 8, for smaller probabilities a variance reduction of several orders of magnitude can be achieved.
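The whole procedure of this example fits in a few lines. The sketch below performs a single (5.70) update from a pilot sample and then computes the SLR estimator (5.59); up to Monte Carlo noise, the output should resemble the first row of Table 5.1 (the random seed and code organization are, of course, arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)

u = np.array([1.0, 1.0, 0.3, 0.2, 0.1])      # nominal means
gamma, N, N1 = 1.5, 1_000, 100_000           # level, pilot and final sample sizes

def S(x):
    # length of the shortest path from A to B in the bridge network
    return np.minimum.reduce([x[:, 0] + x[:, 3],
                              x[:, 0] + x[:, 2] + x[:, 4],
                              x[:, 1] + x[:, 2] + x[:, 3],
                              x[:, 1] + x[:, 4]])

# Step 1: single CE update (5.70) with w = u (no likelihood ratios needed).
x = rng.exponential(u, size=(N, 5))
H = (S(x) >= gamma).astype(float)
v = (H[:, None] * x).sum(axis=0) / H.sum()

# Step 2: SLR estimator (5.59) with a larger sample from f(x; v).
x1 = rng.exponential(v, size=(N1, 5))
logW = np.sum(np.log(v / u) - x1 / u + x1 / v, axis=1)   # log W(x; u, v)
ell_hat = np.mean((S(x1) >= gamma) * np.exp(logW))
print(v, ell_hat)                            # v near Table 5.1; ell_hat near 0.06
```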
5.7 SEQUENTIAL IMPORTANCE SAMPLING
Sequential importance sampling (SIS), also called dynamic importance sampling, is simply importance sampling carried out in a sequential manner. To explain the SIS procedure, consider the expected performance ℓ in (5.39) and its likelihood ratio estimator ℓ̂ in (5.41), with f(x) the "target" and g(x) the importance sampling, or proposal, pdf. Suppose that (a) X is decomposable, that is, it can be written as a vector X = (X_1, …, X_n), where each of the X_i may be multidimensional, and (b) it is easy to sample from g(x) sequentially. Specifically, suppose that g(x) is of the form

$$g(\mathbf{x}) = g_1(x_1) \, g_2(x_2 \mid x_1) \cdots g_n(x_n \mid x_1, \ldots, x_{n-1}) , \tag{5.74}$$

where it is easy to generate X_1 from density g_1(x_1) and, conditional on X_1 = x_1, the second component from density g_2(x_2 | x_1), and so on, until one obtains a single random vector X from g(x). Repeating this independently N times, each time sampling from g(x), one obtains a random sample X_1, …, X_N from g(x) and estimates ℓ according to (5.41).
To further simplify the notation, we abbreviate (x_1, …, x_t) to x_{1:t} for all t. In particular, x_{1:n} = x. Typically, t can be viewed as a (discrete) time parameter and x_{1:t} as a path or trajectory. By the product rule of probability (1.4), the target pdf f(x) can also be written sequentially, that is,

$$f(\mathbf{x}) = f(x_1) \, f(x_2 \mid x_1) \cdots f(x_n \mid \mathbf{x}_{1:n-1}) . \tag{5.75}$$
From (5.74) and (5.75) it follows that we can write the likelihood ratio in product form as

$$W(\mathbf{x}) = \frac{f(x_1) \, f(x_2 \mid x_1) \cdots f(x_n \mid \mathbf{x}_{1:n-1})}{g_1(x_1) \, g_2(x_2 \mid x_1) \cdots g_n(x_n \mid \mathbf{x}_{1:n-1})} \tag{5.76}$$

or, if W_t(x_{1:t}) denotes the likelihood ratio up to time t, recursively as

$$W_t(\mathbf{x}_{1:t}) = u_t \, W_{t-1}(\mathbf{x}_{1:t-1}) , \quad t = 1, \ldots, n , \tag{5.77}$$

with initial weight W_0(x_{1:0}) = 1 and incremental weights u_1 = f(x_1)/g_1(x_1) and

$$u_t = \frac{f(x_t \mid \mathbf{x}_{1:t-1})}{g_t(x_t \mid \mathbf{x}_{1:t-1})} , \quad t = 2, \ldots, n . \tag{5.78}$$
In order to update the likelihood ratio recursively, as in (5.78), one needs to know the conditional pdfs f(x_t | x_{1:t-1}), and hence the marginal pdfs f(x_{1:t}). This may not be easy when f does not have a Markov structure, as it requires integrating f(x) over all x_{t+1}, …, x_n. Instead, one can introduce a sequence of auxiliary pdfs f_1, f_2, …, f_n that are easily evaluated and such that each f_t(x_{1:t}) is a good approximation to f(x_{1:t}). The terminating pdf f_n must be equal to the original f. Since

$$f(\mathbf{x}) = f_n(\mathbf{x}_{1:n}) = \prod_{t=1}^{n} \frac{f_t(\mathbf{x}_{1:t})}{f_{t-1}(\mathbf{x}_{1:t-1})} ,$$

the incremental weights in (5.77) can be taken as

$$u_t = \frac{f_t(\mathbf{x}_{1:t})}{f_{t-1}(\mathbf{x}_{1:t-1}) \, g_t(x_t \mid \mathbf{x}_{1:t-1})} \tag{5.79}$$

for t = 1, …, n, where we put f_0(x_{1:0}) = 1.
Remark 5.7.1 Note that the incremental weights u_t only need to be defined up to a constant, say c_t, for each t. In this case the likelihood ratio W(x) is known up to a constant as well, say W(x) = C w(x), where 1/C = E_g[w(X)] can be estimated via the corresponding sample mean. In other words, when the normalization constant is unknown, one can still estimate ℓ using the weighted sample estimator (5.43) rather than the likelihood ratio estimator (5.41).
Summarizing, the SIS method can be written as follows.

Algorithm 5.7.1 (SIS Algorithm)

1. For each finite t = 1, …, n, sample X_t from g_t(x_t | x_{1:t-1}).
2. Compute w_t = u_t w_{t-1}, where w_0 = 1 and u_t is given by (5.79).
EXAMPLE 5.15 Random Walk

Consider the random walk on the integers of Example 1.10 on page 19, with probabilities p and q for jumping up or down, respectively. Suppose that p < q, so that the walk has a drift toward -∞. Our goal is to estimate the rare-event probability ℓ of reaching state K before state 0, starting from state k, with 0 < k ≪ K, where K is a large number. As an intermediate step, consider first the probability of reaching K in exactly n steps, that is, P(A_n) = E[I_{A_n}], where A_n = {X_n = K}. We have

$$f(\mathbf{x}_{1:n}) = f(x_1 \mid k) \, f(x_2 \mid x_1) \, f(x_3 \mid x_2) \cdots f(x_n \mid x_{n-1}) ,$$

where the conditional probabilities are either p (for upward jumps) or q (for downward jumps). If we simulate the random walk with different upward and downward probabilities, p̃ and q̃, then the importance sampling pdf g(x_{1:n}) has the same form as f(x_{1:n}) above. Thus, the importance weight after Step t is updated via the incremental weight

$$u_t = \frac{f(x_t \mid x_{t-1})}{g(x_t \mid x_{t-1})} = \begin{cases} p / \tilde{p} & \text{for an upward jump} , \\ q / \tilde{q} & \text{for a downward jump} . \end{cases}$$

In particular, interchanging the jump probabilities, that is, taking p̃ = q and q̃ = p, reverses the drift of the walk and gives an efficient estimator for ℓ.
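A sketch of Algorithm 5.7.1 for this random walk, using the interchanged probabilities p̃ = q and q̃ = p (the particular values of p, q, k, K, and the number of replications are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

p, q = 0.3, 0.7        # true up/down probabilities (drift toward -infinity)
k, K = 2, 15           # starting state and target level
N = 20_000

pt, qt = q, p          # interchanged probabilities for the proposal g
total = 0.0
for _ in range(N):
    state, w = k, 1.0
    while 0 < state < K:
        if rng.random() < pt:        # upward jump under g
            state += 1
            w *= p / pt              # incremental weight u_t = p / p~
        else:                        # downward jump under g
            state -= 1
            w *= q / qt              # incremental weight u_t = q / q~
    if state == K:                   # walk reached K before 0
        total += w
print(total / N)                     # estimate of the rare-event probability ell
```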
5.7.1 Nonlinear Filtering for Hidden Markov Models
This section describes an application of SIS to nonlinear filtering. Many problems in engineering, applied sciences, statistics, and econometrics can be formulated as hidden Markov models (HMMs). In its simplest form, an HMM is a stochastic process {(X_t, Y_t)}, where X_t (which may be multidimensional) represents the true state of some system and Y_t represents the observed state of the system at a discrete time t. It is usually assumed that {X_t} is a Markov chain, say with initial distribution f(x_0) and one-step transition probabilities f(x_t | x_{t-1}). It is important to note that the actual state of the Markov chain remains hidden, hence the name HMM. All information about the system is conveyed by the process {Y_t}. We assume that, given X_0, …, X_t, the observation Y_t depends only on X_t via some conditional pdf f(y_t | x_t). Note that we have used here a Bayesian style of notation in which all (conditional) probability densities are represented by the same symbol f. We will use this notation throughout the rest of this section. We denote by X_{1:t} = (X_1, …, X_t) and Y_{1:t} = (Y_1, …, Y_t) the unobservable and observable sequences up to time t, respectively, and similarly for their lowercase equivalents.
The HMM is represented graphically in Figure 5.2. This is an example of a Bayesian network. The idea is that edges indicate the dependence structure between the variables. For example, given the states X_1, …, X_t, the random variable Y_t is conditionally independent of X_1, …, X_{t-1}, because there is no direct edge from Y_t to any of these variables. We thus have f(y_t | x_1, …, x_t) = f(y_t | x_t), and more generally the joint pdf of states and observations factorizes as

$$f(\mathbf{x}_{0:t}, \mathbf{y}_{1:t}) = f(x_0) \prod_{s=1}^{t} f(x_s \mid x_{s-1}) \, f(y_s \mid x_s) .$$

Figure 5.2 A graphical representation of the HMM.
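To illustrate the factorization, the following sketch generates a path from a simple HMM, assuming a Gaussian random-walk state with Gaussian observation noise (a common toy model, not one prescribed by the text), and evaluates the joint log-density along the generated path:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)

T, sx, sy = 50, 1.0, 0.5   # horizon, state noise, observation noise

# f(x_t | x_{t-1}) = N(x_{t-1}, sx^2) and f(y_t | x_t) = N(x_t, sy^2)
x = np.empty(T)
y = np.empty(T)                        # y[0] unused; observations start at t = 1
x[0] = rng.normal(0.0, sx)             # initial state from f(x_0)
for t in range(1, T):
    x[t] = rng.normal(x[t - 1], sx)    # hidden Markov chain
    y[t] = rng.normal(x[t], sy)        # Y_t depends only on X_t

# Joint log-density, following the factorization displayed above
log_joint = (norm.logpdf(x[0], 0.0, sx)
             + np.sum(norm.logpdf(x[1:], x[:-1], sx))
             + np.sum(norm.logpdf(y[1:], x[1:], sy)))
print(log_joint)
```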