Inverse Problems, Statistical Mechanics and Simulated Annealing

K. Venkatesh Prasad
Ford Motor Company

28.1 Background
28.2 Inverse Problems in DSP
28.3 Analogies with Statistical Mechanics
     Combinatorial Optimization • The Metropolis Criterion • Gibbs’ Distribution
28.4 The Simulated Annealing Procedure
Defining Terms
References
Further Reading
28.1 Background
The focus of this chapter is on inverse problems — what they are, where they manifest themselves
in the realm of digital signal processing (DSP), and how they might be “solved.”¹ Inverse problems
deal with estimating hidden causes, such as a set of transmitted symbols {t}, given observable effects,
such as a set of received symbols {r}, and a system (H) responsible for mapping {t} into {r}. Inverse
problems are succinctly stated using vector-space notation and take the form of estimating t ∈ R^M,
given

    r = Ht ,                                                (28.1)

where r ∈ R^N and H ∈ R^{N×M}, and R denotes the space of real numbers whose dimensions are
specified in the superscript(s). Such problems call for the inversion of H, an operation which may or
may not be numerically possible. We will shortly address these issues, but we should note here for
completeness that these problems contrast with direct problems — where r is to be estimated directly
(without matrix inversion), given H and t.
¹The quotes are used to stress that unique deterministic solutions might not exist for such problems, and that the observed
effects might not continuously track the underlying causes. Formally speaking, this is a result of such problems being
ill-posed in the sense of Hadamard [1]. What is typically sought is an optimal solution, such as a minimum-norm/minimum-energy solution.
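As a concrete illustration of the distinction, here is a minimal sketch of a toy forward model; the dimensions, the matrix H, and the symbol values are hypothetical and chosen only to make the notation tangible.

```python
import numpy as np

# Hypothetical dimensions: M transmitted symbols, N received samples.
M, N = 4, 6
rng = np.random.default_rng(0)

H = rng.standard_normal((N, M))       # system matrix, H in R^(N x M)
t = np.array([1.0, -1.0, 1.0, 1.0])   # hidden cause: transmitted symbols {t}

# Direct problem: given H and t, compute the observable effect r (Eq. 28.1).
r = H @ t

# Inverse problem: given H and r (and, in practice, noise), estimate t.
# The rest of the chapter is concerned with this harder direction.
print("observed r:", r)
```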
28.2 Inverse Problems in DSP
Inverse problems manifest themselves in a broad range of DSP applications, in fields as diverse as
digital astronomy, electronic communications, geophysics [2], medicine [3], and oceanography. The
core of all these problems takes the form shown in Eq. (28.1). This, in fact, is the discrete version of
the Fredholm integral equation of the first kind, for which, by definition², the limits of integration are
fixed and the unknown function f appears only inside the integral. To motivate our discussion, we
will describe an application-specific problem, and in the process introduce some of the notation and
concepts to be used in the later sections. The inverse problem in the field of electronic communications
has to do with estimating t, given r, which is often received with noise, commonly modeled to be
additive white Gaussian (AWG) in nature. The communication system and the transmission channel
are typically stochastically characterizable and are represented by a linear system matrix (H). The
problem, therefore, is to solve for t in the system of linear equations

    r = Ht + n ,                                            (28.2)
where vector n denotes AWG noise. Two tempting solutions might come to mind: if matrix H is
invertible, i.e., H⁻¹ exists, then why not solve for t as

    t̂ = H⁻¹ r ,                                             (28.3)

or else why not compute a minimum-norm solution such as the pseudoinverse solution

    t̂ = H† r ,                                              (28.4)

where H† is referred to as the pseudoinverse [5] of H and is defined to be (H′H)⁻¹H′, where H′
denotes the transpose of H. There are several reasons why neither solution [Eq. (28.3) or (28.4)]
might be viable. One reason is that the dimensions of the system might be extremely large, placing a
greater computational load than might be affordable. Another reason is that H is often numerically
ill-conditioned, implying that inversions or pseudo-inversions might not be reliable even if otherwise
reliable numerical inversion procedures, such as Gaussian elimination or singular value decomposition
[6, 19], were to be employed. Furthermore, even if preconditioning [6] were possible on the
system of linear equations r = Ht + n, resulting in a numerical improvement of the coefficients of H,
there is an even more serious hurdle that often has to be dealt with: such problems are frequently
ill-posed. In practical terms³ this means that small changes in the inputs might result in arbitrarily
large changes in the outputs. For all these reasons the most tempting solution approaches are often
ruled out. As we describe in the next section, inverse problems may be recast as combinatorial
optimization problems. We will then show how combinatorial optimization problems may be solved
using a powerful tool called simulated annealing [7] that has evolved from our understanding of
statistical mechanics [8] and the simulation of the annealing (cooling) behavior of physical matter [9].
²There exist two classes of integral equations ([4], p. 865): if the limits of integration are fixed, the equations are referred
to as Fredholm integral equations; if one of the limits is a variable, the equations are referred to as Volterra integral
equations. Further, if the unknown function appears only inside the integral the equation is called “first kind,” but if it
appears both inside and outside the integral the equation is called “second kind.”
³For a more complete description see [1].
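The following sketch is a minimal numerical illustration, not a prescription: it computes the pseudoinverse solution of Eq. (28.4) for an assumed, deliberately ill-conditioned matrix and an assumed noise level, and shows why a large condition number makes the estimate fragile.

```python
import numpy as np

rng = np.random.default_rng(1)

# A deliberately ill-conditioned system matrix (nearly dependent columns).
H = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
t_true = np.array([2.0, -1.0])

r_clean = H @ t_true
r_noisy = r_clean + 1e-3 * rng.standard_normal(3)   # AWG noise, as in Eq. (28.2)

# Pseudoinverse solution, Eq. (28.4): t_hat = (H'H)^(-1) H' r.
t_hat = np.linalg.pinv(H) @ r_noisy

print("condition number of H:", np.linalg.cond(H))
print("true t:     ", t_true)
print("estimated t:", t_hat)   # tiny data perturbations move t_hat a lot
```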
28.3 Analogies with Statistical Mechanics
Understanding the analogies between inverse problems in DSP and problems in statistical mechanics
is valuable to us because we can then draw upon the analytical and computational tools developed
in statistical mechanics [8] over the past century to solve inverse problems. The broad analogy
is that just as the received symbols r in Eq. (28.1) are the observed effects of hidden underlying
causes (the transmitted symbols t) — the measured temperature and state (solid, liquid, or gaseous)
of physical matter are the effects of underlying causes such as the momenta and velocities of the
particles that compose the matter. A more specific analogy comes from the reasoning that if the
inverse problem were to be treated as a combinatorial optimization problem, where each candidate
solution is one possible configuration (or combination of the scalar elements of t), then we could use
the criterion developed by Metropolis et al. [9] for physical systems to select the optimal configuration.
The Metropolis criterion is based on the assumption that candidate configurations have probabilistic
distributions of the form originally described by Gibbs [8] to guarantee statistical equilibrium of
ensembles of systems. In order to apply Metropolis’ selection criterion, we must make one final
analogy: we need to treat the combinatorial optimization problem as if it were the outcome of an
imaginary physical system in which matter has been brought to a boil. When such a physical system
is gradually cooled (a process referred to as annealing), then, provided the cooling rate is neither too
fast nor too slow, the system will eventually solidify into a minimum-energy configuration. As depicted
in Fig. 28.1, to solve inverse problems we first recast the problem as a combinatorial optimization
problem and then solve this recast problem using simulated annealing — a procedure that numerically
mimics the annealing of physical systems. In this section we describe the basic principles of
combinatorial optimization, Metropolis’ criterion for selecting or discarding potential configurations,
and the origins of Gibbs’ distribution. We outline the simulated annealing algorithm in the following
section and follow that with examples of implementation and applications.
FIGURE 28.1: The direct path (a → d) to solving the inverse problem is often not viable since it
relies on the inversion of a system matrix. An optimal solution, however, may be obtained by an
indirect path (a → b → c → d), which involves recasting the inverse problem as an equivalent
combinatorial optimization problem and then solving this problem using simulated annealing.
28.3.1 Combinatorial Optimization
The optimal solution to the inverse problem [Eq. (28.1)], as explained above, amounts to estimating
the vector t. Under the assumptions enumerated below, the inverse problem can be recast as a
combinatorial problem whose solution then yields the desired optimal solution to the inverse problem.
The assumptions required are:

1. Each (scalar) element t(i), 1 ≤ i ≤ M, of t ∈ R^M can take on only a finite set of finite
   values. That is, −∞ < t_j(i) < ∞, ∀ i and j, where t_j(i) denotes the jth possible value that
   the ith element of t can take, and j is a finite-valued index, 1 ≤ j ≤ J_i < ∞, ∀ i; J_i denotes
   the number of possible values the ith element of t can take.
2. Let each combination of M scalar values t(i) of t be referred to as a candidate vector or a
   feasible configuration t_k, where the index k ≤ K < ∞. Associated with each candidate
   vector t_k we must have a quantifiable measure of error, cost, or energy (E_k).
Given the above assumptions, the combinatorial form of the inverse problem may be stated as: out
of K possible candidate vectors t_k, 1 ≤ k ≤ K, search for the vector t_kopt with the lowest error E_kopt.
Although easily stated, the time and computational efficiency with which the solution is obtained
hinge on at least two significant factors — the design of the error-function and the choice of the search
strategy. The error-function (E_k) must provide a quantifiable measure of dissimilarity, or distance,
between a feasible configuration (t_k) and the true (but unknown) configuration (t_true), i.e.,

    E_k = d(t_k, t_true) ,                                  (28.5)

where d denotes a distance function. The goal of the combinatorial optimization problem is to
efficiently search through the combinatorial space and stop at the optimal, minimum-error (E_opt)
configuration t_kopt:

    E_opt = E_kopt = min_{1 ≤ k ≤ K} E_k = δ ,              (28.6)

where k_opt denotes the value of the index k associated with the optimal configuration. In the ideal
case, when δ = 0, from Eq. (28.5) we have that t_kopt = t_true. In practice, however, owing to a
combination of factors such as noise [Eq. (28.2)] or the system [Eq. (28.1)] being underdetermined,
E_opt = δ > 0, implying that t_kopt ≠ t_true, but that t_kopt is the best possible solution given what is
known about the problem and its solutions. In general the error-function must satisfy the requirements
of a distance function, or metric (adapted from [10], p. 237):
    E(t_k) = 0  ⟺  t_k = t_true ,                           (28.7a)
    E(t_k) = d(t_k, t_true) = d(t_true, t_k) ,              (28.7b)
    E(t_k) ≤ E(t_j) + d(t_k, t_j) ,                         (28.7c)

where Eq. (28.7a) follows from Eq. (28.5), and where, like k, the index j is defined in the range (1, K)
and K < ∞. Eq. (28.7a) states that if the error is zero, t_k is the true configuration. The implication
of Eq. (28.7b) is that the error depends only on the distance between a configuration and the true
configuration, irrespective of the direction in which that distance is measured. Eq. (28.7c) implies
that the triangle inequality law holds.
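To make the combinatorial form concrete, the sketch below exhaustively enumerates every feasible configuration of a toy problem and keeps the one with the lowest error. The alphabet, dimensions, and squared-error cost are illustrative assumptions; the point is that the K = J^M candidates quickly become too many to enumerate, which is what motivates simulated annealing.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

M, N = 4, 6
alphabet = [-1.0, 1.0]                 # J_i = 2 possible values per element
H = rng.standard_normal((N, M))
t_true = np.array([1.0, -1.0, 1.0, 1.0])
r = H @ t_true + 0.05 * rng.standard_normal(N)

best_err, best_t = np.inf, None
for cand in itertools.product(alphabet, repeat=M):   # K = 2^M configurations
    t_k = np.array(cand)
    err = float(np.sum((r - H @ t_k) ** 2))          # E_k: a squared distance
    if err < best_err:
        best_err, best_t = err, t_k

print("t_kopt:", best_t, " E_opt:", best_err)
# Exhaustive search is exact but scales as J^M; simulated annealing trades
# that exactness for a tractable search of the same configuration space.
```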
In designing the error-function, one can classify the sources of error into two distinct categories.
The first category of error, denoted by E_k^signal, provides a measure of error (or distance) between
the observed signal (r) and the estimated signal (r̂_k), computed for the current configuration t_k
using Eq. (28.1). The second category, denoted by E_k^constraints, accounts for the price to be “paid”
when an estimated solution deviates from the constraints we would want to impose on it based
on our understanding of the physical world. The physical world, for instance, might suggest that
each element of the signal is very probably positive-valued. In this case, a negative-valued estimate
of a signal element will result in an error value that is proportional to the magnitude of the signal's
negativity. This constraint is popularly known as the non-negativity constraint. Another constraint
might arise from the assumption that the solution is expected to be smooth [11]:
    t̂′ S t̂ = δ_smooth ,                                     (28.8)

where S is a smoothing matrix and δ_smooth is the degree of smoothness of the signal. The
error-function, therefore, takes the following form:

    E_k = E_k^signal + E_k^constraints , where
    E_k^signal = ||r − r̂_k||²  and
    E_k^constraints = Σ_{c ∈ C} (α_c E_c) ,                 (28.9)

where E^constraints represents the total error from all other factors or constraints that might be imposed
on the solution, {C} represents the set of constraint indices, and α_c and E_c represent the weight and
the error-function, respectively, associated with the cth constraint.
28.3.2 The Metropolis Criterion
The core task in solving the combinatorial optimization problem described above is to search for a
configuration t_k for which the error-function E_k is a minimum. Standard gradient descent methods
[6, 12, 13] would have been the natural choice had E_k been a function with just one minimum (or
maximum) value, but this function typically has multiple minima (or maxima) — gradient descent
methods would tend to get locked into a local minimum. The simulated annealing procedure
(Fig. 28.2 — discussed in the next section), suggested by Metropolis et al. [9] for the problem of
finding stable configurations of interacting atoms and adapted for combinatorial optimization by
Kirkpatrick [7], provides a scheme to traverse the surface of E_k, get out of local minima, and
eventually cool into a global minimum. The contribution of Metropolis et al., commonly referred to
in the literature as Metropolis’ criterion, is based on the assumption that the difference in the error
of two consecutive feasible configurations (denoted as ΔE = E_{k+1} − E_k) takes the form of Gibbs’
distribution [Eq. (28.11)]. The criterion states that even if a configuration were to result in increased
error, i.e., ΔE > 0, one can select the new configuration if

    random < exp(−ΔE / T) ,                                 (28.10)

where random denotes a random number drawn from a uniform distribution in the range [0, 1) and
T denotes the temperature of the physical system.
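The test of Eq. (28.10) reduces to a few lines of code. This sketch simply assumes the convention used above, in which ΔE > 0 means the error increased.

```python
import numpy as np

def metropolis_accept(delta_e, temperature, rng=None):
    """Decide whether to accept a new configuration, per Eq. (28.10)."""
    if rng is None:
        rng = np.random.default_rng()
    if delta_e <= 0.0:
        return True          # error decreased (or is unchanged): always accept
    # Error increased: accept with probability exp(-delta_e / T).
    return rng.random() < np.exp(-delta_e / temperature)
```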
28.3.3 Gibbs’ Distribution
At the turn of the 20th century, Gibbs [8], building upon the work of Clausius, Maxwell, and
Boltzmann in statistical mechanics, proposed the probability distribution

    P = exp((ψ − ε) / Θ) ,                                  (28.11)

where ψ and Θ were constants and ε denoted the energy of a system. This distribution was crafted
to satisfy the condition of statistical equilibrium ([8], p. 32) for ensembles of (thermodynamical)
systems:

    Σ_i [ (∂P/∂p_i) ṗ_i + (∂P/∂q_i) q̇_i ] = 0 ,             (28.12)

where p_i and q_i represented the generalized momentum and velocity, respectively, of the ith degree
of freedom. The negative sign on ε in Eq. (28.11) was required to satisfy the condition

    ∫ ⋯ ∫ P dp_1 ⋯ dq_n = 1 ,                               (28.13)

where the integration extends over all phases.

FIGURE 28.2: The outline of the annealing algorithm.
28.4 The Simulated Annealing Procedure
The simulated annealing algorithm, as outlined in Fig. 28.2, mimics the annealing (or controlled
cooling) of an imaginary physical system. The unknown parameters are treated like particles in a
physical system. An initial configuration t_initial is chosen along with an initial (“boiling”) temperature
value (T_initial). The choice of T_initial is made so as to ensure that a vast majority, say 90%, of
configurations are acceptable even if they result in an increase in error (ΔE_k > 0). The initial
configuration is perturbed, either by using a random number generator or by sequential selection,
to create a second configuration, and ΔE_2 is computed. The Metropolis criterion is applied to decide
whether or not to accept the new configuration. After equilibrium is reached, i.e., after |ΔE_2| ≤ δ_equilib,
where δ_equilib is a small, heuristically chosen threshold, the temperature is lowered according to a
cooling schedule and the process is repeated until a pre-selected frozen temperature is reached. Several
different cooling schedules have been proposed in the literature ([18], p. 59). In one popular schedule
[18, 19] each subsequent temperature T_{k+1} is less than the current temperature T_k by a fixed
percentage of T_k, i.e., T_{k+1} = β_k T_k, where β_k is typically in the range of 0.8 to unity. Based on the
behavior of physical systems, which attain minimum (free) energy (or global minimum) states when
they freeze at the end of an annealing process, the assumption underlying the simulated annealing
procedure is that the t_opt that is finally attained is also a global minimum.
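Pulling the pieces together, the following sketch follows the outline of Fig. 28.2 under several explicit assumptions: a binary alphabet for t, the squared-error cost from the earlier sketches, single-element perturbations, a fixed number of trials per temperature as a crude stand-in for reaching equilibrium, and the geometric cooling schedule T_{k+1} = β T_k. It is a schematic of the procedure, not the implementation used in the chapter's examples.

```python
import numpy as np

def simulated_annealing(r, H, alphabet, t_init, error,
                        T_init=10.0, T_frozen=1e-3, beta=0.9,
                        trials_per_T=100, seed=3):
    rng = np.random.default_rng(seed)
    t = np.array(t_init, dtype=float)
    e = error(t, r, H)
    T = T_init
    while T > T_frozen:                      # cool until "frozen"
        for _ in range(trials_per_T):        # crude stand-in for equilibrium
            i = rng.integers(len(t))         # perturb one randomly chosen element
            cand = t.copy()
            cand[i] = rng.choice(alphabet)
            e_cand = error(cand, r, H)
            delta_e = e_cand - e
            # Metropolis criterion, Eq. (28.10)
            if delta_e <= 0 or rng.random() < np.exp(-delta_e / T):
                t, e = cand, e_cand
        T *= beta                            # cooling schedule T_{k+1} = beta * T_k
    return t, e

# Illustrative use with a toy system (all values hypothetical):
rng = np.random.default_rng(4)
M, N = 8, 12
H = rng.standard_normal((N, M))
t_true = rng.choice([-1.0, 1.0], size=M)
r = H @ t_true + 0.05 * rng.standard_normal(N)
cost = lambda t, r, H: float(np.sum((r - H @ t) ** 2))
t_opt, e_opt = simulated_annealing(r, H, [-1.0, 1.0], np.ones(M), cost)
print("E_opt:", e_opt, " recovered t_true:", bool(np.array_equal(t_opt, t_true)))
```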
The results of applying the simulated annealing procedure to the problem of three-dimensional
signal restoration [14] are shown in Fig. 28.3. In this problem, a defocused image, vector r, of an
opaque eight-step staircase object was provided, along with the space-varying point-spread-function
matrix (H) and a well-focused image. The unknown vector t represented the intensities of the
volume elements (voxels), with the visible voxels taking on positive values and hidden voxels having
a value of zero. The vector t was lexicographically indexed so that, by knowing which elements of
t were positive, one could reconstruct the three-dimensional structure. Using simulated annealing
and constraints (opacity, non-negativity of intensity, smoothness of intensity and depth, and tight
bounds on the voxel intensity values obtained from the well-focused image), the original object was
reconstructed.
FIGURE 28.3: Three-dimensional signal recovery using simulated annealing. The staircase object
shown corresponding to era 17 is recovered from a defocused image by testing a number of feasible
configurations and applying the Metropolis criterion in a simulated annealing procedure.
Defining Terms
In the following definitions, as in the preceding discussion, t ∈ R^M, r ∈ R^N, and H ∈ R^{N×M}.
Combinatorial Optimization: The process of selecting the optimal (lowest-cost) configuration
from a large space of candidate or feasible configurations.
Configuration: Any vector t is a configuration. The term is used in the combinatorial
optimization literature.
Cost/energy/error function: The terms cost, energy, or error function are frequently used
interchangeably in the literature. Cost function is often used in the optimization literature
to represent the mapping of a candidate vector into a (scalar) functional whose value is
indicative of the optimality of the candidate vector. Energy function is frequently used in
electronic communication theory as a pseudonym for the L₂ norm or root-mean-square value
of a vector. Error function is typically used to measure a mismatch between an estimated
(vector) and its expected value. For the purposes of this discussion we use the terms cost,
energy, and error function interchangeably.
Gibbs’ distribution: The distribution (in reality a probability density function (pdf)) in which
η, the index of probability (P), is a linear function of energy, i.e., η = log P = (ψ − ε)/Θ,
where ψ and Θ are constants and ε represents energy, giving the familiar pdf:

    P = exp((ψ − ε) / Θ) .                                  (28.14)
Inverse problem: Given matrix H and vector r, find t that satisfies r = Ht.
Metropolis’ criterion: The criterion first suggested by Metropolis et al. [9] to decide whether
or not to accept a configuration that results in an increased error, when trying to search for
minimum-error configurations in a combinatorial optimization problem.
Minimum-norm: The norm between two vectors is a (scalar) measure of distance between them,
such as the L₁ norm, the L₂ (or Euclidean) norm, the L∞ norm, the Mahalanobis distance
([10], p. 24), or the Manhattan metric [7]. Minimum-norm, unless otherwise noted, implies
minimum Euclidean (L₂) norm (denoted by || ||):
    min_t ||t|| .
Pseudoinverse: Let t_opt be the unique minimum-norm vector satisfying

    ||H t_opt − r|| = min_t ||Ht − r|| .

The pseudoinverse of matrix H, denoted by H† ∈ R^{M×N}, is the matrix mapping every r into
its corresponding t_opt.
Statistical mechanics: That branch of mechanics in which the problem is to find the statistical
distribution of the parameters of ensembles (large numbers) of systems (each differing not
just infinitesimally, but embracing every possible combination of the parameters) at a desired
instant in time, given those distributions at the present time. Maxwell, according to Gibbs [8],
coined the term “statistical mechanics.” This field owes its origin to the desire to explain the
laws of thermodynamics, as stated by Gibbs ([8], p. viii): “The laws of thermodynamics, as
empirically determined, express the approximate and probable behavior of systems of a great
number of particles, or, more precisely, they express the laws of mechanics for such systems
as they appear to beings who have not the fineness of perception to enable them to appreciate
quantities of the order of magnitude of those which relate to single particles, and who cannot
repeat their experiments often enough to obtain any but the most probable results.”
References
[1] Hadamard, J., Sur les problèmes aux dérivées partielles et leur signification physique, Princeton University Bulletin, 13, 1902.
[2] Frolik, J.L. and Yagle, A.E., Reconstruction of multilayered lossy dielectrics from plane-wave impulse responses at 2 angles of incidence, IEEE Trans. Geosci. Remote Sens., 33: 268–279, March 1995.
[3] Greensite, F., Well-posed formulation of the inverse problem of electrocardiography, Ann. Biomed. Eng., 22(2): 172–183, 1994.
[4] Arfken, G., Mathematical Methods for Physicists, Academic Press, 1985.
[5] Greville, T.N.E., The pseudoinverse of a rectangular or singular matrix and its application to the solution of systems of linear equations, SIAM Rev., 1: 38–43, 1959.
[6] Golub, G.H. and Van Loan, C.F., Matrix Computations, 2nd ed., The Johns Hopkins University Press, Baltimore, 1989.
[7] Kirkpatrick, S., Optimization by simulated annealing: quantitative studies, J. Stat. Phys., 34(5,6): 975–986, 1984.
[8] Gibbs, J.W., Elementary Principles in Statistical Mechanics, Yale University Press, New Haven, 1902.
[9] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., and Teller, E., Equation of state calculations by fast computing machines, J. Chem. Phys., 21: 1087–1092, June 1953.
[10] Duda, R.O. and Hart, P.E., Pattern Classification and Scene Analysis, John Wiley, 1973.
[11] Pratt, W.K., Digital Image Processing, John Wiley, New York, 1978.
[12] Luenberger, D.G., Optimization by Vector Space Methods, John Wiley & Sons, New York, 1969.
[13] Gill, P.E. and Murray, W., Quasi-Newton methods for linearly constrained optimization, in Numerical Methods for Constrained Optimization, Gill, P.E. and Murray, W., Eds., Academic Press, London, 1974.
[14] Prasad, K.V., Mammone, R.J., and Yogeshwar, J., 3-D image restoration using constrained optimization techniques, Opt. Eng., 29: 279–288, April 1990.
[15] Tikhonov, A.N. and Arsenin, V.Y., Solutions of Ill-Posed Problems, V.H. Winston & Sons, Washington, D.C., 1977.
[16] Soumekh, M., Reconnaissance with ultra wideband UHF synthetic aperture radar, IEEE Acoust., Speech, Signal Process., 12: 21–40, July 1995.
[17] van Laarhoven, P.J.M. and Aarts, E.H.L., Simulated Annealing: Theory and Applications, D. Reidel, Dordrecht, Holland, 1987.
[18] Aarts, E. and Korst, J., Simulated Annealing and Boltzmann Machines, John Wiley, New York, 1989.
[19] Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T., Numerical Recipes in C, Cambridge University Press, U.K., 1988.
[20] Geman, S. and Geman, D., Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images, IEEE Trans. Pattern Anal. Mach. Intell., PAMI-6: 721–741, November 1984.
Further Reading
Inverse problems — The classic by Tikhonov [15] provides a good introduction to the subject matter.
For a description of inverse problems related to synthetic aperture radar applications see [16].
Statistical mechanics — Gibbs’ [8] work is a historical treasure.
Vector spaces and optimization — The books by Luenberger [12] and Gill and Murray [13] provide
a broad introductory foundation.
Simulated annealing — Two recent books, by van Laarhoven and Aarts [17] and by Aarts and
Korst [18], contain comprehensive coverage of the theory and application of simulated annealing. A
useful simulated annealing algorithm, along with tips for numerical implementation and random
number generation, can be found in Numerical Recipes in C [19]. An alternative simulated annealing