[Fig. 8.2 Fibonacci search]
The solution to the Fibonacci difference equation F_N = F_{N−1} + F_{N−2} is of the form F_N = Aτ_1^N + Bτ_2^N, where τ_1 and τ_2 are the roots of τ² = τ + 1; that is,

τ_1 = (1 + √5)/2,  τ_2 = (1 − √5)/2.

(The number τ_1 ≃ 1.618 is known as the golden section ratio and was considered by early Greeks to be the most aesthetic value for the ratio of two adjacent sides of a rectangle.) For large N the first term on the right side of (4) dominates the second, and hence

lim_{N→∞} F_{N−1}/F_N = 1/τ_1 ≃ 0.618.
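This limit is easy to check numerically; the sketch below (an illustration, not part of the text's development) generates the Fibonacci sequence and compares F_{N−1}/F_N with 1/τ_1.

```python
def fibonacci(n):
    """Return the Fibonacci numbers F_0, ..., F_n with F_0 = F_1 = 1."""
    f = [1, 1]
    for _ in range(n - 1):
        f.append(f[-1] + f[-2])
    return f

f = fibonacci(30)
ratio = f[-2] / f[-1]            # F_29 / F_30
golden = (1 + 5 ** 0.5) / 2      # tau_1 ~= 1.618
print(ratio, 1 / golden)         # both close to 0.618
```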
8.2 Line Search by Curve Fitting

It follows from (1) that the interval of uncertainty at any point in the process has width

d_k = (1/τ_1)^{k−1} d_1.
The Fibonacci search method has a certain amount of theoretical appeal, since it assumes only that the function being searched is unimodal, and with respect to this broad class of functions the method is, in some sense, optimal. In most problems, however, it can be safely assumed that the function being searched, as well as being unimodal, possesses a certain degree of smoothness, and one might therefore expect that more efficient search techniques exploiting this smoothness can be devised; and indeed they can. Techniques of this nature are usually based on curve fitting procedures, where a smooth curve is passed through the previously measured points in order to determine an estimate of the minimum point. A variety of such techniques can be devised depending on whether or not derivatives of the function as well as its values can be measured, how many previous points are used to determine the fit, and the criterion used to determine the fit. In this section a number of possibilities are outlined and analyzed. All of them have orders of convergence greater than unity.
Newton’s Method
Suppose that the function f of a single variable x is to be minimized, and suppose that at a point x_k where a measurement is made it is possible to evaluate the three numbers f(x_k), f′(x_k), f″(x_k). It is then possible to construct a quadratic function q which at x_k agrees with f up to second derivatives, that is,

q(x) = f(x_k) + f′(x_k)(x − x_k) + ½ f″(x_k)(x − x_k)².

An estimate x_{k+1} of the minimum point of f may then be computed as the point where the derivative of q vanishes; thus

x_{k+1} = x_k − f′(x_k)/f″(x_k).
This process, which is illustrated in Fig. 8.3, can then be repeated at x_{k+1}.
We note immediately that the new point x_{k+1} resulting from Newton's method does not depend on the value f(x_k). The method can more simply be viewed as a technique for iteratively solving equations of the form g(x) = 0, where, when applied to minimization, we put g(x) ≡ f′(x). In this notation Newton's method takes the form

x_{k+1} = x_k − g(x_k)/g′(x_k).  (9)

This form is illustrated in Fig. 8.4.
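As a concrete sketch of iteration (9), the code below applies Newton's method to the arbitrary illustrative equation g(x) = x³ − 1 = 0 (which, viewed as a minimization, corresponds to f(x) = x⁴/4 − x); none of these choices come from the text itself.

```python
def newton(g, gprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for solving g(x) = 0; for minimization take g = f'."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / gprime(x)   # the Newton step g(x_k)/g'(x_k)
        x -= step
        if abs(step) < tol:       # stop once the step is negligible
            break
    return x

# Illustrative example: solve g(x) = x**3 - 1 = 0 starting from x0 = 2.
root = newton(lambda x: x ** 3 - 1, lambda x: 3 * x ** 2, x0=2.0)
print(root)  # close to 1.0
```

The quadratic convergence established below shows up numerically: the number of correct digits roughly doubles at each iteration.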
We now show that Newton's method has order two convergence:

Proposition. Let the function g have a continuous second derivative, and let x∗ satisfy g(x∗) = 0, g′(x∗) ≠ 0. Then, provided x_0 is sufficiently close to x∗, the sequence {x_k}_{k=0}^∞ generated by Newton's method (9) converges to x∗ with an order of convergence at least two.
Proof. For points x in a region near x∗ there is a k_1 such that |g″(x)| < k_1 and a k_2 such that |g′(x)| > k_2. Then, since g(x∗) = 0, we can write

x_{k+1} − x∗ = x_k − x∗ − g(x_k)/g′(x_k)
            = −[g(x_k) − g(x∗) + g′(x_k)(x∗ − x_k)]/g′(x_k).
[Fig. 8.4 Newton's method for solving equations]
The term in brackets is, by Taylor's theorem, zero to first order. In fact, using the remainder term in a Taylor series expansion about x_k, we obtain

x_{k+1} − x∗ = (1/2) [g″(ξ)/g′(x_k)] (x_k − x∗)²

for some ξ between x∗ and x_k. Thus in the region near x∗,

|x_{k+1} − x∗| ≤ (k_1/2k_2) |x_k − x∗|²,

and hence the order of convergence is at least two.
Method of False Position
Newton's method for minimization is based on fitting a quadratic on the basis of information at a single point; by using more points, less information is required at each of them. Thus, using f(x_k), f′(x_k), f′(x_{k−1}) it is possible to fit the quadratic

q(x) = f(x_k) + f′(x_k)(x − x_k) + [(f′(x_{k−1}) − f′(x_k))/(x_{k−1} − x_k)] (x − x_k)²/2,

which has these corresponding values. An estimate x_{k+1} may then be determined by finding the point where the derivative of q vanishes; thus

x_{k+1} = x_k − f′(x_k) [(x_k − x_{k−1})/(f′(x_k) − f′(x_{k−1}))].  (10)
[Fig. 8.5 False position for minimization]
As in Newton's method, the value f(x_k) itself does not enter the formula; the fitted quadratic could equally well have been passed through either f(x_k) or f(x_{k−1}). Also the formula can be regarded as an approximation to Newton's method where the second derivative is replaced by the difference of two first derivatives.
Again, since this method does not depend on values of f directly, it can be regarded as a method for solving f′(x) ≡ g(x) = 0. Viewed in this way the method, which is illustrated in Fig. 8.6, takes the form

x_{k+1} = x_k − g(x_k) [(x_k − x_{k−1})/(g(x_k) − g(x_{k−1}))].  (11)
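Iteration (11) is equally easy to sketch. The test equation g(x) = x³ − 1 = 0 and the starting points are arbitrary illustrations, not taken from the text.

```python
def false_position(g, x0, x1, tol=1e-12, max_iter=100):
    """Method of false position (secant iteration) for solving g(x) = 0."""
    xk_1, xk = x0, x1
    for _ in range(max_iter):
        gk_1, gk = g(xk_1), g(xk)
        if gk == gk_1:            # secant is flat; cannot take another step
            break
        x_next = xk - gk * (xk - xk_1) / (gk - gk_1)
        xk_1, xk = xk, x_next
        if abs(xk - xk_1) < tol:  # last step was negligible
            break
    return xk

root = false_position(lambda x: x ** 3 - 1, 2.0, 1.5)
print(root)  # close to 1.0
```

Unlike Newton's method, each step needs only one new evaluation of g, at the cost of the lower order of convergence derived next.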
We next investigate the order of convergence of the method of false position and discover that it is τ_1 ≃ 1.618, the golden mean.

Proposition. Let g have a continuous second derivative and suppose x∗ is such that g(x∗) = 0, g′(x∗) ≠ 0. Then for x_0 sufficiently close to x∗, the sequence {x_k}_{k=0}^∞ generated by the method of false position (11) converges to x∗ with order τ_1 ≃ 1.618.
[Fig. 8.6 False position for solving equations]
Proof. Introducing the divided differences

g[x_{k−1}, x_k] = (g(x_k) − g(x_{k−1}))/(x_k − x_{k−1}),
g[x_{k−1}, x_k, x∗] = (g[x_k, x∗] − g[x_{k−1}, x_k])/(x∗ − x_{k−1}),

iteration (11) together with g(x∗) = 0 yields

x_{k+1} − x∗ = (x_k − x∗)(x_{k−1} − x∗) g[x_{k−1}, x_k, x∗]/g[x_{k−1}, x_k].

Since g[x_{k−1}, x_k] → g′(x∗) and g[x_{k−1}, x_k, x∗] → g″(x∗)/2, setting ε_k = x_k − x∗ we have, in the limit,

ε_{k+1} ≃ M ε_k ε_{k−1},  where M = g″(x∗)/2g′(x∗).  (17)

Taking logarithms converts this difference equation into one of the Fibonacci type, and it follows that the order of convergence is the largest root of τ² = τ + 1, namely τ_1 ≃ 1.618.
Having derived the error formula (17) by direct analysis, it is now appropriate to point out a short-cut technique, based on symmetry and other considerations, that can sometimes be used in even more complicated situations. The right side of (17) must be a polynomial in ε_k and ε_{k−1}, since it is derived from approximations based on Taylor's theorem. Furthermore, it must be second order, since the method reduces to Newton's method when x_k = x_{k−1}. Also, it must go to zero if either ε_k or ε_{k−1} goes to zero, and it must be symmetric in ε_k and ε_{k−1}, since the order of the points is irrelevant. The only expression satisfying these requirements is ε_{k+1} ≃ M ε_k ε_{k−1}.
Cubic Fit
Given the points x_{k−1} and x_k together with the values f(x_{k−1}), f′(x_{k−1}), f(x_k), f′(x_k), it is possible to fit a cubic equation to the points having these corresponding values. The next point x_{k+1} can then be determined as the relative minimum point of this cubic. This leads to

x_{k+1} = x_k − (x_k − x_{k−1}) [(f′(x_k) + u_2 − u_1)/(f′(x_k) − f′(x_{k−1}) + 2u_2)],

where

u_1 = f′(x_{k−1}) + f′(x_k) − 3 [f(x_{k−1}) − f(x_k)]/(x_{k−1} − x_k),
u_2 = [u_1² − f′(x_{k−1}) f′(x_k)]^{1/2}.
It can be shown (see Exercise 3) that the order of convergence of the cubic fit method is 2.0. Thus, although the method is exact for cubic functions, indicating that its order might be three, its order is actually only two.
Quadratic Fit
The scheme that is often most useful in line searching is that of fitting a quadratic through three given points. This has the advantage of not requiring any derivative information. Given x_1, x_2, x_3 and corresponding values f(x_1) = f_1, f(x_2) = f_2, f(x_3) = f_3, we construct the quadratic passing through these points and determine a new point x_4 as the point where the derivative of q vanishes. Thus

x_4 = (1/2) (b_23 f_1 + b_31 f_2 + b_12 f_3)/(a_23 f_1 + a_31 f_2 + a_12 f_3),  (21)

where a_ij = x_i − x_j and b_ij = x_i² − x_j².
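Formula (21) can be checked on a simple example: when f is itself quadratic the fit is exact, so a single step lands on the minimum. The function used below is an arbitrary illustration.

```python
def quadratic_fit_step(x1, x2, x3, f1, f2, f3):
    """Vertex of the quadratic through (x1,f1), (x2,f2), (x3,f3); formula (21)."""
    a23, a31, a12 = x2 - x3, x3 - x1, x1 - x2
    b23, b31, b12 = x2 ** 2 - x3 ** 2, x3 ** 2 - x1 ** 2, x1 ** 2 - x2 ** 2
    return 0.5 * (b23 * f1 + b31 * f2 + b12 * f3) / (a23 * f1 + a31 * f2 + a12 * f3)

# For the quadratic f(x) = (x - 2)**2 the fit is exact: x4 is the minimizer.
f = lambda x: (x - 2.0) ** 2
x4 = quadratic_fit_step(0.0, 1.0, 3.0, f(0.0), f(1.0), f(3.0))
print(x4)  # 2.0
```

Note that only function values enter: no derivatives are required, which is what makes this fit attractive in line searching.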
Define the errors ε_i = x_i − x∗. The short-cut symmetry argument again yields the error formula: the error of the new point must be a polynomial in ε_1, ε_2, ε_3. It must be second order (since it is a quadratic fit); it must go to zero when any two of ε_1, ε_2, ε_3 are zero (the reader should check this); finally, it must be symmetric (since the order of the points is irrelevant). It follows that near a minimum point x∗ of f, the errors are related approximately by

ε_4 ≃ M(ε_1 ε_2 + ε_2 ε_3 + ε_1 ε_3),

where M depends on the values of the second and third derivatives of f at x∗.
If ε_k → 0 with an order greater than unity, then for large k the error is governed approximately by

ε_{k+2} ≃ M ε_k ε_{k−1}.

Letting y_k = log M|ε_k|, this becomes

y_{k+2} = y_k + y_{k−1},

with characteristic equation

τ³ − τ − 1 = 0.

The largest root of this equation is τ ≃ 1.3, which thus determines the rate of growth of the y_k and is the order of convergence of the quadratic fit method.
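The root τ ≃ 1.3247 of the characteristic equation can be found by any one-dimensional method; the bisection sketch below is illustrative only.

```python
# Bisection for the largest root of p(t) = t**3 - t - 1 = 0.
p = lambda t: t ** 3 - t - 1
lo, hi = 1.0, 2.0          # bracket: p(1) = -1 < 0 and p(2) = 5 > 0
for _ in range(60):        # halve the bracket 60 times
    mid = (lo + hi) / 2
    if p(mid) > 0:
        hi = mid
    else:
        lo = mid
print(lo)  # ~1.3247, the order of convergence of the quadratic fit method
```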
Approximate Methods
In practice line searches are terminated before they have converged to the actual minimum point. In one method, for example, a fairly large value for x_1 is chosen and this value is successively reduced by a positive factor less than unity until a sufficient decrease in the function value is obtained. Approximate methods and suitable stopping criteria are discussed in Section 8.5.
8.3 GLOBAL CONVERGENCE OF CURVE FITTING

Above, we analyzed the convergence of various curve fitting procedures in the neighborhood of the solution point. If, however, any of these procedures were applied in pure form to search a line for a minimum, there is the danger (indeed, the most likely possibility) that the process would diverge or wander about meaninglessly. In other words, the process may never get close enough to the solution for our detailed local convergence analysis to be applicable. It is therefore important to artfully combine our knowledge of the local behavior with conditions guaranteeing global convergence to yield a workable and effective procedure.

The key to guaranteeing global convergence is the Global Convergence Theorem of Chapter 7. Application of this theorem in turn hinges on the construction of a suitable descent function and minor modifications of a pure curve fitting algorithm. We offer below a particular blend of this kind of construction and analysis, taking as departure point the quadratic fit procedure discussed in Section 8.2 above.

Let us assume that the function f that we wish to minimize is strictly unimodal and has continuous second partial derivatives. We initiate our search procedure by searching along the line until we find three points x_1, x_2, x_3 with x_1 < x_2 < x_3 such that f(x_1) ≥ f(x_2) ≤ f(x_3). In other words, the value at the middle of these three points is no greater than that at either end. Such a sequence of points can be determined in a number of ways (see Exercise 7).
The main reason for using points having this pattern is that a quadratic fit to these points will have a minimum (rather than a maximum), and the minimum point will lie in the interval [x_1, x_3]. See Fig. 8.7. We modify the pure quadratic fit algorithm so that it always works with points in this basic three-point pattern.

The point x_4 is calculated from the quadratic fit in the standard way and f(x_4) is measured. Assuming (as in the figure) that x_2 < x_4 < x_3, and accounting for the unimodal nature of f, there are but two possibilities:

1. f(x_4) ≤ f(x_2). The new three-point pattern is x_2, x_4, x_3.
2. f(x_4) > f(x_2). The new three-point pattern is x_1, x_2, x_4.

In either case a new three-point pattern is obtained and the process can be repeated.
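The modified algorithm can be sketched as follows. This is an illustrative implementation, not the book's map verbatim; in particular it omits the limiting cases in which two or more points coincide, and the test function is an arbitrary example.

```python
def three_point_quadratic_search(f, x1, x2, x3, iters=30):
    """Quadratic fit line search maintaining a three-point pattern
    x1 < x2 < x3 with f(x2) <= f(x1) and f(x2) <= f(x3).  A sketch only."""
    for _ in range(iters):
        f1, f2, f3 = f(x1), f(x2), f(x3)
        a23, a31, a12 = x2 - x3, x3 - x1, x1 - x2
        b23, b31, b12 = x2 ** 2 - x3 ** 2, x3 ** 2 - x1 ** 2, x1 ** 2 - x2 ** 2
        denom = a23 * f1 + a31 * f2 + a12 * f3
        if denom == 0:      # values collinear: quadratic has no unique vertex
            break
        x4 = 0.5 * (b23 * f1 + b31 * f2 + b12 * f3) / denom
        if x4 == x2:        # no further progress
            break
        # Re-form the three-point pattern from x1, x2, x3, x4.
        if x4 > x2:
            x1, x2, x3 = (x2, x4, x3) if f(x4) <= f2 else (x1, x2, x4)
        else:
            x1, x2, x3 = (x1, x4, x2) if f(x4) <= f2 else (x4, x2, x3)
    return x2

xmin = three_point_quadratic_search(lambda x: (x - 1.5) ** 2 + 1.0, 0.0, 1.0, 4.0)
print(xmin)  # ~1.5
```

Because every new point replaces one end of the pattern, all iterates remain in the initial interval, which is what the global convergence argument below exploits.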
To prove convergence, we note that each three-point pattern can be thought of as defining a vector x in E³. Corresponding to an x = (x_1, x_2, x_3) such that x_1, x_2, x_3 form a three-point pattern with respect to f, we define A(x) = (x̄_1, x̄_2, x̄_3) as discussed above. For completeness we must consider the case where two or more of the x_i, i = 1, 2, 3, are equal, since this may occur. The appropriate definitions are simply limiting cases of the earlier ones. For example, if x_1 = x_2, then x_1, x_2, x_3 form a three-point pattern if f(x_2) ≤ f(x_3) and f′(x_2) < 0 (which is the limiting case of f(x_2) < f(x_1)). A quadratic is fit in this case by using the values at the two distinct points and the derivative at the duplicated point. In case x_1 = x_2 = x_3, the triple (x_1, x_2, x_3) forms a three-point pattern if f′(x_2) = 0 and f″(x_2) ≥ 0.
With these definitions, the map A is well defined. It is also continuous, since curve fitting depends continuously on the data.
We next define the solution set, a subset of E³, to be the collection of points x∗ = (x∗, x∗, x∗) where f′(x∗) = 0.
Finally, we let Z(x) = f(x_1) + f(x_2) + f(x_3). It is easy to see that Z is a descent function for A. After application of A, one of the values f(x_1), f(x_2), f(x_3) will be replaced by f(x_4), and, by construction and the assumption that f is unimodal, it will replace a strictly larger value. Of course, at x∗ = (x∗, x∗, x∗) we have A(x∗) = x∗ and hence Z(A(x∗)) = Z(x∗).

Since all points are contained in the initial interval, we have all the requirements for the Global Convergence Theorem. Thus the process converges to the solution.
The order of convergence may not be destroyed by this modification if, near the solution, the three-point pattern is always formed from the previous three points. In this case we would still have convergence of order 1.3. This cannot be guaranteed, however.
It has often been implicitly suggested, and accepted, that when using the quadratic fit technique one should require

f(x_{k+1}) < f(x_k)

so as to guarantee convergence. If the inequality is not satisfied at some cycle, then a special local search is used to find a better x_{k+1} that does satisfy it. This philosophy amounts to taking Z(x) = f(x_3) in our general framework and, unfortunately, this is not a descent function even for unimodal functions; hence the special local search is likely to be necessary several times. It is true, of course, that a similar special local search may occasionally be required for the technique we suggest in regions of multiple minima, but it is never required in a unimodal region.
The above construction, based on the pure quadratic fit technique, can be emulated to produce effective procedures based on other curve fitting techniques. For application to smooth functions these techniques seem to be the best available in terms of flexibility to accommodate as much derivative information as is available, fast convergence, and a guarantee of global convergence.
8.4 CLOSEDNESS OF LINE SEARCH ALGORITHMS

Since searching along a line for a minimum point is a component part of most nonlinear programming algorithms, it is desirable to establish at once that this procedure is closed; that is, that the end product of the iterative procedures outlined above, when viewed as a single algorithmic step finding a minimum along a line, defines a closed algorithm. That is the objective of this section.
To initiate a line search with respect to a function f, two vectors must be specified: the initial point x and the direction d in which the search is to be made. The result of the search is a new point. Thus we define the search algorithm S as a mapping from E^{2n} to E^n.
We assume that the search is to be made over the semi-infinite line emanating from x in the direction d. We also assume, for simplicity, that the search is not made in vain; that is, we assume that there is a minimum point along the line. This will be the case, for instance, if f is continuous and increases without bound as x tends toward infinity.
Definition. The mapping S: E^{2n} → E^n is defined by

S(x, d) = {y : y = x + αd for some α ≥ 0, f(y) = min_{0≤α<∞} f(x + αd)}.  (23)
In some cases there may be many vectors y yielding the minimum, so S is a set-valued mapping. We must verify that S is closed.
Theorem. Let f be continuous on E^n. Then the mapping S is closed at (x, d) if d ≠ 0.

Proof. Suppose {x_k} and {d_k} are sequences with x_k → x and d_k → d ≠ 0. Suppose also that y_k ∈ S(x_k, d_k) and that y_k → y. We must show that y ∈ S(x, d).
For each k we have y_k = x_k + α_k d_k for some α_k ≥ 0. From this we may write

α_k = |y_k − x_k| / |d_k|.

Taking the limit of the right-hand side, we see that

α_k → α ≡ |y − x| / |d|.

It then follows that y = x + αd. It still remains to be shown that y ∈ S(x, d).
For each k and each α, 0 ≤ α < ∞, we have f(y_k) ≤ f(x_k + αd_k). Letting k → ∞ and using the continuity of f, it follows that f(y) ≤ f(x + αd) for every α ≥ 0, and hence y ∈ S(x, d).
The requirement that d ≠ 0 is natural both theoretically and practically. From a practical point of view this condition implies that, when constructing algorithms, the choice d = 0 had better occur only in the solution set; but it is clear that if d = 0, no search will be made. Theoretically, the map S can fail to be closed at d = 0, as illustrated below.
Example. On E¹ define f(x) = (x − 1)². Then S(x, d) is not closed at x = 0, d = 0. To see this we note that for any d > 0 the minimizing point along the ray is y = 1, so that S(0, d) = {1}; but S(0, 0) = {0}. Thus, letting d_k → 0 with d_k > 0, we have y_k = 1 for every k, yet the limit point 1 does not belong to S(0, 0).
8.5 INACCURATE LINE SEARCH
In practice, of course, it is impossible to obtain the exact minimum point called for by the ideal line search algorithm S described above. As a matter of fact, it is often desirable to sacrifice accuracy in the line search routine in order to conserve overall computation time. Because of these factors we must, to be realistic, be certain, at every stage of development, that our theory does not crumble if inaccurate line searches are introduced.
Inaccuracy generally is introduced in a line search algorithm by simply terminating the search procedure before it has converged. The exact nature of the inaccuracy introduced may therefore depend on the particular search technique employed and the criterion used for terminating the search. We cannot develop a theory that simultaneously covers every important version of inaccuracy without seriously detracting from the underlying simplicity of the algorithms discussed later. For this reason our general approach, which is admittedly more free-wheeling in spirit than necessary but thereby more transparent and less encumbered than a detailed account of inaccuracy, will be to analyze algorithms as if an accurate line search were made at every step, and then point out in side remarks and exercises the effect of inaccuracy.
In the remainder of this section we present some commonly used criteria for terminating a line search.
Percentage Test
One important inaccurate line search algorithm is the one that determines the search parameter to within a fixed percentage of its true value. Specifically, a constant c, 0 < c < 1, is selected (c = 0.10 is reasonable) and the parameter α in the line search is found so as to satisfy

|α − ᾱ| ≤ cᾱ,

where ᾱ is the true minimizing value of the parameter. This criterion is easy to use in conjunction with the standard iterative search techniques described in the first sections of this chapter. For example, in the case of the quadratic fit technique using three-point patterns applied to a unimodal function, at each stage it is known that the true minimum point lies in the interval spanned by the three-point pattern, and hence a bound on the maximum possible fractional error at that stage is easily deduced. One iterates until this bound is no greater than c. It can be shown (see Exercise 13) that this algorithm is closed.
Armijo’s Rule
A practical and popular criterion for terminating a line search is Armijo's rule. The essential idea is that the rule should first guarantee that the selected α is not too large, and next that it is not too small. Let us define the function

φ(α) = f(x_k + αd_k).

Armijo's rule is implemented by consideration of the function φ(0) + εφ′(0)α for fixed ε, 0 < ε < 1.
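A common implementation of Armijo's rule is backtracking: a fairly large trial value of α is successively reduced by a fixed factor until the test φ(α) ≤ φ(0) + εφ′(0)α is satisfied. The sketch below follows this scheme in one dimension; the constants ε = 0.2 and η = 2, and the example function, are illustrative choices, not prescriptions from the text.

```python
def armijo_step(f, fprime, x, d, alpha0=1.0, eps=0.2, eta=2.0):
    """Backtracking sketch of Armijo's rule (one-dimensional illustration).
    Accepts the first alpha in alpha0, alpha0/eta, alpha0/eta**2, ...
    satisfying  phi(alpha) <= phi(0) + eps * phi'(0) * alpha.
    Assumes d is a descent direction, so the loop terminates."""
    phi0 = f(x)
    slope = fprime(x) * d        # phi'(0) for scalar x and d
    alpha = alpha0
    while f(x + alpha * d) > phi0 + eps * slope * alpha:
        alpha /= eta
    return alpha

# Example: f(x) = x**2 at x = 1 with descent direction d = -1.
alpha = armijo_step(lambda x: x * x, lambda x: 2 * x, x=1.0, d=-1.0)
print(alpha)  # the full step alpha = 1.0 already satisfies the test here
```

Reducing α only when the test fails is what keeps the accepted step from being too large, while starting from a sizable α₀ keeps it from being unnecessarily small.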