[Fig. 8.2 Fibonacci search]
The solution to the Fibonacci difference equation F_N = F_{N−1} + F_{N−2} is of the form F_N = Aτ_1^N + Bτ_2^N, where τ_1 and τ_2 are the roots of τ² = τ + 1; that is,

τ_1 = (1 + √5)/2,  τ_2 = (1 − √5)/2.

(The number τ_1 ≃ 1.618 is known as the golden section ratio and was considered by early Greeks to be the most aesthetic value for the ratio of two adjacent sides of a rectangle.) For large N the first term on the right side of (4) dominates the second, and hence

lim_{N→∞} F_{N−1}/F_N = 1/τ_1 ≃ 0.618.
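This limit is easy to check numerically; the sketch below (an illustration, not part of the text's development) generates the Fibonacci sequence and compares F_{N−1}/F_N with 1/τ_1.

```python
def fibonacci(n):
    """Return the Fibonacci numbers F_0, ..., F_n with F_0 = F_1 = 1."""
    f = [1, 1]
    for _ in range(n - 1):
        f.append(f[-1] + f[-2])
    return f

f = fibonacci(30)
ratio = f[-2] / f[-1]            # F_29 / F_30
golden = (1 + 5 ** 0.5) / 2      # tau_1 ~= 1.618
print(ratio, 1 / golden)         # both close to 0.618
```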
8.2 Line Search by Curve Fitting

It follows from (1) that the interval of uncertainty at any point in the process has width

d_k = (1/τ_1)^{k−1} d_1.
The Fibonacci search method has a certain amount of theoretical appeal, since it assumes only that the function being searched is unimodal, and with respect to this broad class of functions the method is, in some sense, optimal. In most problems, however, it can be safely assumed that the function being searched, as well as being unimodal, possesses a certain degree of smoothness, and one might therefore expect that more efficient search techniques exploiting this smoothness can be devised; and indeed they can. Techniques of this nature are usually based on curve fitting procedures, where a smooth curve is passed through the previously measured points in order to determine an estimate of the minimum point. A variety of such techniques can be devised depending on whether or not derivatives of the function as well as its values can be measured, how many previous points are used to determine the fit, and the criterion used to determine the fit. In this section a number of possibilities are outlined and analyzed. All of them have orders of convergence greater than unity.
Newton’s Method
Suppose that the function f of a single variable x is to be minimized, and suppose that at a point x_k where a measurement is made it is possible to evaluate the three numbers f(x_k), f′(x_k), f″(x_k). It is then possible to construct a quadratic function q which at x_k agrees with f up to second derivatives, that is,

q(x) = f(x_k) + f′(x_k)(x − x_k) + ½ f″(x_k)(x − x_k)².

An estimate x_{k+1} of the minimum point of f may then be computed as the point where the derivative of q vanishes; thus

x_{k+1} = x_k − f′(x_k)/f″(x_k).
This process, which is illustrated in Fig. 8.3, can then be repeated at x_{k+1}.
We note immediately that the new point x_{k+1} resulting from Newton's method does not depend on the value f(x_k). The method can more simply be viewed as a technique for iteratively solving equations of the form g(x) = 0, where, when applied to minimization, we put g(x) ≡ f′(x). In this notation Newton's method takes the form

x_{k+1} = x_k − g(x_k)/g′(x_k).  (9)

This form is illustrated in Fig. 8.4.
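As a concrete sketch of iteration (9), the code below applies Newton's method to the arbitrary illustrative equation g(x) = x³ − 1 = 0 (which, viewed as a minimization, corresponds to f(x) = x⁴/4 − x); none of these choices come from the text itself.

```python
def newton(g, gprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for solving g(x) = 0; for minimization take g = f'."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / gprime(x)   # the Newton step g(x_k)/g'(x_k)
        x -= step
        if abs(step) < tol:       # stop once the step is negligible
            break
    return x

# Illustrative example: solve g(x) = x**3 - 1 = 0 starting from x0 = 2.
root = newton(lambda x: x ** 3 - 1, lambda x: 3 * x ** 2, x0=2.0)
print(root)  # close to 1.0
```

The quadratic convergence established below shows up numerically: the number of correct digits roughly doubles at each iteration.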
We now show that Newton's method has order two convergence:

Proposition. Let the function g have a continuous second derivative, and let x∗ satisfy g(x∗) = 0, g′(x∗) ≠ 0. Then, provided x_0 is sufficiently close to x∗, the sequence {x_k}_{k=0}^∞ generated by Newton's method (9) converges to x∗ with an order of convergence at least two.
Proof. For points x in a region near x∗ there is a k_1 such that |g″(x)| < k_1 and a k_2 such that |g′(x)| > k_2. Then, since g(x∗) = 0, we can write

x_{k+1} − x∗ = x_k − x∗ − g(x_k)/g′(x_k)
            = −[g(x_k) − g(x∗) + g′(x_k)(x∗ − x_k)]/g′(x_k).
[Fig. 8.4 Newton's method for solving equations]
The term in brackets is, by Taylor's theorem, zero to first order. In fact, using the remainder term in a Taylor series expansion about x_k, we obtain

x_{k+1} − x∗ = (1/2) [g″(ξ)/g′(x_k)] (x_k − x∗)²

for some ξ between x∗ and x_k. Thus in the region near x∗,

|x_{k+1} − x∗| ≤ (k_1/2k_2) |x_k − x∗|²,

and hence the order of convergence is at least two.
Method of False Position
Newton's method for minimization is based on fitting a quadratic on the basis of information at a single point; by using more points, less information is required at each of them. Thus, using f(x_k), f′(x_k), f′(x_{k−1}) it is possible to fit the quadratic

q(x) = f(x_k) + f′(x_k)(x − x_k) + [(f′(x_{k−1}) − f′(x_k))/(x_{k−1} − x_k)] (x − x_k)²/2,

which has these corresponding values. An estimate x_{k+1} may then be determined by finding the point where the derivative of q vanishes; thus

x_{k+1} = x_k − f′(x_k) [(x_k − x_{k−1})/(f′(x_k) − f′(x_{k−1}))].  (10)
[Fig. 8.5 False position for minimization]
As in Newton's method, the value f(x_k) itself does not enter the formula; the fitted quadratic could equally well have been passed through either f(x_k) or f(x_{k−1}). Also the formula can be regarded as an approximation to Newton's method where the second derivative is replaced by the difference of two first derivatives.
Again, since this method does not depend on values of f directly, it can be regarded as a method for solving f′(x) ≡ g(x) = 0. Viewed in this way the method, which is illustrated in Fig. 8.6, takes the form

x_{k+1} = x_k − g(x_k) [(x_k − x_{k−1})/(g(x_k) − g(x_{k−1}))].  (11)
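Iteration (11) is equally easy to sketch. The test equation g(x) = x³ − 1 = 0 and the starting points are arbitrary illustrations, not taken from the text.

```python
def false_position(g, x0, x1, tol=1e-12, max_iter=100):
    """Method of false position (secant iteration) for solving g(x) = 0."""
    xk_1, xk = x0, x1
    for _ in range(max_iter):
        gk_1, gk = g(xk_1), g(xk)
        if gk == gk_1:            # secant is flat; cannot take another step
            break
        x_next = xk - gk * (xk - xk_1) / (gk - gk_1)
        xk_1, xk = xk, x_next
        if abs(xk - xk_1) < tol:  # last step was negligible
            break
    return xk

root = false_position(lambda x: x ** 3 - 1, 2.0, 1.5)
print(root)  # close to 1.0
```

Unlike Newton's method, each step needs only one new evaluation of g, at the cost of the lower order of convergence derived next.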
We next investigate the order of convergence of the method of false position and discover that it is τ_1 ≃ 1.618, the golden mean.

Proposition. Let g have a continuous second derivative and suppose x∗ is such that g(x∗) = 0, g′(x∗) ≠ 0. Then for x_0 sufficiently close to x∗, the sequence {x_k}_{k=0}^∞ generated by the method of false position (11) converges to x∗ with order τ_1 ≃ 1.618.
[Fig. 8.6 False position for solving equations]
Proof. Introducing the divided differences

g[x_{k−1}, x_k] = (g(x_k) − g(x_{k−1}))/(x_k − x_{k−1}),
g[x_{k−1}, x_k, x∗] = (g[x_k, x∗] − g[x_{k−1}, x_k])/(x∗ − x_{k−1}),

iteration (11) together with g(x∗) = 0 yields

x_{k+1} − x∗ = (x_k − x∗)(x_{k−1} − x∗) g[x_{k−1}, x_k, x∗]/g[x_{k−1}, x_k].

Since g[x_{k−1}, x_k] → g′(x∗) and g[x_{k−1}, x_k, x∗] → g″(x∗)/2, setting ε_k = x_k − x∗ we have, in the limit,

ε_{k+1} ≃ M ε_k ε_{k−1},  where M = g″(x∗)/2g′(x∗).  (17)

Taking logarithms converts this difference equation into one of the Fibonacci type, and it follows that the order of convergence is the largest root of τ² = τ + 1, namely τ_1 ≃ 1.618.
Having derived the error formula (17) by direct analysis, it is now appropriate to point out a short-cut technique, based on symmetry and other considerations, that can sometimes be used in even more complicated situations. The right side of (17) must be a polynomial in ε_k and ε_{k−1}, since it is derived from approximations based on Taylor's theorem. Furthermore, it must be second order, since the method reduces to Newton's method when x_k = x_{k−1}. Also, it must go to zero if either ε_k or ε_{k−1} goes to zero, and it must be symmetric in ε_k and ε_{k−1}, since the order of the points is irrelevant. The only expression satisfying these requirements is ε_{k+1} ≃ M ε_k ε_{k−1}.
Cubic Fit
Given the points x_{k−1} and x_k together with the values f(x_{k−1}), f′(x_{k−1}), f(x_k), f′(x_k), it is possible to fit a cubic equation to the points having these corresponding values. The next point x_{k+1} can then be determined as the relative minimum point of this cubic. This leads to

x_{k+1} = x_k − (x_k − x_{k−1}) [(f′(x_k) + u_2 − u_1)/(f′(x_k) − f′(x_{k−1}) + 2u_2)],

where

u_1 = f′(x_{k−1}) + f′(x_k) − 3 [f(x_{k−1}) − f(x_k)]/(x_{k−1} − x_k),
u_2 = [u_1² − f′(x_{k−1}) f′(x_k)]^{1/2}.
It can be shown (see Exercise 3) that the order of convergence of the cubic fit method is 2.0. Thus, although the method is exact for cubic functions, indicating that its order might be three, its order is actually only two.
Quadratic Fit
The scheme that is often most useful in line searching is that of fitting a quadratic through three given points. This has the advantage of not requiring any derivative information. Given x_1, x_2, x_3 and corresponding values f(x_1) = f_1, f(x_2) = f_2, f(x_3) = f_3, we construct the quadratic passing through these points and determine a new point x_4 as the point where the derivative of q vanishes. Thus

x_4 = (1/2) (b_23 f_1 + b_31 f_2 + b_12 f_3)/(a_23 f_1 + a_31 f_2 + a_12 f_3),  (21)

where a_ij = x_i − x_j and b_ij = x_i² − x_j².
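Formula (21) can be checked on a simple example: when f is itself quadratic the fit is exact, so a single step lands on the minimum. The function used below is an arbitrary illustration.

```python
def quadratic_fit_step(x1, x2, x3, f1, f2, f3):
    """Vertex of the quadratic through (x1,f1), (x2,f2), (x3,f3); formula (21)."""
    a23, a31, a12 = x2 - x3, x3 - x1, x1 - x2
    b23, b31, b12 = x2 ** 2 - x3 ** 2, x3 ** 2 - x1 ** 2, x1 ** 2 - x2 ** 2
    return 0.5 * (b23 * f1 + b31 * f2 + b12 * f3) / (a23 * f1 + a31 * f2 + a12 * f3)

# For the quadratic f(x) = (x - 2)**2 the fit is exact: x4 is the minimizer.
f = lambda x: (x - 2.0) ** 2
x4 = quadratic_fit_step(0.0, 1.0, 3.0, f(0.0), f(1.0), f(3.0))
print(x4)  # 2.0
```

Note that only function values enter: no derivatives are required, which is what makes this fit attractive in line searching.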
Define the errors ε_i = x_i − x∗. The short-cut symmetry argument again yields the error formula: the error of the new point must be a polynomial in ε_1, ε_2, ε_3. It must be second order (since it is a quadratic fit); it must go to zero when any two of ε_1, ε_2, ε_3 are zero (the reader should check this); finally, it must be symmetric (since the order of the points is irrelevant). It follows that near a minimum point x∗ of f, the errors are related approximately by

ε_4 ≃ M(ε_1 ε_2 + ε_2 ε_3 + ε_1 ε_3),

where M depends on the values of the second and third derivatives of f at x∗.
If ε_k → 0 with an order greater than unity, then for large k the error is governed approximately by

ε_{k+2} ≃ M ε_k ε_{k−1}.

Letting y_k = log M|ε_k|, this becomes

y_{k+2} = y_k + y_{k−1},

with characteristic equation

τ³ − τ − 1 = 0.

The largest root of this equation is τ ≃ 1.3, which thus determines the rate of growth of the y_k and is the order of convergence of the quadratic fit method.
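The root τ ≃ 1.3247 of the characteristic equation can be found by any one-dimensional method; the bisection sketch below is illustrative only.

```python
# Bisection for the largest root of p(t) = t**3 - t - 1 = 0.
p = lambda t: t ** 3 - t - 1
lo, hi = 1.0, 2.0          # bracket: p(1) = -1 < 0 and p(2) = 5 > 0
for _ in range(60):        # halve the bracket 60 times
    mid = (lo + hi) / 2
    if p(mid) > 0:
        hi = mid
    else:
        lo = mid
print(lo)  # ~1.3247, the order of convergence of the quadratic fit method
```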
Approximate Methods
In practice line searches are terminated before they have converged to the actual minimum point. In one method, for example, a fairly large value for x_1 is chosen and this value is successively reduced by a positive factor less than unity until a sufficient decrease in the function value is obtained. Approximate methods and suitable stopping criteria are discussed in Section 8.5.
8.3 GLOBAL CONVERGENCE OF CURVE FITTING

Above, we analyzed the convergence of various curve fitting procedures in the neighborhood of the solution point. If, however, any of these procedures were applied in pure form to search a line for a minimum, there is the danger (indeed, the most likely possibility) that the process would diverge or wander about meaninglessly. In other words, the process may never get close enough to the solution for our detailed local convergence analysis to be applicable. It is therefore important to artfully combine our knowledge of the local behavior with conditions guaranteeing global convergence to yield a workable and effective procedure.

The key to guaranteeing global convergence is the Global Convergence Theorem of Chapter 7. Application of this theorem in turn hinges on the construction of a suitable descent function and minor modifications of a pure curve fitting algorithm. We offer below a particular blend of this kind of construction and analysis, taking as departure point the quadratic fit procedure discussed in Section 8.2 above.

Let us assume that the function f that we wish to minimize is strictly unimodal and has continuous second partial derivatives. We initiate our search procedure by searching along the line until we find three points x_1, x_2, x_3 with x_1 < x_2 < x_3 such that f(x_1) ≥ f(x_2) ≤ f(x_3). In other words, the value at the middle of these three points is no greater than that at either end. Such a sequence of points can be determined in a number of ways (see Exercise 7).
The main reason for using points having this pattern is that a quadratic fit to these points will have a minimum (rather than a maximum), and the minimum point will lie in the interval [x_1, x_3]. See Fig. 8.7. We modify the pure quadratic fit algorithm so that it always works with points in this basic three-point pattern.

The point x_4 is calculated from the quadratic fit in the standard way and f(x_4) is measured. Assuming (as in the figure) that x_2 < x_4 < x_3, and accounting for the unimodal nature of f, there are but two possibilities:

1. f(x_4) ≤ f(x_2). The new three-point pattern is x_2, x_4, x_3.
2. f(x_4) > f(x_2). The new three-point pattern is x_1, x_2, x_4.

In either case a new three-point pattern is obtained and the process can be repeated.
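The modified algorithm can be sketched as follows. This is an illustrative implementation, not the book's map verbatim; in particular it omits the limiting cases in which two or more points coincide, and the test function is an arbitrary example.

```python
def three_point_quadratic_search(f, x1, x2, x3, iters=30):
    """Quadratic fit line search maintaining a three-point pattern
    x1 < x2 < x3 with f(x2) <= f(x1) and f(x2) <= f(x3).  A sketch only."""
    for _ in range(iters):
        f1, f2, f3 = f(x1), f(x2), f(x3)
        a23, a31, a12 = x2 - x3, x3 - x1, x1 - x2
        b23, b31, b12 = x2 ** 2 - x3 ** 2, x3 ** 2 - x1 ** 2, x1 ** 2 - x2 ** 2
        denom = a23 * f1 + a31 * f2 + a12 * f3
        if denom == 0:      # values collinear: quadratic has no unique vertex
            break
        x4 = 0.5 * (b23 * f1 + b31 * f2 + b12 * f3) / denom
        if x4 == x2:        # no further progress
            break
        # Re-form the three-point pattern from x1, x2, x3, x4.
        if x4 > x2:
            x1, x2, x3 = (x2, x4, x3) if f(x4) <= f2 else (x1, x2, x4)
        else:
            x1, x2, x3 = (x1, x4, x2) if f(x4) <= f2 else (x4, x2, x3)
    return x2

xmin = three_point_quadratic_search(lambda x: (x - 1.5) ** 2 + 1.0, 0.0, 1.0, 4.0)
print(xmin)  # ~1.5
```

Because every new point replaces one end of the pattern, all iterates remain in the initial interval, which is what the global convergence argument below exploits.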
To prove convergence, we note that each three-point pattern can be thought of as defining a vector x in E³. Corresponding to an x = (x_1, x_2, x_3) such that x_1, x_2, x_3 form a three-point pattern with respect to f, we define A(x) = (x̄_1, x̄_2, x̄_3) as discussed above. For completeness we must consider the case where two or more of the x_i, i = 1, 2, 3, are equal, since this may occur. The appropriate definitions are simply limiting cases of the earlier ones. For example, if x_1 = x_2, then x_1, x_2, x_3 form a three-point pattern if f(x_2) ≤ f(x_3) and f′(x_2) < 0 (which is the limiting case of f(x_2) < f(x_1)). A quadratic is fit in this case by using the values at the two distinct points and the derivative at the duplicated point. In case x_1 = x_2 = x_3, the triple (x_1, x_2, x_3) forms a three-point pattern if f′(x_2) = 0 and f″(x_2) ≥ 0.
With these definitions, the map A is well defined. It is also continuous, since curve fitting depends continuously on the data.
We next define the solution set, a subset of E³, to be the collection of points x∗ = (x∗, x∗, x∗) where f′(x∗) = 0.
Finally, we let Z(x) = f(x_1) + f(x_2) + f(x_3). It is easy to see that Z is a descent function for A. After application of A, one of the values f(x_1), f(x_2), f(x_3) will be replaced by f(x_4), and, by construction and the assumption that f is unimodal, it will replace a strictly larger value. Of course, at x∗ = (x∗, x∗, x∗) we have A(x∗) = x∗ and hence Z(A(x∗)) = Z(x∗).

Since all points are contained in the initial interval, we have all the requirements for the Global Convergence Theorem. Thus the process converges to the solution.
The order of convergence may not be destroyed by this modification if, near the solution, the three-point pattern is always formed from the previous three points. In this case we would still have convergence of order 1.3. This cannot be guaranteed, however.
It has often been implicitly suggested, and accepted, that when using the quadratic fit technique one should require

f(x_{k+1}) < f(x_k)

so as to guarantee convergence. If the inequality is not satisfied at some cycle, then a special local search is used to find a better x_{k+1} that does satisfy it. This philosophy amounts to taking Z(x) = f(x_3) in our general framework and, unfortunately, this is not a descent function even for unimodal functions; hence the special local search is likely to be necessary several times. It is true, of course, that a similar special local search may occasionally be required for the technique we suggest in regions of multiple minima, but it is never required in a unimodal region.
The above construction, based on the pure quadratic fit technique, can be emulated to produce effective procedures based on other curve fitting techniques. For application to smooth functions these techniques seem to be the best available in terms of flexibility to accommodate as much derivative information as is available, fast convergence, and a guarantee of global convergence.
8.4 CLOSEDNESS OF LINE SEARCH ALGORITHMS

Since searching along a line for a minimum point is a component part of most nonlinear programming algorithms, it is desirable to establish at once that this procedure is closed; that is, that the end product of the iterative procedures outlined above, when viewed as a single algorithmic step finding a minimum along a line, defines a closed algorithm. That is the objective of this section.
To initiate a line search with respect to a function f, two vectors must be specified: the initial point x and the direction d in which the search is to be made. The result of the search is a new point. Thus we define the search algorithm S as a mapping from E^{2n} to E^n.
We assume that the search is to be made over the semi-infinite line emanating from x in the direction d. We also assume, for simplicity, that the search is not made in vain; that is, we assume that there is a minimum point along the line. This will be the case, for instance, if f is continuous and increases without bound as x tends toward infinity.
Definition. The mapping S: E^{2n} → E^n is defined by

S(x, d) = {y : y = x + αd for some α ≥ 0, f(y) = min_{0≤α<∞} f(x + αd)}.  (23)
In some cases there may be many vectors y yielding the minimum, so S is a set-valued mapping. We must verify that S is closed.
Theorem. Let f be continuous on E^n. Then the mapping S is closed at (x, d) if d ≠ 0.

Proof. Suppose {x_k} and {d_k} are sequences with x_k → x and d_k → d ≠ 0. Suppose also that y_k ∈ S(x_k, d_k) and that y_k → y. We must show that y ∈ S(x, d).
For each k we have y_k = x_k + α_k d_k for some α_k ≥ 0. From this we may write

α_k = |y_k − x_k| / |d_k|.

Taking the limit of the right-hand side, we see that

α_k → α ≡ |y − x| / |d|.

It then follows that y = x + αd. It still remains to be shown that y ∈ S(x, d).
For each k and each α, 0 ≤ α < ∞, we have f(y_k) ≤ f(x_k + αd_k). Letting k → ∞ and using the continuity of f, it follows that f(y) ≤ f(x + αd) for every α ≥ 0, and hence y ∈ S(x, d).
The requirement that d ≠ 0 is natural both theoretically and practically. From a practical point of view this condition implies that, when constructing algorithms, the choice d = 0 had better occur only in the solution set; but it is clear that if d = 0, no search will be made. Theoretically, the map S can fail to be closed at d = 0, as illustrated below.
Example. On E¹ define f(x) = (x − 1)². Then S(x, d) is not closed at x = 0, d = 0. To see this we note that for any d > 0 the minimizing point along the ray is y = 1, so that S(0, d) = {1}; but S(0, 0) = {0}. Thus, letting d_k → 0 with d_k > 0, we have y_k = 1 for every k, yet the limit point 1 does not belong to S(0, 0).
8.5 INACCURATE LINE SEARCH
In practice, of course, it is impossible to obtain the exact minimum point called for by the ideal line search algorithm S described above. As a matter of fact, it is often desirable to sacrifice accuracy in the line search routine in order to conserve overall computation time. Because of these factors we must, to be realistic, be certain, at every stage of development, that our theory does not crumble if inaccurate line searches are introduced.
Inaccuracy generally is introduced in a line search algorithm by simply terminating the search procedure before it has converged. The exact nature of the inaccuracy introduced may therefore depend on the particular search technique employed and the criterion used for terminating the search. We cannot develop a theory that simultaneously covers every important version of inaccuracy without seriously detracting from the underlying simplicity of the algorithms discussed later. For this reason our general approach, which is admittedly more free-wheeling in spirit than necessary but thereby more transparent and less encumbered than a detailed account of inaccuracy, will be to analyze algorithms as if an accurate line search were made at every step, and then point out in side remarks and exercises the effect of inaccuracy.
In the remainder of this section we present some commonly used criteria for terminating a line search.
Percentage Test
One important inaccurate line search algorithm is the one that determines the search parameter to within a fixed percentage of its true value. Specifically, a constant c, 0 < c < 1, is selected (c = 0.10 is reasonable) and the parameter α in the line search is found so as to satisfy

|α − ᾱ| ≤ cᾱ,

where ᾱ is the true minimizing value of the parameter. This criterion is easy to use in conjunction with the standard iterative search techniques described in the first sections of this chapter. For example, in the case of the quadratic fit technique using three-point patterns applied to a unimodal function, at each stage it is known that the true minimum point lies in the interval spanned by the three-point pattern, and hence a bound on the maximum possible fractional error at that stage is easily deduced. One iterates until this bound is no greater than c. It can be shown (see Exercise 13) that this algorithm is closed.
Armijo’s Rule
A practical and popular criterion for terminating a line search is Armijo's rule. The essential idea is that the rule should first guarantee that the selected α is not too large, and next that it is not too small. Let us define the function

φ(α) = f(x_k + αd_k).

Armijo's rule is implemented by consideration of the function φ(0) + εφ′(0)α for fixed ε, 0 < ε < 1.
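A common implementation of Armijo's rule is backtracking: a fairly large trial value of α is successively reduced by a fixed factor until the test φ(α) ≤ φ(0) + εφ′(0)α is satisfied. The sketch below follows this scheme in one dimension; the constants ε = 0.2 and η = 2, and the example function, are illustrative choices, not prescriptions from the text.

```python
def armijo_step(f, fprime, x, d, alpha0=1.0, eps=0.2, eta=2.0):
    """Backtracking sketch of Armijo's rule (one-dimensional illustration).
    Accepts the first alpha in alpha0, alpha0/eta, alpha0/eta**2, ...
    satisfying  phi(alpha) <= phi(0) + eps * phi'(0) * alpha.
    Assumes d is a descent direction, so the loop terminates."""
    phi0 = f(x)
    slope = fprime(x) * d        # phi'(0) for scalar x and d
    alpha = alpha0
    while f(x + alpha * d) > phi0 + eps * slope * alpha:
        alpha /= eta
    return alpha

# Example: f(x) = x**2 at x = 1 with descent direction d = -1.
alpha = armijo_step(lambda x: x * x, lambda x: 2 * x, x=1.0, d=-1.0)
print(alpha)  # the full step alpha = 1.0 already satisfies the test here
```

Reducing α only when the test fails is what keeps the accepted step from being too large, while starting from a sizable α₀ keeps it from being unnecessarily small.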