1. Merit Functions. A merit function measures progress toward the solution; typically its value decreases as the iterates approach a solution of a minimization problem, but the sign may be reversed in some definitions.
For primal–dual methods, the merit function may depend on both x and λ. One especially useful merit function for equality constrained problems is

m(x, λ) = ½|∇f(x) + λᵀ∇h(x)|² + ½|h(x)|².

It is examined in the next section.
We shall examine other merit functions later in the chapter. With interior point methods for semidefinite programming, we shall use a potential function that serves as a merit function.
2. Active Set Methods. Inequality constraints can be treated using active set methods that treat the active constraints as equality constraints, at least for the current iteration. However, in primal–dual methods, both x and λ are changed. We shall consider variations of steepest descent, conjugate directions, and Newton's method where movement is made in the (x, λ) space.
3. Penalty Functions. In some primal–dual methods, a penalty function can serve as a merit function, even though the penalty function depends only on x. This is particularly attractive for recursive quadratic programming methods, where a quadratic program is solved at each stage to determine the direction of change in the pair (x, λ).
4. Interior (Barrier) Methods. Barrier methods lead to methods that move within the relative interior of the inequality constraints. This approach leads to the concept of the primal–dual central path. These methods are used for semidefinite programming, since these problems are characterized as possessing a special form of inequality constraint.
15.3 A SIMPLE MERIT FUNCTION
It is very natural, when considering the system of necessary conditions (2), to form the function

m(x, λ) = ½|∇f(x) + λᵀ∇h(x)|² + ½|h(x)|²

and use it as a measure of how close a pair (x, λ) is to a solution.
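As an illustration, the merit function is easy to evaluate numerically. The following minimal numpy sketch uses a hypothetical two-variable toy problem (the problem data and all names are illustrative, not from the text); the merit value is zero exactly at a solution pair and positive elsewhere.

```python
import numpy as np

# Hypothetical toy problem (names illustrative):
#   minimize f(x) = x1^2 + x2^2   subject to h(x) = x1 + x2 - 1 = 0,
# whose solution is x* = (0.5, 0.5) with multiplier lambda* = -1.
def grad_f(x):
    return 2.0 * x                      # gradient of f

def h(x):
    return np.array([x[0] + x[1] - 1.0])

def grad_h(x):
    return np.array([[1.0, 1.0]])       # Jacobian of h (m x n)

def merit(x, lam):
    # m(x, lam) = 0.5*|grad f(x) + lam^T grad h(x)|^2 + 0.5*|h(x)|^2
    grad_l = grad_f(x) + grad_h(x).T @ lam
    return 0.5 * grad_l @ grad_l + 0.5 * h(x) @ h(x)

print(merit(np.array([0.5, 0.5]), np.array([-1.0])))  # 0.0 at the solution
print(merit(np.zeros(2), np.zeros(1)))                # 0.5 away from it
```

Monitoring this value along the iterates of any of the methods in this chapter gives a simple numerical measure of progress.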
It must be noted, however, that the function m(x, λ) is not always well-behaved; it may have local minima, and these are of no value in a search for a solution. The following theorem gives the conditions under which the function m(x, λ) can serve as a well-behaved merit function. Basically, the main requirement is that the Hessian of the Lagrangian be positive definite. As usual, we define l(x, λ) = f(x) + λᵀh(x).
Theorem. Let f and h be twice continuously differentiable functions on Eⁿ of dimension 1 and m, respectively. Suppose that x∗ and λ∗ satisfy the first-order necessary conditions for a local minimum of m(x, λ) = ½|∇f(x) + λᵀ∇h(x)|² + ½|h(x)|² with respect to x and λ. Suppose also that at (x∗, λ∗), (i) the rank of ∇h(x∗) is m, and (ii) the Hessian matrix L(x∗, λ∗) = F(x∗) + λ∗ᵀH(x∗) is positive definite. Then (x∗, λ∗) is a (possibly nonunique) global minimum point of m.
The positive definiteness condition holds if, for example, f is strictly convex and h is linear. Furthermore, even in nonconvex problems one can often arrange for this condition to hold, at least near a solution to the original constrained minimization problem. If it is assumed that the second-order sufficiency conditions for a constrained minimum hold at (x∗, λ∗), then L(x∗, λ∗) is positive definite on the subspace that defines the tangent to the constraints; that is, on the subspace defined by ∇h(x∗)x = 0. Now if the original problem is modified with a penalty term to the problem

minimize f(x) + ½c|h(x)|²
subject to h(x) = 0,

then the Hessian of the corresponding Lagrangian at the solution becomes L(x∗, λ∗) + c∇h(x∗)ᵀ∇h(x∗), which for sufficiently large c is positive definite.
An extension to problems with inequality constraints can be defined by partitioning the constraints into the two groups: active and inactive. However, at this point the simple merit function for problems with equality constraints is adequate for the purpose of illustrating the general idea.
† Unless explicitly indicated to the contrary, the notation ∇l(x, λ) refers to the gradient of l with respect to x, that is, ∇ₓl(x, λ).
15.4 BASIC PRIMAL–DUAL METHODS
Many primal–dual methods are patterned after some of the methods used in earlier chapters, except of course that the emphasis is on equation solving rather than explicit optimization.
First-Order Method
We consider first a simple, straightforward approach, which in a sense parallels the idea of steepest descent in that it uses only a first-order approximation to the primal–dual equations. It is defined by

x_{k+1} = x_k − α_k∇l(x_k, λ_k)ᵀ
λ_{k+1} = λ_k + α_k h(x_k),

where the stepsize α_k is not yet determined. This is based on the error in satisfying (2). Assume that the Hessian of the Lagrangian L(x, λ) is positive definite in some compact region of interest, and consider the simple merit function

m(x, λ) = ½|∇l(x, λ)|² + ½|h(x)|².

The direction above is a descent direction for this merit function, unless ∇l(x, λ) = 0. If α_k is chosen to minimize the merit function in the search direction at each step, the process will converge to a point where ∇l(x, λ) = 0. However, there is no guarantee that h(x) = 0 at that point.
We can try to improve the method either by changing the way in which the direction is selected or by changing the merit function. In this case a slight modification of the merit function will work. Let

w(x, λ) = ½|∇l(x, λ)|² + ½|h(x)|² − γ l(x, λ)

for some γ > 0. We then calculate that the gradient of w has the two components, corresponding to x and λ,

∇l(x, λ)L(x, λ) + h(x)ᵀ∇h(x) − γ∇l(x, λ)
∇l(x, λ)∇h(x)ᵀ − γh(x)ᵀ,

and hence the inner product of the gradient with the direction (−∇l(x, λ)ᵀ, h(x)) is

−∇l(x, λ)[L(x, λ) − γI]∇l(x, λ)ᵀ − γ|h(x)|².
Now since we are assuming that L(x, λ) is positive definite in a compact region of interest, there is a γ > 0 such that L(x, λ) − γI is positive definite in this region. Then according to the above calculation, the direction (−∇l(x, λ)ᵀ, h(x)) is a descent direction, and the standard descent method will converge to a solution. This method will not converge very rapidly, however. (See Exercise 2 for further analysis of this method.)
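The first-order iteration above can be sketched in a few lines of numpy. This is a minimal illustration on a hypothetical toy problem (all data and names are illustrative); a small fixed stepsize stands in for the merit-minimizing line search, which happens to suffice for this particular example.

```python
import numpy as np

# First-order primal-dual iteration on a hypothetical toy problem
# (f(x) = x1^2 + x2^2, h(x) = x1 + x2 - 1):
#   x_{k+1}   = x_k  - alpha * grad_l(x_k, lam_k)^T
#   lam_{k+1} = lam_k + alpha * h(x_k)
def grad_l(x, lam):
    return 2.0 * x + lam[0] * np.ones(2)   # grad f + lam^T grad h

def h(x):
    return np.array([x[0] + x[1] - 1.0])

x, lam, alpha = np.zeros(2), np.zeros(1), 0.2
for _ in range(2000):
    # both updates use the current (x_k, lam_k)
    x, lam = x - alpha * grad_l(x, lam), lam + alpha * h(x)
print(x, lam)  # approaches x* = (0.5, 0.5), lam* = -1
```

The slow, spiraling approach to the solution visible here (the iteration matrix has complex eigenvalues of modulus just below one) reflects the text's remark that this method does not converge very rapidly.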
Conjugate Directions

As discussed in the previous section, this problem is equivalent to solving a system of linear equations whose coefficient matrix is

M = [ L   Aᵀ ]
    [ A   0  ].

This matrix is symmetric, but it is not positive definite (nor even semidefinite).
However, it is possible to formally generalize the conjugate gradient method to systems of this type by just applying the conjugate-gradient formulae (17)–(20) of Section 9.3 with Q replaced by M. A difficulty is that singular directions (defined as directions p such that pᵀMp = 0) may occur and cause the process to break down. Procedures for overcoming this difficulty have been developed, however. Also, as in the ordinary conjugate gradient method, the approach can be generalized to treat nonquadratic problems as well. Overall, however, the application of conjugate direction methods to the Lagrange system of equations, although very promising, is not currently considered practical.
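The indefiniteness of M is easy to confirm numerically. The sketch below uses hypothetical toy data (illustrative only): even with a positive definite block L and a full-rank A, the assembled matrix has eigenvalues of both signs.

```python
import numpy as np

# Hypothetical 2-variable, 1-constraint illustration of the Lagrange
# coefficient matrix M = [[L, A^T], [A, 0]]:
L = np.eye(2)                     # positive definite Hessian block
A = np.array([[1.0, 0.0]])        # full-rank constraint Jacobian
M = np.block([[L, A.T], [A, np.zeros((1, 1))]])

eigs = np.linalg.eigvalsh(M)      # M is symmetric, so eigvalsh applies
print(eigs)  # eigenvalues of mixed sign: M is indefinite
```

This is exactly why the ordinary conjugate gradient method, which assumes a positive definite coefficient matrix, must be generalized before it can be applied to M.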
Newton's Method

Newton's method solves the nonlinear system (19) by solving the linearized version recursively. That is, given (x_k, λ_k), the new point (x_{k+1}, λ_{k+1}) is determined from the equations

L(x_k, λ_k)d_k + ∇h(x_k)ᵀy_k = −∇l(x_k, λ_k)ᵀ
∇h(x_k)d_k = −h(x_k),                                  (21)

where x_{k+1} = x_k + d_k and λ_{k+1} = λ_k + y_k.
The Newton equations have some important structural properties. First, we observe that by adding ∇h(x_k)ᵀλ_k to the top equation, the system can be transformed to the form

L(x_k, λ_k)d_k + ∇h(x_k)ᵀλ_{k+1} = −∇f(x_k)ᵀ
∇h(x_k)d_k = −h(x_k).
Second, if the coefficient matrix of this system cannot be guaranteed to be nonsingular, it is possible to alter the original problem by incorporation of a quadratic penalty term so that the new Hessian of the Lagrangian is L(x, λ) + c∇h(x)ᵀ∇h(x). For sufficiently large c, this new Hessian will be positive definite over the entire space.
If L(x, λ) is positive definite (either originally or through the incorporation of a penalty term), it is possible to write an explicit expression for the solution of the system (21). Let us define L_k = L(x_k, λ_k), A_k = ∇h(x_k), l_k = ∇l(x_k, λ_k)ᵀ, and h_k = h(x_k). The system then takes the form

L_k d_k + A_kᵀy_k = −l_k
A_k d_k = −h_k.                                        (23)
Standard local convergence results for Newton's method applied to systems of nonlinear equations are applicable to the system (19). These results state that if the linearized system is nonsingular at the solution (as is implied by our assumptions) and if the initial point is sufficiently close to the solution, the method will in fact converge to the solution, and the convergence will be of order at least two.

To guarantee convergence from remote initial points, and hence be more broadly applicable, it is desirable to use the method as a descent process. Fortunately, we can show that the direction generated by Newton's method is a descent direction for the simple merit function m(x, λ) = ½|∇l(x, λ)|² + ½|h(x)|². Using (23), the inner product of the gradient of m with the direction (d_k, y_k) is

l_kᵀ(L_k d_k + A_kᵀy_k) + h_kᵀA_k d_k = −|l_k|² − |h_k|².

This is strictly negative unless both l_k = 0 and h_k = 0. Thus Newton's method has desirable global convergence properties when executed as a descent method with variable step size.
Note that the calculation above does not employ the explicit formulae (24) and (25), and hence it is not necessary that L(x, λ) be positive definite, as long as the system (21) is invertible. We summarize the above discussion in the following theorem.
Theorem. Define the Newton process by

x_{k+1} = x_k + α_k d_k
λ_{k+1} = λ_k + α_k y_k,

where (d_k, y_k) is the solution of (21) at (x_k, λ_k) and where α_k minimizes the simple merit function m along this direction. Assume that (d_k, y_k) exist and that the points generated lie in a compact set. Then any limit point of these points satisfies the first-order necessary conditions for a solution to the constrained minimization problem (1).
Proof. Most of this follows from the above observations and the Global Convergence Theorem. The one-dimensional search process is well-defined, since the merit function m is bounded below.

In view of this result, it is worth pursuing Newton's method further. We would like to extend it to problems with inequality constraints. We would also like to avoid the necessity of evaluating L(x_k, λ_k) at each step and to consider alternative merit functions, perhaps ones that might distinguish a local maximum from a local minimum, which the simple merit function does not do. These considerations guide the developments of the next several sections.
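The Newton process can be sketched as follows. This is a minimal illustration on a hypothetical toy problem (all data and names are illustrative), using a unit step α_k = 1 rather than the merit-function line search of the theorem, which is reasonable when the iterates stay near the solution.

```python
import numpy as np

# Newton's method on the primal-dual equations for a hypothetical toy problem
#   f(x) = x1^4 + x2^2,  h(x) = x1 + x2 - 1.
# Each step solves  [L_k  A_k^T; A_k  0][d_k; y_k] = -[l_k; h_k].
def newton_step(x, lam):
    Lk = np.diag([12.0 * x[0]**2, 2.0])                  # Hessian of Lagrangian
    Ak = np.array([[1.0, 1.0]])                          # grad h
    lk = np.array([4.0 * x[0]**3, 2.0 * x[1]]) + lam[0] * np.ones(2)
    hk = np.array([x[0] + x[1] - 1.0])
    K = np.block([[Lk, Ak.T], [Ak, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, -np.concatenate([lk, hk]))
    return x + sol[:2], lam + sol[2:]

x, lam = np.array([1.0, 1.0]), np.array([0.0])
for _ in range(30):
    x, lam = newton_step(x, lam)
# At convergence the first-order conditions hold:
#   4*x1^3 + lam = 0,  2*x2 + lam = 0,  x1 + x2 = 1.
print(x, lam)
```

Because the constraint here is linear, a single Newton step already restores feasibility exactly, and the remaining iterations converge rapidly in the tangential component.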
Relation to Quadratic Programming
It is clear from the preceding development that Newton's method is closely related to quadratic programming with equality constraints. We explore this relationship more fully here; it will lead to a generalization of Newton's method to problems with inequality constraints.
Consider the problem

minimize ½ dᵀL_k d + ∇l(x_k, λ_k)d
subject to h(x_k) + ∇h(x_k)d = 0.                      (26)

The first-order necessary conditions of this problem are exactly (21), or equivalently (23), where y_k corresponds to the Lagrange multiplier of (26). Thus, the solution of (26) produces a Newton step.
Alternatively, we may consider the quadratic program

minimize ½ dᵀL_k d + ∇f(x_k)d
subject to h(x_k) + ∇h(x_k)d = 0,                      (27)

whose Lagrange multiplier corresponds to λ_{k+1} = λ_k + y_k. This program differs from (26) by merely subtracting λ_kᵀA_k d_k from the objective function, and this change has no influence on d_k, since A_k d_k is fixed.
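This relationship can be checked numerically. The sketch below (hypothetical toy data, illustrative names) solves the KKT systems of both quadratic programs and confirms that the directions agree while the multipliers differ by exactly λ_k.

```python
import numpy as np

# Numerical check of the relation between the two quadratic programs:
# same constraint and Hessian block, objectives differing by lam_k^T A_k d.
Lk = 2.0 * np.eye(2)                       # Hessian of the Lagrangian
Ak = np.array([[1.0, 1.0]])                # grad h at x_k
x_k, lam_k = np.array([0.2, 0.1]), np.array([0.3])
grad_f = 2.0 * x_k
lk = grad_f + Ak.T @ lam_k                 # grad l = grad f + grad h^T lam
hk = np.array([x_k[0] + x_k[1] - 1.0])

K = np.block([[Lk, Ak.T], [Ak, np.zeros((1, 1))]])
sol26 = np.linalg.solve(K, -np.concatenate([lk, hk]))       # program (26)
sol27 = np.linalg.solve(K, -np.concatenate([grad_f, hk]))   # program (27)
d26, y = sol26[:2], sol26[2:]
d27, mu = sol27[:2], sol27[2:]
print(np.allclose(d26, d27), np.allclose(mu, lam_k + y))  # True True
```

Both programs share the same coefficient matrix; only the linear term of the objective changes, which shifts the multiplier but not the direction.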
The connection with quadratic programming suggests a procedure for extending Newton's method to minimization problems with inequality constraints. Consider the problem

minimize f(x)
subject to h(x) = 0
          g(x) ≤ 0.
Given an estimated solution point x_k and estimated Lagrange multipliers λ_k, μ_k, one solves the quadratic program

minimize ½ dᵀL_k d + ∇f(x_k)d
subject to h_k + ∇h(x_k)d = 0
          g_k + ∇g(x_k)d ≤ 0,                          (28)

where L_k = F(x_k) + λ_kᵀH(x_k) + μ_kᵀG(x_k), h_k = h(x_k), and g_k = g(x_k). The new point is determined by x_{k+1} = x_k + d_k, and the new Lagrange multipliers are the Lagrange multipliers of the quadratic program (28). This is the essence of an early method for nonlinear programming termed SOLVER. It is a very attractive procedure, since it applies directly to problems with inequality as well as equality constraints without the use of an active set strategy (although such a strategy might be used to solve the required quadratic program). Methods of this general type, where a quadratic program is solved at each step, are referred to as recursive quadratic programming methods, and several variations are considered in this chapter.
As presented here, the recursive quadratic programming method extends Newton's method to problems with inequality constraints, but the method has limitations. The quadratic program may not always be well-defined, the method requires second-order derivative information, and the simple merit function is not a descent function for the case of inequalities. Of these, the most serious is the requirement of second-order information, and this is addressed in the next section.
15.5 MODIFIED NEWTON METHODS
A modified Newton method is based on replacing the actual linearized system by an approximation. The basic equations for Newton's method can be written

[ d_k ]     [ L_k  A_kᵀ ]⁻¹ [ l_k ]
[ y_k ] = − [ A_k   0   ]   [ h_k ],

where, as before, L_k is the Hessian of the Lagrangian, A_k = ∇h(x_k), l_k = ∇f(x_k)ᵀ + ∇h(x_k)ᵀλ_k, and h_k = h(x_k). A structured modified Newton method is a method of the form

[ d_k ]     [ B_k  A_kᵀ ]⁻¹ [ l_k ]
[ y_k ] = − [ A_k   0   ]   [ h_k ],                   (32)

where B_k is an approximation to L_k.
Of course, the method is implemented by solving the system

B_k d_k + A_kᵀy_k = −l_k
A_k d_k = −h_k.

Then x_{k+1} = x_k + d_k, and λ_{k+1} is found directly from the solution to system (32).
There are, of course, various ways to choose the approximation B_k. One is to use a fixed, constant matrix throughout the iterative process. A second is to base B_k on some readily accessible information in L(x_k, λ_k), such as setting B_k equal to the diagonal of L(x_k, λ_k). Finally, a third possibility is to update B_k using one of the various quasi-Newton formulae.
One important advantage of the structured method is that B_k can be taken to be positive definite even though L_k is not. If this is done, we can write the explicit solution

y_k = (A_k B_k⁻¹ A_kᵀ)⁻¹(h_k − A_k B_k⁻¹ l_k)
d_k = −B_k⁻¹(l_k + A_kᵀy_k).

The direction d_k can equivalently be obtained from a quadratic program of the form (26) with B_k in place of L_k, and the Lagrange multiplier of that quadratic program is λ_{k+1}. The equivalence of (35) and (36) leads to a recursive
quadratic programming method, where at each x_k the quadratic program (35) is solved to determine the direction d_k. In this case an arbitrary symmetric matrix B_k is used in place of the Hessian of the Lagrangian. Note that the problem (35) does not explicitly depend on λ_k; but B_k, often being chosen to approximate the Hessian of the Lagrangian, may depend on λ_k.
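A sketch of the structured method using the explicit formulas above, with B_k fixed to the diagonal of the Hessian of the Lagrangian (one of the choices mentioned in the text), on a hypothetical toy problem; all data and names are illustrative.

```python
import numpy as np

# Structured modified Newton sketch on a hypothetical toy problem
#   f(x) = x1^2 + x2^2 + x1*x2,  h(x) = x1 + 2*x2 - 1,
# using the explicit formulas
#   y = (A B^-1 A^T)^-1 (h - A B^-1 l),   d = -B^-1 (l + A^T y).
L_true = np.array([[2.0, 1.0], [1.0, 2.0]])   # actual Hessian (h is linear)
A = np.array([[1.0, 2.0]])                    # grad h
B = np.diag(np.diag(L_true))                  # cheap diagonal approximation

def step(x, lam):
    l = L_true @ x + A.T @ lam                # grad l = grad f + grad h^T lam
    h = np.array([A[0] @ x - 1.0])
    Binv = np.linalg.inv(B)
    y = np.linalg.solve(A @ Binv @ A.T, h - A @ Binv @ l)
    d = -Binv @ (l + A.T @ y)
    return x + d, lam + y

x, lam = np.array([3.0, -2.0]), np.array([0.0])
for _ in range(60):
    x, lam = step(x, lam)
print(x, lam)  # tends to the solution x* = (0, 0.5), lam* = -0.5
```

Here B disagrees with L_true in the off-diagonal entries, so the iteration converges only linearly, in contrast to the quadratic convergence of the exact Newton process.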
As before, a principal advantage of the quadratic programming formulation is that there is an obvious extension to problems with inequality constraints: one simply employs a linearized version of the inequalities.
15.6 DESCENT PROPERTIES
In order to ensure convergence of the structured modified Newton methods of the previous section, it is necessary to find a suitable merit function, one that is compatible with the direction-finding algorithm in the sense that it decreases along the direction generated. We must abandon the simple merit function at this point, since it is not compatible with these methods when B_k ≠ L_k. However, two other penalty functions considered earlier, the absolute-value exact penalty function and the quadratic penalty function, are compatible with the modified Newton approach.
Absolute-Value Penalty Function
Let us consider the constrained minimization problem
minimize f(x)
subject to g(x) ≤ 0,                                   (37)

where g(x) is r-dimensional. For notational simplicity we consider the case of
inequality constraints only, since it is, in fact, the most difficult case. The extension to equality constraints is straightforward. In accordance with the recursive quadratic programming approach, given a current point x, we select the direction of movement d by solving the quadratic programming problem

minimize ½ dᵀBd + ∇f(x)d
subject to g(x) + ∇g(x)d ≤ 0,                          (38)

where B is positive definite.
The first-order necessary conditions for a solution to this quadratic program are

Bd + ∇f(x)ᵀ + ∇g(x)ᵀμ = 0
g(x) + ∇g(x)d ≤ 0
μᵀ[g(x) + ∇g(x)d] = 0
μ ≥ 0.                                                 (39)

Note that if the solution to the quadratic program has d = 0, then the point x, together with μ from (39), satisfies the first-order necessary conditions for the original minimization problem (37). The following proposition is the fundamental result concerning the compatibility of the absolute-value penalty function and the quadratic programming method for determining the direction of movement.
Proposition 1. Let (d, μ), with d ≠ 0, be a solution of the quadratic program (38). Then if c ≥ max_j(μ_j), the vector d is a descent direction at x for the absolute-value penalty function

P(x) = f(x) + c Σ_j max(0, g_j(x)).

The proof rests on evaluating the one-sided directional derivative of P in the direction d, which involves the sum Σ_{j∈J(x)} ∇g_j(x)d over the index set J(x) of violated constraints. The following theorem builds on the descent property.
Theorem. Let B be positive definite and assume that throughout some compact region Ω ⊂ Eⁿ, the quadratic program (38) has a unique solution (d, μ) such that at each point the Lagrange multipliers satisfy max_j μ_j ≤ c. Let the sequence {x_k} be generated by

x_{k+1} = x_k + α_k d_k,

where d_k is the solution to (38) at x_k and where α_k minimizes P(x_{k+1}). Assume that each x_k ∈ Ω. Then every limit point x̄ of {x_k} satisfies the first-order necessary conditions for the constrained minimization problem (37).
Proof. The solution to a quadratic program depends continuously on the data, and hence the direction determined by the quadratic program (38) is a continuous function of x. The function P(x) is also continuous, and by Proposition 1, it follows that P is a descent function at every point that does not satisfy the first-order conditions. The result thus follows from the Global Convergence Theorem.
In view of the above result, recursive quadratic programming in conjunction with the absolute-value penalty function is an attractive technique. There are, however, some difficulties to be kept in mind. First, the selection of the parameter α_k requires a one-dimensional search with respect to a nondifferentiable function. Thus the efficient curve-fitting search methods of Chapter 8 cannot be used without significant modification. Second, use of the absolute-value function requires an estimate of an upper bound for the μ_j's, so that c can be selected properly. In some applications a suitable bound can be obtained from previous experience, but in general one must develop a method for revising the estimate upward when necessary.