1. Merit Functions. A merit function measures progress toward the solution; typically its value decreases as the iterates approach a solution of a minimization problem, but the sign may be reversed in some definitions.
For primal–dual methods, the merit function may depend on both x and λ. One especially useful merit function for equality constrained problems is

m(x, λ) = ½|∇f(x) + λᵀ∇h(x)|² + ½|h(x)|².

It is examined in the next section.
We shall examine other merit functions later in the chapter. With interior point methods for semidefinite programming, we shall use a potential function that serves as a merit function.
2. Active Set Methods. Inequality constraints can be treated using active set methods that treat the active constraints as equality constraints, at least for the current iteration. However, in primal–dual methods, both x and λ are changed. We shall consider variations of steepest descent, conjugate directions, and Newton's method where movement is made in the (x, λ) space.
3. Penalty Functions. In some primal–dual methods, a penalty function can serve as a merit function, even though the penalty function depends only on x. This is particularly attractive for recursive quadratic programming methods, where a quadratic program is solved at each stage to determine the direction of change in the pair (x, λ).
4. Interior (Barrier) Methods. Barrier methods lead to methods that move within the relative interior of the inequality constraints. This approach leads to the concept of the primal–dual central path. These methods are used for semidefinite programming, since these problems are characterized as possessing a special form of inequality constraint.
15.3 A SIMPLE MERIT FUNCTION
It is very natural, when considering the system of necessary conditions (2), to form the function

m(x, λ) = ½|∇f(x) + λᵀ∇h(x)|² + ½|h(x)|²

and use it as a measure of how close a pair (x, λ) is to a solution.
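As an illustration, the merit function is easy to evaluate numerically. The following minimal numpy sketch uses a hypothetical two-variable toy problem (the problem data and all names are illustrative, not from the text); the merit value is zero exactly at a solution pair and positive elsewhere.

```python
import numpy as np

# Hypothetical toy problem (names illustrative):
#   minimize f(x) = x1^2 + x2^2   subject to h(x) = x1 + x2 - 1 = 0,
# whose solution is x* = (0.5, 0.5) with multiplier lambda* = -1.
def grad_f(x):
    return 2.0 * x                      # gradient of f

def h(x):
    return np.array([x[0] + x[1] - 1.0])

def grad_h(x):
    return np.array([[1.0, 1.0]])       # Jacobian of h (m x n)

def merit(x, lam):
    # m(x, lam) = 0.5*|grad f(x) + lam^T grad h(x)|^2 + 0.5*|h(x)|^2
    grad_l = grad_f(x) + grad_h(x).T @ lam
    return 0.5 * grad_l @ grad_l + 0.5 * h(x) @ h(x)

print(merit(np.array([0.5, 0.5]), np.array([-1.0])))  # 0.0 at the solution
print(merit(np.zeros(2), np.zeros(1)))                # 0.5 away from it
```

Monitoring this value along the iterates of any of the methods in this chapter gives a simple numerical measure of progress.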
It must be noted, however, that the function m(x, λ) is not always well-behaved; it may have local minima, and these are of no value in a search for a solution. The following theorem gives the conditions under which the function m(x, λ) can serve as a well-behaved merit function. Basically, the main requirement is that the Hessian of the Lagrangian be positive definite. As usual, we define l(x, λ) = f(x) + λᵀh(x).
Theorem. Let f and h be twice continuously differentiable functions on Eⁿ of dimension 1 and m, respectively. Suppose that x∗ and λ∗ satisfy the first-order necessary conditions for a local minimum of m(x, λ) = ½|∇f(x) + λᵀ∇h(x)|² + ½|h(x)|² with respect to x and λ. Suppose also that at (x∗, λ∗), (i) the rank of ∇h(x∗) is m, and (ii) the Hessian matrix L(x∗, λ∗) = F(x∗) + λ∗ᵀH(x∗) is positive definite. Then (x∗, λ∗) is a (possibly nonunique) global minimum point of m.
The positive definiteness condition holds if, for example, f is strictly convex and h is linear. Furthermore, even in nonconvex problems one can often arrange for this condition to hold, at least near a solution to the original constrained minimization problem. If it is assumed that the second-order sufficiency conditions for a constrained minimum hold at (x∗, λ∗), then L(x∗, λ∗) is positive definite on the subspace that defines the tangent to the constraints; that is, on the subspace defined by ∇h(x∗)x = 0. Now if the original problem is modified with a penalty term to the problem

minimize f(x) + ½c|h(x)|²
subject to h(x) = 0,

then the Hessian of the corresponding Lagrangian at the solution becomes L(x∗, λ∗) + c∇h(x∗)ᵀ∇h(x∗), which for sufficiently large c is positive definite.
An extension to problems with inequality constraints can be defined by partitioning the constraints into the two groups: active and inactive. However, at this point the simple merit function for problems with equality constraints is adequate for the purpose of illustrating the general idea.
† Unless explicitly indicated to the contrary, the notation ∇l(x, λ) refers to the gradient of l with respect to x, that is, ∇ₓl(x, λ).
15.4 BASIC PRIMAL–DUAL METHODS
Many primal–dual methods are patterned after some of the methods used in earlier chapters, except of course that the emphasis is on equation solving rather than explicit optimization.
First-Order Method
We consider first a simple, straightforward approach, which in a sense parallels the idea of steepest descent in that it uses only a first-order approximation to the primal–dual equations. It is defined by

x_{k+1} = x_k − α_k∇l(x_k, λ_k)ᵀ
λ_{k+1} = λ_k + α_k h(x_k),

where the stepsize α_k is not yet determined. This is based on the error in satisfying (2). Assume that the Hessian of the Lagrangian L(x, λ) is positive definite in some compact region of interest, and consider the simple merit function

m(x, λ) = ½|∇l(x, λ)|² + ½|h(x)|².

The direction above is a descent direction for this merit function, unless ∇l(x, λ) = 0. If α_k is chosen to minimize the merit function in the search direction at each step, the process will converge to a point where ∇l(x, λ) = 0. However, there is no guarantee that h(x) = 0 at that point.
We can try to improve the method either by changing the way in which the direction is selected or by changing the merit function. In this case a slight modification of the merit function will work. Let

w(x, λ) = ½|∇l(x, λ)|² + ½|h(x)|² − γ l(x, λ)

for some γ > 0. We then calculate that the gradient of w has the two components, corresponding to x and λ,

∇l(x, λ)L(x, λ) + h(x)ᵀ∇h(x) − γ∇l(x, λ)
∇l(x, λ)∇h(x)ᵀ − γh(x)ᵀ,

and hence the inner product of the gradient with the direction (−∇l(x, λ)ᵀ, h(x)) is

−∇l(x, λ)[L(x, λ) − γI]∇l(x, λ)ᵀ − γ|h(x)|².
Now since we are assuming that L(x, λ) is positive definite in a compact region of interest, there is a γ > 0 such that L(x, λ) − γI is positive definite in this region. Then according to the above calculation, the direction (−∇l(x, λ)ᵀ, h(x)) is a descent direction, and the standard descent method will converge to a solution. This method will not converge very rapidly, however. (See Exercise 2 for further analysis of this method.)
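The first-order iteration above can be sketched in a few lines of numpy. This is a minimal illustration on a hypothetical toy problem (all data and names are illustrative); a small fixed stepsize stands in for the merit-minimizing line search, which happens to suffice for this particular example.

```python
import numpy as np

# First-order primal-dual iteration on a hypothetical toy problem
# (f(x) = x1^2 + x2^2, h(x) = x1 + x2 - 1):
#   x_{k+1}   = x_k  - alpha * grad_l(x_k, lam_k)^T
#   lam_{k+1} = lam_k + alpha * h(x_k)
def grad_l(x, lam):
    return 2.0 * x + lam[0] * np.ones(2)   # grad f + lam^T grad h

def h(x):
    return np.array([x[0] + x[1] - 1.0])

x, lam, alpha = np.zeros(2), np.zeros(1), 0.2
for _ in range(2000):
    # both updates use the current (x_k, lam_k)
    x, lam = x - alpha * grad_l(x, lam), lam + alpha * h(x)
print(x, lam)  # approaches x* = (0.5, 0.5), lam* = -1
```

The slow, spiraling approach to the solution visible here (the iteration matrix has complex eigenvalues of modulus just below one) reflects the text's remark that this method does not converge very rapidly.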
Conjugate Directions

As discussed in the previous section, this problem is equivalent to solving a system of linear equations whose coefficient matrix is

M = [ L   Aᵀ ]
    [ A   0  ].

This matrix is symmetric, but it is not positive definite (nor even semidefinite).
However, it is possible to formally generalize the conjugate gradient method to systems of this type by just applying the conjugate-gradient formulae (17)–(20) of Section 9.3 with Q replaced by M. A difficulty is that singular directions (defined as directions p such that pᵀMp = 0) may occur and cause the process to break down. Procedures for overcoming this difficulty have been developed, however. Also, as in the ordinary conjugate gradient method, the approach can be generalized to treat nonquadratic problems as well. Overall, however, the application of conjugate direction methods to the Lagrange system of equations, although very promising, is not currently considered practical.
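The indefiniteness of M is easy to confirm numerically. The sketch below uses hypothetical toy data (illustrative only): even with a positive definite block L and a full-rank A, the assembled matrix has eigenvalues of both signs.

```python
import numpy as np

# Hypothetical 2-variable, 1-constraint illustration of the Lagrange
# coefficient matrix M = [[L, A^T], [A, 0]]:
L = np.eye(2)                     # positive definite Hessian block
A = np.array([[1.0, 0.0]])        # full-rank constraint Jacobian
M = np.block([[L, A.T], [A, np.zeros((1, 1))]])

eigs = np.linalg.eigvalsh(M)      # M is symmetric, so eigvalsh applies
print(eigs)  # eigenvalues of mixed sign: M is indefinite
```

This is exactly why the ordinary conjugate gradient method, which assumes a positive definite coefficient matrix, must be generalized before it can be applied to M.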
Newton's Method

Newton's method solves the nonlinear system (19) by solving the linearized version recursively. That is, given (x_k, λ_k), the new point (x_{k+1}, λ_{k+1}) is determined from the equations

L(x_k, λ_k)d_k + ∇h(x_k)ᵀy_k = −∇l(x_k, λ_k)ᵀ
∇h(x_k)d_k = −h(x_k),                                  (21)

where x_{k+1} = x_k + d_k and λ_{k+1} = λ_k + y_k.
The Newton equations have some important structural properties. First, we observe that by adding ∇h(x_k)ᵀλ_k to the top equation, the system can be transformed to the form

L(x_k, λ_k)d_k + ∇h(x_k)ᵀλ_{k+1} = −∇f(x_k)ᵀ
∇h(x_k)d_k = −h(x_k).
Second, if the coefficient matrix of this system cannot be guaranteed to be nonsingular, it is possible to alter the original problem by incorporation of a quadratic penalty term so that the new Hessian of the Lagrangian is L(x, λ) + c∇h(x)ᵀ∇h(x). For sufficiently large c, this new Hessian will be positive definite over the entire space.
If L(x, λ) is positive definite (either originally or through the incorporation of a penalty term), it is possible to write an explicit expression for the solution of the system (21). Let us define L_k = L(x_k, λ_k), A_k = ∇h(x_k), l_k = ∇l(x_k, λ_k)ᵀ, and h_k = h(x_k). The system then takes the form

L_k d_k + A_kᵀy_k = −l_k
A_k d_k = −h_k.                                        (23)
Standard local convergence results for Newton's method applied to systems of nonlinear equations are applicable to the system (19). These results state that if the linearized system is nonsingular at the solution (as is implied by our assumptions) and if the initial point is sufficiently close to the solution, the method will in fact converge to the solution, and the convergence will be of order at least two.

To guarantee convergence from remote initial points, and hence be more broadly applicable, it is desirable to use the method as a descent process. Fortunately, we can show that the direction generated by Newton's method is a descent direction for the simple merit function m(x, λ) = ½|∇l(x, λ)|² + ½|h(x)|². Using (23), the inner product of the gradient of m with the direction (d_k, y_k) is

l_kᵀ(L_k d_k + A_kᵀy_k) + h_kᵀA_k d_k = −|l_k|² − |h_k|².

This is strictly negative unless both l_k = 0 and h_k = 0. Thus Newton's method has desirable global convergence properties when executed as a descent method with variable step size.
Note that the calculation above does not employ the explicit formulae (24) and (25), and hence it is not necessary that L(x, λ) be positive definite, as long as the system (21) is invertible. We summarize the above discussion in the following theorem.
Theorem. Define the Newton process by

x_{k+1} = x_k + α_k d_k
λ_{k+1} = λ_k + α_k y_k,

where (d_k, y_k) is the solution of (21) at (x_k, λ_k) and where α_k minimizes the simple merit function m along this direction. Assume that (d_k, y_k) exist and that the points generated lie in a compact set. Then any limit point of these points satisfies the first-order necessary conditions for a solution to the constrained minimization problem (1).
Proof. Most of this follows from the above observations and the Global Convergence Theorem. The one-dimensional search process is well-defined, since the merit function m is bounded below.

In view of this result, it is worth pursuing Newton's method further. We would like to extend it to problems with inequality constraints. We would also like to avoid the necessity of evaluating L(x_k, λ_k) at each step and to consider alternative merit functions, perhaps ones that might distinguish a local maximum from a local minimum, which the simple merit function does not do. These considerations guide the developments of the next several sections.
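The Newton process can be sketched as follows. This is a minimal illustration on a hypothetical toy problem (all data and names are illustrative), using a unit step α_k = 1 rather than the merit-function line search of the theorem, which is reasonable when the iterates stay near the solution.

```python
import numpy as np

# Newton's method on the primal-dual equations for a hypothetical toy problem
#   f(x) = x1^4 + x2^2,  h(x) = x1 + x2 - 1.
# Each step solves  [L_k  A_k^T; A_k  0][d_k; y_k] = -[l_k; h_k].
def newton_step(x, lam):
    Lk = np.diag([12.0 * x[0]**2, 2.0])                  # Hessian of Lagrangian
    Ak = np.array([[1.0, 1.0]])                          # grad h
    lk = np.array([4.0 * x[0]**3, 2.0 * x[1]]) + lam[0] * np.ones(2)
    hk = np.array([x[0] + x[1] - 1.0])
    K = np.block([[Lk, Ak.T], [Ak, np.zeros((1, 1))]])
    sol = np.linalg.solve(K, -np.concatenate([lk, hk]))
    return x + sol[:2], lam + sol[2:]

x, lam = np.array([1.0, 1.0]), np.array([0.0])
for _ in range(30):
    x, lam = newton_step(x, lam)
# At convergence the first-order conditions hold:
#   4*x1^3 + lam = 0,  2*x2 + lam = 0,  x1 + x2 = 1.
print(x, lam)
```

Because the constraint here is linear, a single Newton step already restores feasibility exactly, and the remaining iterations converge rapidly in the tangential component.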
Relation to Quadratic Programming
It is clear from the preceding development that Newton's method is closely related to quadratic programming with equality constraints. We explore this relationship more fully here; it will lead to a generalization of Newton's method to problems with inequality constraints.
Consider the problem

minimize ½ dᵀL_k d + ∇l(x_k, λ_k)d
subject to h(x_k) + ∇h(x_k)d = 0.                      (26)

The first-order necessary conditions of this problem are exactly (21), or equivalently (23), where y_k corresponds to the Lagrange multiplier of (26). Thus, the solution of (26) produces a Newton step.
Alternatively, we may consider the quadratic program

minimize ½ dᵀL_k d + ∇f(x_k)d
subject to h(x_k) + ∇h(x_k)d = 0,                      (27)

whose Lagrange multiplier corresponds to λ_{k+1} = λ_k + y_k. This program differs from (26) by merely subtracting λ_kᵀA_k d_k from the objective function, and this change has no influence on d_k, since A_k d_k is fixed.
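This relationship can be checked numerically. The sketch below (hypothetical toy data, illustrative names) solves the KKT systems of both quadratic programs and confirms that the directions agree while the multipliers differ by exactly λ_k.

```python
import numpy as np

# Numerical check of the relation between the two quadratic programs:
# same constraint and Hessian block, objectives differing by lam_k^T A_k d.
Lk = 2.0 * np.eye(2)                       # Hessian of the Lagrangian
Ak = np.array([[1.0, 1.0]])                # grad h at x_k
x_k, lam_k = np.array([0.2, 0.1]), np.array([0.3])
grad_f = 2.0 * x_k
lk = grad_f + Ak.T @ lam_k                 # grad l = grad f + grad h^T lam
hk = np.array([x_k[0] + x_k[1] - 1.0])

K = np.block([[Lk, Ak.T], [Ak, np.zeros((1, 1))]])
sol26 = np.linalg.solve(K, -np.concatenate([lk, hk]))       # program (26)
sol27 = np.linalg.solve(K, -np.concatenate([grad_f, hk]))   # program (27)
d26, y = sol26[:2], sol26[2:]
d27, mu = sol27[:2], sol27[2:]
print(np.allclose(d26, d27), np.allclose(mu, lam_k + y))  # True True
```

Both programs share the same coefficient matrix; only the linear term of the objective changes, which shifts the multiplier but not the direction.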
The connection with quadratic programming suggests a procedure for extending Newton's method to minimization problems with inequality constraints. Consider the problem

minimize f(x)
subject to h(x) = 0
          g(x) ≤ 0.
Given an estimated solution point x_k and estimated Lagrange multipliers λ_k, μ_k, one solves the quadratic program

minimize ½ dᵀL_k d + ∇f(x_k)d
subject to h_k + ∇h(x_k)d = 0
          g_k + ∇g(x_k)d ≤ 0,                          (28)

where L_k = F(x_k) + λ_kᵀH(x_k) + μ_kᵀG(x_k), h_k = h(x_k), and g_k = g(x_k). The new point is determined by x_{k+1} = x_k + d_k, and the new Lagrange multipliers are the Lagrange multipliers of the quadratic program (28). This is the essence of an early method for nonlinear programming termed SOLVER. It is a very attractive procedure, since it applies directly to problems with inequality as well as equality constraints without the use of an active set strategy (although such a strategy might be used to solve the required quadratic program). Methods of this general type, where a quadratic program is solved at each step, are referred to as recursive quadratic programming methods, and several variations are considered in this chapter.
As presented here, the recursive quadratic programming method extends Newton's method to problems with inequality constraints, but the method has limitations. The quadratic program may not always be well-defined, the method requires second-order derivative information, and the simple merit function is not a descent function for the case of inequalities. Of these, the most serious is the requirement of second-order information, and this is addressed in the next section.
15.5 MODIFIED NEWTON METHODS
A modified Newton method is based on replacing the actual linearized system by an approximation. The basic equations for Newton's method can be written

[ d_k ]     [ L_k  A_kᵀ ]⁻¹ [ l_k ]
[ y_k ] = − [ A_k   0   ]   [ h_k ],

where, as before, L_k is the Hessian of the Lagrangian, A_k = ∇h(x_k), l_k = ∇f(x_k)ᵀ + ∇h(x_k)ᵀλ_k, and h_k = h(x_k). A structured modified Newton method is a method of the form

[ d_k ]     [ B_k  A_kᵀ ]⁻¹ [ l_k ]
[ y_k ] = − [ A_k   0   ]   [ h_k ],                   (32)

where B_k is an approximation to L_k.
Of course, the method is implemented by solving the system

B_k d_k + A_kᵀy_k = −l_k
A_k d_k = −h_k.

Then x_{k+1} = x_k + d_k, and λ_{k+1} is found directly from the solution to system (32).
There are, of course, various ways to choose the approximation B_k. One is to use a fixed, constant matrix throughout the iterative process. A second is to base B_k on some readily accessible information in L(x_k, λ_k), such as setting B_k equal to the diagonal of L(x_k, λ_k). Finally, a third possibility is to update B_k using one of the various quasi-Newton formulae.
One important advantage of the structured method is that B_k can be taken to be positive definite even though L_k is not. If this is done, we can write the explicit solution

y_k = (A_k B_k⁻¹ A_kᵀ)⁻¹(h_k − A_k B_k⁻¹ l_k)
d_k = −B_k⁻¹(l_k + A_kᵀy_k).

The direction d_k can equivalently be obtained from a quadratic program of the form (26) with B_k in place of L_k, and the Lagrange multiplier of that quadratic program is λ_{k+1}. The equivalence of (35) and (36) leads to a recursive
quadratic programming method, where at each x_k the quadratic program (35) is solved to determine the direction d_k. In this case an arbitrary symmetric matrix B_k is used in place of the Hessian of the Lagrangian. Note that the problem (35) does not explicitly depend on λ_k; but B_k, often being chosen to approximate the Hessian of the Lagrangian, may depend on λ_k.
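A sketch of the structured method using the explicit formulas above, with B_k fixed to the diagonal of the Hessian of the Lagrangian (one of the choices mentioned in the text), on a hypothetical toy problem; all data and names are illustrative.

```python
import numpy as np

# Structured modified Newton sketch on a hypothetical toy problem
#   f(x) = x1^2 + x2^2 + x1*x2,  h(x) = x1 + 2*x2 - 1,
# using the explicit formulas
#   y = (A B^-1 A^T)^-1 (h - A B^-1 l),   d = -B^-1 (l + A^T y).
L_true = np.array([[2.0, 1.0], [1.0, 2.0]])   # actual Hessian (h is linear)
A = np.array([[1.0, 2.0]])                    # grad h
B = np.diag(np.diag(L_true))                  # cheap diagonal approximation

def step(x, lam):
    l = L_true @ x + A.T @ lam                # grad l = grad f + grad h^T lam
    h = np.array([A[0] @ x - 1.0])
    Binv = np.linalg.inv(B)
    y = np.linalg.solve(A @ Binv @ A.T, h - A @ Binv @ l)
    d = -Binv @ (l + A.T @ y)
    return x + d, lam + y

x, lam = np.array([3.0, -2.0]), np.array([0.0])
for _ in range(60):
    x, lam = step(x, lam)
print(x, lam)  # tends to the solution x* = (0, 0.5), lam* = -0.5
```

Here B disagrees with L_true in the off-diagonal entries, so the iteration converges only linearly, in contrast to the quadratic convergence of the exact Newton process.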
As before, a principal advantage of the quadratic programming formulation is that there is an obvious extension to problems with inequality constraints: one simply employs a linearized version of the inequalities.
15.6 DESCENT PROPERTIES
In order to ensure convergence of the structured modified Newton methods of the previous section, it is necessary to find a suitable merit function, one that is compatible with the direction-finding algorithm in the sense that it decreases along the direction generated. We must abandon the simple merit function at this point, since it is not compatible with these methods when B_k ≠ L_k. However, two other penalty functions considered earlier, the absolute-value exact penalty function and the quadratic penalty function, are compatible with the modified Newton approach.
Absolute-Value Penalty Function
Let us consider the constrained minimization problem
minimize f(x)
subject to g(x) ≤ 0,                                   (37)

where g(x) is r-dimensional. For notational simplicity we consider the case of
inequality constraints only, since it is, in fact, the most difficult case. The extension to equality constraints is straightforward. In accordance with the recursive quadratic programming approach, given a current point x, we select the direction of movement d by solving the quadratic programming problem

minimize ½ dᵀBd + ∇f(x)d
subject to g(x) + ∇g(x)d ≤ 0,                          (38)

where B is positive definite.
The first-order necessary conditions for a solution to this quadratic program are

Bd + ∇f(x)ᵀ + ∇g(x)ᵀμ = 0
g(x) + ∇g(x)d ≤ 0
μᵀ[g(x) + ∇g(x)d] = 0
μ ≥ 0.                                                 (39)

Note that if the solution to the quadratic program has d = 0, then the point x, together with μ from (39), satisfies the first-order necessary conditions for the original minimization problem (37). The following proposition is the fundamental result concerning the compatibility of the absolute-value penalty function and the quadratic programming method for determining the direction of movement.
Proposition 1. Let (d, μ), with d ≠ 0, be a solution of the quadratic program (38). Then if c ≥ max_j(μ_j), the vector d is a descent direction at x for the absolute-value penalty function

P(x) = f(x) + c Σ_j max(0, g_j(x)).

The proof rests on evaluating the one-sided directional derivative of P in the direction d, which involves the sum Σ_{j∈J(x)} ∇g_j(x)d over the index set J(x) of violated constraints. The following theorem builds on the descent property.
Theorem. Let B be positive definite and assume that throughout some compact region Ω ⊂ Eⁿ, the quadratic program (38) has a unique solution (d, μ) such that at each point the Lagrange multipliers satisfy max_j μ_j ≤ c. Let the sequence {x_k} be generated by

x_{k+1} = x_k + α_k d_k,

where d_k is the solution to (38) at x_k and where α_k minimizes P(x_{k+1}). Assume that each x_k ∈ Ω. Then every limit point x̄ of {x_k} satisfies the first-order necessary conditions for the constrained minimization problem (37).
Proof. The solution to a quadratic program depends continuously on the data, and hence the direction determined by the quadratic program (38) is a continuous function of x. The function P(x) is also continuous, and by Proposition 1, it follows that P is a descent function at every point that does not satisfy the first-order conditions. The result thus follows from the Global Convergence Theorem.
In view of the above result, recursive quadratic programming in conjunction with the absolute-value penalty function is an attractive technique. There are, however, some difficulties to be kept in mind. First, the selection of the parameter α_k requires a one-dimensional search with respect to a nondifferentiable function. Thus the efficient curve-fitting search methods of Chapter 8 cannot be used without significant modification. Second, use of the absolute-value function requires an estimate of an upper bound for the μ_j's, so that c can be selected properly. In some applications a suitable bound can be obtained from previous experience, but in general one must develop a method for revising the estimate upward when necessary.