Thus, while the convergence of primal methods is governed by the restriction of L* to M, the convergence of dual methods is governed by a restriction of (L*)^{-1} to the orthogonal complement of M.
The dual canonical convergence rate associated with the original constrained problem, which is the rate of convergence of steepest ascent applied to the dual, is [(B − b)/(B + b)]², where b and B are, respectively, the smallest and largest eigenvalues of ∇h(x*)[L(x*, λ*)]^{-1}∇h(x*)^T, the Hessian of the dual function of the problem

minimize f(x)
subject to h(x) = 0.
If a change of primal variables x is introduced, the primal rate will in general change but the dual rate will not. On the other hand, if the constraints are transformed (by replacing them by Th(x) = 0, where T is a nonsingular m × m matrix), the dual rate will change but the primal rate will not.
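To make these invariance properties concrete, here is a small numeric sketch (all data made up) that computes the dual canonical rate for an illustrative equality-constrained quadratic program and shows that rescaling the constraints changes the dual rate, while the primal problem itself, and hence the primal rate, is untouched.

```python
import numpy as np

def canonical_rate(M):
    # Steepest ascent/descent canonical rate ((B - b)/(B + b))^2, where b and B
    # are the extreme eigenvalues of the symmetric positive definite matrix M.
    eig = np.linalg.eigvalsh(M)
    b, B = eig[0], eig[-1]
    return ((B - b) / (B + b)) ** 2

# Illustrative QP: minimize (1/2) x^T Q x subject to A x = 0, so that the
# Lagrangian Hessian is L = Q and the dual Hessian is A Q^{-1} A^T.
Q = np.diag([1.0, 3.0, 10.0, 30.0])
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
print("dual rate:", canonical_rate(A @ np.linalg.inv(Q) @ A.T))

# Replacing the constraints by T h(x) = 0 (T nonsingular) changes the dual
# Hessian to T (A Q^{-1} A^T) T^T and hence the dual rate, while the primal
# problem -- and therefore the primal rate -- is unchanged.
T = np.diag([1.0, 100.0])
print("dual rate, scaled constraints:",
      canonical_rate(T @ A @ np.linalg.inv(Q) @ A.T @ T.T))
```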
14.4 SEPARABLE PROBLEMS

A structure that arises frequently in mathematical programming applications is that of the separable problem:

minimize Σ_{i=1}^q f_i(x_i)
subject to Σ_{i=1}^q h_i(x_i) = 0     (27)
Σ_{i=1}^q g_i(x_i) ≤ 0,     (28)

where x = (x_1, x_2, ..., x_q) is a partition of the variable x into groups, and the objective and constraint functions are sums of functions of the individual groups. For each i, the functions f_i, h_i, and g_i are twice continuously differentiable functions of dimensions 1, m, and p, respectively.
Example 1. Suppose that we have a fixed budget of, say, A dollars that may be allocated among n activities. If x_i dollars is allocated to the ith activity, then there will be a benefit (measured in some units) of f_i(x_i). To obtain the maximum benefit within our budget, we solve the separable problem

maximize Σ_{i=1}^n f_i(x_i)
subject to Σ_{i=1}^n x_i ≤ A,  x_i ≥ 0.
In the example x is partitioned into its individual components.
Example 2. Problems involving a series of decisions made at distinct times are often separable. For illustration, consider the problem of scheduling water release through a dam to produce as much electric power as possible over a given time interval while satisfying constraints on acceptable water levels. A discrete-time model of this problem is to maximize the total power generated over N periods, where the water level y_k and the water release u_k in period k are linked by water-balance equality constraints and restricted to acceptable ranges.
In this example we consider x as the 2N-dimensional vector of unknowns y_k, u_k, k = 1, 2, ..., N. This vector is partitioned into the pairs x_k = (y_k, u_k). The objective function is then clearly in separable form. The constraints can be viewed as being in the form (27) with h_k(x_k) having dimension N and such that h_k(x_k) is identically zero except in the kth and (k + 1)th components.
Decomposition
Separable problems are ideally suited to dual methods, because the required unconstrained minimization decomposes into small subproblems. To see this we recall that the generally most difficult aspect of a dual method is evaluation of the dual function. For a separable problem, if we associate λ with the equality constraints (27) and μ ≥ 0 with the inequality constraints (28), the required dual function is

φ(λ, μ) = min Σ_{i=1}^q [f_i(x_i) + λ^T h_i(x_i) + μ^T g_i(x_i)]
        = Σ_{i=1}^q min_{x_i} [f_i(x_i) + λ^T h_i(x_i) + μ^T g_i(x_i)].

The minimization thus decomposes into q separate problems, each involving only a single group x_i.
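The decomposition is easy to express in code. The sketch below uses illustrative data with scalar blocks and a single equality and a single inequality constraint (so λ and μ are numbers); it evaluates the dual function block by block instead of solving one large joint minimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dual_function(lam, mu, f, h, g):
    # phi(lam, mu) = sum_i min_{x_i} [f_i(x_i) + lam*h_i(x_i) + mu*g_i(x_i)]:
    # each scalar block x_i is minimized independently.
    total = 0.0
    for fi, hi, gi in zip(f, h, g):
        res = minimize_scalar(lambda t: fi(t) + lam * hi(t) + mu * gi(t),
                              bounds=(-10.0, 10.0), method="bounded")
        total += res.fun
    return total

# Made-up separable data: two scalar blocks.
f = [lambda t: (t - 1.0) ** 2, lambda t: (t + 2.0) ** 2]
h = [lambda t: t, lambda t: t]      # equality constraint: x_1 + x_2 = 0
g = [lambda t: -t, lambda t: -t]    # inequality constraint: -(x_1 + x_2) <= 0
print(dual_function(0.5, 1.0, f, h, g))
```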
Example 4. In Example 2, using duality with respect to the equality constraints, we denote the dual variables by λ_k, k = 1, 2, ..., N. The kth subproblem then involves only the pair (y_k, u_k) and can be interpreted as the problem faced by an entrepreneur who leased the dam for one period. He can buy water for the dam at the beginning of the period at price λ_k and sell what he has left at the end of the period at price λ_{k+1}. His problem is to determine y_k and u_k so that his net profit, accruing from sale of generated power and purchase and sale of water, is maximized.
Example 5. (The hanging chain). Consider again the problem of finding the equilibrium position of the hanging chain considered in Example 4, Section 11.3, and Example 1, Section 12.7. The problem is

minimize Σ_{i=1}^n c_i y_i
subject to Σ_{i=1}^n y_i = 0
Σ_{i=1}^n √(1 − y_i²) = L,

where y_i is the vertical component of the ith link. The objective and both constraints are separable, so if we associate multipliers λ and μ with the two constraints, evaluation of the dual function reduces to the n independent subproblems

min_{y_i} [c_i y_i + λy_i + μ√(1 − y_i²)].

Setting the derivative with respect to y_i equal to zero gives

c_i + λ − μy_i/√(1 − y_i²) = 0,

or

(c_i + λ)²(1 − y_i²) = μ²y_i².

This yields

y_i = −(c_i + λ)/√(μ² + (c_i + λ)²),

where the sign of the square root is chosen to give the minimum.
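A sketch of evaluating and maximizing this dual by steepest ascent follows. The chain data (20 unit links, span L = 16, c_i = n − i + ½) are assumptions carried over from the earlier chain examples cited above, and the stepsize and iteration count are ad hoc; the gradient of the dual is simply the vector of constraint residuals at the subproblem solutions.

```python
import numpy as np

n, L = 20, 16.0                                   # assumed chain data
c = np.array([n - i + 0.5 for i in range(1, n + 1)])

def subproblem(lam, mu):
    # Closed-form minimizers of the n one-dimensional subproblems.
    return -(c + lam) / np.sqrt(mu**2 + (c + lam)**2)

def residuals(lam, mu):
    # Gradient of the dual function: the two constraint residuals.
    y = subproblem(lam, mu)
    return np.array([y.sum(), np.sqrt(1.0 - y**2).sum() - L])

lam, mu = 0.0, -10.0     # mu < 0 makes each subproblem convex here
for _ in range(500):
    r = residuals(lam, mu)
    lam += 0.1 * r[0]
    mu += 0.1 * r[1]
print(lam, mu, residuals(lam, mu))   # residuals should approach zero
```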
Table 14.1 Results of Dual of Chain Problem (final solution λ = −10.00048; final value −67.61136)
14.5 AUGMENTED LAGRANGIANS

One of the most effective general classes of nonlinear programming methods is the class of augmented Lagrangian methods, alternatively referred to as multiplier methods. These methods can be viewed as a combination of penalty functions and local duality methods; the two concepts work together to eliminate many of the disadvantages associated with either method alone.
The augmented Lagrangian for the equality constrained problem

minimize f(x)
subject to h(x) = 0     (31)

is the function

l_c(x, λ) = f(x) + λ^T h(x) + ½c|h(x)|²

for some positive constant c.
From a penalty function viewpoint the augmented Lagrangian, for a fixed value of the vector λ, is simply the standard quadratic penalty function for the problem

minimize f(x) + λ^T h(x)
subject to h(x) = 0.     (32)

This problem is clearly equivalent to the original problem (31), since combinations of the constraints adjoined to f(x) do not affect the minimum point or the minimum value. However, if the multiplier vector were selected equal to λ*, the correct Lagrange multiplier, then the gradient of l_c(x, λ*) would vanish at the solution x*. This is because ∇l_c(x, λ*) = 0 implies ∇f(x) + λ*^T ∇h(x) + c h(x)^T ∇h(x) = 0, which is satisfied by ∇f(x) + λ*^T ∇h(x) = 0 and h(x) = 0. Thus the augmented Lagrangian is seen to be an exact penalty function when the proper value of λ* is used.
A typical step of an augmented Lagrangian method starts with a vector λ_k. Then x_k is found as the minimum point of

l_c(x, λ_k) = f(x) + λ_k^T h(x) + ½c|h(x)|².     (33)

Next, λ_k is updated to λ_{k+1}.
To motivate the adjustment procedure, consider the constrained problem (32) with λ = λ_k. The Lagrange multiplier corresponding to this problem is λ* − λ_k, where λ* is the Lagrange multiplier of (31). On the other hand, since (33) is the penalty function corresponding to (32), it follows from the results of Section 13.3 that c h(x_k) is approximately equal to the Lagrange multiplier of (32). Combining these two facts, we obtain c h(x_k) ≈ λ* − λ_k. Therefore, a good approximation to the unknown λ* is

λ_{k+1} = λ_k + c h(x_k).
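In code, one complete pass of this basic scheme looks as follows; the problem data are illustrative stand-ins, and the inner minimization is delegated to a general-purpose routine.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):                       # illustrative objective
    return x[0]**2 + 2.0 * x[1]**2

def h(x):                       # illustrative equality constraint h(x) = 0
    return np.array([x[0] + x[1] - 1.0])

def l_c(x, lam, c):             # augmented Lagrangian (33)
    hx = h(x)
    return f(x) + lam @ hx + 0.5 * c * (hx @ hx)

lam, c = np.zeros(1), 10.0
x = np.zeros(2)
for k in range(20):
    x = minimize(l_c, x, args=(lam, c)).x   # x_k minimizes l_c(x, lam_k)
    lam = lam + c * h(x)                    # lam_{k+1} = lam_k + c h(x_k)
print(x, lam)   # x should approach (2/3, 1/3) and lam should approach -4/3
```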
Although the main iteration in augmented Lagrangian methods is with respect to λ, the penalty parameter c may also be adjusted during the process. As in ordinary penalty function methods, the sequence of c's is usually preselected: c is either held fixed, is increased toward a finite value, or tends (slowly) toward infinity. Since in this method it is not necessary for c to go to infinity, and in fact it may remain of relatively modest value, the ill-conditioning usually associated with the penalty function approach is mediated.
From the viewpoint of duality theory, the augmented Lagrangian is simply the standard Lagrangian for the problem

minimize f(x) + ½c|h(x)|²
subject to h(x) = 0,

which has the same solutions as (31). The addition of the penalty term ½c|h(x)|² tends to "convexify" the Lagrangian. For sufficiently large c, the Lagrangian will indeed be locally convex. Thus the duality method can be employed, and the corresponding dual problem can be solved by an iterative process in λ. This viewpoint leads to the development of additional multiplier adjustment processes.
The Penalty Viewpoint
We begin our more detailed analysis of augmented Lagrangian methods by showing that if the penalty parameter c is sufficiently large, the augmented Lagrangian has a local minimum point near the true optimal point. This follows from the following simple lemma.
Lemma. Let A and B be n × n symmetric matrices. Suppose that B is positive semidefinite and that A is positive definite on the subspace Bx = 0. Then there is a c* such that for all c ≥ c* the matrix A + cB is positive definite.
Proof. Suppose to the contrary that for every k there were an x_k with |x_k| = 1 such that x_k^T(A + kB)x_k ≤ 0. The sequence {x_k}, being bounded, must have a convergent subsequence converging to a limit x. Now since x_k^T B x_k ≥ 0, it follows that x^T B x = 0. It also follows that x^T A x ≤ 0. However, this contradicts the hypothesis of the lemma.
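A quick numeric illustration of the lemma, with made-up two-by-two matrices: A is indefinite but positive definite on the nullspace of B, B is positive semidefinite, and A + cB becomes positive definite once c passes a threshold (here c* = 1).

```python
import numpy as np

A = np.diag([1.0, -1.0])   # indefinite, but positive definite on {x : Bx = 0}
B = np.diag([0.0, 1.0])    # positive semidefinite; nullspace is the x1-axis
for c in [0.5, 1.0, 2.0, 10.0]:
    print(c, np.linalg.eigvalsh(A + c * B).min())   # > 0 once c > 1
```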
This lemma applies directly to the Hessian of the augmented Lagrangian evaluated at the optimal solution pair x*, λ*. We assume as usual that the second-order sufficiency conditions for a constrained minimum hold at x*, λ*. The Hessian of the augmented Lagrangian evaluated at the optimal pair x*, λ* is

L_c(x*, λ*) = F(x*) + λ*^T H(x*) + c∇h(x*)^T ∇h(x*)
            = L(x*) + c∇h(x*)^T ∇h(x*).

The first term, the Hessian of the normal Lagrangian, is positive definite on the subspace ∇h(x*)x = 0. This corresponds to the matrix A in the lemma. The matrix ∇h(x*)^T ∇h(x*) is positive semidefinite and corresponds to B in the lemma. It follows that there is a c* such that for all c > c*, L_c(x*, λ*) is positive definite. This leads directly to the first basic result concerning augmented Lagrangians.
Proposition 1. Assume that the second-order sufficiency conditions for a local minimum are satisfied at x*, λ*. Then there is a c* such that for all c ≥ c*, the augmented Lagrangian l_c(x, λ*) has a local minimum point at x*.
By a continuity argument the result of the above proposition can be extended to a neighborhood around x*, λ*. That is, for any λ near λ*, the augmented Lagrangian l_c(x, λ) has a unique local minimum point near x*. This correspondence defines a continuous function x(λ). If a value of λ can be found such that h(x(λ)) = 0, then that λ must in fact be λ*, since x(λ) satisfies the necessary conditions of the original problem. Therefore, the problem of determining the proper value of λ can be viewed as one of solving the equation h(x(λ)) = 0. For this purpose the iterative process

λ_{k+1} = λ_k + c h(x(λ_k))

is a method of successive approximation. This process will converge linearly in a neighborhood around λ*, although a rigorous proof is somewhat complex. We shall give more definite convergence results when we consider the duality viewpoint.
Example 1. Consider the simple quadratic problem studied in Section 13.8:

minimize 2x² + 2xy + y² − 2y
subject to x = 0.

The augmented Lagrangian for this problem is

l_c(x, y, λ) = 2x² + 2xy + y² − 2y + λx + ½cx².

The minimum of this can be found analytically to be x = −(2 + λ)/(2 + c), y = (4 + c + λ)/(2 + c). Since h(x, y) = x in this example, it follows that the iterative process for λ_k is

λ_{k+1} = λ_k − c(2 + λ_k)/(2 + c),

or

λ_{k+1} = [2/(2 + c)]λ_k − 2c/(2 + c).

This converges to λ = −2 for any c > 0. The coefficient 2/(2 + c) governs the rate of convergence, and clearly, as c is increased the rate improves.
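A quick sketch verifying the stated coefficient: iterating the recursion above, the error |λ_k + 2| contracts by the factor 2/(2 + c) per step, faster for larger c.

```python
for c in (1.0, 10.0, 100.0):
    lam, ratio = 0.0, 2.0 / (2.0 + c)
    for k in range(5):
        lam = ratio * lam - 2.0 * c / (2.0 + c)
    print(c, lam, abs(lam + 2.0))   # error after 5 steps is 2*ratio**5
```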
Geometric Interpretation
The augmented Lagrangian method can be interpreted geometrically in terms of the primal function in a manner analogous to that in Sections 13.3 and 13.8 for the ordinary quadratic penalty function and the absolute-value penalty function. Consider again the primal function ω(y) defined as

ω(y) = min {f(x) : h(x) = y},

where the minimum is understood to be taken locally near x*. We remind the reader that ω(0) = f(x*) and that ∇ω(0)^T = −λ*. The minimum of the augmented Lagrangian at step k can be expressed in terms of the primal function as follows:

min_x l_c(x, λ_k) = min_x [f(x) + λ_k^T h(x) + ½c|h(x)|²]
                  = min_{x,y} {f(x) + λ_k^T y + ½c|y|² : h(x) = y}
                  = min_y {ω(y) + λ_k^T y + ½c|y|²},

where the minimization with respect to y is to be taken locally near y = 0. This minimization is illustrated geometrically for the case of a single constraint in Fig. 14.5.

Fig. 14.5 Primal function and augmented Lagrangian

The lower curve represents ω(y), and the upper curve represents ω(y) + ½cy². The minimum point y_k of the minimization above occurs at the point where this upper curve has slope equal to −λ_k. It is seen that for c sufficiently large this curve will be convex at y = 0. If λ_k is close to λ*, it is clear that this minimum point will be close to 0; it will be exact if λ_k = λ*.
The process for updating λ_k is also illustrated in Fig. 14.5. Note that in general, if x_k minimizes l_c(x, λ_k), then y_k = h(x_k) is the minimum point of ω(y) + λ_k^T y + ½c|y|², as shown in Fig. 14.5 for the one-dimensional case. In the figure the next point y_{k+1} is the point where ω(y) + ½cy² has slope −λ_{k+1}, which will yield a positive value of y_{k+1} in this case. It can be seen that if λ_k is sufficiently close to λ*, then λ_{k+1} will be even closer, and the iterative process will converge.
14.6 THE DUAL VIEWPOINT
In the method of augmented Lagrangians (the method of multipliers), the primary iteration is with respect to λ, and therefore it is most natural to consider the method from the dual viewpoint. This is in fact the more powerful viewpoint and leads to improvements in the algorithm.
As we observed earlier, the constrained problem

minimize f(x)
subject to h(x) = 0     (36)

is equivalent, for any c, to the augmented problem

minimize f(x) + ½c|h(x)|²
subject to h(x) = 0,     (37)

whose Lagrangian is the augmented Lagrangian l_c(x, λ) and whose Lagrangian Hessian L(x*, λ*) + c∇h(x*)^T ∇h(x*) is, for sufficiently large c, positive definite at the solution pair x*, λ*. Thus local duality theory is applicable to problem (37) for sufficiently large c.

To apply the dual method to (37), we define the dual function

φ(λ) = min [f(x) + λ^T h(x) + ½c|h(x)|²]

in a region near x*, λ*. By local duality theory the gradient of φ is ∇φ(λ) = h(x(λ)), where x(λ) is the corresponding minimum point. The basic multiplier update λ_{k+1} = λ_k + c h(x_k) is therefore a steepest ascent iteration for maximizing φ, using a constant stepsize c.

Although the stepsize c is a good choice (as will become even more evident later), it is clearly advantageous to apply the algorithmic principles of optimization developed previously by selecting the stepsize so that the new value of the dual function satisfies an ascent criterion. This can extend the range of convergence of the algorithm.
The rate of convergence of the optimal steepest ascent method (where the steplength is selected to maximize φ in the gradient direction) is determined by the eigenvalues of the Hessian of φ. The Hessian of φ is found from (15) to be

∇h(x*)[L(x*, λ*) + c∇h(x*)^T ∇h(x*)]^{-1} ∇h(x*)^T.     (39)

To analyze its eigenvalues, one can use the matrix identity (valid for symmetric positive definite A and any B)

cB(A + cB^T B)^{-1}B^T = I − (I + cBA^{-1}B^T)^{-1}.

It is easily seen from the above identity that the matrices B(A + cB^T B)^{-1}B^T and BA^{-1}B^T have identical eigenvectors. One way to see this is to multiply both sides of the identity by (I + cBA^{-1}B^T) on the right to obtain

cB(A + cB^T B)^{-1}B^T (I + cBA^{-1}B^T) = cBA^{-1}B^T.

Suppose both sides are applied to an eigenvector e of BA^{-1}B^T having eigenvalue w. Then we obtain

cB(A + cB^T B)^{-1}B^T (1 + cw)e = cwe.

It follows that e is also an eigenvector of B(A + cB^T B)^{-1}B^T, and if γ is the corresponding eigenvalue, the relation

cγ(1 + cw) = cw

must hold. Therefore, the eigenvalues are related by

γ = w/(1 + cw).     (40)

The above relations apply directly to the Hessian (39) through the associations A = L(x*, λ*) and B = ∇h(x*). Note that the matrix ∇h(x*)L(x*, λ*)^{-1}∇h(x*)^T, corresponding to BA^{-1}B^T above, is the Hessian of the dual function of the original problem (36). As shown in Section 14.3, the eigenvalues of this matrix determine the rate of convergence for the ordinary dual method. Let w and W be the smallest and largest eigenvalues of this matrix. From (40) it follows that the ratio of smallest to largest eigenvalues of the Hessian of the dual for the augmented problem is

(1/W + c)/(1/w + c).
This shows explicitly how the rate of convergence of the multiplier method depends on c. As c goes to infinity, the ratio of eigenvalues goes to unity, implying arbitrarily fast convergence.
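The eigenvalue relation (40) and the limiting ratio are easy to verify numerically; the sketch below uses random illustrative data for A (symmetric positive definite) and B.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 6, 3, 25.0
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # random symmetric positive definite A
B = rng.standard_normal((m, n))

w = np.linalg.eigvalsh(B @ np.linalg.inv(A) @ B.T)
gamma = np.linalg.eigvalsh(B @ np.linalg.inv(A + c * B.T @ B) @ B.T)
print(np.allclose(gamma, w / (1.0 + c * w)))      # relation (40) holds

# Eigenvalue ratio of the augmented dual Hessian: (1/W + c)/(1/w + c) -> 1.
print((1.0 / w.max() + c) / (1.0 / w.min() + c))
```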
Other unconstrained optimization techniques may be applied to the maximization of the dual function defined by the augmented Lagrangian; conjugate gradient methods, Newton's method, and quasi-Newton methods can all be used. The use of Newton's method requires evaluation of the Hessian matrix (39). For some problems this may be feasible, but for others some sort of approximation is desirable. One approximation is obtained by noting that for large values of c, the Hessian (39) is approximately equal to (1/c)I. Using this value for the Hessian and h(x) for the gradient, we are led to the iterative scheme

λ_{k+1} = λ_k + c h(x_k),

which is exactly the simple method of multipliers originally proposed.
We might summarize the above observations by the following statement relating primal and dual convergence rates: if a penalty term is incorporated into a problem, the condition number of the primal problem becomes increasingly poor as c → ∞, but the condition number of the dual becomes increasingly good. To apply the dual method, however, an unconstrained penalty problem of poor condition number must be solved at each step.
Inequality Constraints
One advantage of augmented Lagrangian methods is that inequality constraints can be easily incorporated. Let us consider the problem with inequality constraints:

minimize f(x)
subject to g(x) ≤ 0,     (41)

where g is p-dimensional. We assume that this problem has a well-defined solution x*, which is a regular point of the constraints and which satisfies the second-order sufficiency conditions for a local minimum as specified in Section 11.8. This problem can be written as an equivalent problem with equality constraints:

minimize f(x)
subject to g_j(x) + z_j² = 0,  j = 1, 2, ..., p.     (42)

Through this conversion we can hope to simply apply the theory for equality constraints to problems with inequalities.
In order to do so we must insure that (42) satisfies the second-order sufficiency conditions of Section 11.5. These conditions will not hold unless we impose a strict complementarity assumption, namely that g_j(x*) = 0 implies μ*_j > 0, as well as the usual second-order sufficiency conditions for the original problem (41). (See Exercise 10.)
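A sketch of the conversion (42) in action, with illustrative data (minimize x₁² + x₂² subject to 1 − x₁ ≤ 0, for which μ* = 2 and strict complementarity holds): the augmented Lagrangian of the equality-constrained reformulation is minimized jointly over (x, z), and the multiplier update is applied to the residuals g_j(x) + z_j².

```python
import numpy as np
from scipy.optimize import minimize

def g(x):                                  # toy constraint: 1 - x1 <= 0
    return np.array([1.0 - x[0]])

def l_c(v, mu, c):
    # Augmented Lagrangian of (42) in the joint variables v = (x, z).
    x, z = v[:2], v[2:]
    e = g(x) + z**2                        # equality residuals g_j(x) + z_j^2
    return x @ x + mu @ e + 0.5 * c * (e @ e)

mu, c = np.zeros(1), 10.0
v = np.zeros(3)
for k in range(20):
    v = minimize(l_c, v, args=(mu, c)).x
    mu = mu + c * (g(v[:2]) + v[2:]**2)    # update on the equality residuals
print(v[:2], mu)    # x should approach (1, 0) and mu should approach 2
```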