Thus, while the convergence of primal methods is governed by the restriction of L* to M, the convergence of dual methods is governed by a restriction of (L*)^{-1} to the orthogonal complement of M.
The dual canonical convergence rate associated with the original constrained problem, which is the rate of convergence of steepest ascent applied to the dual, is [(B − b)/(B + b)]², where b and B are, respectively, the smallest and largest eigenvalues of ∇h(x*)[L(x*, λ*)]^{-1}∇h(x*)^T, the Hessian of the dual function of the problem

minimize f(x)
subject to h(x) = 0.
If a change of primal variables x is introduced, the primal rate will in general change but the dual rate will not. On the other hand, if the constraints are transformed (by replacing them by Th(x) = 0, where T is a nonsingular m × m matrix), the dual rate will change but the primal rate will not.
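To make these invariance properties concrete, here is a small numeric sketch (all data made up) that computes the dual canonical rate for an illustrative equality-constrained quadratic program and shows that rescaling the constraints changes the dual rate, while the primal problem itself, and hence the primal rate, is untouched.

```python
import numpy as np

def canonical_rate(M):
    # Steepest ascent/descent canonical rate ((B - b)/(B + b))^2, where b and B
    # are the extreme eigenvalues of the symmetric positive definite matrix M.
    eig = np.linalg.eigvalsh(M)
    b, B = eig[0], eig[-1]
    return ((B - b) / (B + b)) ** 2

# Illustrative QP: minimize (1/2) x^T Q x subject to A x = 0, so that the
# Lagrangian Hessian is L = Q and the dual Hessian is A Q^{-1} A^T.
Q = np.diag([1.0, 3.0, 10.0, 30.0])
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
print("dual rate:", canonical_rate(A @ np.linalg.inv(Q) @ A.T))

# Replacing the constraints by T h(x) = 0 (T nonsingular) changes the dual
# Hessian to T (A Q^{-1} A^T) T^T and hence the dual rate, while the primal
# problem -- and therefore the primal rate -- is unchanged.
T = np.diag([1.0, 100.0])
print("dual rate, scaled constraints:",
      canonical_rate(T @ A @ np.linalg.inv(Q) @ A.T @ T.T))
```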
14.4 SEPARABLE PROBLEMS

A structure that arises frequently in mathematical programming applications is that of the separable problem:

minimize Σ_{i=1}^q f_i(x_i)
subject to Σ_{i=1}^q h_i(x_i) = 0     (27)
Σ_{i=1}^q g_i(x_i) ≤ 0,     (28)

where x = (x_1, x_2, ..., x_q) is a partition of the variable x into groups, and the objective and constraint functions are sums of functions of the individual groups. For each i, the functions f_i, h_i, and g_i are twice continuously differentiable functions of dimensions 1, m, and p, respectively.
Example 1. Suppose that we have a fixed budget of, say, A dollars that may be allocated among n activities. If x_i dollars is allocated to the ith activity, then there will be a benefit (measured in some units) of f_i(x_i). To obtain the maximum benefit within our budget, we solve the separable problem

maximize Σ_{i=1}^n f_i(x_i)
subject to Σ_{i=1}^n x_i ≤ A,  x_i ≥ 0.
In the example x is partitioned into its individual components.
Example 2. Problems involving a series of decisions made at distinct times are often separable. For illustration, consider the problem of scheduling water release through a dam to produce as much electric power as possible over a given time interval while satisfying constraints on acceptable water levels. A discrete-time model of this problem is to maximize the total power generated over N periods, where the water level y_k and the water release u_k in period k are linked by water-balance equality constraints and restricted to acceptable ranges.
In this example we consider x as the 2N-dimensional vector of unknowns y_k, u_k, k = 1, 2, ..., N. This vector is partitioned into the pairs x_k = (y_k, u_k). The objective function is then clearly in separable form. The constraints can be viewed as being in the form (27) with h_k(x_k) having dimension N and such that h_k(x_k) is identically zero except in the kth and (k + 1)th components.
Decomposition
Separable problems are ideally suited to dual methods, because the required unconstrained minimization decomposes into small subproblems. To see this we recall that the generally most difficult aspect of a dual method is evaluation of the dual function. For a separable problem, if we associate λ with the equality constraints (27) and μ ≥ 0 with the inequality constraints (28), the required dual function is

φ(λ, μ) = min Σ_{i=1}^q [f_i(x_i) + λ^T h_i(x_i) + μ^T g_i(x_i)]
        = Σ_{i=1}^q min_{x_i} [f_i(x_i) + λ^T h_i(x_i) + μ^T g_i(x_i)].

The minimization thus decomposes into q separate problems, each involving only a single group x_i.
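The decomposition is easy to express in code. The sketch below uses illustrative data with scalar blocks and a single equality and a single inequality constraint (so λ and μ are numbers); it evaluates the dual function block by block instead of solving one large joint minimization.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def dual_function(lam, mu, f, h, g):
    # phi(lam, mu) = sum_i min_{x_i} [f_i(x_i) + lam*h_i(x_i) + mu*g_i(x_i)]:
    # each scalar block x_i is minimized independently.
    total = 0.0
    for fi, hi, gi in zip(f, h, g):
        res = minimize_scalar(lambda t: fi(t) + lam * hi(t) + mu * gi(t),
                              bounds=(-10.0, 10.0), method="bounded")
        total += res.fun
    return total

# Made-up separable data: two scalar blocks.
f = [lambda t: (t - 1.0) ** 2, lambda t: (t + 2.0) ** 2]
h = [lambda t: t, lambda t: t]      # equality constraint: x_1 + x_2 = 0
g = [lambda t: -t, lambda t: -t]    # inequality constraint: -(x_1 + x_2) <= 0
print(dual_function(0.5, 1.0, f, h, g))
```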
Example 4. In Example 2, using duality with respect to the equality constraints, we denote the dual variables by λ_k, k = 1, 2, ..., N. The kth subproblem then involves only the pair (y_k, u_k) and can be interpreted as the problem faced by an entrepreneur who leased the dam for one period. He can buy water for the dam at the beginning of the period at price λ_k and sell what he has left at the end of the period at price λ_{k+1}. His problem is to determine y_k and u_k so that his net profit, accruing from sale of generated power and purchase and sale of water, is maximized.
Example 5. (The hanging chain). Consider again the problem of finding the equilibrium position of the hanging chain considered in Example 4, Section 11.3, and Example 1, Section 12.7. The problem is

minimize Σ_{i=1}^n c_i y_i
subject to Σ_{i=1}^n y_i = 0
Σ_{i=1}^n √(1 − y_i²) = L,

where y_i is the vertical component of the ith link. The objective and both constraints are separable, so if we associate multipliers λ and μ with the two constraints, evaluation of the dual function reduces to the n independent subproblems

min_{y_i} [c_i y_i + λy_i + μ√(1 − y_i²)].

Setting the derivative with respect to y_i equal to zero gives

c_i + λ − μy_i/√(1 − y_i²) = 0,

or

(c_i + λ)²(1 − y_i²) = μ²y_i².

This yields

y_i = −(c_i + λ)/√(μ² + (c_i + λ)²),

where the sign of the square root is chosen to give the minimum.
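A sketch of evaluating and maximizing this dual by steepest ascent follows. The chain data (20 unit links, span L = 16, c_i = n − i + ½) are assumptions carried over from the earlier chain examples cited above, and the stepsize and iteration count are ad hoc; the gradient of the dual is simply the vector of constraint residuals at the subproblem solutions.

```python
import numpy as np

n, L = 20, 16.0                                   # assumed chain data
c = np.array([n - i + 0.5 for i in range(1, n + 1)])

def subproblem(lam, mu):
    # Closed-form minimizers of the n one-dimensional subproblems.
    return -(c + lam) / np.sqrt(mu**2 + (c + lam)**2)

def residuals(lam, mu):
    # Gradient of the dual function: the two constraint residuals.
    y = subproblem(lam, mu)
    return np.array([y.sum(), np.sqrt(1.0 - y**2).sum() - L])

lam, mu = 0.0, -10.0     # mu < 0 makes each subproblem convex here
for _ in range(500):
    r = residuals(lam, mu)
    lam += 0.1 * r[0]
    mu += 0.1 * r[1]
print(lam, mu, residuals(lam, mu))   # residuals should approach zero
```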
Table 14.1 Results of Dual of Chain Problem (final solution λ = −10.00048; final value −67.61136)
14.5 AUGMENTED LAGRANGIANS

One of the most effective general classes of nonlinear programming methods is the class of augmented Lagrangian methods, alternatively referred to as multiplier methods. These methods can be viewed as a combination of penalty functions and local duality methods; the two concepts work together to eliminate many of the disadvantages associated with either method alone.
The augmented Lagrangian for the equality constrained problem

minimize f(x)
subject to h(x) = 0     (31)

is the function

l_c(x, λ) = f(x) + λ^T h(x) + ½c|h(x)|²

for some positive constant c.
From a penalty function viewpoint the augmented Lagrangian, for a fixed value of the vector λ, is simply the standard quadratic penalty function for the problem

minimize f(x) + λ^T h(x)
subject to h(x) = 0.     (32)

This problem is clearly equivalent to the original problem (31), since combinations of the constraints adjoined to f(x) do not affect the minimum point or the minimum value. However, if the multiplier vector were selected equal to λ*, the correct Lagrange multiplier, then the gradient of l_c(x, λ*) would vanish at the solution x*. This is because ∇l_c(x, λ*) = 0 implies ∇f(x) + λ*^T ∇h(x) + c h(x)^T ∇h(x) = 0, which is satisfied by ∇f(x) + λ*^T ∇h(x) = 0 and h(x) = 0. Thus the augmented Lagrangian is seen to be an exact penalty function when the proper value of λ* is used.
A typical step of an augmented Lagrangian method starts with a vector λ_k. Then x_k is found as the minimum point of

l_c(x, λ_k) = f(x) + λ_k^T h(x) + ½c|h(x)|².     (33)

Next, λ_k is updated to λ_{k+1}.
To motivate the adjustment procedure, consider the constrained problem (32) with λ = λ_k. The Lagrange multiplier corresponding to this problem is λ* − λ_k, where λ* is the Lagrange multiplier of (31). On the other hand, since (33) is the penalty function corresponding to (32), it follows from the results of Section 13.3 that c h(x_k) is approximately equal to the Lagrange multiplier of (32). Combining these two facts, we obtain c h(x_k) ≈ λ* − λ_k. Therefore, a good approximation to the unknown λ* is

λ_{k+1} = λ_k + c h(x_k).
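In code, one complete pass of this basic scheme looks as follows; the problem data are illustrative stand-ins, and the inner minimization is delegated to a general-purpose routine.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):                       # illustrative objective
    return x[0]**2 + 2.0 * x[1]**2

def h(x):                       # illustrative equality constraint h(x) = 0
    return np.array([x[0] + x[1] - 1.0])

def l_c(x, lam, c):             # augmented Lagrangian (33)
    hx = h(x)
    return f(x) + lam @ hx + 0.5 * c * (hx @ hx)

lam, c = np.zeros(1), 10.0
x = np.zeros(2)
for k in range(20):
    x = minimize(l_c, x, args=(lam, c)).x   # x_k minimizes l_c(x, lam_k)
    lam = lam + c * h(x)                    # lam_{k+1} = lam_k + c h(x_k)
print(x, lam)   # x should approach (2/3, 1/3) and lam should approach -4/3
```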
Although the main iteration in augmented Lagrangian methods is with respect to λ, the penalty parameter c may also be adjusted during the process. As in ordinary penalty function methods, the sequence of c's is usually preselected: c is either held fixed, is increased toward a finite value, or tends (slowly) toward infinity. Since in this method it is not necessary for c to go to infinity, and in fact it may remain of relatively modest value, the ill-conditioning usually associated with the penalty function approach is mediated.
From the viewpoint of duality theory, the augmented Lagrangian is simply the standard Lagrangian for the problem

minimize f(x) + ½c|h(x)|²
subject to h(x) = 0,

which has the same solutions as (31). The addition of the penalty term ½c|h(x)|² tends to "convexify" the Lagrangian. For sufficiently large c, the Lagrangian will indeed be locally convex. Thus the duality method can be employed, and the corresponding dual problem can be solved by an iterative process in λ. This viewpoint leads to the development of additional multiplier adjustment processes.
The Penalty Viewpoint
We begin our more detailed analysis of augmented Lagrangian methods by showing that if the penalty parameter c is sufficiently large, the augmented Lagrangian has a local minimum point near the true optimal point. This follows from the following simple lemma.
Lemma. Let A and B be n × n symmetric matrices. Suppose that B is positive semidefinite and that A is positive definite on the subspace Bx = 0. Then there is a c* such that for all c ≥ c* the matrix A + cB is positive definite.
Proof. Suppose to the contrary that for every k there were an x_k with |x_k| = 1 such that x_k^T(A + kB)x_k ≤ 0. The sequence {x_k}, being bounded, must have a convergent subsequence converging to a limit x. Now since x_k^T B x_k ≥ 0, it follows that x^T B x = 0. It also follows that x^T A x ≤ 0. However, this contradicts the hypothesis of the lemma.
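A quick numeric illustration of the lemma, with made-up two-by-two matrices: A is indefinite but positive definite on the nullspace of B, B is positive semidefinite, and A + cB becomes positive definite once c passes a threshold (here c* = 1).

```python
import numpy as np

A = np.diag([1.0, -1.0])   # indefinite, but positive definite on {x : Bx = 0}
B = np.diag([0.0, 1.0])    # positive semidefinite; nullspace is the x1-axis
for c in [0.5, 1.0, 2.0, 10.0]:
    print(c, np.linalg.eigvalsh(A + c * B).min())   # > 0 once c > 1
```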
This lemma applies directly to the Hessian of the augmented Lagrangian evaluated at the optimal solution pair x*, λ*. We assume as usual that the second-order sufficiency conditions for a constrained minimum hold at x*, λ*. The Hessian of the augmented Lagrangian evaluated at the optimal pair x*, λ* is

L_c(x*, λ*) = F(x*) + λ*^T H(x*) + c∇h(x*)^T ∇h(x*)
            = L(x*) + c∇h(x*)^T ∇h(x*).

The first term, the Hessian of the normal Lagrangian, is positive definite on the subspace ∇h(x*)x = 0. This corresponds to the matrix A in the lemma. The matrix ∇h(x*)^T ∇h(x*) is positive semidefinite and corresponds to B in the lemma. It follows that there is a c* such that for all c > c*, L_c(x*, λ*) is positive definite. This leads directly to the first basic result concerning augmented Lagrangians.
Proposition 1. Assume that the second-order sufficiency conditions for a local minimum are satisfied at x*, λ*. Then there is a c* such that for all c ≥ c*, the augmented Lagrangian l_c(x, λ*) has a local minimum point at x*.
By a continuity argument the result of the above proposition can be extended to a neighborhood around x*, λ*. That is, for any λ near λ*, the augmented Lagrangian l_c(x, λ) has a unique local minimum point near x*. This correspondence defines a continuous function x(λ). If a value of λ can be found such that h(x(λ)) = 0, then that λ must in fact be λ*, since x(λ) satisfies the necessary conditions of the original problem. Therefore, the problem of determining the proper value of λ can be viewed as one of solving the equation h(x(λ)) = 0. For this purpose the iterative process

λ_{k+1} = λ_k + c h(x(λ_k))

is a method of successive approximation. This process will converge linearly in a neighborhood around λ*, although a rigorous proof is somewhat complex. We shall give more definite convergence results when we consider the duality viewpoint.
Example 1. Consider the simple quadratic problem studied in Section 13.8:

minimize 2x² + 2xy + y² − 2y
subject to x = 0.

The augmented Lagrangian for this problem is

l_c(x, y, λ) = 2x² + 2xy + y² − 2y + λx + ½cx².

The minimum of this can be found analytically to be x = −(2 + λ)/(2 + c), y = (4 + c + λ)/(2 + c). Since h(x, y) = x in this example, it follows that the iterative process for λ_k is

λ_{k+1} = λ_k − c(2 + λ_k)/(2 + c),

or

λ_{k+1} = [2/(2 + c)]λ_k − 2c/(2 + c).

This converges to λ = −2 for any c > 0. The coefficient 2/(2 + c) governs the rate of convergence, and clearly, as c is increased the rate improves.
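A quick sketch verifying the stated coefficient: iterating the recursion above, the error |λ_k + 2| contracts by the factor 2/(2 + c) per step, faster for larger c.

```python
for c in (1.0, 10.0, 100.0):
    lam, ratio = 0.0, 2.0 / (2.0 + c)
    for k in range(5):
        lam = ratio * lam - 2.0 * c / (2.0 + c)
    print(c, lam, abs(lam + 2.0))   # error after 5 steps is 2*ratio**5
```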
Geometric Interpretation
The augmented Lagrangian method can be interpreted geometrically in terms of the primal function in a manner analogous to that in Sections 13.3 and 13.8 for the ordinary quadratic penalty function and the absolute-value penalty function. Consider again the primal function ω(y) defined as

ω(y) = min {f(x) : h(x) = y},

where the minimum is understood to be taken locally near x*. We remind the reader that ω(0) = f(x*) and that ∇ω(0)^T = −λ*. The minimum of the augmented Lagrangian at step k can be expressed in terms of the primal function as follows:

min_x l_c(x, λ_k) = min_x [f(x) + λ_k^T h(x) + ½c|h(x)|²]
                  = min_{x,y} {f(x) + λ_k^T y + ½c|y|² : h(x) = y}
                  = min_y {ω(y) + λ_k^T y + ½c|y|²},

where the minimization with respect to y is to be taken locally near y = 0. This minimization is illustrated geometrically for the case of a single constraint in Fig. 14.5.

Fig. 14.5 Primal function and augmented Lagrangian

The lower curve represents ω(y), and the upper curve represents ω(y) + ½cy². The minimum point y_k of the minimization above occurs at the point where this upper curve has slope equal to −λ_k. It is seen that for c sufficiently large this curve will be convex at y = 0. If λ_k is close to λ*, it is clear that this minimum point will be close to 0; it will be exact if λ_k = λ*.
The process for updating λ_k is also illustrated in Fig. 14.5. Note that in general, if x_k minimizes l_c(x, λ_k), then y_k = h(x_k) is the minimum point of ω(y) + λ_k^T y + ½c|y|², as shown in Fig. 14.5 for the one-dimensional case. In the figure the next point y_{k+1} is the point where ω(y) + ½cy² has slope −λ_{k+1}, which will yield a positive value of y_{k+1} in this case. It can be seen that if λ_k is sufficiently close to λ*, then λ_{k+1} will be even closer, and the iterative process will converge.
14.6 THE DUAL VIEWPOINT
In the method of augmented Lagrangians (the method of multipliers), the primary iteration is with respect to λ, and therefore it is most natural to consider the method from the dual viewpoint. This is in fact the more powerful viewpoint and leads to improvements in the algorithm.
As we observed earlier, the constrained problem

minimize f(x)
subject to h(x) = 0     (36)

is equivalent, for any c, to the augmented problem

minimize f(x) + ½c|h(x)|²
subject to h(x) = 0,     (37)

whose Lagrangian is the augmented Lagrangian l_c(x, λ) and whose Lagrangian Hessian L(x*, λ*) + c∇h(x*)^T ∇h(x*) is, for sufficiently large c, positive definite at the solution pair x*, λ*. Thus local duality theory is applicable to problem (37) for sufficiently large c.

To apply the dual method to (37), we define the dual function

φ(λ) = min [f(x) + λ^T h(x) + ½c|h(x)|²]

in a region near x*, λ*. By local duality theory the gradient of φ is ∇φ(λ) = h(x(λ)), where x(λ) is the corresponding minimum point. The basic multiplier update λ_{k+1} = λ_k + c h(x_k) is therefore a steepest ascent iteration for maximizing φ, using a constant stepsize c.

Although the stepsize c is a good choice (as will become even more evident later), it is clearly advantageous to apply the algorithmic principles of optimization developed previously by selecting the stepsize so that the new value of the dual function satisfies an ascent criterion. This can extend the range of convergence of the algorithm.
The rate of convergence of the optimal steepest ascent method (where the steplength is selected to maximize φ in the gradient direction) is determined by the eigenvalues of the Hessian of φ. The Hessian of φ is found from (15) to be

∇h(x*)[L(x*, λ*) + c∇h(x*)^T ∇h(x*)]^{-1} ∇h(x*)^T.     (39)

To analyze its eigenvalues, one can use the matrix identity (valid for symmetric positive definite A and any B)

cB(A + cB^T B)^{-1}B^T = I − (I + cBA^{-1}B^T)^{-1}.

It is easily seen from the above identity that the matrices B(A + cB^T B)^{-1}B^T and BA^{-1}B^T have identical eigenvectors. One way to see this is to multiply both sides of the identity by (I + cBA^{-1}B^T) on the right to obtain

cB(A + cB^T B)^{-1}B^T (I + cBA^{-1}B^T) = cBA^{-1}B^T.

Suppose both sides are applied to an eigenvector e of BA^{-1}B^T having eigenvalue w. Then we obtain

cB(A + cB^T B)^{-1}B^T (1 + cw)e = cwe.

It follows that e is also an eigenvector of B(A + cB^T B)^{-1}B^T, and if γ is the corresponding eigenvalue, the relation

cγ(1 + cw) = cw

must hold. Therefore, the eigenvalues are related by

γ = w/(1 + cw).     (40)

The above relations apply directly to the Hessian (39) through the associations A = L(x*, λ*) and B = ∇h(x*). Note that the matrix ∇h(x*)L(x*, λ*)^{-1}∇h(x*)^T, corresponding to BA^{-1}B^T above, is the Hessian of the dual function of the original problem (36). As shown in Section 14.3, the eigenvalues of this matrix determine the rate of convergence for the ordinary dual method. Let w and W be the smallest and largest eigenvalues of this matrix. From (40) it follows that the ratio of smallest to largest eigenvalues of the Hessian of the dual for the augmented problem is

(1/W + c)/(1/w + c).
This shows explicitly how the rate of convergence of the multiplier method depends on c. As c goes to infinity, the ratio of eigenvalues goes to unity, implying arbitrarily fast convergence.
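The eigenvalue relation (40) and the limiting ratio are easy to verify numerically; the sketch below uses random illustrative data for A (symmetric positive definite) and B.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 6, 3, 25.0
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # random symmetric positive definite A
B = rng.standard_normal((m, n))

w = np.linalg.eigvalsh(B @ np.linalg.inv(A) @ B.T)
gamma = np.linalg.eigvalsh(B @ np.linalg.inv(A + c * B.T @ B) @ B.T)
print(np.allclose(gamma, w / (1.0 + c * w)))      # relation (40) holds

# Eigenvalue ratio of the augmented dual Hessian: (1/W + c)/(1/w + c) -> 1.
print((1.0 / w.max() + c) / (1.0 / w.min() + c))
```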
Other unconstrained optimization techniques may be applied to the maximization of the dual function defined by the augmented Lagrangian; conjugate gradient methods, Newton's method, and quasi-Newton methods can all be used. The use of Newton's method requires evaluation of the Hessian matrix (39). For some problems this may be feasible, but for others some sort of approximation is desirable. One approximation is obtained by noting that for large values of c, the Hessian (39) is approximately equal to (1/c)I. Using this value for the Hessian and h(x) for the gradient, we are led to the iterative scheme

λ_{k+1} = λ_k + c h(x_k),

which is exactly the simple method of multipliers originally proposed.
We might summarize the above observations by the following statement relating primal and dual convergence rates: if a penalty term is incorporated into a problem, the condition number of the primal problem becomes increasingly poor as c → ∞, but the condition number of the dual becomes increasingly good. To apply the dual method, however, an unconstrained penalty problem of poor condition number must be solved at each step.
Inequality Constraints
One advantage of augmented Lagrangian methods is that inequality constraints can be easily incorporated. Let us consider the problem with inequality constraints:

minimize f(x)
subject to g(x) ≤ 0,     (41)

where g is p-dimensional. We assume that this problem has a well-defined solution x*, which is a regular point of the constraints and which satisfies the second-order sufficiency conditions for a local minimum as specified in Section 11.8. This problem can be written as an equivalent problem with equality constraints:

minimize f(x)
subject to g_j(x) + z_j² = 0,  j = 1, 2, ..., p.     (42)

Through this conversion we can hope to simply apply the theory for equality constraints to problems with inequalities.
In order to do so we must insure that (42) satisfies the second-order sufficiency conditions of Section 11.5. These conditions will not hold unless we impose a strict complementarity assumption, namely that g_j(x*) = 0 implies μ*_j > 0, as well as the usual second-order sufficiency conditions for the original problem (41). (See Exercise 10.)
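A sketch of the conversion (42) in action, with illustrative data (minimize x₁² + x₂² subject to 1 − x₁ ≤ 0, for which μ* = 2 and strict complementarity holds): the augmented Lagrangian of the equality-constrained reformulation is minimized jointly over (x, z), and the multiplier update is applied to the residuals g_j(x) + z_j².

```python
import numpy as np
from scipy.optimize import minimize

def g(x):                                  # toy constraint: 1 - x1 <= 0
    return np.array([1.0 - x[0]])

def l_c(v, mu, c):
    # Augmented Lagrangian of (42) in the joint variables v = (x, z).
    x, z = v[:2], v[2:]
    e = g(x) + z**2                        # equality residuals g_j(x) + z_j^2
    return x @ x + mu @ e + 0.5 * c * (e @ e)

mu, c = np.zeros(1), 10.0
v = np.zeros(3)
for k in range(20):
    v = minimize(l_c, v, args=(mu, c)).x
    mu = mu + c * (g(v[:2]) + v[2:]**2)    # update on the equality residuals
print(v[:2], mu)    # x should approach (1, 0) and mu should approach 2
```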