GRADIENT DYNAMICAL SYSTEMS, TAME OPTIMIZATION AND APPLICATIONS


Spring School on Variational Analysis Paseky nad Jizerou, Czech Republic (April 20–24, 2009)

Aris DANIILIDIS Departament de Matemàtiques Universitat Autònoma de Barcelona E–08193, Bellaterra (Cerdanyola del Vallès) http://mat.uab.cat/~arisd/

Lecture Notes

Abstract. These lectures present an introduction to what is nowadays called Tame Optimization, with emphasis on (nonsmooth) Łojasiewicz gradient inequalities and Sard-type theorems. The former topic will be introduced via the asymptotic analysis of dynamical systems of (sub)gradient type; its consequences for the analysis of algorithms (proximal algorithm, gradient-type methods) will also be discussed. The latter topic will be presented as a natural consequence of the structural assumptions made on the function (o-minimality, stratification). Our secondary aim is to provide essential background and material for further research. During the lectures, some open problems will occasionally be mentioned.

Contents

1 Trajectories of (sub)gradient systems
1.1 Elementary properties of gradient systems
1.2 Asymptotic analysis: convergence, length, Palis & De Melo example
1.3 Convex case: Brezis theorem, Baillon example
1.4 Self-contracted curves. Quasiconvex planar systems

2 Łojasiewicz inequality and generalizations
2.1 Defragmented gradient curves
2.2 The Kurdyka-Łojasiewicz inequality: characterizations and applications
2.3 Convex case: asymptotic equivalence between continuous and discrete systems
2.4 A convex counterexample
2.5 The semialgebraic case

3 Tame variational analysis
3.1 Definable functions, o-minimal structures
3.2 Stratification vs Clarke subdifferential
3.3 Sard-type theorem for (nonsmooth) tame functions
3.4 Applications


1 Trajectories of (sub)gradient systems

We consider the autonomous dynamical system (differential equation)

$$\dot{x}(t) = F(x(t)) \tag{1.1}$$

generated by a locally Lipschitz continuous function $F : \mathbb{R}^n \to \mathbb{R}^n$. We call solution curve (also trajectory or orbit) of the vector field $F$ any $C^1$ curve $t \mapsto \gamma(t) \in \mathbb{R}^n$ satisfying (1.1). Existence and uniqueness of solutions (once the initial condition $\gamma(0) = x_0 \in \mathbb{R}^n$ is fixed) is a classical result in the theory of Ordinary Differential Equations (see e.g. [41, Section 2.2]).

Unless otherwise stated, we consider maximal solutions, meaning that the trajectory $t \mapsto \gamma_{x_0}(t)$ starting at the point $x_0$ is defined for all $t$ in the maximal interval $[0, T_{x_0})$ (with $T_{x_0} \in (0, +\infty]$) on which (1.1) makes sense. Maximal intervals are always right-open, and if $T_{x_0} < +\infty$ then $\{\gamma_{x_0}(t)\}_{t \in [0, T_{x_0})}$ is unbounded (see also [41, Section 2.4]). Note that if $F$ is (globally) Lipschitz continuous, then every orbit satisfies $T_{x_0} = +\infty$. The length of a $C^1$ orbit is given by the formula:

$$\operatorname{length}(\gamma_{x_0}) = \int_0^{T_{x_0}} \|\dot{\gamma}_{x_0}(t)\|\, dt. \tag{1.2}$$

The system (1.1) also determines the image of each point of $\mathbb{R}^n$ by the flow generated by $F$, i.e. a function $\Phi(t, x_0)$ which associates to each $x_0 \in \mathbb{R}^n$ and $t_0 \in [0, T_{x_0})$ the point $\gamma_{x_0}(t_0)$, where $\gamma_{x_0}$ is the unique orbit starting at $x_0$, that is, $\Phi(0, x_0) = \gamma_{x_0}(0) = x_0$. (Note however that if we relax the assumption on $F$ from local Lipschitz continuity to mere continuity, uniqueness will no longer hold: think of the example $F(x_1, x_2) = (0, \sqrt{|x_2|})$ with initial condition any point of the $x_1$-axis.)
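The failure of uniqueness in this example can be checked by hand: two distinct curves solve the system from the same starting point. The verification below is a worked computation; the symbol $a$ for the abscissa of the starting point is our notation, not taken from the notes.

```latex
% Two solutions of \dot{x} = F(x), F(x_1, x_2) = (0, \sqrt{|x_2|}), with x(0) = (a, 0):
\gamma(t) = (a, 0)
\quad\Longrightarrow\quad
\dot{\gamma}(t) = (0, 0) = \bigl(0, \sqrt{|0|}\bigr),
\qquad
\tilde{\gamma}(t) = \Bigl(a, \tfrac{t^{2}}{4}\Bigr)
\quad\Longrightarrow\quad
\dot{\tilde{\gamma}}(t) = \Bigl(0, \tfrac{t}{2}\Bigr) = \Bigl(0, \sqrt{t^{2}/4}\Bigr)
\quad (t \ge 0).
```

Both curves start at $(a, 0)$, so the initial value problem has at least two solutions there.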

The flow is often represented by its phase portrait, that is, the picture of its integral curves in $\mathbb{R}^n$. Reparametrizing trajectories (e.g. by arc length) does not change the portrait of the system. More generally, the systems

$$\dot{x}(t) = F(x(t)) \quad\text{and}\quad \dot{x}(t) = g(x(t))\, F(x(t)) \tag{1.3}$$

have the same portrait, provided $g : \mathbb{R}^n \to \mathbb{R}$ is a positive smooth function. Intuitively, $g$ corresponds to a change of the speed at which the orbits are traversed.

Our departure point in these lectures is a particular type of autonomous dynamical system, defined by a (sub)gradient field. To define this system properly, let us first consider a $C^{1,1}$ function $f : \mathbb{R}^n \to \mathbb{R}$ (that is, a differentiable function whose gradient is Lipschitz continuous) and set $F = -\nabla f$. Then (1.1) becomes

$$\dot{\gamma}(t) = -\nabla f(\gamma(t)). \tag{1.4}$$

We shall eventually also consider nonsmooth functions (convex or semiconvex) for which the so-called subgradient integral curves can be defined in a unique way. These curves are absolutely continuous curves that are solutions of the differential inclusion

$$\dot{\gamma}(t) \in -\partial f(\gamma(t)) \quad \text{for a.e. } t, \tag{1.6}$$

where $\partial f(x)$ denotes the set of subgradients (subdifferential) of $f$ at $x$ and where the notation "a.e." stands for "almost everywhere" in the sense of the Lebesgue measure of $\mathbb{R}$. All important features of the asymptotic study of gradient systems remain true in this nonsmooth case. Let us point out, though, an important difference stemming from the above remark: systems (1.6) are of unilateral nature ($-\partial f(x) \neq \partial(-f)(x)$ in general), which means that a subgradient trajectory cannot be reversed in time. (The terminology semiflow is thus employed in this case.)


1.1 Elementary properties of gradient systems

A Lyapunov function of a dynamical system is any scalar function on $\mathbb{R}^n$ that is strictly decreasing along its integral curves. Although Lyapunov functions might not always exist for the general system (1.1), they do exist for gradient systems. Indeed, given any orbit $\gamma(t)$ (solution of (1.4)), one easily sees that the derivative of the function $t \mapsto \rho(t) := f(\gamma(t))$ satisfies

$$\rho'(t) = -\|\nabla f(\gamma(t))\|^2 = -\|\dot{\gamma}(t)\|^2 \le 0. \tag{1.7}$$

We shall now use an important consequence of Remark 1(i) combined with the uniqueness of the flow of (1.5): if $\gamma_{x_0}$ denotes the (unique, maximal) trajectory of (1.4) starting at $x_0$, then

$$\nabla f(x_0) \neq 0 \implies \nabla f(\gamma_{x_0}(t)) \neq 0 \quad \text{for all } t \in [0, T_{x_0}). \tag{1.8}$$

In the sequel we denote by $S$ the set of critical (or singular) points of $f$, that is, $x_0 \in S$ if and only if $\nabla f(x_0) = 0$. Thus, according to (1.8), unless $x_0$ is already a singularity, the corresponding trajectory will never pass through $S$. This shows that the derivative in (1.7) is strictly negative on $[0, T_{x_0})$, thus the function $f$ is a Lyapunov function for the system (1.4).
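The decrease of $f$ along orbits can be observed numerically. The following is a minimal sketch, not part of the notes: it assumes an illustrative double-well function $f(x, y) = (x^2 - 1)^2 + y^2$ and replaces the continuous system (1.4) by an explicit Euler scheme with a small step (the function, the starting point and the step size are our choices).

```python
# Explicit Euler discretization of the gradient system (1.4) for an
# illustrative double-well function (our choice, not taken from the notes).

def f(x, y):
    return (x * x - 1.0) ** 2 + y * y

def grad_f(x, y):
    return 4.0 * x * (x * x - 1.0), 2.0 * y

def euler_gradient_flow(x0, y0, step=0.01, n_steps=500):
    """Return the iterates of x_{k+1} = x_k - step * grad f(x_k)."""
    pts = [(x0, y0)]
    x, y = x0, y0
    for _ in range(n_steps):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy
        pts.append((x, y))
    return pts

orbit = euler_gradient_flow(0.5, 1.0)
values = [f(x, y) for x, y in orbit]
# f should not increase from one iterate to the next (discrete analogue of (1.7)).
monotone = all(b <= a for a, b in zip(values, values[1:]))
print(monotone, orbit[-1], values[0], values[-1])
```

With this starting point the iterates approach the critical point $(1, 0)$, one of the two minimizers of the assumed $f$. The step size matters: the discrete scheme only mimics (1.7) when the step is small relative to the Lipschitz constant of $\nabla f$ on the region visited.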

(Level-set parametrization) Assume $x(t)$ is an orbit of (1.4), and set $x(0) = x_0$, $r_0 = f(x_0)$ and $r_\infty = \lim_{t \to T_{x_0}} f(x(t))$. Using (1.8) we easily see that whenever $\nabla f(x_0) \neq 0$ (that is, whenever the orbit $x(t)$ is not reduced to a singleton) the mapping $\rho(t) = f(x(t))$ is a diffeomorphism between $[0, T_{x_0})$ and $(r_\infty, r_0]$, and that the curve $u(r) := x(\rho^{-1}(r))$ satisfies the differential equation

$$\dot{u}(r) = \frac{\nabla f(u(r))}{\|\nabla f(u(r))\|^2}.$$
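The differential equation for $u$ follows from the chain rule together with (1.7). The short computation below spells this out, writing $t = \rho^{-1}(r)$, and makes the normalizing factor $\|\nabla f(u(r))\|^{-2}$ explicit; it also records the identity $f(u(r)) = r$.

```latex
\dot{u}(r)
  = \dot{x}(t)\,\frac{1}{\rho'(t)}
  = \frac{-\nabla f(x(t))}{-\|\nabla f(x(t))\|^{2}}
  = \frac{\nabla f(u(r))}{\|\nabla f(u(r))\|^{2}},
\qquad
f(u(r)) = f\bigl(x(\rho^{-1}(r))\bigr) = \rho\bigl(\rho^{-1}(r)\bigr) = r .
```

In particular $\|\dot{u}(r)\| = 1/\|\nabla f(u(r))\|$, so the length of the orbit equals $\int_{r_\infty}^{r_0} \frac{dr}{\|\nabla f(u(r))\|}$.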

The existence of a Lyapunov function guarantees that a gradient system has neither periodic (closed) orbits nor limit cycles. Moreover, the $\omega$-limit set $\Omega(\gamma)$ of each bounded¹ orbit $\gamma$ (i.e. the set of all limits of sequences $\{\gamma(t_n)\}_n$ with $t_n \nearrow \infty$) consists of singularities, and is either a singleton or infinite (see e.g. [40, page 14]). Moreover, in view of (1.7), $f$ is constant on $\Omega(\gamma)$ and, since $\Omega(\gamma) \subset S$, we have $\operatorname{dist}(\gamma(t), S) \le \operatorname{dist}(\gamma(t), \Omega(\gamma)) \to 0$ as $t$ goes to infinity. In this sense, gradient systems are much simpler than general dynamical systems (1.1), and the study of their behavior around singularities amounts to the study of the asymptotic behavior of their orbits.

1.2 Asymptotic analysis: convergence, length, Palis & De Melo example

We shall now focus on the study of the asymptotic behavior of the orbits of the systems (1.4) and (1.6). Let us introduce some notation. For every $\lambda \in \mathbb{R}$ we set

$$[f \le \lambda] := \{x \in \mathbb{R}^n : f(x) \le \lambda\}.$$

The notations $[f < \lambda]$ (or $[\lambda_1 < f \le \lambda_2]$ and so on) are defined analogously. From now on we assume:

• $f$ is inf-compact, that is, $[f \le \lambda]$ is a compact subset of $\mathbb{R}^n$ for all $\lambda \in \mathbb{R}$.

The above assumption implies in particular that $f$ attains a minimum, which will be assumed to be zero (if this is not the case, we replace $f$ by $\tilde{f} = f - \min f$ and observe that $f$ and $\tilde{f}$ have the same phase portrait). Moreover, every trajectory $\gamma_x$ lies in a compact set, whence $T_x = +\infty$ and its $\omega$-limit set is nonempty. In view of (1.7), $f$ is constant on the $\omega$-limit set of each of its orbits. The following example ([40, page 14]) shows that an $\omega$-limit set can be infinite.

Example 2 (Palis, De Melo) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined (in polar coordinates) by

Then $f$ is $C^\infty$ and there exists an orbit whose $\omega$-limit set is the unit circle $S^1$.

¹ The $\omega$-limit set may be empty for unbounded orbits (think of the example $f(x) = x^3$ and $x_0 < 0$).


The function $f$ of the above example presents many oscillations. It is interesting to visualize its level sets (or its graph in $\mathbb{R}^3$). A similar example of such a Mexican-hat type function was given in [1, §2]. In both cases, there exist nontrivial trajectories of the corresponding gradient system (1.4) with infinite length.

It is straightforward to see that if an orbit $\gamma_{x_0}$ of (1.4) has finite length, then it converges to its $\omega$-limit (a singleton in this case). We denote

the origin, with no critical point except at the origin (global minimum of $f$). The gradient trajectory of $f$ issued from the point $(r, \theta) = ((3\pi/2)^{-1}, 0)$ remains close to the spiral given by

$$r = \left(\tfrac{3\pi}{2} + t\right)^{-1}, \qquad \theta = -t,$$

and thus has infinite length.
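That the spiral itself has infinite length can be seen from its speed in polar coordinates, $\sqrt{\dot{r}^2 + r^2 \dot{\theta}^2} \ge r(t)$, whose integral over $[0, T]$ is $\ln\bigl((3\pi/2 + T)/(3\pi/2)\bigr)$ and diverges as $T \to \infty$. The sketch below only illustrates this for the spiral (not for the gradient trajectory, which requires the function $f$ of the example); the sampling step and the helper names are our choices.

```python
import math

C = 1.5 * math.pi  # the constant 3*pi/2 appearing in the spiral

def spiral_point(t):
    """Cartesian coordinates of the point r = 1/(C + t), theta = -t."""
    r = 1.0 / (C + t)
    return r * math.cos(-t), r * math.sin(-t)

def spiral_length(T, step=0.01):
    """Polygonal approximation of the length of the spiral over [0, T]."""
    n = int(round(T / step))
    total = 0.0
    prev = spiral_point(0.0)
    for i in range(1, n + 1):
        cur = spiral_point(i * step)
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

def log_lower_bound(T):
    """Integral of r(t) over [0, T], a lower bound for the true length."""
    return math.log((C + T) / C)

for T in (10.0, 100.0, 1000.0):
    print(T, spiral_length(T), log_lower_bound(T))
```

The printed lengths keep growing with $T$, roughly like the logarithmic bound, which is the slow divergence behind the infinite length.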

We finally mention the following classical result due to Łojasiewicz:

• (real-analytic case) Bounded gradient trajectories of real-analytic functions $f : \mathbb{R}^n \to \mathbb{R}$ have finite length.

This is a consequence of the classical (Łojasiewicz) gradient inequality (we give more details in Section 2). In particular, each bounded trajectory $\gamma$ of an analytic gradient system converges to its $\omega$-limit $\gamma_\infty$. Moreover, the so-called Thom conjecture [45] for the gradient orbits of real-analytic functions holds true: the secants converge towards a fixed direction of the unit sphere (see K. Kurdyka, T. Mostowski and A. Parusiński [33] for the proof):

$$\frac{\gamma(t) - \gamma_\infty}{\|\gamma(t) - \gamma_\infty\|} \to d_\infty \in S^{n-1}.$$

1.3 Convex case: Brezis theorem, Baillon example

The case when the function $f$ is convex and attains its minimum is particularly interesting in view of the important role of convex functions in optimization. Note that differentiability assumptions are not needed for the study of the corresponding integral curves. Indeed, under the assumption that $f$ is convex and merely lower semicontinuous, we consider the (differential inclusion) subgradient system (1.6), where $\partial f : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is the classical convex subdifferential, defined for every $x \in \operatorname{dom} f$ as the set $\partial f(x)$ of all $\zeta \in \mathbb{R}^n$ such that for every $y \in \mathbb{R}^n$

$$f(y) \ge f(x) + \langle \zeta, y - x \rangle.$$

Recall that for convex functions every critical point $x \in S$ (in the nonsmooth case this means $0 \in \partial f(x)$) is a global minimizer of the function.

Solution curves (subgradient trajectories) are then absolutely continuous curves that satisfy the differential inclusion (1.6) almost everywhere. Existence and uniqueness follow from the Brezis theorem (see [15, Theorem 3.2, p. 57] or [2, Chapter 3.4]). Given a trajectory $\gamma : [0, T) \to \mathbb{R}^n$ of (1.6), for almost all $t \in (0, T)$ we have

$$\frac{d}{dt}(f \circ \gamma)(t) = \langle \dot{\gamma}(t), \zeta \rangle \quad \text{for all } \zeta \in \partial f(\gamma(t)),$$

and the function $\zeta \mapsto \langle \dot{\gamma}(t), \zeta \rangle$ is constant on $\partial f(\gamma(t))$. Furthermore, if

$$\min f = 0 \quad\text{and}\quad C_0 := \operatorname{argmin} f, \tag{1.13}$$

where $\operatorname{argmin} f$ stands for the set of global minimizers of $f$ (there is no loss of generality in assuming this), we have the following consequence of (1.6) and (1.12): for every $a \in C_0$

$$\frac{1}{2}\frac{d}{dt}\|\gamma(t) - a\|^2 \le -f(\gamma(t)) \le 0 \quad \text{a.e. on } (0, +\infty),$$

and therefore the distance mapping $t \mapsto \|\gamma(t) - a\|$ is nonincreasing. Since $\operatorname{dist}(\gamma(t), S) \to 0$ as $t$ goes to infinity, we deduce the following result.

• Every subgradient orbit of a (nonnegative) convex function $f$ converges to a global minimizer of $f$.
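The monotonicity of $t \mapsto \|\gamma(t) - a\|$ for every minimizer $a$ has an exact discrete counterpart for the proximal (implicit Euler) iteration. The sketch below is only a discrete stand-in for the continuous subgradient system (1.6); the nonsmooth convex function $f(x, y) = \operatorname{dist}(x, [-1, 1]) + |y|$, whose set of minimizers is the segment $[-1, 1] \times \{0\}$, the step size and the starting point are our choices.

```python
import math

def prox_step(p, h):
    """One implicit Euler (proximal) step for f(x, y) = dist(x, [-1, 1]) + |y|."""
    x, y = p
    excess = abs(x) - 1.0                      # distance from x to [-1, 1] when positive
    if excess > h:
        x_new = x - h * math.copysign(1.0, x)  # move toward the interval by h
    else:
        x_new = max(-1.0, min(1.0, x))         # land on (or stay in) the interval
    y_new = math.copysign(max(abs(y) - h, 0.0), y)  # soft-thresholding of y
    return x_new, y_new

def proximal_orbit(p0, h=0.1, n_steps=60):
    """Iterates of the proximal point scheme started at p0."""
    pts = [p0]
    for _ in range(n_steps):
        pts.append(prox_step(pts[-1], h))
    return pts

orbit = proximal_orbit((3.0, 2.0))
minimizers = [(-1.0, 0.0), (0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]  # points of argmin f
for a in minimizers:
    d = [math.hypot(x - a[0], y - a[1]) for x, y in orbit]
    # Distance to each minimizer never increases along the iteration.
    print(a, all(d2 <= d1 + 1e-12 for d1, d2 in zip(d, d[1:])))
print(orbit[-1])
```

The iterates end at the minimizer $(1, 0)$, and the distance to every tested minimizer is nonincreasing, not only the distance to the limit point.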

Note that this result also holds in an infinite-dimensional Hilbert space [17, Theorem 4]; however, the convergence should be taken in the weak topology, unless $f$ is even [17, Theorem 5] (see also [39] for a slightly more general statement). Indeed, Baillon [4] shows that for any $\lambda \ge 1$ the function $f_\lambda : \mathbb{R}^2 \to$

is lower semicontinuous and convex and uses it to construct a lower semicontinuous function $\varphi : \ell^2(\mathbb{N}) \to \mathbb{R} \cup \{+\infty\}$ with minimum at 0 and with a bounded gradient trajectory which remains bounded away from 0. In particular, this trajectory does not converge for the norm topology and has infinite length.

A natural question, though, is whether or not in finite dimensions subgradient orbits of convex functions have finite length. The rigid structure of convex functions makes it natural to think that such orbits should be of finite length. It is rather surprising that the answer to this question is not yet known except in some particular cases.

Before we proceed, let us mention the particular case where the set of minimizers in (1.13) has nonempty interior (see [14]).

Theorem 4 ($\operatorname{int}(\operatorname{argmin} f) \neq \emptyset$) Let $f : H \to [0, +\infty]$ be a lower semicontinuous convex function such that the set of critical points (in this case, global minimizers) $S = \operatorname{argmin} f$ has nonempty interior. Then subgradient orbits have finite length.

More precisely, assuming $B(0, \delta) \subset S$ for some $\delta > 0$, we obtain the estimate

$$\int_0^T \|\dot{\gamma}(t)\|\, dt \le \sqrt{1 + (\|\gamma(0)\|/\delta)^2}\, \bigl(\|\gamma(0)\| - \|\gamma(T)\|\bigr), \quad \text{for all } T \ge 0.$$

The following result ([13, Section 4.1]) is an extension of Theorem 4 under the assumption that the vector subspace $\operatorname{span}(S)$ generated by $S = \operatorname{argmin} f$ has codimension one in $H$. We denote by $\operatorname{ri}(S)$ the relative interior of $S$ in $\operatorname{span}(S)$.


Theorem 5 Let $f : H \to \mathbb{R} \cup \{+\infty\}$ be a lower semicontinuous convex function satisfying $\min f = f(0) = 0$. For $S = \operatorname{argmin} f$, assume that the subspace $\operatorname{span}(S)$ has codimension 1 and that the relative interior $\operatorname{ri}(S)$ of $S$ with respect to $\operatorname{span}(S)$ is not empty. If $x_0 \in \operatorname{dom} f$ is such that $\gamma_{x_0}(t)$ converges (for the norm topology) to $a \in \operatorname{ri}(S)$ as $t \to +\infty$, then $\operatorname{length}(\gamma_{x_0}) < +\infty$.

In the next section we tackle this problem in the plane, in a more general setting that also encompasses quasiconvex systems.

1.4 Self-contracted curves. Quasiconvex planar systems

We recall that the length of a continuous curve $\gamma : I \to \mathbb{R}^n$ is defined as

$$\operatorname{length}(\gamma) := \sup\left\{ \sum_{i=1}^{k} \operatorname{dist}(\gamma(t_i), \gamma(t_{i+1})) \right\},$$

where the supremum is taken over all the finite subdivisions $\{t_i\}_{i=1}^{k+1}$ of $I$. (Note that the above definition is equivalent to (1.2) in case $\gamma$ is $C^1$.) The key notion in this section is the notion of self-contracted curve [24, Definition 1.2], which provides a unified framework for the study of convex and quasiconvex gradient systems. Let us define this notion.

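Definition 6 (Self-contracted curve) A curve $\gamma : I \to \mathbb{R}^n$ defined on an interval $I$ of $[0, +\infty)$ is called self-contracted if for every $t_1 \le t_2 \le t_3$, with $t_i \in I$, we have

$$\operatorname{dist}(\gamma(t_2), \gamma(t_3)) \le \operatorname{dist}(\gamma(t_1), \gamma(t_3)).$$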

Remark 7 Orientation is important in Definition 6. In particular, the curve

$$t \in (a, b) \mapsto \gamma(a + b - t)$$

might not be self-contracted, while $\gamma : (a, b) \to \mathbb{R}^n$ is so. This unilateral aspect can be compared to Remark 1(ii).
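The role of orientation can be tested on sampled curves. The sketch below takes the self-contractedness condition in the form $\operatorname{dist}(\gamma(t_2), \gamma(t_3)) \le \operatorname{dist}(\gamma(t_1), \gamma(t_3))$ for $t_1 \le t_2 \le t_3$ (the definition of [24]) and checks it on finitely many samples; the test curve $\gamma(t) = e^{-t}(\cos t, \sin t)$ on $[0, 2\pi]$, the tolerance and the sample size are our choices. A sampled check can only detect violations between sample points, so it is a necessary test rather than a proof.

```python
import math

def is_self_contracted(points, tol=1e-12):
    """Check dist(p_j, p_k) <= dist(p_i, p_k) for all sampled indices i <= j <= k.

    For a fixed k this means that i -> dist(p_i, p_k) is nonincreasing on 0..k,
    which is checked on consecutive samples.
    """
    for k in range(len(points)):
        xk, yk = points[k]
        dists = [math.hypot(points[i][0] - xk, points[i][1] - yk) for i in range(k + 1)]
        if any(d2 > d1 + tol for d1, d2 in zip(dists, dists[1:])):
            return False
    return True

def spiral(t):
    """Inward spiral gamma(t) = exp(-t) * (cos t, sin t)."""
    return math.exp(-t) * math.cos(t), math.exp(-t) * math.sin(t)

n = 400
forward = [spiral(2.0 * math.pi * i / n) for i in range(n + 1)]
backward = forward[::-1]  # same image, opposite orientation

print(is_self_contracted(forward), is_self_contracted(backward))
```

The inward orientation passes the test while the reversed one fails: for the reversed curve, the samples $\gamma(2\pi)$, $\gamma(\pi)$, $\gamma(0)$ are visited in this order, yet $\gamma(\pi)$ is farther from $\gamma(0)$ than $\gamma(2\pi)$ is.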

We further recall from [24] some elementary properties of self-contracted curves.

• Let $\gamma : I \to \mathbb{R}^n$ be a bounded self-contracted curve and $(a, b) \subset I$. Then $\gamma$ has a limit in $\mathbb{R}^n$ whenever $t \in (a, b)$ tends to an endpoint of $(a, b)$. In particular, every self-contracted curve can be extended by continuity to the endpoints of $I$ (possibly equal to $\pm\infty$).

In the sequel, we shall assume that every self-contracted curve $\gamma : I \to \mathbb{R}^n$ is (defined and) continuous at the endpoints of $I$. The following result is a straightforward consequence of the above.

Corollary 8 (Convergence of bounded self-contracted curves) Every bounded self-contracted curve $\gamma : (0, +\infty) \to \mathbb{R}^n$ converges to some point $x_0 \in \mathbb{R}^n$ as $t \to +\infty$. Moreover, the function $t \mapsto \operatorname{dist}(x_0, \gamma(t))$ is nonincreasing.


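Corollary 8 reveals that the trajectories of a general gradient system need not be self-contracted: the orbit of Example 2 is bounded but does not converge.

Recall that a function $f : \mathbb{R}^n \to \mathbb{R}$ is called quasiconvex if its sublevel sets $[f \le \lambda]$ ($\lambda \in \mathbb{R}$) are convex in $\mathbb{R}^n$. If $f$ is differentiable, then it is quasiconvex if and only if for every $x, y \in \mathbb{R}^n$ the following implication holds:

$$f(y) \le f(x) \implies \langle \nabla f(x), y - x \rangle \le 0.$$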

where D(γ) is the distance between the endpoints of γ.

As a consequence we obtain the following.

Theorem 10 (Convex gradient system) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a smooth convex function with a unique minimum. Then the trajectories $\gamma$ of the gradient system (1.4) have (uniformly) finite length.

• (open problem) It is not known whether Theorem 9 and Theorem 10 hold in higher dimensions.

Let us conclude this section with a final remark.


Remark 11 (Failure of Thom conjecture in the convex case) Theorem 10 guarantees that the orbits of the gradient flow of $f$ have finite length (thus, a fortiori, converge to the global minimizer of $f$). However, in strong contrast to the analytic case, it may happen that each orbit turns around its limit infinitely many times (see the counterexample in [24, Section 7.2]; an illustration is presented in Figure 1.4), so the Thom conjecture fails in the convex case.

2 Łojasiewicz inequality and generalizations

The results of this section are motivated by a well-known result due to S. Łojasiewicz (see [37]), which asserts that if $f : \mathbb{R}^n \to \mathbb{R}$ is a real-analytic function and $\bar{x} \in f^{-1}(0)$ is a critical point of $f$, then there exist two constants $\theta \in [1/2, 1)$ and $k > 0$ such that

$$|f(x)|^{\theta} \le k\, \|\nabla f(x)\| \tag{2.1}$$

for all $x$ belonging to a neighborhood $U$ of $\bar{x}$. This result is a cornerstone of the modern theory of semianalytic geometry [36] and allows one to deduce that all gradient orbits of $f$ that converge to $\bar{x}$ and lie inside $U$ have finite length. The proof is illustrated below.

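Let $\gamma : [0, +\infty) \to U$ be a gradient trajectory of $f$, that is, $\dot{\gamma}(t) = -\nabla f(\gamma(t))$, converging to $\bar{x}$. Then, as long as $f(\gamma(t)) > 0$,

$$-\frac{d}{dt} f(\gamma(t))^{1-\theta} = (1-\theta)\, f(\gamma(t))^{-\theta}\, \|\nabla f(\gamma(t))\|^2 \ge \frac{1-\theta}{k}\, \|\dot{\gamma}(t)\|,$$

and integrating over $[0, +\infty)$ gives $\operatorname{length}(\gamma) \le \frac{k}{1-\theta}\, f(\gamma(0))^{1-\theta}$. (The argument carries over to the more general context of subgradient systems.)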
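The length estimate behind this argument, $\operatorname{length}(\gamma) \le \frac{k}{1-\theta} f(\gamma(0))^{1-\theta}$ whenever $|f|^{\theta} \le k \|\nabla f\|$ holds along the orbit, can be checked on a simple example. The sketch below assumes the quadratic $f(x, y) = x^2 + 2y^2$ (our choice), for which $\|\nabla f\|^2 = 4x^2 + 16y^2 \ge 4 f$, so the inequality holds with $\theta = 1/2$ and $k = 1/2$, and for which the gradient orbit is known in closed form.

```python
import math

THETA, K = 0.5, 0.5   # exponent and constant valid for this particular f (assumed example)

def f(x, y):
    return x * x + 2.0 * y * y

def grad_norm(x, y):
    return math.hypot(2.0 * x, 4.0 * y)

def orbit_point(t, x0=1.0, y0=1.0):
    """Exact solution of the gradient system for f: x' = -2x, y' = -4y."""
    return x0 * math.exp(-2.0 * t), y0 * math.exp(-4.0 * t)

def orbit_length(t_max=20.0, n=20000):
    """Polygonal approximation of the length of the orbit on [0, t_max]."""
    total, prev = 0.0, orbit_point(0.0)
    for i in range(1, n + 1):
        cur = orbit_point(t_max * i / n)
        total += math.hypot(cur[0] - prev[0], cur[1] - prev[1])
        prev = cur
    return total

# Length bound k / (1 - theta) * f(x0)^(1 - theta), here equal to sqrt(3).
bound = K / (1.0 - THETA) * f(1.0, 1.0) ** (1.0 - THETA)
print(orbit_length(), bound)
```

The orbit starting at $(1, 1)$ runs along the parabola $y = x^2$ down to the origin; its length, $\sqrt{5}/2 + \operatorname{asinh}(2)/4$, stays below the bound $\sqrt{3}$.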

Inequality (2.1) has been extended by K. Kurdyka in [31] to $C^1$ functions belonging to an arbitrary o-minimal structure (we give this definition in Section 3), in a way that allows one to deduce the finiteness of the lengths of the gradient orbits in this more general context. In [8] and [9], a further extension has been realized to encompass (nonsmooth) functions and orbits of the corresponding subgradient systems. For the sake of simplicity in the presentation, we limit ourselves to the smooth case and we fix a $C^1$ function $f : \mathbb{R}^n \to \mathbb{R}$.

We recall that a value $\bar{r} \in f(\mathbb{R}^n)$ is called a critical value of $f$ if there exists a critical point $\bar{x} \in \mathbb{R}^n$ such that $f(\bar{x}) = \bar{r}$. It is called a regular value if it is not a critical value. (Note that if $\bar{r}$ is a regular value, then $M := [f = \bar{r}]$ is a submanifold of $\mathbb{R}^n$ of codimension 1.)

We introduce the following property KŁ($\bar{r}$) for the critical value $\bar{r}$ of $f$.

Definition 12 (property KŁ($\bar{r}$)) We say that the function $f$ satisfies property KŁ($\bar{r}$) if there exist $\delta > 0$ and a $C^1$ function $\psi : (\bar{r}, \bar{r} + \delta) \to (0, \infty)$ with positive derivative and $\lim_{r \to \bar{r}} \psi(r) = 0$ such that

$$\|\nabla(\psi \circ f)(x)\| \ge 1 \quad \text{for all } x \text{ with } \bar{r} < f(x) < \bar{r} + \delta. \tag{2.2}$$

Remark 13 (i) If $r$ is a regular value of an inf-compact $C^1$ function $f$, then KŁ($r$) holds.

(ii) If KŁ($r$) holds and $r \in f(\mathbb{R}^n)$ is a critical value, then $r$ is an upper isolated critical value, that is, there exists $\delta > 0$ such that the interval $(r, r + \delta)$ is made up of regular values.


The aforementioned result of Kurdyka asserts that every o-minimal function $f$ satisfies property KŁ($r$) for every $r \in \mathbb{R}$ (see also Section 3). Thus this is true in particular for real-analytic functions. In fact, (2.1) follows from KŁ($\bar{r}$) for $\bar{r} = 0$ and $\psi(r) = r^{1-\theta}$.

Note finally that the functions $\psi \circ f$ and $f$ have the same gradient curves on $[\bar{r} < f < \bar{r} + \delta]$, up to reparametrization (cf. (1.3)). In view of (2.2) it is natural to call $\psi$ a desingularization function for $f$. For convenience, we introduce the following

As we shall see in Section 2.3, in some particular cases it is possible (and highly convenient) to obtain a desingularization function $\psi$ which is concave and defined on the half-line $[0, +\infty)$.

2.1 Defragmented gradient curves

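In this subsection we mention an important consequence of (2.2) for the asymptotic behavior of gradient systems. Let $\gamma$ be a bounded orbit of (1.4) starting at $\gamma(0) = x_0$ and set $r_0 = f(x_0)$. In view of (1.7), the limit $r_\infty = \lim_{t \to \infty} f(\gamma(t))$ exists and the value $r_\infty$ is necessarily critical. (Note that we do not know yet that the limit of $\gamma(t)$ exists: $r_\infty$ is the common value of $f$ at all $\omega$-limit points of $\gamma$.) Note further that, in view of the level-set parametrization $u$ of Section 1.1, the length of $\gamma$ equals $\int_{r_\infty}^{r_0} \frac{dr}{\|\nabla f(u(r))\|}$. Since the integrand is not bounded around $r_\infty$, the above integral may diverge. But this cannot happen if $f$ satisfies KŁ($r_\infty$). Indeed, in this case using (2.2) and the identity $f(u(r)) = r$ we deduce for some $\delta > 0$ that

$$\int_{r_\infty}^{r_\infty + \delta} \frac{dr}{\|\nabla f(u(r))\|} \le \int_{r_\infty}^{r_\infty + \delta} \psi'(r)\, dr = \psi(r_\infty + \delta) < +\infty. \tag{2.5}$$

But formula (2.5) also expresses a uniformity result which is not reflected in the previous statement. To make this precise, let us introduce the notion of defragmented (or piecewise) gradient curve.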

Definition 14 (defragmented gradient curve) A curve $\gamma : [0, T) \to \mathbb{R}^n$ ($T \in (0, \infty]$) is called a defragmented gradient curve if there exists a countable partition of $[0, T]$ into (nonempty) intervals $I_k$ such that:

– the restriction $\gamma|_{I_k}$ of $\gamma$ to each interval $I_k$ is a gradient curve (i.e. a solution of (1.4));

– for each disjoint pair of intervals $I_k, I_l$, the intervals $f(\gamma(I_k))$ and $f(\gamma(I_l))$ have at most one point in common.


Note that gradient orbits satisfy the above definition in a trivial way. It is now easy to obtain another consequence of (2.2).

• If $f$ satisfies KŁ($r$), then there exists $\delta > 0$ such that the length of every defragmented gradient curve that lies in $[r \le f \le r + \delta]$ is bounded by the number $\psi(r + \delta) - \psi(r)$.

In the sequel, for $r_1 > r_2$ we shall use the notation

$$\gamma \subset [r_2 \le f \le r_1]$$

to indicate that the image of the (defragmented) gradient curve $\gamma$ lies in $[r_2 \le f \le r_1]$. We finish this section with an interesting observation ([7, Section 2.3]).

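Remark 15 (Reduction to one dimension) Let $\psi$ be a desingularization function of $f$ as in (2.2). We set $\phi = \psi^{-1} : [0, \psi(\bar{r} + \delta)) \to [\bar{r}, \bar{r} + \delta]$ and we denote by $\chi_\phi$ the gradient curve of the (trivial one-dimensional) system

$$\dot{\chi}(r) = -\phi'(\chi(r)), \qquad \chi(0) = 0.$$

Then $\operatorname{length}(\gamma) \le \operatorname{length}(\chi_\phi)$ for every gradient curve $\gamma \subset [\bar{r} \le f \le \bar{r} + \delta]$ of $f$.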

2.2 The Kurdyka-Łojasiewicz inequality: characterizations and applications

In this section we give several characterizations of the property given in Definition 12. The proof of the following characterization is almost straightforward for $C^{1,1}$ inf-compact functions. (This result remains true for nonsmooth semiconvex functions in a Hilbert space, though its proof is not straightforward; see [13, Section 3.3] for details.)

Proposition 16 (local integrability of the inverse minimal gradient norm) Assume that the interval $(\bar{r}, \bar{r} + \delta)$ is made up of regular values and consider the function $\varphi : (\bar{r}, \bar{r} + \delta) \to (0, +\infty)$ defined by

$$\varphi(r) = \max_{f(x) = r} \frac{1}{\|\nabla f(x)\|}.$$

Then KŁ($\bar{r}$) holds if and only if $\varphi$ is locally integrable around $\bar{r}$.
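When $\varphi$ is integrable near $\bar{r}$, its primitive is a natural candidate for a desingularization function, since $\psi' = \varphi$ gives $\psi'(f(x))\, \|\nabla f(x)\| \ge 1$ on each level set. The sketch below illustrates this on the assumed example $f(x, y) = x^2 + 2y^2$ with $\bar{r} = 0$ (our choice): on the level set $[f = r]$ the gradient norm is smallest at $(\pm\sqrt{r}, 0)$, so $\varphi(r) = 1/(2\sqrt{r})$ and $\psi(r) = \sqrt{r}$. The level set is sampled numerically; the sample size is arbitrary.

```python
import math

def grad_norm(x, y):
    """Norm of the gradient of the example f(x, y) = x^2 + 2 y^2."""
    return math.hypot(2.0 * x, 4.0 * y)

def phi(r, n=3600):
    """Max of 1/||grad f|| over the level set [f = r], sampled on the ellipse."""
    best = 0.0
    for i in range(n):
        a = 2.0 * math.pi * i / n
        x, y = math.sqrt(r) * math.cos(a), math.sqrt(r / 2.0) * math.sin(a)
        best = max(best, 1.0 / grad_norm(x, y))
    return best

def psi(r):
    """Candidate desingularization function: psi(r) = sqrt(r), so psi'(r) = 1/(2 sqrt(r))."""
    return math.sqrt(r)

def kl_quantity(x, y):
    """||grad (psi o f)(x, y)|| = psi'(f(x, y)) * ||grad f(x, y)||, for (x, y) != (0, 0)."""
    r = x * x + 2.0 * y * y
    return grad_norm(x, y) / (2.0 * math.sqrt(r))

for r in (0.01, 0.1, 1.0):
    print(r, phi(r), 1.0 / (2.0 * math.sqrt(r)))
```

The sampled maximum agrees with $1/(2\sqrt{r})$, an integrable function near $0$, and `kl_quantity` stays at or above 1 away from the origin, which is inequality (2.2) for this example.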

We shall also need the notion of a valley ([21], [22]).

Definition 17 (Valley) For any $\rho > 1$ the $\rho$-valley $V_\rho(\cdot)$ of $f$ is defined as follows:

discussion on the notion of metric regularity.

Theorem 18 (Characterization of the KŁ property) Let $f : \mathbb{R}^n \to \mathbb{R}$ be a $C^{1,1}$ inf-compact function. Assume $\bar{r} = 0$ is an upper isolated critical value (cf. Remark 13(ii)) and let $r_0 > 0$. The following are equivalent:

(i) (Kurdyka–Łojasiewicz inequality) Property KŁ(0) holds with $\operatorname{dom} \psi \supset [0, r_0)$.

(ii) (uniform length of defragmented gradient curves) There exists $M > 0$ such that for every defragmented gradient curve $\gamma \subset [0 \le f < r_0]$ we have $\operatorname{length}(\gamma) < M$.

(iii) (Talweg on the valley) For every $\rho > 1$, there exists a piecewise $C^1$ curve (discontinuous with countable pieces) $\theta : (0, r_0) \to \mathbb{R}^n$ with finite length such that $\theta(r) \in V_\rho(r)$ for all $r \in (0, r_0)$. (Such a curve is called a talweg.)

(iv) (metric regularity) There exists a $C^1$ function $\psi : (0, r_0) \to \mathbb{R}_+$ with $\lim_{r \to 0^+} \psi(r) = 0$ and positive derivative such that

$$\operatorname{Dist}([f \le r], [f \le s]) \le |\psi(r) - \psi(s)| \quad \text{for all } r, s \in (0, r_0),$$

where $\operatorname{Dist}(A, B)$ denotes the Hausdorff distance between the compact sets $A$ and $B$.
