Lecture notes for Advanced Optimization, Chapter 6: Subgradients. The chapter covers: last time (gradient descent), subgradients, examples of subgradients, monotonicity, examples of non-subdifferentiable functions, and more. Readers are invited to consult the material.
Hoàng Nam Dũng
Faculty of Mathematics, Mechanics and Informatics, VNU University of Science, Vietnam National University, Hanoi
Last time: gradient descent

Consider the problem
$$\min_x f(x)$$
for f convex and differentiable, with dom(f) = R^n.

Gradient descent: choose an initial point x^(0) ∈ R^n and repeat
$$x^{(k)} = x^{(k-1)} - t_k \cdot \nabla f(x^{(k-1)}), \quad k = 1, 2, 3, \ldots$$
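To make the iteration concrete, here is a minimal Python sketch of the update (my own illustration, not code from the slides); the quadratic objective, the fixed step size t, and the stopping tolerance are arbitrary choices.

```python
import numpy as np

def gradient_descent(grad_f, x0, t=0.05, max_iter=1000, tol=1e-8):
    """Iterate x^(k) = x^(k-1) - t * grad_f(x^(k-1)) starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:   # stop once the gradient is (nearly) zero
            break
        x = x - t * g                  # fixed-step gradient descent update
    return x

# Illustrative smooth convex objective: f(x) = 1/2 ||A x - b||_2^2 with gradient A^T (A x - b).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_hat = gradient_descent(lambda x: A.T @ (A @ x - b), x0=np.zeros(2))
print(x_hat, np.linalg.solve(A, b))    # the two should roughly agree
```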
Basic inequality

Recall that for convex and differentiable f,
$$f(y) \ge f(x) + \nabla f(x)^T (y - x), \quad \forall x, y \in \mathrm{dom}(f).$$

• The first-order approximation of f at x is a global lower bound.
• ∇f(x) defines a non-vertical supporting hyperplane to epi f at (x, f(x)):
$$\begin{bmatrix} \nabla f(x) \\ -1 \end{bmatrix}^T \left( \begin{bmatrix} y \\ t \end{bmatrix} - \begin{bmatrix} x \\ f(x) \end{bmatrix} \right) \le 0, \quad \forall (y, t) \in \mathrm{epi}\, f.$$
Subgradient

A subgradient of a convex function f at x is any g ∈ R^n such that
$$f(y) \ge f(x) + g^T (y - x), \quad \forall y \in \mathrm{dom}(f).$$

• It always exists (on the relative interior of dom(f)).
• If f is differentiable at x, then g = ∇f(x) is the unique subgradient.
• The same definition works for nonconvex f (however, subgradients need not exist).

[Figure: g_1 and g_2 are subgradients at x_1; g_3 is a subgradient at x_2.]
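As a quick numerical illustration of the definition (my own sketch, not from the slides), the code below checks the subgradient inequality f(y) ≥ f(x) + g(y − x) on a grid of points for f(x) = |x|; the candidate values of g and the grid are arbitrary.

```python
import numpy as np

def looks_like_subgradient(f, g, x, ys, tol=1e-12):
    """Check f(y) >= f(x) + g * (y - x) for every sampled y (evidence, not a proof)."""
    return all(f(y) >= f(x) + g * (y - x) - tol for y in ys)

f = abs                               # f(x) = |x|: convex, non-differentiable at 0
ys = np.linspace(-2.0, 2.0, 401)

print(looks_like_subgradient(f, 0.5, 0.0, ys))   # True:  0.5 lies in [-1, 1], the subdifferential at 0
print(looks_like_subgradient(f, 1.5, 0.0, ys))   # False: 1.5 lies outside [-1, 1]
print(looks_like_subgradient(f, 1.0, 2.0, ys))   # True:  the unique subgradient sign(2) = 1 at x = 2
```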
Examples of subgradients

Consider f : R → R, f(x) = |x|.
• For x ≠ 0, the unique subgradient is g = sign(x).
• For x = 0, the subgradient g is any element of [−1, 1].
Consider f : R^n → R, f(x) = ‖x‖_2.
• For x ≠ 0, the unique subgradient is g = x / ‖x‖_2.
• For x = 0, the subgradient g is any element of {z : ‖z‖_2 ≤ 1}.
Consider f : R^n → R, f(x) = ‖x‖_1 (a code sketch follows below).
• For x_i ≠ 0, the unique i-th component of the subgradient is g_i = sign(x_i).
• For x_i = 0, the i-th component g_i is any element of [−1, 1].
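The componentwise rule translates directly into a subgradient oracle; a minimal sketch (my own, not from the slides), where returning 0 for zero components is one arbitrary choice from [−1, 1]:

```python
import numpy as np

def l1_subgradient(x):
    """One subgradient of f(x) = ||x||_1: sign(x_i) where x_i != 0,
    and 0 (an arbitrary element of [-1, 1]) where x_i == 0."""
    return np.sign(x)

x = np.array([1.5, 0.0, -0.2])
g = l1_subgradient(x)                       # array([ 1.,  0., -1.])
y = np.array([0.3, -1.0, 2.0])
# Subgradient inequality ||y||_1 >= ||x||_1 + g^T (y - x) holds at this pair:
print(np.sum(np.abs(y)) >= np.sum(np.abs(x)) + g @ (y - x))   # True
```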
Consider f(x) = max{f_1(x), f_2(x)}, with f_1, f_2 convex and differentiable (a code sketch follows below).
• For f_1(x) > f_2(x), the unique subgradient is g = ∇f_1(x).
• For f_2(x) > f_1(x), the unique subgradient is g = ∇f_2(x).
• For f_1(x) = f_2(x), the subgradient g is any point on the line segment between ∇f_1(x) and ∇f_2(x).
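A minimal subgradient oracle for this pointwise maximum (my own illustration, not code from the slides): it returns the gradient of the active function, and an arbitrary convex combination (here the midpoint) at a tie.

```python
def max_subgradient(f1, grad_f1, f2, grad_f2, x):
    """One subgradient of f(x) = max(f1(x), f2(x)) for differentiable convex f1, f2."""
    v1, v2 = f1(x), f2(x)
    if v1 > v2:
        return grad_f1(x)
    if v2 > v1:
        return grad_f2(x)
    # Tie: any point on the segment between the two gradients works; take the midpoint.
    return 0.5 * (grad_f1(x) + grad_f2(x))

# Example: f(x) = max(x^2, 1); the two pieces cross at x = 1 and x = -1.
f1, g1 = lambda x: x**2, lambda x: 2.0 * x
f2, g2 = lambda x: 1.0, lambda x: 0.0
print(max_subgradient(f1, g1, f2, g2, 2.0))   # 4.0, since f1 is active there
print(max_subgradient(f1, g1, f2, g2, 1.0))   # 1.0, the midpoint of the segment [0, 2]
```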
The set of all subgradients of a convex f is called the subdifferential:
$$\partial f(x) = \{ g \in R^n : g \text{ is a subgradient of } f \text{ at } x \}.$$

Properties:
• ∂f(x) is nonempty for convex f at x ∈ int(dom(f)).
• ∂f(x) is closed and convex (even for nonconvex f).
• If f is differentiable at x, then ∂f(x) = {∇f(x)}.
• If ∂f(x) = {g}, then f is differentiable at x and ∇f(x) = g.
Proof: see http://www.seas.ucla.edu/~vandenbe/236C/lectures/subgradients.pdf
Monotonicity

The subdifferential of a convex function f is a monotone operator:
$$(u - v)^T (x - y) \ge 0 \quad \text{for all } u \in \partial f(x),\ v \in \partial f(y).$$
Proof: by the definition of subgradients,
$$f(y) \ge f(x) + u^T (y - x) \quad \text{and} \quad f(x) \ge f(y) + v^T (x - y).$$
Adding the two inequalities gives 0 ≥ (u − v)^T (y − x), i.e., (u − v)^T (x − y) ≥ 0, which shows monotonicity.

Question: what does monotonicity say for a differentiable convex function? It is
$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge 0,$$
which follows directly from the first-order characterization of convex functions.
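A small numerical sanity check of this monotonicity property (my own sketch, not in the slides), using f(x) = ‖x‖_1 and the sign-based subgradients from the earlier example; the random sample points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def l1_subgradient(x):
    """One subgradient of f(x) = ||x||_1 (0 chosen for zero components)."""
    return np.sign(x)

# Check (u - v)^T (x - y) >= 0 on randomly sampled pairs of points.
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    u, v = l1_subgradient(x), l1_subgradient(y)
    ok = ok and (u - v) @ (x - y) >= 0
print(ok)   # True: consistent with the subdifferential being a monotone operator
```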
Examples of non-subdifferentiable functions

The following functions are not subdifferentiable at x = 0:
Connection to convex geometry

Given a convex set C ⊆ R^n, consider the indicator function I_C : R^n → R,
$$I_C(x) = I\{x \in C\} = \begin{cases} 0 & \text{if } x \in C \\ \infty & \text{if } x \notin C. \end{cases}$$
Subgradient calculus

Basic rules for convex functions:
• Scaling: ∂(af) = a · ∂f, provided a > 0.
• Norms: an important special case is f(x) = ‖x‖_p. Let q be such that 1/p + 1/q = 1; then
$$\|x\|_p = \max_{\|z\|_q \le 1} z^T x \quad \text{and} \quad \partial f(x) = \mathrm{argmax}_{\|z\|_q \le 1} z^T x.$$
Why subgradients?

Subgradients are important for two reasons:
• Convex analysis: optimality characterization via subgradients, monotonicity, relationship to duality.
• Convex optimization: if you can compute subgradients, then you can minimize any convex function.
Optimality condition

Subgradient optimality condition: for any f (convex or not),
$$f(x^*) = \min_x f(x) \iff 0 \in \partial f(x^*),$$
i.e., x* is a minimizer if and only if 0 is a subgradient of f at x*.

Why? Easy: g = 0 being a subgradient means that for all y,
$$f(y) \ge f(x^*) + 0^T (y - x^*) = f(x^*).$$

Note the implication for a convex and differentiable function f with ∂f(x) = {∇f(x)}: in that case x* is a minimizer if and only if ∇f(x*) = 0.
Derivation of first-order optimality

Example of the power of subgradients: we can use what we have learned so far to derive the first-order optimality condition: for f convex and differentiable and C a convex set,
$$\min_x f(x) \ \text{ subject to } \ x \in C \quad \text{is solved at } x \iff \nabla f(x)^T (y - x) \ge 0 \ \text{ for all } y \in C.$$
For a direct proof see, e.g., http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec7_gh.pdf; a proof using subgradients is on the next slide.

Intuitively, the condition says that the gradient increases as we move away from x. Note that for C = R^n (the unconstrained case) it reduces to ∇f(x) = 0.
Derivation of first-order optimality

Recast the constrained problem as the unconstrained problem min_x f(x) + I_C(x), where I_C is the indicator function of C. By subgradient optimality,
$$0 \in \partial\big(f(x) + I_C(x)\big) = \{\nabla f(x)\} + N_C(x),$$
where N_C(x) = {g : g^T x ≥ g^T y for all y ∈ C} is the normal cone of C at x. Hence −∇f(x) ∈ N_C(x), i.e.,
$$-\nabla f(x)^T x \ge -\nabla f(x)^T y \ \text{ for all } y \in C,$$
which is exactly ∇f(x)^T (y − x) ≥ 0 for all y ∈ C.
Example: lasso optimality conditions

Given y ∈ R^n, X ∈ R^{n×p}, the lasso problem can be parametrized as
$$\min_\beta \ \frac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1,$$
where λ ≥ 0. Subgradient optimality:
$$0 \in \partial\Big(\frac{1}{2}\|y - X\beta\|_2^2 + \lambda \|\beta\|_1\Big)
\iff 0 \in -X^T(y - X\beta) + \lambda\, \partial\|\beta\|_1
\iff X^T(y - X\beta) = \lambda v$$
for some v ∈ ∂‖β‖_1, i.e.,
$$v_i \in \begin{cases} \{\mathrm{sign}(\beta_i)\} & \text{if } \beta_i \ne 0 \\ [-1, 1] & \text{if } \beta_i = 0, \end{cases} \qquad i = 1, \ldots, p.$$
Example: lasso optimality conditions

Write X_1, …, X_p for the columns of X. Then our condition reads
$$\begin{cases} X_i^T (y - X\beta) = \lambda \cdot \mathrm{sign}(\beta_i) & \text{if } \beta_i \ne 0 \\ |X_i^T (y - X\beta)| \le \lambda & \text{if } \beta_i = 0. \end{cases}$$
Note that these conditions do not directly give a closed-form expression for a lasso solution, but they do provide a way to check optimality (a numerical check is sketched below). They are also helpful in understanding the lasso estimator; e.g., if |X_i^T (y − Xβ)| < λ, then β_i = 0 (used by screening rules, more later?).
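A minimal numerical check of these conditions for a candidate solution (my own sketch, not code from the slides); the tolerance `tol` and the toy data are arbitrary.

```python
import numpy as np

def is_lasso_optimal(X, y, beta, lam, tol=1e-8):
    """Check the lasso subgradient optimality conditions:
    X_i^T (y - X beta) = lam * sign(beta_i)  if beta_i != 0,
    |X_i^T (y - X beta)| <= lam              if beta_i == 0."""
    corr = X.T @ (y - X @ beta)                 # vector of X_i^T (y - X beta)
    active = beta != 0
    ok_active = np.all(np.abs(corr[active] - lam * np.sign(beta[active])) <= tol)
    ok_zero = np.all(np.abs(corr[~active]) <= lam + tol)
    return bool(ok_active and ok_zero)

# Toy example with X = I, where the solution is given by soft-thresholding (next slides).
y = np.array([3.0, 0.5, -2.0])
X = np.eye(3)
lam = 1.0
beta_hat = np.array([2.0, 0.0, -1.0])
print(is_lasso_optimal(X, y, beta_hat, lam))    # True
print(is_lasso_optimal(X, y, y, lam))           # False: beta = y ignores the l1 penalty
```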
Example: soft-thresholding

Simplified lasso problem with X = I:
$$\min_\beta \ \frac{1}{2}\|y - \beta\|_2^2 + \lambda \|\beta\|_1.$$
This we can solve directly using subgradient optimality. The solution is β = S_λ(y), where S_λ is the soft-thresholding operator
$$[S_\lambda(y)]_i = \begin{cases} y_i - \lambda & \text{if } y_i > \lambda \\ 0 & \text{if } -\lambda \le y_i \le \lambda \\ y_i + \lambda & \text{if } y_i < -\lambda, \end{cases}$$
and the subgradient optimality conditions are
$$\begin{cases} y_i - \beta_i = \lambda \cdot \mathrm{sign}(\beta_i) & \text{if } \beta_i \ne 0 \\ |y_i - \beta_i| \le \lambda & \text{if } \beta_i = 0. \end{cases}$$
Example: soft-thresholding

Now plug in β = S_λ(y) and check that these conditions are satisfied:
• When y_i > λ: β_i = y_i − λ > 0, so y_i − β_i = λ = λ · 1 = λ · sign(β_i).
• When y_i < −λ: the argument is similar.
• When |y_i| ≤ λ: β_i = 0, and |y_i − β_i| = |y_i| ≤ λ.
A code sketch of the operator follows below.
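A minimal implementation of the soft-thresholding operator (my own illustration, not code from the slides), together with a numerical check of the optimality conditions above:

```python
import numpy as np

def soft_threshold(y, lam):
    """Elementwise soft-thresholding operator S_lam(y)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, 0.4, -1.2, 0.0])
lam = 1.0
beta = soft_threshold(y, lam)
print(beta)                                  # [ 2.   0.  -0.2  0. ]

# Verify the subgradient optimality conditions from the previous slide:
nonzero = beta != 0
print(np.allclose(y[nonzero] - beta[nonzero], lam * np.sign(beta[nonzero])))  # True
print(np.all(np.abs(y[~nonzero] - beta[~nonzero]) <= lam))                    # True
```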
Example: distance to a convex set

Recall the distance function to a closed, convex set C:
$$\mathrm{dist}(x, C) = \min_{y \in C} \|y - x\|_2.$$
This is a convex function. What are its subgradients?

Write dist(x, C) = ‖x − P_C(x)‖_2, where P_C(x) is the projection of x onto C. It turns out that when dist(x, C) > 0,
$$\partial\, \mathrm{dist}(x, C) = \left\{ \frac{x - P_C(x)}{\|x - P_C(x)\|_2} \right\}.$$
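To illustrate the formula (my own sketch, not from the slides), take C to be the Euclidean unit ball, where the projection P_C has a simple closed form; the test points are arbitrary.

```python
import numpy as np

def project_unit_ball(x):
    """Projection of x onto C = {z : ||z||_2 <= 1}."""
    norm = np.linalg.norm(x)
    return x if norm <= 1 else x / norm

def dist_and_subgradient(x, project):
    """dist(x, C) and its subgradient (x - P_C(x)) / dist(x, C), assuming dist(x, C) > 0."""
    p = project(x)
    d = np.linalg.norm(x - p)
    return d, (x - p) / d

x = np.array([3.0, 4.0])                     # ||x||_2 = 5, so dist(x, C) = 4
d, g = dist_and_subgradient(x, project_unit_ball)
print(d, g)                                  # 4.0 [0.6 0.8]

# Check the subgradient inequality dist(y, C) >= dist(x, C) + g^T (y - x) at a test point:
y = np.array([0.0, 2.0])
dist_y = max(np.linalg.norm(y) - 1.0, 0.0)   # dist(y, C) for the unit ball
print(dist_y >= d + g @ (y - x))             # True
```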
Example: distance to a convex set

We will only show one direction, i.e., that
$$\frac{x - P_C(x)}{\|x - P_C(x)\|_2} \in \partial\, \mathrm{dist}(x, C).$$
Example: distance to a convex set

Write u = P_C(x) and let H = {y : (x − u)^T (y − u) ≤ 0}; by the first-order optimality condition for the projection, C ⊆ H, and hence dist(y, C) ≥ dist(y, H) for all y.

Now for y ∉ H, we have
$$(x - u)^T (y - u) = \|x - u\|_2 \|y - u\|_2 \cos\theta,$$
where θ is the angle between x − u and y − u. Thus
$$\mathrm{dist}(y, C) \ge \mathrm{dist}(y, H) = \|y - u\|_2 \cos\theta = \frac{(x - u)^T (y - u)}{\|x - u\|_2},$$
and this lower bound holds trivially for y ∈ H as well, since there the right-hand side is nonpositive. Expanding (x − u)^T (y − u) = (x − u)^T (y − x) + ‖x − u‖_2^2 and using dist(x, C) = ‖x − u‖_2 gives, for all y,
$$\mathrm{dist}(y, C) \ge \mathrm{dist}(x, C) + \left( \frac{x - u}{\|x - u\|_2} \right)^T (y - x),$$
which is exactly the subgradient inequality.
References and further reading

• S. Boyd, Lecture notes for EE 264B, Stanford University, Spring 2010–2011.
• R. T. Rockafellar (1970), Convex Analysis, Chapters 23–25.
• L. Vandenberghe, Lecture notes for EE 236C, UCLA, Spring 2011–2012.