Advanced Optimization Lecture Notes: Chapter 6 - Hoàng Nam Dũng


Chapter 6 of the Advanced Optimization lecture notes, "Subgradients", covers: last time: gradient descent; subgradients; examples of subgradients; monotonicity; examples of non-subdifferentiable functions; and more.

Slide 1

Hoàng Nam Dũng

Faculty of Mathematics, Mechanics and Informatics, University of Science, Vietnam National University, Hanoi

Slide 2

Last time: gradient descent

Consider the problem

min_x f(x)

for f convex and differentiable, with dom(f) = R^n.

Gradient descent: choose an initial x^(0) ∈ R^n, then repeat

x^(k) = x^(k−1) − t_k · ∇f(x^(k−1)),  k = 1, 2, 3, ...
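As an illustration, the update above can be sketched in a few lines of Python. The quadratic objective and the fixed step size t = 0.1 below are assumptions made for the example, not taken from the slides.

```python
import numpy as np

def gradient_descent(grad_f, x0, t=0.1, n_iters=100):
    """Fixed-step gradient descent: x_k = x_{k-1} - t * grad_f(x_{k-1})."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - t * grad_f(x)
    return x

# Illustrative convex quadratic f(x) = 0.5 * ||x - c||_2^2, so grad f(x) = x - c;
# the iterates should approach the minimizer c.
c = np.array([1.0, -2.0])
x_star = gradient_descent(lambda x: x - c, x0=np.zeros(2))
```

With this step size the error shrinks by a factor 0.9 per iteration, so after 100 iterations x_star is within about 10^-4 of c.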

Slide 4

Basic inequality

Recall the basic inequality for convex and differentiable f:

f(y) ≥ f(x) + ∇f(x)^T(y − x), ∀x, y ∈ dom(f)

• The first-order approximation of f at x is a global lower bound.
• ∇f(x) defines a non-vertical supporting hyperplane to epi(f) at (x, f(x)).

[Figure: the graph of f with its first-order approximation at x lying below it]

Slide 5

Subgradient

A subgradient of a convex function f at x ∈ dom(f) is any g ∈ R^n such that

f(y) ≥ f(x) + g^T(y − x), ∀y ∈ dom(f)

• A subgradient always exists (on the relative interior of dom(f)).
• If f is differentiable at x, then g = ∇f(x) uniquely.
• The same definition works for nonconvex f (however, subgradients need not exist).

[Figure: g1 and g2 are subgradients at x1; g3 is a subgradient at x2]

Slide 7

Example: f : R → R, f(x) = |x|

• For x ≠ 0, the unique subgradient is g = sign(x).
• For x = 0, the subgradient g is any element of [−1, 1].

Slide 8

Example: f(x) = ‖x‖₂

• For x ≠ 0, the unique subgradient is g = x/‖x‖₂.
• For x = 0, the subgradient g is any element of {z : ‖z‖₂ ≤ 1}.

Slide 9

Example: f(x) = ‖x‖₁

• For x_i ≠ 0, the unique i-th component is g_i = sign(x_i).
• For x_i = 0, the i-th component g_i is any element of [−1, 1].
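A minimal numeric sanity check of this componentwise rule, taking g_i = 0 at zero components (one valid choice from [−1, 1]):

```python
import numpy as np

def subgradient_l1(x):
    """One subgradient of ||x||_1: sign(x_i) if x_i != 0, else 0."""
    return np.sign(x)

x = np.array([1.5, 0.0, -2.0])
g = subgradient_l1(x)                    # -> [1., 0., -1.]

# Check the subgradient inequality f(y) >= f(x) + g^T (y - x) at random y.
rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.normal(size=3)
    assert np.abs(y).sum() >= np.abs(x).sum() + g @ (y - x) - 1e-12
```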

Slide 10

Example: f(x) = max{f1(x), f2(x)}, for f1, f2 convex and differentiable

• For f1(x) > f2(x), the unique subgradient is g = ∇f1(x).
• For f2(x) > f1(x), the unique subgradient is g = ∇f2(x).
• For f1(x) = f2(x), the subgradient g is any point on the line segment between ∇f1(x) and ∇f2(x).
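A small check of the tie case, with the illustrative choice f1(x) = x and f2(x) = x² (both convex, an assumed example): at x = 1 the pieces tie, and every point of the segment [f1'(1), f2'(1)] = [1, 2] should satisfy the subgradient inequality.

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with f1(x) = x and f2(x) = x**2, both convex.
# At x0 = 1 the pieces tie: f1(1) = f2(1) = 1, with f1'(1) = 1 and f2'(1) = 2.
f = lambda x: np.maximum(x, x**2)

x0 = 1.0
ys = np.linspace(-3.0, 3.0, 601)
for theta in [0.0, 0.3, 0.7, 1.0]:
    g = theta * 1.0 + (1.0 - theta) * 2.0     # a point of the segment [1, 2]
    # Subgradient inequality f(y) >= f(x0) + g * (y - x0) must hold for all y.
    assert np.all(f(ys) >= f(x0) + g * (ys - x0) - 1e-12)
```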

Slide 11

Subdifferential

The set of all subgradients of a convex f is called the subdifferential:

∂f(x) = {g ∈ R^n : g is a subgradient of f at x}

Properties:

• Nonempty for convex f at x ∈ int(dom(f)).
• ∂f(x) is closed and convex (even for nonconvex f).
• If f is differentiable at x, then ∂f(x) = {∇f(x)}.
• If ∂f(x) = {g}, then f is differentiable at x and ∇f(x) = g. Proof: see http://www.seas.ucla.edu/~vandenbe/236C/lectures/subgradients.pdf


Slide 13

Monotonicity

The subdifferential of a convex function f is a monotone operator: for any u ∈ ∂f(x) and v ∈ ∂f(y),

(u − v)^T(x − y) ≥ 0.

Proof: by the definition of a subgradient,

f(y) ≥ f(x) + u^T(y − x) and f(x) ≥ f(y) + v^T(x − y).

Adding the two inequalities shows monotonicity.

Question: what does monotonicity mean for a differentiable convex function? That

(∇f(x) − ∇f(y))^T(x − y) ≥ 0,

which follows directly from the first-order characterization of convex functions.
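This property is easy to probe numerically. The sketch below uses f(x) = ‖x‖₁ with the particular subgradient selection u = sign(x), v = sign(y) (one valid choice in each subdifferential):

```python
import numpy as np

# Monotonicity check for subgradients of f(x) = ||x||_1, taking the
# selections u = sign(x) in subdiff f(x) and v = sign(y) in subdiff f(y).
rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    u, v = np.sign(x), np.sign(y)
    assert (u - v) @ (x - y) >= 0
```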


Slide 16

Examples of non-subdifferentiable functions

The following functions are not subdifferentiable at x = 0

Slide 17

Connection to convex geometry

For a convex set C ⊆ R^n, consider the indicator function I_C : R^n → R,

I_C(x) = 0 if x ∈ C, and I_C(x) = ∞ if x ∉ C.

Slide 19

Subgradient calculus

Basic rules for convex functions:

• Scaling: ∂(af) = a · ∂f, provided a > 0.

Slide 20

• Norms: important special case, f(x) = ‖x‖_p. Let q be such that 1/p + 1/q = 1.

Slide 21

Why subgradients?

Subgradients are important for two reasons:

• Convex analysis: optimality characterization via subgradients, monotonicity, relationship to duality.
• Convex optimization: if you can compute subgradients, then you can minimize any convex function.

Slide 22

Optimality condition

Subgradient optimality condition: for any f (convex or not),

f(x*) = min_x f(x) ⟺ 0 ∈ ∂f(x*),

i.e., x* is a minimizer if and only if 0 is a subgradient of f at x*.

Why? Easy: g = 0 being a subgradient means that for all y,

f(y) ≥ f(x*) + 0^T(y − x*) = f(x*).

Note the implication for a convex and differentiable function f with ∂f(x) = {∇f(x)}: the condition reduces to ∇f(x*) = 0.
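A tiny illustration, using the example f(x) = |x − 2| + |x + 1| (an assumed example, not from the slides): at any x strictly between −1 and 2 the subdifferential is {−1 + 1} = {0}, so such points are minimizers.

```python
import numpy as np

# f(x) = |x - 2| + |x + 1| is convex. For -1 < x < 2 its subdifferential is
# {sign(x - 2) + sign(x + 1)} = {-1 + 1} = {0}, so 0 is a subgradient there
# and every point of [-1, 2] is a minimizer, with minimum value f = 3.
f = lambda x: np.abs(x - 2) + np.abs(x + 1)

xs = np.linspace(-5, 5, 2001)
assert np.isclose(f(0.5), 3.0)        # a point where 0 is a subgradient
assert np.all(f(xs) >= 3.0 - 1e-12)   # and indeed f >= 3 everywhere
```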


Slide 24

Derivation of first-order optimality

Example of the power of subgradients: we can use what we have learned so far to derive the first-order optimality condition: for f convex and differentiable and C a convex set,

min_x f(x) subject to x ∈ C

is solved at x if and only if

∇f(x)^T(y − x) ≥ 0 for all y ∈ C.

For a direct proof see, e.g., http://www.princeton.edu/~amirali/Public/Teaching/ORF523/S16/ORF523_S16_Lec7_gh.pdf; a proof using subgradients follows on the next slide.

Intuitively, the condition says that f increases as we move away from x within C. Note that for C = R^n (the unconstrained case) it reduces to ∇f(x) = 0.
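A numeric illustration of this condition, under assumed example data: minimizing a quadratic over the box C = [0, 1]², where the minimizer is the projection (clipping) of the unconstrained optimum.

```python
import numpy as np

# Minimize f(x) = 0.5 * ||x - c||_2^2 over the box C = [0, 1]^2.
# The minimizer is the Euclidean projection of c onto the box (clipping),
# and it must satisfy grad f(x*)^T (y - x*) >= 0 for every y in C.
c = np.array([2.0, -0.5])
x_star = np.clip(c, 0.0, 1.0)            # -> [1.0, 0.0]
grad = x_star - c                        # grad f(x*)

rng = np.random.default_rng(2)
ys = rng.uniform(0.0, 1.0, size=(1000, 2))    # random points of C
assert np.all((ys - x_star) @ grad >= -1e-12)
```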

Slide 25

Derivation of first-order optimality

Slide 26

Example: lasso optimality conditions

Given y ∈ R^n and X ∈ R^{n×p}, the lasso problem can be parametrized as

min_β (1/2)‖y − Xβ‖₂² + λ‖β‖₁, where λ ≥ 0.

Subgradient optimality:

0 ∈ ∂( (1/2)‖y − Xβ‖₂² + λ‖β‖₁ )
⟺ 0 ∈ −X^T(y − Xβ) + λ ∂‖β‖₁
⟺ X^T(y − Xβ) = λv for some v ∈ ∂‖β‖₁, i.e.,

v_i = 1 if β_i > 0, v_i = −1 if β_i < 0, and v_i ∈ [−1, 1] if β_i = 0.

Slide 28

Example: lasso optimality conditions

Write X_1, ..., X_p for the columns of X. Then our condition reads:

X_i^T(y − Xβ) = λ · sign(β_i) if β_i ≠ 0,
|X_i^T(y − Xβ)| ≤ λ if β_i = 0.

These conditions are also helpful in understanding the lasso estimator; e.g., if |X_i^T(y − Xβ)| < λ, then β_i = 0 (used by screening rules, later?).
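These conditions can be verified on a numerically computed lasso solution. The sketch below uses proximal gradient descent (ISTA) on synthetic data; ISTA itself is not derived in these slides, so treat it as an assumed solver.

```python
import numpy as np

def soft_threshold(z, t):
    """Componentwise soft-thresholding, the prox of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Solve the lasso with proximal gradient descent (ISTA), then verify the
# subgradient optimality condition: X_i^T (y - X beta) = lam * sign(beta_i)
# where beta_i != 0, and |X_i^T (y - X beta)| <= lam where beta_i = 0.
rng = np.random.default_rng(3)
n, p, lam = 50, 10, 5.0
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta = np.zeros(p)
t = 1.0 / np.linalg.norm(X, 2) ** 2      # step size 1/L with L = ||X||_2^2
for _ in range(5000):
    beta = soft_threshold(beta + t * X.T @ (y - X @ beta), t * lam)

v = X.T @ (y - X @ beta) / lam
nz = np.abs(beta) > 1e-8
assert np.all(np.abs(v) <= 1.0 + 1e-3)                   # |v_i| <= 1 everywhere
assert np.allclose(v[nz], np.sign(beta[nz]), atol=1e-3)  # v_i = sign(beta_i) on support
```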


Slide 30

Example: soft-thresholding

Simplified lasso problem with X = I:

min_β (1/2)‖y − β‖₂² + λ‖β‖₁.

This we can solve directly using subgradient optimality. The solution is β = S_λ(y), where S_λ is the soft-thresholding operator:

[S_λ(y)]_i = y_i − λ if y_i > λ, 0 if −λ ≤ y_i ≤ λ, and y_i + λ if y_i < −λ.

The subgradient optimality conditions are:

y_i − β_i = λ · sign(β_i) if β_i ≠ 0,
|y_i − β_i| ≤ λ if β_i = 0.

Slide 31

Example: soft-thresholding

Now plug in β = S_λ(y) and check that these conditions are satisfied:

• When y_i > λ: β_i = y_i − λ > 0, so y_i − β_i = λ = λ · 1.
• When y_i < −λ: the argument is similar.
• When |y_i| ≤ λ: β_i = 0, and |y_i − β_i| = |y_i| ≤ λ.
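A short sketch of the operator and a check of the two optimality cases (the test vector y is an arbitrary example):

```python
import numpy as np

def soft_threshold(y, lam):
    """Soft-thresholding S_lam(y), applied componentwise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

y = np.array([3.0, -2.5, 0.4, -0.9])
beta = soft_threshold(y, 1.0)            # -> [2.0, -1.5, 0.0, 0.0]

# Check the two subgradient optimality cases with lam = 1:
nz = beta != 0
assert np.allclose((y - beta)[nz], np.sign(beta[nz]))   # y_i - b_i = lam * sign(b_i)
assert np.all(np.abs((y - beta)[~nz]) <= 1.0)           # |y_i - b_i| <= lam
```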

Slide 32

Example: distance to a convex set

Recall the distance function to a closed, convex set C:

dist(x, C) = min_{y ∈ C} ‖y − x‖₂.

This is a convex function. What are its subgradients?

Write dist(x, C) = ‖x − P_C(x)‖₂, where P_C(x) is the projection of x onto C. It turns out that when dist(x, C) > 0,

∂ dist(x, C) = { (x − P_C(x)) / ‖x − P_C(x)‖₂ }.

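A numeric check of this claim for C the unit Euclidean ball, where the projection is radial shrinkage (an assumed example, not from the slides):

```python
import numpy as np

# C = unit Euclidean ball. The projection P_C shrinks points radially, and
# for x outside C the claimed subgradient of dist(., C) at x is
# (x - P_C(x)) / ||x - P_C(x)||_2.
def proj_ball(z):
    nrm = np.linalg.norm(z)
    return z if nrm <= 1.0 else z / nrm

def dist_ball(z):
    return np.linalg.norm(z - proj_ball(z))

x = np.array([2.0, 1.0])                        # a point outside the ball
g = (x - proj_ball(x)) / dist_ball(x)

# Verify the subgradient inequality dist(y) >= dist(x) + g^T (y - x).
rng = np.random.default_rng(4)
for _ in range(1000):
    y = 2.0 * rng.normal(size=2)
    assert dist_ball(y) >= dist_ball(x) + g @ (y - x) - 1e-9
```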

Slide 34

Example: distance to a convex set

We will only show one direction, i.e., that

(x − P_C(x)) / ‖x − P_C(x)‖₂ ∈ ∂ dist(x, C).

Slide 35

Example: distance to a convex set

Now for y ∉ H, we have

(x − u)^T(y − u) = ‖x − u‖₂ ‖y − u‖₂ cos θ,

where θ is the angle between x − u and y − u. Thus

Slide 36

References and further reading

• S. Boyd, Lecture notes for EE 264B, Stanford University, Spring 2010-2011
• R. T. Rockafellar (1970), Convex Analysis, Chapters 23-25
• L. Vandenberghe, Lecture notes for EE 236C, UCLA, Spring 2011-2012

Posted: 16/05/2020, 01:15
