Proximal Gradient Descent (and Acceleration)
Hoàng Nam Dũng
Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội
Last time: subgradient method
Consider the problem

    \min_x f(x)

with f convex and \mathrm{dom}(f) = \mathbb{R}^n. The subgradient method iterates

    x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots

where g^{(k-1)} \in \partial f(x^{(k-1)}). We use pre-set rules for the step sizes (e.g., a diminishing step size rule).

If f is Lipschitz, then the subgradient method has convergence rate O(1/\varepsilon^2).

Upside: very generic. Downside: can be slow (addressed today).
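As an illustration (not part of the original slides), a minimal NumPy sketch of this iteration; the oracle names f and subgrad and the step rule t_k = 1/k are my own choices here:

    import numpy as np

    def subgradient_method(f, subgrad, x0, n_iter=1000):
        # Subgradient method with pre-set diminishing step sizes t_k = 1/k.
        x, x_best = x0.copy(), x0.copy()
        for k in range(1, n_iter + 1):
            x = x - (1.0 / k) * subgrad(x)   # g^{(k-1)} in the subdifferential of f at x^{(k-1)}
            if f(x) < f(x_best):             # not a descent method, so track
                x_best = x.copy()            # the best iterate seen so far
        return x_best

    # Example: f(x) = ||x||_1, for which sign(x) is a valid subgradient
    x_star = subgradient_method(lambda x: np.abs(x).sum(), np.sign,
                                np.array([3.0, -2.0]))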
Today
- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases
- Acceleration
Decomposable functions
Suppose

    f(x) = g(x) + h(x)

where
- g is convex, differentiable, \mathrm{dom}(g) = \mathbb{R}^n
- h is convex, not necessarily differentiable.

If f were differentiable, then the gradient descent update would be

    x^+ = x - t \cdot \nabla f(x).

Recall the motivation: minimize the quadratic approximation to f around x, replacing \nabla^2 f(x) by \frac{1}{t} I:

    x^+ = \mathrm{argmin}_z \; \tilde{f}_t(z), \quad \text{where } \tilde{f}_t(z) = f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2.
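Completing the square shows why minimizing \tilde{f}_t(z) recovers the gradient step (a standard calculation, written out here for clarity):

    f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 = \frac{1}{2t} \big\|z - \big(x - t \nabla f(x)\big)\big\|_2^2 + f(x) - \frac{t}{2} \|\nabla f(x)\|_2^2,

so the minimizer over z is exactly x^+ = x - t \nabla f(x).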
Decomposable functions
In our case f is not differentiable, but f = g + h with g differentiable. Why don't we make a quadratic approximation to g and leave h alone? I.e., update

    x^+ = \mathrm{argmin}_z \; \tilde{g}_t(z) + h(z)
        = \mathrm{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + h(z)
        = \mathrm{argmin}_z \; \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 + h(z).

The first term \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 keeps z close to the gradient update for g, while the second term h(z) keeps h small.
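The last argmin defines the proximal gradient iteration. A minimal NumPy sketch, assuming the caller supplies grad_g and a prox oracle prox_th(v, t) for the function t·h (names are illustrative, not from the slides):

    import numpy as np

    def proximal_gradient(grad_g, prox_th, x0, t, n_iter=500):
        # Proximal gradient descent: gradient step on g, then prox step on h.
        x = x0.copy()
        for _ in range(n_iter):
            # x+ = argmin_z (1/2t)||z - (x - t grad g(x))||_2^2 + h(z)
            x = prox_th(x - t * grad_g(x), t)
        return x

    # Toy example: g(x) = 0.5*||x - b||_2^2, h(x) = ||x||_1, for which the
    # prox of t*h is soft-thresholding at level t (see the next slide).
    b = np.array([3.0, 0.2, -1.5])
    prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x_star = proximal_gradient(lambda x: x - b, prox_l1, np.zeros(3), t=0.5)
    # x_star is approximately [2.0, 0.0, -0.5], the soft-threshold of b at level 1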
Proximal mapping
The proximal mapping (or prox-operator) of a convex function h is defined as

    \mathrm{prox}_h(x) = \mathrm{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z).

Examples:
- h(x) = 0: \mathrm{prox}_h(x) = x.
- h(x) is the indicator function of a closed convex set C: \mathrm{prox}_h is the projection onto C,

      \mathrm{prox}_h(x) = \mathrm{argmin}_{z \in C} \; \frac{1}{2} \|x - z\|_2^2 = P_C(x).

- h(x) = \|x\|_1: \mathrm{prox}_h is the 'soft-threshold' (shrinkage) operator,

      \mathrm{prox}_h(x)_i = \begin{cases} x_i - 1 & \text{if } x_i \geq 1 \\ 0 & \text{if } |x_i| \leq 1 \\ x_i + 1 & \text{if } x_i \leq -1. \end{cases}
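The \ell_1 case is easy to implement and vectorize; a minimal NumPy sketch with a general threshold lam (the slide's case is lam = 1):

    import numpy as np

    def soft_threshold(x, lam=1.0):
        # prox of lam*||.||_1: shrink each entry toward zero by lam,
        # zeroing entries with |x_i| <= lam.
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    print(soft_threshold(np.array([2.0, 0.5, -3.0])))   # [ 1.  0. -2.]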
Proximal mapping
Theorem
If h is convex and closed (i.e., has a closed epigraph), then

    \mathrm{prox}_h(x) = \mathrm{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)

exists and is unique for all x.

Proof
See proxop.pdf. Uniqueness follows since the objective function is strictly convex.

Optimality condition:

    z = \mathrm{prox}_h(x) \iff x - z \in \partial h(z)
                           \iff h(u) \geq h(z) + (x - z)^T (u - z) \quad \forall u.
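A quick numerical sanity check of this optimality condition for h = \|\cdot\|_1, using a soft-threshold implementation as above (illustrative, not from the slides): at z = \mathrm{prox}_h(x), each coordinate of x - z must lie in the subdifferential of |\cdot| at z_i.

    import numpy as np

    def soft_threshold(x, lam=1.0):
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    x = np.array([2.0, 0.5, -3.0])
    z = soft_threshold(x)            # z = prox_h(x) for h = ||.||_1
    r = x - z                        # should lie in the subdifferential of h at z
    # coordinate-wise: {sign(z_i)} if z_i != 0, and the interval [-1, 1] if z_i = 0
    print(np.all(np.where(z != 0, r == np.sign(z), np.abs(r) <= 1)))   # True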