
Lecture notes on Advanced Optimization - Chapter 8: Proximal gradient descent (and acceleration)




Page 1

Proximal Gradient Descent (and Acceleration)

Hoàng Nam Dũng

Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội

Page 2

Last time: subgradient method

Consider the problem

$$\min_x f(x)$$

with $f$ convex and $\mathrm{dom}(f) = \mathbb{R}^n$. The subgradient method iterates

$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \dots, \quad \text{where } g^{(k-1)} \in \partial f(x^{(k-1)})$$

We use pre-set rules for the step sizes (e.g., the diminishing step size rule).

If $f$ is Lipschitz, then the subgradient method has convergence rate $O(1/\varepsilon^2)$.

Upside: very generic. Downside: can be slow (addressed today).
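The update rule above with a diminishing step size can be sketched as follows. This is a minimal illustration only: the test problem $f(x) = \|x - c\|_1$, the function names, and the choice $t_k = 1/k$ are my own, not from the lecture.

```python
import numpy as np

def subgradient_method(c, x0, num_iters=5000):
    """Minimize f(x) = ||x - c||_1 by the subgradient method."""
    x = x0.astype(float)
    best = x.copy()
    for k in range(1, num_iters + 1):
        g = np.sign(x - c)            # g in the subdifferential of ||x - c||_1 at x
        x = x - (1.0 / k) * g         # diminishing step sizes t_k = 1/k
        if np.abs(x - c).sum() < np.abs(best - c).sum():
            best = x.copy()           # the subgradient method is not a descent
    return best                       # method, so keep the best iterate seen

c = np.array([1.0, -2.0, 3.0])
x_best = subgradient_method(c, np.zeros(3))
```

Note that the best-iterate bookkeeping matters: individual iterates may increase $f$, which is exactly why convergence guarantees are stated for the best (or averaged) iterate.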

Page 3

Today

- Proximal gradient descent
- Convergence analysis
- ISTA, matrix completion
- Special cases
- Acceleration

Page 4

Decomposable functions

Suppose

$$f(x) = g(x) + h(x)$$

where

- $g$ is convex, differentiable, $\mathrm{dom}(g) = \mathbb{R}^n$
- $h$ is convex, not necessarily differentiable

If $f$ were differentiable, then the gradient descent update would be

$$x^+ = x - t \cdot \nabla f(x)$$

Recall the motivation: minimize the quadratic approximation to $f$ around $x$, replacing $\nabla^2 f(x)$ by $\frac{1}{t} I$:

$$x^+ = \operatorname{argmin}_z \; \tilde f_t(z) = f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2$$
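As a quick sanity check (my addition, not in the slides), setting the gradient of the quadratic model $\tilde f_t$ to zero recovers the gradient descent step:

```latex
\nabla_z \tilde f_t(z) = \nabla f(x) + \frac{1}{t}(z - x) = 0
\quad \Longrightarrow \quad
z = x - t \, \nabla f(x)
```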

Page 5

Decomposable functions

In our case $f$ is not differentiable, but $f = g + h$ with $g$ differentiable. Why don't we make a quadratic approximation to $g$ and leave $h$ alone? I.e., update

$$\begin{aligned}
x^+ &= \operatorname{argmin}_z \; \tilde g_t(z) + h(z) \\
    &= \operatorname{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2 + h(z) \\
    &= \operatorname{argmin}_z \; \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 + h(z)
\end{aligned}$$

The term $\frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2$ keeps $z$ close to the gradient update for $g$, while the term $h(z)$ also keeps $h$ small.
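The last equality in the derivation follows by completing the square; this intermediate step is my addition. Terms independent of $z$ do not affect the argmin and are dropped:

```latex
\nabla g(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2
= \frac{1}{2t} \|z - (x - t \nabla g(x))\|_2^2 - \frac{t}{2} \|\nabla g(x)\|_2^2
```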

Page 6

Proximal mapping

The proximal mapping of $h$ is defined as

$$\operatorname{prox}_h(x) = \operatorname{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)$$

Examples:

- $h(x) = 0$: $\operatorname{prox}_h(x) = x$
- $h(x)$ is the indicator function of a closed convex set $C$: $\operatorname{prox}_h$ is the projection onto $C$,
  $$\operatorname{prox}_h(x) = \operatorname{argmin}_{z \in C} \; \frac{1}{2} \|x - z\|_2^2 = P_C(x)$$
- $h(x) = \|x\|_1$: $\operatorname{prox}_h$ is the 'soft-threshold' (shrinkage) operator,
  $$[\operatorname{prox}_h(x)]_i = \begin{cases} x_i - 1 & x_i \ge 1 \\ 0 & |x_i| \le 1 \\ x_i + 1 & x_i \le -1 \end{cases}$$
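The soft-threshold case can be sketched in a few lines and cross-checked against a brute-force minimization of the prox objective. This is an illustrative sketch, not code from the lecture; the function name and grid check are my own.

```python
import numpy as np

def soft_threshold(x, tau=1.0):
    """Componentwise prox of tau*||.||_1: shrink each entry toward 0 by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([2.5, 0.4, -1.7])
p = soft_threshold(x)               # -> [1.5, 0.0, -0.7]

# Brute-force check for one coordinate: minimize (1/2)(x0 - z)^2 + |z| on a grid.
z = np.linspace(-4, 4, 80001)
x0 = 2.5
obj = 0.5 * (x0 - z) ** 2 + np.abs(z)
assert abs(z[np.argmin(obj)] - soft_threshold(np.array([x0]))[0]) < 1e-3
```

Because the prox objective separates across coordinates for $h = \|\cdot\|_1$, the one-dimensional check suffices.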


Page 10

Proximal mapping

Theorem. If $h$ is convex and closed (has a closed epigraph), then

$$\operatorname{prox}_h(x) = \operatorname{argmin}_z \; \frac{1}{2} \|x - z\|_2^2 + h(z)$$

exists and is unique for all $x$.

Proof. Existence: see proxop.pdf. Uniqueness follows since the objective function is strictly convex.

Optimality condition:

$$z = \operatorname{prox}_h(x) \;\Leftrightarrow\; x - z \in \partial h(z) \;\Leftrightarrow\; h(u) \ge h(z) + (x - z)^T (u - z) \quad \forall u$$
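The optimality condition can be verified numerically for $h = \|\cdot\|_1$, whose subdifferential is $\partial h(z)_i = \{\operatorname{sign}(z_i)\}$ if $z_i \ne 0$ and $[-1, 1]$ if $z_i = 0$. This check is my own illustration, not part of the slides.

```python
import numpy as np

def soft_threshold(x, tau=1.0):
    """prox of tau*||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([2.5, 0.4, -1.7, 0.0])
z = soft_threshold(x)
r = x - z                           # candidate subgradient of ||.||_1 at z

nonzero = z != 0
# Where z_i != 0, the condition x - z in d||.||_1(z) forces r_i = sign(z_i);
# where z_i = 0, it only requires |r_i| <= 1.
assert np.allclose(r[nonzero], np.sign(z[nonzero]))
assert np.all(np.abs(r[~nonzero]) <= 1.0)
```

This is exactly why soft-thresholding is the prox of the $\ell_1$ norm: the residual $x - z$ lands in the subdifferential at the thresholded point.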

Posted: 09/03/2021, 04:46
