Page 1: Gradient Descent

Hoàng Nam Dũng
Khoa Toán - Cơ - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội
Page 2: Gradient descent

Consider unconstrained, smooth convex optimization

    min_x f(x)

with convex and differentiable function f : R^n → R. Denote the optimal value by f* = min_x f(x) and a solution by x*.

Gradient descent: choose an initial point x^(0) and repeat

    x^(k) = x^(k-1) - t_k ∇f(x^(k-1)),   k = 1, 2, 3, ...

Stop at some point.
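The update above can be sketched in a few lines of Python (a minimal illustration with a fixed step size; the toy objective, starting point, and step size below are assumptions for demonstration, not from the slides):

```python
import numpy as np

def gradient_descent(grad_f, x0, t, num_steps):
    # Iterate x^(k) = x^(k-1) - t * grad_f(x^(k-1)) with fixed step size t.
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - t * grad_f(x)
    return x

# Toy example: f(x) = ||x||^2 / 2, so grad f(x) = x and the minimizer is 0.
x_final = gradient_descent(lambda x: x, x0=[4.0, -2.0], t=0.1, num_steps=200)
```

Each iterate shrinks toward the origin by a factor (1 - t) per step, so after 200 steps x_final is numerically zero.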
Pages 4-5: [figures: gradient descent iterates on the contours of f]
Page 6: Gradient descent interpretation

At each iteration, consider the expansion

    f(y) ≈ f(x) + ∇f(x)^T (y - x) + (1/(2t)) ||y - x||_2^2

i.e., the quadratic approximation with the Hessian ∇²f(x) replaced by (1/t) I.

▶ f(x) + ∇f(x)^T (y - x): linear approximation to f
▶ (1/(2t)) ||y - x||_2^2: proximity term to x, with weight 1/(2t)

Choose the next point y = x^+ to minimize the quadratic approximation:

    x^+ = x - t ∇f(x)
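The claim that x^+ minimizes the quadratic model can be checked numerically (a small sketch; the point x and step size t below are arbitrary choices, and the function is the slides' later example f(x) = (10 x_1^2 + x_2^2)/2):

```python
import numpy as np

# Quadratic test function and its gradient.
f = lambda x: 0.5 * (10 * x[0] ** 2 + x[1] ** 2)
grad_f = lambda x: np.array([10 * x[0], x[1]])

def quad_approx(y, x, t):
    # f(x) + grad_f(x)^T (y - x) + ||y - x||_2^2 / (2 t)
    d = y - x
    return f(x) + grad_f(x) @ d + d @ d / (2 * t)

x = np.array([1.0, 1.0])
t = 0.05
x_plus = x - t * grad_f(x)  # claimed minimizer of the quadratic model

# The model value at x_plus should beat any perturbed point.
rng = np.random.default_rng(0)
others = [quad_approx(x_plus + rng.normal(size=2), x, t) for _ in range(500)]
beats_all = all(q >= quad_approx(x_plus, x, t) for q in others)
```

Since the model is strictly convex in y, setting its gradient ∇f(x) + (1/t)(y - x) to zero gives exactly y = x - t ∇f(x), so every sampled point has a model value at least as large.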
Page 7: Gradient descent interpretation [figure omitted]
Page 8: Outline

▶ How to choose step sizes
▶ Convergence analysis
▶ Nonconvex functions
▶ Gradient boosting
Page 9: Fixed step size

Simply take t_k = t for all k = 1, 2, 3, ...; this can diverge if t is too big.
Consider f(x) = (10 x_1^2 + x_2^2)/2; gradient descent after 8 steps:

[figure: diverging iterates on the contours of f]
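The divergence on this example is easy to reproduce (a sketch; the starting point and the step size t = 0.25 are assumptions — for this f the largest Hessian eigenvalue is 10, so a fixed step only stays stable for t < 2/10):

```python
import numpy as np

# f(x) = (10 x1^2 + x2^2)/2 from the slide; its gradient is (10 x1, x2).
grad_f = lambda x: np.array([10 * x[0], x[1]])

x = np.array([1.0, 1.0])
t = 0.25            # too big: the x1 coordinate is multiplied by (1 - 10 t) = -1.5 each step
for _ in range(8):  # 8 steps, as on the slide
    x = x - t * grad_f(x)
norm_after_8 = np.linalg.norm(x)
```

After 8 steps the x1 coordinate has grown by a factor of 1.5^8 ≈ 25.6, so the iterates move away from the optimum instead of toward it.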
Page 10: Fixed step size

Can be slow if t is too small. Same example, gradient descent after 100 steps:

[figure: slowly converging iterates on the contours of f]
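The slow-convergence case can be contrasted with a well-chosen step on the same function (a sketch; the starting point and the two step sizes are assumptions for illustration):

```python
import numpy as np

# Gradient of f(x) = (10 x1^2 + x2^2)/2, as on the slides.
grad_f = lambda x: np.array([10 * x[0], x[1]])

def run(t, steps, x0=(1.0, 1.0)):
    x = np.array(x0)
    for _ in range(steps):
        x = x - t * grad_f(x)
    return x

x_small = run(t=0.01, steps=100)  # too small: x2 shrinks only by 0.99^100 ~ 0.37
x_good = run(t=0.1, steps=100)    # larger but still stable: essentially converged
```

With t = 0.01 the iterate is still a noticeable distance from the optimum at the origin after 100 steps, while t = 0.1 gets within numerical precision; the gap illustrates why step-size selection matters.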