
Page 1

Learning Kernels - Tutorial

Part I: Introduction to Kernel Methods.

Page 2

Outline

Part I: Introduction to kernel methods

Part II: Learning kernel algorithms

Part III: Theoretical guarantees

Part IV: Software tools

Page 3

Binary Classification Problem

Training data: sample drawn i.i.d. from set $X$ according to some distribution $D$,
$$(x_1, y_1), \ldots, (x_m, y_m) \in X \times \{-1, +1\}.$$

Problem: find hypothesis $h \colon X \to \{-1, +1\}$ in $H$ (classifier) with small generalization error
$$R_D(h) = \Pr_{(x, y) \sim D}[h(x) \neq y].$$

Linear classification:

• Hypotheses based on hyperplanes

• Linear separation in high-dimensional space

Page 4

Linear Separation

Classifiers: $H = \{x \mapsto \operatorname{sgn}(w \cdot x + b) \colon w \in \mathbb{R}^N, b \in \mathbb{R}\}$.

[Figure: two different separating hyperplanes $w \cdot x + b = 0$ for the same sample.]
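As a minimal sketch of this hypothesis class (plain NumPy; the weight vector, offset, and points below are made-up values, not from the slides):

```python
import numpy as np

def linear_classifier(w, b):
    """Return the hypothesis x -> sgn(w . x + b)."""
    return lambda X: np.sign(X @ w + b)

# Illustrative hyperplane and points.
w, b = np.array([1.0, -2.0]), 0.5
h = linear_classifier(w, b)
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
print(h(X))  # predicted labels; np.sign returns 0 for points exactly on the hyperplane
```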

Page 5

Optimal Hyperplane: Max Margin

(Vapnik and Chervonenkis, 1964)

Canonical hyperplane: $w$ and $b$ chosen such that $|w \cdot x + b| = 1$ for the closest points.

Margin:
$$\rho = \min_{x \in S} \frac{|w \cdot x + b|}{\|w\|} = \frac{1}{\|w\|}.$$

[Figure: maximum-margin hyperplane $w \cdot x + b = 0$ with marginal hyperplanes $w \cdot x + b = +1$ and $w \cdot x + b = -1$; the closest points satisfy $|w \cdot x + b| = 1$.]
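A one-line check of the margin formula, as a NumPy sketch with an illustrative sample and hyperplane (not values from the slides):

```python
import numpy as np

def margin(w, b, X):
    """rho = min over x in S of |w . x + b| / ||w||."""
    return np.min(np.abs(X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 1.0], [0.0, 3.0], [-1.0, -1.0]])  # illustrative sample S
print(margin(np.array([1.0, 1.0]), -1.0, X))
```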

Page 6

Optimization Problem

Constrained optimization:
$$\min_{w, b} \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \geq 1, \; i \in [1, m].$$

Properties:

• Convex optimization (strictly convex)

• Unique solution for linearly separable sample
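Because the primal is a convex quadratic program, it can be handed to a generic solver. A sketch using CVXPY (an assumed dependency, not mentioned in the slides) on a tiny separable toy sample:

```python
import numpy as np
import cvxpy as cp

# Tiny linearly separable toy sample (illustrative values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = cp.Variable(2), cp.Variable()
# min (1/2) ||w||^2  subject to  y_i (w . x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```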

Page 7

Support Vector Machines

(Cortes and Vapnik, 1995)

Problem: data often not linearly separable in practice. For any hyperplane, there exists $x_i$ such that
$$y_i (w \cdot x_i + b) \not\geq 1.$$

Idea: relax the constraints using slack variables $\xi_i \geq 0$:
$$y_i (w \cdot x_i + b) \geq 1 - \xi_i.$$

Page 8

Soft-Margin Hyperplanes

Support vectors: points along the margin or outliers.

Soft margin: $\rho = 1 / \|w\|$.

[Figure: hyperplane $w \cdot x + b = 0$ with marginal hyperplanes $w \cdot x + b = \pm 1$; slack variables $\xi_i$, $\xi_j$ measure how far points fall on the wrong side of their marginal hyperplane.]

Page 9

Optimization Problem

(Cortes and Vapnik, 1995)

Constrained optimization:
$$\min_{w, b, \xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i \;\wedge\; \xi_i \geq 0, \; i \in [1, m].$$

Properties:

• $C \geq 0$: trade-off parameter

• Convex optimization (strictly convex)

• Unique solution
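The same CVXPY sketch extends to the soft-margin primal by adding the slack variables and the trade-off term; C and the data are again illustrative:

```python
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [2.5, 2.0]])  # last point is an outlier
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0  # trade-off parameter

w, b = cp.Variable(2), cp.Variable()
xi = cp.Variable(4, nonneg=True)  # slack variables
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()
print(w.value, b.value, xi.value)  # nonzero xi_i mark margin violations
```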

Page 10

Dual Optimization Problem

Constrained optimization:
$$\max_{\alpha} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{subject to} \quad 0 \leq \alpha_i \leq C \;\wedge\; \sum_{i=1}^{m} \alpha_i y_i = 0, \; i \in [1, m].$$

Solution:
$$h(x) = \operatorname{sgn}\Big( \sum_{i=1}^{m} \alpha_i y_i (x_i \cdot x) + b \Big),$$
with $b = y_i - \sum_{j=1}^{m} \alpha_j y_j (x_j \cdot x_i)$ for any support vector $x_i$ with $0 < \alpha_i < C$.
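In practice the dual is solved by dedicated SVM software. As a hedged illustration (scikit-learn is an assumption, not part of the slides), `SVC` with a linear kernel exposes the quantities appearing in the solution: `dual_coef_` holds the products α_i y_i for the support vectors and `intercept_` holds b.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])  # toy data
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)      # the x_i with alpha_i > 0
print(clf.dual_coef_)            # alpha_i * y_i for each support vector
print(clf.intercept_)            # the offset b
print(clf.decision_function(X))  # sum_i alpha_i y_i (x_i . x) + b
```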

Page 11

Kernel Methods

Idea:

• Define $K \colon X \times X \to \mathbb{R}$, called a kernel, such that:
$$\Phi(x) \cdot \Phi(y) = K(x, y).$$

• $K$ is often interpreted as a similarity measure.

Benefits:

• Efficiency: $K$ is often more efficient to compute than $\Phi$ and the dot product.

• Flexibility: $K$ can be chosen arbitrarily so long as the existence of $\Phi$ is guaranteed (Mercer's condition).

Page 12

Example - Polynomial Kernels

Definition: for all $x, y \in \mathbb{R}^N$,
$$K(x, y) = (x \cdot y + c)^d, \quad c > 0.$$

Example: for $N = 2$ and $d = 2$,
$$K(x, y) = (x_1 y_1 + x_2 y_2 + c)^2 = \begin{pmatrix} x_1^2 \\ x_2^2 \\ \sqrt{2}\, x_1 x_2 \\ \sqrt{2c}\, x_1 \\ \sqrt{2c}\, x_2 \\ c \end{pmatrix} \cdot \begin{pmatrix} y_1^2 \\ y_2^2 \\ \sqrt{2}\, y_1 y_2 \\ \sqrt{2c}\, y_1 \\ \sqrt{2c}\, y_2 \\ c \end{pmatrix}.$$
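A quick numerical check of this identity, as a NumPy sketch (the vectors and the constant c are arbitrary choices):

```python
import numpy as np

def phi(x, c):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

def poly_kernel(x, y, c, d=2):
    return (np.dot(x, y) + c) ** d

x, y, c = np.array([1.0, 2.0]), np.array([3.0, -1.0]), 1.0
print(poly_kernel(x, y, c))          # (x . y + c)^2
print(np.dot(phi(x, c), phi(y, c)))  # Phi(x) . Phi(y) -- same value
```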

Page 13

XOR Problem

Use the second-degree polynomial kernel with $c = 1$; the four points map to:

• $(1, 1) \mapsto (1, 1, +\sqrt{2}, +\sqrt{2}, +\sqrt{2}, 1)$

• $(-1, -1) \mapsto (1, 1, +\sqrt{2}, -\sqrt{2}, -\sqrt{2}, 1)$

• $(-1, 1) \mapsto (1, 1, -\sqrt{2}, -\sqrt{2}, +\sqrt{2}, 1)$

• $(1, -1) \mapsto (1, 1, -\sqrt{2}, +\sqrt{2}, -\sqrt{2}, 1)$

[Figure: in input space $(x_1, x_2)$ the points are linearly non-separable; plotted in the feature-space coordinates $(\sqrt{2}\, x_1, \sqrt{2}\, x_1 x_2)$ they are linearly separable by $\sqrt{2}\, x_1 x_2$.]
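A small NumPy sketch (purely illustrative) that pushes the four XOR points through the degree-2 feature map and confirms that the single coordinate $\sqrt{2}\, x_1 x_2$ already separates the two classes:

```python
import numpy as np

def phi(x, c=1.0):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2,
                     np.sqrt(2 * c) * x1, np.sqrt(2 * c) * x2, c])

X = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float)
y = np.array([+1, +1, -1, -1])  # same-sign points vs. opposite-sign points

features = np.array([phi(x) for x in X])
# The third coordinate, sqrt(2) * x1 * x2, is positive for class +1 and negative for class -1,
# so thresholding it at zero linearly separates the mapped points.
print(np.sign(features[:, 2]) == y)  # all True
```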

Page 14

Other Standard PDS Kernels

Gaussian kernels:
$$K(x, y) = \exp\!\left( -\frac{\|x - y\|^2}{2 \sigma^2} \right), \quad \sigma \neq 0.$$

Sigmoid kernels:
$$K(x, y) = \tanh\big( a (x \cdot y) + b \big), \quad a, b \geq 0.$$
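A short NumPy sketch computing a Gaussian kernel matrix for a handful of points (σ and the points are arbitrary) and checking that it is positive semi-definite:

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_kernel_matrix(X)
print(K)                                        # symmetric, ones on the diagonal
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # PDS: eigenvalues are non-negative
```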

Page 15

Consequence: SVMs with PDS Kernels

(Boser, Guyon, and Vapnik, 1992)

Constrained optimization:
$$\max_{\alpha} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \leq \alpha_i \leq C \;\wedge\; \sum_{i=1}^{m} \alpha_i y_i = 0, \; i \in [1, m].$$

Solution:
$$h(x) = \operatorname{sgn}\Big( \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b \Big),$$
with $b = y_i - \sum_{j=1}^{m} \alpha_j y_j K(x_j, x_i)$ for any support vector $x_i$ with $0 < \alpha_i < C$.

Page 16

SVMs with PDS Kernels

Constrained optimization:
$$\max_{\alpha} \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad 0 \leq \alpha_i \leq C \;\wedge\; \sum_{i=1}^{m} \alpha_i y_i = 0, \; i \in [1, m].$$

Solution:
$$h = \operatorname{sgn}\Big( \sum_{i=1}^{m} \alpha_i y_i K(x_i, \cdot) + b \Big),$$
with $b = y_i - \sum_{j=1}^{m} \alpha_j y_j K(x_j, x_i)$ for any support vector $x_i$ with $0 < \alpha_i < C$.
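Putting the pieces together, a hedged scikit-learn sketch (not part of the slides) trains an SVM with a Gaussian kernel on the XOR points from Page 13; the decision function it reports is the kernel expansion $\sum_i \alpha_i y_i K(x_i, x) + b$.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float)
y = np.array([+1, +1, -1, -1])  # XOR labeling, not linearly separable in input space

clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)
print(clf.predict(X))            # recovers the XOR labels
print(clf.dual_coef_)            # alpha_i y_i for the support vectors
print(clf.decision_function(X))  # sum_i alpha_i y_i K(x_i, x) + b
```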

Page 17

Regression Problem

Training data: sample drawn i.i.d. from set $X$ according to some distribution $D$,
$$(x_1, y_1), \ldots, (x_m, y_m) \in X \times Y,$$
with $Y \subseteq \mathbb{R}$ a measurable subset.

Loss function: $L \colon Y \times Y \to \mathbb{R}_+$, a measure of closeness, typically $L(y, y') = (y' - y)^2$ or $L(y, y') = |y' - y|^p$ for some $p \geq 1$.

Problem: find hypothesis $h \colon X \to \mathbb{R}$ in $H$ with small generalization error with respect to the target $f$,
$$R_D(h) = \mathbb{E}_{x \sim D}\big[ L(h(x), f(x)) \big].$$

Page 18

Kernel Ridge Regression

(Saunders et al., 1998)

Optimization problem:
$$\min_{w} \lambda \|w\|^2 + \sum_{i=1}^{m} \big( w \cdot \Phi(x_i) - y_i \big)^2,$$
or, in dual form,
$$\max_{\alpha \in \mathbb{R}^m} \; -\lambda\, \alpha^\top \alpha - \alpha^\top K \alpha + 2\, \alpha^\top y.$$

Solution:
$$h(x) = \sum_{i=1}^{m} \alpha_i K(x_i, x), \quad \text{with } \alpha = (K + \lambda I)^{-1} y.$$
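A minimal NumPy sketch of kernel ridge regression in this closed form, using a Gaussian kernel; λ, σ, and the toy data are illustrative choices, not from the slides.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

# Toy 1-D regression data.
X = np.linspace(0, 3, 20).reshape(-1, 1)
y = np.sin(2 * X).ravel()

lam = 0.1
K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # alpha = (K + lam I)^{-1} y

X_test = np.array([[0.5], [1.5], [2.5]])
y_pred = gaussian_kernel(X_test, X) @ alpha           # h(x) = sum_i alpha_i K(x_i, x)
print(y_pred)
print(np.sin(2 * X_test).ravel())  # targets for comparison
```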

Page 19

How should the user choose the kernel?

• Problem similar to that of selecting features for other learning algorithms.

• Poor choice: learning made very difficult.

• Good choice: even poor learners could succeed.

The requirement from the user is thus critical:

• Can this requirement be lessened?

• Is a more automatic selection of features possible?
