Lesson 8
Slide 1: Unsupervised Learning: K-Means & Gaussian Mixture Models
Slide 2: Unsupervised Learning
• Supervised learning uses labeled data pairs (x, y) to learn a function f : X → Y
 – But what if we don't have labels?
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
 – Labels may be expensive to obtain, so we only get a few
• Unsupervised learning can still extract structure from unlabeled data, e.g., for knowledge discovery
Slide 3: K-Means Clustering
Slide 4: Clustering Data
Slide 5: K-Means Clustering
K-Means(k, X)
• Randomly choose k cluster center locations (centroids)
• Loop until convergence:
 – Assign each point to the cluster of the closest centroid
 – Re-estimate the cluster centroids based on the data assigned to each cluster
Slides 6-7: K-Means Clustering (the same algorithm, repeated alongside figures illustrating the assignment and re-estimation steps)
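A minimal NumPy sketch of the K-Means loop above (the function and variable names are my own, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Cluster the rows of X into k groups with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Randomly choose k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each point goes to the closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: re-estimate each centroid as the mean of its points
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels
```

Usage: `centroids, labels = kmeans(X, k=3)` on an (n, d) data matrix X.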
Slide 8: K-Means Animation
Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system:
Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.
Slide 9: K-Means Objective Function
• K-means finds a local optimum of the following objective function:
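For reference, the standard K-means objective (the sum of squared distances from each point to its cluster centroid) is

$$\min_{C_1,\dots,C_k} \; \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2,$$

where C_i is the set of points assigned to centroid µ_i. Because each of the two steps above can only decrease this sum, the algorithm converges, but only to a local optimum.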
Slide 10: Problems with K-Means
• Very sensitive to the choice of initial centroids; possible remedies:
 – Do many runs of K-Means, each with different initial centroids
 – Seed the centroids using a better method than randomly choosing them
  • e.g., farthest-first sampling (see the sketch after this list)
• Must manually choose k
 – Learn the optimal k for the clustering
  • Note that this requires a performance measure
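A sketch of farthest-first seeding under its usual definition (pick one centroid at random, then repeatedly take the point farthest from all centroids chosen so far); the function name is illustrative:

```python
import numpy as np

def farthest_first_init(X, k, seed=0):
    """Seed k centroids: start with a random point, then greedily add the
    point whose distance to its nearest chosen centroid is largest."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Distance from every point to its nearest chosen centroid
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)
```

The resulting array can be passed to a K-Means run in place of the random initialization.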
Slide 11: Problems with K-Means
• How do you tell it which clustering you want? (e.g., several quite different clusterings can all be valid for k = 2)
• Constrained clustering techniques (semi-supervised):
 – Same-cluster constraint (must-link)
 – Different-cluster constraint (cannot-link)
Slide 12: Gaussian Mixture Models
• Recall the Gaussian distribution:
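For reference, the multivariate Gaussian density being recalled is

$$\mathcal{N}(x;\, \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,\lvert\Sigma\rvert^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$$

for x in R^d, with mean µ and covariance matrix Σ.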
Slides 14-16: The GMM Assumption
• Each component generates data from a Gaussian with mean µi and covariance matrix σ²I
• Assume that each datapoint is generated according to the following recipe:
 1. Pick a component at random: choose component i with probability P(ωi)
 2. Draw the datapoint x ~ N(µi, σ²I)
[Figures: component means µ1, µ2, µ3, and a point x sampled from component 2]
Slide 17: The General GMM Assumption
• Each component generates data from a Gaussian with mean µi and covariance matrix Σi
• Assume that each datapoint is generated according to the following recipe:
 1. Pick a component at random: choose component i with probability P(ωi)
 2. Draw the datapoint x ~ N(µi, Σi)
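The two-step recipe translates directly into a sampler; a minimal NumPy sketch (names are my own):

```python
import numpy as np

def sample_gmm(n, weights, means, covs, seed=0):
    """Draw n points from a GMM following the two-step recipe:
    pick component i with probability P(wi), then draw x ~ N(mu_i, Sigma_i)."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(weights), size=n, p=weights)   # step 1
    return np.array([
        rng.multivariate_normal(means[i], covs[i])        # step 2
        for i in comps
    ])
```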
Slide 18: Fitting a Gaussian Mixture Model (Optional)
Slide 19: Expectation-Maximization for GMMs
Iterate until convergence. On the t'th iteration, let our estimates be λt = { µ1(t), µ2(t), …, µc(t) }.
E-step: compute the "expected" classes of all datapoints for each class:
P(ωi | xk, λt) = p(xk | ωi, µi(t)) P(ωi) / Σj p(xk | ωj, µj(t)) P(ωj)
(each likelihood term is computed by just evaluating a Gaussian at xk)
Slide 20: EM for General GMMs
Iterate. On the t'th iteration, let our estimates be λt = { µ1(t), …, µc(t), Σ1(t), …, Σc(t), p1(t), …, pc(t) }, where pi(t) is shorthand for the estimate of P(ωi) on the t'th iteration.
E-step: compute the class membership distributions P(ωi | xk, λt) as before, now evaluating N(xk; µi(t), Σi(t)).
M-step: estimate µ, Σ, and p given our data's class membership distributions.
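A compact EM sketch matching the two steps above, using NumPy and SciPy; the function name, initialization, and the small regularization constant are my own choices, and the usual numerical safeguards (log-space responsibilities, convergence checks) are omitted:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, c, iters=50, seed=0):
    """Fit a c-component GMM to the rows of X with vanilla EM."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(n, size=c, replace=False)]             # means
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * c)   # covariances
    p = np.full(c, 1.0 / c)                                  # mixing weights P(wi)
    for _ in range(iters):
        # E-step: responsibilities P(wi | xk, lambda_t)
        r = np.array([p[i] * multivariate_normal.pdf(X, mu[i], sigma[i])
                      for i in range(c)]).T                  # shape (n, c)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mu, Sigma, p from the responsibilities
        nk = r.sum(axis=0)
        mu = (r.T @ X) / nk[:, None]
        for i in range(c):
            diff = X - mu[i]
            sigma[i] = (r[:, i, None] * diff).T @ diff / nk[i] + 1e-6 * np.eye(d)
        p = nk / n
    return p, mu, sigma
```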
Slide 21: (End optional section)
Slides 23-29: [Figures: the fitted mixture after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th EM iterations]
Slide 30: Some Bio-Assay Data
Slide 31: Clustering of the Assay Data
Slide 32: Resulting Density Estimator
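The fitted model doubles as a density estimator, p(x) = Σi pi N(x; µi, Σi). A small helper given the output of the em_gmm sketch above (the function name is hypothetical):

```python
from scipy.stats import multivariate_normal

def gmm_density(x, p, mu, sigma):
    """Evaluate the mixture density p(x) = sum_i p_i * N(x; mu_i, Sigma_i)."""
    return sum(p[i] * multivariate_normal.pdf(x, mu[i], sigma[i])
               for i in range(len(p)))
```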