
Machine Learning Clustering


Page 2

What is clustering

• Clustering can be considered the most important unsupervised learning problem;
• Another definition of clustering could be "the process of organizing objects into groups whose members are similar".

Page 3

What is clustering

A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters.

Page 4

What is clustering

• In this case we identify the four clusters into which the data can be divided; the similarity criterion is distance:
• Two or more objects belong to the same cluster if they are "close" according to a given distance (this is called distance-based clustering).
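A minimal sketch of the distance-based rule (my own illustration, not from the slides), assuming Euclidean distance and invented 2-D points:

```python
import numpy as np

def euclidean(a, b):
    # Euclidean distance between two feature vectors.
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

# Invented 2-D points and two cluster representatives (illustrative values only).
points = np.array([[1.0, 1.5], [1.2, 0.8], [8.0, 8.5], [7.5, 9.0]])
representatives = np.array([[1.0, 1.0], [8.0, 9.0]])

# Distance-based clustering rule: each point joins the cluster whose
# representative is closest under the chosen distance.
labels = [int(np.argmin([euclidean(p, r) for r in representatives])) for p in points]
print(labels)  # [0, 0, 1, 1]
```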

Page 5

What is clustering

Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if the cluster defines a concept common to all those objects.
• In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures.

Page 7

• Marketing: finding groups of customers with similar behavior given a large database of customer data containing their properties and past buying records;

Page 8

• City-planning: identifying groups of houses according to their house type, value and geographical location;
• Earthquake studies: clustering observed earthquakes to identify dangerous zones;
• WWW: document classification; clustering weblog data to discover groups of similar access patterns.


Page 9

• dealing with a large number of dimensions and a large number of data items;
• the effectiveness of the method depends on the definition of "distance" (for distance-based clustering);
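As a small illustration of how the choice of "distance" changes the outcome (my own sketch, with invented numbers), the same point is assigned to different cluster centers under Euclidean and cosine distance:

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def cosine_distance(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

point = [1.0, 10.0]                            # invented data item
centers = {"A": [1.0, 1.0], "B": [3.0, 30.0]}  # invented cluster centers

for name, dist in [("euclidean", euclidean), ("cosine", cosine_distance)]:
    closest = min(centers, key=lambda c: dist(point, centers[c]))
    print(name, "->", closest)   # euclidean -> A, cosine -> B
```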


Page 10

Classification of clustering algorithms

Page 11

Classification of clustering algorithms

• Four of the most used clustering algorithms:

Page 12

K-Means Algorithm Properties

– There are always K clusters.
– There is always at least one item in each cluster.
– The clusters are non-hierarchical and they do not overlap.
– Every member of a cluster is closer to its own cluster than to any other cluster.


Page 13

• Assumes instances are real-valued vectors.
• Clusters are based on centroids, i.e. the mean of the points in a cluster c:

  μ(c) = (1/|c|) · Σ_{x ∈ c} x

• Reassignment of instances to clusters is based on distance to the current cluster centroids.
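As a quick illustration of the centroid formula above (my own example with invented points):

```python
import numpy as np

# Invented points currently assigned to one cluster c.
cluster_points = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])

# mu(c): the component-wise mean of the cluster's members.
centroid = cluster_points.mean(axis=0)
print(centroid)  # [2. 4.]
```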

Page 15

K-Means

Let d be the distance measure between instances.
Select k random instances {s1, s2, …, sk} as seeds.
Until clustering converges (or another stopping criterion is met):
    For each instance xi:
        Assign xi to the cluster cj such that d(xi, sj) is minimal.
    (Update the seeds to the centroid of each cluster)
    For each cluster cj:
        sj = μ(cj)
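A compact Python sketch of the pseudocode above (my own illustration, not code from the slides); it assumes Euclidean distance and, to keep the sketch simple, leaves a seed unchanged if its cluster ever becomes empty:

```python
import numpy as np

def kmeans(X, k, max_iters=100, random_state=0):
    """Plain K-means following the slide's pseudocode: seed, assign, update, repeat."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    seeds = X[rng.choice(len(X), size=k, replace=False)]  # k random instances as seeds
    for _ in range(max_iters):
        # Assign each instance x_i to the cluster c_j with the closest seed s_j.
        dists = np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update each seed to the centroid mu(c_j) of its cluster.
        new_seeds = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else seeds[j]
            for j in range(k)
        ])
        if np.allclose(new_seeds, seeds):   # converged: the seeds stopped moving
            break
        seeds = new_seeds
    return labels, seeds
```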


Page 17

K-Means Example (K = 2)

[Figure: one K-means pass on 2-D points: pick seeds, reassign clusters, compute centroids]

Page 18

[Figure, continued: the example points plotted on axes running from 0 to 10]
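To reproduce a small K = 2 run like the one pictured, scikit-learn's KMeans can be used; the points below are invented, not the ones from the figure:

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented 2-D points that roughly form two groups.
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 10], [8, 9]], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # the two centroids
```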

Page 20

Hierarchical Clustering

• Start by assigning each item to its own cluster, so that if you have N items you now have N clusters, each containing just one item.
• Find the closest (most similar) pair of clusters and merge them into a single cluster, so that you now have one cluster less.
• Compute distances (similarities) between the new cluster and each of the old clusters.
• Repeat steps 2 and 3 until all items are clustered into a single cluster of size N. (*)
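A minimal sketch of this agglomerative procedure (my own illustration; it assumes single-linkage, i.e. the distance between two clusters is the smallest distance between any of their members):

```python
def agglomerative(dist, labels):
    """Merge clusters until one remains. `dist` is a symmetric matrix of pairwise
    item distances and `labels` names the initial single-item clusters."""
    clusters = [[i] for i in range(len(labels))]   # step 1: one cluster per item
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters (single-linkage distance).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(("/".join(labels[i] for i in clusters[a] + clusters[b]), d))
        # Step 3: replace the pair by their union; distances to the remaining
        # clusters are recomputed from `dist` on the next pass.
        clusters[a] += clusters[b]
        del clusters[b]
    return merges   # list of (merged cluster name, merge level L)
```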

Page 22

Hierarchical Clustering

• The nearest pair of cities is MI and TO, at distance 138. These are merged into a single cluster called "MI/TO". The level of the new cluster is L(MI/TO) = 138 and the new sequence number is m = 1.

Page 24

Hierarchical Clustering

• min d(i,j) = d(NA, RM) = 219 => merge NA and RM into a new cluster called NA/RM
  L(NA/RM) = 219
  m = 2

Page 26

Hierarchical Clustering

• min d(i,j) = d(BA, NA/RM) = 255 => merge BA and NA/RM into a new cluster called BA/NA/RM
  L(BA/NA/RM) = 255
  m = 3

Page 27

Hierarchical Clustering

            BA/NA/RM    FI   MI/TO
BA/NA/RM         0     268    564
FI             268       0    295
MI/TO          564     295      0
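Reading the next merge off this matrix (a small check using only the labels and values shown above):

```python
# Distances between the current clusters, copied from the table above.
labels = ["BA/NA/RM", "FI", "MI/TO"]
dist = [[0, 268, 564],
        [268, 0, 295],
        [564, 295, 0]]

# The next merge is the smallest off-diagonal entry: d(BA/NA/RM, FI) = 268.
pairs = [(dist[i][j], labels[i], labels[j])
         for i in range(len(labels)) for j in range(i + 1, len(labels))]
level, a, b = min(pairs)
print(f"merge {a} and {b} at level {level}")  # merge BA/NA/RM and FI at level 268
```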

Page 28

Hierarchical Clustering

• min d(i,j) = d(BA/NA/RM, FI) = 268 => merge BA/NA/RM and FI into a new cluster called BA/FI/NA/RM
  L(BA/FI/NA/RM) = 268
  m = 4

Page 30

Hierarchical Clustering

• Finally, we merge the last two clusters at level 295.
