DOCUMENT INFORMATION

Title: Mining, Indexing and Searching Graph Databases
Authors: Jiawei Han, Vladimir Lipets
Subject: Graph Databases and Data Mining
Type: Lecture Presentation
Year: 2010
Pages: 46


Trang 1

Mining, Indexing and Searching Graph Databases

Presenter: A/Prof. Do Phuc
Source: Jiawei Han, Vladimir Lipets

Trang 2

Graph, Graph, Everywhere

[Figures: aspirin; yeast protein interaction network]

Trang 3

Why Graph Mining and Searching?

• Graphs are ubiquitous
  • Chemical compounds (cheminformatics)
  • Protein structures, biological pathways/networks (bioinformatics)
  • Program control flow, traffic flow, and workflow analysis
  • XML databases, the Web, and social network analysis
• The graph is a general model
  • Trees, lattices, sequences, and items are degenerate graphs

Trang 4

• Graph Isomorphism, Subgraph Isomorphism
• Mining frequent graph patterns
• Graph indexing methods
• Similarity search in graph databases
• Biological network analysis

Trang 5

• Graph and subgraph isomorphism are an important and very general form of pattern matching that finds practical application in areas such as:
  • pattern recognition and computer vision,

Trang 6

A hierarchy of pattern matching problems

• Graph isomorphism
• Approximate subgraph isomorphism
• Graph edit distance

Trang 7

Isomorphic Graphs

Trang 8

Graph Isomorphism

Trang 9

Subgraph of a given graph

Trang 10

Subgraph Isomorphism

Trang 11

Subgraph Isomorphism and Related Problems

• Given a pattern graph G and a target graph H
  • Decision problem: Answer whether H contains a subgraph isomorphic to G
  • Search problem: Return an occurrence of G as a subgraph of H
  • Counting problem: Return a count of the number of subgraphs of H that are isomorphic to G
  • Enumeration problem: Return all occurrences of G as a subgraph of H
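
The four problem variants differ only in what is returned, which a brute-force sketch can make concrete. The code below is an illustration for tiny graphs, not a practical matcher; the adjacency-dict representation and the function names (`embeddings`, `contains`, `find_one`, `count`, `occurrences`) are assumptions introduced here, and "subgraph" is read in the non-induced sense.

```python
from itertools import permutations

def embeddings(pattern, target):
    """Yield each injective vertex map that sends every pattern edge onto a target edge.

    Graphs are undirected adjacency dicts {vertex: set_of_neighbours}.
    All vertex assignments are tried, so this is only usable on tiny graphs.
    """
    p_nodes = list(pattern)
    p_edges = [(u, v) for u in pattern for v in pattern[u]]
    for image in permutations(list(target), len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if all(m[v] in target[m[u]] for u, v in p_edges):
            yield m

def contains(pattern, target):
    """Decision problem: does target contain a subgraph isomorphic to pattern?"""
    return next(embeddings(pattern, target), None) is not None

def find_one(pattern, target):
    """Search problem: return one occurrence (as a vertex map) or None."""
    return next(embeddings(pattern, target), None)

def occurrences(pattern, target):
    """Enumeration problem: all distinct subgraphs of target isomorphic to pattern."""
    found = set()
    for m in embeddings(pattern, target):
        nodes = frozenset(m.values())
        edges = frozenset(frozenset((m[u], m[v])) for u in pattern for v in pattern[u])
        found.add((nodes, edges))  # different maps onto the same subgraph collapse here
    return found

def count(pattern, target):
    """Counting problem: number of distinct matching subgraphs."""
    return len(occurrences(pattern, target))

G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}   # triangle pattern
H = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}          # triangle plus a pendant vertex
print(contains(G, H), count(G, H))                        # True 1
```

Note that the triangle has six vertex maps into H but only one occurrence as a subgraph, which is why the counting function deduplicates by image rather than counting maps.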

Trang 12

• Graph Isomorphism, Subgraph Isomorphism
• Mining frequent graph patterns
• Graph indexing methods
• Similarity search in graph databases
• Biological network analysis

Trang 13

Graph Pattern Mining

• Frequent subgraphs
  • A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold
• Applications of graph pattern mining
  • Mining biochemical structures
  • Program control flow analysis
  • Mining XML structures or Web communities
  • Building blocks for graph classification, clustering, comparison, and correlation analysis
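
The definition of a frequent subgraph can be sketched as follows. This is a minimal illustration that assumes support is counted as the number of dataset graphs containing the pattern, and that graphs are vertex-labeled; the helper names (`graph`, `contains`, `support`, `is_frequent`) and the toy data are hypothetical.

```python
from itertools import permutations

def graph(labels, edge_pairs):
    """Vertex-labeled undirected graph as (labels dict, set of frozenset edges)."""
    return labels, {frozenset(e) for e in edge_pairs}

def contains(pattern, g):
    """True if g has a subgraph isomorphic to pattern (brute force, tiny graphs only)."""
    p_labels, p_edges = pattern
    g_labels, g_edges = g
    p_nodes = list(p_labels)
    for image in permutations(list(g_labels), len(p_nodes)):
        m = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[v] == g_labels[m[v]] for v in p_nodes)
        if labels_ok and all(frozenset(m[v] for v in e) in g_edges for e in p_edges):
            return True
    return False

def support(pattern, dataset):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(pattern, g) for g in dataset)

def is_frequent(pattern, dataset, min_support):
    """Frequent means support is no less than the minimum support threshold."""
    return support(pattern, dataset) >= min_support

dataset = [
    graph({1: "C", 2: "O", 3: "N"}, [(1, 2), (1, 3)]),
    graph({1: "C", 2: "O", 3: "S"}, [(1, 2), (2, 3)]),
    graph({1: "N", 2: "N"}, [(1, 2)]),
]
c_o = graph({"x": "C", "y": "O"}, [("x", "y")])   # a single C-O edge
print(support(c_o, dataset), is_frequent(c_o, dataset, 2))   # 2 True
```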

Trang 14

Example: Frequent Subgraphs

[Figure: chemical compound structures with atom labels S, O, N, OH]

Trang 15

Frequent Subgraph Mining Approaches

• Apriori-based approach
  • AGM/AcGM: Inokuchi, et al. (PKDD’00)
  • FSG: Kuramochi and Karypis (ICDM’01)
  • PATH: Vanetik and Gudes (ICDM’02, ICDM’04)
  • FFSM: Huan, et al. (ICDM’03)
• Pattern growth-based approach
  • MoFa: Borgelt and Berthold (ICDM’02)
  • gSpan: Yan and Han (ICDM’02)
  • Gaston: Nijssen and Kok (KDD’04)

Trang 16

Properties of Graph Mining Algorithms

• Search order
  • breadth vs. depth
• Generation of candidate subgraphs
  • apriori vs. pattern growth
• Elimination of duplicate subgraphs
  • passive vs. active
• Support calculation
  • store embeddings or not
• Discovery order of patterns
  • path → tree → graph
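
Duplicate elimination matters because the same candidate subgraph can be generated along several paths with different vertex numberings. One way to picture the "passive" option, read here as generating candidates first and discarding repeats afterwards, is to compare canonical forms. The sketch below uses an exhaustive canonical form over all vertex orderings; it is an assumption made for illustration and is not the canonical code used by any of the algorithms named in this deck.

```python
from itertools import permutations

def canonical_form(labels, edges):
    """Smallest (label sequence, edge list) over all vertex orderings.

    Two small vertex-labeled graphs are isomorphic exactly when their
    canonical forms are equal. Cost grows factorially with the vertex count.
    """
    best = None
    for order in permutations(list(labels)):
        index = {v: i for i, v in enumerate(order)}
        code = (
            tuple(labels[v] for v in order),
            tuple(sorted(tuple(sorted((index[u], index[v]))) for u, v in edges)),
        )
        if best is None or code < best:
            best = code
    return best

def add_if_new(candidate, seen):
    """Keep a candidate only if no isomorphic candidate was seen before."""
    code = canonical_form(*candidate)
    if code in seen:
        return False
    seen.add(code)
    return True

seen = set()
c1 = ({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)])                  # C-O-N
c2 = ({"a": "N", "b": "O", "c": "C"}, [("a", "b"), ("b", "c")])    # N-O-C, same graph
c3 = ({1: "C", 2: "O", 3: "N"}, [(1, 2), (1, 3)])                  # O-C-N, different graph
print([add_if_new(c, seen) for c in (c1, c2, c3)])                 # [True, False, True]
```

An "active" strategy would instead restrict how candidates are extended so that duplicates are never produced in the first place.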

Trang 17

• Mining frequent graph patterns
• Graph indexing methods
• Similarity search in graph databases
• Biological network analysis

Trang 18

Graph Search: Querying Graph Databases

• Querying graph databases:
  • Given a graph database and a query graph, find all graphs in the database that contain the query graph

[Figure: a query graph and a graph database of chemical structures]

Trang 19

[Figure: chemical structures from the graph database, shown with the query graph]

Trang 20

• Index substructures of a query graph to prune graphs that do not contain these substructures

Trang 21

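• Two steps in processing graph queries

Step 1. Index Construction
  • Enumerate structures in the graph database, build an inverted index between structures and graphs

Step 2. Query Processing
  • Enumerate structures in the query graph
  • Calculate the candidate graphs containing these structures
  • Prune the false positive answers by performing subgraph isomorphism test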
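
The two steps on this slide (build an inverted index from structures to graphs, then filter candidates and finish with a subgraph isomorphism test) can be sketched as below. The choice of labeled edges as the indexed structures is an assumption made to keep the example short, and all names (`make_graph`, `structures`, `build_index`, `filter_candidates`, `is_subgraph`, `search`) are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def make_graph(labels, edge_pairs):
    """Vertex-labeled undirected graph as (labels dict, set of frozenset edges)."""
    return labels, {frozenset(e) for e in edge_pairs}

def structures(g):
    """Indexed structures of a graph; here simply its labeled edges (an assumption)."""
    labels, edges = g
    return {tuple(sorted(labels[v] for v in e)) for e in edges}

def build_index(database):
    """Step 1: inverted index from each structure to the ids of graphs containing it."""
    index = defaultdict(set)
    for gid, g in database.items():
        for s in structures(g):
            index[s].add(gid)
    return index

def filter_candidates(query, database, index):
    """Step 2a: keep only graphs that contain every structure of the query."""
    candidates = set(database)
    for s in structures(query):
        candidates &= index.get(s, set())
    return candidates

def is_subgraph(query, g):
    """Brute-force subgraph isomorphism test (tiny graphs only)."""
    q_labels, q_edges = query
    g_labels, g_edges = g
    q_nodes = list(q_labels)
    for image in permutations(list(g_labels), len(q_nodes)):
        m = dict(zip(q_nodes, image))
        if all(q_labels[v] == g_labels[m[v]] for v in q_nodes) and all(
            frozenset(m[v] for v in e) in g_edges for e in q_edges
        ):
            return True
    return False

def search(query, database, index):
    """Step 2b: verify each candidate to remove false positives."""
    return sorted(gid for gid in filter_candidates(query, database, index)
                  if is_subgraph(query, database[gid]))

database = {
    "g1": make_graph({1: "C", 2: "O", 3: "N"}, [(1, 2), (2, 3)]),          # C-O-N path
    "g2": make_graph({1: "C", 2: "O", 3: "O", 4: "N"}, [(1, 2), (3, 4)]),  # C-O and O-N, disjoint
    "g3": make_graph({1: "C", 2: "C"}, [(1, 2)]),
}
index = build_index(database)
query = make_graph({"x": "C", "y": "O", "z": "N"}, [("x", "y"), ("y", "z")])
print(sorted(filter_candidates(query, database, index)), search(query, database, index))
```

In this toy database g2 survives the index filter because it contains both labeled edges, but it fails verification because they do not share the O vertex; this is the kind of false positive the final test removes.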

Trang 22

• Mining frequent graph patterns
• Graph indexing methods
• Similarity search in graph databases
• Biological network analysis
• Some recent progress on graph mining

Trang 23

Graph Clustering

• Graph similarity measure
  • Feature-based similarity measure
    • Each graph is represented as a feature vector
    • The similarity is defined by the distance between their corresponding vectors
    • Frequent subgraphs can be used as features
  • Structure-based similarity measure
    • Maximal common subgraph
    • Graph edit distance: insertion, deletion, and relabeling

Trang 24

Graph Classification

• Local structure-based approach
  • Local structures in a graph, e.g., neighbors surrounding a vertex, paths with fixed length
• Graph pattern-based approach
  • Subgraph patterns from domain knowledge
  • Subgraph patterns from data mining
• Kernel-based approach
  • Random walk (Gärtner ’02, Kashima et al. ’02, ICML’03, Mahé et al. ICML’04)
  • Optimal local assignment (Fröhlich et al

Trang 25

Structure Similarity Search

(a) caffeine (b) diurobromine (c) viagra

Trang 26

Some “Straightforward” Methods

• Method 1: Directly compute the similarity between the graphs in the DB and the query graph
  • Sequential scan
  • Subgraph similarity computation
• Method 2: Form a set of subgraph queries from the original query graph and use the exact subgraph search
  • Costly: If we allow 3 edges to be missed in a 20-edge query graph, it may generate 1,140 subgraphs
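
The figure of 1,140 is the number of ways to choose which 3 of the 20 edges to drop, C(20, 3) = 20·19·18 / 6. A short check, using a hypothetical 20-edge path as the query and a made-up helper `relaxed_queries`:

```python
from itertools import combinations
from math import comb

def relaxed_queries(edges, missing):
    """Yield each subgraph query obtained by deleting exactly `missing` edges."""
    for dropped in combinations(edges, missing):
        yield [e for e in edges if e not in dropped]

query_edges = [(i, i + 1) for i in range(20)]             # hypothetical 20-edge path query
print(comb(20, 3))                                        # 1140
print(sum(1 for _ in relaxed_queries(query_edges, 3)))    # 1140
```

Each of those relaxed queries would then need its own exact subgraph search, which is what makes Method 2 costly.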

Trang 27

Index: Precise vs Approximate Search

• Precise Search
  • Use frequent patterns as indexing features
  • Select features in the database space based on their selectivity
  • Build the index
• Approximate Search
  • Hard to build indices covering similar subgraphs: the number of subgraphs in databases is explosive
  • Idea: (1) keep the index structure; (2) select features in the query space

Trang 28

Substructure Similarity Measure

• Query relaxation measure
  • The number of edges that can be relabeled or missed; the positions of these edges are not fixed

[Figure: query graph]

Trang 29

Substructure Similarity Measure

• Feature-based similarity measure
  • Each graph is represented as a feature vector X = {x1, x2, …, xn}
  • The similarity is defined by the distance between their corresponding vectors
• Advantages
  • Easy to index
  • Fast, but a rough measure
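
A minimal sketch of the feature-based measure: each graph becomes a vector of feature counts, and two graphs are compared by a vector distance. The slide does not say which distance is meant, so Euclidean distance is an assumption here, as are the feature names and counts.

```python
from math import sqrt

def feature_vector(feature_counts, feature_list):
    """X = (x1, ..., xn), where xi is how often feature i occurs in the graph."""
    return [feature_counts.get(f, 0) for f in feature_list]

def euclidean(x, y):
    """Distance between two feature vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

features = ["C-O", "C-N", "C=O"]        # hypothetical indexed features
g1 = {"C-O": 2, "C-N": 1}               # feature counts of one graph
g2 = {"C-O": 2, "C=O": 1}               # feature counts of another graph
x1 = feature_vector(g1, features)       # [2, 1, 0]
x2 = feature_vector(g2, features)       # [2, 0, 1]
print(euclidean(x1, x2))
```

The measure is rough because two structurally different graphs can share the same feature counts.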

Trang 30

Query Processing Framework

• Three steps in processing approximate graph queries

Step 1. Index Construction
  • Select small structures as features in a graph database, and build the feature-graph matrix between the features and the graphs in the database
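
A feature-graph matrix records, for every feature and every database graph, how many times the feature occurs in the graph. The sketch below assumes the "small structures" are labeled edges, which is a simplification chosen for illustration; `edge_features` and `feature_graph_matrix` are hypothetical names.

```python
from collections import Counter

def edge_features(g):
    """Count the labeled edges of a graph given as (labels dict, list of vertex pairs)."""
    labels, edges = g
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)

def feature_graph_matrix(database):
    """Return {feature: {graph_id: occurrence count}} over all graphs in the database."""
    counts = {gid: edge_features(g) for gid, g in database.items()}
    all_features = sorted(set().union(*counts.values()))
    return {f: {gid: counts[gid][f] for gid in database} for f in all_features}

database = {
    "g1": ({1: "C", 2: "O", 3: "O"}, [(1, 2), (1, 3)]),   # O-C-O
    "g2": ({1: "C", 2: "N"}, [(1, 2)]),                   # C-N
}
matrix = feature_graph_matrix(database)
for feature, row in matrix.items():
    print(feature, row)
```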

Trang 31

Framework (cont.)

Step 2. Feature Miss Estimation
  • Determine the indexed features belonging to the query graph
  • Calculate the upper bound, denoted by J, on the number of features that can be missed in an approximate match
  • This is computed on the query graph, not on the graph database
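
One way to read the bound J: if up to k query edges may be relaxed, J is the largest number of feature occurrences in the query that any choice of k edges can destroy. The sketch below computes this exhaustively, which is only feasible for small queries; a real system would likely need an approximation. The input format (one edge set per feature occurrence) and the name `max_feature_misses` are assumptions.

```python
from itertools import combinations

def max_feature_misses(query_edges, feature_embeddings, k):
    """Largest number of feature occurrences hit when any k query edges are relaxed.

    feature_embeddings holds one edge set per occurrence of an indexed
    feature inside the query graph.
    """
    worst = 0
    for dropped in combinations(query_edges, k):
        dropped = set(dropped)
        missed = sum(1 for emb in feature_embeddings if dropped & set(emb))
        worst = max(worst, missed)
    return worst

# Hypothetical query: a path with edges e1-e2-e3, indexed by single edges and 2-edge paths.
query_edges = ["e1", "e2", "e3"]
feature_embeddings = [{"e1"}, {"e2"}, {"e3"}, {"e1", "e2"}, {"e2", "e3"}]
print(max_feature_misses(query_edges, feature_embeddings, 1))   # 3 (relaxing e2)
```

Because only the query graph and its features are involved, the bound is computed once per query rather than once per database graph.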

Trang 32

Framework (cont.)

Step 3. Query Processing
  • Use the feature-graph matrix to calculate the difference in the number of features between graph G and query Q, F_G - F_Q
  • If F_G - F_Q > J, discard G. The remaining graphs constitute a candidate answer set
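
The filtering rule can be sketched as follows. The slide only names a "difference in the number of features" between G and Q; the code interprets it as the number of query feature occurrences that G cannot supply, which is an assumption, and the vectors and graph ids are made up.

```python
def feature_misses(query_vec, graph_vec):
    """Number of query feature occurrences that the graph cannot supply."""
    return sum(max(0, q - g) for q, g in zip(query_vec, graph_vec))

def candidate_set(query_vec, graph_vectors, j_max):
    """Discard every graph whose miss count exceeds the bound J; keep the rest."""
    return [gid for gid, vec in graph_vectors.items()
            if feature_misses(query_vec, vec) <= j_max]

query_vec = [2, 1, 1]                   # feature counts of the query
graph_vectors = {                       # one column of the feature-graph matrix per graph
    "g1": [2, 1, 0],                    # misses 1 feature occurrence
    "g2": [0, 0, 3],                    # misses 3
    "g3": [3, 2, 1],                    # misses 0
}
print(candidate_set(query_vec, graph_vectors, j_max=1))   # ['g1', 'g3']
```

The surviving graphs are only candidates; they still need a structural similarity check before being reported as answers.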

Trang 33

• Mining frequent graph patterns
• Graph indexing methods
• Similarity search in graph databases
• Biological network analysis

Trang 35

Data Mining Across Multiple Networks

[Figure: several example networks over vertices a to k]

Trang 36

Data Mining Across Multiple Networks

[Figure: several example networks over vertices a to j]

Trang 37

Identify Frequent Co-expression Clusters across Multiple Microarray Data Sets

[Figure: co-expression networks over genes a to f from several data sets]

Trang 38

CODENSE: Mine Coherent Dense Subgraphs

[Figure: individual networks over vertices a to i, and the summary graph Ĝ built from them]

Trang 39

CODENSE: Mine Coherent Dense Subgraphs

(2) Identify dense subgraphs of the summary graph

... dense subgraph in the summary graph. However, the reverse is not true.
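
A rough sketch of the summary-graph idea, under the assumption (not stated in the surviving slide text) that the summary graph keeps the edges occurring in at least a minimum number of the input networks. Density is taken as the fraction of vertex pairs that are connected. The names `summary_graph`, `density`, and `edge` are hypothetical, and this is not the CODENSE algorithm itself.

```python
from collections import Counter

def edge(u, v):
    """Undirected edge as a frozenset of its two endpoints."""
    return frozenset((u, v))

def summary_graph(networks, min_support):
    """Keep the edges that occur in at least min_support of the input networks."""
    counts = Counter(e for net in networks for e in set(net))
    return {e for e, c in counts.items() if c >= min_support}

def density(nodes, edges):
    """Fraction of vertex pairs within `nodes` that are joined by an edge."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for e in edges if e <= nodes)
    return 2 * inside / (n * (n - 1))

networks = [
    {edge("a", "b"), edge("a", "c"), edge("b", "c"), edge("d", "e")},
    {edge("a", "b"), edge("a", "c"), edge("b", "c"), edge("e", "f")},
    {edge("a", "b"), edge("b", "c"), edge("d", "e")},
]
summary = summary_graph(networks, min_support=2)
print(sorted(sorted(e) for e in summary))
print(density({"a", "b", "c"}, summary))   # 1.0
```

The caveat about the reverse direction shows up even here: the triangle a, b, c is dense in the summary graph, yet the third network lacks the edge a-c, so a dense region of the summary graph still has to be checked against the individual networks.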

Trang 40

Applying CoDense to 39 Yeast Microarray Data Sets

[Figure: example networks over vertices a to k]

Trang 41

Discovery of New Genes Based on Similar Genes

[Figure: gene network with nodes MRPL51, MRP49, YDR115W, PHB1, PET100]

Trang 42

Network of Known Similar Genes

Brown: YDR115W, FMC1, ATP12, MRPL37, MRPS18

[Figure: gene network with nodes MRPL32, ACN9, MRPL51, MRP49, YDR115W, PHB1, PET100]

Trang 43

Network Involved in the New Genes

[Figure: gene network with nodes ACN9, MRPL51, MRP49, YDR115W, PHB1, PET100]

Trang 44

• Mining frequent graph patterns
• Graph indexing methods
• Similarity search in graph databases
• Biological network analysis

Trang 45

• Graph mining has wide applications
• Frequent and closed subgraph mining methods
  • gSpan and CloseGraph: pattern-growth, depth-first search approach
• Graph indexing techniques
  • Frequent and discriminative subgraphs as indexing features
• Similarity search in graph databases
  • Indexing and approximate matching help similar subgraph search
• Biological network analysis
  • Mining coherent, dense subgraphs across multiple biological networks
• Many new developments along the line of graph pattern mining

Trang 46

Thanks and Questions

Uploaded: 16/12/2022, 22:43
