[Figure: two example XML documents drawn as trees, with each node labeled by its Dewey ID; element nodes include School (0), Dean (0.0), Classes (0.1), Class, Title, Instructor, TA, Students, and Participants, with text values such as John, Ben, CS2A, CS3A, CS4A, CS5A, and Autonet.]
Figure 4.1: Example XML documents [Xu and Papakonstantinou, 2005].
• A node u is a sibling of node v if and only if pre(u) differs from pre(v) only in the last component. For example, 0.1.1.0 (Title) and 0.1.1.1 (Instructor) are sibling nodes, but 0.1.1 (Class) and 0.1.1.1 (Instructor) are not sibling nodes.
• A node u is an ancestor of another node v if and only if pre(u) is a prefix of pre(v). For example, 0.1 (Classes) is an ancestor of 0.1.2.0.0 (John); a sketch of these checks follows this list.
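To make the prefix-based tests concrete, here is a minimal Python sketch (our own illustration, not code from the cited works); the helper names dewey, is_ancestor, and is_sibling are assumptions chosen for this example.

```python
# Minimal sketch (illustration only) of the Dewey-ID tests described above.
def dewey(s: str) -> tuple:
    """Parse a Dewey ID such as '0.1.1.0' into a tuple of integer components."""
    return tuple(int(c) for c in s.split("."))

def is_ancestor(u: str, v: str) -> bool:
    """True if pre(u) is a proper prefix of pre(v), i.e., u is an ancestor of v."""
    pu, pv = dewey(u), dewey(v)
    return len(pu) < len(pv) and pv[:len(pu)] == pu

def is_sibling(u: str, v: str) -> bool:
    """True if u and v differ only in the last component of their Dewey IDs."""
    pu, pv = dewey(u), dewey(v)
    return len(pu) == len(pv) and pu[:-1] == pv[:-1] and pu != pv

print(is_sibling("0.1.1.0", "0.1.1.1"))  # True:  Title and Instructor are siblings
print(is_sibling("0.1.1", "0.1.1.1"))    # False: Class is the parent, not a sibling
print(is_ancestor("0.1", "0.1.2.0.0"))   # True:  Classes is an ancestor of John
```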
4.1.1 LCA, SLCA, ELCA, AND CLCA
In this subsection, we introduce the concepts of LCA, SLCA, ELCA [Guo et al., 2003], and CLCA [Li et al., 2007a], which are the basis of the semantics of answer definitions.
Definition 4.1 Lowest Common Ancestor (LCA) For any two nodes v1 and v2, u is the LCA of v1 and v2 if and only if: (1) u ≺ v1 and u ≺ v2, and (2) for any u′, if u′ ≺ v1 and u′ ≺ v2, then u′ ⪯ u. The LCA of nodes v1 and v2 is denoted as lca(v1, v2). Note that lca(v1, v2) is the same as lca(v2, v1).
Property 4.2 Given any three nodes v2, v1, v where v2 < v1 < v, lca(v, v2) ⪯ lca(v, v1). Given any three nodes v, v1, v2 where v < v1 < v2, lca(v, v2) ⪯ lca(v, v1).
[Figure: three tree diagrams illustrating (a) lca(u1, u2) ⪯ lca(v1, v2), (b) lca(v1, v2) ≺ lca(u1, u2), and (c) lca(v1, v2) < lca(u1, u2) with lca(v1, v2) ⊀ lca(u1, u2).]
Figure 4.2: Different situations of lca(v1, v2) and lca(u1, u2).
Property 4.3 Given any two pairs of nodes (v1, v2) and (u1, u2), with v1 ≤ u1 and v2 ≤ u2, without loss of generality, we can assume that v1 < v2 and u1 < u2. Let lca(v1, v2) and lca(u1, u2) be the LCAs of the two pairs. Then:
1. if lca(v1, v2) ≥ lca(u1, u2), then lca(u1, u2) ⪯ lca(v1, v2), as shown in Figure 4.2(a);
2. if lca(v1, v2) < lca(u1, u2), then
• either lca(v1, v2) ≺ lca(u1, u2), as shown in Figure 4.2(b),
• or lca(v1, v2) ⊀ lca(u1, u2), in which case for any w1, w2 with u1 ≤ w1 and u2 ≤ w2, lca(v1, v2) ⊀ lca(w1, w2), as shown in Figure 4.2(c).
The definition of LCA can be generalized for more than two nodes. Let lca(v1, · · · , vl) denote the LCA of nodes v1, · · · , vl, where lca(v1, · · · , vl) = lca(lca(v1, · · · , vl−1), vl) for l > 2. The LCA of sets of nodes, S1, · · · , Sl, is
lca(S1, · · · , Sl) = {lca(v1, · · · , vl) | v1 ∈ S1, · · · , vl ∈ Sl}.
For example, suppose S1 and S2 are the sets of nodes in Figure 4.1 that contain the keywords John and Ben, respectively; then lca(S1, S2) = {0, 0.1, 0.1.1, 0.1.2, 0.2, 0.2.0, 0.2.0.0}.
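As a hedged illustration of how lca over sets can be enumerated directly from this definition, the following Python sketch computes lca(S1, S2) for two small hypothetical Dewey-ID lists (our own toy lists, not the exact match lists of Figure 4.1); the lca of two nodes is taken as the longest common prefix of their Dewey IDs.

```python
# Illustration only: naive lca over sets, per the definition above.
from itertools import product

def dewey(s):
    return tuple(int(c) for c in s.split("."))

def fmt(t):
    return ".".join(map(str, t))

def lca2(u, v):
    """lca of two nodes: the longest common prefix of their Dewey IDs (O(d))."""
    out = []
    for a, b in zip(u, v):
        if a != b:
            break
        out.append(a)
    return tuple(out)

def lca_sets(*sets):
    """lca(S1, ..., Sl) = { lca(v1, ..., vl) | v1 in S1, ..., vl in Sl }."""
    result = set()
    for combo in product(*sets):
        node = combo[0]
        for v in combo[1:]:
            node = lca2(node, v)  # lca(v1,...,vl) = lca(lca(v1,...,vl-1), vl)
        result.add(node)
    return result

# Hypothetical match lists (toy data, not the exact lists of Figure 4.1).
S1 = [dewey(x) for x in ("0.0.0", "0.1.1.1.0", "0.2.0.0.0")]   # e.g., a "John"-like list
S2 = [dewey(x) for x in ("0.1.1.2.0", "0.2.0.0.1", "0.3.0")]   # e.g., a "Ben"-like list
print(sorted(fmt(t) for t in lca_sets(S1, S2)))  # ['0', '0.1.1', '0.2.0.0']
```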
Definition 4.4 Smallest LCA (SLCA) The SLCA of l sets S1, · · · , Sl is defined to be
slca(S1, · · · , Sl) = {v ∈ lca(S1, · · · , Sl) | ∀v′ ∈ lca(S1, · · · , Sl), v ⊀ v′}.
Intuitively, it is the set of nodes in lca(S1, · · · , Sl) such that none of their descendants is in lca(S1, · · · , Sl).
A node v is called an SLCA of S1, · · · , Sl if v ∈ slca(S1, · · · , Sl). Note that a node in slca(S1, · · · , Sl) cannot be an ancestor of any other node in slca(S1, · · · , Sl). Continuing the example above, slca(S1, S2) = {0.1.1, 0.1.2, 0.2.0.0}, i.e., 0.1.1 (Class), 0.1.2 (Class), and 0.2.0.0 (Participants).
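Definition 4.4 can be applied literally once lca(S1, · · · , Sl) is known: keep only the nodes that have no proper descendant in the set. The sketch below is our own illustration of that step, reusing the hypothetical lca set computed in the previous sketch.

```python
# Illustration only: slca per Definition 4.4, applied to a known lca set.
def is_prefix(u, v):
    """True if u is an ancestor-or-self of v (u's Dewey ID is a prefix of v's)."""
    return len(u) <= len(v) and v[:len(u)] == u

def slca_from_lca(lca_nodes):
    """Keep only the nodes that have no proper descendant in lca_nodes."""
    return {v for v in lca_nodes
            if not any(u != v and is_prefix(v, u) for u in lca_nodes)}

# The hypothetical lca set from the previous sketch.
lca_nodes = {(0,), (0, 1, 1), (0, 2, 0, 0)}
print(sorted(slca_from_lca(lca_nodes)))  # [(0, 1, 1), (0, 2, 0, 0)]: node 0 is dropped
```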
Definition 4.5 Exclusive LCA (ELCA) The ELCA of l sets S1, · · · , Sl is defined to be
elca(S1, · · · , Sl) = {u | ∃v1 ∈ S1, · · · , vl ∈ Sl, (u = lca(v1, · · · , vl) ∧ ∀i ∈ [1, l], ∄x (x ∈ lca(S1, · · · , Sl) ∧ child(u, vi) ⪯ x))}
where child(u, vi) is the child of u on the path from u to vi.
A node u is called an ELCA of l sets S1, · · · , Sl if u ∈ elca(S1, · · · , Sl), i.e., if and only if there exist l nodes v1 ∈ S1, · · · , vl ∈ Sl such that u = lca(v1, · · · , vl), and for every vi (1 ≤ i ≤ l), the child of u on the path from u to vi is not an ancestor-or-self of any node in lca(S1, · · · , Sl). The node vi is called an ELCA witness node of u in Si. Note that the witness node of u in Si is not necessarily unique. Continuing the example, elca(S1, S2) = {0, 0.1.1, 0.1.2, 0.2.0.0}, and the node 0.0.0 is an ELCA witness node of the node 0 in S1.
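A naive reading of Definition 4.5 can also be turned into code: enumerate witness combinations and accept u = lca(v1, · · · , vl) only if, for every vi, no LCA node lies in the subtree rooted at child(u, vi). The following Python sketch is our own illustration over hypothetical match lists (not the exact lists of Figure 4.1), assuming every match node is a proper descendant of the candidate u.

```python
# Illustration only: naive elca per Definition 4.5 over hypothetical lists.
from itertools import product

def lca2(u, v):
    out = []
    for a, b in zip(u, v):
        if a != b:
            break
        out.append(a)
    return tuple(out)

def lca_all(nodes):
    node = nodes[0]
    for v in nodes[1:]:
        node = lca2(node, v)
    return node

def is_prefix(u, v):  # u is an ancestor-or-self of v
    return len(u) <= len(v) and v[:len(u)] == u

def elca(*sets):
    lca_nodes = {lca_all(combo) for combo in product(*sets)}
    result = set()
    for combo in product(*sets):
        u = lca_all(combo)
        ok = True
        for vi in combo:              # assume u is a proper ancestor of each vi
            child = vi[:len(u) + 1]   # child(u, vi): one more component than u
            if any(is_prefix(child, x) for x in lca_nodes):
                ok = False            # some LCA node lies under child(u, vi)
                break
        if ok:
            result.add(u)             # combo is a set of ELCA witness nodes of u
    return result

# Hypothetical match lists (toy data, not the exact lists of Figure 4.1).
S1 = [(0, 0, 0), (0, 1, 1, 1, 0), (0, 2, 0, 0, 0)]
S2 = [(0, 1, 1, 2, 0), (0, 2, 0, 0, 1), (0, 3, 0)]
print(sorted(elca(S1, S2)))  # [(0,), (0, 1, 1), (0, 2, 0, 0)]
```

For these toy lists, the root 0 is an ELCA (witnessed by the hypothetical nodes 0.0.0 and 0.3.0) but not an SLCA, which shows how the two semantics can differ.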
Definition 4.6 Compact LCA (CLCA) Given l nodes, v1 ∈ S1, · · · , vl ∈ Sl, and u = lca(v1, · · · , vl), u is said to dominate vi if u = slca(S1, · · · , Si−1, {vi}, Si+1, · · · , Sl). u is a CLCA with respect to v1, · · · , vl if u dominates every vi for i ∈ [1, l].
The concepts of dominance and CLCA are closely related to the SLCA nodes slca(S1, · · · , Sl) and ELCA nodes elca(S1, · · · , Sl), respectively. Actually, the set of CLCA nodes is exactly the set of ELCA nodes, as shown by the following theorem.
Theorem 4.7 Given l nodes, v1 ∈ S1, · · · , vl ∈ Sl, u = lca(v1, · · · , vl) is a CLCA with respect to v1, · · · , vl, if and only if u ∈ elca(S1, · · · , Sl) with v1, · · · , vl as witness nodes.
Proof First, we prove ⇒ by contradiction. Let u be a CLCA w.r.t. v1, · · · , vl. Assume that u is not an ELCA with v1, · · · , vl as witness nodes; then there must exist an i ∈ [1, l] and an x ∈ lca(S1, · · · , Sl) with child(u, vi) ⪯ x. Then child(u, vi) ⪯ slca(S1, · · · , Si−1, {vi}, · · · , Sl), which means that u ≠ slca(S1, · · · , Si−1, {vi}, · · · , Sl), i.e., u does not dominate vi, contradicting the assumption that u is a CLCA.
Second, we prove ⇐ by contradiction. Let u be an ELCA with witness nodes v1, · · · , vl. Assume that u is not a CLCA with respect to v1, · · · , vl; then there must exist an i ∈ [1, l] such that u ≠ slca(S1, · · · , Si−1, {vi}, Si+1, · · · , Sl). Then child(u, vi) ⪯ slca(S1, · · · , {vi}, · · · , Sl), which is a contradiction, because slca(S1, · · · , {vi}, · · · , Sl) ∈ lca(S1, · · · , Sl) and thus vi cannot be an ELCA witness node of u.
Theorem 4.8 [Xu and Papakonstantinou, 2008] The relationship between LCA nodes, SLCA nodes, and ELCA nodes of l sets S1, · · · , Sl is slca(S1, · · · , Sl) ⊆ elca(S1, · · · , Sl) ⊆ lca(S1, · · · , Sl).
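As a quick sanity check (our own, using the toy sets produced by the earlier sketches, with their printed outputs copied in by hand), the containment stated by the theorem indeed holds for them.

```python
# Illustration only: checking Theorem 4.8 on the toy sets from the earlier sketches.
lca_set  = {(0,), (0, 1, 1), (0, 2, 0, 0)}
slca_set = {(0, 1, 1), (0, 2, 0, 0)}
elca_set = {(0,), (0, 1, 1), (0, 2, 0, 0)}
assert slca_set <= elca_set <= lca_set   # slca ⊆ elca ⊆ lca holds here
print("containment holds")
```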
Given a list of l keywords Q = {k1, · · · , kl} and an input XML tree T, the problem is to find a set of answers, where each answer is a subtree of T with a root node and a set of match nodes M; the subtree should have at least one match node for each keyword in Q (assume that M = {v1, · · · , vm}).
In most of the works in the literature, there exists an inverted index of Dewey IDs for each keyword. Using the inverted index, for an l-keyword query, it is possible to get l lists S1, · · · , Sl. Each Si (1 ≤ i ≤ l) contains the Dewey IDs of the nodes that contain keyword ki. We use |S| to denote the maximum size of the Si's, i.e., |S| = max1≤i≤l |Si|. The algorithms work on the l lists S1, · · · , Sl. Below, we also use slca(Q) and elca(Q) to denote slca(S1, · · · , Sl) and elca(S1, · · · , Sl), respectively. Note that each list Si is sorted in increasing Dewey ID order.
In the following, we use d to denote the height of the XML tree, i.e., d is the maximum length of all the Dewey IDs of the nodes in the XML tree. Given two nodes u and v with their Dewey IDs, we can find lca(u, v) in time O(d), based on the fact that lca(u, v) has a Dewey ID that is equal to the longest common prefix of pre(u) and pre(v). Note that lca(u, v) exists for any two nodes in the same XML tree. In the following, ⊥ denotes a null node (value). Note that the preorder and postorder relationships between u and ⊥ are not defined.
We first discuss some primitive functions used by the algorithms that we will present later. Assume that each set S is sorted in increasing order of Dewey ID.
• Function lm(v, S) returns the left match of v in S, i.e., the node in S with the largest Dewey ID that is less than or equal to pre(v): lm(v, S) = arg max u∈S: u≤v pre(u). It returns ⊥ when there is no left match node. The cost of the function is O(d log |S|), since it can be implemented by a binary search over S, where it takes O(d) time to compare two Dewey IDs.
• Function rm(v, S) returns the right match of v in S, i.e., the node in S with the smallest Dewey ID that is greater than or equal to pre(v): rm(v, S) = arg min u∈S: u≥v pre(u). It returns ⊥ when there is no right match node, and its cost is also O(d log |S|).
• Function removeAncestor(S) removes from S every node that is an ancestor of another node in S, i.e., removeAncestor(S) = {v ∈ S | ∄u ∈ S : v ≺ u}. The cost of removeAncestor is O(d|S|), since S is sorted in increasing Dewey ID order.
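These primitives are straightforward to realize over sorted lists. The sketch below is a hedged Python illustration (the function names and the list representation are our own choices), using binary search for lm and rm and a single stack-based pass for removeAncestor.

```python
# Illustration only: the primitives over lists of Dewey-ID tuples sorted in
# increasing (document) order; None plays the role of the null node ⊥.
from bisect import bisect_left, bisect_right

def lm(v, S):
    """Left match: the largest node in S with Dewey ID <= pre(v), else None."""
    i = bisect_right(S, v)
    return S[i - 1] if i > 0 else None

def rm(v, S):
    """Right match: the smallest node in S with Dewey ID >= pre(v), else None."""
    i = bisect_left(S, v)
    return S[i] if i < len(S) else None

def remove_ancestor(S):
    """Drop every node of S that is an ancestor of another node in S."""
    out = []
    for v in S:
        # In sorted order, only the most recently kept node can be an ancestor of v.
        while out and len(out[-1]) < len(v) and v[:len(out[-1])] == out[-1]:
            out.pop()
        out.append(v)
    return out

S = [(0, 0, 0), (0, 1), (0, 1, 1, 1, 0), (0, 2, 0, 0)]
print(lm((0, 1, 1), S))    # (0, 1)
print(rm((0, 1, 1), S))    # (0, 1, 1, 1, 0)
print(remove_ancestor(S))  # [(0, 0, 0), (0, 1, 1, 1, 0), (0, 2, 0, 0)]
```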
With the Dewey IDs, comparing two nodes takes O(d) time, and computing the lca of two nodes also takes O(d) time. Note that there exists another encoding for XML trees, called interval encoding, where each node is assigned a triple (start, end, level): start is the sequence number assigned during a preorder traversal, end is the largest start value among the nodes in the subtree rooted at that node, and level is the level of the node in the XML tree. Using interval encoding, comparing two nodes takes O(1) time, and it also takes O(1) time to check whether u is an ancestor of v or whether u is the parent of v for two nodes u and v. But most of the works in the literature use only Dewey IDs to encode nodes, so in the following, we only consider the Dewey ID encoding, where comparing two nodes takes O(d) time.
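For completeness, here is a small sketch (our own, under the assumptions just described) of how interval encoding supports constant-time ancestor and parent tests; the record layout (start, end, level) follows the description above.

```python
# Illustration only: constant-time tests with interval encoding (start, end, level).
from dataclasses import dataclass

@dataclass
class IntervalNode:
    start: int   # preorder sequence number
    end: int     # largest start value in the subtree rooted at this node
    level: int   # depth of the node in the XML tree

def is_ancestor(u: IntervalNode, v: IntervalNode) -> bool:
    """u is a proper ancestor of v, checked in O(1)."""
    return u.start < v.start <= u.end

def is_parent(u: IntervalNode, v: IntervalNode) -> bool:
    """u is the parent of v: an ancestor exactly one level above v."""
    return is_ancestor(u, v) and u.level == v.level - 1

# A root (level 0) with two leaf children (level 1).
root   = IntervalNode(start=0, end=2, level=0)
child1 = IntervalNode(start=1, end=1, level=1)
child2 = IntervalNode(start=2, end=2, level=1)
print(is_ancestor(root, child2), is_parent(root, child1))  # True True
```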
4.2 SLCA-BASED SEMANTICS
Each node in an XML tree can be regarded as representing an entity in the world. If u is an ancestor of v, then we may understand that the entity represented by v belongs to the entity represented by u; for example, the entity represented by 0.1.1 (Class) belongs to the entity represented by 0 (School). For a keyword query, it is more desirable to return the most specific entities that contain all the keywords, i.e., among all the returned entities, there should not exist any ancestor-descendant relationship between the root nodes that represent the entities.
In this section, we first show some properties of the slca function, which are essential for designing efficient algorithms. Then three efficient algorithms with different characteristics are shown to compute slca(Q).