[Figure: two example XML documents drawn as trees, with each node labeled by its Dewey ID; element nodes include School (0), Dean (0.0), Classes (0.1), Class, Title, Instructor, TA, Students, and Participants, with text values such as John, Ben, CS2A, CS3A, CS4A, CS5A, and Autonet.]
Figure 4.1: Example XML documents [Xu and Papakonstantinou, 2005].
• A node u is a sibling of node v if and only if pre(u) differs from pre(v) only in the last component. For example, 0.1.1.0 (Title) and 0.1.1.1 (Instructor) are sibling nodes, but 0.1.1 (Class) and 0.1.1.1 (Instructor) are not sibling nodes.
• A node u is an ancestor of another node v if and only if pre(u) is a prefix of pre(v). For example, 0.1 (Classes) is an ancestor of 0.1.2.0.0 (John); a sketch of these checks follows this list.
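To make the prefix-based tests concrete, here is a minimal Python sketch (our own illustration, not code from the cited works); the helper names dewey, is_ancestor, and is_sibling are assumptions chosen for this example.

```python
# Minimal sketch (illustration only) of the Dewey-ID tests described above.
def dewey(s: str) -> tuple:
    """Parse a Dewey ID such as '0.1.1.0' into a tuple of integer components."""
    return tuple(int(c) for c in s.split("."))

def is_ancestor(u: str, v: str) -> bool:
    """True if pre(u) is a proper prefix of pre(v), i.e., u is an ancestor of v."""
    pu, pv = dewey(u), dewey(v)
    return len(pu) < len(pv) and pv[:len(pu)] == pu

def is_sibling(u: str, v: str) -> bool:
    """True if u and v differ only in the last component of their Dewey IDs."""
    pu, pv = dewey(u), dewey(v)
    return len(pu) == len(pv) and pu[:-1] == pv[:-1] and pu != pv

print(is_sibling("0.1.1.0", "0.1.1.1"))  # True:  Title and Instructor are siblings
print(is_sibling("0.1.1", "0.1.1.1"))    # False: Class is the parent, not a sibling
print(is_ancestor("0.1", "0.1.2.0.0"))   # True:  Classes is an ancestor of John
```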
4.1.1 LCA, SLCA, ELCA, AND CLCA
In this subsection, we introduce the concepts of LCA, SLCA, ELCA [Guo et al., 2003], and CLCA [Li et al., 2007a], which are the basis of the semantics of answer definitions.
Definition 4.1 Lowest Common Ancestor (LCA) For any two nodes v1 and v2, u is the LCA of v1 and v2 if and only if: (1) u ≺ v1 and u ≺ v2, and (2) for any u′, if u′ ≺ v1 and u′ ≺ v2, then u′ ⪯ u. The LCA of nodes v1 and v2 is denoted as lca(v1, v2). Note that lca(v1, v2) is the same as lca(v2, v1).
Property 4.2 Given any three nodes v2, v1, v where v2 < v1 < v, lca(v, v2) ⪯ lca(v, v1). Given any three nodes v, v1, v2 where v < v1 < v2, lca(v, v2) ⪯ lca(v, v1).
[Figure: three tree diagrams illustrating (a) lca(u1, u2) ⪯ lca(v1, v2), (b) lca(v1, v2) ≺ lca(u1, u2), and (c) lca(v1, v2) < lca(u1, u2) with lca(v1, v2) ⊀ lca(u1, u2).]
Figure 4.2: Different situations of lca(v1, v2) and lca(u1, u2).
Property 4.3 Given any two pairs of nodes (v1, v2) and (u1, u2), with v1 ≤ u1 and v2 ≤ u2, without loss of generality, we can assume that v1 < v2 and u1 < u2. Let lca(v1, v2) and lca(u1, u2) be the LCAs of the two pairs. Then:
1. if lca(v1, v2) ≥ lca(u1, u2), then lca(u1, u2) ⪯ lca(v1, v2), as shown in Figure 4.2(a);
2. if lca(v1, v2) < lca(u1, u2), then
• either lca(v1, v2) ≺ lca(u1, u2), as shown in Figure 4.2(b),
• or lca(v1, v2) ⊀ lca(u1, u2), in which case for any w1, w2 with u1 ≤ w1 and u2 ≤ w2, lca(v1, v2) ⊀ lca(w1, w2), as shown in Figure 4.2(c).
The definition of LCA can be generalized for more than two nodes. Let lca(v1, · · · , vl) denote the LCA of nodes v1, · · · , vl, where lca(v1, · · · , vl) = lca(lca(v1, · · · , vl−1), vl) for l > 2. The LCA of sets of nodes, S1, · · · , Sl, is
lca(S1, · · · , Sl) = {lca(v1, · · · , vl) | v1 ∈ S1, · · · , vl ∈ Sl}.
For example, suppose S1 and S2 are the sets of nodes in Figure 4.1 that contain the keywords John and Ben, respectively; then lca(S1, S2) = {0, 0.1, 0.1.1, 0.1.2, 0.2, 0.2.0, 0.2.0.0}.
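As a hedged illustration of how lca over sets can be enumerated directly from this definition, the following Python sketch computes lca(S1, S2) for two small hypothetical Dewey-ID lists (our own toy lists, not the exact match lists of Figure 4.1); the lca of two nodes is taken as the longest common prefix of their Dewey IDs.

```python
# Illustration only: naive lca over sets, per the definition above.
from itertools import product

def dewey(s):
    return tuple(int(c) for c in s.split("."))

def fmt(t):
    return ".".join(map(str, t))

def lca2(u, v):
    """lca of two nodes: the longest common prefix of their Dewey IDs (O(d))."""
    out = []
    for a, b in zip(u, v):
        if a != b:
            break
        out.append(a)
    return tuple(out)

def lca_sets(*sets):
    """lca(S1, ..., Sl) = { lca(v1, ..., vl) | v1 in S1, ..., vl in Sl }."""
    result = set()
    for combo in product(*sets):
        node = combo[0]
        for v in combo[1:]:
            node = lca2(node, v)  # lca(v1,...,vl) = lca(lca(v1,...,vl-1), vl)
        result.add(node)
    return result

# Hypothetical match lists (toy data, not the exact lists of Figure 4.1).
S1 = [dewey(x) for x in ("0.0.0", "0.1.1.1.0", "0.2.0.0.0")]   # e.g., a "John"-like list
S2 = [dewey(x) for x in ("0.1.1.2.0", "0.2.0.0.1", "0.3.0")]   # e.g., a "Ben"-like list
print(sorted(fmt(t) for t in lca_sets(S1, S2)))  # ['0', '0.1.1', '0.2.0.0']
```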
Definition 4.4 Smallest LCA (SLCA) The SLCA of l sets S1, · · · , Sl is defined to be
slca(S1, · · · , Sl) = {v ∈ lca(S1, · · · , Sl) | ∀v′ ∈ lca(S1, · · · , Sl), v ⊀ v′}.
Intuitively, it is the set of nodes in lca(S1, · · · , Sl) such that none of their descendants is in lca(S1, · · · , Sl).
A node v is called an SLCA of S1, · · · , Sl if v ∈ slca(S1, · · · , Sl). Note that a node in slca(S1, · · · , Sl) cannot be an ancestor of any other node in slca(S1, · · · , Sl). Continuing the example above, slca(S1, S2) = {0.1.1, 0.1.2, 0.2.0.0}, i.e., 0.1.1 (Class), 0.1.2 (Class), and 0.2.0.0 (Participants).
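Definition 4.4 can be applied literally once lca(S1, · · · , Sl) is known: keep only the nodes that have no proper descendant in the set. The sketch below is our own illustration of that step, reusing the hypothetical lca set computed in the previous sketch.

```python
# Illustration only: slca per Definition 4.4, applied to a known lca set.
def is_prefix(u, v):
    """True if u is an ancestor-or-self of v (u's Dewey ID is a prefix of v's)."""
    return len(u) <= len(v) and v[:len(u)] == u

def slca_from_lca(lca_nodes):
    """Keep only the nodes that have no proper descendant in lca_nodes."""
    return {v for v in lca_nodes
            if not any(u != v and is_prefix(v, u) for u in lca_nodes)}

# The hypothetical lca set from the previous sketch.
lca_nodes = {(0,), (0, 1, 1), (0, 2, 0, 0)}
print(sorted(slca_from_lca(lca_nodes)))  # [(0, 1, 1), (0, 2, 0, 0)]: node 0 is dropped
```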
Definition 4.5 Exclusive LCA (ELCA) The ELCA of l sets S1, · · · , Sl is defined to be
elca(S1, · · · , Sl) = {u | ∃v1 ∈ S1, · · · , vl ∈ Sl, (u = lca(v1, · · · , vl) ∧ ∀i ∈ [1, l], ∄x (x ∈ lca(S1, · · · , Sl) ∧ child(u, vi) ⪯ x))}
where child(u, vi) is the child of u on the path from u to vi.
A node u is called an ELCA of l sets S1, · · · , Sl if u ∈ elca(S1, · · · , Sl), i.e., if and only if there exist l nodes v1 ∈ S1, · · · , vl ∈ Sl such that u = lca(v1, · · · , vl), and for every vi (1 ≤ i ≤ l), the child of u on the path from u to vi is not an ancestor-or-self of any node in lca(S1, · · · , Sl). The node vi is called an ELCA witness node of u in Si. Note that the witness node of u in Si is not necessarily unique. Continuing the example, elca(S1, S2) = {0, 0.1.1, 0.1.2, 0.2.0.0}, and the node 0.0.0 is an ELCA witness node of the node 0 in S1.
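A naive reading of Definition 4.5 can also be turned into code: enumerate witness combinations and accept u = lca(v1, · · · , vl) only if, for every vi, no LCA node lies in the subtree rooted at child(u, vi). The following Python sketch is our own illustration over hypothetical match lists (not the exact lists of Figure 4.1), assuming every match node is a proper descendant of the candidate u.

```python
# Illustration only: naive elca per Definition 4.5 over hypothetical lists.
from itertools import product

def lca2(u, v):
    out = []
    for a, b in zip(u, v):
        if a != b:
            break
        out.append(a)
    return tuple(out)

def lca_all(nodes):
    node = nodes[0]
    for v in nodes[1:]:
        node = lca2(node, v)
    return node

def is_prefix(u, v):  # u is an ancestor-or-self of v
    return len(u) <= len(v) and v[:len(u)] == u

def elca(*sets):
    lca_nodes = {lca_all(combo) for combo in product(*sets)}
    result = set()
    for combo in product(*sets):
        u = lca_all(combo)
        ok = True
        for vi in combo:              # assume u is a proper ancestor of each vi
            child = vi[:len(u) + 1]   # child(u, vi): one more component than u
            if any(is_prefix(child, x) for x in lca_nodes):
                ok = False            # some LCA node lies under child(u, vi)
                break
        if ok:
            result.add(u)             # combo is a set of ELCA witness nodes of u
    return result

# Hypothetical match lists (toy data, not the exact lists of Figure 4.1).
S1 = [(0, 0, 0), (0, 1, 1, 1, 0), (0, 2, 0, 0, 0)]
S2 = [(0, 1, 1, 2, 0), (0, 2, 0, 0, 1), (0, 3, 0)]
print(sorted(elca(S1, S2)))  # [(0,), (0, 1, 1), (0, 2, 0, 0)]
```

For these toy lists, the root 0 is an ELCA (witnessed by the hypothetical nodes 0.0.0 and 0.3.0) but not an SLCA, which shows how the two semantics can differ.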
Definition 4.6 Compact LCA (CLCA) Given l nodes, v1 ∈ S1, · · · , vl ∈ Sl, and u = lca(v1, · · · , vl), u is said to dominate vi if u = slca(S1, · · · , Si−1, {vi}, Si+1, · · · , Sl). u is a CLCA with respect to v1, · · · , vl if u dominates every vi for i ∈ [1, l].
The concepts of dominance and CLCA are closely related to the SLCA nodes slca(S1, · · · , Sl) and ELCA nodes elca(S1, · · · , Sl), respectively. Actually, the set of CLCA nodes is exactly the set of ELCA nodes, as shown by the following theorem.
Theorem 4.7 Given l nodes, v1 ∈ S1, · · · , vl ∈ Sl, u = lca(v1, · · · , vl) is a CLCA with respect to v1, · · · , vl, if and only if u ∈ elca(S1, · · · , Sl) with v1, · · · , vl as witness nodes.
Proof First, we prove ⇒ by contradiction. Let u be a CLCA w.r.t. v1, · · · , vl. Assume that u is not an ELCA with v1, · · · , vl as witness nodes; then there must exist an i ∈ [1, l] and an x ∈ lca(S1, · · · , Sl) with child(u, vi) ⪯ x. Then child(u, vi) ⪯ slca(S1, · · · , Si−1, {vi}, · · · , Sl), which means that u ≠ slca(S1, · · · , Si−1, {vi}, · · · , Sl), i.e., u does not dominate vi, contradicting the assumption that u is a CLCA.
Second, we prove ⇐ by contradiction. Let u be an ELCA with witness nodes v1, · · · , vl. Assume that u is not a CLCA with respect to v1, · · · , vl; then there must exist an i ∈ [1, l] such that u ≠ slca(S1, · · · , Si−1, {vi}, Si+1, · · · , Sl). Then child(u, vi) ⪯ slca(S1, · · · , {vi}, · · · , Sl), which is a contradiction, because slca(S1, · · · , {vi}, · · · , Sl) ∈ lca(S1, · · · , Sl) and thus vi cannot be an ELCA witness node of u.
Theorem 4.8 [Xu and Papakonstantinou, 2008] The relationship between LCA nodes, SLCA nodes, and ELCA nodes of l sets S1, · · · , Sl is slca(S1, · · · , Sl) ⊆ elca(S1, · · · , Sl) ⊆ lca(S1, · · · , Sl).
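As a quick sanity check (our own, using the toy sets produced by the earlier sketches, with their printed outputs copied in by hand), the containment stated by the theorem indeed holds for them.

```python
# Illustration only: checking Theorem 4.8 on the toy sets from the earlier sketches.
lca_set  = {(0,), (0, 1, 1), (0, 2, 0, 0)}
slca_set = {(0, 1, 1), (0, 2, 0, 0)}
elca_set = {(0,), (0, 1, 1), (0, 2, 0, 0)}
assert slca_set <= elca_set <= lca_set   # slca ⊆ elca ⊆ lca holds here
print("containment holds")
```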
Given a list of l keywords Q = {k1, · · · , kl} and an input XML tree T, the problem is to find a set of answers, where each answer is a subtree of T with a root node and a set of match nodes M; the subtree should have at least one match node for each keyword in Q (assume that M = {v1, · · · , vm}).
In most of the works in the literature, there exists an inverted index of Dewey IDs for each keyword. Using the inverted index, for an l-keyword query, it is possible to get l lists S1, · · · , Sl. Each Si (1 ≤ i ≤ l) contains the Dewey IDs of the nodes that contain keyword ki. We use |S| to denote the maximum size of the Si's, i.e., |S| = max1≤i≤l |Si|. The algorithms work on the l lists S1, · · · , Sl. Below, we also use slca(Q) and elca(Q) to denote slca(S1, · · · , Sl) and elca(S1, · · · , Sl), respectively. Note that each list Si is sorted in increasing Dewey ID order.
In the following, we use d to denote the height of the XML tree, i.e., d is the maximum length of all the Dewey IDs of the nodes in the XML tree. Given two nodes u and v with their Dewey IDs, we can find lca(u, v) in time O(d), based on the fact that lca(u, v) has a Dewey ID that is equal to the longest common prefix of pre(u) and pre(v). Note that lca(u, v) exists for any two nodes in the same XML tree. In the following, ⊥ denotes a null node (value). Note that the preorder and postorder relationships between u and ⊥ are not defined.
We first discuss some primitive functions used by the algorithms that we will present later. Assume that each set S is sorted in increasing order of Dewey ID.
• Function lm(v, S) returns the left match of v in S, i.e., the node in S with the largest Dewey ID that is less than or equal to pre(v): lm(v, S) = arg max u∈S: u≤v pre(u). It returns ⊥ when there is no left match node. The cost of the function is O(d log |S|), since it can be implemented by a binary search over S, where it takes O(d) time to compare two Dewey IDs.
• Function rm(v, S) returns the right match of v in S, i.e., the node in S with the smallest Dewey ID that is greater than or equal to pre(v): rm(v, S) = arg min u∈S: u≥v pre(u). It returns ⊥ when there is no right match node, and its cost is also O(d log |S|).
• Function removeAncestor(S) removes from S every node that is an ancestor of another node in S, i.e., removeAncestor(S) = {v ∈ S | ∄u ∈ S : v ≺ u}. The cost of removeAncestor is O(d|S|), since S is sorted in increasing Dewey ID order.
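These primitives are straightforward to realize over sorted lists. The sketch below is a hedged Python illustration (the function names and the list representation are our own choices), using binary search for lm and rm and a single stack-based pass for removeAncestor.

```python
# Illustration only: the primitives over lists of Dewey-ID tuples sorted in
# increasing (document) order; None plays the role of the null node ⊥.
from bisect import bisect_left, bisect_right

def lm(v, S):
    """Left match: the largest node in S with Dewey ID <= pre(v), else None."""
    i = bisect_right(S, v)
    return S[i - 1] if i > 0 else None

def rm(v, S):
    """Right match: the smallest node in S with Dewey ID >= pre(v), else None."""
    i = bisect_left(S, v)
    return S[i] if i < len(S) else None

def remove_ancestor(S):
    """Drop every node of S that is an ancestor of another node in S."""
    out = []
    for v in S:
        # In sorted order, only the most recently kept node can be an ancestor of v.
        while out and len(out[-1]) < len(v) and v[:len(out[-1])] == out[-1]:
            out.pop()
        out.append(v)
    return out

S = [(0, 0, 0), (0, 1), (0, 1, 1, 1, 0), (0, 2, 0, 0)]
print(lm((0, 1, 1), S))    # (0, 1)
print(rm((0, 1, 1), S))    # (0, 1, 1, 1, 0)
print(remove_ancestor(S))  # [(0, 0, 0), (0, 1, 1, 1, 0), (0, 2, 0, 0)]
```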
With the Dewey IDs, comparing two nodes takes O(d) time, and computing the lca of two nodes also takes O(d) time. Note that there exists another encoding for XML trees, called interval encoding, where each node is assigned a triple (start, end, level): start is the sequence number assigned during a preorder traversal, end is the largest start value among the nodes in the subtree rooted at that node, and level is the level of the node in the XML tree. Using interval encoding, comparing two nodes takes O(1) time, and it also takes O(1) time to check whether u is an ancestor of v or whether u is the parent of v for two nodes u and v. But most of the works in the literature use only Dewey IDs to encode nodes, so in the following, we only consider the Dewey ID encoding, where comparing two nodes takes O(d) time.
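For completeness, here is a small sketch (our own, under the assumptions just described) of how interval encoding supports constant-time ancestor and parent tests; the record layout (start, end, level) follows the description above.

```python
# Illustration only: constant-time tests with interval encoding (start, end, level).
from dataclasses import dataclass

@dataclass
class IntervalNode:
    start: int   # preorder sequence number
    end: int     # largest start value in the subtree rooted at this node
    level: int   # depth of the node in the XML tree

def is_ancestor(u: IntervalNode, v: IntervalNode) -> bool:
    """u is a proper ancestor of v, checked in O(1)."""
    return u.start < v.start <= u.end

def is_parent(u: IntervalNode, v: IntervalNode) -> bool:
    """u is the parent of v: an ancestor exactly one level above v."""
    return is_ancestor(u, v) and u.level == v.level - 1

# A root (level 0) with two leaf children (level 1).
root   = IntervalNode(start=0, end=2, level=0)
child1 = IntervalNode(start=1, end=1, level=1)
child2 = IntervalNode(start=2, end=2, level=1)
print(is_ancestor(root, child2), is_parent(root, child1))  # True True
```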
4.2 SLCA-BASED SEMANTICS
Each node in an XML tree can be regarded as representing an entity in the world. If u is an ancestor of v, then we may understand that the entity represented by v belongs to the entity represented by u; for example, the entity represented by 0.1.1 (Class) belongs to the entity represented by 0 (School). For a keyword query, it is more desirable to return the most specific entities that contain all the keywords, i.e., among all the returned entities, there should not exist any ancestor-descendant relationship between the root nodes that represent the entities.
In this section, we first show some properties of the slca function, which are essential for designing efficient algorithms. Then three efficient algorithms with different characteristics are shown to compute slca(Q).