Algorithm 10 Block-Pipelined (the keyword query Q, the top-k value k, the CN C)
1: topk ← ∅; Q ← ∅
3: Q.push(c, uscore(c))
4: while Q.max-uscore > score(topk[k], Q) do
6: if c.status = USCORE then
9: for i = 1 to s do
12: Q.push(c, uscore(c))
15: else
17: output topk
In the above discussions, for an l-keyword query on a relational database, each result is an MTJNT. This is referred to as the connected tree semantics. There are two other semantics to answer an l-keyword query on a relational database, namely distinct root semantics and distinct core semantics. In this section, we will focus on how to answer keyword queries using an RDBMS given the schema graph. In the next chapter, we will further discuss how to answer keyword queries under different semantics on a schema-free database graph.
Distinct Root Semantics: An l-keyword query finds a collection of tuples that contain all the keywords and that are reachable from a root tuple (center) within a user-given distance (Dmax). The distinct root semantics implies that the same root tuple determines the tuples uniquely [Dalvi et al., 2008; He et al., 2007; Hristidis et al., 2008; Li et al., 2008a; Qin et al., 2009a]. Suppose that there is a result rooted at tuple t_r. For any of the l keywords, say k_i, there is a tuple t in the result that satisfies the following conditions: (1) t contains keyword k_i, (2) among all tuples that contain k_i, the distance between t and t_r is minimum (if there is a tie, a tuple is selected according to a predefined order among tuples in practice), and (3) the minimum distance between t and t_r must be less than or equal to a user-given parameter Dmax.
are the nodes shown at the top, and all root nodes are distinct. For example, the rightmost result in
2.4 OTHER KEYWORD SEARCH SEMANTICS
Figure 2.16: Distinct Root/Core Semantics. (a) Distinct Root (Q = {Michelle, XML}, Dmax = 2); (b) Distinct Core (Q = {Michelle, XML}, Dmax = 2)
Figure 2.16(a) shows that two nodes, a3 (containing “Michelle”) and p2 (containing “XML”), are reachable from the root node p4 within Dmax = 2. Under the distinct root semantics, the rightmost result can be output as a set (p4, a3, p2), where the connections from the root node (p4) to the two nodes can be ignored as discussed in BLINKS [He et al., 2007].
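The three conditions of the distinct root semantics can be sketched as a bounded breadth-first search from a candidate root. The sketch below is only an illustration: the edge list is an assumed toy fragment around p4 (two-hop paths to a3 and p2, consistent with Dmax = 2 in the text), and the function names and the tie-break by tuple identifier are hypothetical choices, not the method of any cited system.

```python
from collections import deque

# Assumed toy fragment of the tuple graph around root p4 (not the full database).
EDGES = [("p4", "w5"), ("w5", "a3"), ("p4", "c5"), ("c5", "p2")]
# Keywords contained in each tuple (only the tuples mentioned in the text).
KEYWORDS = {"a3": {"Michelle"}, "p2": {"XML"}}


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distinct_root_result(adj, keywords, root, query, dmax):
    """Return {keyword: (tuple_id, distance)} for `root`, or None when some
    keyword has no matching tuple within distance dmax of the root."""
    best = {}                      # keyword -> (distance, tuple_id)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        d = dist[node]
        for kw in query:
            if kw in keywords.get(node, ()):
                cand = (d, node)   # nearest first, ties broken by tuple id
                if kw not in best or cand < best[kw]:
                    best[kw] = cand
        if d < dmax:               # do not expand beyond the radius
            for nxt in sorted(adj.get(node, ())):
                if nxt not in dist:
                    dist[nxt] = d + 1
                    queue.append(nxt)
    if set(best) != set(query):
        return None
    return {kw: (node, d) for kw, (d, node) in best.items()}
```

Calling `distinct_root_result(build_adjacency(EDGES), KEYWORDS, "p4", ["Michelle", "XML"], 2)` on this toy fragment yields a3 for “Michelle” and p2 for “XML”, which corresponds to the set (p4, a3, p2) described above.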
Distinct Core Semantics: An l-keyword query finds multi-center subgraphs, called communities [Qin et al., 2009a,b]. A community, C_i(V, E), is specified as follows. V is a union of three subsets of tuples, V = V_c ∪ V_k ∪ V_p, where V_k is a set of keyword-tuples, in which a keyword-tuple v_k ∈ V_k contains at least one keyword, and all l keywords in the given l-keyword query must appear in at least one keyword-tuple in V_k; V_c is a set of center-tuples, where there exists at least one sequence of connections between v_c ∈ V_c and every v_k ∈ V_k such that dis(v_c, v_k) ≤ Dmax; and V_p is a set of path-tuples that appear on a shortest sequence of connections from a center-tuple v_c ∈ V_c to a keyword-tuple v_k ∈ V_k if dis(v_c, v_k) ≤ Dmax. Note that a tuple may serve several roles as keyword/center/path tuples in a community. E is a set of connections for every pair of tuples in V if they are connected over shortest paths from nodes in V_c to nodes in V_k. A community, C_i, is uniquely determined by the set of keyword tuples, V_k, which is called the core of the community, and denoted as core(C_i).
four unique cores are (a3, p2), (a3, p3), (p1, p2), and (p1, p3), for the four communities from left to right, respectively. The multi-centers for each of the communities are shown at the top. For example, for the rightmost community, the two centers are p2 and c2.
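The definition of V_c and V_p can be checked mechanically on a small graph. The following is a minimal sketch, assuming an undirected toy graph for the rightmost community: the paths p2.c1.p1 and p2.c3.p3 are the ones named in this section, while the edges c2-p1 and c2-p3 and all function names are assumptions made for illustration. A tuple x is treated as lying on a shortest path from center c to keyword-tuple k when dis(c, x) + dis(x, k) = dis(c, k).

```python
from collections import deque

# Assumed toy graph for one community (edges through c2 are an assumption).
EDGES = [("p2", "c1"), ("c1", "p1"), ("p2", "c3"), ("c3", "p3"),
         ("c2", "p1"), ("c2", "p3")]


def bfs_distances(adj, source, dmax):
    """Shortest distances from `source` to every tuple within dmax hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == dmax:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def community(edges, core, dmax):
    """Return (centers, nodes) for the community whose core is `core`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Distances from each keyword-tuple (the graph is undirected).
    from_kw = {k: bfs_distances(adj, k, dmax) for k in core}
    # V_c: tuples within dmax of every keyword-tuple in the core.
    centers = {v for v in adj if all(v in from_kw[k] for k in core)}
    # V: every tuple on some shortest center-to-keyword path (endpoints included).
    nodes = set()
    for c in centers:
        from_c = bfs_distances(adj, c, dmax)
        for k in core:
            total = from_kw[k][c]
            for x, d in from_c.items():
                if x in from_kw[k] and d + from_kw[k][x] == total:
                    nodes.add(x)
    return centers, nodes
```

On this assumed graph, `community(EDGES, ("p1", "p3"), 2)` returns the centers p1, p2, p3, and c2, and a node set that also contains the path-tuples c1 and c3.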
It is important to note that the parameter Dmax used in the distinct core/root semantics is different from the parameter Tmax used in the connected tree semantics. Dmax specifies a range from a center (root tuple) in which a tuple containing a keyword can possibly be included in a result, and Tmax specifies the maximum number of nodes to be included in a result.
Distinct Core/Root in RDBMS: We outline the approach to process l-keyword queries with a radius (Dmax) based on the distinct core/root semantics. In the first step, for each keyword k_i, we compute a temporal relation, Pair_i(tid_i, dis_i, TID), with three attributes, where both TID and tid_i are TIDs and dis_i is the shortest distance between TID and tid_i (dis(TID, tid_i)), which is less than or equal to Dmax. A tuple in Pair_i indicates that the TID tuple is at shortest distance dis_i from the tid_i tuple that contains keyword k_i. In the second step, we join all temporal relations, Pair_i, for 1 ≤ i ≤ l, on attribute TID (center):
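$$
S \leftarrow \mathit{Pair}_1 \bowtie_{\mathit{Pair}_1.TID = \mathit{Pair}_2.TID} \mathit{Pair}_2 \cdots \mathit{Pair}_{l-1} \bowtie_{\mathit{Pair}_{l-1}.TID = \mathit{Pair}_l.TID} \mathit{Pair}_l \qquad (2.25)
$$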
Here, S is a (2l + 1)-attribute relation, S(TID, tid_1, dis_1, ..., tid_l, dis_l).
Over the temporal relation S, we can obtain the multi-center communities (distinct core) by grouping tuples on the l attributes tid_1, tid_2, ..., tid_l. Consider query Q = {Michelle, XML} and Dmax = 2; the rightmost community in Figure 2.16(b) is shown in Figure 2.17.
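The two steps (computing Pair_i, then joining on TID as in Eq. 2.25) can be sketched for l = 2 with SQLite. This is an illustrative sketch only: the toy edge list is assumed (the edges through c2 are not stated in the text), the Pair_i relations are filled by a bounded breadth-first search in Python rather than by the relational steps the chapter goes on to describe, and the names `pair_rows`, `build_s`, and `group_by_core` are hypothetical.

```python
import sqlite3
from collections import deque

# Assumed toy graph; only the core tuples p1 (k1) and p3 (k2) are used as sources.
EDGES = [("p2", "c1"), ("c1", "p1"), ("p2", "c3"), ("c3", "p3"),
         ("c2", "p1"), ("c2", "p3")]
KEYWORD_TUPLES = {1: ["p1"], 2: ["p3"]}


def pair_rows(adj, sources, dmax):
    """Rows (tid_i, dis_i, TID): every TID within dmax of a keyword tuple."""
    rows = []
    for src in sources:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if dist[node] == dmax:
                continue
            for nxt in adj.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        rows.extend((src, d, tid) for tid, d in dist.items())
    return rows


def build_s(edges, keyword_tuples, dmax):
    """Materialize Pair1 and Pair2, then join them on TID (Eq. 2.25, l = 2)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    con = sqlite3.connect(":memory:")
    for i in (1, 2):
        con.execute(f"create table Pair{i} (tid{i} text, dis{i} integer, TID text)")
        con.executemany(f"insert into Pair{i} values (?, ?, ?)",
                        pair_rows(adj, keyword_tuples[i], dmax))
    rows = con.execute(
        "select Pair1.TID, tid1, dis1, tid2, dis2 "
        "from Pair1 join Pair2 on Pair1.TID = Pair2.TID "
        "order by Pair1.TID, tid1, tid2").fetchall()
    con.close()
    return rows


def group_by_core(s_rows):
    """Group the centers (TID) of S by the core (tid1, tid2)."""
    groups = {}
    for tid, tid1, _dis1, tid2, _dis2 in s_rows:
        groups.setdefault((tid1, tid2), []).append(tid)
    return groups
```

With Dmax = 2, `group_by_core(build_s(EDGES, KEYWORD_TUPLES, 2))` maps the core (p1, p3) to the centers c2, p1, p2, and p3 on this assumed graph; the distance values are a consequence of the assumed edges and should not be read as the contents of Figure 2.17.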
TID tid1 dis1 tid2 dis2
Figure 2.17: A Multi-Center Community
Here, the distinct core consists of p1 and p3, where p1 contains keyword “Michelle” (k_1) and p3 contains keyword “XML” (k_2), and the four centers, {p1, p2, p3, c2}, are listed in the TID column. Any center can reach all the tuples in the core, {p1, p3}, within Dmax. The above does not explicitly include the two nodes, c1 and c3, in the rightmost community in Figure 2.16(b), which can be maintained in an additional attribute by concatenating the TIDs, for example, p2.c1.p1 and p2.c3.p3. In a similar fashion, over the same temporal relation S, we can also obtain the distinct root results by grouping tuples on the attribute TID. Consider query Q = {Michelle, XML} and Dmax = 2; the rightmost result in Figure 2.16(a) is shown in Figure 2.18.
TID tid1 dis1 tid2 dis2
Figure 2.18: A Distinct Root Result
Figure 2.19: Distinct Core (left) and Distinct Root (right) (Q = {Michelle, XML}, Dmax = 2)
The distinct root is represented by the TID, and the rightmost result in Figure 2.16(a) is the first of the two tuples, where a3 contains keyword “Michelle” (k_1) and p2 contains keyword “XML” (k_2). Note that a distinct root means a result is uniquely determined by the root. As shown above, there are two tuples with the same root p4. We select one of them using the aggregate function min. The complete distinct core/root results are given in Figure 2.19, for the same query and Dmax.
Both tables have an attribute Gid that is for easy reference of the distinct core/root results. The right table shows the same content as the left table, grouped on TID, with the shadowed tuples removed using the SQL aggregate function min to ensure the distinct root semantics.
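One way to read the min-based selection is: group the tuples of S on TID and, for each keyword, keep the keyword tuple with the minimum (distance, tuple identifier) pair. The sketch below follows that reading in plain Python; it is an assumption about how the selection could be carried out, not a transcription of the SQL used to produce Figure 2.19, and the sample rows are hypothetical apart from the root p4 with a3 and p2.

```python
def distinct_root(s_rows, l):
    """s_rows: tuples (TID, tid_1, dis_1, ..., tid_l, dis_l).
    Returns one tuple per TID, keeping for each keyword the nearest
    keyword tuple (ties broken by the smaller tuple identifier)."""
    best = {}
    for row in s_rows:
        root = row[0]
        slots = best.setdefault(root, [None] * l)
        for i in range(l):
            # (dis_i, tid_i): compare by distance first, then by identifier.
            cand = (row[2 + 2 * i], row[1 + 2 * i])
            if slots[i] is None or cand < slots[i]:
                slots[i] = cand
    result = []
    for root in sorted(best):
        flat = [root]
        for dis, tid in best[root]:
            flat += [tid, dis]
        result.append(tuple(flat))
    return result


# Hypothetical S tuples: two share the root p4, so only one may survive.
S_ROWS = [
    ("p4", "a3", 2, "p2", 2),
    ("p4", "a3", 2, "p3", 2),
    ("w4", "a3", 1, "p2", 1),
]
```

Here `distinct_root(S_ROWS, 2)` keeps a single tuple for p4, the one pairing a3 with p2, so each root determines exactly one result.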
Naive Algorithms: Figure 2.20 outlines the two main steps for processing the distinct core/root
schema graph, G_S, is in Figure 2.1, and the database is in Figure 2.2. In Figure 2.20, the left side computes the Pair_1 and Pair_2 temporal relations, for keywords k_1 = “Michelle” and k_2 = “XML”, using projects, joins, unions, and group-by, and the right side joins Pair_1 and Pair_2 to compute the S relation (Eq. 2.25).
Figure 2.20: An Overview (R1, R2, R3, and R4 represent Author, Write, Paper, and Cite relations in Example 2.1)
Let R1, R2, R3, and R4 represent Author, Write, Paper, and Cite relations. The Pair_1 for the keyword k_1 is produced in the following steps.
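$$
\begin{aligned}
P_{0,1} &\leftarrow \Pi_{TID \rightarrow tid_1,\ 0 \rightarrow dis_1,\ *}(\sigma_{contain(k_1)} R_1) \\
P_{0,2} &\leftarrow \Pi_{TID \rightarrow tid_1,\ 0 \rightarrow dis_1,\ *}(\sigma_{contain(k_1)} R_2) \\
P_{0,3} &\leftarrow \Pi_{TID \rightarrow tid_1,\ 0 \rightarrow dis_1,\ *}(\sigma_{contain(k_1)} R_3) \\
P_{0,4} &\leftarrow \Pi_{TID \rightarrow tid_1,\ 0 \rightarrow dis_1,\ *}(\sigma_{contain(k_1)} R_4)
\end{aligned}
\qquad (2.26)
$$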
Here σ_{contain(k_1)} R_j selects the tuples in R_j that contain the keyword k_1. Let R'_j ← σ_{contain(k_1)} R_j; then Π_{TID→tid_1, 0→dis_1, ∗}(R'_j) projects tuples from R'_j with all attributes (∗) and further adds two attributes: tid_1, a renamed copy of the attribute TID, and a new attribute dis_1 with an initial value of zero (this is supported in SQL). For example, Π_{TID→tid_1, 0→dis_1, ∗}(σ_{contain(k_1)} R1) is translated into the following SQL:
select TID as tid1, 0 as dis1, TID, Name from Author as R1 where contain(Name, 'Michelle')
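The statement above can be tried out with SQLite from Python. This is a minimal sketch under stated assumptions: `contain` is not a built-in SQLite function, so a `LIKE` substring match stands in for it; the Author rows are placeholder data (only a3 containing “Michelle” comes from the text); and the function name `p0_author` is hypothetical.

```python
import sqlite3

# Placeholder Author tuples; only a3 / "Michelle" is taken from the text.
AUTHORS = [("a1", "Alice"), ("a2", "Bob"), ("a3", "Michelle")]


def p0_author(rows, keyword):
    """Compute P_{0,1}(tid1, dis1, TID, Name) for the Author relation."""
    con = sqlite3.connect(":memory:")
    con.execute("create table Author (TID text primary key, Name text)")
    con.executemany("insert into Author values (?, ?)", rows)
    # LIKE is an assumed stand-in for the contain() predicate.
    cur = con.execute(
        "select TID as tid1, 0 as dis1, TID, Name "
        "from Author as R1 "
        "where R1.Name like '%' || ? || '%' "
        "order by TID",
        (keyword,))
    result = cur.fetchall()
    con.close()
    return result
```

For the placeholder data, `p0_author(AUTHORS, "Michelle")` returns the single tuple (a3, 0, a3, Michelle): the tuple a3 is at distance 0 from itself, the tuple containing the keyword.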
The meaning of the temporal relation P_{0,1}(tid_1, dis_1, TID, Name) is a set of R1 relation tuples (identified by TID) that are at distance dis_1 = 0 from the tuples (identified by tid_1) containing keyword k_1 = “Michelle”. The same is true for the other P_{0,j} temporal relations as well. After P_{0,j} are computed, 1 ≤ j ≤ 4, we compute P_{1,j} followed by P_{2,j} to obtain R_j relation tuples that are at distance 1 and distance 2 from the tuples containing keyword k_1 = “Michelle” (Dmax = 2). Note that relation P_{d,j} contains the set of tuples of R_j that are at distance d from a tuple containing a