Keyword Search in Databases (Part 8)



Algorithm 10 Block-Pipelined (the keyword query Q, the top-k value k, the CN C)
1: topk ← ∅; Q ← ∅
3: Q.push(c, uscore(c))
4: while Q.max-uscore > score(topk[k], Q) do
6: if c.status = USCORE then
9: for i = 1 to s do
12: Q.push(c, uscore(c))
15: else
17: output topk

In the above discussions, for an l-keyword query on a relational database, each result is an MTJNT. This is referred to as the connected tree semantics. There are two other semantics to answer an l-keyword query on a relational database, namely the distinct root semantics and the distinct core semantics. In this section, we focus on how to answer keyword queries using an RDBMS given the schema graph. In the next chapter, we will further discuss how to answer keyword queries under different semantics on a schema-free database graph.

Distinct Root Semantics: An l-keyword query finds a collection of tuples that contain all the keywords and that are reachable from a root tuple (center) within a user-given distance (Dmax). The distinct root semantics implies that the same root tuple determines the tuples uniquely [Dalvi et al., 2008; He et al., 2007; Hristidis et al., 2008; Li et al., 2008a; Qin et al., 2009a]. Suppose that there is a result rooted at tuple $t_r$. For any of the l keywords, say $k_i$, there is a tuple t in the result that satisfies the following conditions: (1) t contains keyword $k_i$; (2) among all tuples that contain $k_i$, the distance between t and $t_r$ is minimum (if there is a tie, a tuple is selected according to a predefined order among tuples in practice); and (3) the minimum distance between t and $t_r$ is less than or equal to a user-given parameter Dmax.

In Figure 2.16(a), the root nodes are the nodes shown at the top, and all root nodes are distinct.


2.4 OTHER KEYWORD SEARCH SEMANTICS 35

Figure 2.16: Distinct Root/Core Semantics. (a) Distinct Root (Q = {Michelle, XML}, Dmax = 2); (b) Distinct Core (Q = {Michelle, XML}, Dmax = 2)

For example, in the rightmost result of Figure 2.16(a), two nodes, a3 (containing “Michelle”) and p2 (containing “XML”), are reachable from the root node p4 within Dmax = 2. Under the distinct root semantics, the rightmost result can be output as a set (p4, a3, p2), where the connections from the root node (p4) to the two nodes can be ignored, as discussed in BLINKS [He et al., 2007].
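The three conditions of the distinct root semantics can be illustrated with a minimal Python sketch. It is not the method of any of the cited systems: the graph, the tuple identifiers, and the helper names (`build_adjacency`, `distances_within`, `distinct_root_results`) are hypothetical, the graph is treated as undirected, and ties are broken by the lexicographic order of tuple identifiers as one possible "predefined order".

```python
from collections import deque

# Hypothetical toy data (not the database of Figure 2.2): undirected
# connections between tuples, and the keywords each tuple contains.
EDGES = [("a1", "w1"), ("w1", "p1"), ("p1", "c1"),
         ("c1", "p2"), ("p2", "w2"), ("w2", "a2")]
CONTAINS = {"a1": {"Michelle"}, "a2": {"Michelle"}, "p2": {"XML"}}


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances_within(adj, source, dmax):
    """Breadth-first search that stops expanding at distance dmax."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == dmax:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distinct_root_results(adj, contains, keywords, dmax):
    """Map each qualifying root to its nearest (tuple, distance) per keyword."""
    results = {}
    for root in sorted(adj):
        dist = distances_within(adj, root, dmax)
        picks = []
        for kw in keywords:
            candidates = [(d, t) for t, d in dist.items()
                          if kw in contains.get(t, set())]
            if not candidates:
                break  # this keyword is not reachable within dmax
            d, t = min(candidates)  # nearest tuple; ties broken by tuple id
            picks.append((t, d))
        else:
            results[root] = picks
    return results


ADJ = build_adjacency(EDGES)
RESULTS = distinct_root_results(ADJ, CONTAINS, ["Michelle", "XML"], dmax=2)
```

Each key of `RESULTS` is a distinct root, so no root can appear in two results, which is the defining property of this semantics.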

Distinct Core Semantics: An l-keyword query finds multi-center subgraphs, called communities [Qin et al., 2009a,b]. A community, $C_i(V, E)$, is specified as follows. V is a union of three subsets of tuples, $V = V_c \cup V_k \cup V_p$, where $V_k$ is a set of keyword-tuples, such that a keyword-tuple $v_k \in V_k$ contains at least one keyword and all l keywords in the given l-keyword query appear in at least one keyword-tuple in $V_k$; $V_c$ is a set of center-tuples, such that there exists at least one sequence of connections between $v_c \in V_c$ and every $v_k \in V_k$ with $dis(v_c, v_k) \le$ Dmax; and $V_p$ is a set of path-tuples that appear on a shortest sequence of connections from a center-tuple $v_c \in V_c$ to a keyword-tuple $v_k \in V_k$ if $dis(v_c, v_k) \le$ Dmax. Note that a tuple may serve several roles as keyword/center/path tuples in a community. E is a set of connections for every pair of tuples in V if they are connected over shortest paths from nodes in $V_c$ to nodes in $V_k$. A community, $C_i$, is uniquely determined by the set of keyword-tuples, $V_k$, which is called the core of the community and denoted as core($C_i$).

In Figure 2.16(b), the four unique cores are (a3, p2), (a3, p3), (p1, p2), and (p1, p3), for the four communities from left to right, respectively. The multi-centers for each of the communities are shown at the top. For example, for the rightmost community, the two centers are p2 and c2.
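The definition of a community can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the chain-shaped graph and the helper names (`bfs`, `communities`) are hypothetical, a core is taken to hold exactly one keyword-tuple per keyword, and path-tuples are reported after removing tuples that already act as centers or keyword-tuples. A tuple x lies on a shortest path from center c to keyword-tuple k exactly when dis(c, x) + dis(x, k) = dis(c, k).

```python
from collections import deque
from itertools import product

# Hypothetical chain p1 - c1 - p2 - c2 - p3 (not the data of Figure 2.16).
ADJ = {
    "p1": {"c1"},
    "c1": {"p1", "p2"},
    "p2": {"c1", "c2"},
    "c2": {"p2", "p3"},
    "p3": {"c2"},
}
# One list of keyword-tuples per keyword: "Michelle" first, then "XML".
KEYWORD_TUPLES = [["p1"], ["p2", "p3"]]


def bfs(adj, source):
    """Shortest distances from source to every reachable tuple."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def communities(adj, keyword_tuples, dmax):
    """Map each core to its centers and its remaining path-tuples."""
    dist = {u: bfs(adj, u) for u in adj}
    inf = float("inf")
    found = {}
    for core in product(*keyword_tuples):
        # A center reaches every keyword-tuple of the core within dmax.
        centers = [c for c in sorted(adj)
                   if all(dist[c].get(k, inf) <= dmax for k in core)]
        if not centers:
            continue
        # Tuples lying on some shortest path from a center to a core tuple.
        on_path = {x for c in centers for k in core for x in adj
                   if dist[c].get(x, inf) + dist[x].get(k, inf) == dist[c][k]}
        found[core] = {
            "centers": centers,
            "paths": sorted(on_path - set(centers) - set(core)),
        }
    return found


COMMUNITIES = communities(ADJ, KEYWORD_TUPLES, dmax=2)
```

Keying the result by the core mirrors the statement that a community is uniquely determined by its set of keyword-tuples.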


It is important to note that the parameter Dmax used in the distinct core/root semantics is different from the parameter Tmax used in the connected tree semantics. Dmax specifies a range from a center (root tuple) within which a tuple containing a keyword can be included in a result, whereas Tmax specifies the maximum number of nodes to be included in a result.

Distinct Core/Root in RDBMS: We outline the approach to process l-keyword queries with a radius (Dmax) based on the distinct core/root semantics. In the first step, for each keyword $k_i$, we compute a temporal relation, $Pair_i(tid_i, dis_i, TID)$, with three attributes, where both TID and $tid_i$ are TIDs and $dis_i$ is the shortest distance between TID and $tid_i$, $dis(TID, tid_i)$, which is less than or equal to Dmax. A tuple in $Pair_i$ indicates that the TID tuple is at the shortest distance of $dis_i$ from the $tid_i$ tuple that contains keyword $k_i$. In the second step, we join all temporal relations, $Pair_i$, for $1 \le i \le l$, on attribute TID (center):

$$S \leftarrow Pair_1 \bowtie_{Pair_1.TID = Pair_2.TID} Pair_2 \cdots Pair_{l-1} \bowtie_{Pair_{l-1}.TID = Pair_l.TID} Pair_l \qquad (2.25)$$

Here, S is a (2l + 1)-attribute relation, $S(TID, tid_1, dis_1, \cdots, tid_l, dis_l)$.
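As a rough illustration of the second step for l = 2, the following Python sketch runs the join of Eq. 2.25 in an in-memory SQLite database. The rows of `Pair1` and `Pair2` are hypothetical (they correspond to a chain p1 - c1 - p2 - c2 - p3 with Dmax = 2, where p1 contains the first keyword and p2 and p3 contain the second), and the use of SQLite is an assumption made only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pair1 (tid1 TEXT, dis1 INTEGER, TID TEXT);
CREATE TABLE Pair2 (tid2 TEXT, dis2 INTEGER, TID TEXT);
""")

# Hypothetical rows: (keyword tuple, shortest distance, reachable tuple).
conn.executemany("INSERT INTO Pair1 VALUES (?, ?, ?)", [
    ("p1", 0, "p1"), ("p1", 1, "c1"), ("p1", 2, "p2"),
])
conn.executemany("INSERT INTO Pair2 VALUES (?, ?, ?)", [
    ("p2", 0, "p2"), ("p2", 1, "c1"), ("p2", 1, "c2"),
    ("p2", 2, "p1"), ("p2", 2, "p3"),
    ("p3", 0, "p3"), ("p3", 1, "c2"), ("p3", 2, "p2"),
])

# Join the two Pair relations on the center attribute TID (Eq. 2.25, l = 2).
S = conn.execute("""
SELECT Pair1.TID, tid1, dis1, tid2, dis2
FROM Pair1 JOIN Pair2 ON Pair1.TID = Pair2.TID
ORDER BY Pair1.TID, tid1, tid2
""").fetchall()
```

Only tuples that appear as TID in every `Pair` relation survive the join, which is how the join enforces that a center reaches all keywords within Dmax.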

Over the temporal relation S, we can obtain the multi-center communities (distinct core) by grouping tuples on the l attributes $tid_1, tid_2, \cdots, tid_l$. Consider query Q = {Michelle, XML} and Dmax = 2. The rightmost community in Figure 2.16(b) is shown in Figure 2.17.

Figure 2.17: A Multi-Center Community (attributes: TID, tid1, dis1, tid2, dis2)

Here, the distinct core consists of p1 and p3, where p1 contains keyword “Michelle” ($k_1$) and p3 contains keyword “XML” ($k_2$), and the four centers, {p1, p2, p3, c2}, are listed in the TID column. Any center can reach all the tuples in the core, {p1, p3}, within Dmax. The above does not explicitly include the two nodes, c1 and c3, in the rightmost community in Figure 2.16(b), which can be maintained in an additional attribute by concatenating the TIDs, for example, p2.c1.p1 and p2.c3.p3. In a similar fashion, over the same temporal relation S, we can also obtain the distinct root results by grouping tuples on the attribute TID. For query Q = {Michelle, XML} and Dmax = 2, the rightmost result in Figure 2.16(a) is shown in Figure 2.18.

Figure 2.18: A Distinct Root Result (attributes: TID, tid1, dis1, tid2, dis2)

The distinct root is represented by the TID, and the rightmost result in Figure 2.16(a) is the first of the two tuples, where a3 contains keyword “Michelle” ($k_1$) and p2 contains keyword “XML” ($k_2$). Note that a distinct root means a result is uniquely determined by the root. As shown above, there are two tuples with the same root p4. We select one of them using the aggregate function min. The complete distinct core/root results are given in Figure 2.19, for the same query Q = {Michelle, XML} and Dmax = 2.

Figure 2.19: Distinct Core (left) and Distinct Root (right) (Q = {Michelle, XML}, Dmax = 2)

Both tables have an attribute Gid that is for easy reference of the distinct core/root results. The right table shows the same content as the left table, grouped on TID, in which the shaded tuples are removed using the SQL aggregate function min to ensure the distinct root semantics.
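The two groupings over S can be sketched in a few lines of Python. The rows of `S` below are hypothetical and are not those of Figures 2.17 to 2.19, the functions are written for l = 2 only, and the way min is applied (per keyword, on the pair of distance and tuple identifier) is an assumption, since the text does not spell out the exact aggregate expression.

```python
from collections import defaultdict

# Hypothetical S rows: (TID, tid1, dis1, tid2, dis2).
S = [
    ("c1", "p1", 1, "p2", 1),
    ("p1", "p1", 0, "p2", 2),
    ("p2", "p1", 2, "p2", 0),
    ("p2", "p1", 2, "p3", 2),  # second tuple with the same root p2
]


def distinct_cores(rows):
    """Group on (tid1, tid2): each core maps to its list of centers."""
    groups = defaultdict(list)
    for tid, t1, _d1, t2, _d2 in rows:
        groups[(t1, t2)].append(tid)
    return {core: sorted(centers) for core, centers in groups.items()}


def distinct_roots(rows):
    """Group on TID and keep, per keyword, the nearest keyword tuple."""
    best = {}
    for tid, t1, d1, t2, d2 in rows:
        k1, k2 = best.get(tid, ((d1, t1), (d2, t2)))
        # min over (distance, tuple id): nearest first, ties by tuple id.
        best[tid] = (min(k1, (d1, t1)), min(k2, (d2, t2)))
    return {tid: (k1[1], k1[0], k2[1], k2[0])
            for tid, (k1, k2) in best.items()}


CORES = distinct_cores(S)
ROOTS = distinct_roots(S)
```

In this toy input the root p2 occurs in two rows of `S`, and `distinct_roots` keeps only the nearer keyword tuple for "XML", so each root yields exactly one result.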

Naive Algorithms: Figure 2.20 outlines the two main steps for processing the distinct core/root semantics. The schema graph, $G_S$, is in Figure 2.1, and the database is in Figure 2.2. In Figure 2.20, the left side computes the $Pair_1$ and $Pair_2$ temporal relations, for keywords $k_1$ = “Michelle” and $k_2$ = “XML”, using projections, joins, unions, and group-bys, and the right side joins $Pair_1$ and $Pair_2$ to compute the S relation (Eq. 2.25).

Figure 2.20: An Overview, with Dmax = 2, of computing Pair1, Pair2, and S ($R_1$, $R_2$, $R_3$, and $R_4$ represent the Author, Write, Paper, and Cite relations in Example 2.1)

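Let $R_1$, $R_2$, $R_3$, and $R_4$ represent the Author, Write, Paper, and Cite relations. The $Pair_1$ relation for the keyword $k_1$ is produced in the following steps:

$$
\begin{aligned}
P_{0,1} &\leftarrow \Pi_{TID \to tid_1,\, 0 \to dis_1,\, *}(\sigma_{contain(k_1)} R_1) \\
P_{0,2} &\leftarrow \Pi_{TID \to tid_1,\, 0 \to dis_1,\, *}(\sigma_{contain(k_1)} R_2) \\
P_{0,3} &\leftarrow \Pi_{TID \to tid_1,\, 0 \to dis_1,\, *}(\sigma_{contain(k_1)} R_3) \\
P_{0,4} &\leftarrow \Pi_{TID \to tid_1,\, 0 \to dis_1,\, *}(\sigma_{contain(k_1)} R_4)
\end{aligned}
\qquad (2.26)
$$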

Here, $\sigma_{contain(k_1)} R_j$ selects the tuples in $R_j$ that contain the keyword $k_1$. Let $R'_j \leftarrow \sigma_{contain(k_1)} R_j$; then $\Pi_{TID \to tid_1,\, 0 \to dis_1,\, *}(R'_j)$ projects tuples from $R'_j$ with all attributes (∗) and further adds two attributes: the attribute TID renamed to $tid_1$, and a new attribute $dis_1$ with an initial value of zero (this is supported in SQL). For example, $\Pi_{TID \to tid_1,\, 0 \to dis_1,\, *}(\sigma_{contain(k_1)} R_1)$ is translated into the following SQL:

    select TID as tid1, 0 as dis1, TID, Name
    from Author as R1
    where contain(Name, 'Michelle')
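To try this projection on a real engine, the sketch below runs an equivalent statement in an in-memory SQLite database from Python. Several things here are assumptions rather than part of the text: the `Author` rows are made up, the function name `p0_author` is hypothetical, and the `contain` predicate is replaced by a substring test with `LIKE`, because `contain` is not a built-in SQLite function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Author (TID TEXT PRIMARY KEY, Name TEXT)")

# Hypothetical Author tuples; only a3 contains the keyword.
conn.executemany("INSERT INTO Author VALUES (?, ?)", [
    ("a1", "Charlie"),
    ("a2", "Michael"),
    ("a3", "Michelle"),
])


def p0_author(connection, keyword):
    """Compute P_{0,1}: Author tuples at distance 0 from a keyword tuple."""
    sql = (
        "SELECT TID AS tid1, 0 AS dis1, TID, Name "
        "FROM Author AS R1 "
        # LIKE stands in for the contain(...) predicate used in the text.
        "WHERE Name LIKE '%' || ? || '%'"
    )
    return connection.execute(sql, (keyword,)).fetchall()


P01 = p0_author(conn, "Michelle")
```

Passing the keyword as a bound parameter, rather than splicing it into the SQL string, keeps the same statement reusable for the other keywords of a query.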

The meaning of the temporal relation $P_{0,1}(tid_1, dis_1, TID, Name)$ is a set of $R_1$ relation tuples (identified by TID) that are at distance $dis_1 = 0$ from the tuples (identified by $tid_1$) containing keyword $k_1$ = “Michelle”. The same is true for the other $P_{0,j}$ temporal relations. After the $P_{0,j}$ are computed, for $1 \le j \le 4$, we compute $P_{1,j}$ followed by $P_{2,j}$ to obtain the $R_j$ relation tuples that are at distance 1 and distance 2 from the tuples containing keyword $k_1$ = “Michelle” (Dmax = 2). Note that relation $P_{d,j}$ contains the set of tuples of $R_j$ that are in distance d from a tuple containing a
