Keyword Search in Databases - P9



certain keyword, and its two attributes, $tid_l$ and $dis_l$, explicitly indicate that it is about keyword $k_l$.

The details of computing $P_{1,j}$ for $R_j$, $1 \le j \le 4$, are given below.

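\[
\begin{aligned}
P_{1,1} \leftarrow{} & \Pi_{P_{0,2}.TID \to tid_1,\; 1 \to dis_1,\; R_1.*}\,(P_{0,2} \bowtie_{P_{0,2}.AID = R_1.TID} R_1) \\
P_{1,2} \leftarrow{} & \Pi_{P_{0,1}.TID \to tid_1,\; 1 \to dis_1,\; R_2.*}\,(P_{0,1} \bowtie_{P_{0,1}.TID = R_2.AID} R_2) \;\cup \\
 & \Pi_{P_{0,3}.TID \to tid_1,\; 1 \to dis_1,\; R_2.*}\,(P_{0,3} \bowtie_{P_{0,3}.TID = R_2.PID} R_2) \\
P_{1,3} \leftarrow{} & \Pi_{P_{0,2}.TID \to tid_1,\; 1 \to dis_1,\; R_3.*}\,(P_{0,2} \bowtie_{P_{0,2}.PID = R_3.TID} R_3) \;\cup \\
 & \Pi_{P_{0,4}.TID \to tid_1,\; 1 \to dis_1,\; R_3.*}\,(P_{0,4} \bowtie_{P_{0,4}.PID1 = R_3.TID} R_3) \;\cup \\
 & \Pi_{P_{0,4}.TID \to tid_1,\; 1 \to dis_1,\; R_3.*}\,(P_{0,4} \bowtie_{P_{0,4}.PID2 = R_3.TID} R_3) \\
P_{1,4} \leftarrow{} & \Pi_{P_{0,3}.TID \to tid_1,\; 1 \to dis_1,\; R_4.*}\,(P_{0,3} \bowtie_{P_{0,3}.TID = R_4.PID1} R_4) \;\cup \\
 & \Pi_{P_{0,3}.TID \to tid_1,\; 1 \to dis_1,\; R_4.*}\,(P_{0,3} \bowtie_{P_{0,3}.TID = R_4.PID2} R_4)
\end{aligned}
\tag{2.27}
\]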

Here, each join/project corresponds to a foreign key reference, that is, an edge in the schema graph $G_S$. The idea is to compute $P_{d,j}$ based on $P_{d-1,i}$ if there is an edge between $R_j$ and $R_i$ in $G_S$. Consider $P_{1,3}$ for $R_3$: it is computed as the union of three joins ($P_{0,2} \bowtie R_3 \cup P_{0,4} \bowtie R_3 \cup P_{0,4} \bowtie R_3$), because there is one foreign key reference between $R_3$ (Paper) and $R_2$ (Write), and two foreign key references between $R_3$ and $R_4$ (Cite). This ensures that all $R_j$ tuples at distance $d$ from a tuple containing a keyword $k_l$ can be computed. Continuing the example, to compute $P_{2,j}$ for $R_j$, $1 \le j \le 4$, for keyword $k_1$, we replace every $P_{d,j}$ in Eq. 2.27 with $P_{d+1,j}$ and replace "$1 \to dis_1$" with "$2 \to dis_1$". The process repeats Dmax times.
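The level-by-level expansion described above can be sketched in Python. This is only an illustration under stated assumptions: the toy rows, the `FK` edge list, and the helper names `contains`, `initial`, and `expand` are hypothetical and do not come from the text. The sketch carries the keyword tuple's identifier (`tid`) forward on every hop, so the same function serves for any distance $d$.

```python
# Toy instance of the Author/Write/Paper/Cite schema; all rows are hypothetical.
R = {
    "Author": [{"TID": "a1", "Name": "Michelle"}, {"TID": "a2", "Name": "Bob"}],
    "Write": [{"TID": "w1", "AID": "a1", "PID": "p1"},
              {"TID": "w2", "AID": "a2", "PID": "p1"}],
    "Paper": [{"TID": "p1", "Title": "XML search"}, {"TID": "p2", "Title": "Graphs"}],
    "Cite": [{"TID": "c1", "PID1": "p2", "PID2": "p1"}],
}
# Schema graph edges: (referencing relation, foreign-key column, referenced relation).
FK = [("Write", "AID", "Author"), ("Write", "PID", "Paper"),
      ("Cite", "PID1", "Paper"), ("Cite", "PID2", "Paper")]


def contains(row, kw):
    """True if any non-TID attribute of the row contains the keyword."""
    return any(kw.lower() in str(v).lower() for k, v in row.items() if k != "TID")


def initial(kw):
    """P_{0,j}: tuples containing kw, tagged with tid = own TID and dis = 0."""
    return {name: [dict(row, tid=row["TID"], dis=0) for row in rows if contains(row, kw)]
            for name, rows in R.items()}


def expand(prev, d):
    """P_{d,j} from P_{d-1,*}: one join per foreign-key reference, in both directions."""
    nxt = {name: [] for name in R}
    for child, col, parent in FK:
        for p in prev[child]:            # join P_{d-1,child}.col = parent.TID
            for row in R[parent]:
                if p[col] == row["TID"]:
                    nxt[parent].append(dict(row, tid=p["tid"], dis=d))
        for p in prev[parent]:           # join P_{d-1,parent}.TID = child.col
            for row in R[child]:
                if p["TID"] == row[col]:
                    nxt[child].append(dict(row, tid=p["tid"], dis=d))
    return nxt


P0 = initial("Michelle")
P1 = expand(P0, 1)
P2 = expand(P1, 2)
```

Because `Cite` appears twice in `FK`, the Paper level receives one join per foreign key, mirroring the three-way union for $P_{1,3}$. No pruning is applied here, so a tuple can reappear at a larger distance.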

Suppose that we have computed $P_{d,j}$ for $0 \le d \le$ Dmax and $1 \le j \le 4$, for keyword $k_1$ = "Michelle". We further compute the shortest distance between an $R_j$ tuple and a tuple containing $k_1$ using union, group-by, and the SQL aggregate function min. First, we perform a projection, $P_{d,j} \leftarrow \Pi_{TID,\, tid_1,\, dis_1} P_{d,j}$, so that every $P_{d,j}$ relation has the same three attributes. Second, for $R_j$, we compute the shortest distance from an $R_j$ tuple to a tuple containing keyword $k_1$ using group-by ($\Gamma$) and the SQL aggregate function min:

\[
G_j \leftarrow {}_{TID,\, tid_1}\Gamma_{\min(dis_1)}\,(P_{0,j} \cup P_{1,j} \cup P_{2,j})
\tag{2.28}
\]

where the left side of the group-by ($\Gamma$) lists the group-by attributes, and the right side is the SQL aggregate function. Finally,

\[
Pair_1 \leftarrow G_1 \cup G_2 \cup G_3 \cup G_4
\tag{2.29}
\]

Here, $Pair_1$ records all tuples that are the shortest distance away from a tuple containing keyword $k_1$, within Dmax. Note that $G_i \cap G_j = \emptyset$ for $i \ne j$, because $G_i$ and $G_j$ are tuples identified with TIDs from the $R_i$ and $R_j$ relations, and TIDs are assumed to be unique in the database. We can compute $Pair_2$ for keyword $k_2$ = "XML" following the same procedure as indicated in Eq. 2.26 to Eq. 2.29. Once $Pair_1$ and $Pair_2$ are computed, we can easily compute distinct core/root results based on the relation $S \leftarrow Pair_1 \bowtie Pair_2$ (Eq. 2.25).
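A minimal Python sketch of the group-by with min and the final join follows. The rows are hypothetical `(TID, tid, dis)` triples and the function names `group_min` and `join_on_tid` are assumptions for illustration. Since TIDs are unique across relations, the sketch groups the rows of all relations at once, which gives the same result as taking the union of the per-relation $G_j$.

```python
from collections import defaultdict


def group_min(levels):
    """Group rows by (TID, tid) over all distance levels and keep min(dis)."""
    best = {}
    for rows in levels:
        for tid_node, tid_kw, dis in rows:
            key = (tid_node, tid_kw)
            best[key] = min(dis, best.get(key, dis))
    return best


def join_on_tid(pair1, pair2):
    """S = Pair_1 join Pair_2 on TID; rows are (TID, tid1, dis1, tid2, dis2)."""
    by_tid = defaultdict(list)
    for (tid_node, tid_kw), dis in pair2.items():
        by_tid[tid_node].append((tid_kw, dis))
    return sorted((t, k1, d1, k2, d2)
                  for (t, k1), d1 in pair1.items()
                  for k2, d2 in by_tid.get(t, []))


# Hypothetical levels d = 0, 1, 2 for keyword k1 (found in tuple a1) and k2 (found in p1).
levels_k1 = [[("a1", "a1", 0)], [("w1", "a1", 1)], [("p1", "a1", 2), ("a1", "a1", 2)]]
levels_k2 = [[("p1", "p1", 0)],
             [("w1", "p1", 1), ("w2", "p1", 1), ("c1", "p1", 1)],
             [("a1", "p1", 2), ("a2", "p1", 2), ("p2", "p1", 2), ("p1", "p1", 2)]]

pair1 = group_min(levels_k1)
pair2 = group_min(levels_k2)
S = join_on_tid(pair1, pair2)
```

Each row of `S` names a candidate center tuple together with one keyword tuple per keyword and the corresponding shortest distances.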


Algorithm 11 Pair($G_S$, $k_i$, Dmax, $R_1, \cdots, R_n$)

Input: Schema graph $G_S$, keyword $k_i$, Dmax, $n$ relations $R_1, \cdots, R_n$
Output: $Pair_i$ with 3 attributes: $TID$, $tid_i$, $dis_i$

 1: for $j = 1$ to $n$ do
 2:     $P_{0,j} \leftarrow \Pi_{R_j.TID \to tid_i,\; 0 \to dis_i,\; R_j.*}(\sigma_{contain(k_i)} R_j)$
 3:     $G_j \leftarrow \Pi_{tid_i,\, dis_i,\, TID}(P_{0,j})$
 4: for $d = 1$ to Dmax do
 5:     for $j = 1$ to $n$ do
 6:         $P_{d,j} \leftarrow \emptyset$
 7:         for all $(R_j, R_l) \in E(G_S) \lor (R_l, R_j) \in E(G_S)$ do
 8:             $\Delta \leftarrow \Pi_{P_{d-1,l}.TID \to tid_i,\; d \to dis_i,\; R_j.*}(P_{d-1,l} \bowtie R_j)$
 9:             $\Delta \leftarrow \sigma_{(tid_i, TID) \notin \Pi_{tid_i, TID}(G_j)}(\Delta)$
10:             $P_{d,j} \leftarrow P_{d,j} \cup \Delta$
11:             $G_j \leftarrow G_j \cup \Pi_{tid_i,\, dis_i,\, TID}(\Delta)$
12: $Pair_i \leftarrow G_1 \cup G_2 \cup \cdots \cup G_n$
13: return $Pair_i$

Computing group-by ($\Gamma$) with the SQL aggregate function min: Consider Eq. 2.28. The group-by $\Gamma$ can be computed by virtually pushing $\Gamma$. Recall that all $P_{d,j}$ relations, for $1 \le d \le$ Dmax, have the same schema, and $P_{d,j}$ maintains the $R_j$ tuples that are at distance $d$ from a tuple containing a keyword. We use two pruning rules to reduce the number of temporary tuples computed.

Rule-1: If the same $(tid_i, TID)$ value appears in two different relations $P_{d,j}$ and $P_{d',j}$, then the shortest distance between $tid_i$ and $TID$ must be in $P_{d,j}$ but not $P_{d',j}$, if $d < d'$. Therefore, Eq. 2.28 can be computed as follows.

\[
\begin{aligned}
G_j &\leftarrow P_{0,j} \\
G_j &\leftarrow G_j \cup \bigl(\sigma_{(tid_1, TID) \notin \Pi_{tid_1, TID}(G_j)}\, P_{1,j}\bigr) \\
G_j &\leftarrow G_j \cup \bigl(\sigma_{(tid_1, TID) \notin \Pi_{tid_1, TID}(G_j)}\, P_{2,j}\bigr)
\end{aligned}
\tag{2.30}
\]

Here, $\sigma_{(tid_1, TID) \notin \Pi_{tid_1, TID}(G_j)}\, P_{2,j}$ selects the $P_{2,j}$ tuples whose $(tid_1, TID)$ does not yet appear in $G_j$; in other words, no shortest path between $tid_1$ and $TID$ has been found before.
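Rule-1 can be illustrated with a short Python sketch. The data and the name `incremental` are hypothetical; the only assumption carried over from the text is that every row in level $d$ has distance exactly $d$. Adding levels in increasing order of $d$ and skipping pairs that are already known yields the same relation as a group-by with min, without materializing the full union first.

```python
def incremental(levels):
    """Build G_j level by level, keeping only the first (tid, TID) occurrence."""
    g = {}
    for rows in levels:                    # levels[d] plays the role of P_{d,j}
        for tid_node, tid_kw, dis in rows:
            key = (tid_node, tid_kw)
            if key not in g:               # anti-join against the pairs already in G_j
                g[key] = dis
    return g


# Hypothetical (TID, tid, dis) rows; ("a1", "a1") reappears at distance 2 and is skipped.
levels = [[("a1", "a1", 0)],
          [("w1", "a1", 1)],
          [("a1", "a1", 2), ("p1", "a1", 2)]]
G = incremental(levels)
```

The first hit for a pair is necessarily its minimum because the levels are visited in order of distance.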

Rule-2: If there exists a shortest path between a $tid_i$ and $TID$ value pair, say $dis_i(tid_i, TID) = d$, then there is no need to compute any further tuple connections between that $tid_i$ and $TID$ pair, because all of them will be removed later by the group-by and the SQL aggregate function min. In Eq. 2.27, every $P_{1,j}$, $1 \le j \le 4$, can be further reduced as $P_{1,j} \leftarrow \sigma_{(tid_1, TID) \notin \Pi_{tid_1, TID}(P_{0,j})}\, P_{1,j}$.

The algorithm Pair() is given in Algorithm 11, which computes $Pair_i$ for keyword $k_i$. It first computes all the initial $P_{0,j}$ relations (refer to Eq. 2.26) and initializes the $G_j$ relations (refer to the first equation in Eq. 2.30) in lines 1-3. Second, it computes $P_{d,j}$ for every $1 \le d \le$ Dmax and every


Algorithm 12 DC-Naive($R_1, \cdots, R_n$, $G_S$, $Q$, Dmax)

Input: $n$ relations $R_1, R_2, \cdots, R_n$, schema graph $G_S$, an $l$-keyword query $Q = \{k_1, k_2, \cdots, k_l\}$, and radius Dmax
Output: Relation with $2l + 1$ attributes named $TID$, $tid_1$, $dis_1$, $\cdots$, $tid_l$, $dis_l$

1: for $i = 1$ to $l$ do
2:     $Pair_i \leftarrow$ Pair($G_S$, $k_i$, Dmax, $R_1, \cdots, R_n$)
3: $S \leftarrow Pair_1 \bowtie Pair_2 \bowtie \cdots \bowtie Pair_l$
4: Sort $S$ by $tid_1, tid_2, \cdots, tid_l$
5: return $S$

[Figure 2.21: (a) From keywords to centers; (b) From centers to keywords; (c) Project relations]

relation $R_j$, $1 \le j \le n$, in two "for loops" (lines 4-5). In lines 7-11, it computes $P_{d,j}$ based on the foreign key references in the schema graph $G_S$, following Eq. 2.27 and Eq. 2.30 and using the two rules, Rule-1 and Rule-2. In our example, to compute $Pair_1$, it calls Pair($G_S$, $k_1$, Dmax, $R_1$, $R_2$, $R_3$, $R_4$), where $k_1$ = "Michelle", Dmax = 2, and the 4 relations are $R_j$, $1 \le j \le 4$.
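The overall shape of Pair() can be sketched in Python as a breadth-first expansion over foreign-key references. This is an illustrative approximation, not the SQL formulation of Algorithm 11: the toy rows, the `FK` list, and the helpers `contains`, `neighbors`, and `pair` are hypothetical, and the per-relation $G_j$ are collapsed into one dictionary, which is safe here only because TIDs are assumed unique.

```python
# Toy Author/Write/Paper/Cite instance; all rows are hypothetical.
R = {
    "Author": [{"TID": "a1", "Name": "Michelle"}, {"TID": "a2", "Name": "Bob"}],
    "Write": [{"TID": "w1", "AID": "a1", "PID": "p1"},
              {"TID": "w2", "AID": "a2", "PID": "p1"}],
    "Paper": [{"TID": "p1", "Title": "XML search"}, {"TID": "p2", "Title": "Graphs"}],
    "Cite": [{"TID": "c1", "PID1": "p2", "PID2": "p1"}],
}
FK = [("Write", "AID", "Author"), ("Write", "PID", "Paper"),
      ("Cite", "PID1", "Paper"), ("Cite", "PID2", "Paper")]


def contains(row, kw):
    """True if any non-TID attribute of the row contains the keyword."""
    return any(kw.lower() in str(v).lower() for k, v in row.items() if k != "TID")


def neighbors(name, row):
    """Yield (relation, tuple) pairs joined to `row` through one foreign-key reference."""
    for child, col, parent in FK:
        if name == child:
            for other in R[parent]:
                if other["TID"] == row[col]:
                    yield parent, other
        if name == parent:
            for other in R[child]:
                if other[col] == row["TID"]:
                    yield child, other


def pair(kw, dmax):
    """Return {(TID, tid): dis}, the shortest distance to each keyword tuple within dmax."""
    # Lines 1-3: tuples containing the keyword start at distance 0.
    frontier = [(name, row, row["TID"]) for name, rows in R.items()
                for row in rows if contains(row, kw)]
    g = {(row["TID"], tid): 0 for _, row, tid in frontier}
    # Lines 4-11: one foreign-key hop per iteration.
    for d in range(1, dmax + 1):
        nxt = []
        for name, row, tid in frontier:
            for other_name, other in neighbors(name, row):
                key = (other["TID"], tid)
                if key not in g:           # Rule-1 and Rule-2: drop pairs already reached
                    g[key] = d
                    nxt.append((other_name, other, tid))
        frontier = nxt
    return g
```

Calling `pair("Michelle", 2)` on the toy data reaches the author tuple at distance 0, the write tuple at distance 1, and the paper tuple at distance 2; pruned pairs are never expanded again, which is the effect Rule-2 aims for.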

The naive algorithm DC-Naive() to compute distinct cores is outlined in Algorithm 12. DR-Naive(), which computes distinct roots, can be implemented in the same way as DC-Naive() by replacing line 4 in Algorithm 12 with two group-bys as follows:

\[
X \leftarrow {}_{TID}\Gamma_{\min(dis_1) \to dis_1,\, \cdots,\, \min(dis_l) \to dis_l}\, S
\]

and

\[
S \leftarrow {}_{TID,\, dis_1,\, \cdots,\, dis_l}\Gamma_{\min(tid_1) \to tid_1,\, \cdots,\, \min(tid_l) \to tid_l}\,(S \bowtie X)
\]
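The two group-bys of DR-Naive() can be sketched in Python as follows. The input rows and the name `distinct_roots` are hypothetical; each row of `S` is assumed to be laid out as `(TID, tid_1, dis_1, ..., tid_l, dis_l)`, and the join of `S` with `X` is taken on `TID` and all distance attributes.

```python
def distinct_roots(S):
    """Keep one row per root TID: minimum distances, then the smallest tid per keyword."""
    # First group-by: X holds, per TID, the minimum distance to each keyword.
    X = {}
    for row in S:
        dists = row[2::2]
        cur = X.get(row[0])
        X[row[0]] = dists if cur is None else tuple(map(min, cur, dists))
    # Join S with X on (TID, dis_1..dis_l), then take min(tid_i) per group.
    out = {}
    for row in S:
        root, tids, dists = row[0], row[1::2], row[2::2]
        if dists == X[root]:
            cur = out.get(root)
            out[root] = tids if cur is None else tuple(map(min, cur, tids))
    # Interleave tid_i and dis_i back into the (TID, tid_1, dis_1, ...) layout.
    return sorted((t,) + tuple(v for p in zip(out[t], X[t]) for v in p) for t in out)


# Hypothetical S for l = 2; root p1 has several candidate keyword tuples.
S = [("p1", "a1", 2, "p1", 0),
     ("p1", "a2", 2, "p1", 0),
     ("p1", "a1", 2, "p3", 3),
     ("w1", "a1", 1, "p1", 1)]
roots = distinct_roots(S)
```

Ties between keyword tuples at the same minimum distance are broken by taking the smallest `tid`, as in the second group-by.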

Three-Phase Database Reduction: We now discuss a three-phase reduction approach to project a relational database RDB' out of RDB, with which we compute multi-center communities (distinct core semantics). In other words, the three-phase reduction prunes from an RDB the tuples that do not participate in any community. We also show that we can quickly compute distinct root results using the same subroutine used in the three-phase reduction.


Algorithm 13 DC($R_1, R_2, \cdots, R_n$, $G_S$, $Q$, Dmax)

Input: $n$ relations $R_1, R_2, \cdots, R_n$, with schema graph $G_S$, an $l$-keyword query $Q = \{k_1, k_2, \cdots, k_l\}$, and radius Dmax
Output: Relation with $2l + 1$ attributes named $TID$, $tid_1$, $dis_1$, $\cdots$, $tid_l$, $dis_l$

 1: for $i = 1$ to $l$ do
 2:     $\{G_{1,i}, \cdots, G_{n,i}\} \leftarrow$ PairRoot($G_S$, $k_i$, Dmax, $R_1, \cdots, R_n$, $\sigma_{contain(k_i)} R_1, \cdots, \sigma_{contain(k_i)} R_n$)
 3:     for $j = 1$ to $n$ do
 4:         $R_{j,i} \leftarrow R_j \ltimes G_{j,i}$
 5: for $j = 1$ to $n$ do
 6:     $Y_j \leftarrow G_{j,1} \bowtie G_{j,2} \bowtie \cdots \bowtie G_{j,l}$
 7:     $X_j \leftarrow R_j \ltimes Y_j$
 8: for $i = 1$ to $l$ do
 9:     $\{W_{1,i}, \cdots, W_{n,i}\} \leftarrow$ PairRoot($G_S$, $k_i$, Dmax, $R_{1,i}, \cdots, R_{n,i}$, $X_1, \cdots, X_n$)
10:     for $j = 1$ to $n$ do
11:         $Path_{j,i} \leftarrow G_{j,i} \bowtie_{G_{j,i}.TID = W_{j,i}.TID} W_{j,i}$
12:         $Path_{j,i} \leftarrow \Pi_{TID,\; G_{j,i}.dis_i \to d_{k_i},\; W_{j,i}.dis_i \to d_r}(Path_{j,i})$
13:         $Path_{j,i} \leftarrow \sigma_{d_{k_i} + d_r \le \mathit{Dmax}}(Path_{j,i})$
14:         $R'_{j,i} \leftarrow R_{j,i} \ltimes Path_{j,i}$
15: for $i = 1$ to $l$ do
16:     $Pair_i \leftarrow$ Pair($R'_{1,i}, R'_{2,i}, \cdots, R'_{n,i}$, $G_S$, $k_i$, Dmax)
17: $S \leftarrow Pair_1 \bowtie Pair_2 \bowtie \cdots \bowtie Pair_l$
18: Sort $S$ by $tid_1, tid_2, \cdots, tid_l$
19: return $S$

Figure 2.21 outlines the main ideas for processing an $l$-keyword query, $Q = \{k_1, k_2, \cdots, k_l\}$, with a user-given Dmax, against an RDB with a schema graph $G_S$.

The first reduction phase (from keyword to center): We consider a keyword $k_i$ as a virtual node, called a keyword-node, and we take a keyword-node, $k_i$, as a center to compute all tuples in an RDB that are reachable from $k_i$ within Dmax. A tuple $t$ within Dmax from a virtual keyword-node $k_i$ means that tuple $t$ can reach at least one tuple containing $k_i$ within Dmax. Let $G_i$ be the set of tuples in RDB that can reach at least one tuple containing keyword $k_i$ within Dmax, for $1 \le i \le l$. Based on all $G_i$, we can compute $Y = G_1 \bowtie G_2 \bowtie \cdots \bowtie G_l$, which is the set of center-nodes that can reach every keyword-node $k_i$, $1 \le i \le l$, within Dmax. $Y$ is illustrated as the shaded area in Figure 2.21(a) for $l = 2$. Obviously, a center that appears in a multi-center community must appear in $Y$.
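The first phase can be pictured at the tuple level with a multi-source breadth-first search. The sketch below is an illustration only: the adjacency dictionary `ADJ` (one undirected edge per foreign-key link between two tuples) and the names `within` and `centers` are hypothetical, and the relational joins of the text are replaced by plain set intersection on TIDs.

```python
from collections import deque

# Hypothetical tuple graph: an edge means two tuples are linked by a foreign key.
ADJ = {"a1": ["w1"], "a2": ["w2"], "w1": ["a1", "p1"], "w2": ["a2", "p1"],
       "p1": ["w1", "w2", "c1"], "p2": ["c1"], "c1": ["p1", "p2"]}


def within(sources, dmax):
    """Multi-source BFS: map each tuple within dmax hops of `sources` to its distance."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == dmax:
            continue                       # do not expand beyond the radius
        for v in ADJ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centers(keyword_tuples, dmax):
    """Return (Y, [G_1, ..., G_l]); Y holds the tuples within dmax of every keyword-node."""
    g = [within(src, dmax) for src in keyword_tuples]
    return set.intersection(*(set(gi) for gi in g)), g


# k1 is assumed to occur in tuple a1 and k2 in tuple p1.
Y, G = centers([["a1"], ["p1"]], 2)
```

Treating all tuples that contain $k_i$ as sources of one search corresponds to the virtual keyword-node: the distance stored for a tuple is its distance to the nearest tuple containing the keyword.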

The second reduction phase (from center to keyword): In a similar fashion, we consider a virtual center-node. A tuple $t$ within Dmax from a virtual center-node means that $t$ is reachable from a tuple in $Y$ within Dmax. We compute all tuples that are reachable from $Y$ within Dmax. Let $W_i$


Algorithm 14 PairRoot($G_S$, $k_i$, Dmax, $R_1, \cdots, R_n$, $I_1, \cdots, I_n$)

Input: Schema graph $G_S$, keyword $k_i$, Dmax, $n$ relations $R_1, R_2, \cdots, R_n$, and $n$ initial relations $I_1, I_2, \cdots, I_n$
Output: $n$ relations $G_{1,i}, \cdots, G_{n,i}$, each with 3 attributes: $TID$, $tid_i$, $dis_i$

 1: for $j = 1$ to $n$ do
 2:     $P_{0,j} \leftarrow \Pi_{I_j.TID \to tid_i,\; 0 \to dis_i,\; I_j.*}(I_j)$
 3:     $G_{j,i} \leftarrow \Pi_{tid_i,\, dis_i,\, TID}(P_{0,j})$
 4: for $d = 1$ to Dmax do
 5:     for $j = 1$ to $n$ do
 6:         $P_{d,j} \leftarrow \emptyset$
 7:         for all $(R_j, R_l) \in E(G_S) \lor (R_l, R_j) \in E(G_S)$ do
 8:             $\Delta \leftarrow \Pi_{P_{d-1,l}.TID \to tid_i,\; d \to dis_i,\; R_j.*}(P_{d-1,l} \bowtie R_j)$
 9:             $\Delta \leftarrow {}_{R_j.*}\Gamma_{\min(tid_i),\, \min(dis_i)}(\Delta)$
10:             $\Delta \leftarrow \sigma_{TID \notin \Pi_{TID}(G_{j,i})}(\Delta)$
11:             $P_{d,j} \leftarrow P_{d,j} \cup \Delta$
12:             $G_{j,i} \leftarrow G_{j,i} \cup \Pi_{tid_i,\, dis_i,\, TID}(\Delta)$
13: return $\{G_{1,i}, \cdots, G_{n,i}\}$

Algorithm 15 DR($R_1, R_2, \cdots, R_n$, $G_S$, $Q$, Dmax)

Input: $n$ relations $R_1, R_2, \cdots, R_n$, with schema graph $G_S$, an $l$-keyword query $Q = \{k_1, k_2, \cdots, k_l\}$, and radius Dmax
Output: Relation with $2l + 1$ attributes named $TID$, $tid_1$, $dis_1$, $\cdots$, $tid_l$, $dis_l$

1: for $i = 1$ to $l$ do
2:     $\{G_{1,i}, \cdots, G_{n,i}\} \leftarrow$ PairRoot($G_S$, $k_i$, Dmax, $R_1, \cdots, R_n$, $\sigma_{contain(k_i)} R_1, \cdots, \sigma_{contain(k_i)} R_n$)
3: for $j = 1$ to $n$ do
4:     $S_j \leftarrow G_{j,1} \bowtie G_{j,2} \bowtie \cdots \bowtie G_{j,l}$
5: $S \leftarrow S_1 \cup S_2 \cup \cdots \cup S_n$
6: return $S$

be the set of tuples in $G_i$ that can be reached from a center in $Y$ within Dmax, for $1 \le i \le l$. Note that $W_i \subseteq G_i$. When $l = 2$, $W_1$ and $W_2$ are illustrated as the shaded areas on the left and right in Figure 2.21(b), respectively. Obviously, only the tuples that contain a keyword and lie within Dmax of a center can appear in the final result as keyword tuples.
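The second phase can be sketched in the same tuple-level style. Again this is only an illustration under stated assumptions: `ADJ`, `within`, and `reduce_keyword_side` are hypothetical names, the centers are treated as sources of a single search (the virtual center-node), and the radius is set to 1 so that the pruning effect is visible on the toy graph.

```python
from collections import deque

# Hypothetical tuple graph: an edge means two tuples are linked by a foreign key.
ADJ = {"a1": ["w1"], "a2": ["w2"], "w1": ["a1", "p1"], "w2": ["a2", "p1"],
       "p1": ["w1", "w2", "c1"], "p2": ["c1"], "c1": ["p1", "p2"]}


def within(sources, dmax):
    """Multi-source BFS: map each tuple within dmax hops of `sources` to its distance."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == dmax:
            continue
        for v in ADJ[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reduce_keyword_side(g_i, centers, dmax):
    """W_i: tuples of G_i within dmax of some center, with distance to the nearest center."""
    from_centers = within(sorted(centers), dmax)
    return {t: from_centers[t] for t in g_i if t in from_centers}


DMAX = 1
G1 = within(["a1"], DMAX)          # k1 assumed to occur in tuple a1
G2 = within(["p1"], DMAX)          # k2 assumed to occur in tuple p1
Y = set(G1) & set(G2)              # first-phase centers
W1 = reduce_keyword_side(G1, Y, DMAX)
W2 = reduce_keyword_side(G2, Y, DMAX)
```

On this toy graph the only center is `w1`, so `w2` and `c1` are dropped from the second keyword's side even though both are within one hop of `p1`.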

The third reduction phase (project DB): We project an RDB' out of the RDB, which is sufficient to compute all multi-center communities, by the join $G_i \bowtie W_i$, for $1 \le i \le l$. Consider a tuple in $G_i$, which contains a TID $t$ with a distance to the virtual keyword-node $k_i$, denoted as $dis(t, k_i)$, and consider a tuple in $W_i$, which contains a TID $t$ with a distance to the virtual center-node $c$, denoted
