Keyword Search in Databases - P14


Algorithm 21 MinWeightQSubtree(G, P)

Input: a data graph G, and a set of PT constraints P

Output: a minimum weight Q-subtree under constraints P

1: return T, if P consists of a single RPT T
2: return ⊥, if P consists of a single NPT
3: let 𝒯 be obtained from P by replacing each N ∈ P|NPT with the PT consisting of only the node root(N)
4: initialize T_min ← ⊥, where ⊥ has a weight ∞
5: for all subsets 𝒯_top of 𝒯, such that |𝒯_top| ≥ 2 do
6:     G1 ← G − ∪_{T ∈ P} V(T) + ∪_{T ∈ 𝒯_top} {root(T)}
7:     T_top ← MinWeightSuperTree(G1, 𝒯_top)
8:     if T_top ≠ ⊥ then
9:         T+_top ← T_top ∪ 𝒯_top
10:        let G2 be obtained from G by removing all the incoming edges to root(T+_top)
11:        T ← MinWeightSuperTree(G2, P \ 𝒯_top ∪ {T+_top})
12:        T_min ← min{T_min, T}
13: return T_min

However, this takes exponential time, because line 5 of MinWeightQSubtree iterates over all subsets of size at least two. In the following, we introduce approaches to find a (θ + 1)-approximate minimum weight Q-subtree under constraints P.
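To make the exponential cost concrete, the following minimal sketch enumerates all subsets of size at least two, as the loop in line 5 does, and counts them. The function names are hypothetical and are not part of the original algorithm; for n candidate trees the count is 2^n − n − 1.

```python
from itertools import combinations


def subsets_of_size_at_least_two(items):
    """Yield every subset (as a tuple) with two or more elements.

    Hypothetical helper that mirrors the loop of line 5.
    """
    items = list(items)
    for size in range(2, len(items) + 1):
        for subset in combinations(items, size):
            yield subset


def count_subsets_of_size_at_least_two(n):
    """Closed form: all 2^n subsets minus the empty set and the n singletons."""
    return 2 ** n - n - 1
```

Each enumerated subset triggers two super tree computations, so the running time grows exponentially with the number of PTs.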

Algorithm 22 (AppWeightQSubtree [Kimelfeld and Sagiv, 2006b]) finds a (θ + 1)-approximation of the minimum weight Q-subtree in polynomial time under data-and-query complexity. Recall that the number of NPTs of P is no more than 2. AppWeightQSubtree mainly consists of three steps.

1. A minimum weight Q-subtree is found by calling MinWeightQSubtree if there are no RPTs in P (line 1). Otherwise, it finds a θ-approximation of the minimum weight super tree of P, T_app, by calling AppWeightSuperTree (line 2). It returns T_app if T_app is a Q-subtree or ⊥ (line 3).

2. Find a reduced tree T_min (lines 4-5). The weight of T_min is guaranteed to be no larger than that of the minimum weight Q-subtree under constraints P, if one exists. Note that this step can be accomplished by a single call to procedure MinWeightQSubtree, by adding to G′ a virtual node v and an edge from v to each root(T) for all T ∈ P|RPT, and calling MinWeightQSubtree(G′, P|NPT ∪ {v}). If T_min is ⊥, then there is no Q-subtree satisfying the constraints P (line 6).


3.3 STEINER TREE-BASED KEYWORD SEARCH

Algorithm 22 AppWeightQSubtree(G, P)

Output: a (θ + 1)-approximation of minimum weight Q-subtree under constraints P

1: return MinWeightQSubtree(G, P), if P|RPT = ∅
2: T_app ← AppWeightSuperTree(G, P)
3: return T_app, if T_app = ⊥ or T_app is reduced
4: let G′ be obtained from G by removing all the edges (u, v), where v is a non-root node of some T ∈ P and (u, v) is not an edge in T
5: T_min ← min_{T ∈ P|RPT} MinWeightQSubtree(G′, P|NPT ∪ {T})
6: return ⊥, if T_min = ⊥
7: r ← root(T_min)
8: if r belongs to a subtree T ∈ P|RPT then
9:     r ← root(T)
10: G_app ← T_min ∪ T_app
11: remove from G_app all incoming edges of r
12: for all v ∈ V(G_app) that have two incoming edges e1 ∈ E(T_min) and e2 ∈ E(T_app) do
13:     remove e2 from G_app
14: delete from G_app all structural nodes v, such that no keyword is reachable from v
15: return G_app

3. Union T_app and T_min, and remove redundant nodes and edges to get a (θ + 1)-approximate Q-subtree (lines 7-14). Note that all the edges in T_min are kept during the removal of redundant nodes and edges.
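The merge in step 3 can be sketched as follows. This is a minimal illustration under assumptions that are not in the original: both trees are given as sets of directed (u, v) edge pairs, r has already been chosen as in lines 7-9, and the function name merge_trees is hypothetical.

```python
from collections import defaultdict, deque


def merge_trees(t_min_edges, t_app_edges, root, keywords):
    """Sketch of lines 10-14 of AppWeightQSubtree on plain edge sets."""
    t_min = set(t_min_edges)
    # Line 10: union of the two trees.
    edges = t_min | set(t_app_edges)
    # Line 11: the chosen root r must not have incoming edges.
    edges = {(u, v) for (u, v) in edges if v != root}
    # Lines 12-13: where T_min already supplies an incoming edge,
    # drop the competing edge that comes only from T_app.
    covered = {v for (_, v) in t_min if v != root}
    edges = {(u, v) for (u, v) in edges
             if (u, v) in t_min or v not in covered}
    # Line 14: keep only nodes from which some keyword is reachable,
    # found by a backward search starting at the keyword nodes.
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    useful = set(keywords)
    queue = deque(keywords)
    while queue:
        v = queue.popleft()
        for u in preds[v]:
            if u not in useful:
                useful.add(u)
                queue.append(u)
    return {(u, v) for (u, v) in edges if u in useful and v in useful}
```

Because an edge of T_min is never the one removed in lines 12-13, every edge of T_min that does not enter r survives the merge.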

The general idea of AppWeightQSubtree is that it first finds a θ-approximate super tree of P, denoted T_app. If T_app does not exist, then there is no Q-subtree under constraints P. If T_app is reduced, then it is a θ-approximation of the minimum weight Q-subtree. Otherwise, it finds another subtree T_min, which is guaranteed to be reduced. If there is a Q-subtree under constraints P, then T_min must exist and its weight must be no larger than that of the minimum weight Q-subtree, because a subtree of the minimum weight Q-subtree satisfies the constraints used in line 5. Let r denote the root of T_min; there are three cases: r is the root of an NPT in P, r is the root node of an RPT in P, or r is not in P. If r is the root of an NPT in P, then it must have at least two children; otherwise the root of T_app must have an incoming edge (guaranteed by line 5), as it is the root of one NPT in P. If both T_app and T_min exist, then lines 7-14 produce a Q-subtree. Since each returned node, except the root node, has exactly one incoming edge, the result forms a tree.
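The tree property used in the last sentence can be checked mechanically. The sketch below is an illustration only, with a hypothetical function name and an assumed edge-set representation; it adds a reachability test from the root, so that a detached cycle, in which every node also has in-degree one, is rejected.

```python
def is_rooted_tree(edges, root):
    """Return True if the directed edges form a tree rooted at `root`."""
    in_degree = {}
    children = {}
    nodes = {root}
    for u, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
        children.setdefault(u, []).append(v)
        nodes.update((u, v))
    # The root has no incoming edge; every other node has exactly one.
    if in_degree.get(root, 0) != 0:
        return False
    if any(in_degree.get(n, 0) != 1 for n in nodes if n != root):
        return False
    # Every node must be reachable from the root.
    seen, stack = {root}, [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen == nodes
```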

Theorem 3.9 [Kimelfeld and Sagiv, 2006b] Consider a data graph G with n nodes and m edges. Let Q = {k_1, ..., k_l} be a keyword query and P be a set of PT constraints, such that leaves(P) = Q and P has at most c NPTs. AppWeightQSubtree finds a (θ + 1)-approximation of the minimum weight Q-subtree that satisfies P in time O(f + 4^(c+1) n + 3^(c+1) ((l + log n) n + m)), where θ and f are the approximation ratio and runtime, respectively, of AppWeightSuperTree.

Finding a 2-approximate minimum height Q-subtree under P: Although MinWeightQSubtree and AppWeightQSubtree can enumerate Q-subtrees in exact (or approximate) rank order, they are based on repeated computations of Steiner trees (or approximate Steiner trees) under inclusion and exclusion constraints; therefore, they are not practical. Golenberg et al. [2008] propose to decouple the Q-subtree ranking step from Q-subtree generation. More precisely, their approach first generates a set of N Q-subtrees that are candidate answers, by incorporating a much easier rank function than the Steiner tree weight function, and then generates a set of k final answers, which are ranked according to a more complex ranking function. The ranking function used for generation is based on the height, where the height of a tree is the maximum among the shortest distances from the root to each keyword node, i.e.,

height(T) = max_{1 ≤ i ≤ l} dist(root(T), k_i).

Ranking in increasing height order is highly correlated with the desired ranking [Golenberg et al., 2008], so an enumeration algorithm is proposed to generate Q-subtrees in 2-approximation rank order with respect to the height ranking function.
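As an illustration of the height function, the following minimal sketch computes height(T) with Dijkstra's algorithm. The adjacency-list representation (a dict that maps each node to a list of (child, weight) pairs) and the function names are assumptions made for this example.

```python
import heapq


def shortest_distances(adj, source):
    """Dijkstra's algorithm over adj: node -> list of (neighbor, weight)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def height(tree_adj, root, keyword_nodes):
    """height(T): largest root-to-keyword distance (infinite if unreachable)."""
    dist = shortest_distances(tree_adj, root)
    return max(dist.get(k, float("inf")) for k in keyword_nodes)
```

For a tree with edges r→a (weight 1), a→k1 (weight 2), and r→k2 (weight 5), the distances to k1 and k2 are 3 and 5, so the height is 5.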

The general idea is the same as that of enumerating Q-subtrees in (θ + 1)-approximate order with respect to the weight function, i.e., AppWeightQSubtree. It also uses EnumTreePD as the outer enumeration algorithm, and it implements the sub-routine Q-subtree() by returning a 2-approximation of the minimum height Q-subtree under constraints P. Finding an approximate tree with respect to the height ranking function under constraints P is much easier than with the weight ranking function, i.e., AppWeightQSubtree. It also consists of three steps: (1) find a minimum height super tree of P, T_sup, and return T_sup if it is reduced or equal to ⊥; (2) otherwise, find another reduced subtree T_min whose height is guaranteed to be no larger than that of the minimum height Q-subtree, if one exists; (3) return the union of T_sup and T_min after removing redundant edges and nodes.

The algorithm to find a minimum height super tree of 𝒯 is shown in Algorithm 23 (MinHeightSuperTree [Golenberg et al., 2008]). The general idea is the same as BackwardSearch: it creates an iterator for each leaf node of 𝒯 (lines 3-5). During each iteration of the loop in line 6, it first finds the iterator I_v whose next node to be returned is the one with the shortest distance to its source. Let u be that node. If u has been returned from all the other iterators (line 9), the shortest paths from u to all the leaf nodes in 𝒯 have been computed. The union of these shortest paths is a tree with minimum height that includes all the leaf nodes of 𝒯. One problem remains to be solved: the tree returned must be a super tree of 𝒯. All the edges (u, v) of G, where v is a non-root node of some T ∈ 𝒯 and (u, v) is not an edge in T (line 1), can be removed, since in a tree every node can have at most one incoming edge and the edge of T that enters v must be included. This operation makes sure that for every non-root node of 𝒯, its incoming edge in 𝒯 is included. Also, the root of the tree returned cannot be a non-root node of 𝒯, which is checked in line 9. Then, the tree returned by MinHeightSuperTree is a minimum height super tree of 𝒯 in G.


Algorithm 23 MinHeightSuperTree(G, 𝒯)

Input: a data graph G, and a set of PT constraints 𝒯

Output: a minimum height super tree of 𝒯 in G

1: remove all the edges (u, v) from G, where v is a non-root node of some T ∈ 𝒯 and (u, v) is not an edge in T
2: ItHeap ← ∅
3: for each leaf node v ∈ leaves(𝒯) do
4:     create a single source shortest path iterator, I_v, with v as the source node
5:     ItHeap.insert(I_v); the priority of I_v is the distance of the next node it will return
6: while ItHeap ≠ ∅ do
7:     I_v ← ItHeap.pop()
8:     u ← I_v.next()
9:     if u has been returned from all the other iterators and u is not a non-root node of any tree T ∈ 𝒯 then
10:        return the subtree which is the union of the shortest paths from u to each leaf node of 𝒯
11:    if I_v has more nodes to return then
12:        ItHeap.insert(I_v)
13: return ⊥
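The procedure above can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the original: the graph is a dict mapping (u, v) to a positive weight, each PT is a (root, set_of_edges) pair, the PTs are node-disjoint, and the names BackwardIterator and min_height_super_tree are hypothetical. With ties between equally short paths the returned edge set may need extra pruning to be a tree.

```python
import heapq
from itertools import count


class BackwardIterator:
    """Lazy Dijkstra from a leaf, walking edges backward."""

    def __init__(self, rev_adj, source):
        self.rev_adj = rev_adj
        self.tie = count()  # tie-breaker so heap entries never compare nodes
        self.heap = [(0, next(self.tie), source, None)]
        self.next_hop = {}  # settled node -> next node on its path to the leaf

    def peek(self):
        """Distance of the next node to be returned, or None if exhausted."""
        while self.heap and self.heap[0][2] in self.next_hop:
            heapq.heappop(self.heap)
        return self.heap[0][0] if self.heap else None

    def next(self):
        self.peek()  # discard entries for already settled nodes
        d, _, u, succ = heapq.heappop(self.heap)
        self.next_hop[u] = succ
        for p, w in self.rev_adj.get(u, []):
            if p not in self.next_hop:
                heapq.heappush(self.heap, (d + w, next(self.tie), p, u))
        return u


def min_height_super_tree(graph, pts):
    pt_edges = set().union(*(e for _, e in pts))
    non_roots = {v for (_, v) in pt_edges}
    rev_adj = {}
    for (u, v), w in graph.items():
        if v in non_roots and (u, v) not in pt_edges:
            continue  # line 1: a non-root PT node keeps only its PT edge
        rev_adj.setdefault(v, []).append((u, w))
    inner = {u for (u, _) in pt_edges}
    leaves = sorted(x for r, e in pts
                    for x in {r} | {v for (_, v) in e} if x not in inner)
    its = [BackwardIterator(rev_adj, leaf) for leaf in leaves]  # lines 3-5
    heap = [(it.peek(), i) for i, it in enumerate(its)]
    heapq.heapify(heap)
    returned = {}
    while heap:  # lines 6-12
        _, i = heapq.heappop(heap)
        u = its[i].next()
        returned[u] = returned.get(u, 0) + 1
        if returned[u] == len(its) and u not in non_roots:  # line 9
            tree = set()
            for it in its:  # line 10: union of the shortest paths from u
                x = u
                while it.next_hop[x] is not None:
                    tree.add((x, it.next_hop[x]))
                    x = it.next_hop[x]
            return u, tree
        if its[i].peek() is not None:
            heapq.heappush(heap, (its[i].peek(), i))
    return None  # line 13: no super tree exists
```

The sketch returns the root together with the edge set of the super tree, or None in place of ⊥.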

Figure 3.8: Approximating a minimum height Q-subtree [Golenberg et al., 2008]

If the tree T_app found by MinHeightSuperTree is non-reduced, then it needs to find another reduced tree T_min, and the root of T_app must be the root of one NPT in P; without loss of generality, we assume it to be P1. Note that, with respect to the height ranking function, it cannot use the idea of MinWeightQSubtree to find a minimum height Q-subtree. There are two cases to consider, depending on whether the following holds: in a minimum height Q-subtree A_m that satisfies P, there is a path from the root to a single node PT in P that does not use any edge of P.


Algorithm 24 AppHeightQSubtree(G, P)

Output: a 2-approximation of minimum height Q-subtree under constraints P

1: return ⊥, if P consists of a single NPT
2: T_app ← MinHeightSuperTree(G, P)
3: return T_app, if T_app = ⊥ or T_app is a Q-subtree
4: if there exist single node RPTs in P then
5:     construct the tree T1
6: if there exist two non-single node PTs in P then
7:     construct the tree T2
8: return ⊥, if T1 = ⊥ and T2 = ⊥
9: T_min ← minimum height subtree among T1 and T2
10: construct a Q-subtree T from T_app and T_min
11: return T

The two cases are shown in Figure 3.8. Essentially, A_m must contain a subtree that looks like either T1 or T2. We discuss these cases below.

T1 describes the following situation: (1) one single node PT (e.g., keyword node h) is reachable from the root of A_m through a path S_h that does not use any edge of P; and (2) P1 is reachable from the root of A_m through a path S_p that does not include any edge appearing on S_h. Let G_v denote the graph obtained from G by deleting all the non-root nodes of PTs in P, and let G_e denote the graph obtained from G by deleting all edges (u, v) where v is a non-root node of a PT in P and (u, v) is not in P. For each single node PT in P, e.g., the keyword node h, it can find the minimum height subtree T_h by concurrently running two iterators of Dijkstra's algorithm, one with h as the source working on G_v, and the other with root(P1) as the source working on G_e. T1 is the minimum height subtree among all the subtrees found.
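The two auxiliary graphs G_v and G_e can be sketched as follows. The representation is an assumption made for illustration: the graph is a collection of directed (u, v) edges, each PT is a (root, set_of_edges) pair, and the function name build_gv_ge is hypothetical.

```python
def build_gv_ge(graph_edges, pts):
    """Return the edge sets of G_v and G_e for a graph and a set of PTs."""
    pt_edges = set()
    for _, edges in pts:
        pt_edges |= set(edges)
    # Non-root nodes of the PTs are exactly the targets of PT edges.
    non_roots = {v for (_, v) in pt_edges}
    # G_v: delete every non-root PT node, hence every edge touching one.
    g_v = {(u, v) for (u, v) in graph_edges
           if u not in non_roots and v not in non_roots}
    # G_e: a non-root PT node keeps only its incoming edge inside its PT.
    g_e = {(u, v) for (u, v) in graph_edges
           if v not in non_roots or (u, v) in pt_edges}
    return g_v, g_e
```

G_v contains no PT edges at all, which matches the requirement that S_h avoid the edges of P, while G_e still contains every PT edge, so paths that reach P1 can continue through it.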

T2 applies only when P contains two non-single node PTs, P1 and P2, where P2 can be either an NPT or an RPT. If P2 is an NPT, then T2 cannot use any edge from P, so it can be found in the graph G_v. Otherwise, P2 is an RPT, and the root of T2 can be the root of P2. Then it needs to build a new graph G′ from G as follows: (1) remove all the edges that enter non-root nodes of P2 and are not in P2 itself (i.e., P2 is handled as in the construction of G_e); (2) remove all the non-root nodes of P1 (i.e., P1 is handled as in the construction of G_v). In G′, T2 can be found by two iterators using Dijkstra's algorithm.

Theorem 3.10 [Golenberg et al., 2008] Given a data graph G with n nodes and m edges, let Q = {k_1, ..., k_l} be a keyword query and P be a set of PT constraints, such that leaves(P) = Q and P has at most two non-single node PTs. AppHeightQSubtree finds a 2-approximation of the minimum height Q-subtree that satisfies P in time O(l(n log n + m)).
