
Keyword Search in Databases - P14



Algorithm 21 MinWeightQSubtree(G, P)

Input: a data graph G, and a set of PT constraints P
Output: a minimum weight Q-subtree under constraints P

1: return T, if P consists of a single RPT T
2: return ⊥, if P consists of a single NPT
3: let 𝒯 be obtained from P by replacing each N ∈ P|_NPT with the PT consisting of only the node root(N)
4: initialize T_min ← ⊥, where ⊥ has weight ∞
5: for all subsets 𝒯_top of 𝒯, such that |𝒯_top| ≥ 2 do
6:     G_1 ← G − ∪_{T ∈ P} V(T) + ∪_{T ∈ 𝒯_top} {root(T)}
7:     T_top ← MinWeightSuperTree(G_1, 𝒯_top)
8:     if T_top ≠ ⊥ then
9:         T⁺_top ← T_top ∪ 𝒯_top
10:        let G_2 be obtained from G by removing all the incoming edges to root(T⁺_top)
11:        T ← MinWeightSuperTree(G_2, (P \ 𝒯_top) ∪ {T⁺_top})
12:        T_min ← min{T_min, T}
13: return T_min

But this takes exponential time, because of line 5 in MinWeightQSubtree. In the following, we introduce approaches to find a (θ + 1)-approximate minimum weight Q-subtree under constraints P.
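To make the cost of line 5 of MinWeightQSubtree concrete, the following minimal sketch enumerates all subsets of size at least two. A set with n elements has 2^n − n − 1 such subsets, so the loop body runs exponentially often in the number of PTs. The function name `subsets_of_size_at_least_two` is an assumption made for this illustration and does not come from the source.

```python
from itertools import combinations


def subsets_of_size_at_least_two(items):
    """Yield every subset of `items` that has at least two elements."""
    items = list(items)
    for size in range(2, len(items) + 1):
        yield from combinations(items, size)


# Count the candidate subsets for a few (hypothetical) numbers of PTs.
counts = {n: sum(1 for _ in subsets_of_size_at_least_two(range(n)))
          for n in range(2, 8)}
for n, c in counts.items():
    print(n, c, 2 ** n - n - 1)
```

Each printed row shows the number of PTs, the number of enumerated subsets, and the closed form 2^n − n − 1.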

Algorithm 22 (AppWeightQSubtree [Kimelfeld and Sagiv, 2006b]) finds a (θ + 1)-approximation of the minimum weight Q-subtree in polynomial time under data-and-query complexity. Recall that the number of NPTs of P is no more than 2. AppWeightQSubtree mainly consists of three steps.

1. A minimum weight Q-subtree is found by calling MinWeightQSubtree if there are no RPTs in P (line 1). Otherwise, it finds a θ-approximation of the minimum weight super tree of P, T_app, by calling AppWeightSuperTree (line 2). It returns T_app if T_app is a Q-subtree or ⊥ (line 3).

2. Find a reduced tree T_min (lines 4-5). The weight of T_min is guaranteed to be no larger than that of the minimum weight Q-subtree under constraints P, if one exists. Note that this step can be accomplished by a single call to procedure MinWeightQSubtree, by adding to G′ a virtual node v and an edge from v to root(T) for every T ∈ P|_RPT, and calling MinWeightQSubtree(G′, P|_NPT ∪ {v}). If T_min is ⊥, then there is no Q-subtree satisfying the constraints P (line 6).


3.3 STEINER TREE-BASED KEYWORD SEARCH

Algorithm 22 AppWeightQSubtree(G, P)

Input: a data graph G, and a set of PT constraints P
Output: a (θ + 1)-approximation of the minimum weight Q-subtree under constraints P

1: return MinWeightQSubtree(G, P), if P|_RPT = ∅
2: T_app ← AppWeightSuperTree(G, P)
3: return T_app, if T_app = ⊥ or T_app is reduced
4: let G′ be obtained from G by removing all the edges (u, v), where v is a non-root node of some T ∈ P and (u, v) is not an edge in T
5: T_min = min_{T ∈ P|_RPT} MinWeightQSubtree(G′, P|_NPT ∪ {T})
6: return ⊥, if T_min = ⊥
7: r ← root(T_min)
8: if r belongs to a subtree T ∈ P|_RPT then
9:     r ← root(T)
10: G_app ← T_min ∪ T_app
11: remove from G_app all incoming edges of r
12: for all v ∈ V(G_app) that have two incoming edges e_1 ∈ E(T_min) and e_2 ∈ E(T_app) do
13:     remove e_2 from G_app
14: delete from G_app all structural nodes v, such that no keyword is reachable from v
15: return G_app

3. Union T_app and T_min, and remove redundant nodes and edges to get a (θ + 1)-approximate Q-subtree (lines 7-14). Note that all the edges in T_min are kept during the removal of redundant nodes and edges.
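The merge in lines 10-14 of Algorithm 22 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `merge_trees`, the representation of trees as sets of (u, v) edge pairs, and the example node names are all hypothetical. The sketch follows the pseudo-code literally and does not itself verify that the result is a tree.

```python
def merge_trees(t_min_edges, t_app_edges, r, keywords):
    """Return the edge set of G_app after lines 10-14 (illustrative sketch)."""
    # Lines 10-11: union both trees, then drop every edge entering r.
    edges = {(u, v) for (u, v) in t_min_edges | t_app_edges if v != r}

    # Lines 12-13: if a node has an incoming edge from T_min, drop any
    # different incoming edge that comes only from T_app.
    min_targets = {v for (_u, v) in t_min_edges}
    edges = {(u, v) for (u, v) in edges
             if (u, v) in t_min_edges or v not in min_targets}

    # Line 14: keep only nodes from which some keyword is still reachable.
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    useful = set(keywords)
    stack = list(keywords)
    while stack:
        node = stack.pop()
        for p in parents.get(node, []):
            if p not in useful:
                useful.add(p)
                stack.append(p)
    return {(u, v) for (u, v) in edges if v in useful}


# Hypothetical example: T_app is non-reduced (its root p has one child).
t_app = {("p", "k1"), ("k1", "z"), ("z", "k2")}
t_min = {("r", "p"), ("r", "y"), ("y", "k2")}
g_app = merge_trees(t_min, t_app, "r", {"k1", "k2"})
print(sorted(g_app))
```

In this example the T_app edge into k2 is dropped in favour of the T_min edge, after which the structural node z reaches no keyword and is pruned, while all edges of T_min survive.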

The general idea of AppWeightQSubtree is that it first finds a θ-approximate super tree of P, denoted T_app. If T_app does not exist, then there is no Q-subtree under constraints P. If T_app is reduced, then it is a θ-approximation of the minimum weight Q-subtree. Otherwise, it finds another subtree T_min, which is guaranteed to be reduced. If there is a Q-subtree under constraints P, then T_min must exist and its weight must be no larger than that of the minimum weight Q-subtree, because a subtree of the minimum Q-subtree satisfies line 5. Let r denote the root of T_min; there are three cases: either r is the root of an NPT in P, or r is the root node of an RPT in P, or r is not in P. If r is the root of an NPT in P, then it must have at least two children; otherwise the root of T_app must have an incoming edge (guaranteed by line 5), as it is the root of one NPT in P. If both T_app and T_min exist, then lines 7-14 produce a Q-subtree. Since each returned node, except the root node, has exactly one incoming edge, the result forms a tree.

Theorem 3.9 [Kimelfeld and Sagiv, 2006b] Consider a data graph G with n nodes and m edges. Let Q = {k_1, ..., k_l} be a keyword query and P be a set of PT constraints, such that leaves(P) = Q and P has at most c NPTs. AppWeightQSubtree finds a (θ + 1)-approximation of the minimum weight Q-subtree that satisfies P in time O(f + 4^(c+1) n + 3^(c+1) ((l + log n) n + m)), where θ and f are the approximation ratio and runtime, respectively, of AppWeightSuperTree.

Finding 2-approximate minimum height Q-subtree under P: Although MinWeightQSubtree and AppWeightQSubtree can enumerate Q-subtrees in exact (or approximate) rank order, they are based on repeated computations of Steiner trees (or approximate Steiner trees) under inclusion and exclusion constraints; therefore, they are not practical. Golenberg et al. [2008] propose to decouple the Q-subtree ranking step from Q-subtree generation. More precisely, their approach first generates a set of N Q-subtrees that are candidate answers, by incorporating a much easier rank function than the Steiner tree weight function, and then generates a set of k final answers, which are ranked according to a more complex ranking function. The ranking function used is based on the height, where the height of a tree is the maximum among the shortest distances from the root to each keyword node, i.e., height(T) = max_{i=1..l} dist(root(T), k_i). Ranking in increasing height order is highly correlated with the desired ranking [Golenberg et al., 2008], so an enumeration algorithm is proposed to generate Q-subtrees in 2-approximation rank order with respect to the height ranking function.
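As a small worked example of the height function, the sketch below computes height(T) for a rooted, edge-weighted tree. The representation (a dict mapping each node to a list of (child, weight) pairs), the function name `tree_height`, and the sample tree are assumptions made for illustration; within a tree the root-to-node path is unique, so the distance is simply the sum of the weights along that path.

```python
def tree_height(children, root, keyword_nodes):
    """Return the maximum root-to-keyword distance in a rooted tree."""
    dist = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, weight in children.get(node, []):
            dist[child] = dist[node] + weight  # unique path in a tree
            stack.append(child)
    return max(dist[k] for k in keyword_nodes)


# Hypothetical tree: r -> a (1), a -> k1 (1), r -> k2 (3).
sample_tree = {"r": [("a", 1), ("k2", 3)], "a": [("k1", 1)]}
sample_height = tree_height(sample_tree, "r", ["k1", "k2"])
print(sample_height)
```

Here dist(r, k1) = 2 and dist(r, k2) = 3, so the height is 3.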

The general idea is the same as that of enumerating Q-subtrees in (θ + 1)-approximate order with respect to the weight function, i.e., AppWeightQSubtree. It also uses EnumTreePD as the outer enumeration algorithm, and it implements the sub-routine Q-subtree() by returning a 2-approximation of the minimum height Q-subtree under constraints P. Finding an approximate tree with respect to the height ranking function under constraints P is much easier than with the weight ranking function, i.e., AppWeightQSubtree. It also consists of three steps: (1) find a minimum height super tree of P, T_sup, and return T_sup if it is reduced or equal to ⊥; (2) otherwise, find another reduced subtree T_min whose height is guaranteed to be no larger than that of the minimum height Q-subtree, if one exists; (3) return the union of T_sup and T_min after removing redundant edges and nodes.

The algorithm to find a minimum height super tree of 𝒯 is shown in Algorithm 23 (MinHeightSuperTree [Golenberg et al., 2008]). The general idea is the same as BackwardSearch: it creates an iterator for each leaf node of 𝒯 (lines 3-5). During each execution of line 6, it first finds the iterator I_v whose next node to be returned is the one with the shortest distance to its source. Let u be that node. If u has been returned from all the other iterators (line 9), it means that the shortest paths from u to all the leaf nodes in 𝒯 have been computed. The union of these shortest paths is a tree with minimum height that includes all the leaf nodes of 𝒯. But one problem remains to be solved: the tree returned must be a super tree of 𝒯. All the edges (u, v) from G, where v is a non-root node of some T ∈ 𝒯 and (u, v) is not an edge in T (line 1), can be removed, since in a tree every node can have at most one incoming edge, and the incoming edge of v in T must be included. This operation makes sure that for every non-root node of 𝒯, its incoming edge in 𝒯 is included. Also, the root of the tree returned cannot be a non-root node of 𝒯, which is checked by line 9. Then, the tree returned by MinHeightSuperTree is a minimum height super tree of 𝒯 in G.


Algorithm 23 MinHeightSuperTree(G, 𝒯)

Input: a data graph G, and a set of PT constraints 𝒯
Output: a minimum height super tree of 𝒯 in G

1: remove all the edges (u, v) from G, where v is a non-root node of some T ∈ 𝒯 and (u, v) is not an edge in T
2: ItHeap ← ∅
3: for each leaf node v ∈ leaves(𝒯) do
4:     create a single source shortest path iterator, I_v, with v as the source node
5:     ItHeap.insert(I_v), where the priority of I_v is the distance of the next node it will return
6: while ItHeap ≠ ∅ do
7:     I_v ← ItHeap.pop()
8:     u ← I_v.next()
9:     if u has been returned from all the other iterators and u is not a non-root node of any tree T ∈ 𝒯 then
10:        return the subtree which is the union of the shortest paths from u to each leaf node of 𝒯
11:    if I_v has more nodes to return then
12:        ItHeap.insert(I_v)
13: return ⊥
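The iterator scheme of Algorithm 23 can be sketched in Python as below. This is a simplified illustration under stated assumptions rather than the published implementation: the names `backward_dijkstra` and `min_height_super_tree_root`, the encoding of each PT as a (root, set of edges) pair with an empty edge set for a single node PT, and the example graph are all hypothetical. For brevity the sketch returns only the root and height of the super tree instead of the tree itself, returns None in place of ⊥, and assumes node labels are mutually comparable (for example, strings).

```python
import heapq
from itertools import count


def backward_dijkstra(rev_adj, source):
    """Yield (distance, node) in nondecreasing distance, following edges backwards."""
    dist = {source: 0}
    heap = [(0, source)]
    done = set()
    while heap:
        d, x = heapq.heappop(heap)
        if x in done:
            continue
        done.add(x)
        yield d, x
        for y, w in rev_adj.get(x, []):
            if d + w < dist.get(y, float("inf")):
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))


def min_height_super_tree_root(edges, pts):
    """edges: dict (u, v) -> weight; pts: list of (root, set of (u, v) PT edges)."""
    # Line 1: a non-root PT node may keep only its incoming PT edge.
    allowed_parent = {c: p for _root, pt_edges in pts for (p, c) in pt_edges}
    rev_adj = {}
    for (u, v), w in edges.items():
        if v in allowed_parent and allowed_parent[v] != u:
            continue
        rev_adj.setdefault(v, []).append((u, w))
    # Leaves of each PT: nodes of the PT that are not a parent within it.
    leaves = set()
    for root, pt_edges in pts:
        nodes = {root} | {c for _p, c in pt_edges}
        leaves |= nodes - {p for p, _c in pt_edges}
    # Lines 2-5: one iterator per leaf, ordered by the distance of its next node.
    heap, tie, seen = [], count(), {}
    for leaf in sorted(leaves):
        it = backward_dijkstra(rev_adj, leaf)
        d, x = next(it)  # the source itself, at distance 0
        heapq.heappush(heap, (d, next(tie), x, leaf, it))
    # Lines 6-12: pop the globally closest node until some node is complete.
    while heap:
        d, _, u, leaf, it = heapq.heappop(heap)
        seen.setdefault(u, {})[leaf] = d
        if len(seen[u]) == len(leaves) and u not in allowed_parent:
            return u, max(seen[u].values())
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], next(tie), nxt[1], leaf, it))
    return None  # line 13: no super tree exists


# Hypothetical graph: s is a shortcut root that bypasses the PT edge (a, k1).
edges = {("r", "a"): 1, ("r", "b"): 2, ("a", "k1"): 1, ("b", "k2"): 1,
         ("s", "k1"): 1, ("s", "k2"): 1}
constrained = min_height_super_tree_root(edges, [("a", {("a", "k1")}), ("k2", set())])
unconstrained = min_height_super_tree_root(edges, [("k1", set()), ("k2", set())])
print(constrained, unconstrained)
```

With both keywords given as single node PTs, the shortcut node s is the best root with height 1. Once the PT containing the edge (a, k1) is imposed, line 1 removes the edge (s, k1), so the search returns r with height 3 instead.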

Figure 3.8: Approximating a minimum height Q-subtree [Golenberg et al., 2008]

If the tree T_app found by MinHeightSuperTree is non-reduced, then it needs to find another reduced tree T_min, and the root of T_app must be the root of one NPT in P; without loss of generality, we assume it to be P_1. Note that, with respect to the height ranking function, it cannot use the idea of MinWeightQSubtree to find a minimum height Q-subtree. There are two cases to consider, depending on whether the following holds: in a minimum height Q-subtree A_m that satisfies P, there is a path from the root to a single node PT in P that does not use any edge of P.


Algorithm 24 AppHeightQSubtree(G, P)

Input: a data graph G, and a set of PT constraints P
Output: a 2-approximation of the minimum height Q-subtree under constraints P

1: return ⊥, if P consists of a single NPT
2: T_app ← MinHeightSuperTree(G, P)
3: return T_app, if T_app = ⊥ or T_app is a Q-subtree
4: if there exist single node RPTs in P then
5:     construct the tree T_1
6: if there exist two non-single node PTs in P then
7:     construct the tree T_2
8: return ⊥, if T_1 = ⊥ and T_2 = ⊥
9: T_min ← minimum height subtree among T_1 and T_2
10: construct a Q-subtree T from T_app and T_min
11: return T

The two cases are shown in Figure 3.8. Essentially, A_m must contain a subtree that looks like either T_1 or T_2. We discuss these cases below.

T_1 describes the following situation: (1) one single node PT (e.g., keyword node h) is reachable from the root of A_m through a path S_h that does not use any edge of P; and (2) P_1 is reachable from the root of A_m through a path S_p that does not include any edge appearing on S_h. Let G_v denote the graph obtained from G by deleting all the non-root nodes of PTs in P, and G_e denote the graph obtained from G by deleting all edges (u, v) where v is a non-root node of a PT in P and (u, v) is not in P. For each single node PT in P, e.g., the keyword node h, it can find the minimum height subtree T_h by concurrently running two iterators of Dijkstra's algorithm, one with h as the source working on G_v, the other with root(P_1) as the source working on G_e. T_1 is the minimum height subtree among all the subtrees found.
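The two auxiliary graphs G_v and G_e can be derived from G with two filters, as in the following minimal sketch. The function name `build_gv_ge`, the edge-set representation, and the example graph are assumptions introduced here for illustration; each PT is encoded as a hypothetical (root, set of edges) pair, with an empty edge set for a single node PT.

```python
def build_gv_ge(edges, pts):
    """edges: set of (u, v) pairs; pts: list of (root, set of (u, v) PT edges)."""
    pt_edges = set()
    for _root, tree_edges in pts:
        pt_edges |= tree_edges
    # Every child endpoint of a PT edge is a non-root node of that PT.
    non_root = {v for (_u, v) in pt_edges}

    # G_v: delete all non-root PT nodes, together with their incident edges.
    g_v = {(u, v) for (u, v) in edges
           if u not in non_root and v not in non_root}
    # G_e: delete edges entering a non-root PT node unless they belong to P.
    g_e = {(u, v) for (u, v) in edges
           if v not in non_root or (u, v) in pt_edges}
    return g_v, g_e


# Hypothetical graph: PT P1 has root p and non-root node c; h is a single node PT.
graph_edges = {("r", "p"), ("p", "c"), ("x", "c"), ("c", "h"), ("x", "h")}
constraints = [("p", {("p", "c")}), ("h", set())]
g_v, g_e = build_gv_ge(graph_edges, constraints)
print(sorted(g_v))
print(sorted(g_e))
```

In the example, G_v loses every edge touching c, while G_e loses only the edge (x, c), which enters c from outside P_1.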

T_2 applies only when P contains two non-single node PTs, P_1 and P_2, where P_2 can be either an NPT or an RPT. If P_2 is an NPT, then T_2 cannot use any edge from P, so it can be found in the graph G_v. Otherwise, P_2 is an RPT, and the root of T_2 can be the root of P_2. Then it needs to build a new graph G′ from G as follows: (1) remove all the edges that enter non-root nodes of P_2 and are not in P_2 itself (i.e., P_2 is handled as in the construction of G_e); (2) remove all the non-root nodes of P_1 (i.e., P_1 is handled as in the construction of G_v). In G′, T_2 can be found by two iterators using Dijkstra's algorithm.

Theorem 3.10 [Golenberg et al., 2008] Given a data graph G with n nodes and m edges, let Q = {k_1, ..., k_l} be a keyword query and P be a set of PT constraints, such that leaves(P) = Q and P has at most two non-single node PTs. AppHeightQSubtree finds a 2-approximation of the minimum height Q-subtree that satisfies P in time O(l(n log n + m)).

Date posted: 05/07/2014, 23:22