Algorithm 21 MinWeightQSubtree(G, P)
Input: a data graph G, and a set of PT constraints P
Output: a minimum weight Q-subtree under constraints P
1: return T, if P consists of a single RPT T
2: return ⊥, if P consists of a single NPT
3: let 𝒯 be obtained from P by replacing each N ∈ P|NPT with the PT consisting of only the node root(N)
4: initialize T_min ← ⊥, where ⊥ has weight ∞
5: for all subsets 𝒯_top of 𝒯, such that |𝒯_top| ≥ 2 do
6:     G1 ← G − ∪_{T∈P} V(T) + ∪_{T∈𝒯_top} {root(T)}
7:     T_top ← MinWeightSuperTree(G1, 𝒯_top)
8:     if T_top ≠ ⊥ then
9:         T+_top ← T_top ∪ 𝒯_top
10:        let G2 be obtained from G by removing all the incoming edges to root(T+_top)
11:        T ← MinWeightSuperTree(G2, (P \ 𝒯_top) ∪ {T+_top})
12:        T_min ← min{T_min, T}
13: return T_min
However, this takes exponential time, since line 5 of MinWeightQSubtree iterates over all subsets of size at least two. In the following, we introduce approaches to find a (θ + 1)-approximate minimum weight Q-subtree under constraints P.
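To make the cost of line 5 concrete, the following Python sketch enumerates the subsets that the loop would visit and checks the count against a closed form. The function names `top_subsets` and `count_top_subsets` are illustrative and do not come from the source; the constraints are treated as opaque items.

```python
from itertools import combinations


def top_subsets(constraints):
    """Yield every subset of `constraints` that has at least two elements."""
    items = list(constraints)
    for size in range(2, len(items) + 1):
        for subset in combinations(items, size):
            yield subset


def count_top_subsets(n):
    """All 2**n subsets, minus the empty set and the n singletons."""
    return 2 ** n - n - 1


if __name__ == "__main__":
    for n in range(2, 8):
        visited = sum(1 for _ in top_subsets(range(n)))
        print(n, visited, count_top_subsets(n))
```

Each visited subset triggers two super tree computations in Algorithm 21, so the number of calls grows exponentially with the number of constraints.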
Algorithm 22 (AppWeightQSubtree [Kimelfeld and Sagiv, 2006b]) finds a (θ + 1)-approximation of the minimum weight Q-subtree in polynomial time under data-and-query complexity. Recall that the number of NPTs of P is no more than 2. AppWeightQSubtree mainly consists of three steps:
1. A minimum weight Q-subtree is found by calling MinWeightQSubtree if there are no RPTs in P (line 1). Otherwise, it finds a θ-approximation of the minimum weight super tree of P, T_app, by calling AppWeightSuperTree (line 2), and returns T_app if it is a Q-subtree or ⊥ (line 3).
2. Find a reduced tree T_min (lines 4-5). The weight of T_min is guaranteed to be no larger than that of the minimum weight Q-subtree under constraints P, if one exists. Note that this step can be accomplished by a single call to procedure MinWeightQSubtree, by adding to G′ a virtual node v and an edge from v to each root(T) for all T ∈ P|RPT, and calling MinWeightQSubtree(G′, P|NPT ∪ {v}). If T_min is ⊥, then there is no Q-subtree satisfying the constraints P (line 6).
3.3 STEINER TREE-BASED KEYWORD SEARCH
Algorithm 22 AppWeightQSubtree(G, P)
Input: a data graph G, and a set of PT constraints P
Output: a (θ + 1)-approximation of minimum weight Q-subtree under constraints P
1: return MinWeightQSubtree(G, P), if P|RPT = ∅
2: T_app ← AppWeightSuperTree(G, P)
3: return T_app, if T_app = ⊥ or T_app is reduced
4: let G′ be obtained from G by removing all the edges (u, v), where v is a non-root node of some T ∈ P and (u, v) is not an edge in T
5: T_min ← min_{T ∈ P|RPT} MinWeightQSubtree(G′, P|NPT ∪ {T})
6: return ⊥, if T_min = ⊥
7: r ← root(T_min)
8: if r belongs to a subtree T ∈ P|RPT then
9:     r ← root(T)
10: G_app ← T_min ∪ T_app
11: remove from G_app all incoming edges of r
12: for all v ∈ V(G_app) that have two incoming edges e1 ∈ E(T_min) and e2 ∈ E(T_app) do
13:     remove e2 from G_app
14: delete from G_app all structural nodes v, such that no keyword is reachable from v
15: return G_app
3. Union T_app and T_min, and remove redundant nodes and edges to get a (θ + 1)-approximate Q-subtree (lines 7-14). Note that all the edges in T_min are kept during the removal of redundant nodes and edges.
The general idea of AppWeightQSubtree is that it first finds a θ-approximate super tree of P, denoted T_app. If T_app does not exist, then there is no Q-subtree under constraints P. If T_app is reduced, then it is a θ-approximation of the minimum weight Q-subtree. Otherwise, it finds another subtree T_min, which is guaranteed to be reduced. If there is a Q-subtree under constraints P, then T_min must exist and its weight must be no larger than that of the minimum weight Q-subtree, because a subtree of the minimum weight Q-subtree satisfies the conditions of line 5. Let r denote the root of T_min; there are three cases: r is the root of an NPT in P, r is the root node of an RPT in P, or r is not in P. If r is the root of an NPT in P, then it must have at least two children; otherwise the root of T_app must have an incoming edge (guaranteed by line 5), as it is the root of one NPT in P. If both T_app and T_min exist, then lines 7-14 produce a Q-subtree: since each node of the result, except the root node, has exactly one incoming edge, the result forms a tree.
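The merge performed by lines 10-14 of Algorithm 22 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: trees are assumed to be given as sets of directed edges `(u, v)`, the root `r` is assumed to have been chosen already (lines 7-9), and the name `merge_trees` and its parameters are hypothetical.

```python
from collections import defaultdict


def merge_trees(t_min_edges, t_app_edges, r, keyword_nodes):
    """Union T_min and T_app, then drop redundant edges and nodes."""
    t_min_edges = set(t_min_edges)
    # Line 10: take the union of the two edge sets.
    edges = t_min_edges | set(t_app_edges)
    # Line 11: the chosen root r must have no incoming edge.
    edges = {(u, v) for (u, v) in edges if v != r}
    # Lines 12-13: where both trees enter a node, keep the edge of T_min.
    min_targets = {v for (_, v) in t_min_edges}
    edges = {(u, v) for (u, v) in edges
             if (u, v) in t_min_edges or v not in min_targets}
    # Line 14: keep only nodes from which some keyword node is reachable,
    # found by walking the remaining edges backwards from the keywords.
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)
    alive = set(keyword_nodes)
    stack = list(keyword_nodes)
    while stack:
        v = stack.pop()
        for u in preds[v]:
            if u not in alive:
                alive.add(u)
                stack.append(u)
    return {(u, v) for (u, v) in edges if v in alive}


if __name__ == "__main__":
    t_min = {("r", "a"), ("r", "b")}
    t_app = {("x", "r"), ("r", "a"), ("x", "b"), ("x", "s")}
    print(sorted(merge_trees(t_min, t_app, "r", {"a", "b"})))
```

In the small example, the edge into `r`, the competing edge into `b`, and the dead-end structural node `s` are all discarded, while every edge of T_min survives, matching the remark that the edges of T_min are kept.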
Theorem 3.9 [Kimelfeld and Sagiv, 2006b] Consider a data graph G with n nodes and m edges. Let Q = {k_1, ..., k_l} be a keyword query and P be a set of PT constraints, such that leaves(P) = Q and P has at most c NPTs. AppWeightQSubtree finds a (θ + 1)-approximation of the minimum weight Q-subtree that satisfies P in time O(f + 4^{c+1} n + 3^{c+1}((l + log n)n + m)), where θ and f are the approximation ratio and runtime, respectively, of AppWeightSuperTree.
Finding 2-approximate minimum height Q-subtree under P: Although MinWeightQSubtree and AppWeightQSubtree can enumerate Q-subtrees in exact (or approximate) rank order, they are based on repeated computations of Steiner trees (or approximate Steiner trees) under inclusion and exclusion constraints; therefore, they are not practical. Golenberg et al. [2008] propose to decouple the Q-subtree ranking step from Q-subtree generation. More precisely, their approach first generates a set of N Q-subtrees that are candidate answers, by incorporating a much easier rank function than the Steiner tree weight function, and then generates a set of k final answers, which are ranked according to a more complex ranking function. The ranking function used is based on the height, where the height of a tree is the maximum among the shortest distances to each keyword node, i.e.,
height(T) = max_{i=1}^{l} dist(root(T), k_i).
Ranking in increasing height order is highly correlated with the desired ranking [Golenberg et al., 2008], so an enumeration algorithm is proposed to generate Q-subtrees in 2-approximation rank order with respect to the height ranking function.
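As a small illustration of the height function, the sketch below computes height(T) for a tree stored as an adjacency dictionary with edge weights. The representation and the names `shortest_distances` and `height` are assumptions made for this example; distances are computed with a standard Dijkstra search from the root.

```python
import heapq


def shortest_distances(graph, source):
    """Dijkstra's algorithm over graph[u] = [(v, weight), ...]."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def height(tree, root, keyword_nodes):
    """Maximum shortest distance from the root to any keyword node."""
    dist = shortest_distances(tree, root)
    return max(dist.get(k, float("inf")) for k in keyword_nodes)


if __name__ == "__main__":
    tree = {"r": [("a", 1), ("b", 2)], "b": [("c", 3)]}
    print(height(tree, "r", ["a", "c"]))
```

A keyword node that cannot be reached from the root gives an infinite height, which mirrors treating ⊥ as having infinite cost.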
The general idea is the same as that of enumerating Q-subtrees in (θ + 1)-approximate order with respect to the weight function, i.e., AppWeightQSubtree. It also uses EnumTreePD as the outer enumeration algorithm, and it implements the sub-routine Q-subtree() by returning a 2-approximation of the minimum height Q-subtree under constraints P. Finding an approximate tree with respect to the height ranking function under constraints P is much easier than with the weight ranking function, i.e., AppWeightQSubtree. It also consists of three steps: (1) find a minimum height super tree of P, T_sup, and return T_sup if it is reduced or equal to ⊥; (2) otherwise, find another reduced subtree T_min whose height is guaranteed to be no larger than that of the minimum height Q-subtree, if one exists; (3) return the union of T_sup and T_min after removing redundant edges and nodes.
The algorithm to find a minimum height super tree of 𝒯 is shown in Algorithm 23 (MinHeightSuperTree [Golenberg et al., 2008]). The general idea is the same as BackwardSearch: it creates an iterator for each leaf node of 𝒯 (lines 3-5). During each execution of line 6, it first finds the iterator I_v whose next node to be returned is the one with the shortest distance to its source. Let u be that node. If u has been returned from all the other iterators (line 9), the shortest paths from u to all the leaf nodes in 𝒯 have been computed. The union of these shortest paths is a tree with minimum height that includes all the leaf nodes of 𝒯. One problem remains to be solved: the tree returned must be a super tree of 𝒯. All the edges (u, v) of G, where v is a non-root node of some T ∈ 𝒯 and (u, v) is not an edge in T (line 1), can be removed, since in a tree every node can have at most one incoming edge, and for such a node v that edge must be its incoming edge in T. This operation makes sure that, for every non-root node of 𝒯, the incoming edge in 𝒯 is included. Also, the root of the tree returned cannot be a non-root node of 𝒯, which is checked by line 9. Then, the tree returned by MinHeightSuperTree is a minimum height super tree of 𝒯 in G.
Algorithm 23 MinHeightSuperTree(G, 𝒯)
Input: a data graph G, and a set of PT constraints 𝒯
Output: a minimum height super tree of 𝒯 in G
1: remove all the edges (u, v) from G, where v is a non-root node of some T ∈ 𝒯 and (u, v) is not an edge in T
2: ItHeap ← ∅
3: for each leaf node v ∈ leaves(𝒯) do
4:     create a single source shortest path iterator, I_v, with v as the source node
5:     ItHeap.insert(I_v), the priority of I_v is the distance of the next node it will return
6: while ItHeap ≠ ∅ do
7:     I_v ← ItHeap.pop()
8:     u ← I_v.next()
9:     if u has been returned from all the other iterators and u is not a non-root node of any tree T ∈ 𝒯 then
10:        return the subtree which is the union of the shortest paths from u to each leaf node of 𝒯
11:    if I_v has more nodes to return then
12:        ItHeap.insert(I_v)
13: return ⊥
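The iterator-driven loop of MinHeightSuperTree can be sketched in Python as below. This is a reduced illustration under several assumptions: the graph is passed already reversed (`rev_graph[v]` lists `(u, weight)` for each edge u → v), the edge removal of line 1 is assumed to have been applied by the caller, the leaves are distinct, and the function returns only the chosen root and its height rather than the full tree. The names `dijkstra_iter`, `min_height_root`, and `forbidden_roots` are hypothetical.

```python
import heapq
from itertools import count


def dijkstra_iter(rev_graph, source):
    """Yield (distance, node) in nondecreasing distance from `source`."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        yield d, u
        for v, w in rev_graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))


def min_height_root(rev_graph, leaves, forbidden_roots=frozenset()):
    """Return (root, height) of a minimum height tree over `leaves`, or None."""
    leaves = list(leaves)
    tie = count()  # unique tie-breaker so heap entries never compare iterators
    it_heap = []
    reached = {}  # node -> {leaf: distance from node to that leaf}
    for leaf in leaves:
        it = dijkstra_iter(rev_graph, leaf)
        d, u = next(it)  # the source itself, at distance 0
        heapq.heappush(it_heap, (d, next(tie), leaf, u, it))
    while it_heap:
        d, _, leaf, u, it = heapq.heappop(it_heap)
        reached.setdefault(u, {})[leaf] = d
        # u is a valid root once every iterator has returned it.
        if len(reached[u]) == len(leaves) and u not in forbidden_roots:
            return u, d
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(it_heap, (nxt[0], next(tie), leaf, nxt[1], it))
    return None


if __name__ == "__main__":
    # Forward edges: r->a (1), r->b (2), s->a (1), s->b (5).
    rev = {"a": [("r", 1), ("s", 1)], "b": [("r", 2), ("s", 5)]}
    print(min_height_root(rev, ["a", "b"]))
    print(min_height_root(rev, ["a", "b"], forbidden_roots={"r"}))
```

Because nodes are popped in globally nondecreasing distance, the distance of the pop that completes a node equals its height, so the first completed, admissible node is a minimum height root. Recovering the tree itself would additionally require storing predecessor pointers in each iterator.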
Figure 3.8: Approximating a minimum height Q-subtree [Golenberg et al., 2008]
If the tree T_app found by MinHeightSuperTree is non-reduced, then another reduced tree T_min must be found. In this case the root of T_app must be the root of one NPT in P; without loss of generality, we assume it to be P1. Note that, with respect to the height ranking function, the idea of MinWeightQSubtree cannot be used to find a minimum height Q-subtree. There are two cases to consider, depending on whether the following holds: in a minimum height Q-subtree A_m that satisfies P, there is a path from the root to a single node PT in P that does not use any edge of P.
Algorithm 24 AppHeightQSubtree(G, P)
Input: a data graph G, and a set of PT constraints P
Output: a 2-approximation of minimum height Q-subtree under constraints P
1: return ⊥, if P consists of a single NPT
2: T_app ← MinHeightSuperTree(G, P)
3: return T_app, if T_app = ⊥ or T_app is a Q-subtree
4: if there exist single node RPTs in P then
5:     construct the tree T1
6: if there exist two non-single node PTs in P then
7:     construct the tree T2
8: return ⊥, if T1 = ⊥ and T2 = ⊥
9: T_min ← minimum height subtree among T1 and T2
10: construct a Q-subtree T from T_app and T_min
11: return T
The two cases are shown in Figure 3.8. Essentially, A_m must contain a subtree that looks like either T1 or T2. We discuss these cases below.
T1 describes the following situation: (1) one single node PT (e.g., keyword node h) is reachable from the root of A_m through a path S_h that does not use any edge of P; and (2) P1 is reachable from the root of A_m through a path S_p that does not include any edge appearing on S_h. Let G_v denote the graph obtained from G by deleting all the non-root nodes of PTs in P, and let G_e denote the graph obtained from G by deleting all edges (u, v) where v is a non-root node of a PT in P and (u, v) is not in P. For each single node PT in P, e.g., the keyword node h, the minimum height subtree T_h can be found by concurrently running two iterators of Dijkstra's algorithm, one with h as the source and working on G_v, the other with root(P1) as the source and working on G_e. T1 is the minimum height subtree among all the subtrees found.
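The two auxiliary graphs G_v and G_e can be built with simple edge filters. The sketch below assumes that the graph is a set of directed edges `(u, v)` and that each PT is given as a pair `(root, edge_set)`, with a single node PT having an empty edge set; this representation and the names `build_gv` and `build_ge` are illustrative only.

```python
def non_root_nodes(pt_root, pt_edges):
    """All nodes of a PT except its root."""
    nodes = {u for u, _ in pt_edges} | {v for _, v in pt_edges}
    return nodes - {pt_root}


def build_gv(edges, constraints):
    """G_v: delete every non-root node of every PT, with its incident edges."""
    removed = set()
    for root, pt_edges in constraints:
        removed |= non_root_nodes(root, pt_edges)
    return {(u, v) for (u, v) in edges
            if u not in removed and v not in removed}


def build_ge(edges, constraints):
    """G_e: delete edges entering a non-root PT node unless they belong to P."""
    allowed = set()
    non_roots = set()
    for root, pt_edges in constraints:
        allowed |= set(pt_edges)
        non_roots |= non_root_nodes(root, pt_edges)
    return {(u, v) for (u, v) in edges
            if v not in non_roots or (u, v) in allowed}


if __name__ == "__main__":
    g = {("x", "p"), ("p", "q"), ("x", "q"), ("x", "h"), ("q", "h")}
    p = [("p", {("p", "q")}), ("h", set())]
    print(sorted(build_gv(g, p)))
    print(sorted(build_ge(g, p)))
```

In the example, G_v loses node q together with all edges touching it, while G_e only loses the edge (x, q), which enters the non-root node q from outside its PT.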
T2 applies only when P contains two non-single node PTs, P1 and P2, where P2 can be either an NPT or an RPT. If P2 is an NPT, then T2 cannot use any edge from P, so it can be found in the graph G_v. Otherwise, P2 is an RPT, and the root of T2 can be the root of P2. In that case a new graph G′ is built from G as follows: (1) remove all the edges that enter non-root nodes of P2 and are not in P2 itself (i.e., as in the construction of G_e); (2) remove all the non-root nodes of P1 (i.e., as in the construction of G_v). In G′, T2 can be found by two iterators using Dijkstra's algorithm.
Theorem 3.10 [Golenberg et al., 2008] Given a data graph G with n nodes and m edges, let Q = {k_1, ..., k_l} be a keyword query and P be a set of PT constraints, such that leaves(P) = Q and P has at most two non-single node PTs. AppHeightQSubtree finds a 2-approximation of the minimum height Q-subtree that satisfies P in time O(l(n log n + m)).