Managing and Mining Graph Data part 42 pdf


Other variants of the streaming model also exist. For example, the W-stream model [15] allows the algorithm to write to (annotate) the stream during each pass. These annotations can then be used by the algorithm in successive passes. Another variant [1] augments the streaming model by adding a sorting primitive.

3 Statistics and Counting Triangles

In this section, we describe a set of problems that involve graphs but can essentially be reduced to problems whose input is an array presented as a stream of the array elements (or as a sequence of increments to the elements). For example, the array $a = [2, 1, 3]$ can be given as the stream $\{(a[1] + 1), (a[3] + 1), (a[3] + 1), (a[2] + 1), (a[1] + 1), (a[3] + 1)\}$. Assuming all the entries of the array take value 0 at the beginning of the stream, we obtain the array $a$ after the operations in the stream.

There are many streaming algorithms that compute, for this array, statistics such as frequency moments [3, 24, 31] and heavy hitters [13, 10], or that construct succinct data structures supporting queries such as range queries [38]. These algorithms can be applied directly once the graph problem is reduced to the corresponding problem on an array.

We first consider problems involving the degrees of the graph nodes. For an undirected graph, the degree of a node is the number of edges incident to the node. One may view each graph as having an associated virtual array $D$ such that $D[i]$ is the degree of the $i$-th node. In the streaming setting, a stream of edges translates to updates to the array $D$. For example, the stream $\{(1, 2), (4, 8), (2, 7), \ldots\}$ means the operation sequence $\{(D[1] + 1), (D[2] + 1), (D[4] + 1), (D[8] + 1), (D[2] + 1), (D[7] + 1), \ldots\}$. (The degree array can be extended to directed graphs, where we may have one out-degree array and one in-degree array.)
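The translation from an edge stream to increments of the degree array can be sketched as follows. This is only an illustration: the function name `degree_array` is hypothetical, and the array is stored explicitly, which is exactly what the streaming algorithms cited in this section avoid.

```python
from collections import defaultdict

def degree_array(edge_stream):
    """Apply each edge (u, v) as the two increments D[u] + 1 and D[v] + 1."""
    D = defaultdict(int)  # virtual array D; entries never touched stay at 0
    for u, v in edge_stream:
        D[u] += 1
        D[v] += 1
    return dict(D)

# The example stream from the text.
print(degree_array([(1, 2), (4, 8), (2, 7)]))  # {1: 1, 2: 2, 4: 1, 8: 1, 7: 1}
```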

The frequency moment problem is to compute the $k$-th moment $f_k = \sum_{i=1}^{n} (D[i])^k$ of the node degrees. The heavy hitter problem is to report, after seeing the graph stream, the nodes having the largest degrees. The range query problem requires constructing a succinct representation of the array (one that is much smaller in size than the array), from which $\sum_{i=j}^{k} D[i]$ can be calculated, given $j$ and $k$ as query input.
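To make the three problems concrete, the sketch below evaluates them exactly on a small degree array held in memory. It is a non-streaming baseline under our own assumptions (1-based node indices, hypothetical function names), not one of the space-efficient algorithms from the literature cited above.

```python
def frequency_moment(D, k):
    """f_k = sum of D[i]**k; zero entries are skipped so that f_0 counts nonzero entries."""
    return sum(d ** k for d in D if d > 0)

def heavy_hitters(D, top):
    """1-based indices of the `top` largest entries of D."""
    order = sorted(range(len(D)), key=lambda i: D[i], reverse=True)
    return [i + 1 for i in order[:top]]

def range_sum(D, j, k):
    """Sum of D[j..k], with 1-based inclusive bounds."""
    return sum(D[j - 1:k])

# Degrees of nodes 1..8 after the stream (1, 2), (4, 8), (2, 7).
D = [1, 2, 0, 1, 0, 0, 1, 1]
print(frequency_moment(D, 2), heavy_hitters(D, 1), range_sum(D, 1, 4))
```

Note that $f_1$ always equals twice the number of edges, since every edge increments two entries.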

Cormode and Muthukrishnan show [14] that these problems can be solved using the corresponding streaming algorithms that work for an array. They further provide algorithms for these problems when the graph is a multigraph but the degree of a node is defined to count only the distinct edges (e.g., if the stream for a multigraph has edges $(1, 2), (2, 5), (1, 2)$, the degree of node 1 is 1, not 2, and the degree of node 2 is 2, not 3). The details of the algorithms are beyond the scope of this survey. We refer readers to [14] and the aforementioned literature for streaming algorithms that compute statistics and other queries for an array.

The node degrees of a graph are also related to the entropy $H$ of an unbiased random walk on the graph [9]. In particular, $H$ is defined to be $H = \frac{1}{2|E|} \sum_{i=1}^{n} D[i] \log D[i]$. A streaming algorithm that computes the entropy for an array whose $i$-th entry represents the frequency of the $i$-th element in a set is given in [9]. The authors showed that the algorithm can be applied to compute the entropy when the array is the node-degree array $D$ of a graph, and therefore the entropy of an unbiased random walk can be calculated for a graph stream. They also extended the algorithm to multigraphs where only distinct edges are counted for the degree.
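As a small check of the formula for $H$, the following sketch computes it exactly from an edge list. The function name is hypothetical, the logarithm base (2) is our assumption since the text does not fix one, and the degrees are stored explicitly rather than estimated from the stream as in [9].

```python
import math

def random_walk_entropy(edges):
    """H = (1 / (2|E|)) * sum over nodes of D[i] * log2(D[i])."""
    D = {}
    for u, v in edges:
        D[u] = D.get(u, 0) + 1
        D[v] = D.get(v, 0) + 1
    m = len(edges)
    return sum(d * math.log2(d) for d in D.values()) / (2 * m)

# In a 4-cycle every node has degree 2, so each step of the walk carries one bit.
print(random_walk_entropy([(1, 2), (2, 3), (3, 4), (4, 1)]))  # 1.0
```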

Another problem that can be reduced to computing statistics of an array is the triangle counting problem, i.e., finding the number of triangles in an undirected graph. We describe here the reduction introduced by Bar-Yossef et al. [6]. Similar to the earlier problems, there is a virtual array $P$ associated with the graph. Each entry in the array corresponds to an (unordered) triple of graph nodes, e.g., if $v_i$, $v_j$, $v_k$ are three nodes in the graph, there is an entry $P[(i, j, k)]$ in the array that corresponds to the triple $\{v_i, v_j, v_k\}$. The value of the entry counts how many of the three pairs $\{v_i, v_j\}$, $\{v_i, v_k\}$, and $\{v_j, v_k\}$ are actual edges in the graph. There are 4 possible values for the entries: 0, 1, 2, and 3. Let $T_0$, $T_1$, $T_2$, and $T_3$ be the number of entries that take the corresponding value. Clearly, $T_3$ is exactly the number of triangles in the graph. (We will abuse the notation and also use $T_i$ to denote the set of triples whose entry value is $i$.)

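Unlike the reduction described earlier, an edge in the graph stream here maps to updates of multiple entries in the array. If we see an edge $(u, v)$, it means $(P[(u, v, s)] + 1)$ for all nodes $s \neq u, v$. Now consider the frequency moments of the array, $f_k = \sum_{t} (P[t])^k$ (with $f_0$ counting the nonzero entries). The moment can be decomposed into $f_k = T_1 \cdot 1^k + T_2 \cdot 2^k + T_3 \cdot 3^k$ because each entry with value 1 contributes $1^k$ to $f_k$, each entry with value 2 contributes $2^k$, and each entry with value 3 contributes $3^k$. We therefore have the following equations:

$$\begin{pmatrix} f_0 \\ f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 4 & 9 \end{pmatrix} \cdot \begin{pmatrix} T_1 \\ T_2 \\ T_3 \end{pmatrix}$$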

Using streaming algorithms one can estimate $f_0$ and $f_2$; $f_1$ can be easily obtained from the stream. Solving the above equations then gives us the estimate of $T_3$. (Although the size of the virtual array is larger than the size of the graph stream, e.g., a stream of $m$ edges corresponds to an array with up to $m(n - 2)$ nonzero entries, the estimation algorithms often use space logarithmic in the size of the array. Therefore, the memory space needed is not significantly affected by the reduction.)
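The reduction can be checked on a small graph by building the array $P$ explicitly and inverting the $3 \times 3$ system, which gives $T_3 = f_0 - \frac{3 f_1 - f_2}{2}$. The sketch below is exact and not space-efficient; it assumes a simple graph (no repeated edges) with nodes numbered $1, \ldots, n$, and the function name is hypothetical. A streaming algorithm would replace the exact $f_0$ and $f_2$ with estimates.

```python
def triangles_from_moments(n, edges):
    """Recover T3 from the frequency moments f0, f1, f2 of the triple array P."""
    P = {}
    for u, v in edges:
        for s in range(1, n + 1):
            if s != u and s != v:
                t = tuple(sorted((u, v, s)))  # unordered triple as a canonical key
                P[t] = P.get(t, 0) + 1
    f0 = len(P)                          # nonzero entries: T1 + T2 + T3
    f1 = sum(P.values())                 # T1 + 2*T2 + 3*T3 = m * (n - 2)
    f2 = sum(x * x for x in P.values())  # T1 + 4*T2 + 9*T3
    # 3*f1 - f2 = 2*(T1 + T2), so the integer division below is exact.
    return f0 - (3 * f1 - f2) // 2

k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(triangles_from_moments(4, k4))  # 4
```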


In [6], Bar-Yossef et al. also proposed improved streaming frequency-moment estimation algorithms. Using the reduction and their frequency-moment estimation, they show that for $\epsilon, \delta > 0$, the number of triangles in a graph can be estimated within $\epsilon$ error (i.e., the estimate is bounded between $(1 - \epsilon)T_3$ and $(1 + \epsilon)T_3$) with probability at least $1 - \delta$. The algorithm uses space

$$s = O\left(\frac{1}{\epsilon^3} \cdot \log\frac{1}{\delta} \cdot \left(\frac{T_1 + T_2 + T_3}{T_3}\right)^3 \cdot \log n\right)$$

and poly($s$) processing time for each edge. When the stream is an incidence stream, they show that the number of triangles can be $(\epsilon, \delta)$-estimated using space

$$O\left(\frac{1}{\epsilon^2} \cdot \log\frac{1}{\delta} \cdot \left(\frac{T_3 + T_2}{T_3}\right)^2 \cdot \log n + d_{max} \log n\right)$$

where $d_{max}$ is the maximum degree.

In a follow-up work, Jowhari and Ghodsi [33] introduced several estimators for the number of triangles. One estimator uses sequences of random numbers in a way similar to [3]. Let $R$ be an array of uniform, $\pm 1$-valued random numbers, i.e., $P(R[i] = 1) = P(R[i] = -1) = 0.5$ and $E(R[i]) = 0$. The random numbers in the array are 12-wise independent. A family of such random arrays can be constructed using the BCH code [3] in log-space. While the edges stream by, one computes $Z = \sum_{(i,j) \in E} R[i]R[j]$. $X = Z^3/6$ is then an estimator for the number of triangles in the graph. This is so because $E(R^k[i]) = 0$ for odd $k$ and the numbers in $R$ are 12-wise independent. After expanding $Z^3$, the expectations of the terms all evaluate to zero except those of the form $6R^2[i]R^2[j]R^2[k]$, which correspond to the triangles. Jowhari and Ghodsi showed that the variance of the estimator can be controlled such that only $O\left(\frac{1}{\epsilon^2} \cdot \log\frac{1}{\delta} \cdot \left(\frac{m^3 + mC_4 + C_6}{T_3^2} + 1\right) \cdot \log n\right)$ space and per-edge processing time are needed for an $(\epsilon, \delta)$-estimation. ($C_k$ is the number of cycles of length $k$ in the graph.) Two other sample-based estimators are also proposed in [33].
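That $Z^3/6$ is unbiased can be verified on a tiny graph by averaging over every possible sign assignment. The sketch below does this by brute force; it uses fully independent signs (which are in particular 12-wise independent) instead of the BCH-based construction, and the function names are hypothetical. In a stream, one would maintain $Z$ for a few independent copies of $R$ and combine the resulting values of $X$.

```python
from itertools import product

def z_value(edges, R):
    """Z = sum of R[i] * R[j] over the edges (i, j); computable in one pass."""
    return sum(R[i] * R[j] for i, j in edges)

def expected_estimate(nodes, edges):
    """Exact expectation of Z**3 / 6 over all 2**n sign assignments (tiny graphs only)."""
    total = 0
    count = 0
    for signs in product((-1, 1), repeat=len(nodes)):
        R = dict(zip(nodes, signs))
        total += z_value(edges, R) ** 3
        count += 1
    return total / (6 * count)

k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(expected_estimate([1, 2, 3, 4], k4))  # 4.0, the number of triangles in K4
```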

Buriol et al. also proposed sample-based algorithms for counting triangles in [8]. We present one of their algorithms in Algorithm 13.1.

Algorithm 13.1: Sample Triangle
    1st pass: Count the number of edges in the graph.
    2nd pass: Sample an edge $(u, v)$ uniformly. Choose a node $w$ uniformly from $V \setminus \{u, v\}$.
    3rd pass:
    if both $(u, w)$ and $(v, w)$ are actual edges in the stream then
        $\beta = 1$
    else
        $\beta = 0$
    end
    return $\beta$

$\beta$ is a $\{0, 1\}$-valued random variable whose expectation is $\frac{3T_3}{T_1 + 2T_2 + 3T_3}$. Because $T_1 + 2T_2 + 3T_3 = m(n - 2)$, $T_3$ can be estimated using a set of samples of $\beta$. (To see the equality, consider the triples consisting of the two end nodes of an edge plus one node from the other $n - 2$. There are $m(n - 2)$ such combinations. On the other hand, this way of counting counts each triple in $T_1$ once, triples in $T_2$ twice and triples in $T_3$ three times.) For an $(\epsilon, \delta)$-estimation, this algorithm uses $O\left(\frac{1}{\epsilon^2} \cdot \log\frac{1}{\delta} \cdot \frac{T_1 + T_2 + T_3}{T_3}\right)$ memory space and constant expected per-edge processing time.

Buriol et al. further showed that Algorithm 13.1 can be modified into a one-pass algorithm. The uniform sampling of the edges can be done in one pass by reservoir sampling [43]. One difference here is that edges $(u, w)$ and $(v, w)$ may arrive before $(u, v)$ in the stream. When $(u, v)$ gets selected as a sample, we have missed $(u, w)$ and $(v, w)$ and would not detect $u, v, w$ as a triangle. This happens when $(u, v)$ is not the first edge of the triangle in the stream, and it reduces the expectation of $\beta$ by a factor of 3. Sample-based algorithms are also proposed in [8] for incidence streams.
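The three-pass version of Algorithm 13.1, together with the estimate $T_3 \approx \bar{\beta} \cdot m(n - 2)/3$ that follows from the expectation of $\beta$, can be sketched as below. The stream is modeled as a Python list that can be re-read once per pass, the graph is assumed to be simple with at least three nodes, and the function names are hypothetical; the sketch does not implement the one-pass variant.

```python
import random

def sample_triangle(nodes, edge_stream, rng=random):
    """Return one sample of beta using three passes over edge_stream."""
    # 1st pass: count the edges.
    m = sum(1 for _ in edge_stream)
    # 2nd pass: take the edge at a uniformly random position, then a third node.
    target = rng.randrange(m)
    u, v = next(e for idx, e in enumerate(edge_stream) if idx == target)
    w = rng.choice([x for x in nodes if x != u and x != v])
    # 3rd pass: check whether both (u, w) and (v, w) occur in the stream.
    wanted = {frozenset((u, w)), frozenset((v, w))}
    seen = {frozenset(e) for e in edge_stream if frozenset(e) in wanted}
    return 1 if seen == wanted else 0

def estimate_triangles(nodes, edge_stream, samples, rng=random):
    """Average several samples of beta and rescale by m * (n - 2) / 3."""
    m, n = len(edge_stream), len(nodes)
    mean_beta = sum(sample_triangle(nodes, edge_stream, rng) for _ in range(samples)) / samples
    return mean_beta * m * (n - 2) / 3

k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(estimate_triangles([1, 2, 3, 4], k4, samples=50))  # 4.0: every sample is 1 in K4
```

In a complete graph every sampled triple is a triangle, so the estimate is exact; on sparser graphs the number of samples needed grows with $(T_1 + T_2 + T_3)/T_3$, as in the space bound above.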

4 Graph Matching

A matching in a graph is a set of edges without common nodes. For an unweighted graph, the maximum matching problem is to find a matching having the largest cardinality (number of edges). For a weighted graph, the problem is to find a matching whose edges give the largest weight sum. We survey unweighted and weighted matching algorithms for graph streams in the following sections.

4.1 Unweighted Matching

An early algorithm for approximating unweighted bipartite matching in the streaming model is given in [22]. We describe the algorithm here. It is easy to see that a maximal matching (a matching to which no more edges can be added, because every edge outside the matching shares a vertex with some edge in the matching) can be constructed in one pass over the graph stream.
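The one-pass construction of a maximal matching is the standard greedy rule: keep an edge exactly when neither endpoint is already matched. A minimal sketch, with a hypothetical function name:

```python
def maximal_matching(edge_stream):
    """Greedy one-pass maximal matching; stores only the matching and its covered vertices."""
    covered = set()
    M = []
    for u, v in edge_stream:
        if u != v and u not in covered and v not in covered:
            M.append((u, v))
            covered.update((u, v))
    return M

print(maximal_matching([(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]
print(maximal_matching([(2, 3), (1, 2), (3, 4)]))  # [(2, 3)]
```

The second call shows that the result depends on the arrival order: the matching is maximal but only half the size of the maximum matching.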

Given a matching $M$ for a bipartite graph $G = (L \cup R, E)$, a length-3 augmenting path for an edge $e = (u, v) \in M$, $u \in L$ and $v \in R$, is a quadruple $(w_l, u, v, w_r)$ such that $(u, w_l), (w_r, v) \in E$, and $w_l$ and $w_r$ are free vertices. We call $w_l$ and $w_r$ the wing-tips of the augmenting path, $(u, w_l)$ the left wing and $(w_r, v)$ the right wing. A set of simultaneously augmentable length-3 augmenting paths is a set of length-3 augmenting paths that are vertex disjoint.


Algorithm 13.2: Find Augmenting Paths
    Input: a graph $G = (L \cup R, E)$, a matching $M$ for $G$ and a parameter $0 < \delta < 1$
    while true do
        In one pass, find a maximal set of disjoint left wings. If the number of left wings found is $\le \delta|M|$, terminate.
        In a second pass, for the edges in $M$ with left wings, find a maximal set of disjoint right wings.
        In a third pass, identify the set of vertices that are
            endpoints of a matched edge that got a left wing, or
            the wing-tips of a matched edge that got both wings, or
            endpoints of a matched edge that is no longer 3-augmentable.
        Remember these vertices and, in subsequent passes, ignore any edge incident on one of these vertices.
    end

Given a bipartite graph and a matching in the graph, the subroutine in Algorithm 13.2 finds a set of simultaneously augmentable length-3 augmenting paths. It will be used in the main algorithm that computes the matching for a bipartite graph.

Let $X$ be a maximum-sized set of simultaneously augmentable length-3 augmenting paths for the maximal matching $M$. Let $\alpha = \frac{|X|}{|M|}$. It is shown in [22] that Algorithm 13.2 finds at least $\frac{\alpha|M| - 2\delta|M|}{3}$ simultaneously augmentable length-3 augmenting paths in $3/\delta$ passes.

The main matching algorithm (Algorithm 13.3) increases the size of a matching by repeatedly finding a set of simultaneously augmentable length-3 augmenting paths and augmenting the matching using these paths.

The for-loop in Algorithm 13.3 runs $\lceil \frac{\log 6\epsilon}{\log 8/9} \rceil$ times. During each run, the subroutine described in Algorithm 13.2 needs to go through the input graph stream in $3/\delta$ passes. Therefore, Algorithm 13.3 in total goes through the stream in $O\left(\frac{\log 1/\epsilon}{\epsilon}\right)$ passes. Each call to the subroutine finds a set of simultaneously augmentable length-3 augmenting paths, which increases the size of the matching. The final matching size reaches at least $(2/3 - \epsilon)$ of the maximum matching. The algorithm processes each edge in $O(1)$ time in each pass except the first pass, in which the bipartition is found. The storage space required by the algorithm is $O(n \log n)$.


Algorithm 13.3: Unweighted Bipartite Matching
    Input: a bipartite graph $G = (L \cup R, E)$ and a parameter $0 < \epsilon < 1/3$
    In one pass, find a maximal matching $M$ and the bipartition of $G$
    for $k = 1, 2, \ldots, \lceil \frac{\log 6\epsilon}{\log 8/9} \rceil$ do
        Run Algorithm 13.2 with $G$, $M$ and $\delta = \frac{\epsilon}{2 - 3\epsilon}$
        for each $e = (u, v) \in M$ for which an augmenting path $(w_l, u, v, w_r)$ is found by Algorithm 13.2 do
            remove $(u, v)$ from $M$ and add $(u, w_l)$ and $(w_r, v)$ to $M$
        end
    end
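The sketch below illustrates a single, simplified round of the wing search followed by the augmentation step of Algorithm 13.3. It is not the full Algorithm 13.2: there is no $\delta$-based termination, no third pass, and no repetition, and the two passes are simply loops over an in-memory edge list. The function name and the representation of $L$ as a Python set are our assumptions.

```python
def augment_once(edges, M, L):
    """Find vertex-disjoint length-3 augmenting paths greedily, then augment M."""
    covered = {x for e in M for x in e}
    # Orient every matched edge as (u, v) with u in L and v in R.
    oriented = [(a, b) if a in L else (b, a) for a, b in M]
    left_of = {u: (u, v) for u, v in oriented}
    right_of = {v: (u, v) for u, v in oriented}
    left_wing, right_wing, used = {}, {}, set()
    # Pass 1: greedily pick disjoint left wings (u, w_l) with w_l free.
    for a, b in edges:
        for u, w in ((a, b), (b, a)):
            if u in left_of and w not in covered and w not in used \
                    and left_of[u] not in left_wing:
                left_wing[left_of[u]] = w
                used.add(w)
    # Pass 2: for matched edges that got a left wing, pick disjoint right wings (w_r, v).
    for a, b in edges:
        for v, w in ((a, b), (b, a)):
            if v in right_of and w not in covered and w not in used:
                e = right_of[v]
                if e in left_wing and e not in right_wing:
                    right_wing[e] = w
                    used.add(w)
    # Augment: replace (u, v) by (u, w_l) and (w_r, v) wherever both wings were found.
    new_M = []
    for u, v in oriented:
        if (u, v) in right_wing:
            new_M += [(u, left_wing[(u, v)]), (right_wing[(u, v)], v)]
        else:
            new_M.append((u, v))
    return new_M

edges = [("a", "x"), ("b", "x"), ("b", "y")]
print(augment_once(edges, [("b", "x")], L={"a", "b"}))  # [('b', 'y'), ('a', 'x')]
```

Here the path $(w_l, u, v, w_r) = (y, b, x, a)$ replaces the single matched edge $(b, x)$ with two edges, increasing the matching size by one.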

Figure 13.1. Layered Auxiliary Graph. Left, a graph with a matching (solid edges); Right, a layered auxiliary graph. (An illustration, not constructed from the graph on the left. The solid edges show potential augmenting paths.)

In [35], McGregor introduced an improved algorithm to find augmenting paths in an unweighted graph for which a maximal matching has been constructed. Given the original input graph $G$ and a matching $M$, McGregor constructed an auxiliary graph $G_A$ to help search for augmenting paths. Fig. 13.1 gives an example of one auxiliary graph. The auxiliary graph is a layered graph with a small number, $k + 2$, of layers. It is derived as follows: Let $L_0, L_1, \ldots, L_{k+1}$ be the layers in $G_A$. The free nodes in $G$, i.e., the nodes that are not covered by an edge in $M$, are randomly projected to be nodes in $L_0$ or $L_{k+1}$. Each edge in $M$ is projected to be a node in $G_A$, and this node is randomly assigned to one of the layers $L_1, L_2, \ldots, L_k$. There is an edge between a node $x \in L_i$ (corresponding to $(v_1, v_2) \in M$) and a node $y \in L_{i-1}$ (corresponding to $(v_3, v_4) \in M$) if $(v_2, v_3)$ is an edge in $G$. With this construction, an $(i + 1)$-length path in $G_A$ can be mapped to a $(2i + 1)$-length augmenting path for $M$ in $G$.

Identifying a set of augmenting paths for $M$ in $G$ is now transformed into finding a set of node-disjoint paths in $G_A$. Because one does not have enough space to store the whole graph $G$ in the streaming model, the auxiliary graph $G_A$ normally cannot be stored as a whole either. However, the nodes in $G_A$ can be stored. While the algorithm passes through the input stream of $G$, the edges in $G_A$ also get revealed. Hence, the problem boils down to finding a near-maximal set of node-disjoint paths in $G_A$.

A search algorithm was proposed in [35] for this purpose. The algorithm finds a maximal matching between layers $L_{i-1}$ and $L_i$. Let $S_i \subseteq L_i$ be the set of nodes in $L_i$ involved in this matching. The algorithm then goes on to find a maximal matching between $S_i$ and $L_{i+1}$. It continues in this fashion to grow a set of node-disjoint paths. Clearly, the size of $S_i$ may decrease as $i$ increases, and $S_i$ may become empty before the last layer is reached. To avoid this, the path growth process may backtrack if the size of $S_i$ becomes too small. The backtracking is done by marking the nodes in $S_i$ as dead ends, removing them from $G_A$, and continuing path growth in the remainder of $G_A$.

For a particular $G_A$ construction and path growth, the resulting set of paths may be small. However, the $G_A$ construction is random because the nodes corresponding to the edges in $M$ are randomly assigned to the layers. A matching algorithm is given in [35] that is similar to Algorithm 13.3 in structure but utilizes the $G_A$-based augmenting-path search. It is shown that, with high probability, this algorithm finds a matching in $O_\epsilon(1)$ passes (a function of $\epsilon$, and a constant if $\epsilon$ is constant) whose size is at least $\frac{1}{1+\epsilon}$ of the maximum matching.

4.2 Weighted Matching

The streaming version of the weighted matching problem was first studied in [22], where a streaming algorithm (Algorithm 13.4) was proposed. The algorithm uses only one pass over the stream and finds a matching whose weight is at least $\frac{1}{6}$ of the optimal weight.

Algorithm 13.4: Weighted Matching
     1  Maintain a matching $M$ at all times
     2  while there are edges in the stream do
     3      Let $e$ be the next edge in the stream and $w(e)$ be the weight of $e$;
     4      Let $w(C)$ be the sum of the weights of the edges in $C = \{e' \mid e' \in M$ and $e'$ and $e$ share an end point$\}$ ($w(C) = 0$ if $C$ is empty.)
     5      if $w(e) > 2w(C)$ then
     6          update $M \leftarrow M \cup \{e\} \setminus C$
     7      else
     8          ignore $e$
     9      end
    10  end
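Algorithm 13.4 translates almost line for line into code. In the sketch below the matching is stored as a map from each covered vertex to its matched edge, so the conflict set $C$ is found with at most two lookups; this representation and the function name are our choices, not something prescribed by [22].

```python
def weighted_matching(edge_stream):
    """One-pass weighted matching: accept e only if w(e) > 2 * w(C)."""
    M = {}  # vertex -> (u, v, w) for the matched edge covering that vertex
    for u, v, w in edge_stream:
        # C: matched edges sharing an endpoint with the new edge (at most two).
        C = {M[x] for x in (u, v) if x in M}
        if w > 2 * sum(c[2] for c in C):
            for a, b, _ in C:
                del M[a], M[b]
            M[u] = M[v] = (u, v, w)
        # otherwise the new edge is ignored
    return set(M.values())

print(weighted_matching([("a", "b", 1), ("b", "c", 3), ("c", "d", 1)]))  # {('b', 'c', 3)}
print(weighted_matching([("a", "b", 2), ("b", "c", 3)]))                 # {('a', 'b', 2)}
```

The second call shows the effect of the factor 2: the heavier edge $(b, c)$ is rejected because its weight 3 does not exceed twice the weight of the conflicting edge.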

The following property of Algorithm 13.4 is shown in [22].

Theorem 13.2 In 1 pass and $O(n \log n)$ storage, Algorithm 13.4 constructs a weighted matching whose weight is at least $\frac{1}{6}$ of the optimal weight.

Proof: For any set of edges $S$, let $w(S) = \sum_{e \in S} w(e)$. We say that an edge is selected if it is ever part of $M$. We say that an edge is dropped if it was selected earlier but later removed from $M$ (step 6 in Algorithm 13.4) by a new, heavier edge. This new edge replaces the dropped edge. We say an edge is a survivor if it is selected and never dropped. Let the set of survivors be $S$. The weight of the matching we find is therefore $w(S)$.

For each survivor $e$, let the Trail of Drops leading to this edge be $T(e) = C_1 \cup C_2 \cup \cdots$, where $C_0 = \{e\}$, $C_1 = \{$the edges replaced by $e\}$, and $C_i = \bigcup_{e' \in C_{i-1}} \{$the edges replaced by $e'\}$. We have $w(T(e)) \le w(e)$. This is because for each replacing edge $e$, $w(e)$ is at least twice the cost of the replaced edges, and an edge has at most one replacing edge. Hence, for all $i$, $w(C_i) \ge 2w(C_{i+1})$ and

$$2w(T(e)) = \sum_{i \ge 1} 2w(C_i) \le \sum_{i \ge 0} w(C_i) = w(T(e)) + w(e).$$

Now consider the optimal solution that includes edges opt $= \{o_1, o_2, \ldots\}$. We are going to charge the costs of the edges in opt to the survivors and their trails of drops, $\bigcup_{e \in S} T(e) \cup \{e\}$. We hold an edge $e$ in this set accountable to $o \in$ opt if either $e = o$ or if $o$ wasn't selected because $e$ was in $M$ when $o$ arrived. Note that, in the second case, it is possible for two edges to be accountable to $o$. If only one edge $e$ is accountable for $o$, then we charge $w(o)$ to $e$. If two edges $e_1$ and $e_2$ are accountable for $o$, then we charge $\frac{w(o)w(e_1)}{w(e_1) + w(e_2)}$ to $e_1$ and $\frac{w(o)w(e_2)}{w(e_1) + w(e_2)}$ to $e_2$. In either case, the amount charged by $o$ to any edge $e$ is at most $2w(e)$.

We now redistribute these charges as follows: (for distinct $u_1, u_2, u_3$) if $e = (u_1, v)$ gets charged by $o = (u_2, v)$, and $e$ subsequently gets replaced by $e' = (u_3, v)$, we transfer the charge from $e$ to $e'$. Note that we maintain the property that the amount charged by $o$ to any edge $e$ is at most $2w(e)$ because $w(e') \ge w(e)$. What this redistribution of charges achieves is that now every edge in a trail of drops is only charged by one edge in opt. Survivors can, however, be charged by two edges in opt. We charge $w(\text{opt})$ to the survivors and their trails of drops, and hence

$$w(\text{opt}) \le \sum_{e \in S} \left(2w(T(e)) + 4w(e)\right).$$

Because $w(T(e)) \le w(e)$,

$$\sum_{e \in S} \left(2w(T(e)) + 4w(e)\right) \le 6w(S)$$

and the theorem follows. □

The condition on line 5 of Algorithm 13.4 can be generalized to $w(e) > (1 + \gamma)w(C)$, where $C = \{e' \mid e' \in M$ and $e'$ and $e$ share an end point$\}$. By setting $\gamma$ appropriately and repeating Algorithm 13.4 until the improvement yielded falls below some threshold, a matching can be constructed [35] in $O_\epsilon(1)$ passes whose weight is at least $\frac{1}{2+\epsilon}$ of the weight of the maximum matching.

Another improvement for weighted matching was made recently by Zelke [46]. Zelke's algorithm is also based on Algorithm 13.4 but incorporates some improvements. In particular, the algorithm stores a few edges that have been in $M$ in the past but were replaced later, to potentially reinsert them into $M$ in the future. Such edges are called "shadow edges" in [46]. With shadow edges, when a new edge arrives in the stream, besides the (two) edges that share endpoints with the new edge, a few other edges (edges in $M$ as well as shadow edges) in the vicinity of the new edge can be examined to find potential augmenting paths. This improves the approximation ratio from 1/5.82 (by an algorithm in [35]) to 1/5.58.

5 Graph Distance

We consider the shortest-path distance in a graph. The shortest path between two vertices in a graph is the path that has the smallest number of edges (for an unweighted graph) or the smallest sum of the weights of the path edges (for a weighted graph). There may be more than one such shortest path.

A structure often used in approximating graph distance is the graph spanner [39, 11, 18]. An undirected graph $G = (V, E)$ induces a metric space $\mathcal{U}$ in which the vertex set $V$ serves as the set of points, and the shortest-path distances serve as the distances between the points. The graph spanner $G' = (V, H)$, $H \subseteq E$, is a sparse skeleton of the graph $G$ whose induced metric space $\mathcal{U}'$ is a close approximation of the metric space $\mathcal{U}$ of the graph $G$. That is, the distance between two vertices in $G'$ is not far from the distance between the same two vertices in $G$. For example, a subgraph $G' = (V, H)$, $H \subseteq E$, is a (multiplicative) $t$-spanner of the graph $G$ if for every pair of vertices $u, v \in V$, $dist_{G'}(u, v) \le t \cdot dist_G(u, v)$ (where $dist_G(u, v)$ stands for the distance between the vertices $u$ and $v$ in the graph $G$). The stretch factor of a spanner is the parameter (or parameters) that determines how closely the spanner approximates the distances in the original graph, e.g., in the case of a $t$-spanner, the parameter $t$.

Clearly, if a spanner can be constructed for a massive graph, one can approximate the node distances in the graph using the spanner. Because the spanner is much smaller than the original graph, it can often be stored in main memory. In fact, an early application of spanners was to maintain a succinct representation of routing information [39, 11]. Instead of the original network graph, spanners are passed and stored by the routers for calculating the routing paths. Besides distances, the diameter of a graph can be approximated using the spanner diameter.

In [22], Feigenbaum et al. gave a simple streaming algorithm for spanner construction by adapting the technique of [4]. It displays a certain connection between the girth of a graph and the spanner. (The girth of a graph is the length of the shortest cycle in the graph.) However, in the worst case, the algorithm needs more than $O(n)$ time to process an edge. Such a processing time is prohibitively high for the streaming model.

For an unweighted graph, the algorithm of [22] constructs a $(\log n / \log\log n)$-spanner $S$ in one pass. Because a graph whose girth is larger than $k$ has at most $\lceil n^{1 + 2/(k-2)} \rceil$ edges [7, 17, 2], the algorithm constructs $S$ by adding an edge in the stream to $S$ if the edge does not cause a cycle of length less than $\log n / \log\log n$ in the $S$ constructed so far. Otherwise, the edge is ignored. Note that for each ignored edge, there is a path $P$ of length at most $\log n / \log\log n$ in $S$ that connects the two endpoints of this edge. Any shortest path in the original graph that uses this edge can be replaced by a path in $S$ that uses $P$. Therefore, $S$ is a $(\log n / \log\log n)$-spanner of the original graph.
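The construction can be sketched with a depth-bounded breadth-first search over the spanner built so far. In this illustration the stretch is a free parameter `t` rather than $\log n / \log\log n$, and an edge is ignored exactly when the current spanner already connects its endpoints by a path of at most `t` edges; that precise threshold, like the function names, is our assumption. The search itself shows why the per-edge processing time can be high.

```python
from collections import deque

def within_distance(adj, src, dst, limit):
    """True if adj contains a path from src to dst with at most `limit` edges."""
    if src == dst:
        return True
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        x, d = frontier.popleft()
        if d == limit:
            continue  # do not expand beyond the distance bound
        for y in adj.get(x, ()):
            if y == dst:
                return True
            if y not in seen:
                seen.add(y)
                frontier.append((y, d + 1))
    return False

def stream_spanner(edge_stream, t):
    """Keep an edge only if its endpoints are more than t apart in the spanner so far."""
    adj, S = {}, []
    for u, v in edge_stream:
        if not within_distance(adj, u, v, t):
            S.append((u, v))
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return S

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(stream_spanner(cycle, 3))  # [(0, 1), (1, 2), (2, 3)]: the closing edge is ignored
```

Every ignored edge has a replacement path of at most `t` edges in the result, so the output is a `t`-spanner of the unweighted input graph.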

For a weighted graph, however, the construction in [4] requires sorting the edges according to their weights, which is difficult in the streaming model. Instead of sorting, a geometric grouping technique is used in [22] to extend the spanner construction for unweighted graphs to a construction for weighted graphs. This technique is similar to the one used in [12]. Let $\omega_{min}$ be the minimum weight and $\omega_{max}$ be the maximum weight. We divide the range $[\omega_{min}, \omega_{max}]$ into intervals of the form $[(1 + \epsilon)^i \omega_{min}, (1 + \epsilon)^{i+1} \omega_{min})$ and round all the weights in the interval $[(1 + \epsilon)^i \omega_{min}, (1 + \epsilon)^{i+1} \omega_{min})$ down to $(1 + \epsilon)^i \omega_{min}$. For each induced graph $G_i = (V, E_i)$, where $E_i$ is the set of edges in $E$ whose weight is in the interval $[(1 + \epsilon)^i \omega_{min}, (1 + \epsilon)^{i+1} \omega_{min})$, a spanner can be constructed in parallel using the above construction for unweighted graphs. The union of the spanners for all the $G_i$, $i \in \{0, 1, \ldots, \log_{(1+\epsilon)} \frac{\omega_{max}}{\omega_{min}} - 1\}$, forms a spanner for the graph $G$. Note that this can be done without prior knowledge of $\omega_{min}$ and $\omega_{max}$. The goal is to break the range $[\omega_{min}, \omega_{max}]$ into a small number of intervals. Given any value $\omega \in [\omega_{min}, \omega_{max}]$, we can use the set of intervals of the form $[(1 + \epsilon)^i \omega, (1 + \epsilon)^{i+1} \omega)$ and $[\frac{\omega}{(1 + \epsilon)^{i+1}}, \frac{\omega}{(1 + \epsilon)^i})$. Therefore, we can determine the intervals without prior knowledge of $\omega_{min}$ and $\omega_{max}$.
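The geometric grouping amounts to computing, for each weight, the index of its interval relative to a reference weight $\omega$. The sketch below uses a single signed index, with negative values covering the intervals below $\omega$; the function names are hypothetical, and taking the first weight seen in the stream as $\omega$ is one possible choice rather than something stated in [22].

```python
import math

def weight_class(w, omega, eps):
    """Index i with (1 + eps)**i * omega <= w < (1 + eps)**(i + 1) * omega."""
    return math.floor(math.log(w / omega) / math.log(1 + eps))

def rounded_weight(w, omega, eps):
    """Round w down to the lower end of its interval."""
    return (1 + eps) ** weight_class(w, omega, eps) * omega

print(weight_class(5.0, 1.0, 0.5), rounded_weight(5.0, 1.0, 0.5))  # 3 3.375
print(weight_class(0.5, 1.0, 0.5))                                 # -2
```

Edges with the same index form one graph $G_i$, on which the unweighted construction can be run. Because the index is computed in floating point, weights lying exactly on an interval boundary may fall on either side; a careful implementation would handle those cases explicitly.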

5.1 Distance Approximation using Multiple Passes

Elkin and Zhang gave a multiple-pass streaming spanner construction in [21]. This algorithm builds an additive spanner. A subgraph $G' = (V, H)$
