Lecture Notes for Chapter 21: Data Structures for Disjoint Sets
Analysis:
• Since MAKE-SET counts toward the total # of operations, m ≥ n.
• Can have at most n − 1 UNION operations, since after n − 1 UNIONs, only 1 set remains.
• Assume that the first n operations are MAKE-SET (helpful for analysis, usually not really necessary).
Application: dynamic connected components.
For a graph G = (V, E), vertices u, v are in same connected component if and
only if there’s a path between them
• Connected components partition vertices into equivalence classes
CONNECTED-COMPONENTS(V, E)
  for each vertex v ∈ V
    do MAKE-SET(v)
  for each edge (u, v) ∈ E
    do if FIND-SET(u) ≠ FIND-SET(v)
         then UNION(u, v)

SAME-COMPONENT(u, v)
  if FIND-SET(u) = FIND-SET(v)
    then return TRUE
    else return FALSE
Note: If actually implementing connected components,
• each vertex needs a handle to its object in the disjoint-set data structure,
• each object in the disjoint-set data structure needs a handle to its vertex
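For concreteness, here is a small Python sketch that follows the CONNECTED-COMPONENTS and SAME-COMPONENT pseudocode above. The class and function names are illustrative assumptions (the disjoint-set forest happens to use the union-by-rank and path-compression heuristics described later in these notes), and the structure is keyed directly by the vertices, so the handle bookkeeping in the note above is implicit.

class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""
    def __init__(self):
        self.parent = {}
        self.rank = {}

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find_set(self.parent[x])   # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:                     # union by rank
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

def connected_components(vertices, edges):
    ds = DisjointSet()
    for v in vertices:
        ds.make_set(v)
    for u, v in edges:
        if ds.find_set(u) != ds.find_set(v):
            ds.union(u, v)
    return ds

def same_component(ds, u, v):
    return ds.find_set(u) == ds.find_set(v)

# Example: two components {a, b, c} and {d, e}, plus the isolated vertex f.
ds = connected_components("abcdef", [("a", "b"), ("b", "c"), ("d", "e")])
print(same_component(ds, "a", "c"))   # True
print(same_component(ds, "a", "d"))   # False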
Linked list representation
• Each set is a singly linked list
• Each list node has fields for
• the set member
• pointer to the representative
• next
• List has head (pointer to representative) and tail.
MAKE-SET: create a singleton list
FIND-SET: return pointer to representative
UNION: a couple of ways to do it
1. UNION(x, y): append x's list onto the end of y's list. Use y's tail pointer to find the end.
• Need to update the representative pointer for every node on x’s list.
• If appending a large list onto a small list, it can take a while
Operation              # objects updated
UNION(x1, x2)          1
UNION(x2, x3)          2
UNION(x3, x4)          3
...                    ...
UNION(x_{n-1}, x_n)    n − 1

After n MAKE-SETs, these n − 1 UNIONs perform Θ(n²) work over Θ(n) operations.
Amortized time per operation = Θ(n).
2. Weighted-union heuristic: Always append the smaller list to the larger list.
A single union can still take Θ(n) time, e.g., if both sets have n/2 members.
Theorem
With weighted union, a sequence of m operations on n elements takes
O (m + n lg n) time.
Sketch of proof: Each MAKE-SET and FIND-SET still takes O(1). How many times can each object's representative pointer be updated? It must be in the smaller set each time:

times updated    size of resulting set
1                ≥ 2
2                ≥ 4
3                ≥ 8
...              ...
k                ≥ 2^k
...              ...
lg n             ≥ n

Therefore, each object's representative pointer is updated at most lg n times, for O(n lg n) total pointer updates; adding the O(m) cost of the MAKE-SET and FIND-SET operations gives O(m + n lg n). (theorem)
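Here is a minimal Python sketch of the linked-list representation with the weighted-union heuristic just analyzed. The names are illustrative assumptions: each set object stores head, tail, and size, and every list node keeps a back-pointer to its set object.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.set = None                  # back-pointer to the set object

class ListSet:
    def __init__(self, node):
        self.head = self.tail = node
        self.size = 1
        node.set = self

def make_set(value):
    node = Node(value)
    ListSet(node)
    return node

def find_set(node):
    return node.set.head                 # the representative is the first list element

def union(x, y):
    a, b = x.set, y.set
    if a is b:
        return
    if a.size < b.size:                  # weighted union: append the smaller list
        a, b = b, a
    a.tail.next = b.head                 # splice b's list onto the end of a's
    a.tail = b.tail
    a.size += b.size
    node = b.head
    while node is not None:              # update representative pointers: Θ(size of b) work
        node.set = a
        node = node.next

# Example: three singletons merged into one set.
x, y, z = make_set("x"), make_set("y"), make_set("z")
union(x, y)
union(z, x)
print(find_set(z) is find_set(y))        # True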
Disjoint-set forest

• 1 tree per set. Root is representative.
• Each node points only to its parent.
[Figure: disjoint-set trees on the elements b, c, d, e, f, g, before and after UNION(e, g).]
• MAKE-SET: make a single-node tree
• UNION: make one root a child of the other
• FIND-SET: follow pointers to the root
Not so good—could get a linear chain of nodes
Great heuristics
• Union by rank: make the root of the smaller tree (fewer nodes) a child of the
root of the larger tree
• Don't actually use size.
• Use rank, which is an upper bound on the height of the node.
• Make the root with the smaller rank into a child of the root with the larger rank.
• Path compression: Find path = nodes visited during FIND-SET on the trip to the root. Make all nodes on the find path direct children of the root.
FIND-SET makes a pass up to find the root, and a pass down as the recursion unwinds to update each node on the find path to point directly to the root.
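The following Python sketch shows a disjoint-set forest with both heuristics; the recursive find_set mirrors the two passes just described, and union is FIND-SET plus LINK, as in the book. The function names and the dictionary-based storage are illustrative assumptions.

parent = {}
rank = {}

def make_set(x):
    parent[x] = x
    rank[x] = 0

def find_set(x):
    # Pass up to the root; as the recursion unwinds, make every node on the
    # find path point directly to the root (path compression).
    if parent[x] != x:
        parent[x] = find_set(parent[x])
    return parent[x]

def link(rx, ry):
    if rx == ry:
        return
    if rank[rx] < rank[ry]:              # union by rank: smaller-rank root becomes the child
        parent[rx] = ry
    else:
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

def union(x, y):
    link(find_set(x), find_set(y))

# Example
for v in "abcd":
    make_set(v)
union("a", "b")
union("c", "d")
union("a", "d")
print(find_set("b") == find_set("c"))    # True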
Solutions for Chapter 21:
Data Structures for Disjoint Sets
Solution to Exercise 21.2-3
We want to show that we can assign O(1) charges to MAKE-SET and FIND-SET and an O(lg n) charge to UNION such that the charges for a sequence of these operations are enough to cover the cost of the sequence—O(m + n lg n), according to the theorem. When talking about the charge for each kind of operation, it is helpful to also be able to talk about the number of each kind of operation.
Consider the usual sequence of m MAKE-SET, UNION, and FIND-SET operations, n of which are MAKE-SET operations, and let l < n be the number of UNION operations. (Recall the discussion in Section 21.1 about there being at most n − 1 UNION operations.) Then there are n MAKE-SET operations, l UNION operations, and m − n − l FIND-SET operations.
The theorem didn't separately name the number l of UNIONs; rather, it bounded the number by n. If you go through the proof of the theorem with l UNIONs, you get the time bound O(m − l + l lg l) = O(m + l lg l) for the sequence of operations. That is, the actual time taken by the sequence of operations is at most c(m + l lg l), for some constant c.
Thus, we want to assign operation charges such that
(MAKE-SET charge) · n
+ (FIND-SET charge) · (m − n − l)
+ (UNION charge) · l
≥ c(m + l lg l) ,
so that the amortized costs give an upper bound on the actual costs.
The following assignments work, where c′ is some constant ≥ c:
• MAKE-SET: c′
• FIND-SET: c′
• UNION: c′ lg n
Solution to Exercise 21.2-5
Let's call the two lists A and B, and suppose that the representative of the new list will be the representative of A. Rather than appending B to the end of A, instead splice B into A right after the first element of A. We have to traverse B to update representative pointers anyway, so we can just make the last element of B point to the second element of A.
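A small self-contained Python sketch of this splice (the names are illustrative assumptions, and the size field used by the weighted-union heuristic is omitted for brevity). Each node keeps a pointer to the representative, which is the first node of its list, so no tail pointer is needed; the single pass over B that fixes representative pointers also finds B's last node.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.rep = None                  # representative = first node of the list

def make_set(value):
    node = Node(value)
    node.rep = node
    return node

def find_set(node):
    return node.rep

def union_splice(x, y):
    a, b = x.rep, y.rep                  # heads of lists A and B
    if a is b:
        return
    node = b
    while node is not None:              # single pass over B
        node.rep = a                     # update representative pointers
        last = node
        node = node.next
    last.next = a.next                   # B's last node -> A's second node
    a.next = b                           # A's first node -> B's first node

# Example: four singletons merged into one set.
s = [make_set(i) for i in range(4)]
union_splice(s[0], s[1])
union_splice(s[2], s[3])
union_splice(s[0], s[2])
print(len({find_set(n) for n in s}))     # 1: all four elements are in one set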
Solution to Exercise 21.3-3
You need to find a sequence of m operations on n elements that takes Ω(m lg n) time. Start with n MAKE-SETs to create singleton sets {x1}, {x2}, ..., {xn}. Next perform n − 1 UNION operations that combine the sets in pairs, then combine the resulting sets in pairs, and so on, to create a single set whose tree has depth lg n. Finally, perform the remaining m − 2n + 1 operations as FIND-SETs on the deepest element in the tree. With union by rank alone (no path compression), each of these FIND-SETs takes Θ(lg n) time; for m ≥ 3n, that is at least m/3 FIND-SET operations, for a total of Ω(m lg n) time.
Solution to Exercise 21.3-4

We claim that any sequence of m MAKE-SET, LINK, and FIND-SET operations, where all of the LINK operations occur before any of the FIND-SET operations, runs in O(m) time. The key observation is that once a node x appears on a find path, x will be either a root or a child of a root at all times thereafter.
We use the accounting method to obtain the O(m) time bound. We charge a MAKE-SET operation two dollars. One dollar pays for the MAKE-SET, and one dollar remains on the node x that is created. The latter pays for the first time that x appears on a find path and is turned into a child of a root.
We charge one dollar for a LINK operation. This dollar pays for the actual linking of one node to another.
We charge one dollar for a FIND-SET. This dollar pays for visiting the root and its child, and for the path compression of these two nodes, during the FIND-SET. All other nodes on the find path use their stored dollar to pay for their visitation and path compression. As mentioned, after the FIND-SET, all nodes on the find path become children of a root (except for the root itself), and so whenever they are visited during a subsequent FIND-SET, the FIND-SET operation itself will pay for them.
Since we charge each operation either one or two dollars, a sequence of m operations is charged at most 2m dollars, and so the total time is O(m).
Observe that nothing in the above argument requires union by rank. Therefore, we get an O(m) time bound regardless of whether we use union by rank.
Solution to Exercise 21.4-4
Clearly, each MAKE-SET and LINK operation takes O(1) time. Because the rank of a node is an upper bound on its height, each find path has length O(lg n), which in turn implies that each FIND-SET takes O(lg n) time. Thus, any sequence of m MAKE-SET, LINK, and FIND-SET operations on n elements takes O(m lg n) time. It is easy to prove an analogue of Lemma 21.7 to show that if we convert a sequence of m′ MAKE-SET, UNION, and FIND-SET operations into a sequence of m MAKE-SET, LINK, and FIND-SET operations that take O(m lg n) time, then the sequence of m′ MAKE-SET, UNION, and FIND-SET operations takes O(m′ lg n) time.
Solution to Exercise 21.4-5
Professor Dante is mistaken. Take the following scenario. Let n = 16, and make 16 separate singleton sets using MAKE-SET. Then do 8 UNION operations to link the sets into 8 pairs, where each pair has a root with rank 1 and a child with rank 0. Now do 4 UNIONs to link pairs of these trees, so that there are 4 trees, each with a root of rank 2, children of the root of ranks 1 and 0, and a node of rank 0 that is the child of the rank-1 node. Now link pairs of these trees together, so that there are two resulting trees, each with a root of rank 3 and each containing a path from a leaf to the root with ranks 0, 1, and 3. Finally, link these two trees together, so that
there is a path from a leaf to the root with ranks 0, 1, 3, and 4. Let x and y be the nodes on this path with ranks 1 and 3, respectively. Since A_1(1) = 3, level(x) = 1, and since A_0(3) = 4, level(y) = 0. Yet y follows x on the find path.
Solution to Exercise 21.4-6
First, α′(2^2047 − 1) = min {k : A_k(1) ≥ 2047} = 3, and 2^2047 − 1 ≫ 10^80.
Second, we need that 0 ≤ level(x) ≤ α′(n) for all nonroots x with rank[x] ≥ 1. With this definition of α′(n), we have A_{α′(n)}(rank[x]) ≥ A_{α′(n)}(1) ≥ lg(n + 1) > lg n ≥ rank[p[x]]. The rest of the proof goes through with α′(n) replacing α(n).
Solution to Problem 21-1
a. For the input sequence
4, 8, E, 3, E, 9, 2, 6, E, E, E, 1, 7, E, 5 ,
the values in the extracted array would be 4, 3, 2, 6, 8, 1.
The following table shows the situation after the ith iteration of the for loop when we use OFF-LINE-MINIMUM on the same input. (For this input, n = 9 and m—the number of extractions—is 6.) Initially, K1 = {4, 8}, K2 = {3}, K3 = {9, 2, 6}, K4 = ∅, K5 = ∅, K6 = {1, 7}, K7 = {5}.

i   set containing i   effect of iteration i
1   K6                 extracted[6] ← 1,  K7 ← K6 ∪ K7
2   K3                 extracted[3] ← 2,  K4 ← K3 ∪ K4
3   K2                 extracted[2] ← 3,  K4 ← K2 ∪ K4
4   K1                 extracted[1] ← 4,  K4 ← K1 ∪ K4
5   K7                 no change (j = m + 1)
6   K4                 extracted[4] ← 6,  K5 ← K4 ∪ K5
7   K7                 no change (j = m + 1)
8   K5                 extracted[5] ← 8,  K7 ← K5 ∪ K7
9   K7                 no change (j = m + 1)
b. We want to show that the array extracted returned by OFF-LINE-MINIMUM is correct, meaning that for j = 1, 2, ..., m, extracted[j] is the key returned by the jth EXTRACT-MIN call.
We start with n INSERT operations and m EXTRACT-MIN operations. The smallest of all the elements will be extracted in the first EXTRACT-MIN after its insertion. So we find j such that the minimum element is in K_j, and put the minimum element in extracted[j], which corresponds to the EXTRACT-MIN after the minimum element's insertion.
Now we reduce to a similar problem with n − 1 INSERT operations and m − 1 EXTRACT-MIN operations in the following way: the INSERT operations are the same but without the insertion of the smallest element that was extracted, and the EXTRACT-MIN operations are the same but without the extraction that extracted the smallest element.
Conceptually, we unite I_j and I_{j+1}, removing the extraction between them and also removing the insertion of the minimum element from I_j ∪ I_{j+1}. Uniting I_j and I_{j+1} is accomplished by line 6. We need to determine which set is K_l, rather than just using K_{j+1} unconditionally, because K_{j+1} may have been destroyed when it was united into a higher-indexed set by a previous execution of line 6.
Because we process extractions in increasing order of the minimum value found, the remaining iterations of the for loop correspond to solving the reduced problem.
There are two other points worth making. First, if the smallest remaining element had been inserted after the last EXTRACT-MIN (i.e., j = m + 1), then no changes occur, because this element is not extracted. Second, there may be smaller elements within the K_j sets than the one we are currently looking for. These elements do not affect the result, because they correspond to elements that were already extracted, and their effect on the algorithm's execution is over.
c. To implement this algorithm, we place each element in a disjoint-set forest. Each root has a pointer to its K_i set, and each K_i set has a pointer to the root of the tree representing it. All the valid sets K_i are in a linked list.
Before OFF-LINE-MINIMUM, there is initialization that builds the initial sets K_i according to the I_i sequences.
• Line 2 (“determine j such that i ∈ K_j”) turns into j ← FIND-SET(i).
• Line 5 (“let l be the smallest value greater than j for which set K_l exists”) turns into K_l ← next[K_j].
• Line 6 (“K_l ← K_j ∪ K_l, destroying K_j”) turns into l ← LINK(j, l), and remove K_j from the linked list.
To analyze the running time, we note that there are n elements and that we have
the following disjoint-set operations:
• n MAKE-SET operations
• at most n − 1 UNION operations before starting
• n FIND-SET operations
• at most n LINK operations
Thus the number m of overall operations is O(n). The total running time is O(m α(n)) = O(n α(n)).
[The “tight bound” wording that this question uses does not refer to an “asymptotically tight” bound. Instead, the question is merely asking for a bound that is not too “loose.”]
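The following Python sketch puts part (c) together. The input format (a list of keys interleaved with the string 'E' for each EXTRACT-MIN) and all of the names are illustrative assumptions; the line numbers in the comments refer to the OFF-LINE-MINIMUM pseudocode in the problem statement.

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n + 1))     # elements are the keys 1..n
        self.rank = [0] * (n + 1)

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])   # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        if self.rank[rx] < self.rank[ry]:                # union by rank
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return rx

def off_line_minimum(sequence, n):
    # Split the operation sequence into the insertion blocks I_1, ..., I_{m+1}.
    blocks = [[]]
    for op in sequence:
        if op == 'E':
            blocks.append([])
        else:
            blocks[-1].append(op)
    m = len(blocks) - 1

    dsu = DisjointSet(n)
    root_of = [None] * (m + 2)       # root_of[j]: some element of K_j (None if empty)
    index_of = {}                    # current tree root -> index j of its K_j
    for j in range(1, m + 2):
        for key in blocks[j - 1]:
            root_of[j] = key if root_of[j] is None else dsu.union(root_of[j], key)
        if root_of[j] is not None:
            index_of[dsu.find(root_of[j])] = j

    # Doubly linked list over the K-set indices that still exist.
    nxt = {j: j + 1 for j in range(0, m + 2)}
    prv = {j: j - 1 for j in range(1, m + 2)}

    extracted = [None] * (m + 1)     # extracted[1..m]
    for i in range(1, n + 1):
        j = index_of[dsu.find(i)]    # line 2: determine j such that i is in K_j
        if j == m + 1:
            continue                 # i is inserted but never extracted
        extracted[j] = i
        l = nxt[j]                   # line 5: smallest existing index l > j
        # line 6: K_l <- K_j union K_l, destroying K_j
        if root_of[l] is None:
            new_root = dsu.find(root_of[j])
        else:
            new_root = dsu.union(root_of[j], root_of[l])
        root_of[l] = new_root
        index_of[new_root] = l
        nxt[prv[j]], prv[l] = l, prv[j]      # unlink K_j from the linked list
    return extracted[1:]

# The input from part (a):
seq = [4, 8, 'E', 3, 'E', 9, 2, 6, 'E', 'E', 'E', 1, 7, 'E', 5]
print(off_line_minimum(seq, 9))      # [4, 3, 2, 6, 8, 1]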
Solution to Problem 21-2
a. Denote the number of nodes by n, and let n = (m + 1)/3, so that m = 3n − 1. First, perform the n operations MAKE-TREE(v_1), MAKE-TREE(v_2), ..., MAKE-TREE(v_n). Then perform the sequence of n − 1 GRAFT operations GRAFT(v_1, v_2), GRAFT(v_2, v_3), ..., GRAFT(v_{n−1}, v_n); this sequence produces a single disjoint-set tree that is a linear chain of n nodes with v_n at the root and v_1 as the only leaf. Then perform FIND-DEPTH(v_1) repeatedly, n times.
The total number of operations is n + (n − 1) + n = 3n − 1 = m.
Each MAKE-TREE and GRAFT operation takes O(1) time. Each FIND-DEPTH operation has to follow an n-node find path, and so each of the n FIND-DEPTH operations takes Θ(n) time. The total time is n · Θ(n) + (2n − 1) · O(1) = Θ(n²) = Θ(m²).
b. MAKE-TREE is like MAKE-SET, except that it also sets the d value to 0:

MAKE-TREE(v)
  p[v] ← v
  rank[v] ← 0
  d[v] ← 0

It is correct to set d[v] to 0, because the depth of the node in the single-node disjoint-set tree is 0, and the sum of the depths on the find path for v consists of just d[v] = 0.
c. FIND-DEPTH uses a helper procedure, FIND-ROOT, which performs path compression while also updating pseudodistances. FIND-ROOT differs from the usual FIND-SET with path compression in three ways. First, when v is either a root or a child of a root (v is a root or a child of a root if and only if p[v] = p[p[v]]) in the disjoint-set forest, we don't have to recurse; instead, we just return p[v]. Second, when we do recurse, we save the pointer p[v] into a new variable y. Third, when we recurse, we update d[v] by adding into it the d values of all nodes on the find path that are no longer proper
ancestors of v after path compression; these nodes are precisely the proper ancestors of v other than the root. Thus, as long as v does not start out the FIND-ROOT call as either the root or a child of the root, we add d[y] into d[v]. Note that d[y] has been updated prior to updating d[v], if y is also neither the root nor a child of the root.
FIND-DEPTH first calls FIND-ROOT to perform path compression and update pseudodistances. Afterward, the find path from v consists of either just v (if v is a root) or just v and p[v] (if v is not a root, in which case it is a child of the root after path compression). In the former case, the depth of v is just d[v], and in the latter case, the depth is d[v] + d[p[v]].
d. Our procedure for GRAFT is a combination of UNION and LINK. We first call FIND-ROOT on r and on v, to perform path compression and update pseudodistances on the find paths from r and v. We then call FIND-DEPTH(v), saving the depth of v in the variable z. (Since we have just compressed v's find path, this call of FIND-DEPTH takes O(1) time.) Next, we emulate the action of LINK, by making the root (of r's tree or of v's tree) of smaller rank a child of the root of larger rank; in case of a tie, we make r's root a child of v's root.
If v has the smaller rank, then all nodes in r's tree will have their depths increased by the depth of v plus 1 (because r is to become a child of v). Altering the pseudodistance of the root of a disjoint-set tree changes the computed depth of all nodes in that tree, and so adding z + 1 to d[r] accomplishes this update for all nodes in r's disjoint-set tree. Since v will become a child of r in the disjoint-set forest, we have just increased the computed depth of all nodes in the disjoint-set tree rooted at v by d[r]. These computed depths should not have changed, however. Thus, we subtract off d[r] from d[v], so that the sum d[v] + d[r] after making v a child of r equals d[v] before making v a child of r.
On the other hand, if r has the smaller rank, or if the ranks are equal, then r becomes a child of v in the disjoint-set forest. In this case, v remains a root in the disjoint-set forest afterward, and we can leave d[v] alone. We have to update d[r], however, so that after making r a child of v, the depth of each node in r's disjoint-set tree is increased by z + 1. We add z + 1 to d[r], but we also subtract out d[v], since we have just made r a child of v. Finally, if the ranks of r and v are equal, we increment the rank of v, as is done in the LINK procedure.
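A Python sketch of the scheme described in parts (b) through (d). The class and method names are illustrative assumptions; the update rules follow the prose above for FIND-ROOT, FIND-DEPTH, and GRAFT, and the small check at the end rebuilds the chain from part (a) with n = 5.

class DepthForest:
    def __init__(self):
        self.p = {}          # parent in the disjoint-set forest
        self.rank = {}       # rank (upper bound on height in the forest)
        self.d = {}          # pseudodistance

    def make_tree(self, v):
        self.p[v] = v
        self.rank[v] = 0
        self.d[v] = 0

    def find_root(self, v):
        # Path compression that folds the pseudodistances of bypassed ancestors
        # into d[v], so the depth of v is still the sum of d values on its
        # (now short) find path.
        p, d = self.p, self.d
        if p[v] != p[p[v]]:              # v is neither a root nor a child of a root
            y = p[v]
            p[v] = self.find_root(y)     # the recursion updates d[y] first
            d[v] += d[y]
        return p[v]

    def find_depth(self, v):
        self.find_root(v)                # compress v's find path first
        if self.p[v] == v:
            return self.d[v]
        return self.d[v] + self.d[self.p[v]]

    def graft(self, r, v):
        # Make r (assumed to be the root of its represented tree) a child of v;
        # assumes r and v are currently in different trees.
        rr = self.find_root(r)
        rv = self.find_root(v)
        z = self.find_depth(v)           # O(1) now: v's find path is compressed
        if self.rank[rr] > self.rank[rv]:
            # v has the smaller rank: v's forest root becomes a child of r's.
            self.p[rv] = rr
            self.d[rr] += z + 1          # every node in r's tree gets deeper by z + 1
            self.d[rv] -= self.d[rr]     # cancel the d[rr] now added to rv's paths
        else:
            # r's forest root becomes a child of v's (ties go this way too).
            self.p[rr] = rv
            self.d[rr] += z + 1 - self.d[rv]
            if self.rank[rr] == self.rank[rv]:
                self.rank[rv] += 1

# Check: build the chain v_1 <- v_2 <- ... <- v_5 from part (a).
f = DepthForest()
for i in range(1, 6):
    f.make_tree(i)
for i in range(1, 5):
    f.graft(i, i + 1)                    # GRAFT(v_i, v_{i+1})
print([f.find_depth(i) for i in range(1, 6)])   # [4, 3, 2, 1, 0]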
e. The asymptotic running times of MAKE-TREE, FIND-DEPTH, and GRAFT are equivalent to those of MAKE-SET, FIND-SET, and UNION, respectively. Thus, a sequence of m operations, n of which are MAKE-TREE operations, takes Θ(m α(n)) time in the worst case.
Lecture Notes for Chapter 22:
Elementary Graph Algorithms
Graph representation
Given graph G = (V, E).
• May be either directed or undirected
• Two common ways to represent a graph for algorithms:
1 Adjacency lists
2 Adjacency matrix
When expressing the running time of an algorithm, it's often in terms of both |V| and |E|. In asymptotic notation—and only in asymptotic notation—we'll drop the cardinality. Example: O(V + E).
[The introduction to Part VI talks more about this.]
Adjacency lists
Array Adj of |V | lists, one per vertex.
Vertex u’s list has all vertices v such that (u, v) ∈ E (Works for both directed and
undirected graphs.)
Example: For an undirected graph:
[Figure: adjacency lists Adj for an example undirected graph on vertices 1–5.]

Space: Θ(V + E).
Time: to list all vertices adjacent to u: Θ(degree(u)).
Time: to determine if (u, v) ∈ E: O(degree(u)).
Example: For a directed graph:
[Figure: adjacency lists Adj for an example directed graph on vertices 1–4.]
Same asymptotic space and time
Adjacency matrix

|V| × |V| matrix A = (a_ij), where a_ij = 1 if (i, j) ∈ E and 0 otherwise.

[Figure: adjacency matrices for the undirected and directed example graphs above.]

Space: Θ(V²).
Time: to list all vertices adjacent to u: Θ(V).
Time: to determine if (u, v) ∈ E: Θ(1).
Can store weights instead of bits for weighted graph
We’ll use both representations in these lecture notes
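As a concrete (and purely illustrative) Python sketch, here are the two representations built for the same small graph, with the costs noted above shown in the comments:

def adjacency_list(n, edges, directed=False):
    adj = {u: [] for u in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj                           # space Θ(V + E)

def adjacency_matrix(n, edges, directed=False):
    a = [[0] * (n + 1) for _ in range(n + 1)]   # 1-indexed; space Θ(V²)
    for u, v in edges:
        a[u][v] = 1
        if not directed:
            a[v][u] = 1
    return a

edges = [(1, 2), (1, 5), (2, 3), (2, 4), (3, 4), (4, 5)]
adj = adjacency_list(5, edges)
a = adjacency_matrix(5, edges)
print(adj[2])                            # [1, 3, 4]: neighbors of 2 in Θ(degree(2)) time
print(a[2][4] == 1)                      # True: is (2, 4) an edge?  Θ(1) time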
Breadth-first search
Input: Graph G = (V, E), either directed or undirected, and source vertex s ∈ V.
Output: d[v] = distance (smallest # of edges) from s to v, for all v ∈ V. In book, also π[v] = u such that (u, v) is last edge on shortest path s ⇝ v.
• u is v’s predecessor.
• set of edges {(π[v], v) : v ≠ s} forms a tree.
Later, we'll see a generalization of breadth-first search, with edge weights. For now, we'll keep it simple.
• Compute only d[v], not π[v]. [See book for π[v].]
• Omitting colors of vertices. [Used in book to reason about the algorithm. We'll skip them here.]
Idea: Send a wave out from s.
• First hits all vertices 1 edge from s.
• From there, hits all vertices 2 edges from s.
• Etc
Use FIFO queue Q to maintain wavefront.
• v ∈ Q if and only if wave has hit v but has not come out of v yet.
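The BFS pseudocode itself is in the book; here is a hedged Python sketch that computes only the d values, using a FIFO queue exactly as described (the function name and input format are illustrative assumptions):

from collections import deque

def bfs_distances(adj, s):
    """adj: dict mapping each vertex to a list of its neighbors; s: source.
    Returns d[v] = smallest number of edges from s to v (infinity if unreachable)."""
    INF = float('inf')
    d = {v: INF for v in adj}
    d[s] = 0
    q = deque([s])                       # FIFO queue holding the wavefront
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] == INF:              # first time the wave hits v
                d[v] = d[u] + 1
                q.append(v)
    return d

# Example on a small undirected graph given by adjacency lists.
adj = {1: [2, 5], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3, 5], 5: [1, 4]}
print(bfs_distances(adj, 1))             # {1: 0, 2: 1, 3: 2, 4: 2, 5: 1}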
Can show that Q consists of vertices with d values
  i, i, ..., i, i + 1, i + 1, ..., i + 1
• Only 1 or 2 values.
• If 2, differ by 1 and all smallest are first.
Since each vertex gets a finite d value at most once, values assigned to vertices are monotonically increasing over time.
Actual proof of correctness is a bit trickier. See book.
BFS may not reach all vertices.
Time = O(V + E).
• O(V ) because every vertex enqueued at most once.
• O(E) because every vertex dequeued at most once and we examine (u, v) only when u is dequeued. Therefore, every edge examined at most once if directed, at most twice if undirected.
Depth-first search
Input: G = (V, E), directed or undirected. No source vertex given!
Output: 2 timestamps on each vertex:
• d[v] = discovery time
• f[v] = finishing time
These will be useful for other algorithms later on
Can also compute π[v]. [See book.]
Will methodically explore every edge.
• Start over from different vertices as necessary
As soon as we discover a vertex, explore from it
• Unlike BFS, which puts a vertex on a queue so that we explore from it later
As DFS progresses, every vertex has a color:
• WHITE = undiscovered
• GRAY = discovered, but not finished (not done exploring from it)
• BLACK = finished (have found everything reachable from it)
Discovery and finish times:
• Unique integers from 1 to 2|V|.
[Pseudocode for DFS and DFS-VISIT: see book. Each call DFS-VISIT(u) increments time and sets d[u] ← time when u is discovered, explores u's undiscovered neighbors, then increments time and sets f[u] ← time to finish u.]
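A Python sketch of DFS that computes the colors and the d/f timestamps; the names and input format are illustrative assumptions, and the recursion follows the DFS-VISIT structure summarized above.

WHITE, GRAY, BLACK = "white", "gray", "black"

def dfs(adj):
    """adj: dict mapping each vertex to a list of its neighbors.
    Returns (d, f): discovery and finishing times, unique integers in 1..2|V|."""
    color = {u: WHITE for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = GRAY                  # discover u
        time += 1
        d[u] = time
        for v in adj[u]:                 # explore each edge (u, v)
            if color[v] == WHITE:
                visit(v)
        color[u] = BLACK                 # finish u
        time += 1
        f[u] = time

    for u in adj:                        # start over from undiscovered vertices as necessary
        if color[u] == WHITE:
            visit(u)
    return d, f

# Example on a small directed graph.
adj = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
       "x": ["v"], "y": ["x"], "z": ["z"]}
d, f = dfs(adj)
print(d, f)                              # e.g. d["u"] = 1, f["u"] = 8, d["w"] = 9, f["w"] = 12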
Example: [Go through this example, adding in the d and f values as they're computed. Show colors as they change. Don't put in the edge types yet.]
[Figure: example directed graph, to be labeled with discovery/finish times 1 through 16.]
Time = Θ(V + E).
• Similar to BFS analysis.
• Θ, not just O, since guaranteed to examine every vertex and edge.
DFS forms a depth-first forest comprised of ≥ 1 depth-first trees. Each tree is made of edges (u, v) such that u is gray and v is white when (u, v) is explored.
Theorem (Parenthesis theorem)
[Proof omitted.]
For all u, v, exactly one of the following holds:
1. d[u] < f[u] < d[v] < f[v] or d[v] < f[v] < d[u] < f[u], and neither of u and v is a descendant of the other.
2. d[u] < d[v] < f[v] < f[u] and v is a descendant of u.
3. d[v] < d[u] < f[u] < f[v] and u is a descendant of v.
So d[u] < d[v] < f[u] < f[v] cannot happen.
Like parentheses:
• OK: ( ) [ ] ( [ ] ) [ ( ) ]
• Not OK: ( [ ) ] [ ( ] )
Corollary
v is a proper descendant of u if and only if d[u] < d[v] < f [v] < f [u].
Theorem (White-path theorem)
[Proof omitted.]
v is a descendant of u if and only if at time d[u], there is a path u ⇝ v consisting of only white vertices. (Except for u, which was just colored gray.)
Classification of edges
• Tree edge: in the depth-first forest. Found by exploring (u, v).
• Back edge: (u, v), where u is a descendant of v.
• Forward edge: (u, v), where v is a descendant of u, but not a tree edge.
• Cross edge: any other edge. Can go between vertices in the same depth-first tree or in different depth-first trees.
[Now label the example from above with edge types.]
In an undirected graph, there may be some ambiguity since (u, v) and (v, u) are the same edge. Classify by the first type above that matches.
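A Python sketch that classifies each edge of a directed graph as it is explored, using the colors and discovery times as described above. The names are illustrative assumptions; for an undirected graph, the same idea classifies each edge by the first matching type.

def classify_edges(adj):
    """adj: dict vertex -> list of successors (directed graph).
    Returns a dict mapping each edge (u, v) to 'tree', 'back', 'forward', or 'cross'."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {u: WHITE for u in adj}
    d = {}
    kind = {}
    time = 0

    def visit(u):
        nonlocal time
        color[u] = GRAY
        time += 1
        d[u] = time
        for v in adj[u]:
            if color[v] == WHITE:
                kind[(u, v)] = "tree"        # v becomes a child of u
                visit(v)
            elif color[v] == GRAY:
                kind[(u, v)] = "back"        # v is an ancestor of u
            elif d[u] < d[v]:
                kind[(u, v)] = "forward"     # v is an already-finished descendant of u
            else:
                kind[(u, v)] = "cross"       # any other edge
        color[u] = BLACK

    for u in adj:
        if color[u] == WHITE:
            visit(u)
    return kind

adj = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
       "x": ["v"], "y": ["x"], "z": ["z"]}
print(classify_edges(adj))
# tree: (u,v), (v,y), (y,x), (w,z); back: (x,v), (z,z); forward: (u,x); cross: (w,y)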
Directed acyclic graph (dag)
A directed graph with no cycles
Good for modeling processes and structures that have a partial order:
• a > b and b > c ⇒ a > c.
• But may have a and b such that neither a > b nor b > a.
Can always make a total order (either a > b or b > a for all a ≠ b) from a partial order. In fact, that's what a topological sort will do.
Example: dag of dependencies for putting on goalie equipment: [Leave on board, but show without discovery and finish times. Will put them in later.]
[Figure: goalie equipment dag (chest pad, sweater, mask, catch glove, batting glove, skates, etc.), to be labeled with discovery/finish times later.]
Lemma
A directed graph G is acyclic if and only if a DFS of G yields no back edges.
Proof ⇒ : Show that back edge ⇒ cycle.
Suppose there is a back edge (u, v). Then v is an ancestor of u in the depth-first forest. Therefore, there is a path v ⇝ u, so v ⇝ u → v is a cycle.
⇐ : Show that cycle ⇒ back edge.
Suppose G contains cycle c. Let v be the first vertex discovered in c, and let (u, v) be the preceding edge in c. At time d[v], vertices of c form a white path v ⇝ u (since v is the first vertex discovered in c). By the white-path theorem, u is a descendant of v in the depth-first forest. Therefore, (u, v) is a back edge. (lemma)
Topological sort of a dag: a linear ordering of vertices such that if (u, v) ∈ E, then u appears somewhere before v. (Not like sorting numbers.)
TOPOLOGICAL-SORT(V, E)
call DFS(V, E) to compute finishing times f[v] for all v ∈ V
output vertices in order of decreasing Þnish times
Don't need to sort by finish times.
• Can just output vertices as they're finished and understand that we want the reverse of this list.
• Or put them onto the front of a linked list as they're finished. When done, the list contains vertices in topologically sorted order.
Time: Θ(V + E).
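A Python sketch of TOPOLOGICAL-SORT using the second idea above, putting each vertex onto the front of a list as it finishes (the names and the example dag are illustrative assumptions):

def topological_sort(adj):
    """adj: dict vertex -> list of successors in a dag.
    Returns the vertices in topologically sorted order."""
    visited = set()
    order = []

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        order.insert(0, u)               # prepend u as it finishes
                                         # (an O(1) alternative: append, then reverse at the end)

    for u in adj:
        if u not in visited:
            visit(u)
    return order

# A tiny dependency dag: socks before shoes; shirt before tie before jacket.
adj = {"socks": ["shoes"], "shirt": ["tie"], "tie": ["jacket"],
       "shoes": [], "jacket": []}
print(topological_sort(adj))
# ['shirt', 'tie', 'jacket', 'socks', 'shoes'] -- every edge (u, v) has u before v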
Do example. [Now write discovery and finish times in goalie equipment example.]
Correctness: Just need to show if (u, v) ∈ E, then f [v] < f [u].
When we explore (u, v), what are the colors of u and v?
• u is gray.
• Is v gray, too?
  • No, because then v would be an ancestor of u.
    ⇒ (u, v) is a back edge.
    ⇒ contradiction of previous lemma (dag has no back edges).
• Is v white?
  • Then v becomes a descendant of u.
    By the parenthesis theorem, d[u] < d[v] < f[v] < f[u].
• Is v black?
  • Then v is already finished.
    Since we're exploring (u, v), we have not yet finished u.
    Therefore, f[v] < f[u].
Strongly connected components
Given directed graph G = (V, E).
A strongly connected component (SCC) of G is a maximal set of vertices C ⊆ V such that for all u, v ∈ C, both u ⇝ v and v ⇝ u.
Example: [Just show SCC's at first. Do DFS a little later.]