Graph Traversal Algorithms
Depth-first search is a member of a family of graph traversal algorithms that are quite natural when viewed nonrecursively. Any one of these methods can be used to solve the simple connectivity problems posed in the last chapter. In this section, we’ll see how one program can be used to implement graph traversal methods with quite different characteristics, merely by changing the value of one variable. This method will be used to solve several graph problems in the chapters which follow.
Consider the analogy of traveling through a maze. Any time we face a choice of several vertices to visit, we can only go along a path to one of them, so we must “save” the others to visit later. One way to implement a program based on this idea is to remove the recursion from the recursive depth-first algorithm given in the previous chapter. This would result in a program that saves, on a stack, the point in the adjacency list of the vertex being visited at which the search should resume after all vertices connected to previous vertices on the adjacency list have been visited. Instead of examining this implementation in more detail, we’ll look at a more general framework which encompasses several algorithms.
We begin by thinking of the vertices as being divided into three classes: tree (or visited) vertices, those connected together by paths that we’ve traversed; fringe vertices, those adjacent to tree vertices but not yet visited; and unseen vertices, those that haven’t been encountered at all yet. To search a connected component of a graph systematically (implement the visit procedure of the previous chapter), we begin with one vertex on the fringe, all others unseen, and perform the following step until all vertices have been visited: “move one vertex (call it x) from the fringe to the tree, and put any unseen vertices adjacent to x on the fringe.” Graph traversal methods differ in how it is decided which vertex should be moved from the fringe to the tree.

For depth-first search, we always want to choose the vertex from the fringe that was most recently encountered. This can be implemented by always moving the first vertex on the fringe to the tree, then putting the unseen vertices adjacent to that vertex (x) at the front of the fringe and moving vertices adjacent to x which happen to be already on the fringe to the front. (Note carefully that a completely different traversal method results if we leave untouched the vertices adjacent to x which are already on the fringe.) For example, consider the undirected graph given at the beginning of this chapter. The following table shows the contents of the fringe each time a vertex is moved to the tree; the corresponding search tree is shown at the right:
[Table and search tree: the contents of the fringe each time a vertex is moved to the tree; vertices enter the tree in the order A G E D F C H I J M L K B.]
In this algorithm, the fringe essentially operates as a pushdown stack: we remove a vertex (call it x) from the beginning of the fringe, then go through x’s edge list, adding unseen vertices to the beginning of the fringe, and moving fringe vertices to the beginning. Note that this is not strictly a stack, since we use the additional operation of moving a vertex to the beginning. The algorithm can be efficiently implemented by maintaining the fringe as a linked list and keeping pointers into this list in an array indexed by vertex number; we’ll omit this implementation in favor of a program that can implement other traversal strategies as well.

This gives a different depth-first search tree than the one drawn in the biconnectivity section above because x’s edges are put on the stack one at a time, so their order is reversed when the stack is popped. The same tree as for recursive depth-first search would result if we were to append all of x’s edge list on the front of the stack, then go through and delete from the stack any other occurrences of vertices from x’s edge list. The reader might find it interesting to compare this process with the result of directly removing the recursion from the visit procedure of the previous chapter.
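As a concrete point of reference, here is a minimal sketch of the stack-of-vertices idea (without the move-to-front refinement), written in the style of the programs above: it assumes the same global adj, val, now, and z used by the recursive visit of the previous chapter, and hypothetical pushdown-stack routines push, pop, and stackempty. A vertex may appear on the stack more than once, but it is visited only once.

procedure stackvisit(k: integer);
  { sketch of depth-first search with an explicit stack of vertices; }
  { assumes val[v]=0 means v has not yet been visited                }
  var t: link;
  begin
  push(k);
  repeat
    k:=pop;
    if val[k]=0 then
      begin
      now:=now+1; val[k]:=now;
      t:=adj[k];
      while t<>z do
        begin
        if val[t^.v]=0 then push(t^.v);
        t:=t^.next
        end
      end
  until stackempty
  end;

Because each edge pushes its far endpoint separately, the vertices come off the stack in the reversed order discussed above.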
By itself, the fringe table does not give enough information to reconstruct the search tree. In order to actually build the tree, it is necessary to store, with each node in the fringe table, the name of its father in the tree. This is available when the node is entered into the table (it’s the node that caused the entry), it is changed whenever the node moves to the front of the fringe, and it is needed when the node is removed from the table (to determine where in the tree to attach it).
A second classic traversal method derives from maintaining the fringe as a queue: always pick the least recently encountered vertex. This can be maintained by putting the unseen vertices adjacent to x at the end of the fringe in the general strategy above. This method is called breadth-first search: first we visit a node, then all the nodes adjacent to it, then all the nodes adjacent to those nodes, etc. This leads to the following fringe table and search tree for our example graph:
[Table and search tree: the contents of the fringe each time a vertex is moved to the tree; vertices enter the tree in the order A F C B G E D L J H M K I.]
We remove a vertex (call it x) from the beginning of the fringe, then go through x’s edge list, putting unseen vertices at the end of the fringe. Again, an efficient implementation is available using a linked list representation for the fringe, but we’ll omit this in favor of a more general method.
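In the same spirit, a minimal sketch of breadth-first search with the fringe kept as a simple queue of vertices might look as follows, assuming hypothetical queue routines put, get, and queueempty and the same globals as above; here val[v]=-1 marks a fringe vertex so that nothing is put on the queue twice.

procedure queuevisit(k: integer);
  { sketch of breadth-first search with the fringe as a queue; }
  { val[v]=0 means unseen, -1 means on the fringe              }
  var t: link;
  begin
  put(k); val[k]:=-1;
  repeat
    k:=get; now:=now+1; val[k]:=now;
    t:=adj[k];
    while t<>z do
      begin
      if val[t^.v]=0 then
        begin put(t^.v); val[t^.v]:=-1 end;
      t:=t^.next
      end
  until queueempty
  end;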
A fundamental feature of this general graph traversal strategy is that the fringe is actually operating as a priority queue: the vertices can be assigned priorities with the property that the “highest priority” vertex is the one moved from the fringe to the tree. That is, we can directly use the priority queue routines from Chapter 11 to implement a general graph searching program. For the applications in the next chapter, it is convenient to assign the highest priority to the lowest value, so we assume that the inequalities are switched in the programs from Chapter 11. The following program is a “priority first search” routine for a graph represented with adjacency lists (so it is most appropriate for sparse graphs).
procedure sparsepfs;
  var now, k: integer;
      t: link;
  begin
  now:=0;
  for k:=1 to V do
    begin val[k]:=unseen; dad[k]:=0 end;
  pqconstruct;
  repeat
    k:=pqremove;                        { move the highest-priority vertex to the tree }
    if val[k]=unseen then
      begin val[k]:=0; now:=now+1 end;  { k is the root of a new component }
    t:=adj[k];
    while t<>z do                       { update the vertices adjacent to k }
      begin
      if val[t^.v]=unseen then now:=now+1;
      if onpq(t^.v) and (val[t^.v]>priority) then
        begin pqchange(t^.v, priority); dad[t^.v]:=k end;
      t:=t^.next
      end
  until pqempty
  end;
(The functions onpq and pqempty are priority queue utility routines which are easily implemented additions to the set of programs given in Chapter 11: pqempty returns true if the priority queue is empty; onpq returns true if the given vertex is currently on the priority queue.) Below and in Chapter 31, we’ll see how the substitution of various expressions for priority in this program yields several classical graph traversal algorithms. Specifically, the program operates as follows: first, we give all the vertices the sentinel value unseen (which could be maxint) and initialize the dad array, which is used to store the search tree. Next we construct an indirect priority queue containing all the vertices (this construction is trivial because the values are initially all the same). In the terminology above, tree vertices are those which are not on the priority queue, unseen vertices are those on the priority queue with value unseen, and fringe vertices are the others on the priority queue. With these conventions understood, the operation of the program is straightforward: it repeatedly removes the highest priority vertex from the queue and puts it on the tree, then updates the priorities of all fringe or unseen vertices connected to that vertex.
If all vertices on the priority queue are unseen, then no vertex previously encountered is connected to any vertex on the queue: that is, we’re entering a new connected component. This is automatically handled by the priority queue mechanism, so there is no need for a separate visit procedure inside a main procedure. But note that maintaining the proper value of now is more complicated than for the recursive depth-first search program of the previous chapter. The convention of this program is to set the val entry to zero for the root of the search tree of each connected component; it might be more convenient to set it to some other value (for example, now) for various applications.
Now, recall that now increases from 1 to V during the execution of the algorithm, so it can be used to assign unique priorities to the vertices. If we change the two occurrences of priority in sparsepfs to V-now, we get depth-first search, because newly encountered nodes have the highest priority. If we use now for priority we get breadth-first search, because old nodes have the highest priority. These priority assignments make the priority queues operate like stacks and queues as described above. (Of course, if we were only interested in using depth-first or breadth-first search, we would use a direct implementation for stacks or queues, not priority queues as in sparsepfs.) In the next chapter, we’ll see that other priority assignments lead to other classical graph algorithms.
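For example, with the depth-first substitution made, the inner test of sparsepfs reads as follows (the breadth-first version simply has now in place of V-now):

      if onpq(t^.v) and (val[t^.v]>V-now) then
        begin pqchange(t^.v, V-now); dad[t^.v]:=k end;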
The running time for graph traversal when implemented in this way depends on the method of implementing the priority queue. In general, we have to do a priority queue operation for each edge and for each vertex, so the worst case running time should be proportional to (E + V) log V if the priority queue is implemented as a heap as indicated. However, we’ve already noted that for both depth-first and breadth-first search we can take advantage of the fact that each new priority is the highest or the lowest so far encountered to get a running time proportional to E + V. Also, other priority queue implementations might sometimes be appropriate: for example, if the graph is dense then we might as well simply keep the priority queue as an unordered array. This gives a worst case running time proportional to E + V² (or just V²), since each edge simply requires setting or resetting a priority, but each vertex now requires searching through the whole queue to find the highest priority vertex. An implementation which works in this way is given in the next chapter.
The difference between depth-first and breadth-first search is quite evident when a large graph is considered. The diagram at left below shows the edges and nodes visited when depth-first search is halfway through the maze graph of the previous chapter, starting at the upper left corner; the diagram at right is the corresponding picture for breadth-first search:
Tree nodes are blackened in these diagrams, fringe nodes are crossed, and unseen nodes are blank. Depth-first search “explores” the graph by looking for new vertices far away from the start point, taking closer vertices only when dead ends are encountered; breadth-first search completely covers the area close to the starting point, moving farther away only when everything close has been looked at. Depth-first search is appropriate for one person looking for something in a maze because the “next place to look” is always close by; breadth-first search is more like a group of people looking for something by fanning out in all directions.

Beyond these operational differences, it is interesting to reflect on the fundamental differences in the implementations of these methods. Depth-first search is very simply expressed recursively (because its underlying data structure is a stack), and breadth-first search admits a very simple nonrecursive implementation (because its underlying data structure is a queue). But we’ve seen that the true underlying data structure for graph algorithms is a priority queue, and this admits a wealth of interesting properties to consider. Again, we’ll see more examples in the next chapter.
Union-Find Algorithms
In some applications we wish to know simply whether a vertex x is connected to a vertex y in a graph; the actual path connecting them may not be relevant. This problem has been carefully studied in recent years; some efficient algorithms have been developed which are of independent interest because they can also be used for processing sets (collections of objects).

Graphs correspond to sets of objects in a natural way, with vertices corresponding to objects and edges having the meaning “is in the same set as.” Thus, the sample graph in the previous chapter corresponds to the sets {A B C D E F G}, {H I} and {J K L M}. Each connected component corresponds to a different set. For sets, we’re interested in the fundamental question “is x in the same set as y?” This clearly corresponds to the fundamental graph question “is vertex x connected to vertex y?”

Given a set of edges, we can build an adjacency list representation of the corresponding graph and use depth-first search to assign to each vertex the index of its connected component, so that questions of the form “is x connected to y?” can be answered with just two array accesses and a comparison. The extra twist in the methods that we consider here is that they are dynamic: they can accept new edges arbitrarily intermixed with questions and answer the questions correctly using the information received. From the correspondence with the set problem, the addition of a new edge is called a union operation, and the queries are called find operations.
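To make the static version concrete, such a check might look like the following sketch, where comp[1..V] is a hypothetical array filled in beforehand by depth-first search with the index of the connected component containing each vertex:

function connected(x, y: integer): boolean;
  { sketch only: comp is assumed to hold a component index per vertex }
  begin
  connected:=(comp[x]=comp[y])
  end;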
Our objective is to write a function which can check if two vertices x and y are in the same set (or, in the graph representation, the same connected component) and, if not, can put them in the same set (put an edge between them in the graph). Instead of building a direct adjacency list or other representation of the graph, we’ll gain efficiency by using an internal structure specifically oriented towards supporting the union and find operations. The internal structure that we will use is a forest of trees, one for each connected component. We need to be able to find out if two vertices belong to the same tree and to be able to combine two trees to make one. It turns out that both of these operations can be implemented efficiently.
To illustrate the way this algorithm works, we’ll look at the forest constructed when the edges from the sample graph that we’ve been using in this chapter are processed in some arbitrary order. Initially, all nodes are in separate trees. Then the edge AG causes a two-node tree to be formed, with A at the root. (This choice is arbitrary; we could equally well have put G at the root.) The edges AB and AC add B and C to this tree in the same way, leaving a tree with G, B, and C attached directly to the root A.

The edges LM, JM, JL, and JK build a tree containing J, K, L, and M that has a slightly different structure (note that JL doesn’t contribute anything, since LM and JM put L and J in the same component), and the edges ED, FD, and HI build two more trees, leaving a forest of four trees.
This forest indicates that the edges processed to this point describe a graph with four connected components, or, equivalently, that the set union operations processed to this point have led to four sets {A B C G}, {J K L M}, {D E F} and {H I}. Now the edge FE doesn’t contribute anything to the structure, since F and E are in the same component, but the edge AE combines the first two trees; then GC doesn’t contribute anything, but GH and JG result in everything being combined into one tree.

It must be emphasized that, unlike depth-first search trees, the only relationship between these union-find trees and the underlying graph with the given edges is that they divide the vertices into sets in the same way. For example, there is no correspondence between the paths that connect nodes in the trees and the paths that connect nodes in the graph.
We know that we can represent these trees in exactly the same way that we represented graph search trees: we keep an array of integers dad[1..V] which contains, for each vertex, the index of its father (with a 0 entry for nodes which are at the root of a tree). To find the father of a vertex j, we simply set j:=dad[j], and to find the root of the tree to which j belongs, we repeat this operation until reaching 0. The union and find operations are then very easily implemented:
function find(x, y: integer; union: boolean): boolean;
  var i, j: integer;
  begin
  i:=x; while dad[i]>0 do i:=dad[i];      { find the root of x's tree }
  j:=y; while dad[j]>0 do j:=dad[j];      { find the root of y's tree }
  if union and (i<>j) then dad[j]:=i;     { merge: j's tree goes under i }
  find:=(i<>j)
  end;
This function returns true if the two given vertices are not already in the same component; in addition, if they are not in the same component and the union flag is set, they are put into the same component. The method used is simple: use the dad array to get to the root of the tree containing each vertex, then check to see if the roots are the same. To merge the tree rooted at j with the tree rooted at i, we simply set dad[j]:=i, as shown in the following table:
[Table: the contents of the dad array, one column for each vertex A through M, after each edge is processed.]
An asterisk at the right indicates that the vertices are already in the same component at the time the edge is processed. As usual, we are assuming that we have available functions index and name to translate between vertex names and integers between 1 and V; each table entry is the name of the corresponding dad array entry. Also, for example, the function call find(index(x), index(y), false) would be used to test whether a vertex named x is in the same component as a vertex named y (without introducing an edge between them).
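To add an edge rather than merely test, the same function is called with the union flag set to true. A sketch of such a call, for an edge between vertices whose names are held in the char variables v1 and v2 (the output wording is chosen purely for illustration), is:

  if find(index(v1), index(v2), true)
    then writeln('new connection')
    else writeln('already connected');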
The algorithm described above has bad worst-case performance because the trees formed could be degenerate. For example, taking the edges AB BC CD DE EF FG GH HI IJ ... YZ in that order will produce a long chain with Z pointing to Y, Y pointing to X, etc. This kind of structure takes time proportional to V² to build, and has time proportional to V for an average equivalence test.
Several methods have been suggested to deal with this problem. One natural method, which may have already occurred to the reader, is to try to do the “right” thing when merging two trees rather than arbitrarily setting dad[j]:=i. When a tree rooted at i is to be merged with a tree rooted at j, one of the nodes must remain a root and the other (and all its descendants) must go one level down in the tree. To minimize the distance to the root for the most nodes, it makes sense to take as the root the node with more descendants. This idea, called weight balancing, is easily implemented by maintaining the size of each tree (number of descendants of the root) in the dad array entry for each root node, encoded as a nonpositive number so that the root node can be detected when traveling up the tree in find.
Ideally, we would like every node to point directly to the root of its tree. No matter what strategy we use, achieving this ideal would require examining at least all the nodes in the smaller of the two trees to be merged, and this could be quite a lot compared to the relatively few nodes on the path to the root that find usually examines. But we can approach the ideal by making all the nodes that we do examine point to the root! This seems like a drastic step at first blush, but it is relatively easy to do, and it must be remembered that there is nothing sacrosanct about the structure of these trees: if they can be modified to make the algorithm more efficient, we should do so. This method, called path compression, is easily implemented by making another pass through each tree after the root has been found, and setting the dad entry of each vertex encountered along the way to point to the root.

The combination of weight balancing and path compression ensures that the algorithms will run very quickly. The extra code involved is a small price to pay to guard against degenerate cases.
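Under the weight convention described above (the dad entry of a root holds the negated number of its non-root descendants, so a one-node tree has entry 0 and the entry is always nonpositive), such a routine might be sketched as follows; call it fastfind here:

function fastfind(x, y: integer; union: boolean): boolean;
  { sketch of union-find with weight balancing and path compression }
  var i, j, t: integer;
  begin
  i:=x; while dad[i]>0 do i:=dad[i];          { find the two roots }
  j:=y; while dad[j]>0 do j:=dad[j];
  while dad[x]>0 do                           { path compression:   }
    begin t:=x; x:=dad[x]; dad[t]:=i end;     { point every node    }
  while dad[y]>0 do                           { examined directly   }
    begin t:=y; y:=dad[y]; dad[t]:=j end;     { at its root         }
  if union and (i<>j) then                    { weighted union: the }
    if dad[j]<dad[i]                          { smaller tree goes   }
      then begin dad[j]:=dad[j]+dad[i]-1; dad[i]:=j end  { below the }
      else begin dad[i]:=dad[i]+dad[j]-1; dad[j]:=i end; { bigger one }
  fastfind:=(i<>j)
  end;

The dad entries of the two roots are combined before the link is made, so the root of the merged tree continues to record the size of its tree.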