Paths, Searches, and Distances

In a directed graph, a path can be defined as an ordered list of nodes in which every pair of consecutive nodes is connected by an edge. Mathematically, in a directed graph G = (V, E), a path P between nodes x and y is a list P = p_1, p_2, ..., p_n, where p_1 = x, p_n = y, and every pair of consecutive nodes on P belongs to E, i.e. (p_i, p_{i+1}) ∈ E for all i ∈ {1, ..., n−1}. Note that if the graph is undirected, it suffices that either (p_i, p_{i+1}) ∈ E or (p_{i+1}, p_i) ∈ E.
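The definition above can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to the list of its successors (the representation used in the examples of this section); the function name is_valid_path is hypothetical and is not part of the class developed in this chapter.

```python
def is_valid_path(graph, path):
    """Check that `path` is a valid path in a directed graph.

    `graph` is assumed to be a dict mapping each node to the list
    of its successors.
    """
    # An empty list is not a path, and every node must exist in the graph.
    if not path or any(node not in graph for node in path):
        return False
    # Every pair of consecutive nodes (p_i, p_{i+1}) must be an edge.
    return all(b in graph[a] for a, b in zip(path, path[1:]))


graph = {1: [2, 3, 4], 2: [5, 6], 3: [6, 8], 4: [8],
         5: [7], 6: [], 7: [], 8: []}
print(is_valid_path(graph, [1, 2, 5, 7]))  # True
print(is_valid_path(graph, [1, 5, 7]))     # False, (1, 5) is not an edge
```

For an undirected graph stored in this way, the test on each pair would have to accept an edge in either direction.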

In a graph, two nodes s and v are considered to be connected if there is a valid path between them. In this case, node v is said to be reachable from s. In many situations, it is useful to compute all the nodes that can be reached from a source node s, i.e. all nodes for which there is a path starting in s and ending in that node.

It is important to notice that, given nodes s and v, there might be several valid paths connecting the two, i.e. starting in s and reaching v. In many cases, it is important to consider just one, typically the shortest path. This is defined as the path connecting the two nodes with the smallest length, i.e. the minimum number of edges (or, equivalently, of intermediate nodes).

The distance between two nodes s and v is given by the number of edges on the shortest path between them.

To be able to address the definitions put forward in this section, we need to develop algorithms that are able to traverse the graph, starting in a given source node, gathering the visited nodes. There are mainly two alternative strategies that can be used to address this task:

• Breadth-first search (BFS): starts at the source node, then visits all its successors, followed by their successors, until all reachable nodes are visited;

• Depth-first search (DFS): starts at the source node, explores its first successor, then the first successor of that node, and so on until no further exploration is possible, and then backtracks to explore the remaining alternatives.

The code block below shows one method for each of these strategies, both returning the list of nodes reachable from a given source vertex. Note that the functions are quite similar, using a list of visited nodes to return the result (res) and a list of nodes still to be handled (l).

The way this working list is handled is what distinguishes the two versions, since it implements two different data structures: in the BFS it behaves as a queue (first in, first out), while in the DFS it behaves as a stack (last in, first out), thus changing the order of the results.
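To make the two disciplines concrete, the sketch below serves the same three items once from a queue and once from a stack. It is only an illustration: the variable names are hypothetical, and collections.deque is used here for the queue as a common Python idiom, whereas the methods in this chapter use plain lists.

```python
from collections import deque

items = [2, 3, 4]  # e.g. the successors of a node, in insertion order

# Queue: items leave in the order they arrived (first in, first out).
queue = deque(items)
served_by_queue = [queue.popleft() for _ in range(len(items))]

# Stack: the most recently added item leaves first (last in, first out).
stack = list(items)
served_by_stack = [stack.pop() for _ in range(len(items))]

print(served_by_queue)  # [2, 3, 4]
print(served_by_stack)  # [4, 3, 2]
```

In a traversal, serving the oldest pending node first spreads the search across all successors of a node before going deeper, while serving the newest pending node first keeps following the most recently discovered branch.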

The two functions are tested with a simple example, graphically illustrated in Fig. 13.2, where the two strategies are clearly distinguished.

    def reachable_bfs(self, v):
        l = [v]      # nodes still to be handled, used as a queue
        res = []     # reachable nodes, in visiting order
        while len(l) > 0:
            node = l.pop(0)              # take the oldest pending node
            if node != v:
                res.append(node)
            for elem in self.graph[node]:
                if elem not in res and elem not in l:
                    l.append(elem)       # new nodes go to the end
        return res

    def reachable_dfs(self, v):
        l = [v]      # nodes still to be handled, used as a stack
        res = []     # reachable nodes, in visiting order
        while len(l) > 0:
            node = l.pop(0)              # take the node at the front
            if node != v:
                res.append(node)
            s = 0
            for elem in self.graph[node]:
                if elem not in res and elem not in l:
                    l.insert(s, elem)    # new nodes go to the front,
                    s += 1               # keeping their relative order
        return res

    if __name__ == "__main__":
        gr2 = MyGraph({1: [2, 3, 4], 2: [5, 6], 3: [6, 8], 4: [8],
                       5: [7], 6: [], 7: [], 8: []})
        print(gr2.reachable_bfs(1))
        print(gr2.reachable_dfs(1))

Figure 13.2: Illustration of the two strategies for traversing a graph with an example. In the upper part, the breadth-first strategy is shown, and in the bottom the depth-first search is illustrated.

Based on these strategies, we can devise a way to calculate the distance between two nodes or, similarly, the shortest path between them. Of the two previous strategies, the suitable one is the BFS, since it always covers all nodes at a distance of n from the source before the nodes at a distance of n+1, as is clear from Fig. 13.2.
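This layer-by-layer behaviour can be made explicit. The following is a minimal sketch, assuming a graph stored as a dictionary of successor lists; the function name bfs_layers is hypothetical and is not part of the class built in this chapter. It groups the nodes by the round of the breadth-first search in which they are first discovered, which is also their distance to the source.

```python
def bfs_layers(graph, source):
    """Group the nodes reachable from `source` by BFS round.

    Returns a list where position n holds the nodes first
    discovered in round n of the search (position 0 is the source).
    """
    layers = [[source]]
    visited = {source}
    while True:
        next_layer = []
        # Expand every node of the current layer before moving on.
        for node in layers[-1]:
            for succ in graph[node]:
                if succ not in visited:
                    visited.add(succ)
                    next_layer.append(succ)
        if not next_layer:
            return layers
        layers.append(next_layer)


graph = {1: [2, 3, 4], 2: [5, 6], 3: [6, 8], 4: [8],
         5: [7], 6: [], 7: [], 8: []}
for dist, layer in enumerate(bfs_layers(graph, 1)):
    print(dist, layer)
# 0 [1]
# 1 [2, 3, 4]
# 2 [5, 6, 8]
# 3 [7]
```

A depth-first search offers no such guarantee, since it may reach a node through a long branch before trying a shorter one.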

Thus, the two functions in the next code chunk show how to implement the distance and shortest path algorithms in the class we have been building in this chapter. In both cases, the strategy is similar to the reachable_bfs function above. In the case of the distance, the working list keeps the nodes still to be handled together with their distances to the source, while in the case of the shortest path it keeps those nodes together with the path leading to them from the source node (in the form of a list). The main difference is that the search stops as soon as the destination node is reached. Both functions return None if the destination node is unreachable.

    def distance(self, s, d):
        if s == d:
            return 0
        l = [(s, 0)]     # queue of (node, distance to s)
        visited = [s]
        while len(l) > 0:
            node, dist = l.pop(0)
            for elem in self.graph[node]:
                if elem == d:
                    return dist + 1      # destination found, stop
                elif elem not in visited:
                    l.append((elem, dist + 1))
                    visited.append(elem)
        return None      # d is not reachable from s

    def shortest_path(self, s, d):
        if s == d:
            return [s]   # a path is a list of nodes, so return [s] rather than 0
        l = [(s, [])]    # queue of (node, path from s up to its predecessor)
        visited = [s]
        while len(l) > 0:
            node, preds = l.pop(0)
            for elem in self.graph[node]:
                if elem == d:
                    return preds + [node, elem]   # destination found, stop
                elif elem not in visited:
                    l.append((elem, preds + [node]))
                    visited.append(elem)
        return None      # d is not reachable from s

    if __name__ == "__main__":
        gr2 = MyGraph({1: [2, 3, 4], 2: [5, 6], 3: [6, 8], 4: [8],
                       5: [7], 6: [], 7: [], 8: []})
        print(gr2.distance(1, 7))
        print(gr2.shortest_path(1, 7))
        print(gr2.distance(1, 8))
        print(gr2.shortest_path(1, 8))
        print(gr2.distance(6, 1))
        print(gr2.shortest_path(6, 1))

We can also define a function that combines the full BFS search with the distance, gathering all reachable nodes and their distance from the source.

    def reachable_with_dist(self, s):
        res = []         # list of (node, distance to s)
        l = [(s, 0)]     # queue of (node, distance to s)
        while len(l) > 0:
            node, dist = l.pop(0)
            if node != s:
                res.append((node, dist))
            for elem in self.graph[node]:
                if not is_in_tuple_list(l, elem) and not is_in_tuple_list(res, elem):
                    l.append((elem, dist + 1))
        return res

    # helper function, defined outside the class
    def is_in_tuple_list(tl, val):
        # True if val is the first element of any tuple in tl
        res = False
        for (x, y) in tl:
            if val == x:
                return True
        return res

    if __name__ == "__main__":
        gr2 = MyGraph({1: [2, 3, 4], 2: [5, 6], 3: [6, 8], 4: [8],
                       5: [7], 6: [], 7: [], 8: []})
        print(gr2.reachable_with_dist(1))

Note that, in some cases, there might be no path between two nodes, a situation where the distance is considered to be infinite. If all pairs of nodes have a finite distance, the graph is said to be strongly connected. This definition is implemented by the following function.

    def is_connected(self):
        # every node must reach all the other nodes
        total = len(self.graph.keys()) - 1
        for v in self.graph.keys():
            reachable_v = self.reachable_bfs(v)
            if len(reachable_v) < total:
                return False
        return True
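As an illustration of this definition, the sketch below applies the same test to two small graphs. It is a standalone approximation under stated assumptions: the graphs are plain dictionaries of successor lists, and the names reachable and is_strongly_connected are hypothetical stand-ins for the methods of the class built in this chapter.

```python
def reachable(graph, source):
    """Return the set of nodes, other than `source`, reachable from it."""
    visited = {source}
    pending = [source]
    while pending:
        node = pending.pop()
        for succ in graph[node]:
            if succ not in visited:
                visited.add(succ)
                pending.append(succ)
    visited.discard(source)
    return visited


def is_strongly_connected(graph):
    """True if every node can reach all the other nodes."""
    total = len(graph) - 1
    return all(len(reachable(graph, v)) == total for v in graph)


cycle = {1: [2], 2: [3], 3: [1]}   # 1 -> 2 -> 3 -> 1
chain = {1: [2], 2: [3], 3: []}    # 1 -> 2 -> 3
print(is_strongly_connected(cycle))  # True
print(is_strongly_connected(chain))  # False, node 3 reaches no other node
```

In the chain, the distance from node 3 to node 1 is infinite, so the graph is not strongly connected even though every node can be reached from node 1.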
