Representing Networks With Graphs

14.2.1 A Python Class for Metabolic Networks

As stated above, we will represent networks using graphs, assuming the representation put forward in the previous chapter. As before, we will provide a Python implementation of the mentioned concepts, in the form of a class to represent metabolic networks, which will extend the core class to represent directed graphs presented before.

It is important to notice that metabolic networks encompassing both reactions and metabolites, as shown in Fig. 14.1A, bring an additional requirement, since the graph now has two different types of nodes. This is a particular case of a class of graphs named bipartite graphs.

Formally, a graph G = (V, E) is a bipartite graph if V can be split into two disjoint sets V1 and V2 (V1 ∪ V2 = V and V1 ∩ V2 = ∅), and E only contains pairs in which one element belongs to V1 and the other belongs to V2, i.e. there are no edges connecting nodes of the same node set. It is easy to check that metabolic networks (metabolite-reaction) are a special case of a bipartite graph where V1 represents the set of metabolites and V2 the set of reactions.
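The definition above can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each node to the list of its successors; the function name is_bipartite_split is hypothetical and not part of the classes developed in this chapter.

```python
def is_bipartite_split(graph, set1, set2):
    """Check whether (set1, set2) is a valid bipartition of a directed graph.

    graph: dict mapping each node to a list of its successors (assumed layout).
    """
    s1, s2 = set(set1), set(set2)
    # The two sets must be disjoint and together cover every node.
    if s1 & s2 or (s1 | s2) != set(graph):
        return False
    # No edge may connect two nodes that belong to the same set.
    for u, successors in graph.items():
        for v in successors:
            if (u in s1) == (v in s1):
                return False
    return True


# Reaction R1: M1 + M2 => M3 + M4, as a metabolite-reaction graph.
g = {"M1": ["R1"], "M2": ["R1"], "R1": ["M3", "M4"], "M3": [], "M4": []}
print(is_bipartite_split(g, ["M1", "M2", "M3", "M4"], ["R1"]))  # True
print(is_bipartite_split(g, ["M1", "M2", "M3"], ["R1", "M4"]))  # False
```

In the second call the edge from R1 to M4 joins two nodes of the same set, so the split is rejected.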

To represent metabolic networks, considering all the types of networks illustrated by Fig. 14.1, we will define the class MetabolicNetwork, which is a sub-class of the class MyGraph. As shown below, this will add to the base class a variable specifying the type of network (the options will be “metabolite-reaction”, “metabolite-metabolite” and “reaction-reaction”, which match the networks depicted in Fig. 14.1A, B and C, respectively), an attribute (a dictionary) which will keep the nodes of each type (in case it is a bipartite “metabolite-reaction” graph), and a Boolean variable specifying whether reversible reactions are represented split in their two directions (as is the case for the networks in Fig. 14.1D and E) or as a single reaction.

from MyGraph import MyGraph

class MetabolicNetwork(MyGraph):

    def __init__(self, network_type="metabolite-reaction", split_rev=False):
        MyGraph.__init__(self, {})
        self.net_type = network_type
        self.node_types = {}
        if network_type == "metabolite-reaction":
            self.node_types["metabolite"] = []
            self.node_types["reaction"] = []
        self.split_rev = split_rev

To add some content to these networks, we can insert nodes and edges “by hand”, i.e. using the methods defined in the parent class. In this case, we will implement a new method that adds a node to a bipartite graph, specifying its type. This is illustrated in the code chunk below, where the network in Fig. 14.1A is created and printed.

class MetabolicNetwork(MyGraph):
    (...)

    def add_vertex_type(self, v, nodetype):
        self.add_vertex(v)
        self.node_types[nodetype].append(v)

    def get_nodes_type(self, node_type):
        if node_type in self.node_types:
            return self.node_types[node_type]
        else:
            return None

def test1():
    m = MetabolicNetwork("metabolite-reaction")
    m.add_vertex_type("R1", "reaction")
    m.add_vertex_type("R2", "reaction")
    m.add_vertex_type("R3", "reaction")
    m.add_vertex_type("M1", "metabolite")
    m.add_vertex_type("M2", "metabolite")
    m.add_vertex_type("M3", "metabolite")
    m.add_vertex_type("M4", "metabolite")
    m.add_vertex_type("M5", "metabolite")
    m.add_vertex_type("M6", "metabolite")
    m.add_edge("M1", "R1")
    m.add_edge("M2", "R1")
    m.add_edge("R1", "M3")
    m.add_edge("R1", "M4")
    m.add_edge("M4", "R2")
    m.add_edge("M6", "R2")
    m.add_edge("R2", "M3")
    m.add_edge("M4", "R3")
    m.add_edge("M5", "R3")
    m.add_edge("R3", "M6")
    m.add_edge("R3", "M4")
    m.add_edge("R3", "M5")
    m.add_edge("M6", "R3")
    m.print_graph()
    print("Reactions: ", m.get_nodes_type("reaction"))
    print("Metabolites: ", m.get_nodes_type("metabolite"))

test1()

A more useful alternative, which makes it possible to work with large-scale networks, is to load information from files and build the network based on the file’s content. Here, we will assume information is given in a text file with one reaction per row, encoded in the form shown in the boxes on the right-hand side of Fig. 14.1.

A method load_from_file is provided in the MetabolicNetwork class to read the file and build the desired network (shown in the following code). This method loads information from the file and creates the network according to the network type stored in the corresponding attribute of the class, also taking into account the flag that specifies how to handle reversible reactions.

Note the use of the split method which, applied to a string with a given separator (a literal string, not a regular expression), cuts the string at every occurrence of the separator and returns a list of the resulting sub-strings (typically called tokens).

    def load_from_file(self, filename):
        # always build the bipartite graph first; convert at the end if needed
        gmr = MetabolicNetwork("metabolite-reaction")
        with open(filename) as rf:
            for line in rf:
                if ":" in line:
                    tokens = line.split(":")
                    reac_id = tokens[0].strip()
                    gmr.add_vertex_type(reac_id, "reaction")
                    rline = tokens[1]
                else:
                    raise Exception("Invalid line:")
                if "<=>" in rline:  # reversible reaction
                    left, right = rline.split("<=>")
                    if self.split_rev:
                        # register the backward reaction once per reaction
                        gmr.add_vertex_type(reac_id + "_b", "reaction")
                    mets_left = left.split("+")
                    for met in mets_left:
                        met_id = met.strip()
                        if met_id not in gmr.graph:
                            gmr.add_vertex_type(met_id, "metabolite")
                        if self.split_rev:
                            gmr.add_edge(met_id, reac_id)
                            gmr.add_edge(reac_id + "_b", met_id)
                        else:
                            gmr.add_edge(met_id, reac_id)
                            gmr.add_edge(reac_id, met_id)
                    mets_right = right.split("+")
                    for met in mets_right:
                        met_id = met.strip()
                        if met_id not in gmr.graph:
                            gmr.add_vertex_type(met_id, "metabolite")
                        if self.split_rev:
                            gmr.add_edge(met_id, reac_id + "_b")
                            gmr.add_edge(reac_id, met_id)
                        else:
                            gmr.add_edge(met_id, reac_id)
                            gmr.add_edge(reac_id, met_id)
                elif "=>" in rline:  # irreversible reaction
                    left, right = rline.split("=>")
                    mets_left = left.split("+")
                    for met in mets_left:
                        met_id = met.strip()
                        if met_id not in gmr.graph:
                            gmr.add_vertex_type(met_id, "metabolite")
                        gmr.add_edge(met_id, reac_id)
                    mets_right = right.split("+")
                    for met in mets_right:
                        met_id = met.strip()
                        if met_id not in gmr.graph:
                            gmr.add_vertex_type(met_id, "metabolite")
                        gmr.add_edge(reac_id, met_id)
                else:
                    raise Exception("Invalid line:")

        if self.net_type == "metabolite-reaction":
            self.graph = gmr.graph
            self.node_types = gmr.node_types
        elif self.net_type == "metabolite-metabolite":
            self.convert_metabolite_net(gmr)
        elif self.net_type == "reaction-reaction":
            self.convert_reaction_graph(gmr)
        else:
            self.graph = {}

    def convert_metabolite_net(self, gmr):
        # link m to every metabolite reachable through one reaction
        for m in gmr.node_types["metabolite"]:
            self.add_vertex(m)
            sucs = gmr.get_successors(m)
            for s in sucs:
                sucs_r = gmr.get_successors(s)
                for s2 in sucs_r:
                    if m != s2:
                        self.add_edge(m, s2)

    def convert_reaction_graph(self, gmr):
        # link r to every reaction that consumes one of its products
        for r in gmr.node_types["reaction"]:
            self.add_vertex(r)
            sucs = gmr.get_successors(r)
            for s in sucs:
                sucs_r = gmr.get_successors(s)
                for s2 in sucs_r:
                    if r != s2:
                        self.add_edge(r, s2)
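The tokenization that load_from_file performs with split can be tried in isolation. The sketch below is only illustrative: parse_reaction_line is a hypothetical helper, not a method of MetabolicNetwork, and it assumes each line follows the “ID: left => right” or “ID: left <=> right” format described above.

```python
def parse_reaction_line(line):
    """Split one 'ID: left => right' line into its parts (illustrative helper)."""
    reac_id, equation = line.split(":")
    # Test "<=>" first, since "=>" is a substring of it.
    reversible = "<=>" in equation
    left, right = equation.split("<=>" if reversible else "=>")
    # split leaves the surrounding blanks in each token; strip removes them.
    substrates = [tok.strip() for tok in left.split("+")]
    products = [tok.strip() for tok in right.split("+")]
    return reac_id.strip(), substrates, products, reversible


print("M4 + M5 ".split("+"))
# ['M4 ', ' M5 ']
print(parse_reaction_line("R3: M4 + M5 <=> M6"))
# ('R3', ['M4', 'M5'], ['M6'], True)
print(parse_reaction_line("R2: M4 + M6 => M3"))
# ('R2', ['M4', 'M6'], ['M3'], False)
```

The order of the two tests matters: checking for "=>" first would also match reversible reactions.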

Note that the information is loaded and a “metabolite-reaction” network is created; if another network type is required, it is then converted using the two provided auxiliary methods. This is a reasonable approach, since the best way to create networks with reactions or metabolites only is to create the bipartite graph first. Indeed, we will connect two metabolites M1 and M2 (in the “metabolite-metabolite” network) if they share a reaction R as neighbor (R is a successor of M1 and a predecessor of M2). Also, when building a “reaction-reaction” network, two reactions R1 and R2 will be connected if a metabolite M is produced by R1 (successor) and consumed by R2 (predecessor).
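Both conversions follow the same pattern: each node is linked to the nodes reachable from it in exactly two steps of the bipartite graph. The sketch below illustrates this with plain dictionaries rather than the MetabolicNetwork class; the function two_step_links is hypothetical, and it assumes that repeated edges are stored only once.

```python
def two_step_links(graph, sources):
    """Link each source node to the nodes reachable in exactly two steps."""
    result = {}
    for node in sources:
        result[node] = []
        for mid in graph[node]:
            for target in graph[mid]:
                # skip self-links and (by assumption) repeated edges
                if target != node and target not in result[node]:
                    result[node].append(target)
    return result


# Bipartite graph for R1: M1 + M2 => M3 + M4, R2: M4 + M6 => M3,
# R3: M4 + M5 <=> M6 (reversible reaction kept as a single node).
mr = {
    "M1": ["R1"], "M2": ["R1"], "M3": [],
    "M4": ["R2", "R3"], "M5": ["R3"], "M6": ["R2", "R3"],
    "R1": ["M3", "M4"], "R2": ["M3"], "R3": ["M4", "M5", "M6"],
}
metabolites = ["M1", "M2", "M3", "M4", "M5", "M6"]
reactions = ["R1", "R2", "R3"]

mm = two_step_links(mr, metabolites)  # metabolite-metabolite links
rr = two_step_links(mr, reactions)    # reaction-reaction links
print(mm["M4"])  # ['M3', 'M5', 'M6']
print(rr["R1"])  # ['R2', 'R3']
print(rr["R3"])  # ['R2']
```

For instance, R1 produces M4, which is consumed by both R2 and R3, so R1 is linked to these two reactions.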

To allow checking the behavior of these functions, a simple example is provided in the file “example-net.txt”. This file has the following content:

R1: M1 + M2 => M3 + M4
R2: M4 + M6 => M3
R3: M4 + M5 <=> M6

Thus, the metabolic system in the file is the same as the one shown in Fig. 14.1. In the example below, we use this file to create all the different types of networks represented in this figure.

def test2():
    print("metabolite-reaction network:")
    mrn = MetabolicNetwork("metabolite-reaction")
    mrn.load_from_file("example-net.txt")
    mrn.print_graph()
    print("Reactions: ", mrn.get_nodes_type("reaction"))
    print("Metabolites: ", mrn.get_nodes_type("metabolite"))
    print()

    print("metabolite-metabolite network:")
    mmn = MetabolicNetwork("metabolite-metabolite")
    mmn.load_from_file("example-net.txt")
    mmn.print_graph()
    print()

    print("reaction-reaction network:")
    rrn = MetabolicNetwork("reaction-reaction")
    rrn.load_from_file("example-net.txt")
    rrn.print_graph()
    print()

    print("metabolite-reaction network (splitting reversible):")
    mrsn = MetabolicNetwork("metabolite-reaction", True)
    mrsn.load_from_file("example-net.txt")
    mrsn.print_graph()
    print()

    print("reaction-reaction network (splitting reversible):")
    rrsn = MetabolicNetwork("reaction-reaction", True)
    rrsn.load_from_file("example-net.txt")
    rrsn.print_graph()
    print()

test2()
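The effect of the split_rev flag is easiest to see on the single reversible reaction of the example, R3. The following sketch lists the edges that this reaction contributes in each mode, mirroring the logic of load_from_file; the function reversible_edges is hypothetical and works on plain tuples rather than on the MetabolicNetwork class.

```python
def reversible_edges(reac_id, substrates, products, split_rev):
    """Return the directed edges contributed by one reversible reaction."""
    edges = []
    if split_rev:
        backward = reac_id + "_b"  # node for the reverse direction
        for m in substrates:
            edges += [(m, reac_id), (backward, m)]
        for m in products:
            edges += [(m, backward), (reac_id, m)]
    else:
        # a single reaction node both consumes and produces every metabolite
        for m in substrates + products:
            edges += [(m, reac_id), (reac_id, m)]
    return edges


# R3: M4 + M5 <=> M6
print(reversible_edges("R3", ["M4", "M5"], ["M6"], False))
# [('M4', 'R3'), ('R3', 'M4'), ('M5', 'R3'), ('R3', 'M5'), ('M6', 'R3'), ('R3', 'M6')]
print(reversible_edges("R3", ["M4", "M5"], ["M6"], True))
# [('M4', 'R3'), ('R3_b', 'M4'), ('M5', 'R3'), ('R3_b', 'M5'), ('M6', 'R3_b'), ('R3', 'M6')]
```

Both modes yield six edges here; splitting the reaction keeps the direction of each conversion explicit, at the cost of one extra reaction node.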

14.2.2 An Example Metabolic Network for a Real Organism

To illustrate some of the concepts put forward in this chapter with a real-world scenario, we will create a metabolic network for a known model organism, the bacterium Escherichia coli.

As a basis for this network, we will consider one of the most popular metabolic networks reconstructed by the research community, the iJR904 metabolic model by Reed and co-workers [133].

The reactions included in this model are provided, in the same format discussed in the previous section, in the file “ecoli.txt” (available on the book’s website). The following code loads this file and creates each of the three types of networks used in this work, which makes it possible to confirm that the network has 931 reactions and 761 metabolites, resulting in over 5000 edges in the “metabolite-reaction” network and over 130,000 in the “reaction-reaction” network.

This larger and more realistic network will be used in the next sections to illustrate some results of the topological analysis functions.

def test3():
    print("metabolite-reaction network:")
    ec_mrn = MetabolicNetwork("metabolite-reaction")
    ec_mrn.load_from_file("ecoli.txt")
    print(ec_mrn.size())

    print("metabolite-metabolite network:")
    ec_mmn = MetabolicNetwork("metabolite-metabolite")
    ec_mmn.load_from_file("ecoli.txt")
    print(ec_mmn.size())

    print("reaction-reaction network:")
    ec_rrn = MetabolicNetwork("reaction-reaction")
    ec_rrn.load_from_file("ecoli.txt")
    print(ec_rrn.size())

test3()
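The node and edge counts reported above come from the size method of the base class. As a rough sketch of what such a count involves, the function below assumes the graph is a dictionary mapping each node to the list of its successors and returns a pair (number of nodes, number of edges); graph_size is a hypothetical stand-in, and the actual method in MyGraph may differ in its details.

```python
def graph_size(graph):
    """Return (number of nodes, number of edges) of an adjacency dictionary."""
    n_nodes = len(graph)
    # each entry in a successor list is one directed edge
    n_edges = sum(len(successors) for successors in graph.values())
    return n_nodes, n_edges


# Reaction R1: M1 + M2 => M3 + M4 as a metabolite-reaction graph.
g = {"M1": ["R1"], "M2": ["R1"], "R1": ["M3", "M4"], "M3": [], "M4": []}
print(graph_size(g))  # (5, 4)
```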
