Assessing the Metabolic Potential

The representation of metabolic networks as bipartite graphs, apart from enabling the structural analysis provided by the metrics described in the previous section, allows the study of the metabolic potential of a metabolic system, in terms of the metabolites that the system can produce.

This analysis requires looking at the network in a different way, by considering that each reaction node may be active or inactive, depending on the availability of the metabolites that act as its inputs. Indeed, a reaction will only occur within a cell if all required substrates (including co-factors) are present.

To support this analysis, we select, from the network types considered above, a “metabolite-reaction” network in which each reversible reaction is split into two, one for each direction of the reaction (see Fig. 14.1D). This is necessary to clearly mark the substrates (inputs) and products (outputs) of each reaction.
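
The splitting step can be sketched on a plain dictionary representation. This is only an illustration under stated assumptions, not the code of the MetabolicNetwork class: the tuple layout used for reactions, the `_f`/`_b` suffixes and the function name `split_reversible` are hypothetical choices made for this example.

```python
def split_reversible(reactions):
    """Build a directed metabolite-reaction graph as a dict of successor lists.

    `reactions` maps a reaction id to (substrates, products, reversible).
    A reversible reaction R becomes two nodes, R_f and R_b (hypothetical
    naming), so that every reaction node has unambiguous inputs and outputs.
    """
    graph = {}

    def add_edge(src, dst):
        graph.setdefault(src, [])
        graph.setdefault(dst, [])
        if dst not in graph[src]:
            graph[src].append(dst)

    def add_reaction(rid, inputs, outputs):
        for m in inputs:
            add_edge(m, rid)   # metabolite -> reaction: m is a substrate
        for m in outputs:
            add_edge(rid, m)   # reaction -> metabolite: m is a product

    for rid, (subs, prods, reversible) in reactions.items():
        if reversible:
            add_reaction(rid + "_f", subs, prods)   # forward direction
            add_reaction(rid + "_b", prods, subs)   # backward direction
        else:
            add_reaction(rid, subs, prods)
    return graph


# Hypothetical two-reaction network: R1 is irreversible, R2 is reversible.
toy = {
    "R1": (["M1", "M2"], ["M3"], False),
    "R2": (["M3"], ["M4"], True),
}
g = split_reversible(toy)
print(g["M3"])     # reactions consuming M3: ['R2_f']
print(g["R2_b"])   # products of the backward copy of R2: ['M3']
```

After the split, the predecessors of a reaction node are exactly its substrates and its successors are exactly its products, which is what the activation test relies on.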

Given this representation, we can build a function that identifies all active reactions, taking as input a list of metabolites assumed to be available within the cell (for the sake of simplicity, we ignore here the possible existence of cell compartments). Symmetrically, we can implement a function that identifies all metabolites that can be produced given a set of active reactions.

These functions are given in the next code block. They are added to the MetabolicNetwork class, since they are specific to this type of network.

    class MetabolicNetwork:
        (...)

        def active_reactions(self, active_metabolites):
            # only defined for metabolite-reaction networks with split reversible reactions
            if self.net_type != "metabolite-reaction" or not self.split_rev:
                return None
            res = []
            for v in self.node_types['reaction']:
                # a reaction is active if it has substrates and all of them are available
                preds = set(self.get_predecessors(v))
                if len(preds) > 0 and preds.issubset(set(active_metabolites)):
                    res.append(v)
            return res

        def produced_metabolites(self, active_reactions):
            # collect, without duplicates, the products of the given reactions
            res = []
            for r in active_reactions:
                sucs = self.get_successors(r)
                for s in sucs:
                    if s not in res:
                        res.append(s)
            return res

Having these functions implemented, we can implement a function that can determine the complete set of metabolites that can be produced given an initial list of available metabolites.

This requires iterating calls to the previous functions, checking whether the metabolites that can be produced activate other reactions, which in turn can lead to new metabolites being produced, and so on. The process ends when an iteration yields no new metabolites, and therefore no further reactions can become active. This is implemented by the function all_produced_metabolites provided below.

        def all_produced_metabolites(self, initial_metabolites):
            # work on a copy, so that the caller's list is not modified
            mets = list(initial_metabolites)
            cont = True
            while cont:
                cont = False
                reacs = self.active_reactions(mets)
                new_mets = self.produced_metabolites(reacs)
                for nm in new_mets:
                    if nm not in mets:
                        mets.append(nm)
                        cont = True   # something new was produced: iterate again
            return mets

This function can be applied to the toy network given in the last section for testing.

    def test4():
        mrsn = MetabolicNetwork("metabolite-reaction", True)
        mrsn.load_from_file("example-net.txt")
        mrsn.print_graph()
        print(mrsn.produced_metabolites(["R1"]))
        print(mrsn.active_reactions(["M1", "M2"]))
        print(mrsn.all_produced_metabolites(["M1", "M2"]))
        print(mrsn.all_produced_metabolites(["M1", "M2", "M6"]))

    test4()
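
For readers without the MetabolicNetwork class and the example file at hand, the same iteration can be tried on a plain dictionary representation. The sketch below is an illustration under stated assumptions: the three-reaction network, the `substrates` and `products` dictionaries and the stand-alone function names are hypothetical, and the network is not the toy network of the previous section. As an extra, it records the round in which each metabolite first becomes available.

```python
def active_reactions(substrates, available):
    """Reactions that have substrates and whose substrates are all available."""
    return [r for r, subs in substrates.items() if subs and subs <= available]


def all_produced(substrates, products, initial):
    """Iterate until no new metabolite appears.

    Returns a dict mapping each reachable metabolite to the round in which
    it first became available (round 0 = initially available).
    """
    round_of = {m: 0 for m in initial}
    rnd = 0
    changed = True
    while changed:
        changed = False
        rnd += 1
        # the active set is computed once per round, from the current metabolites
        for r in active_reactions(substrates, set(round_of)):
            for m in products[r]:
                if m not in round_of:
                    round_of[m] = rnd
                    changed = True
    return round_of


# Hypothetical network: R1: M1 + M2 -> M3, R2: M3 -> M4, R3: M4 + M6 -> M5
substrates = {"R1": {"M1", "M2"}, "R2": {"M3"}, "R3": {"M4", "M6"}}
products = {"R1": {"M3"}, "R2": {"M4"}, "R3": {"M5"}}

print(sorted(all_produced(substrates, products, ["M1", "M2"]).items()))
# [('M1', 0), ('M2', 0), ('M3', 1), ('M4', 2)]
print(sorted(all_produced(substrates, products, ["M1", "M2", "M6"]).items()))
# [('M1', 0), ('M2', 0), ('M3', 1), ('M4', 2), ('M5', 3), ('M6', 0)]
```

In this hypothetical network, M5 only becomes reachable once M6 is added to the initial set, because R3 needs both M4 and M6 to be active.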

An exploration of these functions using the E. coli network is left as an exercise for the interested reader.

Bibliographic Notes and Further Reading

A paper by Jeong et al. in 2000 [83] analyzed the metabolic networks of 43 organisms from different domains of life, concluding that, despite the differences, they shared similar topological properties, including those related to their scale-free nature. The concept of scale-free networks and their characterization were first put forward in a 1999 paper by Barabási et al. [21]. In 2002, Ravasz et al. [132] introduced the concept of hierarchical networks, in the context of metabolism.

Interesting examples of the analysis of other types of networks may be found, for instance, in [81], which studies the transcriptional network, and in [16], which analyzes the signaling network, both of the baker’s yeast (Saccharomyces cerevisiae). A recent discussion of integrated networks, spanning different sub-systems, may be found in [67].

An important concept in network analysis not covered here is that of network motifs, patterns that are over-represented in different types of biological networks. This topic is addressed in a 2002 paper by Milo et al. [114].

In 2004, Barabási and Oltvai [23] wrote a very comprehensive review of the field of network biology, covering these results on metabolic networks as well as other types of biological networks, their properties and their evolutionary origin. More recently, Barabási and colleagues published an interesting review of applications of network biology methods to medical research [22].

Finally, many other representation formalisms, many of them based on graphs, have been used to model cells and their sub-systems in the wider field of Systems Biology. Many of these support methods for several types of simulation, leading to phenotype prediction. An interesting review of such paradigms is given in [106].

Exercises and Programming Projects

Exercises

1. Considering the networks of the type “metabolite-reaction” presented in this chapter, which correspond to bipartite graphs, write a method to add to the class MetabolicNetwork that can detect the set of “final” metabolites, those that are produced by at least one reaction, but are not consumed by any reaction in the network.

2. Write a method to add to the class MetabolicNetwork that, given a set of initial metabolites (assumed to be available) and a target metabolite, returns the shortest path to produce the target metabolite, i.e. the shortest list of reactions that, activated in that order, allow the target metabolite to be produced from the initial set of metabolites. The reactions in the list are only valid if they can be active in the given order (i.e. all their substrates exist). The method returns None if the target metabolite cannot be produced from the list of initial metabolites.

Programming Projects

1. Considering the class to implement undirected graphs proposed in an exercise in the previous chapter, add methods to implement the concepts related to network topological analysis described in this chapter.

2. Consider the database Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/). Explore the Python interfaces for this database. Implement code to get the reactions and metabolites for a given organism and build its metabolic network. Explore networks of different organisms.

3. Explore one of the Python libraries implementing graphs, such as NetworkX (http://networkx.github.io) or igraph (http://igraph.org/python). Implement metabolic networks using those libraries, with functionality similar to that implemented in this chapter.

4. Build an implementation of a regulatory network, containing regulatory genes (e.g. those encoding transcription factors) and regulated genes. Design a network able to represent activation and inhibition events.

Assembling Reads Into Genomes: Graph-Based Algorithms

In this chapter, we will address some of the challenges posed by genome assembly, i.e. rebuilding a sequence from the overlapping fragments (reads) returned by DNA sequencing equipment. We will show that the most effective algorithms for these challenging problems are based on graphs, more precisely on the concepts of Hamiltonian and Eulerian paths and related algorithms. We will implement, in Python, algorithms for simplified versions of genome assembly problems and discuss their efficiency. We will also discuss some of the challenges involved in real-world genome assembly tasks and the solutions adopted by state-of-the-art programs.
