From functional genomics to systems biology
Meeting report based on the presentations at the 3rd EMBL Biennial Symposium 2006 (Heidelberg, Germany)
Sergii Ivakhno
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, UK
Introduction
The third EMBL Biennial Symposium, From functional genomics to systems biology, was held in Heidelberg, Germany, 14–17 October 2006. The title of the conference clearly states the major challenges and issues that were addressed by the speakers: how to combine different ‘omics’ technologies and bioinformatics/computational methodologies to address increasingly complex biological questions. The main conference was divided into five separate sessions, which discussed different functional genomic approaches in systems biology: (a) A global view of transcriptional regulation, (b) Genomics of development and disease, (c) Protein–protein interaction networks and beyond, (d) Towards functional interaction networks, (e) Systems level analysis: from organisms to communities.
Table 1 gives a broad overview of topics presented at the meeting according to the systems biology applications, types of high-throughput techniques, and biological networks. From the sheer number of various high-throughput genomic approaches described at the meeting, it becomes clear that ‘postgenome’ science has already entered the most exciting period of analyzing biological functions at the systems-wide level. Chromatin immunoprecipitation arrays (chip-on-chip), tiling arrays, DNA microarrays, synthetic genetic arrays, high-content fluorescent microscopy, protein microarrays, RNA interference
Keywords
DNA microarray medical applications;
functional genomics; genetic interaction
networks; networks biology; signalling
networks; systems biology
Correspondence
S Ivakhno, Institute for Adaptive and Neural
Computation, School of Informatics,
University of Edinburgh, E4, 5 Forrest Hill,
Edinburgh EH1 2QL, UK
Fax: +44 (0) 131 6506899
Tel: +44 (0) 131 6676000, ext.
0131 6684266
E-mail: s0567096@sms.ed.ac.uk
(Received 30 January 2007, revised 1 March
2007, accepted 12 March 2007)
doi:10.1111/j.1742-4658.2007.05794.x
This review discusses the talks presented at the third EMBL Biennial Symposium, From functional genomics to systems biology, held in Heidelberg, Germany, 14–17 October 2006. Current issues and trends in various subfields of functional genomics and systems biology are considered, including analysis of regulatory elements, signalling networks, transcription networks, protein–protein interaction networks, genetic interaction networks, medical applications of DNA microarrays, and metagenomics. Several technological advances in the fields of DNA microarrays, identification of regulatory elements in the genomes of higher eukaryotes, and MS for detection of protein interactions are introduced. Major directions of future systems biology research are also discussed.
Abbreviations
RNAi, RNA interference; SGA, synthetic genetic array; TF, transcription factor; Y1H, yeast one-hybrid; Y2H, yeast two-hybrid.
(RNAi) screens, and high-throughput metagenomic sequencing are some of the technologies discussed by the speakers. Computational methods and algorithms were also an integral part of the conference, with various systems biology applications of machine learning, algorithmic network theory, differential equation modelling, and simulation being introduced. In the following, I will discuss some of the talks representing different areas of functional genomics, networks and systems biology.
Analysis of regulatory elements in the
genomes of higher eukaryotes
The first session of the conference began with a talk by E Birney from the European Bioinformatics Institute (Hinxton, Cambridgeshire, UK). Birney described recent efforts of the ENCODE project (Encyclopaedia of DNA Elements), a multi-institutional collaboration supported by NIH and the Wellcome Trust that attempts to map all functional elements in the human genome: promoters, enhancers, repressors/silencers, exons, origins of replication, sites of replication termination, transcription factor (TF)-binding sites, methylation sites, deoxyribonuclease I-hypersensitive sites, chromatin modifications, and multispecies conserved sequences of as yet unknown function [1]. The pilot phase, which began in September 2003, is less ambitious and targets 44 uniformly distributed regions that comprise 1% of the genome. Birney’s talk emphasized the problem of mapping TF-binding sites and other elements that regulate transcription. Standardization of the protocols and comparison of different techniques were among the major challenges encountered in the pilot phase. Another big problem concerned the annotation of the transcription regulatory elements. In contrast with genomes of simple eukaryotes such as yeast, in which the regulatory elements occur upstream of the genes that they regulate, in the human genome they are widely dispersed and occur between and within introns, making them very hard to map. This will probably be the next big challenge for computational biologists, who need to develop new algorithms for detecting regulatory elements with varying position in the genome. Comparative genomic approaches previously gave the best results for finding regulatory elements in eukaryotes [2]; however, additional developments will be required to detect elements dispersed throughout the genome.
L Steinmetz from EMBL (Heidelberg, Germany) described the application of tiling arrays for detection of new transcripts and refinement of the boundaries, structure, and expression level of coding and noncoding transcripts in the yeast genome [3]. Although the concept of using tiling arrays and gene expression to find functional transcribed elements is not new (a review of the topic can be found in [4]), Steinmetz’s group and collaborators developed a new and more sensitive
Table 1. Overview of the topics covered in the meeting report according to systems biology applications, types of high-throughput techniques, and biological networks.
Type of biological network/area of functional genomics | High-throughput functional genomic techniques | Systems biology applications
leukaemia AmpliChip
Analysis of regulatory elements | Tiling arrays, chromosome conformation capture | ENCODE project [1], expression in the yeast genome [3], analysis of globin locus enhancers [5]
Transcription regulatory networks | Chip-on-chip, Y1H system, DNA microarray | Transcription regulatory network during muscle development in Drosophila [19]
Genetic interaction networks | Yeast SGA [29,31], epistatic mini-array profiles [35] | Yeast genetic interaction network
Chemical–genetic interaction networks
Protein interaction networks | Y2H screens, MS-based analysis of protein complexes [39] | Coverage and false positives in protein interaction networks [37]
dynamics in single cells using [48]
Networks of networks: metagenomics | High-throughput DNA sequencing | Bacterial communities [50]
oligonucleotide array which contains 6.5 million probes and interrogates both strands of the full genomic sequence (accomplishing 8 nucleotide resolution for double-stranded targets). Significant expression above background was detected for 5104 ORFs (90%) during exponential growth in rich medium. Remarkably, 16% of the transcribed base pairs had not been annotated before, which is rather surprising considering more than 10 years of intensive analysis of the yeast genome.
As already mentioned, in many cases regulatory elements are located at distances up to several megabases from their target genes, in which case control of gene expression cannot be mediated through direct physical interaction between genes and their regulatory elements. The development of techniques to detect long-distance interactions was the topic of J Dekker’s talk from the University of Massachusetts Medical School (Boston, MA, USA). He described the chromosome conformation capture methodology which uses formaldehyde cross-linking to covalently link interacting chromatin segments in intact cells [5]. Cross-linked chromatin is then solubilized and digested with an appropriate restriction enzyme, which is followed by intramolecular ligation of cross-linked fragments. The resulting template therefore contains a large collection of ligation products that reflect interaction between two genomic loci and can be detected by quantitative PCR using specific primers. The abundance of each ligation product can be used in a quantitative manner to measure the frequency with which the two loci in the genome interact with each other. Dekker’s group applied this technique to the analysis of globin locus enhancers and showed that chromosome conformation capture has a similar or better sensitivity than the chip-on-chip approach. The advantage of chromosome conformation capture is that it can detect regulatory elements that are active only in a particular cellular state, developmental stage or cell type.
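The quantitative read-out described above can be sketched in a few lines of Python. This is a minimal illustration and not Dekker's published analysis: it assumes that each ligation product is quantified by PCR both in the cross-linked sample and in a control template containing all ligation products in equal amounts, so that the ratio corrects for differences in primer efficiency. The function name, locus labels and signal values are hypothetical.

```python
def relative_interaction_frequencies(sample_signal, control_signal):
    """Normalize ligation-product signals from a cross-linked sample.

    Both arguments map a pair of loci to a PCR signal (arbitrary units).
    Assumption: the control template contains every ligation product in
    equal amounts, so the ratio removes primer-efficiency differences.
    """
    frequencies = {}
    for pair, signal in sample_signal.items():
        control = control_signal[pair]
        if control <= 0:
            raise ValueError(f"control signal for {pair} must be positive")
        frequencies[pair] = signal / control
    return frequencies


# Hypothetical signals for one enhancer tested against three fragments.
sample = {
    ("enhancer", "gene_A"): 8.0,
    ("enhancer", "gene_B"): 1.5,
    ("enhancer", "spacer"): 0.5,
}
control = {
    ("enhancer", "gene_A"): 2.0,
    ("enhancer", "gene_B"): 3.0,
    ("enhancer", "spacer"): 1.0,
}

frequencies = relative_interaction_frequencies(sample, control)
for pair, value in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(pair, round(value, 2))
```

In this toy data set the enhancer contacts gene_A far more often than the other two fragments, which is the kind of contrast the method is meant to expose.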
Functional genomics approaches
to diagnosis of diseases
One of the goals of systems biology and high-throughput functional genomics is to develop better diagnostic tools that would allow adoption of personalized medicine approaches in clinical settings [6]. Medical applications of systems biology and functional genomics were widely discussed at the conference, with several talks devoted to the use of DNA microarrays for cancer diagnosis and prognosis. For instance, leukaemia comprises more than 20 subgroups, which may require different approaches for successful treatment. Currently, the diagnosis and classification of leukaemia rely on the simultaneous application of multiple techniques, such as cytomorphology, histomorphology, cytochemistry and multiparameter flow cytometry, often supplemented by fluorescence in situ hybridization and molecular techniques, such as PCR. These high-cost and time-consuming approaches have encouraged the development of more effective diagnostic techniques. The use of DNA microarrays for cancer diagnosis was proposed more than 10 years ago, yet not a single microarray diagnostic kit has been approved by the FDA. One of the key challenges in using DNA microarrays for cancer diagnosis is the reproducibility of signature genes characterized by different groups [7,8]. This issue was addressed by F Holstege from the Genomics Laboratory (UMC Utrecht, The Netherlands) in his talk on signatures for detection of lymph node metastasis in patients with head and neck cancer. It can often be very difficult to detect lymph node metastases reliably, but their early detection is crucial for the appropriate treatment. Using DNA microarrays, Holstege’s group and collaborators built a 102-gene classifier from 82 tumours, which outperformed current clinical diagnosis techniques in its predictive accuracy when independently validated [9]. However, further examination revealed that, when the oldest tumour samples were excluded, the predictive accuracy remained high but the overlap between the two signature gene sets found was limited to 49 genes [10]. This is a typical example of the findings that led many researchers to question the validity of DNA microarray approaches for cancer classification [11]. Holstege proposed an alternative explanation for such a discrepancy: incomplete overlap may be caused by the presence of a large number of genes with similar patterns of expression across samples. This suggests that many predictive genes can be interchanged without influencing the predictive outcome and that multiple, different gene sets can be used for accurate prediction [10]. Holstege described how, through repetitive sampling, they found that 3000 different signature gene sets (comprising 825 unique genes occurring in at least one set) can classify tumour samples with similarly high accuracy. Holstege concluded that there is no single set of genes with optimal predictive accuracy and that various signatures can be identified by different institutes or simply by using different samples. This study also exposes the flaw behind common attempts to make signature gene lists as small as possible, the argument being that molecular signatures based on more genes will be less prone to biases towards specific samples.
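The argument that many interchangeable genes yield many equally good signatures can be illustrated with a small simulation. The sketch below is not the procedure used by Holstege's group; it assumes synthetic expression data in which 200 genes carry the same modest class difference, a simple top-k selection by difference in class means, and a nearest-centroid classifier. All names and parameter values are hypothetical.

```python
import random
from itertools import combinations


def simulate(n_samples=60, n_informative=200, n_noise=300, effect=1.0, seed=0):
    """Synthetic data: informative genes are shifted by `effect` in class 1."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    data = []
    for y in labels:
        row = [rng.gauss(effect * y, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        data.append(row)
    return data, labels


def class_means(data, labels, idx, genes):
    """Mean expression of each gene in each class, over the samples in idx."""
    means = {}
    for c in (0, 1):
        rows = [data[i] for i in idx if labels[i] == c]
        means[c] = {g: sum(r[g] for r in rows) / len(rows) for g in genes}
    return means


def select_signature(data, labels, train, k):
    """Pick the k genes with the largest difference in class means."""
    genes = range(len(data[0]))
    m = class_means(data, labels, train, genes)
    ranked = sorted(genes, key=lambda g: abs(m[1][g] - m[0][g]), reverse=True)
    return ranked[:k]


def accuracy(data, labels, train, test, signature):
    """Nearest-centroid accuracy on held-out samples."""
    m = class_means(data, labels, train, signature)
    correct = 0
    for i in test:
        dist = {c: sum((data[i][g] - m[c][g]) ** 2 for g in signature) for c in (0, 1)}
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == labels[i]
    return correct / len(test)


def resample_signatures(data, labels, n_rounds=30, n_train=40, k=20, seed=1):
    """Repeat signature selection on random training subsets."""
    rng = random.Random(seed)
    indices = list(range(len(data)))
    signatures, accuracies = [], []
    for _ in range(n_rounds):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        sig = select_signature(data, labels, train, k)
        signatures.append(set(sig))
        accuracies.append(accuracy(data, labels, train, test, sig))
    return signatures, accuracies


data, labels = simulate()
signatures, accuracies = resample_signatures(data, labels)
unique_genes = set().union(*signatures)
overlaps = [len(a & b) for a, b in combinations(signatures, 2)]
mean_overlap = sum(overlaps) / len(overlaps)
mean_accuracy = sum(accuracies) / len(accuracies)
print("unique genes across signatures:", len(unique_genes))
print("mean pairwise overlap (of 20):", round(mean_overlap, 1))
print("mean held-out accuracy:", round(mean_accuracy, 2))
```

Under these assumptions the signatures differ from one resampling round to the next while held-out accuracy stays high, which mirrors the qualitative point of the talk rather than its numbers.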
Next, T Haferlach from Ludwig-Maximilians-University (Munich, Germany) described the progress in building the first commercial AmpliChip DNA microarray for testing leukaemia, which will be released by Roche. The major challenge facing clinical trials is the large number of tumour samples that must be analyzed to ensure the high accuracy of signature gene lists, which often results in high costs and time delay. Reporting on the preliminary screens, Haferlach described a DNA microarray study of 937 bone marrow and peripheral blood samples from 892 patients with all clinically relevant leukaemia subtypes. They were used to build a classifier with overall prediction accuracy of 95.1%. In the follow-up round of clinical trials carried out by Microarray Innovations in Leukaemia (MILE, an international initiative with 11 centres from Europe, USA and Singapore), DNA microarrays are being used to analyze samples from more than 2000 patients. The results from this and other studies will help to restrict the number of genes on the AmpliChip to about 500 of the most predictive ones. Although the AmpliChip is not the first array to enter the market (the MammaPrint 70-gene signature for diagnosis of breast cancer based on a study by van’t Veer et al [12] is already available through Agendia), it could be the first one to obtain FDA approval for clinical tests. Haferlach estimates that, once the AmpliChip is available, it will provide a more accurate, faster and cost-saving strategy for diagnosis of leukaemia.
From systems to networks biology
Talks related to networks biology covered a large portion of the meeting. Among many different types of biological networks discussed were gene regulatory networks, protein interaction networks, genetic networks, signalling networks and networks of bacterial communities. For interested readers, comprehensive surveys of networks biology principles can be found in [13,14]. One of the reasons why networks biology receives such close attention is that the network-based representation of high-throughput biological data can serve as a core around which more comprehensive information about biological models can be arranged. It also provides a natural method for integration of different biological data.
Transcription regulatory networks
I begin by describing talks that addressed analysis of transcription regulatory networks. Transcription regulatory networks, first described for Escherichia coli [15] and yeast [16], consist of physical and functional interactions between TFs and their target genes represented as a graph [17]. The systematic mapping of TF–target gene interactions has been very successful in unicellular systems using ‘TF-centred’ approaches, such as the combination of chromatin immunoprecipitation (ChIP) with promoter DNA microarrays (known as chip-on-chip), which identifies a list of direct target genes for a particular transcription factor under a given set of conditions. However, as suggested by M Walhout from the University of Massachusetts Medical School, metazoan systems are less amenable to application of chip-on-chip methods. First, TFs that are expressed at low levels, in a few cells, or during a narrow developmental interval are not suitable for ‘TF-centred’ experiments. Secondly, antibodies are only available for a very limited number of metazoan TFs, restricting the applicability of chip-on-chip. Walhout described an alternative ‘gene-centred’ approach for elucidating transcription regulatory networks, which uses a high-throughput gateway-compatible yeast one-hybrid (Y1H) system [18]. Y1H is a genetic system based on reporter gene expression in yeast that detects interactions between a ‘DNA bait’ (e.g. cis-regulatory DNA elements or gene promoters) and ‘protein prey’ (e.g. TFs). When a prey protein binds to the DNA bait, the heterologous activation domain activates reporter gene expression. Thus, physical interactions between repressors/activators and their DNA targets can be identified.
Walhout described an application of the Y1H system in Caenorhabditis elegans in which her group identified 283 interactions between 72 digestive tract genes and 117 proteins, providing the first set of putative target genes for nearly 10% of all predicted worm TFs. Detailed analysis found that more than 70% of the promoters are bound by at least one of the top 10% most highly connected TFs. In addition, 82% of the promoters are bound by at least one of the other less-well-connected interactors, and more than half of the target promoters bind both. Summarizing these observations, Walhout described a model of the transcription regulatory network in C. elegans, where genes are subjected to three or more layers of transcriptional control. The first layer consists of global regulators which control the expression of many genes in many different systems. The second layer involves ‘master regulators’ which control the expression of multiple genes involved in specific cellular processes. Finally, the third layer constitutes ‘specifiers’ which fine-tune the expression of a relatively small number of genes. The description of the layered architecture for the C. elegans transcription regulatory network provides an additional level of network hierarchy to previously described network motifs. Quite interestingly, the layered architecture of the C. elegans network resembles dense overlapping regions of the E. coli transcription network [15], although in the latter case such a coherent division into different levels of global regulation was not observed.
E Furlong from EMBL devoted her talk to the recent study of the transcription regulatory network during muscle development in Drosophila. The main approach adopted by Furlong’s group is a combination of chip-on-chip arrays with DNA microarrays and immunohistochemistry. Using a combination of these techniques, they obtained a temporal regulatory network of Mef2 activity, the key myogenesis regulator during Drosophila embryonic development [19]. Two novel ideas behind this approach are worth mentioning. First, they used gene expression profiling of Mef2 mutant embryos during the time course and found genes requiring Mef2 for their correct expression at various stages of development. This provided functional validation of the chip-on-chip results and distinguished between direct and indirect regulation. Second, the chip-on-chip was itself performed over the time course, which identified temporal patterns of Mef2 target gene regulation. Although most of the reported transcription networks based on chip-on-chip data are static, Furlong described one of the first examples of a dynamic transcription network which is relevant in the context of developmental biology [20]. This example also reveals other crucial themes in biological networks analysis: integration of different data types for reconstruction of temporal and spatial relations in the networks. As different high-throughput techniques become more established and widespread, we can expect much wider utilization of data integration approaches for building more complex biological networks. Ultimately, this will lead to fusion of different biological networks, such as signalling, transcription and metabolic, into the cellular super network. Several early attempts in this direction have already produced interesting results. For example, Zhang et al [21] assembled an integrated yeast network in which nodes represent genes (or their protein products) and edges represent various biological interactions, such as protein–protein interactions, genetic interactions, transcriptional regulation, sequence homology, and expression correlation. A search for significantly enriched motifs in this integrated network found specific ‘network themes’, higher-order network structures that correspond to various biological phenomena, such as ‘compensatory complexes’. Another similar study found that ‘action’ networks (metabolic, co-expression, and interaction) share the same scaffolding of hubs, whereas the regulatory network uses different regulatory hubs [22].
Networks derived from synthetic genetic interactions and RNAi screens
Other approaches to the construction of biological networks focus on functional relations between different genes. RNAi and synthetic lethal screens that are used for building epistatic and genetic networks were also covered at the meeting. N Perrimon from Harvard Medical School (Boston, MA, USA) described how high-throughput RNAi screens can be used to analyze information flow in Drosophila signal-transduction pathways. One of the key considerations in such screens is the choice of appropriate read-out assays that can accurately assess the effect of gene knock-down on the pathway of interest [23]. Whereas more proximal assays that measure activity near receptors would identify fewer regulators and may miss components of input branches from other receptors, distal read-outs (e.g. transcriptional reporters or morphological outputs through ‘high-content screening’ microscopy [24]) may integrate more pathways than is desirable. Therefore, for the comprehensive analysis of a particular signalling pathway, several approaches should be combined to accurately identify corresponding phenotypes. Perrimon described one example where 22 000 duplex RNAs were used for identification of new Wnt pathway targets [25]. The screening method relied on sensitive reporter genes containing T-cell factor-binding sites fused to a minimal promoter upstream of the luciferase gene. This set-up led to the identification of 238 potential Wnt pathway genes.
In the other RNAi screen, DNA microarrays were used as phenotypes to infer epistatic interactions or epistasis gene networks [26,27]. Interestingly, similar approaches were independently developed for the analysis of signalling networks, where kinase inhibitors and multiparameter flow cytometry are used in place of RNAi and DNA microarrays [28]. In this case, availability of the single-cell data from flow cytometry allows accurate de novo reconstruction of signalling networks using machine learning algorithms. However, disadvantages of this approach are the limited availability of phospho-specific antibodies and the difficulty in scaling up the flow cytometry for simultaneous analysis of multiple kinases.
C Boone from the University of Toronto, Canada described two recent extensions to the synthetic genetic array (SGA) technology developed at his laboratory, which are based on detecting synthetic genetic interactions of essential genes and chemical–genetic interactions. The idea behind the original technique is that most yeast genes are nonessential and therefore their knockdowns do not produce any observable phenotypic defects [29]. However, the combination of mutations in two genes that cause cell death or reduced fitness provides a means of mapping genetic interactions. Genetic interactions among essential genes were not examined systematically because of the inherent difficulty in creating and working with hypomorphic (similar) alleles. Boone described the use of temperature-sensitive conditional alleles based on the tetracycline (tet) promoter that overcomes this challenge. A mutation in a particular query gene is first crossed to an input array of single mutants, and then a series of robotic pinning steps generates the array of double mutants, which is then scored for fitness defects relative to either of the single mutants. With this approach, Boone’s laboratory conducted 30 SGA screens of 575 essential genes and built the corresponding genetic network [30]. This network resembles the genetic network of nonessential genes: both have a scale-free topology and most of the interactions do not overlap with protein–protein interactions. However, the most notable property of the essential gene genetic network is its density (median frequency of interactions is 3%), which is five times higher than the network density for nonessential genes. These results indicate that essential genes are well connected hubs on the genetic interaction network, and that essential pathways are also highly buffered compared with the network of nonessential genes. Interestingly, analogous results were recently reported for the yeast transcription network [31]. Similar results obtained from the analysis of different biological networks suggest that scale-free architecture is not the only way to produce biological robustness and that distributed architecture may also contribute to the robustness in the same network (although it may apply to different nodes in the network, e.g. TFs versus housekeeping genes).
SGA can also be used in combination with chemical treatments to identify genes involved in mediating the response to drug compounds [32]. The approach is based on the premise that, if a small molecule disrupts the function of its target protein, then cells with a smaller amount of that target protein would be more sensitive to the compound. In the second part of the talk, Boone described a new screen with 82 compounds against the Saccharomyces cerevisiae-viable deletion set to generate chemical–genetic interaction profiles [32]. The clustering of the resulting data matrix identified sets of compounds with similar biological effects and genes that show sensitivity to similar compounds [33].
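Grouping compounds by their chemical–genetic profiles rests on a similarity measure between rows of the compound-by-gene data matrix. The sketch below uses Pearson correlation as that measure, which is an assumption and not necessarily the metric used in the screen; compound names and sensitivity scores are hypothetical, with negative values denoting deletion strains that grow poorly in the presence of the compound.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def most_similar(name, profiles):
    """Return the compound whose profile correlates best with `name`."""
    others = (c for c in profiles if c != name)
    return max(others, key=lambda c: pearson(profiles[name], profiles[c]))


# Hypothetical sensitivity scores of five deletion strains to three compounds.
profiles = {
    "compound_A": [-2.1, -0.1, -1.8, 0.0, -0.2],
    "compound_B": [-1.9, 0.0, -2.0, -0.1, -0.3],
    "compound_C": [0.1, -2.2, 0.0, -1.7, -0.1],
}

r_ab = pearson(profiles["compound_A"], profiles["compound_B"])
r_ac = pearson(profiles["compound_A"], profiles["compound_C"])
print("A vs B:", round(r_ab, 2))
print("A vs C:", round(r_ac, 2))
print("closest to A:", most_similar("compound_A", profiles))
```

A full analysis would feed such pairwise similarities into a hierarchical clustering of both compounds and genes; the pairwise step is shown here because it is the part the text relies on.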
Several other talks also discussed analysis of genetic networks. For instance, one limitation of SGA is that only negative interactions can be identified. Consequently, interactions that are detected generally involve genes that have unrelated functions, which obscures the biological relevance and interpretation. To overcome this limitation, N Krogan (University of Toronto, Canada) described a new technique, epistatic mini-array profiles, which consists of arrays of all double-mutant combinations for the genes involved in a specific process [34]. This approach involves measuring quantitative effects on colony growth, which, unlike looking for viability, can detect both positive and negative interactions.
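A quantitative growth measurement makes it possible to score an interaction by comparing the observed double-mutant fitness with the fitness expected if the two mutations acted independently. The sketch below assumes the commonly used multiplicative null model (expected fitness equals the product of the single-mutant fitnesses); the scoring actually used for epistatic mini-array profiles is not described in the talk summary, and the threshold and fitness values are hypothetical.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Deviation of the double mutant from the multiplicative expectation.

    Fitness values are relative to wild type (wild type = 1.0).
    Assumption: independent mutations combine multiplicatively.
    """
    return fitness_ab - fitness_a * fitness_b


def classify(score, threshold=0.1):
    """Label an interaction; the threshold is an arbitrary illustrative cut-off."""
    if score <= -threshold:
        return "negative"  # double mutant sicker than expected
    if score >= threshold:
        return "positive"  # double mutant healthier than expected
    return "neutral"


# Hypothetical colony-growth measurements: (single a, single b, double mutant).
measurements = {
    ("geneA", "geneB"): (0.9, 0.8, 0.2),
    ("geneC", "geneD"): (0.6, 0.7, 0.6),
    ("geneE", "geneF"): (0.9, 0.9, 0.8),
}

calls = {pair: classify(interaction_score(*values)) for pair, values in measurements.items()}
for pair, call in calls.items():
    print(pair, call)
```

A viability-only read-out would register just the first pair; the second pair, in which the double mutant grows no worse than the sicker single mutant, is visible only with a quantitative score.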
Protein interaction networks
Protein interaction networks were also extensively discussed at the meeting. These networks usually represent either direct or indirect (a part of a protein complex) physical interactions between proteins and are typically derived from yeast two-hybrid (Y2H) screens or MS-based analysis of protein complexes (co-AP/MS) [35]. In most cases, protein interaction networks are static and represent only a small subset of the true biological interactions. M Vidal from Harvard Medical School devoted his talk to the issues of network coverage and the effect of false negatives on the accuracy of the protein interaction network. The small overlap between different Y2H maps is often attributed to low data accuracy. However, Vidal argued that each map covers only 3–9% of the total interactome, so limited overlap should be expected. To test this assumption, Vidal’s group developed a sampling algorithm for generation of many low coverage networks with properties similar to the current Y2H maps. In almost 23 000 such comparisons, the interactome that was common to each pair comprised only 2.1%, which suggests that it is possible to observe perfectly accurate samples (without false positives) that have very limited overlap solely because of the low coverage of their maps [36]. Drawing from examples in the genome sequencing community, Vidal proposed a solution to this problem. As any single study cannot possibly cover all the protein interactions, he suggested that individual research groups should continually contribute small subnetworks to the global interactome repository in the way it was done during sequencing of the human genome.
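The coverage argument can be checked with a short simulation. The sketch below is not the sampling algorithm developed by Vidal's group; it assumes that two maps are drawn independently and uniformly from the same true interactome, with no false positives, and measures overlap as shared interactions divided by the union of the two maps. The interactome size and the 5% coverage are hypothetical values chosen within the 3–9% range quoted above.

```python
import random


def sample_map(n_true, coverage, rng):
    """Draw an error-free map covering a fraction of the true interactions."""
    return set(rng.sample(range(n_true), round(coverage * n_true)))


def overlap_fraction(map_a, map_b):
    """Shared interactions as a fraction of all interactions seen in either map."""
    return len(map_a & map_b) / len(map_a | map_b)


def expected_overlap(cov_a, cov_b):
    """Expected overlap for two independent uniform samples."""
    return (cov_a * cov_b) / (cov_a + cov_b - cov_a * cov_b)


rng = random.Random(0)
n_true = 50_000   # hypothetical size of the complete interactome
coverage = 0.05   # each map recovers 5% of the true interactions

map_a = sample_map(n_true, coverage, rng)
map_b = sample_map(n_true, coverage, rng)
observed = overlap_fraction(map_a, map_b)
expected = expected_overlap(coverage, coverage)
print("observed overlap:", round(observed, 4))
print("expected overlap:", round(expected, 4))
```

Even though every interaction in both simulated maps is correct, the two maps share only a few per cent of their interactions, so low overlap on its own says little about data accuracy.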
The incompleteness of protein interaction networks might raise concerns about such well-established concepts as scale-free architecture, as it becomes unclear whether extrapolation of network topology from the currently limited data to the whole network can be achieved accurately and with high confidence. Current interactome networks are often described as having a power law degree distribution, in which most proteins interact with a few partners, whereas a few proteins, ‘hubs’, interact with many partners [37]. In the biological context, power law topology might relate to the generic robustness of protein interaction networks, and the hubs may be considered the most suitable targets for drugs. Vidal described a recent study by his group that attempted to relate interactome network coverage to the observable degree distribution [36]. By sampling from random networks with different degree distributions, they created multiple subnetworks of different size (relating to the original random networks). For instance, at 10% of coverage, random networks that did not have power law distribution started exhibiting scale-free behaviour. Although more detailed comparison with real Y2H and co-AP/MS networks suggested that a complete protein interactome map is still more likely to be scale-free, other possibilities cannot be ruled out, especially considering that many technical false positives are auto-activators or sticky proteins (creating nodes of artificially high degree).
Affinity purification methods allow macromolecules physically associated with a tagged bait to be retrieved and identified by MS. These methods have been used as large-scale screens in prokaryotic and eukaryotic cells, leading to the construction of many protein interaction maps [38]. However, without genome-wide coverage, assignment of a protein to a particular complex relies heavily on experimental stringency and arbitrary thresholds. A.-C Gavin from EMBL described the first genome-wide screen for protein complexes in budding yeast based on tandem affinity purification coupled to MS [39]. This method identified 491 complexes, of which 257 were novel. Commenting on the data analysis, Gavin pointed out that complexes can be partitioned into the core and attachment proteins, which provide diversity to the core and allow execution of functions under different conditions. Using the ‘guilt by association’ principle, Gavin and collaborators also identified functions for several novel modules involved in ribosome biogenesis and RNA metabolism. The functional association was further aided by integration of protein interaction data with data on gene expression, localization, function, evolutionary conservation, protein structure and binary interactions. Finally, Gavin reported the development of a new scoring system for measuring the potency of proteins for forming associations, the ‘socioaffinity index’. The socioaffinity index represents the tendency of proteins to associate under different conditions and therefore could be used to analyze the yeast interactome network from the dynamic perspective. The socioaffinity index is similar to several methods developed for detecting community structures in social networks and therefore could be extended with algorithms proposed in that context [40,41]. It would be interesting to compare the ‘community structures’ in protein interaction networks obtained by different algorithms.
Signalling networks
The issue of modelling signalling networks was also discussed at the meeting. Signalling networks differ from protein interaction or transcription networks in that they are by nature temporal and therefore amenable to modelling of signal propagation in the network. Stochastic and deterministic differential equations (e.g. ODEs), process algebra and Boolean kinetics have been used to analyze signalling networks [42]. These approaches attain the highest level of modelling accuracy by incorporating kinetic parameters directly into the network. However, they require complete information about the structure of the signalling network and the values of the kinetic parameters. Unfortunately, this information is not available in many cases, especially when large cascades of 30 or more proteins are considered. Excellent reviews on mechanistic modelling of signalling networks can be found in Kholodenko [43] and Mogilner et al. [42]; the motif-based/dynamic systems approach is covered in [44].
At least two distinct levels of modelling signalling networks can be described (although many examples lie between these two extremes). In one approach, comprehensive ODE modelling of all the species deemed to participate in a particular signal-transduction cascade is attempted with numerical methods (an example of this approach can be found in [45]). An alternative ‘hypothesis-driven’ approach starts by introducing some prior assumptions into the model to simplify it to a few equations that can then be solved analytically. Although the resulting model becomes a highly abstract representation of the signalling network, it can be very powerful in addressing specific questions ([46] contains typical examples).
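The contrast between the two levels can be sketched on a deliberately small example. The code below is an illustration under stated assumptions, not a model from any of the cited studies: the two-tier cascade, its rate constants and the function names (`cascade_rhs`, `rk4_step`, `simulate`, `steady_state`) are hypothetical. The first route integrates the ODEs numerically; the second uses the assumption that only the long-time response matters, which reduces the model to two algebraic expressions.

```python
def cascade_rhs(state, signal, k1=1.0, d1=0.5, k2=2.0, d2=1.0):
    """Hypothetical two-tier cascade: the signal activates x, x activates y.
    x and y are active fractions between 0 and 1."""
    x, y = state
    dx = k1 * signal * (1.0 - x) - d1 * x
    dy = k2 * x * (1.0 - y) - d2 * y
    return (dx, dy)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for dstate/dt = f(state)."""
    def shifted(s, slope, h):
        return tuple(si + h * ki for si, ki in zip(s, slope))
    a = f(state)
    b = f(shifted(state, a, dt / 2))
    c = f(shifted(state, b, dt / 2))
    d = f(shifted(state, c, dt))
    return tuple(s + dt / 6 * (ai + 2 * bi + 2 * ci + di)
                 for s, ai, bi, ci, di in zip(state, a, b, c, d))


def simulate(signal, t_end=50.0, dt=0.01):
    """Numerical route: integrate the full ODE system from the inactive state."""
    state = (0.0, 0.0)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(lambda s: cascade_rhs(s, signal), state, dt)
    return state


def steady_state(signal, k1=1.0, d1=0.5, k2=2.0, d2=1.0):
    """Analytical route: set both derivatives to zero and solve by hand."""
    x = k1 * signal / (k1 * signal + d1)
    y = k2 * x / (k2 * x + d2)
    return (x, y)


numerical = simulate(1.0)
analytical = steady_state(1.0)
print(numerical, analytical)
```

For this toy cascade both routes agree on the long-time response. The numerical route scales to many species but needs every rate constant, whereas the analytical route exposes how the response depends on each parameter.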
A. van Oudenaarden from the Massachusetts Institute of Technology devoted his talk to the epigenetic inheritance of gene-expression dynamics in single cells using a ‘hypothesis-driven’ modelling approach. van Oudenaarden described how, on induction of cell differentiation, distinct cell phenotypes can be encoded by complex signalling networks that prevent phenotype reversion even in the presence of significant environmental fluctuations [47]. To explore the key parameters that determine the stability of cellular memory, the galactose network of yeast was used as a model system. One of the advantages of this system over the networks of prokaryotes is that it contains multiple nested feedback loops that bring different functionalities to the complete network. Using fluorescent microscopy and computational energy landscape approaches [48], the van Oudenaarden group revealed intricate combinations of signalling circuits. One of the findings was that the core positive-feedback loop through GAL3 is necessary for this cellular memory, whereas a negative-feedback loop through GAL80 competes with the positive GAL3 loop and reduces the potential for memory storage. Consistently, when the negative-feedback loop is opened and Gal80p levels are controlled constitutively, the memory persistence can be tuned from hours to months. Such observations provide a quantitative understanding of the stability and reversibility of cellular differentiation states. It should be noted that the definition of epigenetic inheritance in this talk was not restricted to nonmutational changes in the chromatin, but comprised all possible sources of inheritance unrelated to DNA sequence, such as the distribution and concentration of key regulatory proteins in the cytoplasm.
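How a positive-feedback loop can store a memory of past induction may be illustrated with a one-variable toy model. This is a sketch under stated assumptions and not the galactose network model of [47]: the rate equation, the parameter values and the names `dxdt`, `run`, `LOW` and `HIGH` are hypothetical, and the `loss` parameter is only a crude stand-in for any process that opposes the positive loop.

```python
def dxdt(x, signal, loss=0.4):
    """Toy rate law: external signal + self-activation (positive feedback)
    minus first-order loss of the regulator x."""
    return signal + x * x / (1.0 + x * x) - loss * x


def run(x, signal, duration, loss=0.4, dt=0.01):
    """Forward-Euler integration of the toy model; returns the final level."""
    for _ in range(int(round(duration / dt))):
        x += dt * dxdt(x, signal, loss)
    return x


LOW, HIGH = 0.02, 0.2  # hypothetical weak and strong inducer levels

# A cell that has only ever seen the weak signal stays in the low state.
naive = run(0.0, LOW, 200.0)

# A cell exposed to the strong signal switches to the high state ...
induced = run(0.0, HIGH, 200.0)

# ... and stays high after returning to the weak signal (memory).
remembered = run(induced, LOW, 200.0)

# With a larger loss rate the high state no longer exists at the weak
# signal, so the same history is forgotten.
forgotten = run(induced, LOW, 200.0, loss=0.6)

print(naive, induced, remembered, forgotten)
```

Two cells in the same final environment end in different states depending only on their history, which is the sense of memory used here. Strengthening the opposing term removes the second stable state and with it the memory.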
Networks of networks: metagenomics
applications in systems biology
Other avenues in systems and networks biology were also briefly introduced by several speakers. For instance, in the talk ‘Metagenomics of organisms and the air’, E. Rubin from Lawrence Berkeley National Laboratory (Berkeley, CA, USA) showed how high-throughput DNA sequencing approaches can be used to study and characterize organisms that are impossible to grow in a controlled laboratory environment [49]. Metagenomic approaches rely on sequencing as a tool to characterize microbial communities. Rubin described a study that investigated the composition of organisms in the air harvested from two densely populated urban buildings. Comparison of air samples with each other and with nearby terrestrial and aquatic environments suggested that indoor air microbes are not random transients from surrounding environments, but rather originate from indoor niches including human occupants.
In another study described by Rubin, an approach called ‘reverse genomics’ was used to characterize a symbiotic microbial community in the worm Olavius algarvensis, which lacks a mouth, gut and nephridia [50]. This worm lives in several sediment layers and forms species-specific associations with extracellular bacterial endosymbionts located just below the worm cuticle. As the symbionts have not been grown in culture, their phylogeny has only been accessible through 16S ribosomal RNA analysis and fluorescence in situ hybridization; reverse genomics instead deciphers an organism’s functions from its sequence. By shotgun sequencing, Rubin’s group was able to reconstruct the symbiotic relationship between the worm and four different microbes that accounts for the loss of digestive and excretory systems in O. algarvensis. In one plausible model, the selective advantage of harbouring multiple symbionts lies in their ability to supply the worm with energy from the diverse reducing and oxidizing compounds it encounters, which allows the worm to survive in the differing environments of oxidized and reduced sediment layers.
The third EMBL Biennial Symposium brought together researchers from several fields to discuss current issues and trends in various subfields of functional genomics and systems biology. The overall meeting and the talks of the individual speakers outlined several important directions in which systems biology may significantly progress over the next few years. First, analysis of regulatory elements in the human genome could yield novel results with the availability of new technologies, such as the chromosome conformation capture technique discussed in this report, supplemented by new computational algorithms for detection of functional elements. In the latter respect, advanced probabilistic graphical modelling approaches that extend hidden Markov models might produce the best results. Secondly, the networks biology paradigm will probably gain a more central role in systems biology research and produce many interesting research directions in the areas of algorithmic networks theory (i.e. various topological and clustering measures), flow of biological information (i.e. maximum flow in biological networks), ODE-based modelling of signalling networks, and network integration through algorithmic and machine learning approaches. Finally, systems biology should progress from its promise to direct examples of medically relevant research projects. DNA microarrays may be the first successful systems biology/functional genomics application for diagnosis and treatment of patients with cancer.
Another important trend noticeable at the symposium concerned the methodology and scope of systems and networks biology research. The meeting was no longer a place for computer scientists, physicists and biologists who wanted to apply their individual expertise to solving complex systems-wide biological problems. It was a meeting of systems biologists who understand the methodologies and paradigms of computer science, physics and biology and recognize the limitations of each individual discipline and its role in systems biology research. More significantly, there was a clear trend towards a general understanding of what constitutes an important problem in systems biology and how it should be resolved by application of relevant methods and techniques.
References
1 Encode Project Consortium (2004) The ENCODE (ENCyclopedia of DNA Elements) Project. Science 306, 636–640.
2 Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES & Kellis M (2005) Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345.
3 David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, Jones T, Davis RW & Steinmetz LM (2006) A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA 103, 5320–5325.
4 Royce TE, Rozowsky JS, Bertone P, Samanta M, Stolc V, Weissman S, Snyder M & Gerstein M (2005) Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet 21, 466–475.
5 Dekker J (2006) The three ‘C’s of chromosome conformation capture: controls, controls, controls. Nat Methods 3, 17–21.
6 Hood L, Heath JR, Phelps ME & Lin B (2004) Systems biology and new technologies enable predictive and preventative medicine. Science 306, 640–643.
7 Ein-Dor L, Kela I, Getz G, Givol D & Domany E (2005) Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21, 171–178.
8 Novak K (2006) News feature: where the chips fall. Nat Med 12, 158–159.
9 Roepman P, Wessels LFA, Kettelarij N, Kemmeren P, Miles AJ, Lijnzaad P, Tilanus MGJ, Koole R, Hordijk G-J, van der Vliet PC et al. (2005) An expression profile for diagnosis of lymph node metastases from primary head and neck squamous cell carcinomas. Nat Genet 37, 182–186.
10 Roepman P, Kemmeren P, Wessels LFA, Slootweg PJ & Holstege FCP (2006) Multiple robust signatures for detecting lymph node metastasis in head and neck cancer. Cancer Res 66, 2361–2366.
11 Michiels S, Koscielny S & Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492.
12 van ‘t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536.
13 Dobrin R, Beg Q, Barabasi A-L & Oltvai Z (2004) Aggregation of topological motifs in the E. coli transcriptional regulatory network. BMC Bioinformatics 5, 10.
14 Tong AHY, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M et al. (2004) Global mapping of the yeast genetic interaction network. Science 303, 808–813.
15 Shen-Orr SS, Milo R, Mangan S & Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31, 64–68.
16 Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
17 Blais A & Dynlacht BD (2005) Constructing transcriptional regulatory networks. Genes Dev 19, 1499–1511.
18 Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes JS & Hope IA (2006) A gene-centered C. elegans protein–DNA interaction network. Cell 125, 1193–1205.
19 Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM, Eichenlaub MP, Bork P & Furlong EEM (2006) A temporal map of transcription factor activity: Mef2 directly regulates target genes at all stages of muscle development. Dev Cell 10, 797–807.
20 Furlong EE (2004) Integrating transcriptional and signalling networks during muscle development. Curr Opin Genet Dev 14, 343–350.
21 Zhang L, King O, Wong S, Goldberg D, Tong A, Lesage G, Andrews B, Bussey H, Boone C & Roth F (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4, 6.
22 Qi Y & Ge H (2006) Modularity and dynamics of cellular networks. PLoS Comput Biol 2, 174.
23 Friedman A & Perrimon N (2006) High-throughput approaches to dissecting MAPK signaling pathways. Methods 40, 262–271.
24 Pepperkok R & Ellenberg J (2006) High-throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 7, 690–696.
25 DasGupta R, Kaykas A, Moon RT & Perrimon N (2005) Functional genomic analysis of the Wnt-Wingless signaling pathway. Science 308, 826–833.
26 Boutros M, Agaisse H & Perrimon N (2002) Sequential activation of signaling pathways during innate immune responses in Drosophila. Dev Cell 3, 711–722.
27 Markowetz F, Bloch J & Spang R (2005) Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21, 4026–4032.
28 Sachs K, Perez O, Pe’er D, Lauffenburger DA & Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
29 Tong AHY, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CWV, Bussey H et al. (2001) Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368.
30 Mnaimneh S (2004) Exploration of essential gene functions via titratable promoter alleles. Cell 118, 31–44.
31 Balaji S, Iyer LM, Aravind L & Babu MM (2006) Uncovering a hidden distributed architecture behind scale-free transcriptional regulatory networks. J Mol Biol 360, 204–212.
32 Parsons AB, Brost RL, Ding H, Li Z, Zhang C, Sheikh B, Brown GW, Kane PM, Hughes TR & Boone C (2004) Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nat Biotech 22, 62–69.
33 Dueck D, Morris QD & Frey BJ (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics 21, i144–151.
34 Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF et al. (2005) Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123, 507–519.
35 Cusick ME, Klitgord N, Vidal M & Hill DE (2005) Interactome: gateway into systems biology. Hum Mol Genet 14 Spec No 2, R171–R181.
36 Han J-DJ, Dupuy D, Bertin N, Cusick ME & Vidal M (2005) Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol 23, 839–844.
37 Roverato A (2005) A unified approach to the characterization of equivalence classes of DAGs, chain graphs with no flags and chain graphs. Scand J Stat 32, 295–312.
38 Bouwmeester T, Bauch A, Ruffner H, Angrand P-O, Bergamini G, Croughton K, Cruciat C, Eberhard D, Gagneur J, Ghidelli S et al. (2004) A physical and functional map of the human TNF-α/NF-κB signal transduction pathway. Nat Cell Biol 6, 97–105.
39 Gavin A-C, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B et al. (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636.
40 Girvan M & Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99, 7821–7826.
41 Newman MEJ (2006) From the cover: modularity and community structure in networks. Proc Natl Acad Sci USA 103, 8577–8582.
42 Mogilner A, Wollman R & Marshall WF (2006) Quantitative modeling in cell biology: what is it good for? Dev Cell 11, 279–287.
43 Kholodenko BN (2006) Cell-signalling dynamics in time and space. Nat Rev Mol Cell Biol 7, 165–176.
44 Tyson JJ, Chen KC & Novak B (2003) Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr Opin Cell Biol 15, 221–231.
45 Elowitz MB & Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338.
46 Amonlirdviman K, Khare NA, Tree DRP, Chen W-S, Axelrod JD & Tomlin CJ (2005) Mathematical modeling of planar cell polarity to understand domineering nonautonomy. Science 307, 423–426.
47 Acar M, Becskei A & van Oudenaarden A (2005) Enhancement of cellular memory by reducing stochastic transitions. Nature 435, 228–232.
48 Becskei A, Kaufmann BB & van Oudenaarden A (2005) Contributions of low molecule number and chromosomal positioning to stochastic gene expression. Nat Genet 37, 937–944.
49 Tringe SG & Rubin EM (2005) Metagenomics: DNA sequencing of environmental samples. Nat Rev Genet 6, 805–814.
50 Woyke T, Teeling H, Ivanova NN, Huntemann M, Richter M, Gloeckner FO, Boffelli D, Anderson IJ, Barry KW, Shapiro HJ et al. (2006) Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443, 950–955.