
From functional genomics to systems biology

Meeting report based on the presentations at the 3rd EMBL Biennial Symposium 2006 (Heidelberg, Germany)

Sergii Ivakhno

Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, UK

Keywords

DNA microarray medical applications; functional genomics; genetic interaction networks; networks biology; signalling networks; systems biology

Correspondence

S. Ivakhno, Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, E4, 5 Forrest Hill, Edinburgh EH1 2QL, UK

Fax: +44 (0) 131 6506899

Tel: +44 (0) 131 6676000, ext. 0131 6684266

E-mail: s0567096@sms.ed.ac.uk

(Received 30 January 2007, revised 1 March 2007, accepted 12 March 2007)

doi:10.1111/j.1742-4658.2007.05794.x

This review discusses the talks presented at the third EMBL Biennial Symposium, From functional genomics to systems biology, held in Heidelberg, Germany, 14–17 October 2006. Current issues and trends in various subfields of functional genomics and systems biology are considered, including analysis of regulatory elements, signalling networks, transcription networks, protein–protein interaction networks, genetic interaction networks, medical applications of DNA microarrays, and metagenomics. Several technological advances in the fields of DNA microarrays, identification of regulatory elements in the genomes of higher eukaryotes, and MS for detection of protein interactions are introduced. Major directions of future systems biology research are also discussed.

Abbreviations

RNAi, RNA interference; SGA, synthetic genetic array; TF, transcription factor; Y1H, yeast one-hybrid; Y2H, yeast two-hybrid.

Introduction

The third EMBL Biennial Symposium, From functional genomics to systems biology, was held in Heidelberg, Germany, 14–17 October 2006. The title of the conference clearly states the major challenges and issues that were addressed by the speakers: how to combine different ‘omics’ technologies and bioinformatics/computational methodologies to address increasingly complex biological questions. The main conference was divided into five separate sessions, which discussed different functional genomic approaches in systems biology: (a) A global view of transcriptional regulation, (b) Genomics of development and disease, (c) Protein–protein interaction networks and beyond, (d) Towards functional interaction networks, (e) Systems level analysis: from organisms to communities.

Table 1 gives a broad overview of topics presented at the meeting according to the systems biology applications, types of high-throughput techniques, and biological networks. From the sheer number of high-throughput genomic approaches described at the meeting, it becomes clear that ‘postgenome’ science has already entered the most exciting period of analyzing biological functions at the systems-wide level. Chromatin immunoprecipitation arrays (chip-on-chip), tiling arrays, DNA microarrays, synthetic genetic arrays, high-content fluorescent microscopy, protein microarrays, RNA interference (RNAi) screens, and high-throughput metagenomic sequencing are some of the technologies discussed by the speakers. Computational methods and algorithms were also an integral part of the conference, with various systems biology applications of machine learning, algorithmic network theory, differential equation modelling, and simulation being introduced.

In the following, I will discuss some of the talks representing different areas of functional genomics, networks and systems biology.

Analysis of regulatory elements in the genomes of higher eukaryotes

The first session of the conference began with a talk by E. Birney from the European Bioinformatics Institute (Hinxton, Cambridgeshire, UK). Birney described recent efforts of the ENCODE project (Encyclopaedia of DNA Elements), a multi-institutional collaboration supported by NIH and the Wellcome Trust that attempts to map all functional elements in the human genome: promoters, enhancers, repressors/silencers, exons, origins of replication, sites of replication termination, transcription factor (TF)-binding sites, methylation sites, deoxyribonuclease I-hypersensitive sites, chromatin modifications, and multispecies conserved sequences of as yet unknown function [1]. The pilot phase, which began in September 2003, is less ambitious and targets 44 uniformly distributed regions that comprise 1% of the genome. Birney’s talk emphasized the problem of mapping TF-binding sites and other elements that regulate transcription. Standardization of the protocols and comparison of different techniques was one of the major challenges encountered in the pilot phase. Another big problem concerned the annotation of the transcription regulatory elements. In contrast with genomes of simple eukaryotes such as yeast, in which the regulatory elements occur upstream of the genes that they regulate, in the human genome they are widely dispersed and occur between genes and within introns, making them very hard to map. This will probably be the next big challenge for computational biologists, who need to develop new algorithms for detecting regulatory elements whose position in the genome varies. Comparative genomic approaches previously gave the best results for finding regulatory elements in eukaryotes [2]; however, additional developments will be required to detect elements dispersed throughout the genome.

L. Steinmetz from EMBL (Heidelberg, Germany) described the application of tiling arrays for detection of new transcripts and refinement of the boundary, structure, and expression level of coding and noncoding transcripts in the yeast genome [3]. Although the concept of using tiling arrays and gene expression to find functional transcribed elements is not new (a review of the topic can be found in [4]), Steinmetz’s group and collaborators developed a new and more sensitive oligonucleotide array which contains 6.5 million probes and interrogates both strands of the full genomic sequence (achieving 8-nucleotide resolution for double-stranded targets). Significant expression above background was detected for 5104 ORFs (90%) during exponential growth in rich medium. Remarkably, 16% of the transcribed base pairs had not been annotated before, which is rather surprising considering more than 10 years of intensive analysis of the yeast genome.

Table 1. Overview of the topics covered in the meeting report according to systems biology applications, types of high-throughput techniques, and biological networks.

Type of biological network / area of functional genomics | High-throughput functional genomic techniques | Systems biology applications

… | … | … leukaemia AmpliChip

Analysis of regulatory elements | Tiling arrays, chromosome conformation capture | ENCODE project [1], expression in the yeast genome [3], analysis of globin locus enhancers [5]

Transcription regulatory networks | Chip-on-chip, Y1H system, DNA microarray | Transcription regulatory network during muscle development in Drosophila [19]

Genetic interaction networks | Yeast SGA [29,31], epistatic mini-array profiles [35] | Yeast genetic interaction network

Chemical–genetic interaction networks | … | …

Protein interaction networks | Y2H screens, MS-based analysis of protein complexes [39] | Coverage and false positives in protein interaction networks [37]

… | … | … dynamics in single cells using [48]

Networks of networks: metagenomics | High-throughput DNA sequencing | Bacterial communities [50]

As already mentioned, in many cases regulatory elements are located at distances up to several megabases from their target genes, in which case control of gene expression cannot be mediated through direct physical interaction between genes and their regulatory elements. The development of techniques to detect long-distance interactions was the topic of the talk by J. Dekker from the University of Massachusetts Medical School (Worcester, MA, USA). He described the chromosome conformation capture methodology, which uses formaldehyde cross-linking to covalently link interacting chromatin segments in intact cells [5]. Cross-linked chromatin is then solubilized and digested with an appropriate restriction enzyme, followed by intramolecular ligation of the cross-linked fragments. The resulting template therefore contains a large collection of ligation products, each of which reflects an interaction between two genomic loci and can be detected by quantitative PCR using specific primers. The abundance of each ligation product can be used quantitatively to measure the frequency with which the two loci in the genome interact with each other. Dekker’s group applied this technique to the analysis of globin locus enhancers and showed that chromosome conformation capture has a similar or better sensitivity than the chip-on-chip approach. The advantage of chromosome conformation capture is that it can detect regulatory elements that are active only in a particular cellular state, developmental stage or cell type.
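The quantitative step described above can be illustrated with a minimal sketch. It assumes one common normalization, in which the qPCR signal for each ligation product is divided by the signal obtained with the same primer pair on a control template; this normalization, the function name and all signal values are illustrative assumptions and are not taken from the talk.

```python
def interaction_frequencies(signal_3c, signal_control):
    """Relative interaction frequency per primer pair.

    Both arguments map a fragment label to a qPCR signal (arbitrary units).
    Dividing by the control signal corrects for differences in primer efficiency.
    """
    frequencies = {}
    for fragment, signal in signal_3c.items():
        control = signal_control[fragment]
        if control <= 0:
            raise ValueError(f"control signal for {fragment} must be positive")
        frequencies[fragment] = signal / control
    return frequencies


# Hypothetical signals for ligation products between one anchor fragment
# (e.g. a promoter) and four other fragments at increasing distance.
signal_3c = {"frag_10kb": 40.0, "frag_50kb": 90.0, "frag_200kb": 12.0, "frag_1Mb": 3.0}
signal_control = {"frag_10kb": 100.0, "frag_50kb": 120.0, "frag_200kb": 80.0, "frag_1Mb": 60.0}

frequencies = interaction_frequencies(signal_3c, signal_control)
# The fragment with the highest normalized frequency is the best candidate
# for a looping interaction with the anchor.
strongest = max(frequencies, key=frequencies.get)
```

In this toy profile the fragment 50 kb away interacts more frequently with the anchor than the nearer 10 kb fragment, which is the kind of local peak one would look for when searching for a distal regulatory element.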

Functional genomics approaches to diagnosis of diseases

One of the goals of systems biology and high-throughput functional genomics is to develop better diagnostic tools that would allow adoption of personalized medicine approaches in clinical settings [6]. Medical applications of systems biology and functional genomics were widely discussed at the conference, with several talks devoted to the use of DNA microarrays for cancer diagnosis and prognosis. For instance, leukaemia comprises more than 20 subgroups, which may require different approaches for successful treatment. Currently, the diagnosis and classification of leukaemia rely on the simultaneous application of multiple techniques, such as cytomorphology, histomorphology, cytochemistry and multiparameter flow cytometry, often supplemented by fluorescence in situ hybridization and molecular techniques, such as PCR. These high-cost and time-consuming approaches have encouraged the development of more effective diagnostic techniques. The use of DNA microarrays for cancer diagnosis was proposed more than 10 years ago, yet not a single microarray diagnostic kit has been approved by the FDA. One of the key challenges in using DNA microarrays for cancer diagnosis is the reproducibility of signature genes characterized by different groups [7,8].

This issue was addressed by F. Holstege from the Genomics Laboratory (UMC Utrecht, The Netherlands) in his talk on signatures for detection of lymph node metastasis in patients with head and neck cancer. It can often be very difficult to detect lymph node metastases reliably, but their early detection is crucial for appropriate treatment. Using DNA microarrays, Holstege’s group and collaborators built a 102-gene classifier from 82 tumours, which outperformed current clinical diagnosis techniques in its predictive accuracy when independently validated [9]. However, further examination revealed that, when the oldest tumour samples were excluded, the predictive accuracy remained high but the overlap between the two signature gene sets was limited to 49 genes [10]. This is a typical example of the kind of result that has led many researchers to question the validity of DNA microarray approaches for cancer classification [11]. Holstege proposed an alternative explanation for such a discrepancy: incomplete overlap may be caused by the presence of a large number of genes with similar patterns of expression across samples. This suggests that many predictive genes can be interchanged without influencing the predictive outcome and that multiple, different gene sets can be used for accurate prediction [10]. Holstege described how, through repeated sampling, they found that 3000 different signature gene sets (comprising 825 unique genes, each occurring in at least one set) can classify tumour samples with similarly high accuracy. Holstege concluded that there is no single set of genes with optimal predictive accuracy and that various signatures can be identified by different institutes or simply by using different samples. This study also exposes the flaw behind common attempts to make signature gene lists as small as possible, the argument being that molecular signatures based on more genes will be less prone to biases towards specific samples.
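The repeated-sampling argument can be made concrete with a small simulation. The sketch below is not Holstege’s procedure: it uses synthetic expression data in which many genes track the class label equally well, ranks genes by a simple difference of class means as a stand-in for a real classifier, and all names and parameter values (`make_expression`, `signature`, 82 tumours, 300 genes, signature size 20) are hypothetical.

```python
import random
from itertools import combinations


def make_expression(n_tumours, n_genes, n_informative, rng):
    """Synthetic data: the first n_informative genes all track the class label."""
    labels = [i % 2 for i in range(n_tumours)]
    data = []
    for label in labels:
        row = [rng.gauss(1.0 if (g < n_informative and label == 1) else 0.0, 1.0)
               for g in range(n_genes)]
        data.append(row)
    return data, labels


def signature(data, labels, sample_ids, k):
    """Top-k genes by absolute difference of class means on a subsample."""
    pos = [data[i] for i in sample_ids if labels[i] == 1]
    neg = [data[i] for i in sample_ids if labels[i] == 0]
    scores = []
    for g in range(len(data[0])):
        mean_pos = sum(row[g] for row in pos) / len(pos)
        mean_neg = sum(row[g] for row in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), g))
    scores.sort(reverse=True)
    return frozenset(g for _, g in scores[:k])


def resample_signatures(data, labels, n_rounds, subsample_size, k, rng):
    """Build one signature per random subsample of tumours."""
    ids = list(range(len(data)))
    return [signature(data, labels, rng.sample(ids, subsample_size), k)
            for _ in range(n_rounds)]


rng = random.Random(0)
data, labels = make_expression(n_tumours=82, n_genes=300, n_informative=60, rng=rng)
signature_sets = resample_signatures(data, labels, n_rounds=20,
                                     subsample_size=60, k=20, rng=rng)

# Genes that appear in at least one signature, and the mean pairwise overlap.
unique_genes = set().union(*signature_sets)
pairs = list(combinations(signature_sets, 2))
mean_overlap = sum(len(a & b) for a, b in pairs) / len(pairs)
```

When many genes carry similar information, the union of genes across signatures is typically much larger than any single signature while the pairwise overlap stays partial, which is the pattern the paragraph above attributes to interchangeable predictive genes.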


Next, T. Haferlach from Ludwig-Maximilians-University (Munich, Germany) described the progress in building the first commercial AmpliChip DNA microarray for testing leukaemia, which will be released by Roche. The major challenge facing clinical trials is the large number of tumour samples that must be analyzed to ensure the high accuracy of signature gene lists, which often results in high costs and time delays. Reporting on the preliminary screens, Haferlach described a DNA microarray study of 937 bone marrow and peripheral blood samples from 892 patients with all clinically relevant leukaemia subtypes. They were used to build a classifier with an overall prediction accuracy of 95.1%. In the follow-up round of clinical trials carried out by Microarray Innovations in Leukaemia (MILE, an international initiative with 11 centres from Europe, USA and Singapore), DNA microarrays are being used to analyze samples from more than 2000 patients. The results from this and other studies will help to restrict the number of genes on the AmpliChip to about 500 of the most predictive ones. Although the AmpliChip is not the first array to enter the market (the MammaPrint 70-gene signature for diagnosis of breast cancer, based on a study by van’t Veer et al. [12], is already available through Agendia), it could be the first one to obtain FDA approval for clinical tests. Haferlach estimates that, once the AmpliChip is available, it will provide a more accurate, faster and cost-saving strategy for diagnosis of leukaemia.

From systems to networks biology

Talks related to networks biology covered a large portion of the meeting. Among the many different types of biological networks discussed were gene regulatory networks, protein interaction networks, genetic networks, signalling networks and networks of bacterial communities. For interested readers, comprehensive surveys of networks biology principles can be found in [13,14]. One of the reasons why networks biology receives such close attention is that the network-based representation of high-throughput biological data can serve as a core around which more comprehensive information about biological models can be arranged. It also provides a natural method for integration of different biological data.

Transcription regulatory networks

I begin by describing talks that addressed analysis of transcription regulatory networks. Transcription regulatory networks, first described for Escherichia coli [15] and yeast [16], consist of physical and functional interactions between TFs and their target genes, represented as a graph [17]. The systematic mapping of TF–target gene interactions has been very successful in unicellular systems using ‘TF-centred’ approaches, such as the combination of chromatin immunoprecipitation (ChIP) with promoter DNA microarrays (known as chip-on-chip), which identifies a list of direct target genes for a particular transcription factor under a given set of conditions. However, as suggested by M. Walhout from the University of Massachusetts Medical School, metazoan systems are less amenable to chip-on-chip methods. First, TFs that are expressed at low levels, in a few cells, or during a narrow developmental interval are not suitable for ‘TF-centred’ experiments. Secondly, antibodies are only available for a very limited number of metazoan TFs, restricting the applicability of chip-on-chip. Walhout described an alternative ‘gene-centred’ approach for elucidating transcription regulatory networks, which uses a high-throughput Gateway-compatible yeast one-hybrid (Y1H) system [18]. Y1H is a genetic system based on reporter gene expression in yeast that detects interactions between a ‘DNA bait’ (e.g. cis-regulatory DNA elements or gene promoters) and a ‘protein prey’ (e.g. TFs). When a prey protein binds to the DNA bait, the heterologous activation domain fused to the prey activates reporter gene expression. Thus, physical interactions between repressors/activators and their DNA targets can be identified.

Walhout described an application of the Y1H system in Caenorhabditis elegans in which her group identified 283 interactions between 72 digestive tract genes and 117 proteins, providing the first set of putative target genes for nearly 10% of all predicted worm TFs. Detailed analysis found that more than 70% of the promoters are bound by at least one of the top 10% most highly connected TFs. In addition, 82% of the promoters are bound by at least one of the other, less-well-connected interactors, and more than half of the target promoters bind both. Summarizing these observations, Walhout described a model of the transcription regulatory network in C. elegans in which genes are subjected to three or more layers of transcriptional control. The first layer consists of global regulators, which control the expression of many genes in many different systems. The second layer involves ‘master regulators’, which control the expression of multiple genes involved in specific cellular processes. Finally, the third layer constitutes ‘specifiers’, which fine-tune the expression of a relatively small number of genes. The description of the layered architecture for the C. elegans transcription regulatory network adds a level of network hierarchy to previously described network motifs. Interestingly, the layered architecture of the C. elegans network resembles the dense overlapping regions of the E. coli transcription network [15], although in the latter case such a coherent division into different levels of global regulation was not observed.
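The connectivity statistics quoted above (promoters bound by the most highly connected TFs, by the remaining TFs, or by both) can be computed from a list of promoter–TF interactions along the following lines. This is only a sketch: the toy interaction list, the identifiers and the function name `hub_coverage` are hypothetical, and ties in connectivity are broken alphabetically as an arbitrary choice.

```python
import math
from collections import defaultdict


def hub_coverage(interactions, top_fraction=0.10):
    """Fraction of promoters bound by hub TFs, by non-hub TFs, and by both.

    interactions: iterable of (promoter, tf) pairs, e.g. from a Y1H screen.
    Hubs are the top_fraction of TFs ranked by number of bound promoters.
    """
    tf_targets = defaultdict(set)
    promoter_tfs = defaultdict(set)
    for promoter, tf in interactions:
        tf_targets[tf].add(promoter)
        promoter_tfs[promoter].add(tf)

    ranked = sorted(tf_targets, key=lambda tf: (-len(tf_targets[tf]), tf))
    n_hubs = max(1, math.ceil(top_fraction * len(ranked)))
    hubs = set(ranked[:n_hubs])

    by_hub = {p for p, tfs in promoter_tfs.items() if tfs & hubs}
    by_other = {p for p, tfs in promoter_tfs.items() if tfs - hubs}
    n = len(promoter_tfs)
    return hubs, len(by_hub) / n, len(by_other) / n, len(by_hub & by_other) / n


# Hypothetical toy network: four promoters, four TFs.
toy_interactions = [
    ("p1", "tfA"), ("p2", "tfA"), ("p3", "tfA"),  # tfA is the most connected TF
    ("p1", "tfB"), ("p4", "tfC"), ("p2", "tfD"),
]
hubs, frac_hub, frac_other, frac_both = hub_coverage(toy_interactions)
```

Promoters that fall into the "both" category are the ones that would sit under more than one layer of control in the layered model described above.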

E. Furlong from EMBL devoted her talk to a recent study of the transcription regulatory network during muscle development in Drosophila. The main approach adopted by Furlong’s group is a combination of chip-on-chip arrays with DNA microarrays and immunohistochemistry. Using a combination of these techniques, they obtained a temporal regulatory network of the activity of Mef2, the key myogenesis regulator during Drosophila embryonic development [19]. Two novel ideas behind this approach are worth mentioning. First, they used gene expression profiling of Mef2 mutant embryos over a time course and found genes requiring Mef2 for their correct expression at various stages of development. This provided functional validation of the chip-on-chip results and distinguished between direct and indirect regulation. Second, the chip-on-chip was itself performed over the time course, which identified temporal patterns of Mef2 target gene regulation. Although most of the reported transcription networks based on chip-on-chip data are static, Furlong described one of the first examples of a dynamic transcription network, which is relevant in the context of developmental biology [20].

This example also reveals another crucial theme in biological network analysis: integration of different data types for reconstruction of temporal and spatial relations in the networks. As different high-throughput techniques become more established and widespread, we can expect much wider use of data integration approaches for building more complex biological networks. Ultimately, this will lead to fusion of different biological networks, such as signalling, transcription and metabolic networks, into a cellular super network. Several early attempts in this direction have already produced interesting results. For example, Zhang et al. [21] assembled an integrated yeast network in which nodes represent genes (or their protein products) and edges represent various biological interactions, such as protein–protein interactions, genetic interactions, transcriptional regulation, sequence homology, and expression correlation. A search for significantly enriched motifs in this integrated network found specific ‘network themes’, higher-order network structures that correspond to various biological phenomena, such as ‘compensatory complexes’. Another similar study found that ‘action’ networks (metabolic, co-expression, and interaction) share the same scaffolding of hubs, whereas the regulatory network uses different regulatory hubs [22].

Networks derived from synthetic genetic interactions and RNAi screens

Other approaches to the construction of biological networks focus on functional relations between different genes. RNAi and synthetic lethal screens that are used for building epistatic and genetic networks were also covered at the meeting. N. Perrimon from Harvard Medical School (Boston, MA, USA) described how high-throughput RNAi screens can be used to analyze information flow in Drosophila signal-transduction pathways. One of the key considerations in such screens is the choice of appropriate read-out assays that can accurately assess the effect of gene knockdown on the pathway of interest [23]. Whereas more proximal assays that measure activity near receptors would identify fewer regulators and may miss components of input branches from other receptors, distal read-outs (e.g. transcriptional reporters or morphological outputs through ‘high-content screening’ microscopy [24]) may integrate more pathways than is desirable. Therefore, for the comprehensive analysis of a particular signalling pathway, several approaches should be combined to accurately identify the corresponding phenotypes. Perrimon described one example where 22 000 duplex RNAs were used for identification of new Wnt pathway targets [25]. The screening method relied on sensitive reporter genes containing T-cell factor-binding sites fused to a minimal promoter upstream of the luciferase gene. This set-up led to the identification of 238 potential Wnt pathway genes.

In another RNAi screen, DNA microarrays were used as phenotypes to infer epistatic interactions or epistasis gene networks [26,27]. Interestingly, similar approaches were independently developed for the analysis of signalling networks, where kinase inhibitors and multiparameter flow cytometry are used in place of RNAi and DNA microarrays [28]. In this case, the availability of single-cell data from flow cytometry allows accurate de novo reconstruction of signalling networks using machine learning algorithms. However, disadvantages of this approach are the limited availability of phospho-specific antibodies and the difficulty in scaling up flow cytometry for simultaneous analysis of multiple kinases.

C. Boone from The University of Toronto, Canada, described two recent extensions to the synthetic genetic array (SGA) technology developed at his laboratory, which are based on detecting synthetic genetic interactions of essential genes and chemical–genetic interactions. The idea behind the original technique is that most yeast genes are nonessential and therefore their knockdowns do not produce any observable phenotypic defects [29]. However, a combination of mutations in two genes that causes cell death or reduced fitness provides a means of mapping genetic interactions. Genetic interactions among essential genes were not examined systematically because of the inherent difficulty in creating and working with hypomorphic (partial loss-of-function) alleles. Boone described the use of temperature-sensitive conditional alleles based on the tetracycline (tet) promoter that overcomes this challenge. A mutation in a particular query gene is first crossed to an input array of single mutants, and then a series of robotic pinning steps generates the array of double mutants, which is then scored for fitness defects relative to either of the single mutants. With this approach, Boone’s laboratory conducted 30 SGA screens of 575 essential genes and built the corresponding genetic network [30]. This network resembles the genetic network of nonessential genes: both have a scale-free topology and most of the interactions do not overlap with protein–protein interactions. However, the most notable property of the essential gene genetic network is its density (median frequency of interactions is 3%), which is five times higher than the network density for nonessential genes. These results indicate that essential genes are well-connected hubs in the genetic interaction network, and that essential pathways are also highly buffered compared with the network of nonessential genes. Interestingly, analogous results were recently reported for the yeast transcription network [31]. Similar results obtained from the analysis of different biological networks suggest that scale-free architecture is not the only way to produce biological robustness and that distributed architecture may also contribute to robustness in the same network (although it may apply to different nodes in the network, e.g. TFs versus housekeeping genes).

SGA can also be used in combination with chemical treatments to identify genes involved in mediating the response to drug compounds [32]. The approach is based on the premise that, if a small molecule disrupts the function of its target protein, then cells with a smaller amount of that target protein would be more sensitive to the compound. In the second part of the talk, Boone described a new screen with 82 compounds against the Saccharomyces cerevisiae viable deletion set to generate chemical–genetic interaction profiles [32]. The clustering of the resulting data matrix identified sets of compounds with similar biological effects and genes that show sensitivity to similar compounds [33].

Several other talks also discussed analysis of genetic networks. For instance, one limitation of SGA is that only negative interactions can be identified. Consequently, interactions that are detected generally involve genes that have unrelated functions, which obscures the biological relevance and interpretation. To overcome this limitation, N. Krogan (University of Toronto, Canada) described a new technique, epistatic mini-array profiles, which consists of arrays of all double-mutant combinations for the genes involved in a specific process [34]. This approach involves measuring quantitative effects on colony growth, which, unlike scoring viability alone, can detect both positive and negative interactions.
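To illustrate how a quantitative growth read-out distinguishes positive from negative interactions, here is a minimal sketch. It assumes a multiplicative null model, in which the expected double-mutant fitness is the product of the single-mutant fitnesses; this is a widely used convention but is not stated in the report, and the threshold and fitness values are hypothetical.

```python
def interaction_score(fitness_a, fitness_b, fitness_ab):
    """Deviation of double-mutant fitness from the multiplicative expectation.

    Fitness values are relative to wild type (wild type = 1.0).
    Negative score: the double mutant is sicker than expected (e.g. synthetic lethality).
    Positive score: the double mutant is healthier than expected (e.g. same pathway).
    """
    return fitness_ab - fitness_a * fitness_b


def classify(score, threshold=0.1):
    """Label an interaction; the threshold is an arbitrary illustrative cut-off."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


# Hypothetical colony-size-derived fitness values.
synthetic_sick = classify(interaction_score(0.9, 0.8, 0.1))   # far below 0.72 expected
same_pathway = classify(interaction_score(0.5, 0.5, 0.5))     # above 0.25 expected
independent = classify(interaction_score(0.9, 0.8, 0.72))     # matches expectation
```

A viability-only screen would register the first case and miss the second, which is why a quantitative read-out is needed to recover positive interactions.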

Protein interaction networks

Protein interaction networks were also extensively discussed at the meeting. These networks usually represent either direct or indirect (as part of a protein complex) physical interactions between proteins and are typically derived from yeast two-hybrid (Y2H) screens or MS-based analysis of protein complexes (co-AP/MS) [35]. In most cases, protein interaction networks are static and represent only a small subset of the true biological interactions. M. Vidal from Harvard Medical School devoted his talk to the issues of network coverage and the effect of false negatives on the accuracy of the protein interaction network. The small overlap between different Y2H maps is often attributed to low data accuracy. However, Vidal argued that each map covers only 3–9% of the total interactome, so limited overlap should be expected. To test this assumption, Vidal’s group developed a sampling algorithm for generation of many low-coverage networks with properties similar to the current Y2H maps. In almost 23 000 such comparisons, the part of the interactome that was common to each pair comprised only 2.1%, which suggests that it is possible to observe perfectly accurate samples (without false positives) that have very limited overlap solely because of the low coverage of their maps [36]. Drawing on examples from the genome sequencing community, Vidal proposed a solution to this problem. As no single study can possibly cover all protein interactions, he suggested that individual research groups should continually contribute small subnetworks to a global interactome repository, in the way it was done during sequencing of the human genome.
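The coverage argument can be checked with a small simulation. The sketch below is not the sampling algorithm of Vidal’s group: it simply draws error-free random subsets of a hypothetical interactome at a fixed coverage and measures how much any two subsets share. The interactome size, the coverage value and the overlap measure (shared interactions divided by the union of the two maps) are assumptions made for illustration.

```python
import random
from itertools import combinations


def sample_map(edges, coverage, rng):
    """Draw an error-free 'map' containing a random fraction of the true interactions."""
    k = round(coverage * len(edges))
    return set(rng.sample(edges, k))


def mean_pairwise_overlap(edges, coverage, n_maps, rng):
    """Average shared fraction (intersection over union) across all pairs of maps."""
    maps = [sample_map(edges, coverage, rng) for _ in range(n_maps)]
    fractions = [len(a & b) / len(a | b) for a, b in combinations(maps, 2)]
    return sum(fractions) / len(fractions)


rng = random.Random(1)
true_interactome = list(range(20000))  # hypothetical: each integer stands for one interaction
overlap = mean_pairwise_overlap(true_interactome, coverage=0.05, n_maps=10, rng=rng)
```

For two independent samples of coverage p, the expected shared fraction under this measure is roughly p / (2 - p), so maps covering about 5% of the interactome share only a few percent of their interactions even though neither contains a single false positive.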

The incompleteness of protein interaction networks might raise concerns about such well-established concepts as scale-free architecture, as it becomes unclear whether extrapolation of network topology from the currently limited data to the whole network can be achieved accurately and with high confidence. Current interactome networks are often described as having a power-law degree distribution, in which most proteins interact with a few partners, whereas a few proteins, ‘hubs’, interact with many partners [37]. In the biological context, power-law topology might relate to the generic robustness of protein interaction networks, and the hubs may be considered the most suitable targets for drugs. Vidal described a recent study by his group that attempted to relate interactome network coverage to the observable degree distribution [36]. By sampling from random networks with different degree distributions, they created multiple subnetworks of different sizes (relative to the original random networks). For instance, at 10% coverage, random networks that did not have a power-law distribution started exhibiting scale-free behaviour. Although more detailed comparison with real Y2H and co-AP/MS networks suggested that a complete protein interactome map is still more likely to be scale-free, other possibilities cannot be ruled out, especially considering that many technical false positives are auto-activators or sticky proteins (creating nodes of artificially high degree).

Affinity purification methods allow macromolecules physically associated with a tagged bait to be retrieved and identified by MS. These methods have been used as large-scale screens in prokaryotic and eukaryotic cells, leading to the construction of many protein interaction maps [38]. However, without genome-wide coverage, assignment of a protein to a particular complex relies heavily on experimental stringency and arbitrary thresholds. A.-C. Gavin from EMBL described the first genome-wide screen for protein complexes in budding yeast based on tandem affinity purification coupled to MS [39]. This method identified 491 complexes, of which 257 were novel. Commenting on the data analysis, Gavin pointed out that complexes can be partitioned into core and attachment proteins; the latter provide diversity to the core and allow execution of functions under different conditions. Using the ‘guilt by association’ principle, Gavin and collaborators also identified functions for several novel modules involved in ribosome biogenesis and RNA metabolism. The functional association was further aided by integration of protein interaction data with data on gene expression, localization, function, evolutionary conservation, protein structure and binary interactions. Finally, Gavin reported the development of a new scoring system for measuring the propensity of proteins to form associations, the ‘socioaffinity index’. The socioaffinity index represents the tendency of proteins to associate under different conditions and therefore could be used to analyze the yeast interactome network from a dynamic perspective. The socioaffinity index is similar to several methods developed for detecting community structures in social networks and therefore could be extended with algorithms proposed in that context [40,41]. It would be interesting to compare the ‘community structures’ in protein interaction networks obtained by different algorithms.
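The idea of scoring how readily two proteins co-purify can be illustrated with a small log-odds calculation. The Python sketch below is a simplified illustration only and is not the published socioaffinity index of [39]: the function name `copurification_scores`, the pseudocount and the toy purification sets are all assumptions introduced here. It compares how often two proteins are recovered together with how often that would be expected if they appeared independently.

```python
import math
from collections import Counter
from itertools import combinations


def copurification_scores(purifications, pseudocount=0.5):
    """Return a log-odds co-occurrence score for every protein pair.

    purifications: list of sets; each set holds the proteins recovered
    in one (hypothetical) affinity purification.
    A positive score means the pair co-purifies more often than expected
    from the individual frequencies; a negative score means less often.
    """
    n = len(purifications)
    single = Counter()  # purifications containing each protein
    pair = Counter()    # purifications containing both proteins of a pair
    for prey_set in purifications:
        single.update(prey_set)
        pair.update(frozenset(p) for p in combinations(sorted(prey_set), 2))

    scores = {}
    for a, b in combinations(sorted(single), 2):
        observed = pair[frozenset((a, b))]
        expected = single[a] * single[b] / n  # independence assumption
        scores[(a, b)] = math.log(
            (observed + pseudocount) / (expected + pseudocount)
        )
    return scores


# Toy data (invented for illustration): A and B are always found together.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"},
       {"C", "D"}, {"D", "E"}, {"E"}]
for pair_key, score in sorted(copurification_scores(toy).items(),
                              key=lambda kv: -kv[1]):
    print(pair_key, round(score, 3))
```

A pairwise score matrix of this kind could then serve as edge weights for the community-detection algorithms mentioned above [40,41], which is one way to compare the ‘community structures’ produced by different methods.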

Signalling networks

The issue of modelling signalling networks was also discussed at the meeting. Signalling networks differ from protein interaction or transcription networks in that they are by nature temporal and therefore amenable to modelling of signal propagation in the network. Stochastic and deterministic differential equations (e.g. ODEs), process algebra and Boolean kinetics have been used to analyze signalling networks [42]. These approaches attain the highest level of modelling accuracy by incorporating kinetic parameters directly into the network. However, they require complete information about the structure of the signalling network and the values of kinetic parameters. Unfortunately, this is not available in many cases, especially when large cascades of 30 or more proteins are considered. Excellent reviews on mechanistic modelling of signalling networks can be found in Kholodenko [43] and Mogilner et al. [42]; the motif-based/dynamic systems approach is covered in [44].

At least two distinct levels of modelling signalling networks can be described (although many examples lie between these two extremes). In one approach, comprehensive ODE modelling of all the species deemed to participate in a particular signal-transduction cascade is attempted with numerical methods (an example of this approach can be found in [45]). An alternative ‘hypothesis-driven’ approach starts by introducing some prior assumptions into the model to simplify it to a few equations that can then be solved analytically. Although the resulting model becomes a highly abstract representation of the signalling network, it can be very powerful in addressing specific questions ([46] contains typical examples).
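The contrast between numerical and analytical treatment can be seen in a deliberately small example. The Python sketch below assumes a hypothetical two-tier activation cascade with mass-action kinetics; the equations, rate constants and function names are illustrative assumptions and do not come from any of the cited studies. The same model is integrated numerically with a forward Euler scheme and, because it is small enough, also solved for its steady state by hand.

```python
def simulate_cascade(signal, k1=1.0, d1=1.0, k2=2.0, d2=1.0,
                     dt=0.01, t_end=20.0):
    """Numerically integrate a toy two-tier cascade (forward Euler).

    x1, x2 are the active fractions (0..1) of two hypothetical kinases:
        dx1/dt = k1 * signal * (1 - x1) - d1 * x1
        dx2/dt = k2 * x1     * (1 - x2) - d2 * x2
    """
    x1, x2 = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dx1 = k1 * signal * (1.0 - x1) - d1 * x1
        dx2 = k2 * x1 * (1.0 - x2) - d2 * x2
        x1 += dt * dx1
        x2 += dt * dx2
    return x1, x2


def analytic_steady_state(signal, k1=1.0, d1=1.0, k2=2.0, d2=1.0):
    """Closed-form steady state obtained by setting both derivatives to zero."""
    x1 = k1 * signal / (k1 * signal + d1)
    x2 = k2 * x1 / (k2 * x1 + d2)
    return x1, x2


numeric = simulate_cascade(signal=1.0)
exact = analytic_steady_state(signal=1.0)
print("numerical:", numeric)
print("analytical:", exact)
```

For a two-species model the closed-form answer is easy to obtain; for cascades of 30 or more proteins with poorly known rate constants, only the numerical route remains practical, which is the trade-off described above.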

A. van Oudenaarden from Massachusetts Institute of Technology devoted his talk to the epigenetic inheritance of gene-expression dynamics in single cells using a ‘hypothesis-driven’ modelling approach. van Oudenaarden described how, on induction of cell differentiation, distinct cell phenotypes can be encoded by complex signalling networks that prevent phenotype reversion even in the presence of significant environmental fluctuations [47]. To explore the key parameters that determine the stability of cellular memory, the galactose network of yeast was used as a model system. One of the advantages of this system over the networks of prokaryotes is that it contains multiple nested feedback loops that bring different functionalities to the complete network. Using fluorescence microscopy and computational energy landscape approaches [48], the van Oudenaarden group revealed intricate combinations of signalling circuits. One of the findings was that the core positive-feedback loop through GAL3 is necessary for this cellular memory, whereas a negative-feedback loop through GAL80 competes with the positive GAL3 loop and reduces the potential for memory storage. Consistently, when the negative-feedback loop is opened and Gal80p levels are controlled constitutively, the memory persistence can be tuned from hours to months. Such observations provide a quantitative understanding of the stability and reversibility of cellular differentiation states. It should be noted that the definition of epigenetic inheritance in this talk was not restricted to nonmutational changes in the chromatin, but comprised all possible sources of inheritance unrelated to DNA sequence, such as distribution and concentration of key regulatory proteins in the cytoplasm.
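How a positive-feedback loop can store a memory of past conditions may be illustrated with a one-variable toy model. The Python sketch below is not the galactose network model of [47]: the equation, the parameter values and the function name `relax` are assumptions chosen only to show bistability. Lowering the `feedback` parameter is a crude stand-in for a competing negative loop, and because the model is deterministic it cannot capture the stochastic switching that sets how long memory persists.

```python
def relax(x0, feedback, basal=0.1, K=1.0, n=4, dt=0.01, t_end=50.0):
    """Integrate dx/dt = basal + feedback * x^n / (K^n + x^n) - x.

    x        : level of a hypothetical self-activating regulator
    x0       : initial level, i.e. the cell's history
    feedback : strength of the cooperative positive-feedback term
    Returns the level reached after t_end time units (forward Euler).
    """
    x = x0
    for _ in range(int(round(t_end / dt))):
        production = basal + feedback * x**n / (K**n + x**n)
        x += dt * (production - x)
    return x


# Strong feedback: identical conditions, different histories.
previously_on = relax(x0=2.0, feedback=2.0)
previously_off = relax(x0=0.0, feedback=2.0)
print("strong feedback:", previously_on, previously_off)

# Weakened feedback: the previously induced cell no longer stays on.
print("weak feedback:", relax(x0=2.0, feedback=0.5))
```

With strong feedback, the final state depends on the starting state even though all parameters are identical, which is the deterministic counterpart of cellular memory; with weakened feedback, only the low state survives and the history is forgotten.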

Networks of networks: metagenomics applications in systems biology

Other avenues in systems and networks biology were also briefly introduced by several speakers. For instance, in the talk ‘Metagenomics of organisms and the air’, E. Rubin from Lawrence Berkeley National Laboratory (Berkeley, CA, USA) showed how high-throughput DNA sequencing approaches can be used to study and characterize organisms that cannot be grown under controlled laboratory conditions [49]. Metagenomic approaches rely on sequencing as a tool to characterize microbial communities. Rubin described a study that investigated the composition of organisms in air sampled from two densely populated urban buildings. Comparison of air samples with each other and with nearby terrestrial and aquatic environments suggested that indoor air microbes are not random transients from surrounding environments, but rather originate from indoor niches including human occupants. In another study described by Rubin, an approach called ‘reverse genomics’ was used to characterize a symbiotic microbial community in the worm Olavius algarvensis, which lacks a mouth, gut and nephridia [50]. This worm lives in several sediment layers and forms species-specific associations with extracellular bacterial endosymbionts located just below the worm cuticle. As the symbionts have not been grown in culture, their phylogeny has been accessible only through 16S ribosomal RNA analysis and fluorescence in situ hybridization; reverse genomics instead deciphers the organisms’ functions from their sequence. By shotgun sequencing, Rubin’s group was able to reconstruct the symbiotic relationship between the worm and four different microbes that accounts for the loss of digestive and excretory systems in O. algarvensis. In one plausible model, the selective advantage of harbouring multiple symbionts lies in their ability to supply the worm with energy from a diverse range of reducing and oxidizing compounds, which the worm needs to survive in the different oxidized and reduced sediment layers.

The third EMBL Biennial Symposium brought together researchers from several fields to discuss current issues and trends in various subfields of functional genomics and systems biology. The overall meeting and the talks of the individual speakers outlined several important directions in which systems biology may progress significantly over the next few years. First, analysis of regulatory elements in the human genome could yield novel results with the availability of new technologies, such as the chromosome conformation capture technique discussed in this report, supplemented by new computational algorithms for detection of functional elements. In the latter respect, advanced probabilistic graphical modelling approaches that extend hidden Markov models might produce the best results. Secondly, the networks biology paradigm will probably gain a more central role in systems biology research and produce many interesting research directions in the areas of algorithmic network theory (e.g. various topological and clustering measures), flow of biological information (e.g. maximum flow in biological networks), ODE-based modelling of signalling networks, and network integration through algorithmic and machine learning approaches. Finally, systems biology should progress from its promise to direct examples of medically relevant research projects. DNA microarrays may be the first successful systems biology/functional genomics application for diagnosis and treatment of patients with cancer.

Another important trend noticeable at the symposium concerned the methodology and scope of systems and networks biology research. The meeting was no longer a place for computer scientists, physicists and biologists who wanted to apply their individual expertise to solving complex systems-wide biological problems. It was a meeting of systems biologists who understand the methodologies and paradigms of computer science, physics and biology and recognize the limitations of each individual discipline and its role in systems biology research. More significantly, there was a clear trend towards a general understanding of what constitutes an important problem in systems biology and how it should be resolved by application of relevant methods and techniques.

References

1 Encode Project Consortium (2004) The ENCODE (ENCyclopedia of DNA Elements) Project. Science 306, 636–640.
2 Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES & Kellis M (2005) Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature 434, 338–345.
3 David L, Huber W, Granovskaia M, Toedling J, Palm CJ, Bofkin L, Jones T, Davis RW & Steinmetz LM (2006) A high-resolution map of transcription in the yeast genome. Proc Natl Acad Sci USA 103, 5320–5325.
4 Royce TE, Rozowsky JS, Bertone P, Samanta M, Stolc V, Weissman S, Snyder M & Gerstein M (2005) Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet 21, 466–475.
5 Dekker J (2006) The three ‘C’s of chromosome conformation capture: controls, controls, controls. Nat Methods 3, 17–21.
6 Hood L, Heath JR, Phelps ME & Lin B (2004) Systems biology and new technologies enable predictive and preventative medicine. Science 306, 640–643.
7 Ein-Dor L, Kela I, Getz G, Givol D & Domany E (2005) Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21, 171–178.
8 Novak K (2006) News feature: where the chips fall. Nat Med 12, 158–159.
9 Roepman P, Wessels LFA, Kettelarij N, Kemmeren P, Miles AJ, Lijnzaad P, Tilanus MGJ, Koole R, Hordijk G-J, van der Vliet PC et al. (2005) An expression profile for diagnosis of lymph node metastases from primary head and neck squamous cell carcinomas. Nat Genet 37, 182–186.
10 Roepman P, Kemmeren P, Wessels LFA, Slootweg PJ & Holstege FCP (2006) Multiple robust signatures for detecting lymph node metastasis in head and neck cancer. Cancer Res 66, 2361–2366.
11 Michiels S, Koscielny S & Hill C (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365, 488–492.
12 van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536.
13 Dobrin R, Beg Q, Barabasi A-L & Oltvai Z (2004) Aggregation of topological motifs in the E. coli transcriptional regulatory network. BMC Bioinformatics 5, 10.
14 Tong AHY, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M et al. (2004) Global mapping of the yeast genetic interaction network. Science 303, 808–813.
15 Shen-Orr SS, Milo R, Mangan S & Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31, 64–68.
16 Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298, 799–804.
17 Blais A & Dynlacht BD (2005) Constructing transcriptional regulatory networks. Genes Dev 19, 1499–1511.
18 Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes JS & Hope IA (2006) A gene-centered C. elegans protein–DNA interaction network. Cell 125, 1193–1205.
19 Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM, Eichenlaub MP, Bork P & Furlong EEM (2006) A temporal map of transcription factor activity: Mef2 directly regulates target genes at all stages of muscle development. Dev Cell 10, 797–807.
20 Furlong EE (2004) Integrating transcriptional and signalling networks during muscle development. Curr Opin Genet Dev 14, 343–350.
21 Zhang L, King O, Wong S, Goldberg D, Tong A, Lesage G, Andrews B, Bussey H, Boone C & Roth F (2005) Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol 4, 6.
22 Qi Y & Ge H (2006) Modularity and dynamics of cellular networks. PLoS Comput Biol 2, 174.
23 Friedman A & Perrimon N (2006) High-throughput approaches to dissecting MAPK signaling pathways. Methods 40, 262–271.
24 Pepperkok R & Ellenberg J (2006) High-throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 7, 690–696.
25 DasGupta R, Kaykas A, Moon RT & Perrimon N (2005) Functional genomic analysis of the Wnt-Wingless signaling pathway. Science 308, 826–833.
26 Boutros M, Agaisse H & Perrimon N (2002) Sequential activation of signaling pathways during innate immune responses in Drosophila. Dev Cell 3, 711–722.
27 Markowetz F, Bloch J & Spang R (2005) Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21, 4026–4032.
28 Sachs K, Perez O, Pe’er D, Lauffenburger DA & Nolan GP (2005) Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, 523–529.
29 Tong AHY, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CWV, Bussey H et al. (2001) Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368.
30 Mnaimneh S (2004) Exploration of essential gene functions via titratable promoter alleles. Cell 118, 31–44.
31 Balaji S, Iyer LM, Aravind L & Babu MM (2006) Uncovering a hidden distributed architecture behind scale-free transcriptional regulatory networks. J Mol Biol 360, 204–212.
32 Parsons AB, Brost RL, Ding H, Li Z, Zhang C, Sheikh B, Brown GW, Kane PM, Hughes TR & Boone C (2004) Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nat Biotechnol 22, 62–69.
33 Dueck D, Morris QD & Frey BJ (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Bioinformatics 21, i144–151.
34 Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF et al. (2005) Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell 123, 507–519.
35 Cusick ME, Klitgord N, Vidal M & Hill DE (2005) Interactome: gateway into systems biology. Hum Mol Genet 14 Spec No 2, R171–81.
36 Han J-DJ, Dupuy D, Bertin N, Cusick ME & Vidal M (2005) Effect of sampling on topology predictions of protein–protein interaction networks. Nat Biotechnol 23, 839–844.
37 Roverato A (2005) A unified approach to the characterization of equivalence classes of DAGs, chain graphs with no flags and chain graphs. Scand J Stat 32, 295–312.
38 Bouwmeester T, Bauch A, Ruffner H, Angrand P-O, Bergamini G, Croughton K, Cruciat C, Eberhard D, Gagneur J, Ghidelli S et al. (2004) A physical and functional map of the human TNF-α/NF-κB signal transduction pathway. Nat Cell Biol 6, 97–105.
39 Gavin A-C, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B et al. (2006) Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 631–636.
40 Girvan M & Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99, 7821–7826.
41 Newman MEJ (2006) From the cover: modularity and community structure in networks. Proc Natl Acad Sci USA 103, 8577–8582.
42 Mogilner A, Wollman R & Marshall WF (2006) Quantitative modeling in cell biology: what is it good for? Dev Cell 11, 279–287.
43 Kholodenko BN (2006) Cell-signalling dynamics in time and space. Nat Rev Mol Cell Biol 7, 165–176.
44 Tyson JJ, Chen KC & Novak B (2003) Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr Opin Cell Biol 15, 221–231.
45 Elowitz MB & Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403, 335–338.
46 Amonlirdviman K, Khare NA, Tree DRP, Chen W-S, Axelrod JD & Tomlin CJ (2005) Mathematical modeling of planar cell polarity to understand domineering nonautonomy. Science 307, 423–426.
47 Acar M, Becskei A & van Oudenaarden A (2005) Enhancement of cellular memory by reducing stochastic transitions. Nature 435, 228–232.
48 Becskei A, Kaufmann BB & van Oudenaarden A (2005) Contributions of low molecule number and chromosomal positioning to stochastic gene expression. Nat Genet 37, 937–944.
49 Tringe SG & Rubin EM (2005) Metagenomics: DNA sequencing of environmental samples. Nat Rev Genet 6, 805–814.
50 Woyke T, Teeling H, Ivanova NN, Huntemann M, Richter M, Gloeckner FO, Boffelli D, Anderson IJ, Barry KW, Shapiro HJ et al. (2006) Symbiosis insights through metagenomic analysis of a microbial consortium. Nature 443, 950–955.
