Simulation of Biological Processes, part 5

native network are computable, which is like computing small perturbations around the native structure of a protein. However, the dynamics of cell differentiation, for example, would be extremely difficult to compute, which is like computing the dynamics of protein folding from the extended chain to the native structure. A perturbation to the network may be internal or external. An internal perturbation is a genomic change such as a gene mutation or a molecular change such as a protein modification, and an external perturbation is a change in the environment of the cell.

Although we do not yet have a proper way to compute dynamic responses of the network to small perturbations, a general consideration can be made. Figure 7 illustrates the basic system architecture that results from the interactions with the environment. The basic principle of the native structure formation of a globular protein is that it consists of the conserved hydrophobic core to stabilize the globule and the divergent hydrophilic surface to perform specific functions. The protein interaction network in the cell seems to have a similar dual architecture. It consists of the conserved core such as metabolism for the basic maintenance of life and the divergent surface such as transporters and receptors for interactions with the environment. The subnetwork of genetic information processing may also have a dual architecture: the conserved core of RNA polymerase and ribosome and the divergent surface of transcription factors. In both cases the core is encoded by a set of orthologous genes that are conserved among organisms, and the surface is encoded by sets of paralogous genes that are dependent on each organism. Thus, we expect that the genomic compositions of different types of genes in different organisms reflect the environments which they inhabit and also the stability of the network against environmental perturbations. By comparative analysis of a number of genomes, together with experimental data observing perturbation–response relations such as by microarray gene expression profiles, we hope to come up with a ‘conformational energy’ of the protein interaction network, which would then be utilized to compute a perturbed network by an energy minimization procedure.

Acknowledgements

This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan, the Japan Society for the Promotion of Science, and the Japan Science and Technology Corporation.

References

Kanehisa M 1997 A database for post-genome analysis. Trends Genet 13:375–376

Kanehisa M 2000 Post-genome informatics. Oxford University Press, Oxford

Kanehisa M 2001 Prediction of higher order functional networks from genomic data. Pharmacogenomics 2:373–385

Kanehisa M, Goto S, Kawashima S, Nakaya A 2002 The KEGG databases at GenomeNet. Nucleic Acids Res 30:42–46

Ogata H, Fujibuchi W, Goto S, Kanehisa M 2000 A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters. Nucleic Acids Res 28:4021–4028

DISCUSSION

Subramaniam: How would one go about making comparisons of microarray data with yeast two-hybrid data, which have different methods of interaction distance assessment and completely different metrics?

Kanehisa: At the moment we don’t include a numerical value. We just say whether the edge is present or not. It is a kind of logical comparison. If we start including the metrics we run into the problem of how we balance two different graphs. We would need to normalize them.
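The logical comparison described in this answer can be sketched as plain set operations on edges. This is only an illustration under stated assumptions: the function names and the gene identifiers (`YFG1` and so on) are hypothetical, edges are treated as undirected, and nothing here is taken from the speaker's actual software.

```python
def normalize(edges):
    """Turn (a, b) pairs into an undirected, order-independent edge set."""
    # Self-loops are dropped; this is a simplifying assumption.
    return {frozenset(pair) for pair in edges if pair[0] != pair[1]}


def compare_edges(edges_a, edges_b):
    """Compare two graphs purely on edge presence, ignoring any weights."""
    a, b = normalize(edges_a), normalize(edges_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical edges: one graph from co-expression, one from two-hybrid screens.
coexpression = [("YFG1", "YFG2"), ("YFG2", "YFG3")]
two_hybrid = [("YFG2", "YFG1"), ("YFG3", "YFG4")]

result = compare_edges(coexpression, two_hybrid)
```

Because only presence or absence is compared, the two data sets need no common metric, which is the point made above. Adding weights would require the normalization step that the speaker mentions.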

Subramaniam: When you draw networks by analogy, using your graph-related methods, if you have more nodes adding on going from a pathway in one organism to a pathway in another organism, it is not a problem because you can add more nodes. But what if the state of the protein is different in the two pathways? We have a good example with receptor tyrosine kinases: there are two different phosphorylation states of this. In one case there are two tyrosines phosphorylated, in another there are four. How do you deal with this distinction in the state-dependent properties of the graph?

Kanehisa: At the moment we don’t distinguish different states. We are satisfied with just relating each node to the genomic information. As long as we have the box coloured, which means that the gene is present, that is sufficient; our interest is to obtain a rough picture of the global network, not details of individual pathways.

Reinhardt: Take the following scenario. I am trying to predict a protein–protein interaction from expression profiles. I take two different genes, look at them across a number of experiments and construct and compare the vectors. I find that one of the genes has two biochemical roles, and is shuttling between two compartments. Then what I would need, when I try to speak in the language of sequence analysis, is a local alignment. Currently, all we do in expression profiling is to compute a global alignment. We are in the Stone Age. Have you any idea of how to address this need for local alignment? Given your concluding Pearson correlation coefficient of 0.97, it wouldn’t work if you have multifunctional proteins. How do you address this?

Kanehisa: Again, just looking at expression data it is very difficult to find the right answer. But we have an additional set of data, including yeast two-hybrid data. Integration of different types of data is the way we want to do the screening. Together with an additional data set we can find the local similarity when we do the graph comparison.
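The contrast between a "global" and a "local" comparison of expression profiles can be made concrete with a small sketch. It is not the method either speaker used: the sliding-window search, the function names and the toy profiles are all assumptions for illustration, and a contiguous window only makes sense if the experiments have a meaningful order (for example a time course).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def best_local(x, y, window):
    """Return (r, start) for the contiguous window with the highest r."""
    best_r, best_start = -2.0, 0
    for start in range(len(x) - window + 1):
        r = pearson(x[start:start + window], y[start:start + window])
        if r > best_r:
            best_r, best_start = r, start
    return best_r, best_start


# Toy profiles: co-regulated in the first five experiments, opposed afterwards.
gene_a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
gene_b = [2, 4, 6, 8, 10, 5, 4, 3, 2, 1]

global_r = pearson(gene_a, gene_b)
local_r, local_start = best_local(gene_a, gene_b, window=5)
```

In this toy case the global coefficient is low although the first five experiments are perfectly correlated, which is the situation described for a protein with two roles.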

Crampin: How do you go about incorporating data other than just connectivity, for example the strengths of interactions between components of a network? Obviously, if you are describing atoms within a protein molecule, this is not of such great importance. But if you are looking at networks at the signalling level, the strengths of interactions may be crucial. Interestingly, there are some modelling results suggesting that for some gene networks it is the topology and not the strengths of connections that is responsible for the behaviour of the network (von Dassow et al 2000).

Kanehisa: We see this database as the starting point of giving you all candidates. By using this database and then screening it is possible to identify subsets of candidates. If you have additional information, this may help identify subsets among the results. Then you can start incorporating kinetic parameters and so forth.

Crampin: As you go up in scale from purely molecular data, you also need to include spatial information. Are there clear ways of doing this?

Kanehisa: This can be done. We showed the distinction of organism-specific pathways by colouring. The spatial information can be included by different colouring or by drawing different diagrams.

Subramaniam: From your graphs can you define modules for pathways that can then be used for modelling at higher levels? Is there an automatic emergence of the natural definition of ‘module’?

Kanehisa: Yes. The reason why we are able to find graph features such as hubs and cliques is that the graph can be viewed at a lower resolution. We are trying to find a composite node or a module that can be used as a higher-level node in modelling.

Berridge: So if you put Ras into your model, would it predict the MAP kinase pathway?

Kanehisa: I’m not sure. First, we need a kinetics scheme among modules, which is not present in our graph. But maybe we can tell you which modules to consider.

Reinhardt: As an example of how this approach might be used, if you have a protein and you don’t know what it does, you can ask this system to give it its biological context. If you think about it, half of the genes in the genome are of unknown function. In the future we will have whole genome Affymetrix-style chips, and this will be a very important tool. We can go to this 50% of unknown genes, run it across a series of tissue samples and then try to see which pathways these genes are involved with and which proteins they are interacting with. This would give us a rough idea of the biological context of these unknown genes.

Reference

von Dassow G, Meir E, Munro EM, Odell GM 2000 The segment polarity network is a robust developmental module. Nature 406:188–192

Bioinformatics of cellular signalling

Shankar Subramaniam and the Bioinformatics Core Laboratory

Departments of Bioengineering and Chemistry and Biochemistry, The University of California at San Diego and The San Diego Supercomputer Center, La Jolla, CA 92037, USA

Abstract. The completion of the human genome sequencing provides a unique opportunity to understand the complex functioning of cells in terms of myriad biochemical pathways. Of special significance are pathways involved in cellular signalling. Understanding how signal transduction occurs in cells is of paramount importance to medicine and pharmacology. The major steps involved in deciphering signalling pathways are: (a) identifying the molecules involved in signalling; (b) figuring out who talks to whom, i.e. deciphering molecular interactions in a context-specific manner; (c) obtaining the spatiotemporal location of the signalling events; (d) reconstructing signalling modules and networks evoked in specific response to input; (e) correlating the signalling response to different cellular inputs; and (f) deciphering cross-talk between signalling modules in response to single and multiple inputs. High-throughput experimental investigations offer the promise of providing data pertaining to the above steps. A major challenge, then, is the organization of this data into knowledge in the form of hypotheses, models and context-specific understanding. The Alliance for Cellular Signaling (AfCS) is a multi-institution, multidisciplinary project and its primary objective is to utilize a multitude of high-throughput approaches to obtain context-specific knowledge of cellular response to input. It is anticipated that the AfCS experimental data in combination with curated gene and protein annotations, available from public repositories, will serve as a basis for reconstruction of signalling networks. It will then be possible to model the networks mathematically to obtain quantitative measures of cellular response. In this paper we describe some of the bioinformatics strategies employed in the AfCS.

2002 ‘In silico’ simulation of biological processes. Wiley, Chichester (Novartis Foundation Symposium 247) p 104–118

‘In Silico’ Simulation of Biological Processes: Novartis Foundation Symposium, Volume 247. Edited by Gregory Bock and Jamie A Goode. Copyright © Novartis Foundation 2002. ISBN: 0-470-84480-9

The response of a mammalian cell to input is mediated by intracellular signalling pathways. Such pathways have been the focus of extensive research ranging from mechanistic biochemistry to pharmacology. The availability of the complete genome sequences portends the potential to provide a detailed parts list from which all signalling networks can eventually be constructed. However, the genome merely provides the constitutive genes and carries no information on the exact state of the protein that manifests function.

In order to map signalling networks in mammalian cells it is desirable to obtain an inventory of the contents of the cell in a spatiotemporal context, such that the presence and concentration of every species is mapped from cellular input to response. The ‘functional states’ of proteins and their interactions then can be constituted into a network which can then serve as a model for computation and further experimental investigations (Duan et al 2002).

The Alliance for Cellular Signaling (AfCS) (http://www.afcs.org) is a multi-institutional, multi-investigator effort aimed at parsing cellular response to input in a context-dependent manner. The major objectives of this effort are to carry out extensive measurements of the parts list of the cell involved in cellular signalling to answer the question of where, when and how proteins parse signals within cells leading to a cellular response. The measurements include ligand screen experiments that provide snapshots of the concentrations of the intracellular second messengers, phosphorylated proteins and gene transcripts after the addition of defined ligand inputs to the cell. Further, protein interaction screens provide a detailed list of interacting proteins and fluorescent microscopy provides the location within the cell where specific events occur. These measurements in conjunction with phenotypic measurements such as movement of B cells in the presence of chemoattractants and contractility in cardiac myocyte cells can provide insights into the intracellular signalling framework.

The ligand screen experiments are expected to provide a measure of similarity of cellular response to different inputs and as a consequence provide insights into the signalling network. The data are publicly disseminated prior to analysis by the AfCS laboratories through the AfCS website (http://www.afcs.org). Further experiments include a variety of interaction screens including yeast two-hybrid and co-immunoprecipitation. It is expected that the combined data from these experiments will provide the input for reconstruction of the signalling network.

Reconstruction of biochemical networks is a complex task. In metabolism, the task is somewhat simplified because of the nature of the network, where each step represents the enzymatic conversion of a substrate into a product (Michal 1999). This is not the case in cellular signalling. The role of each protein in a signalling network is to communicate the signal from one node to the next, and to accomplish this the protein has to be in a defined signalling ‘state’. The state of a signalling molecule is characterized by covalent modifications of the native polypeptide, the substrates/ligands bound to the protein, its state of association with other protein partners, and its location in the cell. A signalling molecule may be a receptor, a channel, an enzyme, or several other functionally defined species, depending on its state. In the process of parsing a signal, a molecule may undergo a transition from one functional state to another. We define the Molecule Pages database, which will provide a catalogue of states of each signalling molecule, such that one can begin to reconstruct signalling pathways with molecules in well-defined states functioning as nodes of a network. Interactions within and between functional states of molecules, as well as transitions between functional states, provide the building blocks for reconstruction of a signalling network. The AfCS experiments will test and validate such interactions and transitions in specific cells of interest.
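The notion of a functional state as a network node can be illustrated with a small data structure. This is a sketch under assumptions, not the Molecule Pages schema: the class names, field names, site labels (`Y1` to `Y4`) and the `RTK` example are hypothetical, chosen to mirror the four characteristics listed above (modifications, bound ligands, partners, location).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeState:
    """One functional state of a signalling molecule (hypothetical model)."""
    molecule: str
    modifications: frozenset = frozenset()  # covalent modifications
    bound_ligands: frozenset = frozenset()  # substrates/ligands bound
    partners: frozenset = frozenset()       # associated protein partners
    location: str = "unspecified"           # location in the cell


@dataclass(frozen=True)
class Transition:
    """A change from one functional state to another."""
    source: MoleculeState
    target: MoleculeState
    trigger: str


# Two states of the same receptor that differ only in phosphorylation.
rtk_two = MoleculeState(
    "RTK",
    modifications=frozenset({"pY1", "pY2"}),
    location="plasma membrane",
)
rtk_four = MoleculeState(
    "RTK",
    modifications=frozenset({"pY1", "pY2", "pY3", "pY4"}),
    location="plasma membrane",
)

step = Transition(rtk_two, rtk_four, trigger="further phosphorylation")

# States are hashable, so they can serve directly as graph nodes.
network = {rtk_two: [step], rtk_four: []}
```

Making the state immutable and hashable means that two states of the same molecule are distinct nodes, so a graph built this way can tell apart states that a gene-level graph would merge.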

The Molecule Pages database

‘Molecule Pages’ are the core elements of a comprehensive, literature-derived object-relational (Oracle) database that will capture qualitative and quantitative information about a large number of signalling molecules and the interactions between them. The Molecule Pages contain data from all relevant public repositories and curated data from published literature entered by expert authors. Authors will construct Molecule Pages by entry of information from the literature into Web-based forms designed to standardize data input. The principal barrier to constructing a database such as this lies in the complex vocabulary used by biologists to define entities relating to a molecule. The database can only be useful if it is founded on a structured vocabulary along with defined relationships between objects that constitute the database (Carlis & Maguire 2001). The building of this ‘schema’ thus is the first step towards the reconstruction of signalling networks. The schema for sequence and other annotation data obtained from public data repositories is presented below. A detailed schema for the author-curated data will be presented elsewhere.

Automated data for Molecule List and Molecule Pages

The automated data component of each Molecule Page comprises information obtained from external database records related in some way to the specific AfCS protein. This includes SwissProt, GenBank, LocusLink, Pfam, PRINTS and InterPro data as well as Blast analysis results from comparing against a non-redundant set of sequence databases (created by the AfCS bioinformatics group).

Generation of Protein List sequences

Protein and nucleic numbers are read on a nightly basis from the AfCS Protein List (by a Perl program), and they are used to scan the NCBI Fasta databases to find the sequences. A tool that reports back information and any discrepancies (based on the GI numbers that were assigned) is available for use by the Protein List editors. Fasta files for all AfCS proteins and nucleotides are generated, with coded headers that allow us to tie each sequence to its AfCS ID. The Fasta files as well as a text file containing a spreadsheet-like view of the AfCS Protein List can be downloaded by the public from an anonymous ftp server. The Fasta protein file is used as the basis for further analysis.
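The lookup step described above can be sketched as follows. The original pipeline is a Perl program whose details are not given here, so this Python version is only an illustration: the function names, the `AFCS_ID|gi|NUMBER` header coding, the identifiers and the simplified Fasta headers (GI number as first token) are all assumptions.

```python
def parse_fasta(text):
    """Parse Fasta text into {first header token: sequence}."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def build_protein_fasta(protein_list, source):
    """Emit coded-header Fasta text plus a list of GI numbers not found."""
    lines, missing = [], []
    for afcs_id, gi in protein_list:
        sequence = source.get(gi)
        if sequence is None:
            missing.append((afcs_id, gi))  # discrepancy for the editors
            continue
        lines.append(f">{afcs_id}|gi|{gi}")  # hypothetical header coding
        lines.append(sequence)
    return "\n".join(lines), missing


# Hypothetical source database and Protein List entries.
source_text = """>1001 example protein one
MKTAYIAK
QRQISFVK
>1002 example protein two
MSDNELLR
"""
protein_list = [("A000001", "1001"), ("A000002", "1002"), ("A000003", "9999")]

fasta_out, missing = build_protein_fasta(protein_list, parse_fasta(source_text))
```

Returning the unmatched entries separately corresponds to the discrepancy report mentioned in the text, while the coded header keeps each sequence tied to its list identifier.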

All AfCS data are stored in Oracle tables, keyed on the Protein GI number. Links are provided to NCBI. A database is used to store information to allow each sequence to be imported into the Biology Workbench for further analysis. This process is run about once a month, and consists of a set of Perl programs, which launch the various jobs, parse the output, and load the parsed output into the Oracle database.

Supporting databases for Molecule Pages

In order to support all the annotation, entire copies of each relevant database are mirrored in flat file form on the Alliance Information Management System. These databases include GenBank, RefSeq, SwissProt/TrEMBL/TrEMBLnew, LocusLink, MGDB (Mouse Genome Database from Jackson Laboratories), PIR, PRINTS, Pfam, InterPro, and the NCBI Blastable non-redundant protein database ‘NCBI-NR’. These databases are updated every day, if changes in the parent repositories are detected. Some of the databases (or sections of the databases) are converted to a relational form and uploaded to the Oracle system to make the analysis system more efficient.

The NCBI-NR database contains all the translations from GenBank, PIR sequences, and SwissProt sequences. It does not contain information on TrEMBL sequences, however, and many public databases contain SwissProt/TrEMBL references exclusively. This necessitated the construction of an in-house combined non-redundant database, called ‘CNR’ for short.

In addition to database links, title information and the sequence, the CNR database contains date information (last update of the sequence) and NCBI taxonomy ID where available. The database also contains the sequences SwissProt/TrEMBL classify as splice variants, variants and conflicts (these are generally features within those records, so a special parser provided by SwissProt is used to generate those variant sequences). A Perl program constructs this database on a weekly basis, and a combination of a Perl/DBI script and Oracle sqlldr is used to load the database to the Alliance Information Management Oracle System.
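A combined non-redundant set of this kind can be sketched by merging records from several sources and collapsing identical sequences into one entry that keeps every database link. The text does not say how redundancy was defined for CNR, so exact sequence identity is an assumption here, as are the function name, the record layout and the accession strings.

```python
def build_cnr(sources):
    """Merge {database: [(accession, sequence), ...]} into unique entries."""
    merged = {}
    for db_name, records in sources.items():
        for accession, sequence in records:
            key = sequence.upper()  # assumption: redundancy = identical sequence
            entry = merged.setdefault(key, {"sequence": key, "links": []})
            entry["links"].append((db_name, accession))
    return list(merged.values())


# Hypothetical records: one sequence appears in both sources.
sources = {
    "NCBI-NR": [("nr_0001", "MKTAYIAK"), ("nr_0002", "MSDNELLR")],
    "TrEMBL": [("tr_0001", "mktayiak"), ("tr_0002", "MAAPLLQW")],
}

cnr = build_cnr(sources)
```

Keeping the list of links on each merged entry preserves references that exist only in one source, which is the gap that motivated the combined database.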

The interface pages are logical groups of the automated data, and are subject to rearrangement and reclassification. Making changes will have no effect on the underlying schema or the methods for obtaining the data. Examples of schema for automated data, employed in the Molecule Pages database, for annotating GenBank, SwissProt, LocusLink and Motif and Domain data are shown in Figs 1–3.

Design of the Signalling Database and Analysis System

The Molecule Pages will serve as a component of the large Signalling Database and Analysis System. This system would have the capability to compare automated and experimental data to elucidate the network components and connectivities in a context-dependent manner. Thus, we can use our biological knowledge of the putative signalling pathways and concomitant protein interactions to interrogate large-scale experimental data. The analysis of the data can then serve to form a refined pathway hypothesis and, as a consequence, suggest new experiments.

The process of construction of pathway models requires the assembly of an extended signalling database and analysis system. The main components of such a system are a pathway graphical user interface (GUI) for representing both legacy and reconstructed pathways, an underlying data structure that can parse the objects in the GUI into database objects, a signalling pathway database (in Oracle), analysis links between the signalling GUI and other databases, and links to systems analysis and modelling tools.

The components of the Signalling Database and Analysis System include:

(a) Creation of an integrated signalling GUI and database system

(b) Design of a system for testing legacy pathways against AfCS experimental data

(c) Reconstruction of signalling pathways

(d) Creation of tools for validation of pathway models

An overview of an integrated signalling database environment is presented in Fig 4.

Computer science strategies

Development of an integrated system of this nature requires the amalgamation of four separate pieces, namely Java, Oracle, Enterprise Java Beans (EJB) and XML (eXtensible Markup Language). We envision an application based on a three-tier paradigm, consisting of the following components.

System architecture. The system is based on a three-tier architecture (Tsichritzis & Klug 1978), as illustrated in the following diagram (Fig 5). An Oracle 9i database server is connected through a middle tier, Oracle application server (OAS) 9i, from a client web browser or a stand-alone application using Java Swing. OAS 9i can reduce the number of database connections from clients by combining them before connecting to the database server. Java Servlets, Java Server Pages (JSP), Java Beans and/or EJB are used to separate business logic and presentation for a dynamic web interface. In the business logic middle tier, Java Beans and EJB are used. With its object-oriented features and component-oriented programming, the Java API benefits our interface development.
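The separation of tiers can be illustrated with a deliberately small sketch. It does not reproduce the Oracle and Java stack described above: Python's standard `sqlite3` module stands in for the database server, and the class names, table layout and sample row are hypothetical. The only point shown is that the presentation code never touches the database directly and that the middle tier shares one connection.

```python
import sqlite3


class DataTier:
    """Database tier: owns the single shared connection (stand-in for Oracle)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE molecule (gi TEXT PRIMARY KEY, name TEXT)"
        )

    def insert(self, gi, name):
        self.conn.execute("INSERT INTO molecule VALUES (?, ?)", (gi, name))

    def fetch(self, gi):
        cursor = self.conn.execute(
            "SELECT gi, name FROM molecule WHERE gi = ?", (gi,)
        )
        return cursor.fetchone()


class MiddleTier:
    """Business-logic tier: all clients go through this one object."""

    def __init__(self, data):
        self.data = data

    def describe(self, gi):
        row = self.data.fetch(gi)
        if row is None:
            return {"found": False, "gi": gi}
        return {"found": True, "gi": row[0], "name": row[1]}


def render(result):
    """Presentation tier: formats a result and knows nothing about SQL."""
    if not result["found"]:
        return f"{result['gi']}: not found"
    return f"{result['gi']}: {result['name']}"


data = DataTier()
data.insert("1001", "example kinase")  # hypothetical record
middle = MiddleTier(data)

page_hit = render(middle.describe("1001"))
page_miss = render(middle.describe("9999"))
```

Because every request passes through `MiddleTier`, the number of database connections stays at one regardless of how many presentation calls are made, which loosely mirrors the connection-combining role attributed to the application server.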

Communication between the Swing client and the middle tier will be through EJB components or via HTTP by talking to servlets/JSP. The latter allows easy navigation through firewalls, while the former allows the client to call the server using intuitive method names, obviates the need for XML parsing, and automatically gives remote access and load-balancing. XML (Quin 2001) will be used for
