SECOND EDITION
Darren J. Wilkinson
Praise for the First Edition
“…well suited as an in-depth introduction into stochastic chemical simulation,
both for self-study or as a course text…”
—Biomedical Engineering Online, December 2006
Since the first edition of Stochastic Modelling for Systems Biology, there have
been many interesting developments in the use of “likelihood-free” methods
of Bayesian inference for complex stochastic models. Re-written to reflect this
modern perspective, this second edition covers everything necessary for a good
appreciation of stochastic kinetic modelling of biological networks in the systems
biology context.
In keeping with the spirit of the first edition, all of the new theory is presented in a
very informal and intuitive manner, keeping the text as accessible as possible to
the widest possible readership.
New in the Second Edition
• All examples have been updated to Systems Biology Markup Language
Level 3
• All code relating to simulation, analysis, and inference for stochastic kinetic
models has been rewritten and restructured in a more modular way
• An ancillary website provides links, resources, errata, and up-to-date
information on installation and use of the associated R package
• More background material on the theory of Markov processes and
stochastic differential equations, providing more substance for
mathematically inclined readers
• Discussion of some of the more advanced concepts relating to stochastic
kinetic models, such as random time change representations, Kolmogorov
equations, Fokker–Planck equations and the linear noise approximation
• Simple modelling of “extrinsic” and “intrinsic” noise
An effective introduction to the area of stochastic modelling in computational
systems biology, this new edition adds additional mathematical detail and
computational methods which will provide a stronger foundation for the
development of more advanced courses in stochastic biological modelling.
Bioinformatics
Second Edition
Stochastic Modelling for Systems Biology
SECOND EDITION
CHAPMAN & HALL/CRC
Mathematical and Computational Biology Series
Aims and scope:
This series aims to capture new developments and summarize what is known
over the entire spectrum of mathematical and computational biology and
medicine. It seeks to encourage the integration of mathematical, statistical,
and computational methods into biology by publishing a broad range of
textbooks, reference works, and handbooks. The titles included in the
series are meant to appeal to students, researchers, and professionals in the
mathematical, statistical and computational sciences, fundamental biology
and bioengineering, as well as interdisciplinary researchers involved in the
field. The inclusion of concrete examples and applications, and programming
techniques and examples, is highly encouraged.
School of Computer Science
Tel Aviv University
Maria Victoria Schneider
European Bioinformatics Institute
Mona Singh
Department of Computer Science
Princeton University
Anna Tramontano
Department of Biochemical Sciences
University of Rome La Sapienza
Proposals for the series should be submitted to one of the series editors above or directly to:
CRC Press, Taylor & Francis Group
4th Floor, Albert House
1-4 Singer Street
London EC2A 4BQ
UK
Ehud Lamm and Ron Unger
Biological Sequence Analysis Using
the SeqAn C++ Library
Andreas Gogol-Döring and Knut Reinert
Cancer Modelling and Simulation
Luigi Preziosi
Cancer Systems Biology
Edwin Wang
Cell Mechanics: From Single
Scale-Based Models to Multiscale Modeling
Arnaud Chauvière, Luigi Preziosi,
and Claude Verdier
Clustering in Bioinformatics and Drug
Discovery
John D. MacCuish and Norah E. MacCuish
Combinatorial Pattern Matching
Algorithms in Computational Biology
Using Perl and R
Differential Equations and Mathematical
Biology, Second Edition
D.S. Jones, M.J. Plank, and B.D. Sleeman
Dynamics of Biological Systems
Sergei V. Petrovskii and Bai-Lian Li
Gene Expression Studies Using Affymetrix Microarrays
Hinrich Göhlmann and Willem Talloen
Glycome Informatics: Methods and Applications
Golan Yona
Introduction to Proteins: Structure, Function, and Motion
Amit Kessel and Nir Ben-Tal
An Introduction to Systems Biology:
Design Principles of Biological Circuits
Uri Alon
Kinetic Modelling in Systems Biology
Oleg Demin and Igor Goryanin
Knowledge Discovery in Proteomics
Igor Jurisica and Dennis Wigle
Meta-analysis and Combining Information in Genetics and Genomics
Rudy Guerra and Darlene R. Goldstein
Methods in Medical Informatics:
Fundamentals of Healthcare Programming in Perl, Python, and Ruby
Normal Mode Analysis: Theory and
Applications to Biological and Chemical
Systems
Qiang Cui and Ivet Bahar
Optimal Control Applied to Biological
Models
Suzanne Lenhart and John T. Workman
Pattern Discovery in Bioinformatics:
Theory & Algorithms
Spatiotemporal Patterns in Ecology
and Epidemiology: Theory, Models,
Stochastic Modelling
for Systems Biology
SECOND EDITION
Darren J. Wilkinson
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 2011926
International Standard Book Number-13: 978-1-4398-3776-4 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
10.4 Diffusion approximations for inference
List of figures
with the continuous deterministic solution for four different (λ, µ)
resampling procedure, showing good agreement with the theoretical
continuous time Markov chain with transition rate matrix Q and
5.10 A single realisation of the diffusion approximation to the
5.11 R code for simulating the diffusion approximation to the
6.10 An R function to discretise the output of gillespie onto a regular
6.11 An R function to implement the Gillespie algorithm for a SPN,
6.12 An R function which accepts as input an SPN, and returns as output
a function (closure) for advancing the state of the SPN using the
6.13 An R function to simulate a process on a regular time grid using a
6.14 R code showing how to use the functions StepGillespie and
dimerisation kinetics model. Right: A simulated realisation of the discrete
the dimerisation kinetics model plotted on a concentration scale
(point-wise) “confidence bounds” based on 1,000 runs of the
simulator. Right: Density histogram of the simulated realisations of
P at time t = 10 based on 10,000 runs, giving an estimate of the
Menten kinetics model. Right: Simulated continuous deterministic dynamics of the Michaelis–Menten kinetics model based on the
of the discrete stochastic dynamics of the reduced-dimension
7.11 SBML-shorthand for the reduced dimension Michaelis–Menten
7.12 Left: A simulated realisation of the discrete stochastic dynamics
of the prokaryotic genetic auto-regulatory network model, for a
period of 5,000 seconds Right: A close-up on the first period of 250
7.13 Left: Close-up showing the time-evolution of the number of
on 10,000 runs. Right: Empirical PMF for the prior predictive
7.15 SBML-shorthand for the lac-operon model (discrete stochastic
7.16 A simulated realisation of the discrete stochastic dynamics of the
stochastic Petri net representation of a coupled chemical reaction
stochastic Petri net representation of a coupled chemical reaction
immigration-death process discussed in Section 8.3.4 incorporating different
10.1 An R function to create a function closure for marginal likelihood
10.2 An R session showing how to use the function pfMLLik from
10.3 Simulated time series data set, LVnoise10, consisting of 16
equally spaced observations of a realisation of a stochastic kinetic Lotka–Volterra model subject to Gaussian measurement error with a
10.4 R code implementing an MCMC sampler for fully Bayesian
inference for the stochastic Lotka–Volterra model using time course
10.5 Marginal posterior distributions for the parameters of the Lotka–
10.6 Marginal posterior distributions for the parameters of the Lotka–
10.7 Marginal posterior distributions for the parameters of the Lotka–
10.8 Marginal posterior distributions for the log-parameters of the Lotka–
in 2007. Professor Wilkinson is interested in computational statistics and Bayesian inference and in the application of modern statistical technology to problems in statistical bioinformatics and systems biology. He is involved in a variety of systems biology projects at Newcastle, including the Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN). He currently holds a BBSRC Research Development Fellowship on integrative modelling of stochasticity, noise, heterogeneity and measurement error in the study of model biological systems.
me of a BBSRC Research Development Fellowship (grant number BBF0235451).
In addition, a considerable amount of work on this second edition was carried out during a visit I made to the Statistical and Applied Mathematical Sciences Institute (SAMSI, www.samsi.info) in North Carolina during the spring of 2011, as part of their research programme on the Analysis of Object-Oriented Data.
Particular thanks are also due to all of the students who have been involved in the MSc in bioinformatics and computational systems biology programme at Newcastle, and especially those who took my course on Stochastic Systems Biology, as it was the teaching of that course which persuaded me that it was necessary to write this book.
Last, but by no means least, I would like to thank my family for supporting me in everything that I do.
Preface to the second edition
I was keen to write a second edition of this book even before the first edition was published in the spring of 2006. The first edition was written during the latter half of 2004 and the first half of 2005, when the use of stochastic modelling within computational systems biology was still very much in its infancy. Based on an inter-disciplinary Masters course I was teaching, I saw an urgent need for an introductory textbook in this area, and tried in the first edition to lay down all of the key ingredients needed to get started. I think that I largely succeeded, but the emphasis there was very much on the “bare essentials” and accessibility to non-mathematical readers, and my goal was to get the book published in a timely fashion, in order to help advance the field.
I would like to think that the first edition of this text has played a small role in helping to make stochastic modelling a much more mainstream part of computational systems biology today. But naturally there were many limitations of the first edition. There were several places where I would have liked to have elaborated further, providing additional details likely to be of interest to the more mathematically or statistically inclined reader. Also, the latter chapters on inference from data were rather limited and lacking in concrete examples. This was partly due to the fact that the whole area of inference for stochastic kinetic models was just developing, and so it wasn’t possible to give a coherent overview of the problem from an introductory viewpoint. Since publishing the first edition there have been many interesting developments in the use of “likelihood-free” methods of Bayesian inference for complex stochastic models, and so the latter chapters have now been re-written to reflect this more modern perspective, including a detailed case study accompanied by working code examples.
Of course the whole field has moved on considerably since 2005, and so the second edition is also an opportunity to revise and update, and to change the emphasis of the text slightly. The Systems Biology Markup Language (SBML) has continued to evolve, and SBML Level 3 is now finalised. Consequently, I have updated all of the examples to Level 3, which is likely to remain the standard encoding for dynamic biological models for the foreseeable future. I have also taken the opportunity to revise and update the R code examples associated with the book, and to bundle them all together as an R package (smfsb). This should make it much easier for people to try out the examples given in the book. I have also re-written and re-structured all of the code relating to simulation, analysis and inference for stochastic kinetic models. The code is now structured in a more modular way (using a functional programming style), making it easy to “bolt together” different models, simulation algorithms, and analysis tools. I’ve created a new website specific to this second edition, where I will keep links, resources, errata, and up-to-date information on installation and use of the associated R package.
The new edition contains more background material on the theory of Markov processes and stochastic differential equations, providing more substance for mathematically inclined readers. This allows discussion of some of the more advanced concepts relating to stochastic kinetic models, such as random time-change representations, Kolmogorov equations, Fokker–Planck equations and the linear noise approximation. It also enables simple modelling of “extrinsic” in addition to “intrinsic” noise. This should make the text suitable for use in a greater range of courses. Naturally, in keeping with the spirit of the first edition, all of the new theory is presented in a very informal and intuitive way, in order to keep the text accessible to the widest possible readership. This is not a rigorous text on the theory of Markov processes (there are plenty of other good texts in that vein); the book is still intended for use in courses for students with a life sciences background.
I’ve also updated the references, and provided new pointers to recent publications in the literature where this is especially pertinent. However, it should be emphasised that the book is not intended to provide a comprehensive survey of the stochastic systems biology literature: I don’t think that is necessary (or even helpful) for an introductory textbook, and I hope that people working in this area accept this if I fail to cite their work.
So here it is, the second edition, completed at last. I hope that this text continues to serve as an effective introduction to the area of stochastic modelling in computational systems biology, and that this new edition adds additional mathematical detail and computational methods which will provide a stronger foundation for the development of more advanced courses in stochastic biological modelling.
Darren Wilkinson
Newcastle upon Tyne
Preface to the first edition
Stochastic models for chemical and biochemical reactions have been around for a long time. The standard algorithm for simulating the dynamics of such processes on a computer (the “Gillespie algorithm”) was published nearly 30 years ago (and most of the relevant theory was sorted out long before that). In the meantime there have been dozens of papers published on stochastic kinetics, and several books on stochastic processes in physics, chemistry and biology. Biological modelling and biochemical kinetic modelling have been around even longer. These distinct subjects have started to merge in recent years as technology has begun to give real insight into intra-cellular processes. Improvements in experimental technology are enabling quantitative real-time imaging of expression at the single-cell level, and improvement in computing technology is allowing modelling and stochastic simulation of such systems at levels of detail previously impossible. The message that keeps being repeated is that the kinetics of biological processes at the intra-cellular level are stochastic, and that cellular function cannot be properly understood without building that stochasticity into in silico models. It was this message that first interested me in systems biology and in the many challenging statistical problems that follow naturally from this observation.
It was only when I came to try and teach this interesting view of computational systems biology to graduate students that I realised there was no satisfactory text on which to base the course. The papers assumed far too much background knowledge, the standard biological texts didn’t cover stochastic modelling, and the stochastic processes texts were too far removed from the practical applications of systems biology, paying little attention to stochastic biochemical kinetics. Where stochastic models do crop up in the mainstream systems biology literature, they tend to be treated as an add-on or after-thought, in a slightly superficial way. As a statistician I see this as problematic. The stochastic processes formalism provides a beautiful, elegant, and coherent foundation for chemical kinetics, and there is a wealth of associated theory every bit as powerful and elegant as that for conventional continuous deterministic models. Given the increasing importance of stochastic models in systems biology, I thought it would be particularly appropriate to write an introductory text in this area from this perspective.
This book assumes a basic familiarity with what might be termed high school mathematics. That is, a basic familiarity with algebra and calculus. It is also helpful to have had some exposure to linear algebra and matrix theory, but not vital. Since the teaching of probability and statistics at school is fairly patchy, essentially nothing will be assumed, though obviously a good background in this area will be very helpful. Starting from here, the book covers everything that is necessary for a good appreciation of stochastic kinetic modelling of biological networks in the systems biology context. There is an emphasis on the necessary probabilistic and stochastic methods, but the theory is rooted in the intended application, and no time is wasted covering interesting theory that is not necessary for stochastic kinetic modelling. On the other hand, more-or-less everything that is necessary is covered, and the text (at least up to Chapter 8) is intended to be self-contained. The final chapters are necessarily a little more technical in nature, as they concern the difficult problem of inference for stochastic kinetic models from experimental data. This is still an active research area, and so the main aim here is to give pointers to the existing literature and provide enough background information to render that literature more accessible to the non-specialist.
The decision to make the book practically oriented necessitated some technological choices that will not suit everyone. The two key technologies chosen for illustrating the theory in this book are SBML and R. I hope that the choice of the Systems Biology Markup Language (SBML) for model representation is not too controversial. It is the closest thing to a standard that exists in the systems biology area, and there are dozens of software tools that support it. Of course, most people using SBML are using it to encode continuous deterministic models. However, SBML Level 2 and beyond are perfectly capable of encoding discrete stochastic models, and so one of the reasons for using it in this text is to provide some working examples of SBML models constructed with discrete stochastic simulation in mind.
The other technological choice was the use of the statistical programming language, R. This is likely to be more controversial, as there are plenty of other languages that could have been used. It seemed to me that in the context of a textbook, using a very high-level language was most appropriate. In the context of stochastic modelling, a language with good built-in mathematical and statistical support also seemed highly desirable. R stood out as being the best choice in terms of built-in language support for stochastic simulation and statistical analysis. It also has the great advantage over some of the other possible choices of being completely free open-source software, and therefore available to anyone reading the book. In addition, R is being used increasingly in bioinformatics and other areas of systems biology, so hopefully for this reason too it will be regarded as a positive choice.
This book is intended for a variety of audiences (advanced undergraduates, graduate students, postdocs, and academics from a variety of backgrounds), and exactly how it is read will depend on the context. The book is certainly suitable for a variety of graduate programs in computational biology. I will be using it as a text for a second-semester course on a masters in bioinformatics programme that will cover much of Chapters 1, 2, 4, 5, 6, and 7 (most of the material from Chapter 3 will be covered in a first-semester course). However, the book has also been written with self-study in mind, and here it is intended that the entire book be read in sequence, with some chapters skipped depending on background knowledge. It is intended to be suitable for computational systems biologists from a continuous deterministic background who would like to know more about the stochastic approach, as well as for statisticians who are interested in learning more about systems biology (though statisticians will probably want to skip most of Chapters 3, 4, and 5). It is worth pointing out that Chapters 9 and 10 will be easier to read with a reasonable background in probability and statistics. Though it should be possible to read these chapters without such a background and still appreciate the key concepts and ideas, some of the technical details may be difficult to understand fully from an elementary viewpoint.
Writing this book has been more effort than I anticipated, and there were one or two moments when I doubted the wisdom of taking the project on. On the whole, however, it has been an interesting and rewarding experience for me, and I am pleased with the result. I know it is a book that I will find useful and will refer to often, as it integrates a fairly diverse literature into a single convenient and notationally consistent source. I can only hope that others share the same view, and that the book will help make a stochastic approach to computational systems biology more widely appreciated.
Darren Wilkinson
Newcastle upon Tyne, October 2005
PART I
Modelling and networks
The first issue to confront when embarking on a modelling project is to decide on
exactly which features to include in the model, and in particular, the level of detail
the model is intended to capture. So, a model of an entire organism is unlikely to describe the detailed functioning of every individual cell, but a model of a cell is likely to include a variety of very detailed descriptions of key cellular processes. Even then, however, a model of a cell is unlikely to contain details of every single gene and protein.
Fortunately, biologists are used to thinking about processes at different scales and different levels of detail. Consider, for example, the process of photosynthesis. When studying photosynthesis for the first time at school, it is typically summarised by a single chemical reaction mixing water with carbon dioxide to get glucose and oxygen (catalysed by sunlight). This could be written very simply as
$$\text{Water} + \text{Carbon dioxide} \xrightarrow{\text{Sunlight}} \text{Glucose} + \text{Oxygen},$$
or more formally by replacing the molecules by their chemical formulae and balancing to get
$$6\,\mathrm{H_2O} + 6\,\mathrm{CO_2} \longrightarrow \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}.$$
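As a quick illustration of what “balancing” means here, the following Python sketch counts the atoms of each element on both sides of the formal equation. The composition table and the helper name atom_count are illustrative assumptions introduced for this example, not part of the text.

```python
from collections import Counter

# Atoms per molecule for each species in the summary reaction
COMPOSITION = {
    "H2O": Counter({"H": 2, "O": 1}),
    "CO2": Counter({"C": 1, "O": 2}),
    "C6H12O6": Counter({"C": 6, "H": 12, "O": 6}),
    "O2": Counter({"O": 2}),
}

def atom_count(side):
    """Total atoms of each element on one side, given {species: coefficient}."""
    total = Counter()
    for species, coefficient in side.items():
        for element, n in COMPOSITION[species].items():
            total[element] += coefficient * n
    return total

reactants = {"H2O": 6, "CO2": 6}
products = {"C6H12O6": 1, "O2": 6}

print(atom_count(reactants) == atom_count(products))  # True: the equation balances
```

Both sides come to 6 carbon, 12 hydrogen and 18 oxygen atoms; dropping the coefficients (one molecule of each species) makes the check fail.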
Of course, further study reveals that photosynthesis consists of many reactions, and that the single reaction was simply a summary of the overall effect of the process. However, it is important to understand that the above equation is not really wrong; it just represents the overall process at a higher level than the more detailed description that biologists often prefer to work with. Whether a single overall equation or a full breakdown into component reactions is necessary depends on whether intermediaries such as ADP and ATP are elements of interest to the modeller. Indeed, really accurate modelling of the process would require a model far more detailed and complex than most biologists would be comfortable with, using molecular dynamic simulations that explicitly manage the position and momentum of every molecule in the system.
The “art” of building a good model is to capture the essential features of the biology without burdening the model with non-essential details. Every model is to some extent a simplification of the biology, but models are valuable because they take ideas that might have been expressed verbally or diagrammatically and make them more explicit, so that they can begin to be understood in a quantitative rather than purely qualitative way.
1.2 Aims of modelling
The features of a model depend very much on the aims of the modelling exercise. We therefore need to consider why people model and what they hope to achieve by so doing. Often the most basic aim is to make clear the current state of knowledge regarding a particular system, by attempting to be precise about the elements involved and the interactions between them. Doing this can be a particularly effective way of highlighting gaps in understanding. In addition, having a detailed model of a system allows people to test that their understanding of a system is correct, by seeing if the implications of their models are consistent with observed experimental data. In practice, this model validation stage is central to the systems biology approach. However, this work will often represent only the initial stage of the modelling process. Once people have a model they are happy with, they often want to use their models to conduct “virtual experiments” that might be difficult or impossible to do in the lab. Such experiments may uncover important indirect relationships between model components that would be hard to predict otherwise. An additional goal of modern biological modelling is to pool a number of small models of well-understood mechanisms into a large model in order to investigate the effect of interactions between the model components. Models can also be extremely useful for informing the design and analysis of complex biological experiments.
In summary, modelling and computer simulation are becoming increasingly important in post-genomic biology for integrating knowledge and experimental data and making testable predictions about the behaviour of complex biological systems.
1.3 Why is stochastic modelling necessary?
Ignoring quantum mechanical effects, current scientific wisdom views biological systems as essentially deterministic in character, with dynamics entirely predictable given sufficient knowledge of the state of the system (together with complete knowledge of the physics and chemistry of interacting biomolecules). At first this perhaps suggests that a deterministic approach to the modelling of biological systems is likely to be successful. However, despite the rapid advancements in computing technology, we are still a very long way away from a situation where we might expect to be able to model biological systems of realistic size and complexity over interesting time scales using such a molecular dynamic approach. We must therefore use models that leave out many details of the “state” of a system (such as the position, orientation, and momentum of every single molecule under consideration), in favour of a higher-level view. Viewed at this higher level, the dynamics of the system are not deterministic, but intrinsically stochastic, and consideration of statistical physics is necessary to uncover the precise nature of the stochastic processes governing the system dynamics.
A more detailed discussion of this issue will have to be deferred until much later in the book, once the appropriate concepts and terminology have been established. In the meantime, it is helpful to highlight the issues using a very simple example that illustrates the importance of stochastic modelling, both for simulation and inference.
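The example we will consider is known as the linear birth–death process. In the first instance, it is perhaps helpful to view this as a model for the number of bacteria in a bacterial colony. It is assumed that each bacterium in the colony gives rise to new individuals at rate λ, and that each bacterium dies at rate µ. These definitions are informal, and we will define such things much more precisely later. Let the number of bacteria in the colony at time t be X(t), with known initial size X(0) = x0. Viewed as a continuous deterministic model, bacteria are produced at rate λX(t) and die at rate µX(t), so the overall rate of change of bacteria numbers is (λ − µ)X(t). This description of the system leads directly to the ordinary differential equation
$$\frac{dX(t)}{dt} = (\lambda - \mu)\,X(t),$$
which can be solved analytically to give the complete dynamics of the system as
$$X(t) = x_0 \exp\{(\lambda - \mu)t\}.$$
So the population grows exponentially if λ > µ, decays exponentially if λ < µ, and remains at size x0 if λ = µ; five such solutions are given in Figure 1.1. There are other things worth noting about this solution. In particular, the solution clearly only depends on λ − µ and not on the particular values that λ and µ take (so, for example, λ = 0.5, µ = 0 will lead to exactly the same solution as λ = 1, µ = 0.5). In some sense, therefore, λ − µ (together with x0) is a sufficient description of the system dynamics. At first this might sound like a good thing, but it is clear that there is a flip-side: namely that studying experimental data on bacteria numbers can only provide information about λ − µ, and not about λ and µ separately, as the data can only provide information about the “shape” of the curve, and the shape of the curve is determined by λ − µ. This illustrates a key problem for network inference, as well as inference for rate constants. It is clear that in this model we cannot know from experimental data if we have a pure birth or death process, or a process involving both births and deaths, as it is not possible to know if λ or µ is zero.∗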
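A minimal Python sketch of the analytic solution makes both observations concrete: two parameter pairs with the same λ − µ give identical values, and x0 = 50, λ = 3, µ = 4 gives X(2) ≈ 6.77. The helper name deterministic_birth_death is hypothetical and is not taken from the book's R code.

```python
import math

def deterministic_birth_death(t, x0, lam, mu):
    """Analytic solution X(t) = x0 * exp((lam - mu) * t) of dX/dt = (lam - mu) * X."""
    return x0 * math.exp((lam - mu) * t)

# Two parameter pairs sharing the same net rate lam - mu = 0.5
a = deterministic_birth_death(3.0, 50, lam=0.5, mu=0.0)
b = deterministic_birth_death(3.0, 50, lam=1.0, mu=0.5)
print(math.isclose(a, b))  # True: only lam - mu enters the solution

# x0 = 50, lam = 3, mu = 4 at t = 2
print(round(deterministic_birth_death(2.0, 50, lam=3.0, mu=4.0), 2))  # 6.77
```

The function deliberately takes λ and µ separately, even though only their difference is used, so that the identifiability problem is visible in the call sites.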
∗ This also illustrates another point that is not widely appreciated: the fact that reliable network inference is necessarily more difficult than rate-parameter inference, as determining the existence of a reaction is equivalent to deciding whether the rate of that reaction is zero.

Figure 1.1 Five deterministic solutions of the linear birth–death process for values of λ − µ given in the legend (x0 = 50).

The problem, of course, is that bacteria don’t vary in number continuously and deterministically. They vary discretely and stochastically. Using the techniques that will be developed later in this book, it is straightforward to understand the stochastic process associated with this model as a Markov jump process, and to simulate it on a computer. By their very nature, such stochastic processes are random, and each time they are simulated they will look different. In order to understand the behaviour of such processes it is therefore necessary (in general) to study many realisations of the process. Five realisations are given in Figure 1.2, together with the corresponding deterministic solution.

Figure 1.2 Five realisations of a stochastic linear birth–death process together with the continuous deterministic solution (x0 = 50, λ = 3, µ = 4).

It is immediately clear that the stochastic realisations exhibit much more interesting behaviour and match much better with the kind of experimental data one is likely to encounter. They also allow one to ask questions and get answers to issues that can’t be addressed using a continuous deterministic model. For example, according to the deterministic model with the parameter values of Figure 1.2, $X(2) = 50e^{-2} \simeq 6.77$. Even leaving aside the fact that this is not an integer, we see from the stochastic realisations that there is considerable uncertainty about the value of X(2), and stochastic simulation allows us to construct, inter alia, a likely range of values for X(2). Similarly, the deterministic solution $X(t) = 50e^{-t}$ never reaches zero, so the deterministic model predicts that the population never goes extinct, whereas we see from the stochastic realisations that these do go extinct, and that there is considerable randomness associated with the time that this occurs. Again, stochastic simulation will allow us to understand the distribution associated with the time to extinction, something that simply isn’t possible using a deterministic framework.
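As a rough preview of what such a simulation involves (the proper treatment comes later in the book), the following Python sketch simulates the linear birth–death process with Gillespie's direct method, a standard exact simulation algorithm for Markov jump processes. It assumes the parameter values of Figure 1.2 and uses many realisations to estimate a likely range for X(2) and the distribution of the extinction time. The function names, random seed, horizon t_max and number of runs are arbitrary choices made for this sketch.

```python
import bisect
import math
import random
import statistics

def simulate_bd(x0, lam, mu, t_max, rng):
    """One realisation of the linear birth-death Markov jump process.

    Gillespie's direct method: in state x the total event rate is
    (lam + mu) * x, the waiting time to the next event is exponential
    with that rate, and the event is a birth with probability
    lam / (lam + mu). Returns event times and the state after each event.
    """
    times, states = [0.0], [x0]
    t, x = 0.0, x0
    while x > 0:                                   # zero is absorbing (extinction)
        t += rng.expovariate((lam + mu) * x)
        if t > t_max:
            break
        x += 1 if rng.random() < lam / (lam + mu) else -1
        times.append(t)
        states.append(x)
    return times, states

def state_at(times, states, t):
    """Value of the piecewise-constant sample path at time t."""
    return states[bisect.bisect_right(times, t) - 1]

rng = random.Random(1)
x0, lam, mu = 50, 3.0, 4.0        # parameter values of Figure 1.2
n_runs = 1000

x_at_2, extinction_times = [], []
for _ in range(n_runs):
    times, states = simulate_bd(x0, lam, mu, t_max=100.0, rng=rng)
    x_at_2.append(state_at(times, states, 2.0))
    if states[-1] == 0:
        extinction_times.append(times[-1])

q = statistics.quantiles(x_at_2, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
lo, hi = q[0], q[-1]

print("deterministic X(2):", round(x0 * math.exp((lam - mu) * 2.0), 2))
print("simulated mean of X(2):", statistics.mean(x_at_2))
print("approximate 95% range for X(2):", lo, "to", hi)
print("fraction extinct by t = 100:", len(extinction_times) / n_runs)
print("median extinction time:", statistics.median(extinction_times))
```

The simulated mean of X(2) should sit close to the deterministic value, but the spread of X(2) and of the extinction times is information that the deterministic solution cannot provide.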
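Another particularly noteworthy feature of the stochastic process representation is that its realisations depend on λ and µ separately, and not only on λ − µ. Figure 1.3 shows realisations for four different (λ, µ) combinations that all share λ − µ = −1: the difference λ − µ governs the average behaviour, whereas λ + µ controls the degree of “noise” or “volatility” in the system. This is a critically important point to understand: it tells us that if stochastic effects are present in the system, we cannot properly understand the system dynamics unless we know both λ and µ. Consequently, we cannot simply fit a deterministic model to data and then use the fitted parameters for stochastic simulation, as it is not possible to infer the stochastic rate constants using a deterministic model. This has important implications for the use of stochastic models for inference from experimental data. It suggests that given some data on the variation in colony size over time, it ought to be possible to infer λ and µ separately, which is exactly what we are interested in (as well as inferring network structure, as we could also test to see whether λ or µ is zero). It also suggests that stochastic models should be used at the inference stage of the process.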
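The role of λ + µ can also be seen in the first two moments of the process. The following Python sketch uses the standard results $E[X(t)] = x_0 e^{(\lambda-\mu)t}$ and $\operatorname{Var}[X(t)] = x_0 \frac{\lambda+\mu}{\lambda-\mu} e^{(\lambda-\mu)t}\left(e^{(\lambda-\mu)t} - 1\right)$ for the linear birth–death process (they follow from solving ODEs for the first two moments, which are not derived in this section). The four (λ, µ) pairs are illustrative choices with λ − µ = −1; the pairs actually used in Figure 1.3 are not stated in the text, and the function name is hypothetical.

```python
import math

def bd_mean_var(x0, lam, mu, t):
    """Mean and variance of X(t) for the linear birth-death process with X(0) = x0.

    mean     = x0 * exp((lam - mu) t)
    variance = x0 * (lam + mu) / (lam - mu) * exp((lam - mu) t) * (exp((lam - mu) t) - 1)
    When lam == mu the variance takes its limiting value x0 * (lam + mu) * t.
    """
    a = lam - mu
    if a == 0:
        return float(x0), x0 * (lam + mu) * t
    g = math.exp(a * t)
    return x0 * g, x0 * (lam + mu) / a * g * (g - 1.0)

x0, t = 50, 1.0
# Illustrative pairs, all with lam - mu = -1 (hypothetical, not read off Figure 1.3)
pairs = [(0.0, 1.0), (3.0, 4.0), (7.0, 8.0), (10.0, 11.0)]
results = [bd_mean_var(x0, lam, mu, t) for lam, mu in pairs]

for (lam, mu), (mean, var) in zip(pairs, results):
    print(f"lam={lam:4.1f} mu={mu:4.1f} lam+mu={lam + mu:4.1f}  "
          f"mean={mean:7.3f}  sd={math.sqrt(var):6.3f}")
```

All four means coincide exactly, while the standard deviation grows with λ + µ, which is the extra volatility a deterministic fit cannot see.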
Figure 1.3 Five realisations of a stochastic linear birth–death process together with the continuous deterministic solution for four different (λ, µ) combinations, each with λ − µ = −1 and x0 = 50.

Although we have here considered a trivial example, the implications are broad. In particular, they apply to the genetic and biochemical network models that much of this book will be concerned with. This is because genetic and biochemical networks involve the interaction of integer numbers of molecules that react when they collide after random times, driven by Brownian motion. Although it is now becoming increasingly accepted that stochastic mechanisms are important in many (if not most) genetic and biochemical networks, routine use of stochastic simulation in order to understand system dynamics is still not widespread.† This is partly because inference methods regularly used in practice work by fitting continuous deterministic models to experimental data. We have just seen that such methods cannot in general give us reliable information about all of the parameters important for determining the stochastic dynamics of a system, and so stochastic simulation cannot be done reliably until we have good methods of inference for stochastic models. It turns out that it is possible to formalise the problem of inference for stochastic kinetic models from time-course experimental data, and this is the subject matter of the latter chapters. However, it should be pointed out at the outset that inference for stochastic models is an order of magnitude more difficult than inference for deterministic models (in terms of the mathematics required, algorithmic complexity, and computation time), and is still the subject of a great deal of ongoing research.

† But it is a lot more widespread than when the first edition of this book was written.
1.4 Chemical reactions
There are a number of ways one could represent a model of a biological system. Biologists have traditionally favoured diagrammatic schemas coupled with verbal explanations in order to convey qualitative information regarding mechanisms. At the other extreme, applied mathematicians traditionally prefer to work with systems of ordinary or partial differential equations (ODEs or PDEs). These have the advantage of being more precise and fully quantitative, but also have a number of disadvantages. In some sense differential equation models are too low level a description, as they not only encode the essential features of the model, but also a wealth of accompanying baggage associated with a particular interpretation of chemical kinetics that is not always well suited to application in the molecular biology context. Somewhere between these two extremes, the biochemist will tend to view systems as networks of coupled chemical reactions, and it appears that most of the best ways of representing biochemical mechanisms exist at this level of detail, though there are many ways of representing networks of this type. Networks of coupled chemical reactions are sufficiently general that they can be simulated in different ways using different algorithms depending on assumptions made about the underlying kinetics. On the other hand, they are sufficiently detailed and precise that once the kinetics have been specified, they can be used directly to construct full dynamic simulations of the system behaviour on a computer.
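A general chemical reaction takes the form

$$m_1 R_1 + m_2 R_2 + \cdots + m_r R_r \longrightarrow n_1 P_1 + n_2 P_2 + \cdots + n_p P_p,$$

where r is the number of reactants and p is the number of products, R_i represents the ith reactant and P_j the jth product, m_i is the number of molecules of R_i consumed in a single reaction step, and n_j is the number of molecules of P_j produced in a single reaction step. The coefficients m_i and n_j are known as stoichiometries. The stoichiometries are usually (though not always) assumed to be integers, and in this case it is assumed that there is no common factor of the stoichiometries. That is, it is assumed that there is no integer greater than one which exactly divides each stoichiometry on both the left and right sides. There is no assumption that the R_i and P_j are distinct, and it is perfectly reasonable for a given chemical species to appear on both the left- and right-hand sides of a reaction.‡ The reaction equation describes which chemical species§ react, and in what proportions, along with what is produced. For example, the dimerisation of a protein P consumes two molecules of P and produces one molecule of the dimer P_2. This is normally written

$$2P \longrightarrow P_2,$$

as stoichiometries of one are not usually written explicitly. Similarly, the reaction for the dissociation of the dimer would be written

$$P_2 \longrightarrow 2P.$$

‡ Note that a chemical species that occurs on both the left- and right-hand sides with the same stoichiometry is somewhat special, and is sometimes referred to as a modifier. Clearly the reaction will have no effect on the amount of this species. Such a species is usually included in the reaction because the rate at which the reaction proceeds depends on the level of this species.
§ The use of the term “species” to refer to a particular type of molecule will be explained later in the chapter.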
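One way to make the general form concrete is to store each side of a reaction as a map from species name to stoichiometry. The Python sketch below (the Reaction class and its methods are hypothetical, written only for this illustration) computes the net change caused by one occurrence of a reaction, checks the no-common-factor convention, and flags modifiers, that is, species appearing on both sides with the same stoichiometry. The catalysed reaction E + S → E + Q is a made-up example.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd

@dataclass
class Reaction:
    """m1 R1 + ... + mr Rr -> n1 P1 + ... + np Pp, each side stored as {species: stoichiometry}."""
    reactants: dict
    products: dict

    def net_change(self):
        """Net change in each species when the reaction occurs once."""
        species = sorted(set(self.reactants) | set(self.products))
        return {s: self.products.get(s, 0) - self.reactants.get(s, 0) for s in species}

    def modifiers(self):
        """Species on both sides with equal stoichiometry: the reaction leaves them unchanged."""
        return sorted(s for s, m in self.reactants.items() if self.products.get(s) == m)

    def has_no_common_factor(self):
        """True if no integer greater than one divides every stoichiometry on both sides."""
        return reduce(gcd, [*self.reactants.values(), *self.products.values()], 0) == 1

dimerisation = Reaction({"P": 2}, {"P2": 1})               # 2P -> P2
dissociation = Reaction({"P2": 1}, {"P": 2})               # P2 -> 2P
scaled = Reaction({"P": 4}, {"P2": 2})                     # same proportions, common factor 2
catalysed = Reaction({"E": 1, "S": 1}, {"E": 1, "Q": 1})   # hypothetical: E is a modifier

print(dimerisation.net_change())
print(dissociation.net_change())
print(scaled.has_no_common_factor())
print(catalysed.modifiers())
```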
A reaction that can happen in both directions is known as reversible. Reversible reactions are quite common in biology and tend not to be written as two separate reactions. They can be written with a double-headed arrow such as

$$2P \rightleftharpoons P_2.$$

If one direction predominates over the other, this is sometimes emphasised in the notation. So if the above protein prefers the dimerised state, this may be written something like

$$\ce{2P <=>> P2},$$

with the longer arrow pointing in the favoured direction.
It is important to remember that the notation for a reversible reaction is simply a convenient shorthand for the two separate reaction processes taking place. In the
context of the discrete stochastic models to be studied in this book, it will not usually
be acceptable to replace the two separate reactions by a single reaction proceeding
at some kind of overall combined rate.
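Since a reversible reaction is shorthand for two separate reactions, a simulator would normally store it as two entries. This Python sketch (helper names are hypothetical) expands 2P ⇌ P2 into its forward and backward reactions and builds the corresponding stoichiometry matrix, with one row per species and one column per reaction. The two columns are exact negatives of each other, so one forward plus one backward event has no net effect; in a discrete stochastic model, however, each direction produces its own random events, which a single reaction at some net combined rate would not capture.

```python
def expand_reversible(left, right):
    """Split the shorthand left <=> right into the reactions left -> right and right -> left."""
    return [(left, right), (right, left)]

def stoichiometry_matrix(species, reactions):
    """Rows are species, columns are reactions; each entry is the net change when that reaction occurs once."""
    return [[products.get(s, 0) - reactants.get(s, 0) for reactants, products in reactions]
            for s in species]

species = ["P", "P2"]
reactions = expand_reversible({"P": 2}, {"P2": 1})   # 2P <=> P2
S = stoichiometry_matrix(species, reactions)

for name, row in zip(species, S):
    print(name, row)

# One forward plus one backward event leaves the state unchanged
net = [sum(row) for row in S]
print("net change of a forward/backward pair:", net)
```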
1.5 Modelling genetic and biochemical networks
Before moving on to look at different ways of representing and working with systems of coupled chemical reactions in the next chapter, it will be helpful to end this chapter by looking in detail at some basic biochemical mechanisms and how their essential features can be captured with fairly simple systems of coupled chemical reactions. Although biological modelling can be applied to biological systems at a variety of different scales, it turns out that stochastic effects are particularly important and prevalent at the scale of genetic and biochemical networks, and these will therefore provide the main body of examples for this book.

1.5.1 Transcription (prokaryotes)
Transcription is a key cellular process, and control of transcription is a fundamental regulation mechanism. As a result, virtually any model of genetic regulation is likely to require some modelling of the transcription process. This process is much simpler in prokaryotic organisms, so it will be helpful to consider this in the first instance. Here, typically, a promoter region exists just upstream of the gene of interest. RNA-polymerase (RNAP) is able to bind to this promoter region and initiate the transcription process, which ultimately results in the production of an mRNA transcript and the release of RNAP back into the cell. The transcription process itself is complex, but whether it will be necessary to model this explicitly will depend very much on the modelling goals. If the modeller is primarily interested in control and the downstream effects of the transcription process, it may not be necessary to model transcription itself in detail.
Figure 1.4 Transcription of a single prokaryotic gene.
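A simple representation of this process as a system of coupled chemical reactions can be written

$$g + \mathrm{RNAP} \longrightarrow g\cdot\mathrm{RNAP}$$
$$g\cdot\mathrm{RNAP} \longrightarrow g + \mathrm{RNAP} + r,$$

where g denotes the gene (with its promoter region), g·RNAP the promoter-bound polymerase complex, and r the mRNA transcript. The second reaction summarises a complex process involving many other chemical species within the cell, but we have chosen here not to model at such a fine level of detail. One detail not included here that may be worth considering is the reversible nature of the binding of RNAP to the promoter region. It is also worth noting that these two reactions form a simple linear chain, whereby the product of the first reaction is the reactant for the second. Indeed, we could write the pair of reactions as

$$g + \mathrm{RNAP} \longrightarrow g\cdot\mathrm{RNAP} \longrightarrow g + \mathrm{RNAP} + r.$$

It is therefore tempting to summarise this chain of reactions by the single reaction

$$g + \mathrm{RNAP} \longrightarrow g + \mathrm{RNAP} + r,$$

and this is indeed possible, but is likely to be inadequate for any model of regulation or control of transcription that depends on the promoter-bound state, such as a model for competitive binding of RNAP and a repressor in the promoter region.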
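Writing g for the gene, RNAP for RNA-polymerase, g·RNAP for the promoter-bound complex and r for the mRNA transcript (labels assumed for this sketch), the binding step is g + RNAP → g·RNAP and the transcription step is g·RNAP → g + RNAP + r. The Python sketch below adds the net-change vectors of these two steps and compares the result with the single summary reaction g + RNAP → g + RNAP + r. The net effects agree (one new transcript), but in the summary reaction g and RNAP are pure modifiers and the bound complex never appears, so there is nothing for a repressor to compete with.

```python
SPECIES = ["g", "RNAP", "g.RNAP", "r"]   # gene, RNA-polymerase, bound complex, mRNA (assumed labels)

def net_change(reactants, products):
    """Net change in each species of SPECIES when the reaction occurs once."""
    return [products.get(s, 0) - reactants.get(s, 0) for s in SPECIES]

binding = net_change({"g": 1, "RNAP": 1}, {"g.RNAP": 1})                 # g + RNAP -> g.RNAP
transcription = net_change({"g.RNAP": 1}, {"g": 1, "RNAP": 1, "r": 1})   # g.RNAP -> g + RNAP + r
summary = net_change({"g": 1, "RNAP": 1}, {"g": 1, "RNAP": 1, "r": 1})   # g + RNAP -> g + RNAP + r

# Net effect of running the two-step chain once
chain = [a + b for a, b in zip(binding, transcription)]

print("binding:      ", binding)
print("transcription:", transcription)
print("chain total:  ", chain)
print("summary:      ", summary)
```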
If modelling the production of the entire RNA molecule in a single step is felt to be
an oversimplification, it is relatively straightforward to model the explicit elongation
of the molecule As a first attempt, consider the following model for the transcription