Stochastic Modelling for Systems Biology, Second Edition


Darren J Wilkinson

Praise for the First Edition

“…well suited as an in-depth introduction into stochastic chemical simulation, both for self-study or as a course text…”

—Biomedical Engineering Online, December 2006

Since the first edition of Stochastic Modelling for Systems Biology, there have been many interesting developments in the use of “likelihood-free” methods of Bayesian inference for complex stochastic models. Re-written to reflect this modern perspective, this second edition covers everything necessary for a good appreciation of stochastic kinetic modelling of biological networks in the systems biology context.

Keeping with the spirit of the first edition, all of the new theory is presented in a very informal and intuitive manner, keeping the text as accessible as possible to the widest possible readership.

New in the Second Edition

• All examples have been updated to Systems Biology Markup Language Level 3

• All code relating to simulation, analysis, and inference for stochastic kinetic models has been rewritten and restructured in a more modular way

• An ancillary website provides links, resources, errata, and up-to-date information on installation and use of the associated R package

• More background material on the theory of Markov processes and stochastic differential equations, providing more substance for mathematically inclined readers

• Discussion of some of the more advanced concepts relating to stochastic kinetic models, such as random time change representations, Kolmogorov equations, Fokker–Planck equations and the linear noise approximation

• Simple modelling of “extrinsic” and “intrinsic” noise

An effective introduction to the area of stochastic modelling in computational systems biology, this new edition adds additional mathematical detail and computational methods which will provide a stronger foundation for the development of more advanced courses in stochastic biological modelling.


CHAPMAN & HALL/CRC

Mathematical and Computational Biology Series

Aims and scope:

This series aims to capture new developments and summarize what is known over the entire spectrum of mathematical and computational biology and medicine. It seeks to encourage the integration of mathematical, statistical, and computational methods into biology by publishing a broad range of textbooks, reference works, and handbooks. The titles included in the series are meant to appeal to students, researchers, and professionals in the mathematical, statistical and computational sciences, fundamental biology and bioengineering, as well as interdisciplinary researchers involved in the field. The inclusion of concrete examples and applications, and programming techniques and examples, is highly encouraged.

School of Computer Science

Tel Aviv University

Maria Victoria Schneider

European Bioinformatics Institute

Mona Singh

Department of Computer Science

Princeton University

Anna Tramontano

Department of Biochemical Sciences

University of Rome La Sapienza

Proposals for the series should be submitted to one of the series editors above or directly to:

CRC Press, Taylor & Francis Group

4th Floor, Albert House

1-4 Singer Street

London EC2A 4BQ

UK



CRC Press

Taylor & Francis Group

6000 Broken Sound Parkway NW, Suite 300

Boca Raton, FL 33487-2742

CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 2011926

International Standard Book Number-13: 978-1-4398-3776-4 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at

http://www.taylorandfrancis.com

and the CRC Press Web site at

http://www.crcpress.com



… in 2007. Professor Wilkinson is interested in computational statistics and Bayesian inference and in the application of modern statistical technology to problems in statistical bioinformatics and systems biology. He is involved in a variety of systems biology projects at Newcastle, including the Centre for Integrated Systems Biology of Ageing and Nutrition (CISBAN). He currently holds a BBSRC Research Development Fellowship on integrative modelling of stochasticity, noise, heterogeneity and measurement error in the study of model biological systems.

… me of a BBSRC Research Development Fellowship (grant number BBF0235451). In addition, a considerable amount of work on this second edition was carried out during a visit I made to the Statistical and Applied Mathematical Sciences Institute (SAMSI, www.samsi.info) in North Carolina during the spring of 2011, as part of their research programme on the Analysis of Object-Oriented Data.

Particular thanks are also due to all of the students who have been involved in the MSc in bioinformatics and computational systems biology programme at Newcastle, and especially those who took my course on Stochastic Systems Biology, as it was the teaching of that course which persuaded me that it was necessary to write this book.

Last, but by no means least, I would like to thank my family for supporting me in everything that I do.

Preface to the second edition

I was keen to write a second edition of this book even before the first edition was published in the spring of 2006. The first edition was written during the latter half of 2004 and the first half of 2005 when the use of stochastic modelling within computational systems biology was still very much in its infancy. Based on an inter-disciplinary Masters course I was teaching I saw an urgent need for an introductory textbook in this area, and tried in the first edition to lay down all of the key ingredients needed to get started. I think that I largely succeeded, but the emphasis there was very much on the “bare essentials” and accessibility to non-mathematical readers, and my goal was to get the book published in a timely fashion, in order to help advance the field.

I would like to think that the first edition of this text has played a small role in helping to make stochastic modelling a much more mainstream part of computational systems biology today. But naturally there were many limitations of the first edition. There were several places where I would have liked to have elaborated further, providing additional details likely to be of interest to the more mathematically or statistically inclined reader. Also, the latter chapters on inference from data were rather limited and lacking in concrete examples. This was partly due to the fact that the whole area of inference for stochastic kinetic models was just developing, and so it wasn’t possible to give a coherent overview of the problem from an introductory viewpoint. Since publishing the first edition there have been many interesting developments in the use of “likelihood-free” methods of Bayesian inference for complex stochastic models, and so the latter chapters have now been re-written to reflect this more modern perspective, including a detailed case study accompanied by working code examples.

Of course the whole field has moved on considerably since 2005, and so the second edition is also an opportunity to revise and update, and to change the emphasis of the text slightly. The Systems Biology Markup Language (SBML) has continued to evolve, and SBML Level 3 is now finalised. Consequently, I have updated all of the examples to Level 3, which is likely to remain the standard encoding for dynamic biological models for the foreseeable future. I have also taken the opportunity to revise and update the R code examples associated with the book, and to bundle them all together as an R package (smfsb). This should make it much easier for people to try out the examples given in the book. I have also re-written and re-structured all of the code relating to simulation, analysis and inference for stochastic kinetic models. The code is now structured in a more modular way (using a functional programming style), making it easy to “bolt together” different models, simulation algorithms, and analysis tools. I’ve created a new website specific to this second edition, where I will keep links, resources, errata, and up-to-date information on installation and use of the associated R package.

The new edition contains more background material on the theory of Markov processes and stochastic differential equations, providing more substance for mathematically inclined readers. This allows discussion of some of the more advanced concepts relating to stochastic kinetic models, such as random time-change representations, Kolmogorov equations, Fokker–Planck equations and the linear noise approximation. It also enables simple modelling of “extrinsic” in addition to “intrinsic” noise. This should make the text suitable for use in a greater range of courses. Naturally, in keeping with the spirit of the first edition, all of the new theory is presented in a very informal and intuitive way, in order to keep the text accessible to the widest possible readership. This is not a rigorous text on the theory of Markov processes (there are plenty of other good texts in that vein); the book is still intended for use in courses for students with a life sciences background.

I’ve also updated the references, and provided new pointers to recent publications in the literature where this is especially pertinent. However, it should be emphasised that the book is not intended to provide a comprehensive survey of the stochastic systems biology literature. I don’t think that is necessary (or even helpful) for an introductory textbook, and I hope that people working in this area accept this if I fail to cite their work.

So here it is, the second edition, completed at last. I hope that this text continues to serve as an effective introduction to the area of stochastic modelling in computational systems biology, and that this new edition adds additional mathematical detail and computational methods which will provide a stronger foundation for the development of more advanced courses in stochastic biological modelling.

Darren Wilkinson

Newcastle upon Tyne


Preface to the first edition

Stochastic models for chemical and biochemical reactions have been around for a long time. The standard algorithm for simulating the dynamics of such processes on a computer (the “Gillespie algorithm”) was published nearly 30 years ago (and most of the relevant theory was sorted out long before that). In the meantime there have been dozens of papers published on stochastic kinetics, and several books on stochastic processes in physics, chemistry and biology. Biological modelling and biochemical kinetic modelling have been around even longer. These distinct subjects have started to merge in recent years as technology has begun to give real insight into intra-cellular processes. Improvements in experimental technology are enabling quantitative real-time imaging of expression at the single-cell level, and improvement in computing technology is allowing modelling and stochastic simulation of such systems at levels of detail previously impossible. The message that keeps being repeated is that the kinetics of biological processes at the intra-cellular level are stochastic, and that cellular function cannot be properly understood without building that stochasticity into in silico models. It was this message that first interested me in systems biology and in the many challenging statistical problems that follow naturally from this observation.

It was only when I came to try and teach this interesting view of computational systems biology to graduate students that I realised there was no satisfactory text on which to base the course. The papers assumed far too much background knowledge, the standard biological texts didn’t cover stochastic modelling, and the stochastic processes texts were too far removed from the practical applications of systems biology, paying little attention to stochastic biochemical kinetics. Where stochastic models do crop up in the mainstream systems biology literature, they tend to be treated as an add-on or after-thought, in a slightly superficial way. As a statistician I see this as problematic. The stochastic processes formalism provides a beautiful, elegant, and coherent foundation for chemical kinetics, and there is a wealth of associated theory every bit as powerful and elegant as that for conventional continuous deterministic models. Given the increasing importance of stochastic models in systems biology, I thought it would be particularly appropriate to write an introductory text in this area from this perspective.

This book assumes a basic familiarity with what might be termed high school mathematics. That is, a basic familiarity with algebra and calculus. It is also helpful to have had some exposure to linear algebra and matrix theory, but not vital. Since the teaching of probability and statistics at school is fairly patchy, essentially nothing will be assumed, though obviously a good background in this area will be very helpful. Starting from here, the book covers everything that is necessary for a good appreciation of stochastic kinetic modelling of biological networks in the systems biology context. There is an emphasis on the necessary probabilistic and stochastic methods, but the theory is rooted in the intended application, and no time is wasted covering interesting theory that is not necessary for stochastic kinetic modelling. On the other hand, more-or-less everything that is necessary is covered, and the text (at least up to Chapter 8) is intended to be self-contained. The final chapters are necessarily a little more technical in nature, as they concern the difficult problem of inference for stochastic kinetic models from experimental data. This is still an active research area, and so the main aim here is to give pointers to the existing literature and provide enough background information to render that literature more accessible to the non-specialist.

The decision to make the book practically oriented necessitated some technological choices that will not suit everyone. The two key technologies chosen for illustrating the theory in this book are SBML and R. I hope that the choice of the Systems Biology Markup Language (SBML) for model representation is not too controversial. It is the closest thing to a standard that exists in the systems biology area, and there are dozens of software tools that support it. Of course, most people using SBML are using it to encode continuous deterministic models. However, SBML Level 2 and beyond are perfectly capable of encoding discrete stochastic models, and so one of the reasons for using it in this text is to provide some working examples of SBML models constructed with discrete stochastic simulation in mind.

The other technological choice was the use of the statistical programming language, R. This is likely to be more controversial, as there are plenty of other languages that could have been used. It seemed to me that in the context of a textbook, using a very high-level language was most appropriate. In the context of stochastic modelling, a language with good built-in mathematical and statistical support also seemed highly desirable. R stood out as being the best choice in terms of built-in language support for stochastic simulation and statistical analysis. It also has the great advantage over some of the other possible choices of being completely free open-source software, and therefore available to anyone reading the book. In addition, R is being used increasingly in bioinformatics and other areas of systems biology, so hopefully for this reason too it will be regarded as a positive choice.

This book is intended for a variety of audiences (advanced undergraduates, graduate students, postdocs, and academics from a variety of backgrounds), and exactly how it is read will depend on the context. The book is certainly suitable for a variety of graduate programs in computational biology. I will be using it as a text for a second-semester course on a masters in bioinformatics programme that will cover much of Chapters 1, 2, 4, 5, 6, and 7 (most of the material from Chapter 3 will be covered in a first-semester course). However, the book has also been written with self-study in mind, and here it is intended that the entire book be read in sequence, with some chapters skipped depending on background knowledge. It is intended to be suitable for computational systems biologists from a continuous deterministic background who would like to know more about the stochastic approach, as well as for statisticians who are interested in learning more about systems biology (though statisticians will probably want to skip most of Chapters 3, 4, and 5). It is worth pointing out that Chapters 9 and 10 will be easier to read with a reasonable background in probability and statistics. Though it should be possible to read these chapters without such a background and still appreciate the key concepts and ideas, some of the technical details may be difficult to understand fully from an elementary viewpoint.

Writing this book has been more effort than I anticipated, and there were one or two moments when I doubted the wisdom of taking the project on. On the whole, however, it has been an interesting and rewarding experience for me, and I am pleased with the result. I know it is a book that I will find useful and will refer to often, as it integrates a fairly diverse literature into a single convenient and notationally consistent source. I can only hope that others share the same view, and that the book will help make a stochastic approach to computational systems biology more widely appreciated.

Darren Wilkinson

Newcastle upon Tyne, October 2005


PART I

Modelling and networks


CHAPTER 1

Introduction to biological modelling

The first issue to confront when embarking on a modelling project is to decide on exactly which features to include in the model, and in particular, the level of detail the model is intended to capture. So, a model of an entire organism is unlikely to describe the detailed functioning of every individual cell, but a model of a cell is likely to include a variety of very detailed descriptions of key cellular processes. Even then, however, a model of a cell is unlikely to contain details of every single gene and protein.

Fortunately, biologists are used to thinking about processes at different scales and different levels of detail. Consider, for example, the process of photosynthesis. When studying photosynthesis for the first time at school, it is typically summarised by a single chemical reaction mixing water with carbon dioxide to get glucose and oxygen (catalysed by sunlight). This could be written very simply as

\[ \text{Water} + \text{Carbon dioxide} \xrightarrow{\text{Sunlight}} \text{Glucose} + \text{Oxygen}, \]

or more formally by replacing the molecules by their chemical formulae and balancing to get

\[ 6\,\mathrm{H_2O} + 6\,\mathrm{CO_2} \longrightarrow \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}. \]

Of course, further study reveals that photosynthesis consists of many reactions, and that the single reaction was simply a summary of the overall effect of the process. However, it is important to understand that the above equation is not really wrong, it just represents the overall process at a higher level than the more detailed description that biologists often prefer to work with. Whether a single overall equation or a full breakdown into component reactions is necessary depends on whether intermediaries such as ADP and ATP are elements of interest to the modeller. Indeed, really accurate modelling of the process would require a model far more detailed and complex than most biologists would be comfortable with, using molecular dynamic simulations that explicitly manage the position and momentum of every molecule in the system.

The “art” of building a good model is to capture the essential features of the biology without burdening the model with non-essential details. Every model is to some extent a simplification of the biology, but models are valuable because they take ideas that might have been expressed verbally or diagrammatically and make them more explicit, so that they can begin to be understood in a quantitative rather than purely qualitative way.

1.2 Aims of modelling

The features of a model depend very much on the aims of the modelling exercise. We therefore need to consider why people model and what they hope to achieve by so doing. Often the most basic aim is to make clear the current state of knowledge regarding a particular system, by attempting to be precise about the elements involved and the interactions between them. Doing this can be a particularly effective way of highlighting gaps in understanding. In addition, having a detailed model of a system allows people to test that their understanding of a system is correct, by seeing if the implications of their models are consistent with observed experimental data. In practice, this model validation stage is central to the systems biology approach. However, this work will often represent only the initial stage of the modelling process. Once people have a model they are happy with, they often want to use their models predictively, by conducting “virtual experiments” that might be difficult, time-consuming, or impossible to do in the lab. Such experiments may uncover important indirect relationships between model components that would be hard to predict otherwise. An additional goal of modern biological modelling is to pool a number of small models of well-understood mechanisms into a large model in order to investigate the effect of interactions between the model components. Models can also be extremely useful for informing the design and analysis of complex biological experiments.

In summary, modelling and computer simulation are becoming increasingly important in post-genomic biology for integrating knowledge and experimental data and making testable predictions about the behaviour of complex biological systems.

1.3 Why is stochastic modelling necessary?

Ignoring quantum mechanical effects, current scientific wisdom views biological systems as essentially deterministic in character, with dynamics entirely predictable given sufficient knowledge of the state of the system (together with complete knowledge of the physics and chemistry of interacting biomolecules). At first this perhaps suggests that a deterministic approach to the modelling of biological systems is likely to be successful. However, despite the rapid advancements in computing technology, we are still a very long way away from a situation where we might expect to be able to model biological systems of realistic size and complexity over interesting time scales using such a molecular dynamic approach. We must therefore use models that leave out many details of the “state” of a system (such as the position, orientation, and momentum of every single molecule under consideration), in favour of a higher-level view. Viewed at this higher level, the dynamics of the system are not deterministic, but intrinsically stochastic, and consideration of statistical physics is necessary to uncover the precise nature of the stochastic processes governing the system dynamics.

A more detailed discussion of this issue will have to be deferred until much later in the book, once the appropriate concepts and terminology have been established. In the meantime, it is helpful to highlight the issues using a very simple example that illustrates the importance of stochastic modelling, both for simulation and inference.

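The example we will consider is known as the linear birth–death process. In the first instance, it is perhaps helpful to view this as a model for the number of bacteria in a bacterial colony. It is assumed that each bacterium in the colony gives rise to new individuals at rate λ, and that each bacterium dies at rate µ. These definitions are not quite precise, but we will define such things much more precisely later. Let the number of bacteria in the colony at time t be denoted X(t), with the number at time t = 0 known to be x0. Viewed in a continuous deterministic manner, this description of the system leads directly to the ordinary differential equation

\[ \frac{dX(t)}{dt} = (\lambda - \mu) X(t), \]

which can be solved analytically to give the complete dynamics of the system as

\[ X(t) = x_0 \exp\{(\lambda - \mu) t\}. \]

So the population size increases exponentially when λ > µ, decreases exponentially when λ < µ, and remains constant when λ = µ. Five such solutions are given in Figure 1.1. There are other things worth noting about this solution. In particular, the solution clearly only depends on λ − µ and not on the particular values that λ and µ take (so, for example, λ = 0.5, µ = 0 will lead to exactly the same solution as λ = 1, µ = 0.5). In some sense, therefore, λ − µ controls the essential shape of the dynamics. At first this might sound like a good thing, but it is clear that there is a flip-side: namely that studying experimental data on bacteria numbers can only provide information about λ − µ, and not about λ and µ separately (as the data can only provide information about the “shape” of the curve, and the shape of the curve is controlled by λ − µ). This lack of identifiability has implications for network inference, as well as inference for rate constants. It is clear that in this model we cannot know from experimental data if we have a pure birth or death process, or a process involving both births and deaths, as it is not possible to know if λ or µ is zero.∗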
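The dependence of the deterministic solution on λ − µ alone can be checked numerically. The following is a minimal R sketch, not code from the book or its smfsb package; the function name bd_deterministic and the time grid are illustrative assumptions. It evaluates X(t) = x0 exp{(λ − µ)t} for the two parameter pairs mentioned above and confirms that the resulting curves coincide.

```r
# Closed-form deterministic solution of the linear birth-death model:
# X(t) = x0 * exp((lambda - mu) * t).
# The function name and arguments are illustrative, not taken from the book.
bd_deterministic <- function(t, x0, lambda, mu) {
  x0 * exp((lambda - mu) * t)
}

# An arbitrary time grid on which to compare solutions
times <- seq(0, 5, by = 0.5)

# Two different (lambda, mu) pairs with the same difference lambda - mu = 0.5
sol_a <- bd_deterministic(times, x0 = 50, lambda = 0.5, mu = 0)
sol_b <- bd_deterministic(times, x0 = 50, lambda = 1, mu = 0.5)

# The curves agree at every time point, so observations of X(t) alone
# cannot distinguish between the two parameter pairs
print(isTRUE(all.equal(sol_a, sol_b)))
```

Because the two parameter pairs give the same curve, no amount of data generated by this deterministic model could separate λ from µ.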

∗ This also illustrates another point that is not widely appreciated: the fact that reliable network inference is necessarily more difficult than rate-parameter inference, as determining the existence of a reaction is equivalent to deciding whether the rate of that reaction is zero.

Figure 1.1 Five deterministic solutions of the linear birth–death process for values of λ − µ given in the legend (x0 = 50).

The problem, of course, is that bacteria don’t vary in number continuously and deterministically. They vary discretely and stochastically. Using the techniques that will be developed later in this book, it is straightforward to understand the stochastic process associated with this model as a Markov jump process, and to simulate it on a computer. By their very nature, such stochastic processes are random, and each time they are simulated they will look different. In order to understand the behaviour of such processes it is therefore necessary (in general) to study many realisations of the process. Five realisations are given in Figure 1.2, together with the corresponding deterministic solution.

Figure 1.2 Five realisations of a stochastic linear birth–death process together with the continuous deterministic solution (x0 = 50, λ = 3, µ = 4).

It is immediately clear that the stochastic realisations exhibit much more interesting behaviour and match much better with the kind of experimental data one is likely to encounter. They also allow one to ask questions and get answers to issues that can’t be addressed using a continuous deterministic model. For example, according to the deterministic model, the number of bacteria at time t = 2 is X(2) = 50 exp(−2) ≃ 6.77. Even leaving aside the fact that this is not an integer, we see from the stochastic realisations that there is considerable uncertainty for the value of X(2), and stochastic simulation allows us to construct, inter alia, a likely range of values for X(2). Similarly, the deterministic solution never reaches zero, so it cannot describe extinction of the colony, yet it is clear from the stochastic realisations that these do go extinct, and that there is considerable randomness associated with the time that this occurs. Again, stochastic simulation will allow us to understand the distribution associated with the time to extinction, something that simply isn’t possible using a deterministic framework.
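To make the idea of studying many realisations concrete, here is a minimal sketch of a direct simulation of the linear birth–death process as a Markov jump process. It assumes the standard construction (exponential waiting times with rate (λ + µ)x, and a birth with probability λ/(λ + µ) at each event); the function names, the seed, the time horizon of 10, and the number of realisations are all illustrative choices, and this is not the book's own implementation.

```python
import random

def simulate_birth_death(x0, lam, mu, t_max, rng):
    """Return one realisation as a list of (time, population) jump points."""
    t, x = 0.0, x0
    path = [(t, x)]
    while x > 0:
        # Each individual gives birth at rate lam and dies at rate mu, so the
        # waiting time to the next event is exponential with rate (lam + mu) * x.
        t += rng.expovariate((lam + mu) * x)
        if t > t_max:
            break
        # The event is a birth with probability lam / (lam + mu), else a death.
        x += 1 if rng.random() < lam / (lam + mu) else -1
        path.append((t, x))
    return path

def value_at(path, t):
    """Population at time t for a piecewise-constant path."""
    current = path[0][1]
    for jump_time, population in path:
        if jump_time > t:
            break
        current = population
    return current

def extinction_time(path):
    """Time the population first hits zero, or None if it survives the window."""
    last_time, last_population = path[-1]
    return last_time if last_population == 0 else None

rng = random.Random(2011)  # fixed seed (arbitrary) so the run is repeatable
paths = [simulate_birth_death(50, 3.0, 4.0, 10.0, rng) for _ in range(1000)]

x2 = sorted(value_at(p, 2.0) for p in paths)
mean_x2 = sum(x2) / len(x2)
central_95 = (x2[25], x2[974])  # empirical 2.5% and 97.5% points of X(2)
extinct = [s for s in map(extinction_time, paths) if s is not None]

print("mean of X(2):", round(mean_x2, 2))
print("central 95% range of X(2):", central_95)
print("realisations extinct by t = 10:", len(extinct), "of", len(paths))
```

The sample mean of X(2) should sit close to the deterministic value, while the empirical range and the collection of extinction times describe exactly the quantities that the deterministic model cannot supply.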

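Another particularly noteworthy feature of the stochastic process representation is that it depends explicitly on both λ and µ, and not just on λ − µ (see Figure 1.3). Although λ − µ controls the essential shape of the process, λ + µ controls the degree of “noise” or “volatility” in the system. This is a critically important point to understand: it tells us that if stochastic effects are present in the system, we cannot properly understand the system dynamics unless we know both λ and µ. Consequently, we cannot simply fit a deterministic model to experimental data and then use the inferred rate constants in a stochastic simulation, as it is not possible to infer the stochastic rate constants using a deterministic model. This has important implications for the use of stochastic models for inference from experimental data. It suggests that given some data on the variation in colony size over time, it ought to be possible to learn about λ − µ from the overall shape of the data and about λ + µ from its volatility, and hence to determine both λ and µ. Knowing both, we can simulate the dynamics of the system we are interested in (as well as inferring network structure, as we could also test to see if either λ or µ is zero). The stochastic model is therefore essential at the inference stage of the process.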
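The roles of λ − µ and λ + µ can be made explicit with the first two moments of the process. The expressions below are standard results for the linear birth–death process started at x0 with λ ≠ µ; they are quoted here as background rather than derived in this section.

```latex
\begin{align*}
  \mathrm{E}[X(t)]   &= x_0\, e^{(\lambda-\mu)t}, \\
  \mathrm{Var}[X(t)] &= x_0\, \frac{\lambda+\mu}{\lambda-\mu}\,
                        e^{(\lambda-\mu)t}\bigl(e^{(\lambda-\mu)t}-1\bigr),
                        \qquad \lambda \neq \mu, \\
  \lambda &= \tfrac{1}{2}\bigl[(\lambda+\mu)+(\lambda-\mu)\bigr], \qquad
  \mu      = \tfrac{1}{2}\bigl[(\lambda+\mu)-(\lambda-\mu)\bigr].
\end{align*}
```

The mean depends on the rates only through λ − µ and coincides with the deterministic solution, whereas for fixed λ − µ the variance is proportional to λ + µ. With x0 = 50, λ − µ = −1 and t = 2, for instance, the variance is roughly 5.85(λ + µ), so the sum and the difference of the rates are separately visible in stochastic data.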

Figure 1.3 Five realisations of a stochastic linear birth–death process together with the continuous deterministic solution for four different (λ, µ) combinations, each with λ − µ = −1 and x0 = 50.

Although we have here considered a trivial example, the implications are broad. In particular, they apply to the genetic and biochemical network models that much of this book will be concerned with. This is because genetic and biochemical networks involve the interaction of integer numbers of molecules that react when they collide after random times, driven by Brownian motion. Although it is now becoming increasingly accepted that stochastic mechanisms are important in many (if not most) genetic and biochemical networks, routine use of stochastic simulation in order to understand system dynamics is still not widespread.† This may be because inference methods regularly used in practice work by fitting continuous deterministic models to experimental data. We have just seen that such methods cannot in general give us reliable information about all of the parameters important for determining the stochastic dynamics of a system, and so stochastic simulation cannot be done reliably until we have good methods of inference for stochastic models. It turns out that it is possible to formalise the problem of inference for stochastic kinetic models from time-course experimental data, and this is the subject matter of the latter chapters. However, it should be pointed out at the outset that inference for stochastic models is an order of magnitude more difficult than inference for deterministic models (in terms of the mathematics required, algorithmic complexity, and computation time), and is still the subject of a great deal of ongoing research.

† But it is a lot more widespread than when the first edition of this book was written.


1.4 Chemical reactions

There are a number of ways one could represent a model of a biological system. Biologists have traditionally favoured diagrammatic schemas coupled with verbal explanations in order to convey qualitative information regarding mechanisms. At the other extreme, applied mathematicians traditionally prefer to work with systems of ordinary or partial differential equations (ODEs or PDEs). These have the advantage of being more precise and fully quantitative, but also have a number of disadvantages. In some sense differential equation models are too low level a description, as they not only encode the essential features of the model, but also a wealth of accompanying baggage associated with a particular interpretation of chemical kinetics that is not always well suited to application in the molecular biology context. Somewhere between these two extremes, the biochemist will tend to view systems as networks of coupled chemical reactions, and it appears that most of the best ways of representing biochemical mechanisms exist at this level of detail, though there are many ways of representing networks of this type. Networks of coupled chemical reactions are sufficiently general that they can be simulated in different ways using different algorithms depending on assumptions made about the underlying kinetics. On the other hand, they are sufficiently detailed and precise that once the kinetics have been specified, they can be used directly to construct full dynamic simulations of the system behaviour on a computer.

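A general chemical reaction takes the form

$$m_1 R_1 + m_2 R_2 + \cdots + m_r R_r \longrightarrow n_1 P_1 + n_2 P_2 + \cdots + n_p P_p,$$

where the $R_i$ are the reactants, the $P_j$ are the products, and the coefficients $m_i$ and $n_j$ (the numbers of molecules of each reactant consumed and of each product produced in a single reaction step) are known as stoichiometries. The stoichiometries are usually (though not always) assumed to be integers, and in this case it is assumed that there is no common factor of the stoichiometries. That is, it is assumed that there is no integer greater than one which exactly divides each stoichiometry on both the left and right sides. There is no assumption that the reactants and products are distinct, so a given species may appear on both sides of a reaction.‡ The reaction equation describes which chemical species§ react together, and in what proportions, along with what is produced.

‡ Note that a chemical species that occurs on both the left- and right-hand sides with the same stoichiometry is somewhat special, and is sometimes referred to as a modifier. Clearly the reaction will have no effect on the amount of this species. Such a species is usually included in the reaction because the rate at which the reaction proceeds depends on the level of this species.

§ The use of the term “species” to refer to a particular type of molecule will be explained later in the chapter.

Consider, for example, the dimerisation of a protein P. This is normally written

$$2P \longrightarrow P_2,$$

where $P_2$ denotes the dimer; stoichiometries of one are not usually written explicitly. Similarly, the reaction for the dissociation of the dimer would be written

$$P_2 \longrightarrow 2P.$$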

A reaction that can happen in both directions is known as reversible. Reversible reactions are quite common in biology and tend not to be written as two separate reactions. They can be written with a double-headed arrow such as

$$2P \longleftrightarrow P_2.$$

If one direction predominates over the other, this is sometimes emphasised in the notation. So if the above protein prefers the dimerised state, this may be written something like

$$2P \;\mathrel{\substack{\longrightarrow \\ \leftarrow}}\; P_2.$$

It is important to remember that the notation for a reversible reaction is simply a convenient shorthand for the two separate reaction processes taking place. In the context of the discrete stochastic models to be studied in this book, it will not usually be acceptable to replace the two separate reactions by a single reaction proceeding at some kind of overall combined rate.
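The conventions of this section translate naturally into a small data structure. The sketch below is one possible representation, not the one used by the book's software: the class `Reaction`, its methods, the helper `reversible`, and the species labels E, S and Q in the modifier example are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from math import gcd

@dataclass
class Reaction:
    reactants: dict  # species name -> stoichiometry on the left-hand side
    products: dict   # species name -> stoichiometry on the right-hand side

    def net_change(self):
        """Net change in each species count when the reaction fires once."""
        species = sorted(set(self.reactants) | set(self.products))
        return {s: self.products.get(s, 0) - self.reactants.get(s, 0) for s in species}

    def has_common_factor(self):
        """True if some integer greater than one divides every stoichiometry."""
        coefficients = list(self.reactants.values()) + list(self.products.values())
        return reduce(gcd, coefficients) > 1

    def modifiers(self):
        """Species appearing on both sides with the same stoichiometry."""
        return [s for s, m in self.reactants.items() if self.products.get(s) == m]

def reversible(forward):
    """Expand a reversible reaction into its two one-directional reactions."""
    backward = Reaction(dict(forward.products), dict(forward.reactants))
    return forward, backward

# Dimerisation 2P -> P2 and its dissociation, kept as two separate reactions.
dimerisation, dissociation = reversible(Reaction({"P": 2}, {"P2": 1}))
print(dimerisation.net_change())   # effect of one dimerisation event
print(dissociation.net_change())   # effect of one dissociation event

# A hypothetical reaction in which E is unchanged and so acts as a modifier.
catalysed = Reaction({"E": 1, "S": 1}, {"E": 1, "Q": 1})
print(catalysed.modifiers())
```

Keeping the forward and backward directions as two `Reaction` objects mirrors the point made above: in a discrete stochastic model each direction needs its own rate, so the pair is never merged into one net reaction.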

1.5 Modelling genetic and biochemical networks

Before moving on to look at different ways of representing and working with systems of coupled chemical reactions in the next chapter, it will be helpful to end this chapter by looking in detail at some basic biochemical mechanisms and how their essential features can be captured with fairly simple systems of coupled chemical reactions. Although biological modelling can be applied to biological systems at a variety of different scales, it turns out that stochastic effects are particularly important and prevalent at the scale of genetic and biochemical networks, and these will therefore provide the main body of examples for this book.

1.5.1 Transcription (prokaryotes)

Transcription is a key cellular process, and control of transcription is a fundamental regulation mechanism. As a result, virtually any model of genetic regulation is likely to require some modelling of the transcription process. This process is much simpler in prokaryotic organisms, so it will be helpful to consider this in the first instance. Here, typically, a promoter region exists just upstream of the gene of interest. RNA-polymerase (RNAP) is able to bind to this promoter region and initiate the transcription process, which ultimately results in the production of an mRNA transcript and the release of RNAP back into the cell. The transcription process itself is complex, but whether it will be necessary to model this explicitly will depend very much on the modelling goals. If the modeller is primarily interested in control and the downstream effects of the transcription process, it may not be necessary to model transcription itself in detail.

Figure 1.4 Transcription of a single prokaryotic gene.

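A simple representation of this process as a system of coupled chemical reactions can be written as follows, with p denoting the promoter region, p·RNAP the promoter with RNAP bound, and r the mRNA transcript:

$$p + \mathrm{RNAP} \longrightarrow p\cdot\mathrm{RNAP}$$

$$p\cdot\mathrm{RNAP} \longrightarrow p + \mathrm{RNAP} + r.$$

The transcript r appears here to be produced from nothing. In reality it is assembled from other chemical species within the cell, but we have chosen here not to model at such a fine level of detail. One detail not included here that may be worth considering is the reversible nature of the binding of RNAP to the promoter region. It is also worth noting that these two reactions form a simple linear chain, whereby the product of the first reaction is the reactant for the second. Indeed, we could write the pair of reactions as

$$p + \mathrm{RNAP} \longrightarrow p\cdot\mathrm{RNAP} \longrightarrow p + \mathrm{RNAP} + r.$$

It is therefore tempting to summarise this chain of reactions by the single reaction

$$p + \mathrm{RNAP} \longrightarrow p + \mathrm{RNAP} + r,$$

and this is indeed possible, but is likely to be inadequate for any model of regulation or control in which the intermediate compound p·RNAP matters, such as a model for competitive binding of RNAP and a repressor in the promoter region.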
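The difference between the two-step chain and its one-step summary can be seen by tracking molecule counts. The sketch below is illustrative only: the species labels p (promoter), RNAP, p.RNAP (promoter with polymerase bound) and r (transcript), the helper `apply_reaction`, and the initial counts are assumptions chosen for the example.

```python
def apply_reaction(state, reaction):
    """Return the state after one firing of (reactants, products)."""
    reactants, products = reaction
    new_state = dict(state)
    for species, count in reactants.items():
        if new_state.get(species, 0) < count:
            raise ValueError(f"not enough {species} for this reaction to fire")
        new_state[species] -= count
    for species, count in products.items():
        new_state[species] = new_state.get(species, 0) + count
    return new_state

# The two-step chain and the tempting one-step summary.
binding = ({"p": 1, "RNAP": 1}, {"p.RNAP": 1})
transcription = ({"p.RNAP": 1}, {"p": 1, "RNAP": 1, "r": 1})
summary = ({"p": 1, "RNAP": 1}, {"p": 1, "RNAP": 1, "r": 1})

# Hypothetical initial counts: one free promoter, ten polymerase molecules.
state = {"p": 1, "RNAP": 10, "p.RNAP": 0, "r": 0}

after_binding = apply_reaction(state, binding)
after_chain = apply_reaction(after_binding, transcription)
after_summary = apply_reaction(state, summary)

print("after binding:  ", after_binding)
print("after the chain:", after_chain)
print("after summary:  ", after_summary)
```

Both routes end in the same state, with one extra transcript. Only the chain passes through a state in which the promoter is occupied, and that occupied state is precisely what a model with a competing repressor would need to represent.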

If modelling the production of the entire RNA molecule in a single step is felt to be an oversimplification, it is relatively straightforward to model the explicit elongation of the molecule. As a first attempt, consider the following model for the transcription

Title	Stochastic Modelling for Systems Biology
Author	Darren J. Wilkinson
Institution	University of Bath
Field	Mathematical Sciences
Type	textbook
Year of publication	2011
City	London
