Computational methods in systems biology 14th international conference, CMSB 2016


Ezio Bartocci · Pietro Lio

Nicola Paoletti (Eds.)


Lecture Notes in Bioinformatics 9859 Subseries of Lecture Notes in Computer Science

LNBI Series Editors

University of Southern California, Los Angeles, CA, USA

LNBI Editorial Board


More information about this series at http://www.springer.com/series/5381


Lecture Notes in Bioinformatics

DOI 10.1007/978-3-319-45177-0

Library of Congress Control Number: 2016948626

LNCS Sublibrary: SL8 – Bioinformatics

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland


This volume contains the papers presented at CMSB 2016, the 14th Conference on Computational Methods in Systems Biology, held on September 21–23, 2016 at the Computer Laboratory, University of Cambridge (UK).

The CMSB annual conference series, initiated in 2003, provides a unique forum of discussion for computer scientists, biologists, mathematicians, engineers, and physicists interested in a system-level understanding of biological processes. Topics of interest include formalisms for modelling biological processes; models and their biological applications; frameworks for model verification, validation, analysis, and simulation of biological systems; high-performance computational systems biology and parallel implementations; model inference from experimental data; model integration from biological databases; multi-scale modelling and analysis methods; and computational approaches for synthetic biology. Case studies in systems and synthetic biology are especially encouraged.

There were 37 regular submissions, 3 tool papers, and 9 poster submissions. Each regular submission and tool paper submission was reviewed by at least 4 Program Committee members. The committee decided to accept 17 regular papers, 3 tool papers, and all submitted posters. On average, regular and tool papers received 4.2 reviews each, while each poster submission received 2 reviews. To complement the contributed papers, we also included in the program four invited lectures: Luca Cardelli (Microsoft Research, UK), Joëlle Despeyroux (Inria Sophia Antipolis, France), Radu Grosu (TU Wien, Austria), and Jane Hillston (University of Edinburgh, UK).

As program co-chairs, we have many people to thank. We are extremely grateful to the members of the Program Committee and the external reviewers for their peer reviews and the valuable feedback they provided to the authors. We also thank the authors of the accepted papers for revising the papers according to the suggestions of the Program Committee and for their responsiveness in providing the camera-ready copies within the deadline. Our special thanks go to François Fages and all the members of the CMSB Steering Committee for their advice on organizing and running the conference. We acknowledge the support of the EasyChair conference system during the reviewing process and the production of these proceedings. We thank Kaushik Chowdhury and the IEEE Computer Society Technical Committee on Simulation for supporting the best student paper award and the best poster award. We thank NVIDIA for providing their equipment as the best paper award. Our gratitude also goes to the tools track chair, Claudio Angione, and the local organization chair, Max Conway, for their help, support, and spirited participation before, during, and after the conference. We are also really grateful to Paolo Zuliani for having organized a minisymposium on Automated Reasoning for Systems Biology, which was held a day before the conference. It is our pleasant duty to acknowledge the financial support from our sponsor, Microsoft Research, and the support of the Computer Laboratory at the University of Cambridge, where this year's event was hosted. Finally, we would like to thank all the participants of the conference. It was the quality of their presentations and their contribution to the discussions that made the meeting a scientific success.

Pietro Lio

Nicola Paoletti


Program Committee Co-chairs

Ezio Bartocci TU Wien, Austria

Pietro Lio University of Cambridge, UK

Nicola Paoletti Oxford University, UK

Tools Track Chair

Claudio Angione Teesside University, UK

Local Organization Chair

Max Conway University of Cambridge, UK

Program Committee

Gianluca Ascolani University of Cambridge, UK

Julio Banga IIM-CSIC, Spain

Ezio Bartocci TU Wien, Austria

Gregory Batt Inria Paris-Rocquencourt, France

Luca Bortolussi University of Trieste, Italy

Jérémie Bourdon Nantes University, France

Andrea Bracciali University of Stirling, UK

Luca Cardelli Microsoft Research, UK

Milan Češka Oxford University, UK

Vincent Danos University of Edinburgh, UK

Joëlle Despeyroux Inria Sophia Antipolis, France

Diego Di Bernardo University of Naples Federico II, Italy

François Fages Inria Paris-Rocquencourt, France

Flavio H. Fenton Georgia Tech, USA

Jérôme Feret Inria/Ecole Normale Supérieure, France

Calin Guet IST Austria

Monika Heiner Brandenburg University of Technology, Germany

Lila Kari University of Western Ontario, Canada

Heinz Köppl TU Darmstadt, Germany

Hillel Kugler Bar-Ilan University, Israel

Marta Kwiatkowska University of Oxford, UK

Pietro Lio University of Cambridge, UK


Oded Maler CNRS-VERIMAG, France

Giancarlo Mauri University of Milano Bicocca, Italy

Pedro Mendes University of Manchester, UK/University of Connecticut Health Center, USA

Nicola Paoletti University of Oxford, UK

Tatjana Petrov IST Austria

Andrew Phillips Microsoft Research Cambridge, UK

Carla Piazza University of Udine, Italy

Ovidiu Radulescu University of Montpellier 2, France

Blanca Rodriguez University of Oxford, UK

Olivier Roux École Centrale de Nantes, France

David Šafránek Masaryk University, Czech Republic

Guido Sanguinetti University of Edinburgh, UK

Scott A. Smolka Stony Brook University, USA

Oliver Stegle EBI, UK

Jörg Stelling ETH Zurich, Switzerland

Carolyn Talcott SRI International, USA

P.S. Thiagarajan National University of Singapore, Singapore

Adelinde Uhrmacher University of Rostock, Germany

Verena Wolf Saarland University, Germany

Boyan Yordanov Microsoft Research Cambridge, UK

Paolo Zuliani Newcastle University, UK

Tool Evaluation Committee

Liu Bing Carnegie Mellon University, USA

Pierre Boutillier Harvard Medical School, USA

Giulio Caravagna University of Edinburgh, UK

Tommaso Dreossi UC Berkeley, USA

Maxime Folschette University of Nice Sophia-Antipolis, France

Fabian Fröhlich Helmholtz Zentrum München, Germany

Attila Gabot Aachen University, Germany

Emanuel Goncalves EBI, UK

Benjamin Gyori Harvard Medical School, USA

Ariful Islam Carnegie Mellon University, USA

Luca Laurenti University of Oxford, UK

Curtis Madsen Boston University, USA

Dimitrios Milios University of Edinburgh, UK

Niall Murphy Microsoft Research Cambridge, UK

Abhishek Murthy Philips Research, USA

Aurélien Naldi Université de Montpellier, France

Rasmus Petersen Queen Mary University of London, UK

Ly Kim Quyen Ecole Normale Supérieure, France

Giselle Reis Inria-Saclay, France

Satya Swarup Samal University of Bonn, Germany


Fedor Shmarov Newcastle University, UK

Elisabeth Yaneske Teesside University, UK

Steering Committee

Jérémie Bourdon Nantes University, France

Finn Drablos NTNU, Norway

François Fages Inria Paris-Rocquencourt, France

David Harel Weizmann Institute of Science, Israel

Monika Heiner Brandenburg University of Technology, Germany

Tommaso Mazza IRCCS Casa Sollievo della Sofferenza - Mendel, Italy

Pedro Mendes University of Manchester, UK/University of Connecticut Health Center, USA

Satoru Miyano University of Tokyo, Japan

Gordon Plotkin University of Edinburgh, UK

Corrado Priami CoSBi/Microsoft Research, University of Trento, Italy

Olivier Roux École Centrale de Nantes, France

Carolyn Talcott SRI International, USA

Adelinde Uhrmacher University of Rostock, Germany

Lück, Alexander

Magnin, Morgan

Magron, Victor

Niehren, Joachim

Nobile, Marco

Patanè, Andrea

Paulevé, Loïc

Ramanathan, S.

Rohr, Christian

Ruet, Paul

Schnoerr, David

Soliman, Sylvain

Srivastav, Abhinav

Tschaikowski, Max


Inference of Delayed Biological Regulatory Networks from Time Series Data . . . 30
Emna Ben Abdallah, Tony Ribeiro, Morgan Magnin, Olivier Roux, and Katsumi Inoue

Matching Models Across Abstraction Levels with Gaussian Processes . . . 49
Giulio Caravagna, Luca Bortolussi, and Guido Sanguinetti

Target Controllability of Linear Networks . . . 67
Eugen Czeizler, Cristian Gratie, Wu Kai Chiu, Krishna Kanhaiya, and Ion Petre

High-Performance Symbolic Parameter Synthesis of Biological Models: A Case Study . . . 82
Martin Demko, Nikola Beneš, Luboš Brim, Samuel Pastva, and David Šafránek

Influence Systems vs. Reaction Systems . . . 98
François Fages, Thierry Martinez, David A. Rosenblueth, and Sylvain Soliman

Local Traces: An Over-Approximation of the Behaviour of the Proteins in Rule-Based Models . . . 116
Jérôme Feret and Kim Quyên Lý

Bifurcation Analysis of Cardiac Alternans Using δ-Decidability . . . 132
Md. Ariful Islam, Greg Byrne, Soonho Kong, Edmund M. Clarke, Rance Cleaveland, Flavio H. Fenton, Radu Grosu, Paul L. Jones, and Scott A. Smolka


A Stochastic Hybrid Approximation for Chemical Kinetics Based on the Linear Noise Approximation . . . 147
Luca Cardelli, Marta Kwiatkowska, and Luca Laurenti

Autonomous and Adaptive Control of Populations of Bacteria Through Environment Regulation . . . 168
Chieh Lo and Radu Marculescu

Parameter Estimation for Reaction Rate Equation Constrained Mixture Models . . . 186
Carolin Loos, Anna Fiedler, and Jan Hasenauer

Normalizing Chemical Reaction Networks by Confluent Structural Simplification . . . 201
Guillaume Madelaine, Elisa Tonello, Cédric Lhoussaine, and Joachim Niehren

Fast Simulation of Probabilistic Boolean Networks . . . 216
Andrzej Mizera, Jun Pang, and Qixia Yuan

Formal Quantitative Analysis of Reaction Networks Using Chemical Organisation Theory . . . 232
Chunyan Mu, Peter Dittrich, David Parker, and Jonathan E. Rowe

Goal-Oriented Reduction of Automata Networks . . . 252
Loïc Paulevé

Hybrid Reductions of Computational Models of Ion Channels Coupled to Cellular Biochemistry . . . 273
Jasha Sommer-Simpson, John Reinitz, Leonid Fridlyand, Louis Philipson, and Ovidiu Radulescu

Formal Modeling and Analysis of Pancreatic Cancer Microenvironment . . . 289
Qinsi Wang, Natasa Miskov-Zivanov, Bing Liu, James R. Faeder, Michael Lotze, and Edmund M. Clarke

Františka Romanovská, and Jan Červený


PREMER: Parallel Reverse Engineering of Biological Networks with Information Theory . . . 323
Alejandro F. Villaverde, Kolja Becker, and Julio R. Banga

Posters

Modeling Peptide Adsorption on Inorganic Surfaces . . . 339
Priya Anand, Monika Borkowska-Panek, Florian Gußmann, Karin Fink, and Wolfgang Wenzel

Temperature Dependence of Leakiness of Transcription Repression Mechanisms of Escherichia coli . . . 341
Nadia Goncalves, Samuel M.D. Oliveira, Vinodh K. Kandavalli, Jose M. Fonseca, and Andre S. Ribeiro

GPU-Accelerated Steady-State Analysis of Probabilistic Boolean Networks . . . 343
Andrzej Mizera, Jun Pang, and Qixia Yuan

PINT: A Static Analyzer for Dynamics of Automata Networks . . . 346
Loïc Paulevé

Linear Temporal Logic for Biologists in BMA . . . 348
Benjamin A. Hall, Nir Piterman, and Jasmin Fisher

Deregulation of Osmotic Regulation Machinery Explains and Predicts Cellular Transformation in Cancer and Disease . . . 351
David Shorthouse, Angela Riedel, Jacqueline Shields, and Benjamin A. Hall

Game Theoretic Consideration of Transgenic Bacteria in the Human Gut Microbiota Converting Omega-6 to Omega-3 Fats . . . 353
Ahmed M. Ibrahim and James Smith

Revealing Biomarker Mixtures in Lipid Pools from Large-Scale Lipidomics . . . 354
Kai Loell, Albert Koulman, and James Smith

Author Index . . . 355


Invited Paper


(Mathematical) Logic for Systems Biology

(Invited Paper)

Joëlle Despeyroux

Inria and CNRS, I3S, Sophia-Antipolis, France
joelle.despeyroux@inria.fr

Abstract. We advocate here the use of (mathematical) logic for systems biology, as a unified framework well suited for both modeling the dynamic behaviour of biological systems, expressing properties of them, and verifying these properties. The potential candidate logics should have a traditional proof-theoretic pedigree (including a sequent calculus presentation enjoying cut-elimination and focusing), and should come with (certified) proof tools. Beyond providing a reliable framework, this allows the adequate encodings of our biological systems. We present two candidate logics (two modal extensions of linear logic, called HyLL and SELL), along with biological examples. The examples we have considered so far are very simple ones, coming with completely formal (interactive) proofs in Coq. Future work includes using automatic provers, which would extend existing automatic provers for linear logic. This should enable us to specify and study more realistic examples in systems biology, medicine (diagnosis and prognosis), and eventually neuroscience.

We consider here the question of reasoning about biological systems in (mathematical) logic. We show that two new logics, both modal extensions of linear logic [12] (LL), are particularly well-suited to this purpose. The first logic, called Hybrid Linear Logic (HyLL), has been developed by the author in joint work with K. Chaudhuri [8]. The second logic, an extension of Subexponential Linear Logic (SELL), has been independently proposed by C. Olarte, E. Pimentel and V. Nigam [15]. Both HyLL and SELL provide a unified framework to encode biological systems, to express temporal properties of their dynamic behaviour, and to prove these properties. By constructing proofs in the logics, we directly witness reachability as logical entailment [13,17]. This approach is in contrast to most current approaches to applying formal methods to systems biology, which generally encode biological systems either in a dedicated programming language [6,10,19] or in differential equations [5], express properties in a temporal logic, and then verify these properties against some form of traces (model checking), possibly built using an external simulator.

In a joint work with E. De Maria and A. Felty, we presented some first applications of HyLL to systems biology [13]. In these first experiments, we focused on

© Springer International Publishing AG 2016
E. Bartocci et al. (Eds.): CMSB 2016, LNBI 9859, pp. 3–12, 2016.

In a recent joint work with C. Olarte and E. Pimentel [9], we compared HyLL and SELL, providing two encodings. The first encoding is from HyLL's logical rules into LL with the highest level of adequacy, hence showing that HyLL is as expressive as LL. We also proposed an encoding of HyLL into SELL⋒ (SELL plus quantification over locations) that gives better insights about the meaning of worlds in HyLL. This shows that SELL is more expressive than HyLL. However, the simplicity of HyLL might be of interest, both from the user's point of view and as far as proof search is concerned (a priori easier and more efficient in HyLL than in SELL). In this joint work, we furthermore encoded temporal operators of Computational Tree Logic (CTL) into linear logic with fixed point operators.

We first recall here these two previous works. Then we briefly mention our current joint work with P. Lio on formalizing the evolution of cancer cells, concluding with some future work.

This note is thus based on joint works with K. Chaudhuri (INRIA Saclay), A. Felty (Univ. of Ottawa), P. Lio (Cambridge Univ.), and C. Olarte and E. Pimentel (Universidade Federal do Rio Grande do Norte, Brazil).

Although we assume that the reader is familiar with linear logic [12] (LL), we review some of its basic proof theory in the following sections. First, let us gently introduce linear logic by means of an example.

Linear Logic (LL) [12] is particularly well suited for describing state transition systems. LL has been successfully used to model such diverse systems as the π-calculus, concurrent ML, security protocols, multiset rewriting, and games.

In the area of biology, a rule of activation (e.g., a protein activates a gene or the transcription of another protein) can be modeled by the following LL axiom:

$$\mathit{active}(a, b) \stackrel{\mathrm{def}}{=} \mathit{pres}(a) \multimap (\mathit{pres}(a) \otimes \mathit{pres}(b)).$$

The formula active(a, b) describes the fact that a state where a is present (pres(a) is true) can evolve into a state where both pres(a) and pres(b) are true.
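The resource reading of such a rule can be sketched as multiset rewriting. The Python sketch below is our own illustration, not part of the paper's formal development: a state is a `Counter` of resource names, and the hypothetical helper `apply_linear_rule` fires a rule `consumed ⊸ produced` by using up each consumed resource exactly once and adding the produced ones.

```python
from collections import Counter


def apply_linear_rule(state, consumed, produced):
    """Fire one linear rule `consumed -o produced` on a multiset state.

    Returns the new state, or None when the premises are not all present.
    Consumed resources are used up exactly once; produced ones are added.
    """
    need = Counter(consumed)
    if any(state[r] < n for r, n in need.items()):
        return None
    new_state = state.copy()
    new_state.subtract(need)
    new_state.update(Counter(produced))
    # Drop zero counts so that equal states compare equal.
    return Counter({r: n for r, n in new_state.items() if n > 0})


def active(a, b):
    """active(a, b) := pres(a) -o (pres(a) (x) pres(b)), as (consumed, produced)."""
    return [f"pres({a})"], [f"pres({a})", f"pres({b})"]


s0 = Counter({"pres(a)": 1})
s1 = apply_linear_rule(s0, *active("a", "b"))  # pres(a) and pres(b)
s2 = apply_linear_rule(s1, *active("a", "b"))  # one more copy of pres(b)
```

Nothing in the rule says how many steps separate the two firings; this is the timelessness of linear implication discussed next.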

Propositions such as pres(a) are called resources, and a rule in the logic can be viewed as a rewrite rule from a set of resources into another set of resources, where a set of resources describes a state of the system. Thus, a particular state transition system can be modeled by a set of rules of the above shape. The rules of the logic then allow us to prove some desired properties of the system, such as, for example, the existence of a stable state. However, linear implication is timeless. Linear implication can be used to model one event occurring after another, but it cannot be precise about how many steps or how long the delay is without explicitly encoding time. In a domain where resources have lifetimes and state changes have temporal, probabilistic or stochastic constraints, the logic will then allow inferences that may not be realizable in the system being modeled. This was the motivation for the development of HyLL, which was designed to represent constrained transition systems.

and their units 0 and ⊤ are additive; ∀ and ∃ are (first-order) quantifiers; and ! and ? are the exponentials (called bang and question-mark, respectively).

First proposed by Andreoli [1] for linear logic, focused proof systems provide normal form proofs for cut-free proofs. The connectives of linear logic can be divided into two classes. The negative connectives have invertible introduction rules: these connectives are ⅋, ⊥, &, ⊤, ∀, and ?. The positive connectives ⊗, 1, ⊕, 0, ∃, and ! are the de Morgan duals of the negative connectives. A formula is positive if it is a negated atom or its top-level logical connective is positive. Similarly, a formula is negative if it is an atom or its top-level logical connective is negative.
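The polarity classification is purely syntactic, so it fits in a few lines. The Python sketch below uses a hypothetical tuple encoding of formulas (connective names such as `"par"` and `"whynot"` are our own labels, not notation from the paper) and follows the convention stated above that atoms are negative and negated atoms positive.

```python
# Negative connectives (invertible introduction rules) mapped to their
# de Morgan duals, the positive connectives.
DUAL = {
    "par": "tensor",     # ⅋ / ⊗
    "bot": "one",        # ⊥ / 1
    "with": "plus",      # & / ⊕
    "top": "zero",       # ⊤ / 0
    "forall": "exists",  # ∀ / ∃
    "whynot": "bang",    # ? / !
}
NEGATIVE = set(DUAL)
POSITIVE = set(DUAL.values())


def polarity(formula):
    """Polarity of a formula encoded as a nested tuple.

    ("atom", p) is an atom, ("neg_atom", p) a negated atom, and
    (connective, *subformulas) a compound formula; only the top-level
    symbol matters.
    """
    head = formula[0]
    if head == "atom":
        return "negative"
    if head == "neg_atom":
        return "positive"
    if head in POSITIVE:
        return "positive"
    if head in NEGATIVE:
        return "negative"
    raise ValueError(f"unknown connective: {head}")
```

In a focused prover, this top-level test is what decides whether a formula is decomposed eagerly (negative phase) or chosen as a focus (positive phase).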

Focused proofs are organized into two phases. In the negative phase, all the invertible inference rules are eagerly applied. The positive phase begins by choosing a positive formula F on which to focus. Positive rules are applied to F until either 1 or a negated atom is encountered (and the proof must end by applying the initial rules), the promotion rule (!) is applied, or a negative subformula is encountered and the proof switches to the negative phase.

This change of phases in proof search is particularly interesting when the focused formula is a bipole [1]. Focusing on a bipole will produce a single positive and a single negative phase. This two-phase decomposition enables us to adequately capture the application of object-level inference rules by the meta-level linear logic, as shown in [9].

Hybrid Linear Logic (HyLL) is a conservative extension of Intuitionistic first-order Linear Logic (ILL) [12] where the truth judgments are labelled by worlds representing constraints on states and state transitions. Instead of the ordinary judgment "A is true", for a proposition A, judgments of HyLL are of the form "A is true at world w", abbreviated as A @ w. Particular choices of worlds produce particular instances of HyLL. Typical examples are "A is true at time t" or "A is true with probability p". HyLL was first proposed in [8] and it has been used as a logical framework for specifying biological systems [13].

Formally, worlds are defined as follows.

Definition 1 (HyLL worlds). A constraint domain W is a monoid structure ⟨W, ., ι⟩. The reachability relation on W is the preorder defined by u ≼ w iff there exists v ∈ W such that u.v = w.

The identity world ι is ≼-initial and is intended to represent the lack of any constraints. Thus, the ordinary first-order linear logic is embeddable into any instance of HyLL by setting all world labels to the identity. A typical, simple example of constraint domain is

Atomic propositions (p, q, ...) are applied to a sequence of terms (s, t, ...), which are drawn from an untyped term language containing constants (c, d, ...), term variables (x, y, ...) and function symbols (f, g, ...) applied to a list of terms.

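intuitionistic linear logic and the two hybrid connectives: satisfaction (at), which states that a proposition is true at a given world (w, ι, u.v, ...), and localization (↓u A), which binds u to the world at which A is true. The following grammar summarizes the syntax of HyLL:

$$t ::= c \mid x \mid f(\vec{t})$$

$$A, B ::= p(\vec{t}) \mid A \otimes B \mid \mathbf{1} \mid A \multimap B \mid A \mathbin{\&} B \mid \top \mid A \oplus B \mid \mathbf{0} \mid {!}A \mid \forall x.\,A \mid \exists x.\,A \mid (A \;\mathsf{at}\; w) \mid {\downarrow}u.\,A \mid \forall u.\,A \mid \exists u.\,A$$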

Note that world u is bound in the propositions ↓u A, ∀u A and ∃u A. World variables cannot be used in terms, and neither can term variables occur in worlds. This restriction is important for the modular design of HyLL because it keeps purely logical truth separate from constraint truth. We note that ↓ and at commute freely with all non-hybrid connectives [8].
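To make this binding structure concrete, the following Python sketch computes the free world variables of a proposition. The tuple encoding and the constructor names (`"down"` for ↓, `"wforall"`/`"wexists"` for world quantifiers, `"dot"` for u.v) are hypothetical choices of ours. Because terms are kept separate from worlds, atoms never contribute world variables, and term quantifiers never capture them.

```python
def world_vars(w):
    """World variables occurring in a world expression.

    ("wvar", u) is a world variable, ("dot", w1, w2) is the monoid
    composition w1.w2, and anything else (e.g. ("iota",)) is closed.
    """
    if w[0] == "wvar":
        return {w[1]}
    if w[0] == "dot":
        return world_vars(w[1]) | world_vars(w[2])
    return set()


def free_world_vars(a):
    """Free world variables of a HyLL proposition (tuple encoding)."""
    kind = a[0]
    if kind == "atom":                 # ("atom", p, terms): no world variables
        return set()
    if kind in ("one", "top", "zero"):
        return set()
    if kind == "bang":
        return free_world_vars(a[1])
    if kind in ("tensor", "lolli", "with", "plus"):
        return free_world_vars(a[1]) | free_world_vars(a[2])
    if kind in ("forall", "exists"):   # term quantifiers: ("forall", x, A)
        return free_world_vars(a[2])
    if kind == "at":                   # ("at", A, w)
        return free_world_vars(a[1]) | world_vars(a[2])
    if kind in ("down", "wforall", "wexists"):  # each binds the world a[1]
        return free_world_vars(a[2]) - {a[1]}
    raise ValueError(f"unknown constructor: {kind}")
```

For example, in ↓u. (p at u.v) only v is free, since u is bound by the localization.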

The sequent calculus [11] presentation of HyLL uses sequents of the form Γ; Δ ⊢ C @ w, where Γ (the unbounded context) and Δ (the linear context) are multisets of judgments of the form A @ w. Note that in a judgment A @ w (as in a proposition A at w), w can be any expression in W, not only a variable.

The inference rules dealing with the new hybrid connectives are depicted below (the complete set of rules can be found in [8]).

Note that (A at u) is a mobile proposition: it carries with it the world at which it is true. Weakening and contraction are admissible rules for the unbounded context.

The most important structural properties are the admissibility of the general identity (i.e., over any formulas, not only atomic propositions) and cut theorems. While the first provides a syntactic completeness theorem for the logic, the latter guarantees consistency (i.e., that there is no proof of ·; · ⊢ 0 @ w).

in ILL. It is worth noting that HyLL is more expressive than S5, as it allows direct manipulation of the worlds using the hybrid connectives, and HyLL's δ connective (see Sect. 5) is not definable in S5. We also note that HyLL admits a complete focused [1] proof system. The interested reader can find proofs and further meta-theoretical theorems about HyLL in [8].

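Modal Connectives. We can define modal connectives in HyLL as follows:

Definition 2 (Modal connectives).

$$\Box A \stackrel{\mathrm{def}}{=} {\downarrow}u.\,\forall w.\,(A \;\mathsf{at}\; u.w) \qquad \Diamond A \stackrel{\mathrm{def}}{=} {\downarrow}u.\,\exists w.\,(A \;\mathsf{at}\; u.w) \qquad \delta_v A \stackrel{\mathrm{def}}{=} {\downarrow}u.\,(A \;\mathsf{at}\; u.v)$$

□A [resp. ♦A] represents all [resp. some] state(s) satisfying A and reachable from now. The connective δ represents a form of delay.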
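Since these connectives are derived, they can be implemented as macros that expand into the hybrid primitives. The Python sketch below does this on plain strings; it assumes the HyLL definitions of □, ♦ and δ from [8], recalled in the docstrings, and the bound names `u` and `w` are caller-chosen placeholders.

```python
def delay(v, a, u="u"):
    """delta_v A := down u. (A at u.v), i.e. A holds v steps after now."""
    return f"↓{u}. ({a} at {u}.{v})"


def box(a, u="u", w="w"):
    """Box A := down u. forall w. (A at u.w), i.e. A in all reachable worlds."""
    return f"↓{u}. ∀{w}. ({a} at {u}.{w})"


def diamond(a, u="u", w="w"):
    """Diamond A := down u. exists w. (A at u.w), i.e. A in some reachable world."""
    return f"↓{u}. ∃{w}. ({a} at {u}.{w})"


print(delay("1", "pres(a)"))  # ↓u. (pres(a) at u.1)
```

Keeping the modalities as abbreviations means a prover only needs rules for ↓, at, and the world quantifiers.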

Linear logic with subexponentials (SELL) shares with LL all its connectives except the exponentials: instead of having a single pair of exponentials ! and ?, SELL may contain as many subexponentials [7,18], written !a and ?a, as one needs. The grammar of formulas in SELL is as follows:

The proof system for SELL is specified by a subexponential signature Σ = ⟨I, ⪯, U⟩, where I is a set of labels, U ⊆ I is a set specifying which subexponentials allow weakening and contraction, and ⪯ is a pre-order among the elements of I. We shall use a, b, ... to range over elements in I, and we will assume that ⪯ is upwardly closed with respect to U, i.e., if a ∈ U and a ⪯ b, then b ∈ U.
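The upward-closure requirement on a signature is easy to check mechanically. In this Python sketch, which is our own illustration, a signature is given by a finite set of labels, a preorder ⪯ (a set of pairs, closed reflexively and transitively by a helper), and the set U; the labels `l`, `w` and `c` are hypothetical.

```python
def reflexive_transitive_closure(labels, pairs):
    """Smallest preorder on `labels` containing the pairs (a, b), read a <= b."""
    rel = set(pairs) | {(a, a) for a in labels}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(rel):
            for (c, d) in list(rel):
                if b == c and (a, d) not in rel:
                    rel.add((a, d))
                    changed = True
    return rel


def is_upward_closed(leq, unbounded):
    """True iff a in U and a <= b always imply b in U."""
    return all(b in unbounded for (a, b) in leq if a in unbounded)


labels = {"l", "w", "c"}
leq = reflexive_transitive_closure(labels, {("l", "w"), ("w", "c")})
```

With l ⪯ w ⪯ c, the set U = {c} is upwardly closed, whereas U = {w} is not, because w ⪯ c and c ∉ U.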

The system SELL is constructed by adding all the rules for the linear logic connectives except for the exponentials. The rules for subexponentials are dereliction and promotion of the subexponential labelled with a ∈ I.

Here, the rule !a has the side condition that a ⪯ aᵢ for all i. That is, one can only introduce a !a on the right if all other formulas in the sequent are marked with indices that are greater than or equal to a. Moreover, for all indices a ∈ U, we add the usual rules for weakening and contraction.
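The side condition on promotion amounts to a simple check over the rest of the sequent. The Python sketch below assumes a one-sided sequent ⊢ ?a₁F₁, ..., ?aₙFₙ, F and a hypothetical tuple encoding of formulas; it only decides whether the !a rule may be applied and does not build proofs.

```python
def can_promote(a, context, leq):
    """Side condition of the !a promotion rule.

    `context` lists the other formulas of the one-sided sequent, each as
    ("whynot", label, body) when it is ?label body. Promotion to !a is allowed
    only if every one of them is ?ai with a <= ai in the preorder `leq`.
    """
    return all(f[0] == "whynot" and (a, f[1]) in leq for f in context)


# Example preorder with a <= b (reflexive pairs included).
leq = {("a", "a"), ("b", "b"), ("a", "b")}
ok = can_promote("a", [("whynot", "b", "F")], leq)  # a <= b: allowed
```

An unmarked formula, or a ?-formula whose label is not above a, blocks the rule.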

We can enhance the expressiveness of SELL with the subexponential quantifiers ⋒ and ⋓ ([15,18]) given by the rules (omitting the subexponential signature)

where l_e is fresh. Intuitively, subexponential variables play a similar role as eigen-variables. The generic variable l_x : a represents any subexponential, constant or variable, in the ideal of a. Hence l_x can be substituted by any subexponential l of type b (i.e., l : b) if b ⪯ a. We call the resulting system SELL⋒.

As shown in [15,18], SELL⋒ admits both a cut-free and a complete focused proof system.

Theorem 2. SELL⋒ admits cut-elimination for any subexponential signature.

Modal connectives. We can define modal connectives in SELL as follows:

In a joint work with E. De Maria and A. Felty, we presented some first applications of HyLL to systems biology [13]. In these first experiments, we focused on Boolean systems, and in this case a time unit corresponds to a transition in the system. The activation rule seen in LL (Sect. 2.1) can be written in HyLL as

$$\mathit{active}(a, b) \stackrel{\mathrm{def}}{=} \mathit{pres}(a) \multimap \delta_1 (\mathit{pres}(a) \otimes \mathit{pres}(b)).$$
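Compared with the timeless LL version, each firing of this rule places its products one time unit ahead. The Python sketch below illustrates this reading under an assumption of ours: worlds are natural-number time instants with u.v read as u + v, and each resource is tagged with the instant at which it holds. The helper `fire_delayed` is hypothetical and only mimics forward rule application; it is not the Coq formalization of [13].

```python
from collections import Counter


def fire_delayed(state, consumed, produced, t, d=1):
    """Fire `consumed -o delta_d(produced)` at time t.

    `state` is a Counter of (resource, time) pairs. The consumed resources must
    all be present at time t; the produced ones appear at time t + d.
    """
    need = Counter((r, t) for r in consumed)
    if any(state[k] < n for k, n in need.items()):
        return None
    new_state = state.copy()
    new_state.subtract(need)
    new_state.update((r, t + d) for r in produced)
    return Counter({k: n for k, n in new_state.items() if n > 0})


# active(a, b) := pres(a) -o delta_1 (pres(a) (x) pres(b))
ACTIVE = (["pres(a)"], ["pres(a)", "pres(b)"])

s0 = Counter({("pres(a)", 0): 1})
s1 = fire_delayed(s0, *ACTIVE, t=0)  # pres(a) and pres(b) at time 1
```

Firing the rule again at time 0 now fails, since pres(a) holds at time 1 only; the step count is part of the state.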

We chose a simple yet representative biological example concerning the DNA-damage repair mechanism based on proteins p53 and Mdm2, and presented and proved several properties of this system. All these properties were reachability properties or the existence of an invariant. Most interesting proofs require induction or case analysis, which we borrowed from the meta-level (Coq). We fully formalized these proofs in the Coq Proof Assistant [3]. In Coq, we can both reason in HyLL and formalize meta-theoretic properties about it.

We discussed the merits and possible drawbacks of this new approach compared to approaches using temporal logic and model checking. To better illustrate the correspondence with such approaches, which all use temporal logic to reason about (simulations of models of) the biological systems described, we also presented, informally but in some detail, the encoding of temporal logic operators in HyLL.

We observe that, while linear logic has only seven logically distinct prefixes of bangs and question-marks, SELL allows for an unbounded number of such prefixes, e.g., !i, or !i?j. Hence, by using different prefixes, we allow for the specification of richer systems where subexponentials are used to mark different modalities/states. For instance, subexponentials can be used to represent contexts of proof systems [16]; to specify systems with temporal, epistemic and spatial modalities [18]; and to specify and verify biological systems [17]. An inhibition rule can be written in (classical) SELL as

$$\mathit{inhib}(a, b) \stackrel{\mathrm{def}}{=} {!^{t}}\, a \multimap {!^{t+1}} (a \otimes b^{\perp}).$$

HyLL and Linear Logic. One may wonder whether the use of worlds in HyLL also increases the expressiveness of LL. In a joint work with C. Olarte and E. Pimentel [9], we proved that this is not the case, by showing that HyLL rules can be directly encoded into LL by using the methods proposed in [14]. Moreover, the encoding of HyLL into LL is adequate in the sense that a focused step in LL corresponds exactly to the application of one inference rule in HyLL.

HyLL and SELL. Linear logic allows for the specification of two kinds of context maintenance: either both weakening and contraction are available (classical context) or neither is available (linear context). That is, when we encode (linear) judgments in HyLL belonging to different worlds, the resulting meta-level atomic formulas will be stored in the same (linear) LL context. The same happens with classical HyLL judgments and the classical LL context.

Although this is perfectly fine, encoding HyLL into SELL⋒ allows for a better understanding of worlds in HyLL. For that, we use subexponentials to represent worlds, having each world as a linear context. A HyLL judgment of the shape

Hence, HyLL judgments that hold at world w are stored at the w linear context of SELL⋒. A judgment of the form G@w in the classical HyLL context is encoded as the SELL⋒ formula ?c?w G@w. Then, the encoding of G@w is stored in the unbounded (classical) subexponential context c.
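The bookkeeping behind this encoding can be pictured as sorting judgments into named contexts. The Python sketch below is only that picture: it does not produce SELL formulas or proofs, the function name `sell_contexts` is hypothetical, and it assumes that no world is itself named `c`, the label used here for the unbounded context.

```python
from collections import defaultdict


def sell_contexts(linear, classical, unbounded_label="c"):
    """Distribute HyLL judgments over subexponential contexts.

    linear:    (formula, world) pairs from the linear HyLL context; each goes
               to the linear context named after its world.
    classical: (formula, world) pairs from the unbounded HyLL context; each
               goes to the single unbounded context (label "c" by default).
    Judgments are rendered as strings "F@w".
    """
    contexts = defaultdict(list)
    for formula, world in linear:
        contexts[world].append(f"{formula}@{world}")
    for formula, world in classical:
        contexts[unbounded_label].append(f"{formula}@{world}")
    return dict(contexts)
```

Keeping one linear context per world is what separates judgments that a single LL linear context would otherwise mix together.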

We showed that our encoding is indeed adequate. Moreover, as before, the adequacy of the encodings is on the level of derivations.

Information Confinement. One of the features needed to specify spatial modalities is information confinement: a space/world can be inconsistent, and this does not imply the inconsistency of the whole system. We showed in [9] that information confinement cannot be specified in HyLL. The authors of [15] exploit the combination of subexponentials of the form !w?w in order to specify information confinement in SELL. More precisely, note that the non-provability statements !w?w 0 ⊬ 0 and !w?w 0 ⊬ !v?v 0 (in a two-sided presentation of SELL), representing "inconsistency is local" and "inconsistency is not propagated" respectively, hold in SELL.

Hybrid linear logic is expressive enough to encode some forms of modal operators, thus allowing for the specification of properties of transition systems. As mentioned in [13], it is possible to encode CTL temporal operators into HyLL considering existential (E) and bounded universal (A) path quantifiers. We extended these encodings in [9], showing how to fully capture the E and A CTL quantifiers in linear logic with fixed points. For that, we used the system μMALL [2], which extends MALL (multiplicative, additive linear logic) with fixed point operators. In [13], proofs of (encodings of) properties involving CTL quantifiers use induction borrowed from the (Coq) meta-level. In [9], we could directly use fixed points in linear logic.
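For intuition about the fixed-point reading of CTL, EF p is the least fixed point of X = p ∨ EX X. The Python sketch below computes that fixed point by iteration on a finite transition system, in the style of model checking; it is a swapped-in illustration of the same characterization, not the μMALL encoding of [9], where the fixed point is a logical connective and properties are established by proof.

```python
def ef(states, succ, holds):
    """States satisfying EF p, as the least fixed point of X = p ∪ EX X.

    states: iterable of states; succ: dict mapping a state to its successors;
    holds: predicate telling whether p holds in a state.
    """
    x = {s for s in states if holds(s)}
    while True:
        # One more backward step: add states with a successor already in X.
        new = x | {s for s in states if any(t in x for t in succ.get(s, ()))}
        if new == x:
            return x
        x = new


succ = {0: [1], 1: [2], 2: [2], 3: [3]}
reach_2 = ef(succ.keys(), succ, lambda s: s == 2)  # states that can reach 2
```

Starting from the states where p holds and growing the set until it stabilizes is exactly what makes the fixed point the least one.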

Concerning related work, it is worth noticing that there are some other logical frameworks that are extensions of LL, for example, HLF [20]. Being a logic in the LF family, HLF is based on natural deduction, hence having a complex notion of (βη) normal forms. Thus adequacy results (of encodings of systems) are often much harder to prove in HLF than in (focused) HyLL/SELL. HLF seems to have been later abandoned in favour of Hybridized Intuitionistic Linear Logic (HILL) [4], a type theory based on a subpart of HyLL.

Both HyLL and SELL have been used for formalizing and analyzing biological systems [13,17]. SELL proved to be a broader framework for handling such systems (in particular localities). However, the simplicity of HyLL may be of interest for specific purposes, such as building tools for diagnosis in biomedicine. Formal proofs in HyLL were implemented in [13], in the Coq [3] proof assistant. It would be interesting to extend the implementations of HyLL given there to SELL. Such an interactive proof environment would enable both formal studies of encoded systems in SELL and formal meta-theoretical study of SELL itself.

We may pursue the goal of using HyLL/SELL for further applications. That might include neuroscience, a young and promising science where many hypotheses are provided and need to be verified. Indeed, logic is a general tool whose areas of potential application are not restricted per se. This is in contrast to most of the other approaches, which are valid only in a restricted area (typically inside or outside the cell).

In an ongoing joint work with P. Lio, we are formalizing the evolution of cancer cells, acquiring driver or passenger mutations. A rule describing an intravasating Circulating Tumour Cell, for example, might be:

$$C(n, \mathit{breast}, f, [\mathit{EPCAM}]) \multimap \delta_d\, C(n, \mathit{blood}, 1, [\mathit{EPCAM}])$$

where $f$ is a fitness parameter, here in $\{0, 1\}$. Our long-term goal here is the design of a Logical Framework for disease diagnosis and therapy prognosis. This requires the development of automatic tools for proof search in our logics.


These tools should benefit both from current research on proof search in linear logic and from current developments of automatic provers for SELL.

Coq'Art: The Calculus of Inductive Constructions. Springer, Heidelberg (2004)

4. Caires, L., Perez, J., Pfenning, F.: Logic-based domain-aware session types (2014, submitted)

5. Campagna, D., Piazza, C.: Hybrid automata in systems biology: how far can we

6. Danos, V.: Agile modelling of cellular signalling (invited paper). In: Proceedings of the 5th Workshop on Structural Operational Semantics (SOS). Electronic Notes in TCS, vol. 229, pp. 3–10. Elsevier (2009)

7. Danos, V., Joinet, J.B., Schellinx, H.: The structure of exponentials: uncovering the dynamics of linear logic proofs. In: Mundici, D., Gottlob, G., Leitsch, A. (eds.) KGC 1993. LNCS, vol. 713, pp. 159–171. Springer, Heidelberg (1993)

8. Despeyroux, J., Chaudhuri, K.: A hybrid linear logic for constrained transition systems. In: Post-Proceedings of the 9th International Conference on Types for Proofs and Programs (TYPES 2013). Leibniz International Proceedings in Informatics, vol. 26, pp. 150–168. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2014)

9. Despeyroux, J., Olarte, C., Pimentel, E.: Hybrid and subexponential linear logics. In: Proceedings of the 11th Workshop on Logical and Semantic Frameworks, with Applications (LSFA) (2016)

10. Fages, F., Soliman, S.: Formal cell biology in biocham. In: Bernardo, M., Degano, P., Zavattaro, G. (eds.) SFM 2008. LNCS, vol. 5016, pp. 54–80. Springer, Heidelberg (2008)

11. Gentzen, G.: Investigations into logical deductions, 1935. In: Szabo, M.E. (ed.) The Collected Papers of Gerhard Gentzen, pp. 68–131. North-Holland Publishing Co., Amsterdam (1969)

13. de Maria, E., Despeyroux, J., Felty, A.P.: A logical framework for systems biology. In: Fages, F., Piazza, C. (eds.) FMMB 2014. LNCS, vol. 8738, pp. 136–155. Springer, Heidelberg (2014)

14. Miller, D., Pimentel, E.: A formal framework for specifying sequent calculus proof

15. Nigam, V., Olarte, C., Pimentel, E.: A general proof system for modalities in concurrent constraint programming. In: D'Argenio, P.R., Melgratti, H. (eds.) CONCUR 2013 – Concurrency Theory. LNCS, vol. 8052, pp. 410–424. Springer, Heidelberg (2013)

16. Nigam, V., Pimentel, E., Reis, G.: Specifying proof systems in linear logic with

17. Olarte, C., Chiarugi, D., Falaschi, M., Hermith, D.: A proof theoretic view of spatial and temporal dependencies in biochemical systems. Theoret. Comput. Sci.


20. Reed, J.: Hybridizing a logical framework. In: International Workshop on Hybrid Logic (HyLo), Seattle, USA, August 2006

Regular Papers

Generalized Method of Moments for Stochastic Reaction Networks in Equilibrium

Michael Backenköhler1, Luca Bortolussi1,2, and Verena Wolf1(B)

wolf@cs.uni-saarland.de

University of Trieste, Trieste, Italy

Abstract. Calibrating parameters is a crucial problem within quantitative modeling approaches to reaction networks. Existing methods for stochastic models rely either on statistical sampling or can only be applied to small systems. Here we present an inference procedure for stochastic models in equilibrium that is based on a moment matching scheme with optimal weighting and that can be used with high-throughput data like the one collected by flow cytometry. Our method does not require an approximation of the underlying equilibrium probability distribution and, if reaction rate constants have to be learned, the optimal values can be computed by solving a linear system of equations. We evaluate the effectiveness of the proposed approach on three case studies.

Stochastic models have proven to be a powerful tool for the analysis of biochemical reaction networks. Especially when chemical species are present in low copy numbers, a stochastic approach provides important insights on the randomness inherent to the system when compared to deterministic approaches. For the inference of parameters based on experimentally observed samples, the more detailed descriptions given by stochastic models can substantially improve the quality of the estimation [18].

The arguably most popular stochastic modeling approach to chemical kinetics is based on a description in terms of continuous-time Markov chains (CTMC) [10]. In this case, the exact time evolution of the entire probability distribution is given by the chemical master equation (CME). Although this description is exact up to the numerical precision of the integration scheme, its solution is only feasible for simple systems with small molecular populations [22]. Therefore, the applicability of inference approaches based on maximum likelihood estimation (MLE) is limited to this class of networks, since they require an approximation of the probability distribution, i.e., a solution of the CME [2,3].

An alternative to ease the computational burden is to use stochastic simulation to estimate the likelihood function or to learn parameters in a Bayesian setting, e.g., by ABC methods [33]. However, the total number of simulations to be performed is huge, still resulting in a computationally intensive approach.

© Springer International Publishing AG 2016

E Bartocci et al (Eds.): CMSB 2016, LNBI 9859, pp 15–29, 2016.


A computationally more feasible approach is to consider the statistical moments (such as the expected value or the variance) instead of the entire probability distribution. Moment-based analysis approaches rely on a derivation of a system of equations for the time derivatives of the moments [1,6,30]. Since the exact time evolution of the moments of order k may depend on moments of higher order, a closure method has to be applied to arrive at a finite system of equations. However, moment-based methods complicate the application of MLE, since a reconstruction of the distribution is computationally expensive and may be inaccurate depending on the shape of the distribution [4].

In this paper we propose a parameter estimation approach that does not rely on MLE and distribution approximations, but on the generalized method of moments (GMM), which has been a widely used inference method in econometrics for over 30 years [12,14]. We consider the case in which experimentally observed samples are drawn when the process is in equilibrium. Population snapshot data of equilibrium processes are considered, for instance, if the (possibly multi-stable) steady-state expression in a gene regulatory network is investigated [7,17] or if the steady-state behavior of a mutant is compared to the behavior of the wild type [19,27]. Modern high-throughput experimental techniques, like flow cytometry, deliver a large amount of measurements from a population of cells at steady state and thus give detailed information about the distribution of proteins and RNAs [13,16,25]. The idea of the GMM is to consider constraints of the form $E[f(Y_i, \theta_0)] = 0$, where $Y_i$ is a sample and $\theta_0$ the parameter vector.

We propose to choose $f$ as the time derivatives of the statistical moments of the model, which can directly be derived from the CME. This choice is motivated by the fact that the time derivatives become zero when the process is in equilibrium. A major advantage, given the availability of steady-state samples, is that, compared to time-dependent observations, no moment closure approximations are necessary. Instead, exact equations for the steady-state moments can be used. If the propensities are linear in the unknown parameters, as is the case for mass action kinetics, a closed linear form is possible. This results in an extremely fast inference procedure, since no numerical optimization is needed. In case of propensities that are non-linear in the parameters, numerical optimization is necessary. Still, no numerical integration of moment equations or probabilities is needed, since the objective function corresponds to the right-hand side of the steady-state moment equations.

The moment equations may also contain moments of species whose quantity is hard to measure (e.g., the state of a promoter). Instead of treating these latent variables as unknown (probably non-linear) parameters, here we propose a clustering approach that estimates promoter states in a preprocessing step. Then, a closed linear solution is still possible, which again enables an accurate estimation in very short time.

We analyse the effectiveness of the GMM approach for the p53 oscillator model [9] and two variants of the genetic toggle switch [8,20]. Our results show that using moments of up to at least second order yields accurate estimates. The inclusion of higher order moments (higher than three) can lead to a further decrease of the estimators' variances, but for the p53 model and only few samples the estimation becomes worse. Nevertheless, even for comparatively small sample sizes (100) the estimates are usually tightly distributed around the true parameter value when moments up to order two or three are considered.

The paper is organized as follows. We first provide some background on the model in Sect. 2 and present our inference approach based on GMM in Sect. 3. We discuss the inference results for the case studies in Sect. 4 and conclude the paper in Sect. 6.

A stochastic model of a network of chemical reactions is usually specified by a set of $n$ species, which are represented by a set of symbols $S_1, \dots, S_n$. We are interested in the system state, i.e., the number of individuals of each species, and thus consider the state space $\mathcal{S} \subseteq \mathbb{N}_{\geq 0}^n$. Furthermore, a set of $J$ reactions is given, describing the interactions between the different molecular populations. For $j \in \{1, \dots, J\}$, reaction $R_j$ is specified by its stoichiometry

$$R_j:\; \nu_{j,1}^- S_1 + \dots + \nu_{j,n}^- S_n \longrightarrow \nu_{j,1}^+ S_1 + \dots + \nu_{j,n}^+ S_n, \qquad (1)$$

where the vectors $\nu_j^-, \nu_j^+ \in \mathbb{N}_{\geq 0}^n$ with entries $\nu_{j,i}^-$ and $\nu_{j,i}^+$ for $i \in \{1, \dots, n\}$ specify how many molecules of each type are consumed (produced), respectively. The vector $\nu_j = \nu_j^+ - \nu_j^-$ is called the change vector of $R_j$. The propensity functions $\alpha_j$ are such that $\alpha_j: \mathcal{S} \times \Theta \to \mathbb{R}_{\geq 0}$, where $\Theta$ is the parameter space.

If mass action kinetics are assumed, then $\alpha_j$ is the product of the rate constant $c_j$ and the number of possible combinations of reactant molecules, i.e., $\alpha_j(x, \theta) = c_j \prod_{i=1}^{n} \binom{x_i}{\nu_{j,i}^-}$. In general, we only impose certain regularity conditions on the propensity functions, such as continuity and the existence of certain expected values. If a reaction does not follow mass action propensities, we give the propensity function separately from the stoichiometry (1).
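The mass-action propensity can be written as a short Python function. This is a minimal sketch; the function name and argument layout are illustrative assumptions, with `reactant_stoich` holding the entries $\nu_{j,i}^-$ of one reaction and `rate` its constant $c_j$.

```python
from math import comb, prod


def mass_action_propensity(x, reactant_stoich, rate):
    """alpha_j(x, theta) = c_j * prod_i binom(x_i, nu^-_{j,i}) for one reaction R_j."""
    # Number of distinct ways to choose the reactant molecules in state x;
    # comb(x_i, v) is 0 when fewer than v molecules are present, so the reaction cannot fire.
    combinations = prod(comb(xi, vi) for xi, vi in zip(x, reactant_stoich))
    return rate * combinations


# Dimerisation 2 S1 -> S2 with c_j = 0.5 in state x = (10, 3): 0.5 * C(10, 2) = 22.5
print(mass_action_propensity((10, 3), (2, 0), 0.5))
```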

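Under the assumption of well-stirredness and thermal equilibrium, such a system can be accurately described by a continuous-time Markov chain (CTMC) $X(t)$. The time evolution of its probability distribution is given by the chemical master equation (CME):

$$\frac{d}{dt} P(X(t) = x) = \sum_{j=1}^{J} \Big( \alpha_j(x - \nu_j, \theta)\, P(X(t) = x - \nu_j) - \alpha_j(x, \theta)\, P(X(t) = x) \Big). \qquad (2)$$

Due to the size of the state space, the integration of $\frac{d}{dt}P$ is computationally infeasible, especially if we have to integrate until convergence to determine the equilibrium distribution. Given (2), it is straightforward to compute the time derivative of the expectation of some polynomial function $g: \mathcal{S} \to \mathbb{R}$ [6]:

$$\frac{d}{dt} E[g(X)] = \sum_{j=1}^{J} E\Big[\alpha_j(X, \theta)\,\big(g(X + \nu_j) - g(X)\big)\Big], \qquad (3)$$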

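where we omit the dependence of $X$ on $t$. Here we are concerned with the moments $E[X^m]$, where we use the multi-index notation $x^m = x_1^{m_1} \cdots x_n^{m_n}$ for the vectors $m = (m_1, \dots, m_n) \in \mathbb{N}_{\geq 0}^n$; the order of a moment is given by the sum $m_1 + \dots + m_n$. The first order moment of the $i$-th population, for example, is obtained from (3) by setting $g(x) = x_i$:

$$\frac{d}{dt} E[X_i] = \sum_{j=1}^{J} \nu_{j,i}\, E[\alpha_j(X, \theta)].$$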

In general, the equation of a moment of a certain order may depend on moments of higher order, except if $\alpha_j$ is constant or linear, i.e., of the form $c_j^T x + b_j$ for some constant $c_j \in \mathbb{R}^n$ and $b_j \in \mathbb{R}$. Here, we do not aim at finding a finite system of ODEs to approximate the moments, but rather propose to use the exact moment equations when the system is in equilibrium. The equilibrium probability of a state $x$ is defined as the limit of $P(X(t) = x)$ when $t \to \infty$, i.e.,

$$\pi(x) = \lim_{t \to \infty} P(X(t) = x),$$

and is uniquely defined for ergodic processes $X$. Since the equilibrium distribution is independent of time, the expected values in (3) are also time-independent when $t \to \infty$. Thus, we can use the right-hand side of (3) to estimate propensity parameters given samples from the equilibrium distribution.
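As a small worked example (a hypothetical birth-death process, not one of the paper's case studies), take $\emptyset \to X$ with propensity $k$ and $X \to \emptyset$ with propensity $\gamma x$, so that $\nu_1 = 1$ and $\nu_2 = -1$. Applying (3) with $g(x) = x$ and $g(x) = x^2$ and setting the time derivatives to zero at equilibrium gives two equations that involve only moments of $\pi$ and the parameters:

$$
\begin{aligned}
0 &= k - \gamma\, E[X],\\
0 &= k\,\big(2E[X] + 1\big) + \gamma\,\big(E[X] - 2E[X^2]\big).
\end{aligned}
$$

Both hold for the Poisson equilibrium distribution with mean $\lambda = k/\gamma$, since then $E[X] = \lambda$ and $E[X^2] = \lambda + \lambda^2$. No closure is required: the higher moment $E[X^2]$ appearing in the second equation is simply estimated from the samples.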

We propose to use the moments of the equilibrium distribution as an input for a GMM inference, which is a very generic framework for parameter estimation [12,23]. It is most popular in econometrics, where the exact distribution of a model is often not known. In this case MLE cannot be used, since it needs a sufficiently accurate description of the distribution for its optimality properties to hold. As opposed to this, the GMM is based on the construction and minimization of certain cost functions, called moment conditions, which relate the population and sample moments. A moment condition is given by a function whose expected value is zero for the true parameter value $\theta_0$. Given independent samples $Y_1, \dots, Y_N$ of the process $X$ in equilibrium, a vector of moment conditions is given by

$$E[f(Y, \theta_0)] = 0, \qquad (6)$$

where we omit the index of the samples whenever they appear within the expectation operator, since $Y_1, \dots, Y_N$ are identically distributed according to the equilibrium distribution $\pi$. Moreover, let $f$ be a vector of $q$ different functions, i.e., $f: (\mathcal{S} \times \Theta) \to \mathbb{R}^q$. The sample equivalent of (6) for the vector $f$ of moment conditions is

$$\bar{f}_N(\theta) = \frac{1}{N} \sum_{i=1}^{N} f(Y_i, \theta) = 0. \qquad (7)$$

Depending on the number of such conditions $q$ and the number of parameters to be estimated $p$, we distinguish the non-identified case ($q < p$), the exactly identified case ($q = p$) and the over-identified case ($q > p$). In the exactly identified case, assuming (7) has a unique solution, we obtain Pearson's classical method of moments [28].

Since we are considering the system at equilibrium, the right-hand side of (3) must equal zero. In principle, it is possible to use any polynomial $g$ meeting certain regularity conditions [12]. However, using population moments, i.e., monomials of $Y$, is a natural choice that leads to the moment conditions

$$E\Big[\sum_{j=1}^{J} \alpha_j(Y, \theta_0)\big((Y + \nu_j)^m - Y^m\big)\Big] = 0, \qquad (8)$$

side in (8) of the second order moments and so forth
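A minimal numpy sketch of how the conditions (8) could be evaluated for every sample, with the sample average giving $\bar{f}_N$ from (7). The function names, the array layout and the birth-death example network are assumptions made for illustration, not the authors' implementation.

```python
import itertools

import numpy as np


def multi_indices(n_species, max_order):
    """All multi-indices m with 1 <= |m| <= max_order."""
    return [m for m in itertools.product(range(max_order + 1), repeat=n_species)
            if 1 <= sum(m) <= max_order]


def moment_conditions(Y, theta, change_vectors, propensities, max_order):
    """Per-sample conditions f(Y_i, theta) following (8):
    sum_j alpha_j(Y_i, theta) * ((Y_i + nu_j)^m - Y_i^m), one column per multi-index m.
    Y is an (N, n) array of equilibrium samples; the result has shape (N, q)."""
    Y = np.asarray(Y, dtype=float)
    ms = np.array(multi_indices(Y.shape[1], max_order))  # (q, n)

    def monomials(Z):
        # Z^m for every sample and every multi-index m
        return np.prod(Z[:, None, :] ** ms[None, :, :], axis=2)

    f = np.zeros((Y.shape[0], len(ms)))
    for nu, alpha in zip(change_vectors, propensities):
        nu = np.asarray(nu, dtype=float)
        f += alpha(Y, theta)[:, None] * (monomials(Y + nu) - monomials(Y))
    return f


# Hypothetical birth-death network: 0 -> X (propensity k), X -> 0 (propensity gamma * x)
change_vectors = [[1.0], [-1.0]]
propensities = [lambda Y, th: np.full(len(Y), th[0]),
                lambda Y, th: th[1] * Y[:, 0]]
rng = np.random.default_rng(0)
Y = rng.poisson(20.0, size=(100_000, 1))  # its equilibrium is Poisson(k / gamma)
f_bar = moment_conditions(Y, (2.0, 0.1), change_vectors, propensities, 2).mean(axis=0)
print(f_bar)  # both sample conditions should be near zero at the true (k, gamma)
```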

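We may choose as many moment conditions as there are parameters to exactly identify the estimate. However, the inclusion of further information on the distribution may lead to a more accurate estimation. GMM provides a framework to deal with over-identified estimation problems. The estimator is given by

$$\hat{\theta}_N = \arg\min_{\theta \in \Theta} Q_N(\theta), \qquad (9)$$

$$Q_N(\theta) = \bar{f}_N(\theta)^T\, W\, \bar{f}_N(\theta). \qquad (10)$$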

Here, $W$ is some positive semi-definite matrix containing weights for each pair of moment conditions. Under certain regularity conditions [12], this estimator is asymptotically normal and consistent, i.e., the estimator converges in probability to $\theta_0$. These regularity conditions mostly consist of the existence of expectations. Assuming convergence to equilibrium moments, the validity of these conditions depends solely on the propensity functions. They hold for mass action and Hill propensities, as these are smooth functions of the parameters. The parameter space itself is assumed to be bounded, which in practice can be ensured by either fixing a biologically relevant space or assuming a sufficiently large $\Theta$ [12]. A further necessary condition for normality is that $\theta_0$ is a unique interior point of $\Theta$ such that $E[f(Y, \theta_0)] = 0$. However, if we only have samples from the steady-state distribution, this property may not hold if one tries to estimate all parameters at once. The reason is that, for a fixed steady-state distribution, there is often an infinite number of ergodic Markov chains having this steady-state distribution, and the system is not fully identifiable.
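The identifiability issue is easy to see in a hypothetical birth-death network ($\emptyset \to X$ with propensity $k$, $X \to \emptyset$ with propensity $\gamma x$), whose equilibrium is Poisson with mean $k/\gamma$. In this sketch (illustrative names, synthetic data), multiplying both rates by the same constant only rescales the sample moment conditions, so their zero set is a ray of parameter vectors rather than a unique interior point.

```python
import numpy as np


def birth_death_conditions(y, k, gamma):
    """Sample moment conditions of orders 1 and 2 for the hypothetical birth-death
    network 0 -> X (propensity k), X -> 0 (propensity gamma * x)."""
    y = np.asarray(y, dtype=float)
    first = k - gamma * y
    second = k * (2 * y + 1) + gamma * (y - 2 * y**2)
    return np.array([first.mean(), second.mean()])


rng = np.random.default_rng(1)
y = rng.poisson(20.0, size=50_000)  # equilibrium samples for k / gamma = 20
base = birth_death_conditions(y, 2.0, 0.1)
for c in (10.0, 0.01):
    # Scaling both rates by c leaves the equilibrium Poisson(k / gamma) unchanged and
    # only rescales the conditions, so every (c * k, c * gamma) solves them equally well.
    print(c, birth_death_conditions(y, c * 2.0, c * 0.1), c * base)
```

In practice this is why some parameters are fixed before estimation, as done for the case studies.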

Although the estimator's normality holds for all positive semi-definite weighting matrices, a good choice of $W$ reduces the asymptotic variance of the estimator. It can be shown that the asymptotically most efficient matrix $W_0$ is given by the inverse of $\lim_{N \to \infty} \mathrm{Var}(\sqrt{N}\, \bar{f}_N(\theta_0))$ [12,23]. In case of independent and identically distributed samples, $W_0$ can be estimated as follows [23]:

$$\hat{W}_N(\theta_0) = \Big( \frac{1}{N} \sum_{i=1}^{N} f(Y_i, \theta_0)\, f(Y_i, \theta_0)^T \Big)^{-1}. \qquad (11)$$
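Estimating (11) amounts to averaging outer products of the per-sample conditions and inverting the result. A minimal numpy sketch, assuming the conditions are already evaluated at the current estimate and stored row-wise (the function name is illustrative):

```python
import numpy as np


def optimal_weight_matrix(f_samples):
    """Estimate (11): the inverse of (1/N) * sum_i f(Y_i, theta) f(Y_i, theta)^T.
    f_samples is an (N, q) array with f(Y_i, theta) evaluated at the current estimate."""
    f = np.asarray(f_samples, dtype=float)
    second_moment = f.T @ f / f.shape[0]  # q x q matrix of averaged outer products
    # Assumes the matrix is invertible; near-duplicate conditions make it ill-conditioned.
    return np.linalg.inv(second_moment)


# Two uncorrelated conditions with mean squares 0.5 and 2.0 get weights 2.0 and 0.5.
f = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 0.0], [0.0, -2.0]])
print(optimal_weight_matrix(f))
```

Conditions with large sampling variability, typically those built from higher order moments, therefore receive smaller weights than under the identity matrix.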

Since this estimate depends on $\theta_0$, which is unknown, GMM is usually applied in an iterative manner: a first estimate $\hat{\theta}_1$ is computed using some positive-definite weight matrix, such as the identity matrix. The estimate $\hat{\theta}_1$ is consistent, but likely asymptotically inefficient. This estimate is then used to approximate (11). The procedure of estimating $\theta_0$ and computing $\hat{W}_N$ can be applied iteratively until some convergence criterion is met. Since $W$ is held constant within each iteration, the solution to (9) can, under some restrictions on the propensities, be expressed as a linear system (cf. Sect. 3.1).

Beyond this iterative estimation scheme, the continuously updating GMM (CUGMM) [15] is a popular variant of the GMM estimator. Instead of recomputing the weight estimate between minimizations, the weight estimation (11) is substituted into the objective function (10). The resulting estimator is thus given by

$$\hat{\theta}_{CU,N} = \arg\min_{\theta \in \Theta} \bar{f}_N(\theta)^T \Big( \frac{1}{N} \sum_{i=1}^{N} f(Y_i, \theta)\, f(Y_i, \theta)^T \Big)^{-1} \bar{f}_N(\theta). \qquad (12)$$
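A minimal sketch of the continuously updating estimator (12), minimized with SciPy's L-BFGS-B as in the experiments, on a hypothetical birth-death network with synthetic Poisson samples. Only $k$ is estimated and $\gamma$ is held fixed, since equilibrium data determine only the ratio $k/\gamma$; the network, data and names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def per_sample_conditions(y, k, gamma):
    """f(Y_i, theta) of orders 1 and 2 for the hypothetical birth-death network."""
    return np.column_stack([k - gamma * y,
                            k * (2 * y + 1) + gamma * (y - 2 * y**2)])


def cu_objective(params, y, gamma):
    """Continuously updating objective (12): the weight matrix (11) is re-estimated
    at every trial value of the parameter instead of being held fixed."""
    f = per_sample_conditions(y, params[0], gamma)
    f_bar = f.mean(axis=0)
    second_moment = f.T @ f / len(y)
    return float(f_bar @ np.linalg.solve(second_moment, f_bar))


rng = np.random.default_rng(2)
y = rng.poisson(20.0, size=5_000).astype(float)  # generated with k = 2.0, gamma = 0.1
# Only k is estimated; gamma is fixed because equilibrium data identify only k / gamma.
res = minimize(cu_objective, x0=[0.5], args=(y, 0.1), method="L-BFGS-B",
               bounds=[(1e-6, 100.0)])
print(res.x[0])  # expected to land close to 2.0
```

Because the weights change with the parameter, the objective is no longer quadratic in $\theta$, which is why a numerical optimizer is needed here.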

In general, the minimization problem (9) can be solved using numerical optimization algorithms. However, depending on the rate functions, this may not be necessary, because a closed form solution, i.e., a linear system, can be obtained for many relevant cases, including mass action kinetics. This system results from the first order condition of the minimization $\partial Q_N(\hat{\theta}_N)/\partial\theta = 0$, which yields [12]

$$0 = \Big( \frac{\partial \bar{f}_N(\hat{\theta}_N)}{\partial \theta} \Big)^T W\, \bar{f}_N(\hat{\theta}_N). \qquad (13)$$

We now compute (13) under the condition that propensities are linear in $\theta$ and

be the index set of functions $\alpha_j$ whose propensity is dependent on $\theta$. Further let

Note that the sample derivatives $\partial \bar{f}_N / \partial \theta_i$ are independent of $\theta$. In vector notation this gives us the linear system $A \hat{\theta}_N = b$ as a solution to (13), where

Analogous to the general iterative scheme, we now solve (15) and use the estimate to in turn estimate $W$ using (11). In the following discussion we will refer to this as the closed form GMM (CFGMM). Since each iteration only requires solving a linear system, this method is far more efficient than numerically optimizing $Q_N$.
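A minimal numpy sketch of the iterative closed-form scheme. It assumes the conditions can be written as $f(Y_i, \theta) = G_i \theta + h_i$ (our notation, not necessarily the authors'), so that $\bar{f}_N(\theta) = \bar{G}\theta + \bar{h}$ and (13) becomes $(\bar{G}^T W \bar{G})\,\hat{\theta}_N = -\bar{G}^T W \bar{h}$. The fixed number of iterations and the birth-death example with known $k$ are illustrative choices.

```python
import numpy as np


def cfgmm(G, h, n_iter=4):
    """Iterative closed-form GMM for moment conditions that are linear in theta:
    f(Y_i, theta) = G[i] @ theta + h[i], with G of shape (N, q, p) and h of shape (N, q).
    With f_bar(theta) = G_bar @ theta + h_bar, setting the gradient of Q_N to zero gives
    (G_bar^T W G_bar) theta = -G_bar^T W h_bar."""
    G_bar, h_bar = G.mean(axis=0), h.mean(axis=0)  # computed once for all iterations
    W = np.eye(h.shape[1])                          # first step: identical weights
    for _ in range(n_iter):
        A = G_bar.T @ W @ G_bar
        b = -G_bar.T @ W @ h_bar
        theta = np.linalg.solve(A, b)
        f = np.einsum("iqp,p->iq", G, theta) + h    # f(Y_i, theta) at the new estimate
        W = np.linalg.inv(f.T @ f / len(h))         # re-estimate the weights as in (11)
    return theta


# Hypothetical birth-death data with k = 2.0 known; gamma enters both conditions linearly.
rng = np.random.default_rng(3)
y = rng.poisson(20.0, size=20_000).astype(float)
k = 2.0
G = np.stack([-y, y - 2 * y**2], axis=1)[:, :, None]       # coefficients of gamma, (N, 2, 1)
h = np.stack([np.full_like(y, k), k * (2 * y + 1)], axis=1)  # gamma-free parts, (N, 2)
print(cfgmm(G, h))  # expected to be close to gamma = 0.1
```

Only $\bar{G}$ and $\bar{h}$ depend on the data in each solve, so the per-iteration cost beyond the weight update is that of a small $p \times p$ linear system.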

We evaluate the GMM estimation on three chemical reaction networks. Samples of the equilibrium distribution were generated by Gillespie's stochastic simulation algorithm (SSA) [10] and drawn by equidistant sampling after the initial warm-up period. For each case study $10^7$ samples were generated, and sample sets of different sizes were drawn at random from this large set. For each sample size considered, the estimation procedure was carried out on 100 random sample sets in order to estimate the variance of the estimator.
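The data-generation step can be sketched with Gillespie's direct method in plain Python. This is a minimal sketch, not the simulator used for the case studies; the function name, warm-up length, spacing `dt` and the birth-death example are assumptions.

```python
import random


def ssa_equilibrium_samples(x0, change_vectors, propensities, t_warmup, dt, n_samples, seed=0):
    """Gillespie's direct method; after a warm-up period the state is recorded at
    equidistant times t_warmup, t_warmup + dt, ... until n_samples are collected."""
    rng = random.Random(seed)
    x, t = list(x0), 0.0
    next_obs, samples = t_warmup, []
    while len(samples) < n_samples:
        rates = [a(x) for a in propensities]
        total = sum(rates)
        # Exponential waiting time to the next reaction (infinite if nothing can fire)
        tau = rng.expovariate(total) if total > 0 else float("inf")
        # The state is constant on [t, t + tau): record every observation time in it
        while next_obs < t + tau and len(samples) < n_samples:
            samples.append(tuple(x))
            next_obs += dt
        if tau == float("inf"):
            break
        t += tau
        # Pick reaction j with probability rates[j] / total and apply its change vector
        j = rng.choices(range(len(rates)), weights=rates)[0]
        x = [xi + vi for xi, vi in zip(x, change_vectors[j])]
    return samples


# Hypothetical birth-death network (k = 2.0, gamma = 0.1); equilibrium mean k / gamma = 20
samples = ssa_equilibrium_samples(
    x0=[0], change_vectors=[(1,), (-1,)],
    propensities=[lambda x: 2.0, lambda x: 0.1 * x[0]],
    t_warmup=100.0, dt=5.0, n_samples=2000)
print(sum(s[0] for s in samples) / len(samples))  # should be near 20
```

Consecutive equidistant observations are correlated when `dt` is small compared with the relaxation time of the system (here $1/\gamma = 10$), which is worth keeping in mind when choosing `dt`.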

We first consider Model IV proposed in [9], which describes the interactions of the tumor suppressor p53. This system describes a negative feedback loop between p53 and the oncogene Mdm2, where pMdm2 is a Mdm2 precursor [9]. We chose the same parameter values as in [1], that is, $k_1 = 90$, $k_2 = 0.002$, $k_3 = 1.7$, $k_4 = 1.1$,

constant zero are omitted, as well as stoichiometric constants equal to one.

We estimated the four parameters $k_3$, $k_4$, $k_5$, and $k_6$ using the CFGMM as proposed in Sect. 3.1. Note that $\alpha_4(\cdot, \cdot)$ is linear in $k_4$. We fixed $k_1$ and $k_2$ to ensure identification, as well as $k_7$ to avoid a time-consuming numerical optimization. The iterations were continued until either the parameter vector converged or the maximum number of four iterations was reached. The plot in Fig. 1 (left) shows that the best results were obtained already after the second step for moderate and large sample sizes, while for a small sample size of 100 further iterations were beneficial. It is important to note that for the first iteration, $\hat{W}_N$ is chosen as the identity matrix, such that identical weights are assigned and mixed terms are not considered. Hence, assigning appropriate weights gives significantly more accurate results than an estimation with identical weights.

Fig. 1. p53 System: (left) The normalized parameter deviation $\|\hat{\theta}_N - \theta_0\| / \|\theta_0\|$ over GMM iterations for different sample sizes. Moment conditions up to order two were used. (right) Comparison of the average running time for a single estimation, as a function of the number of parameters (maximal moment order three) and of the maximum order of moment conditions used (estimation for four parameters), for a sample size of 100.


Fig. 2. p53 System: (left) Estimate of $k_4$ over GMM iterations (sample size 1000). (right)

order of used moment conditions. Results are presented as box plots (whiskers with a maximum of 1.5 IQR).

In Fig. 1 (right) we compare the running times of the CUGMM (using a numerical optimization scheme, the L-BFGS-B algorithm [36]) and of the iteration-based method. The reported times are the average of 100 runs for a single estimate for different moment orders and different numbers of estimated parameters. As we can see, the iteration-based method for linear propensities not only outperforms CUGMM, but is also essentially insensitive to including higher order moments and to increasing the number of estimated parameters. For CUGMM, an optimization is carried out since (12) is not linear in $\theta$, and this optimization becomes more costly when more moment conditions or parameters are considered. The advantage of CFGMM is that the Jacobian and $\bar{f}_N$ are only computed once for all iterations of a sample and no numerical optimization is needed.

In Fig. 2 we show the distribution of the estimate quality for different maximum moment orders against (left) different numbers of iterations for CFGMM and (right) different sample sizes. The quality of the results is excellent for large sample sizes, while increasing the moment order beyond two does not result in significant improvements or may even (for small sample sizes) significantly decrease the quality (see Fig. 2 (right)). This bias may occur if the degree of overidentification ($q - p$) is increased too much. It can be caused by the estimation of $W$ and the dependence on the previous estimates, and it decreases proportionally to $N^{-1}$ [12,26]. In our evaluation, estimators based on a maximal order of two and three showed the most reliable performance. Moreover, identical weights in the first step of the iteration lead to a very high variance of the corresponding estimator, as shown in Fig. 2 (left). In Fig. 2 (right) we also see that, when the number of samples is increased, the variance of the estimator becomes small.


Model 2 (Explicit Toggle Switch) [20]

Slow Binding Toggle Switch. In the case of low binding/unbinding rates, several attractor regions can arise that directly correspond to a given DNA state. Here, we use the parameters $\rho_X = 3$, $\delta_X = 0.5$, $\beta_X = 10^{-6}$, $\gamma_X = 3 \times 10^{-4}$, which are identical for $X = A$ and $X = B$. During the inference procedure, however, we did not make use of the information that the parameters are symmetric. For these parameters we get three distinct attractor regions, corresponding to either one of the repressors being bound or both repressors being free.²

Currently, our GMM-based approach requires all variables to be observed, which is in general infeasible for the DNA state. One possible solution, when only proteins are observed, is to cluster the samples of the proteins using the k-Means algorithm (cf. Fig. 3 (left) for an example of a clustering of samples of the toggle switch). Then we can infer the latent DNA state by assigning each cluster to a specific combination of DNA states, based on the cluster centroids, as illustrated in Fig. 3 (left). For low binding/unbinding rates, the attractors are well separated and this approach is feasible, though more sophisticated approaches may be required when clusters overlap. After reconstruction of the state of the unobserved variables, we used the GMM estimation with the closed form solution for linear propensities. Results comparing different sample sizes are shown in Fig. 3 (right). The estimation quality is very good even in the case of only few samples, provided enough iterations are carried out. It is important to note that for these results, we excluded moment conditions corresponding to mixed moments involving the state of the gene, as their moment conditions have very similar values. Including them leads to severe numerical instabilities (the matrix of the linear system for linear propensities becomes quasi-singular). However, ill-conditioned matrices are detected automatically when their determinant is calculated during the computation. Then, those entries responsible for the numerical instabilities can be excluded.
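A minimal numpy sketch of the clustering step for two observed proteins. The farthest-point initialisation, the synthetic data around three hypothetical attractors and the function name are our assumptions; mapping each cluster to a combination of DNA states is still done by inspecting its centroid.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm with farthest-point initialisation; returns (labels, centroids)."""
    X = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from a random sample, then repeatedly add the sample farthest from all chosen
    # centroids; for well-separated attractors this picks one seed per attractor.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min(np.linalg.norm(X[:, None, :] - np.array(centroids)[None], axis=2), axis=1)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid, then move centroids to cluster means
        labels = np.linalg.norm(X[:, None, :] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical protein counts (A, B) around three attractors
rng = np.random.default_rng(4)
centers = np.array([[60.0, 5.0], [5.0, 60.0], [10.0, 10.0]])
points = np.vstack([rng.normal(c, 2.0, size=(50, 2)) for c in centers])
labels, centroids = kmeans(points, 3)
print(np.round(centroids, 1))  # one centroid near each attractor
```

The cluster labels then play the role of the unobserved DNA state in the moment conditions; when attractors overlap, hard assignments like these become unreliable.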

Fast Binding Toggle Switch. Often, it can be assumed that the repressor ($R_{A,B}$) binding and unbinding ($R_{5,6}$ and $R_{7,8}$) happens a lot faster than the protein production. Then, a Michaelis-Menten approximation is possible [20]. Therein the time derivative of the repressors is assumed to be zero. Applying this assumption to the mean-field equations of Model 2 yields the implicit toggle switch (Model 3). In this case, we no longer need the repressor state of each sample.

Fig. 3. Slow Switching Toggle Switch: (left) Clustering of a sample (size 100) using

iterations for different sample sizes given the toggle switch with k-Means clustering. Moment conditions up to order 3 were used and 4 parameters were estimated.

which can be neglected if there are no such samples

Model 3 (Implicit Toggle Switch).

The toggle switch exhibits bistability if the binding happens significantly faster than the unbinding, i.e., $k_A, k_B \gg 1$ [20]. However, the estimation of $k_A$ and $k_B$ is inherently difficult because switching between the attractors is a rare event.

Fig. 4. Fast Binding Toggle Switch: Estimates of parameter $k_B$ in relation to the sizes of the sample sets and the maximum order of used moment conditions. Only the parameter

1.5 IQR).

In this case study, we simulated the explicit model using the symmetric constants $\beta_X = 100.0$, $\gamma_X = 50.0$, $\rho_X = 0.2$ and $\delta_X = 0.005$, assuming we could observe only the two proteins. Thus, we estimated the parameters $k_X$ and $\delta_X$ of the implicit model and fixed $\rho_X$ to ensure identification. Due to the non-linear dependency of the production rates on $k_X$, we can no longer rely on the method for linear propensities of Sect. 3.1; hence we resort to a numerical minimization routine, namely the L-BFGS-B algorithm [36], for the CUGMM scheme. The initial guess was chosen at random from $[0, 1]^p$. For detection of unsuccessful optimizations we used the J-test statistic [14], which states that under the null hypothesis of a correctly specified model, $N Q_N(\hat{\theta}_N)$ converges to the $\chi^2_{q-p}$ distribution. A confidence threshold of 90 % was fixed and the optimization was repeated at most four times until the threshold was met. The use of numerical optimization increased the cost of a single estimate: for a sample size of 10,000 observations and order 2, the computation takes 1–2 minutes.
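A minimal plain-Python sketch of the J-test check used to flag unsuccessful optimizations; the chi-square tail probability is computed from closed forms so that no statistics library is needed. The function names are illustrative, and the $\chi^2_{q-p}$ limit assumes $Q_N$ was evaluated with an efficient weighting such as (11) or the continuously updated one in (12).

```python
import math


def chi2_sf(x, df):
    """P(chi^2_df > x) for an integer df >= 1, using the closed forms of the
    regularized upper incomplete gamma function at integer and half-integer shapes."""
    z = x / 2.0
    if df % 2 == 0:
        return math.exp(-z) * sum(z**i / math.factorial(i) for i in range(df // 2))
    n = (df - 1) // 2
    return math.erfc(math.sqrt(z)) + math.exp(-z) * sum(
        z ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, n + 1))


def j_test(n_samples, q_value, q, p, level=0.90):
    """Hansen's J-test: under a correctly specified model, N * Q_N(theta_hat) is
    asymptotically chi^2 with q - p degrees of freedom (over-identified case, q > p).
    Returns (statistic, p_value, accepted)."""
    stat = n_samples * q_value
    p_value = chi2_sf(stat, q - p)
    return stat, p_value, p_value > 1.0 - level


# e.g. N = 10,000, minimized objective 1e-4, q = 3 conditions, p = 1 parameter
print(j_test(10_000, 1e-4, 3, 1))  # statistic about 1.0, p-value about 0.61, accepted
```

A rejected run (p-value below $1 - 0.90$) would then trigger a restart of the optimization from a new random initial guess.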

In Fig. 4, we give statistics on the quality of estimates based on 100 runs of independently generated datasets. More specifically, we show how the quality of estimates varies with the maximum order of moments considered in the method and with sample size. For a fixed sample size, increasing the order from 1 to 2 considerably improves the quality of results. Use of higher order moments significantly reduces the variance of the estimator, in particular for the case of few samples.

In the context of stochastic chemical kinetics, parameter inference methods are either based on Bayesian inference [5,32,34] or maximum likelihood estimation [2,3,29,31]. The advantage of the latter method is that the corresponding estimators are, in a sense, the most informative estimates of unknown parameters and have desirable mathematical properties such as (asymptotic) unbiasedness, efficiency, and normality. On the other hand, the computational complexity of maximum likelihood estimation is high. If an analytic solution of the MLE is not possible, then, as a part of the non-linear optimization problem, the likelihood and its derivatives have to be calculated. Monte Carlo simulation has been used to estimate the likelihood [31]. During the repeated random sampling it is difficult to explore those parts of the state space that are unlikely under the current rate parameters. Thus, especially if the rates are very different from the true parameters, many simulation runs are necessary to calculate an accurate approximation of the likelihood.

Therefore, methods using computationally far more attractive moment expansion approximations have been proposed. Kügler [18] uses results of moment closure approximations to apply an ad-hoc weighted least squares estimator. Milner et al. [24] construct a multivariate normal distribution based on low order moments obtained from a moment closure approximation in order to apply MLE. Another approach based on moment closure and MLE relies on a normal distribution based on sample means and variances [35].

All of the aforementioned moment-based inference methods are, in contrast to the scenario discussed in this paper, based on samples of the transient distribution before equilibrium is reached. Therefore, they have to rely on moment closure approximations, which are not necessary in our approach based on the equilibrium distribution. Recently, the performance of GMM estimators has been studied for transient (non-equilibrium) data [21], together with a (hybrid) moment closure approach.

Parameter inference methods for stochastic models of reaction networks require huge computational resources. The proposed approach, based on the generalized method of moments, matches the statistical moments of the model in equilibrium and therefore does not require the computation of likelihoods. This makes the approach appealing for complex networks where stochastic effects play an important role, since no statistical sampling or numerical integration of master or moment equations is necessary. The proposed approach gives accurate results in seconds when the propensities are linear in the parameters, because a closed form of the solution is available. For non-linear parameters, a global optimization problem must be solved and therefore the inference takes longer, but it is still fast compared to other approaches based on the numerical computation of likelihoods.

Our results show that the GMM estimator yields accurate results, and that its variance decreases when moments of higher order are considered. We found that when moments of order higher than three are included, the results become slightly worse in the case of the p53 system, while for the toggle switch the quality improved (the variance decreased). A general strategy could be to start with as many cost functions as unknown parameters and increase the maximal order until appropriate statistical tests suggest that higher orders do not lead to an improvement.

Currently, a major drawback of the method is that all species must be observed in order to apply it. For populations of at most one individual, the proposed clustering approach circumvents the problem that such species can usually not be observed. In general, however, clustering may not always be possible, and there may be other species that cannot be observed. To deal with such cases, we plan to develop an extension of the method that treats the moments of such species as (additional) unknown parameters. Moreover, we will investigate how measurement errors could be taken into account within the GMM framework.


References

1 Ale, A., Kirk, P., Stumpf, M.: A general moment expansion method for stochastic

2 Andreychenko, A., Mikeev, L., Spieler, D., Wolf, V.: Parameter identiﬁcation formarkov models of biochemical reactions In: Gopalakrishnan, G., Qadeer, S (eds.)CAV 2011 LNCS, vol 6806, pp 83–98 Springer, Heidelberg (2011)

3 Andreychenko, A., Mikeev, L., Spieler, D., Wolf, V.: Approximate maximum likelihood estimation for stochastic chemical kinetics. EURASIP J. Bioinf. Syst. Biol. 9, 1–14 (2012)

4 Andreychenko, A., Mikeev, L., Wolf, V.: Model reconstruction for moment-based stochastic chemical kinetics. ACM Trans. Model. Comput. Simul. (TOMACS) 25(2), 12 (2015)

5 Boys, R., Wilkinson, D., Kirkwood, T.: Bayesian inference for a discretely observed

6 Engblom, S.: Computing the moments of high dimensional solutions of the master

7 Fournier, T., Gabriel, J.-P., Mazza, C., Pasquier, J., Galbete, J.L., Mermod, N.:

(2007)

8 Gardner, T.S., Cantor, C.R., Collins, J.J.: Construction of a genetic toggle switch

9 Geva-Zatorsky, N., Rosenfeld, N., et al.: Oscillations and variability in the p53 system. Mol. Syst. Biol. 2(1) (2006)

10 Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys.

13 Hanley, M.B., Lomas, W., Mittar, D., Maino, V., Park, E.: Detection of low

(2013)

14 Hansen, L.P.: Large sample properties of generalized method of moments

15 Hansen, L.P., Heaton, J., Yaron, A.: Finite-sample properties of some alternative

Identification of models of heterogeneous cell populations from population snapshot

17 Isaacs, F.J., Hasty, J., Cantor, C.R., Collins, J.J.: Prediction and measurement of

19 Lee, Y.J., Holzapfel, K.L., Zhu, J., Jameson, S.C., Hogquist, K.A.: Steady-state production of IL-4 modulates immunity in mouse strains and is determined by

20 Lipshtat, A., Loinger, A., Balaban, N.Q., Biham, O.: Genetic toggle switch without


stochastic reaction networks. ArXiv e-prints, May 2016

22 Mateescu, M., Wolf, V., Didier, F., Henzinger, T.A.: Fast adaptive uniformisation

University Press, New York (1999)

24 Milner, P., Gillespie, C.S., Wilkinson, D.J.: Moment closure based parameter

25 Munsky, B., Fox, Z., Neuert, G.: Integrating single-molecule experiments and discrete stochastic models to understand heterogeneous gene transcription dynamics

26 Newey, W.K., Smith, R.J.: Higher order properties of GMM and generalized

27 Nishihara, M., Ogura, H., Ueda, N., Tsuruoka, M., Kitabayashi, C., et al.: gp130-STAT3 in T cells directs the development of IL-17+ Th with a minimum

28 Pearson, K.: Contributions to the mathematical theory of evolution. Philos. Trans.

29 Reinker, S., Altman, R.M., Timmer, J.: Parameter estimation in stochastic

30 Singh, A., Hespanha, J.P.: Lognormal moment closures for biochemical reactions. In: 2006 45th IEEE Conference on Decision and Control, pp. 2063–2068. IEEE (2006)

31 Tian, T., Xu, S., Gao, J., Burrage, K.: Simulated maximum likelihood method for

32 Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.: Approximate Bayesian computation scheme for parameter inference and model selection in dynamical

33 Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.P.H.: Approximate Bayesian computation scheme for parameter inference and model selection in

34 Wilkinson, D.J.: Stochastic Modelling for Systems Biology. C & H, Sesser (2006)

35 Zechner, C., Ruess, J., Krenn, P., Pelet, S., Peter, M., Lygeros, J., Koeppl, H.: Moment-based inference predicts bimodality in transient gene expression. PNAS 109(21), 8340–8345 (2012)

36 Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.: Algorithm 778: L-BFGS-B: Fortran routines for large-scale bound-constrained optimization. ACM Trans. Math. Softw.
