on general type system perturbations
Henning Schmidt1, Kwang-Hyun Cho2,3 and Elling W Jacobsen1
1 Signals, Sensors and Systems, Royal Institute of Technology – KTH, Stockholm, Sweden
2 College of Medicine, Seoul National University, Chongno-gu, Seoul, Korea
3 Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Korea
New high-throughput experimental technologies, e.g. for monitoring the expression levels of large gene sets and the concentrations of metabolites, are evolving rapidly. These data sets contain the information required to uncover the organization of biological systems on a genetic, proteomic, and metabolic level. However, in order to translate data into a system level understanding of cell functions, methods that can construct quantitative mathematical models from data are needed. In particular, determination of the quantitative interactions between the components within and across these levels is an important issue. These interactions lead to the notion of networks that can be represented by weighted,
Keywords
biochemical networks; identification;
Jacobian; time-series measurements
Correspondence
E W Jacobsen, Department of Automatic
Control, Royal Institute of Technology –
KTH, Osquldasvag 10, S-10044 Stockholm,
Sweden
Fax: +46 8790 7329
Tel: +46 8790 7325
E-mail: jacobsen@s3.kth.se
K.-H Cho, College of Medicine, Seoul
National University, Chongno-gu, Seoul,
110–799, Korea, and Korea Bio-MAX
Institute, Seoul National University,
Gwanak-gu, Seoul, 151–818, Korea
Fax: +82 2887 2692
Tel: +82 2887 2650
E-mail: ckh-sb@snu.ac.kr
(Received 22 December 2004, accepted
8 February 2005)
doi:10.1111/j.1742-4658.2005.04605.x
New technologies enable acquisition of large data-sets containing genomic, proteomic and metabolic information that describes the state of a cell. These data-sets call for systematic methods enabling relevant information about the inner workings of the cell to be extracted. One important issue at hand is the understanding of the functional interactions between genes, proteins and metabolites. We here present a method for identifying the dynamic interactions between biochemical components within the cell, in the vicinity of a steady-state. Key features of the proposed method are that it can deal with data obtained under perturbations of any system parameter, not only concentrations of specific components, and that the direct effect of the perturbations does not need to be known. This is important as concentration perturbations are often difficult to perform in biochemical systems and the specific effects of general type perturbations are usually highly uncertain, or unknown. The basis of the method is a linear least-squares estimation, using time-series measurements of concentrations and expression profiles, in which system states and parameter perturbations are estimated simultaneously. An important side-effect of also estimating the parameter perturbations is that knowledge of the system's steady-state concentrations, or activities, is not required and that deviations from steady-state prior to the perturbation can be dealt with. Time derivatives are computed using a zero-order hold discretization, shown to yield significant improvements over the widely used Euler approximation. We also show how network interactions with dynamics that are too fast to be captured within the available sampling time can be determined and excluded from the network identification. Known and unknown moiety conservation relationships can be processed in the same manner. The method requires that the number of samples equals at least the number of network components and, hence, is at present restricted to relatively small-scale networks. We demonstrate herein the performance of the method on two small-scale in silico genetic networks.
directed graphs, where the nodes correspond to the biochemical components and the edges, represented as arrows with weights attached, indicate the direct quantitative effect that a change in a certain component has on another component. The weights are in general nonlinear functions that represent reaction kinetics. Determination of these network structures will provide insight into the functional relationships between the involved components, so as to better understand the functions of biological systems, and will eventually lead to knowledge concerning how these systems can be manipulated in order to achieve a certain desired behavior.
Because the reaction kinetics are in general unknown, and because of the large number of parameters involved, it is in most cases infeasible to determine the nonlinear weights directly from experimental data. Herein, however, a distinction has to be made between gene and metabolic networks. For metabolic networks, a good initial guess of the network structure is usually available from databases, such as KEGG [1], while the structures of gene networks are usually largely unknown in advance. Therefore, the approach presented in this paper probably has its greatest value for gene networks, but can be applied equally well to signaling and metabolic networks where, e.g., model validation and the determination of new, previously unknown, connections between intermediates are needed.
A common approach in structural identification is to consider the biochemical network behavior around some steady-state and assume that it behaves linearly for small deviations from this steady-state [2–4]. With this assumption, the network weights become constants, quantifying the interactions between the components in the neighborhood of the steady-state. Grouping these constant weights into a matrix yields an interaction matrix, the Jacobian, which quantifies the mutual effects of deviations from the steady-state on the various components of the system.
Several approaches to the determination of interaction matrices of biochemical systems have been published recently. These can be divided roughly into methods focusing on the determination of the qualitative structure of the interactions and those aimed at determining quantitative information about the interactions. Ross [5] reviews two approaches to determine the structure of reaction pathways from time-series measurements of metabolites and proteins. The first approach is based on small pulses of concentration changes applied to the different species around a stable steady-state. Depending on the relative behavior of the measured responses, the considered metabolic pathway can be determined [6,7]. The second approach is based on correlations between different species when periodically forcing the system by changing some input species over time. Using correlation and multidimensional scaling analysis, the structure of the considered pathway can be unravelled [8].
Kholodenko et al., Gardner et al. and Vance et al. propose methods for determining quantitative interaction matrices based on steady-state responses of perturbed genetic networks [2,3,9]. As the responses to the applied perturbations can often become relatively large in steady-state, these methods are potentially limited depending on the nonlinearity of the considered systems. Furthermore, the fact that Kholodenko et al. and Vance et al. determine the $n^2$ elements of the interaction matrix from $n^2$ measurements suggests that the results are potentially sensitive to measurement uncertainty [2,9].
In contrast to methods based on steady-state measurements, methods based on time-series measurements can cope better with the issue of nonlinearity and measurement uncertainty. Monitoring time-series also enables significantly more information to be extracted in each experiment.
A widespread method in the identification of reaction networks using time-series measurements is a least-squares estimation of the Jacobian. An interesting method is presented by Mihaliuk et al., in which the idea is to apply perturbations to all components in the network and to determine the Jacobian by measuring only one component, or a linear combination of components [10]. A drawback of this method, for the application to biological systems, is the fact that the perturbations are assumed to be instant changes of the concentrations of intermediates in the network. Furthermore, the magnitude of these perturbations is assumed to be known. The use of concentration shift experiments, that is, adding specific components to the system as pulses or steps, is a typical assumption in many previously proposed methods. However, while such perturbations are mainly feasible in chemical systems, they are usually hard to realize in vitro or in vivo [11].
To overcome the restriction to concentration shift experiments, Sontag et al. derive a method based on parameter perturbations, in which a separate experiment is performed for each network component so that the perturbation has no direct effect on this component, that is, the designed perturbation only works indirectly through other components in the network [4]. However, this requires substantial a priori structural knowledge, and furthermore causes problems with rank deficient measurement matrices. (The latter is discussed in more detail in the supplementary material.)
Herein, we study biochemical networks involving genes, proteins, and/or metabolites, and consider determination of the Jacobian using least-squares estimation from time-series measurements obtained in the vicinity of some steady-state. The method is able to deal with very general types of system perturbations. In this paper we assume the use of constant parameter perturbations, such as gene knockouts and inhibitor additions. For these types of perturbations, the exact size as well as the direct effect of the perturbations will in general be largely unknown. We therefore also consider incorporating a determination of the perturbation itself from the available data. However, the method can be applied equally when pulse perturbations are realizable for a given network, and in the case of known or unknown time-varying parameter perturbations. Furthermore, it is possible to combine pulse and parameter perturbations. (The use of the method for other types of perturbations is discussed in the supplementary material.)
Furthermore, we show that the effect of unsteady-state initial conditions can be considered an unknown perturbation and hence can be estimated in the same manner. Due to the latter feature, the proposed method does not, in contrast to most other methods, require the system to be in a steady-state when the perturbations are applied, nor does it require knowledge of the steady-state activities and concentrations. However, as the method is based on the assumption that the network is behaving linearly around the same steady-state for all experiments, the initial states of all experiments should, in general, not be too far from the steady-state at which the Jacobian is to be determined.
Network modelling based on time-series data requires estimation of time derivatives of the states. These are commonly calculated through the use of some Euler type finite difference approximation. Herein we employ a representation of time derivatives, commonly used in systems theory, that avoids any approximations, thereby leading to significantly improved estimation results. Finally, we address the issue of dynamics that are significantly faster than the sampling time, and show how such interactions can be identified and extracted from the data-sets prior to the network identification. Thus, a reduced network, with the fast dynamics replaced by algebraic relationships, can be identified. As we show, the same approach is also applicable in the case of moiety conservations in metabolic and signaling networks, and thus it is possible to determine the Jacobian expressed only in terms of the independent intermediates of the network.
The proposed method will in general uncover only a phenomenological interaction topology of the network, as not all intermediates can be measured. That is, we assume that only the measured components are part of the network to be modelled. This is a common assumption [4]. This assumption is relaxed somewhat by Mihaliuk et al. [10]. However, they assume that all components are known and are possible to perturb. The case of unknown and unmeasurable components is, of course, a highly relevant topic, but outside the scope of this paper.
The outline of the paper is as follows. We first present the problem formulation and briefly outline the method used for network identification from measurement samples. The proposed method is then applied to in silico models of two small scale gene networks. Following the conclusions, we present a detailed description of the method for least-squares identification of the Jacobian, and discuss the impact of the sampling time and moiety conservations.
Results and Discussion
Problem formulation
We consider metabolic reactions, signaling networks, and gene networks that can be described by a system of nonlinear differential equations of the form in Eqn (1)

$$\dot{x} = f(x, p) \qquad (1)$$

where $x = [x_1, \ldots, x_n]^T$ is the state vector containing the concentrations, activities, or expressions, of all components in the network and $p = [p_1, \ldots, p_q]^T$ is a vector of adjustable parameters within the considered biological system, such as kinetic rate constants and genes whose expression levels can be perturbed. The vector valued function $f$ determines the dynamics of the biochemical network given the states and parameters. The definition in Eqn (1) also incorporates the typical form of kinetic models, that is, $\dot{s} = N V(s, p)$ [12]. In cases of small molecular concentrations and/or low levels of diffusion, partial differential and stochastic equations may be required, but this is outside the scope of this paper.
Due to largely unknown reaction kinetics, and the large number of involved parameters, it is in general infeasible to determine the nonlinear functions $f_i(x, p)$ using a 'top-down' approach, that is, determining all reaction mechanisms and involved parameters, such as rate constants, from measured responses of the perturbed network. We therefore consider the system (Eqn 1) in the neighborhood of some steady-state $(x_0, p_0)$ and assume that it behaves linearly for small variations around this state. This assumption allows us to represent the system as a linear time invariant system (Eqn 2)

$$\Delta\dot{x}(t) = \underbrace{\left.\frac{\partial f}{\partial x}\right|_{x_0,p_0}}_{A} \Delta x(t) + \underbrace{\left.\frac{\partial f}{\partial p}\right|_{x_0,p_0}}_{B} \Delta p(t) \qquad (2)$$

where $\Delta x(t) = x(t) - x_0$ and $\Delta p(t) = p(t) - p_0$ denote deviations from the considered steady-state. Equation (2) is obtained by truncating the Taylor expansion of Eqn (1) after the linear terms. The constant matrix $A$ is the Jacobian matrix of the nonlinear system and represents the network connectivity and the interactions between the network components around the considered steady-state. For example, in the case of gene networks, a zero element $A_{ij}$ indicates that the expression level of gene $j$ does not directly affect the expression of gene $i$. Positive and negative elements within $A$ imply activation and inhibition, respectively, of the corresponding components.
The aim here is to determine the Jacobian, or interaction matrix $A$, based on time-series measurements. We assume that the measurements are collected using a fixed sampling time $\Delta T$, and that at each sample the concentrations, or activity levels, of all $n$ components in $x$ are measured. Furthermore we assume that the perturbations are constant between two sampling instants. Due to the discrete nature of the measurements, we reformulate the continuous time system (Eqn 2) as a discrete time system (Eqn 3)

$$\Delta x_{k+1} = A_d \Delta x_k + B_d \Delta p_k, \qquad (3)$$

where $\Delta x_k = \Delta x(k \Delta T)$ and $\Delta p_k = \Delta p(k \Delta T)$. Using Eqn (3) we will, in the following, show how an estimate $\hat{A}_d$ of the discrete time Jacobian $A_d$ can be determined. An estimate $\hat{A}$ of the continuous time Jacobian $A$ can then be calculated through a reverse transformation to continuous time using the Euler approximation or the so-called zero-order hold discretization.
The commonly used Euler approximation for the time derivatives of the states implies replacing the continuous derivatives by the finite difference $\Delta\dot{x}(t) = (\Delta x_{k+1} - \Delta x_k) / \Delta T$. The reverse transformation from discrete time $A_d$ then yields the following approximation for the continuous time Jacobian (Eqn 4)

$$A_{\mathrm{euler}} = \frac{1}{\Delta T} \left( A_d - I \right) \qquad (4)$$

The Euler discretization method is approximate, and the goodness of the approximation is in general highly sensitive to the choice of the sampling time $\Delta T$. This 'approximate' relationship between the continuous and discrete time models can be avoided completely under the assumption that the perturbations $\Delta p$ are constant between sampling instants. Then, an analytical solution for $\Delta x(t)$ can be derived and hence also the exact relationship between $\Delta x_{k+1}$ and $\Delta x_k$. This leads to the zero-order hold discretization [13] (Eqn 5)

$$A_{\mathrm{zoh}} = \frac{1}{\Delta T} \operatorname{logm}(A_d) \qquad (5)$$

where $\operatorname{logm}(A_d)$ denotes the matrix logarithm. Note that there are no approximations involved in this transformation provided the parameter perturbations are constant between samples. (A more detailed discussion of the zero-order hold discretization and a comparison to the Euler discretization can be found in part 1 of the supplementary material.)

Having determined an estimate $\hat{A}_d$ of $A_d$, an estimate of the continuous time Jacobian $A$ can be obtained using the above transformations. We will demonstrate that Eqn (5), in general, leads to a significantly better estimate of the Jacobian than Eqn (4).
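The difference between the two reverse transformations can be illustrated with a minimal sketch, assuming SciPy's `expm` and `logm` are available. The 2×2 matrix `A`, the sampling time and the helper name `jacobian_from_discrete` are hypothetical choices for illustration, not values from this paper.

```python
import numpy as np
from scipy.linalg import expm, logm

def jacobian_from_discrete(A_d, dT, method="zoh"):
    """Map a discrete time matrix A_d back to a continuous time Jacobian."""
    n = A_d.shape[0]
    if method == "euler":
        return (A_d - np.eye(n)) / dT      # Eqn (4)
    if method == "zoh":
        return np.real(logm(A_d)) / dT     # Eqn (5)
    raise ValueError("method must be 'euler' or 'zoh'")

# Hypothetical stable Jacobian and sampling time (assumptions).
A = np.array([[-6.0, 2.0],
              [1.0, -9.0]])
dT = 0.05

# With perturbations held constant between samples, A_d = exp(A * dT) exactly.
A_d = expm(A * dT)

err_euler = np.max(np.abs(jacobian_from_discrete(A_d, dT, "euler") - A))
err_zoh = np.max(np.abs(jacobian_from_discrete(A_d, dT, "zoh") - A))
print("max abs error, Euler:", err_euler)
print("max abs error, ZOH:  ", err_zoh)
```

Here `A_d` is exact, so any remaining error in the Euler result comes from the transformation itself rather than from the estimation of `A_d`.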
In silico four gene network example

We consider a genetic network containing four genes, which has been used previously [2,4] as a test case for identification of interaction matrices and Jacobians. The motivation behind choosing such a small scale network for illustration of the method is to keep the exposition complete and reasonably compact. (The model equations and parameters are given in part 6 of the supplementary material.) The nominal Jacobian at the considered steady-state is given by Eqn (6)

$$A = \begin{bmatrix} \cdots & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ 2.31 & 2.80 & 14.46 & 0 \\ \cdots & \cdots & \cdots & \cdots \end{bmatrix} \qquad (6)$$

and the corresponding network is illustrated in Fig. 1. From system identification theory it is well known that a good estimation result requires a sufficient excitation of the system. (In particular, part 3 of the supplementary material shows that perturbations have to be chosen such that the complete space of the network states is perturbed.) In the following we consider time-series data obtained from constant parameter perturbation experiments. The perturbed parameters correspond to the maximal enzyme rates involved in the transcription of the genes (part 6 of the supplementary material). Furthermore, the magnitudes, as well as the direct effects of the perturbations, are assumed to be unknown and thus not used in the identification algorithm. The estimates $\hat{A}_d$ are obtained using absolute measurements and applying Eqn (15). Unless otherwise stated, the zero-order hold discretization is used in the following.
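Eqn (15) is not reproduced in this part of the paper, so the following is only a sketch of one regression that is consistent with the description above, not necessarily the exact estimator. It assumes a constant unknown perturbation, so that Eqn (3) written in absolute measurements becomes $x_{k+1} = A_d x_k + c$ with the unknown constant $c = (I - A_d) x_0 + B_d \Delta p$, and $A_d$ and $c$ are estimated together. The function name and the 3-component test system are hypothetical.

```python
import numpy as np

def estimate_discrete_jacobian(X):
    """Least-squares estimate of (A_d, c) in x[k+1] = A_d x[k] + c.

    X is an (n, m) array of absolute measurements, one column per sample.
    The offset c absorbs the unknown steady-state and the unknown constant
    perturbation (an assumption of this sketch).
    """
    n, m = X.shape
    if m < n + 2:
        raise ValueError("need at least n + 2 samples")
    M = np.vstack([X[:, :-1], np.ones((1, m - 1))])   # regressors [x_k; 1]
    Y = X[:, 1:]                                       # targets x_{k+1}
    # Solve Y = Theta M in the least-squares sense (transposed for lstsq).
    theta_T, *_ = np.linalg.lstsq(M.T, Y.T, rcond=None)
    theta = theta_T.T
    return theta[:, :n], theta[:, n]

# Hypothetical 3-component discrete time system (assumption).
A_d_true = np.array([[0.9, 0.05, 0.0],
                     [0.0, 0.8, 0.1],
                     [0.1, 0.0, 0.7]])
c_true = np.array([0.1, 0.2, 0.3])

samples = [np.array([1.0, 2.0, 0.5])]
for _ in range(4):                       # n + 2 = 5 samples in total
    samples.append(A_d_true @ samples[-1] + c_true)
X = np.column_stack(samples)

A_d_hat, c_hat = estimate_discrete_jacobian(X)
print(np.round(A_d_hat, 6))
```

With $n$ components this regression has $n + 1$ unknowns per row, which is why $n + 2$ samples (giving $n + 1$ transitions) is the minimum in this formulation.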
Estimation of the Jacobian
We first performed an in silico experiment in which the maximal enzyme rate corresponding to the transcription of gene number one is perturbed by 1%. The sampling time is chosen as $\Delta T = 0.01$ h, and we collect six samples, the minimal number required for estimating the Jacobian when the size of the perturbation is unknown. The first sample is taken one time-step after the perturbation has been applied to the system. It should be noted that the sampling time is chosen sufficiently small to enable the fastest dynamics of the system to be captured.

Applying the method proposed above, we obtain the following estimate for the Jacobian:

$$\hat{A} = \begin{bmatrix} 6.45 & 2.90 & 0.01 & 2.52 \\ 0.00 & 8.17 & 0.00 & 3.93 \\ 2.31 & 2.77 & 14.40 & 0.01 \\ 0.00 & 0.09 & 10.22 & 9.77 \end{bmatrix}$$
Except for the (4,2) element, the estimated Jacobian is very close to the nominal Jacobian, the largest relative error in the nonzero elements being less than 1%. This is not surprising, as the perturbation to the system was chosen so small that the nonlinearity of the system played a relatively modest role. The fact that the (4,2) element is relatively poorly estimated is probably explained by more severe nonlinear effects for this specific relationship with the chosen parameter perturbation. Note that the nonlinear effects in general will depend on the parameter chosen for perturbation.
In real experiments, inhibition efficiencies by small interfering RNA (siRNA) or chemical inhibitors are much higher than in the experiment above. A more realistic experimental setting is to assume perturbations of 50%, and to do several experiments, in which different parameters are perturbed. We performed four experiments and combined the obtained measurements. In each experiment, one of the maximal enzyme rates of the four genes is perturbed by 50%, and the minimum required number of samples is taken, that is, three samples in each experiment. The sampling time is $\Delta T = 0.01$ h. The result of the estimation is given by:

$$\hat{A} = \begin{bmatrix} 6.45 & 2.69 & 0.01 & 2.59 \\ 0.00 & 8.14 & 0.02 & 4.06 \\ 2.19 & 2.90 & 14.38 & 0.01 \\ 0.07 & 0.08 & 8.51 & 9.72 \end{bmatrix}$$

Comparing with the 'true' linear Jacobian in Eqn (6), we see that the network has been identified with reasonable accuracy, the largest relative error in the nonzero elements being less than 20%. That we obtain such good results even for relatively large perturbations is partly explained by the fact that measurements from different experiments have been combined. This allows for different perturbations of the system and a reduction in the number of samples required in each experiment. The use of different perturbed parameters in each experiment leads to a better excitation of the system, which is beneficial for the estimation result. The reduction in the time span of each experiment reduces the deviation from the initial state, thereby reducing the effects of nonlinearities.

The above results demonstrate that it is theoretically possible to determine the Jacobian from one experiment only, but that in practice more than one experiment will usually be preferable. How to choose the perturbations in an optimal way is outside the scope of this paper and a topic for future work. Instead we will, in the experiments below, consistently perform four experiments. (In each experiment the transcription rate of a different gene is perturbed using the parameters given in part 6 of the supplementary material.)
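Combining experiments can be sketched by giving each experiment its own unknown constant offset while sharing a single $A_d$. This is an assumed formulation for illustration, with hypothetical names and a hypothetical 2-component system; the paper's own estimator is described in its Method section. With $n$ components and $E$ experiments there are $n + E$ unknowns per row, so $n = 4$ and $E = 4$ require eight transitions, which matches three samples per experiment.

```python
import numpy as np

def simulate(A_d, c, x_start, n_samples):
    """Generate samples of x[k+1] = A_d x[k] + c as columns of an array."""
    cols = [np.asarray(x_start, dtype=float)]
    for _ in range(n_samples - 1):
        cols.append(A_d @ cols[-1] + c)
    return np.column_stack(cols)

def estimate_from_experiments(experiments):
    """Shared A_d, one unknown constant offset per experiment."""
    n = experiments[0].shape[0]
    E = len(experiments)
    regressors, targets = [], []
    for j, X in enumerate(experiments):
        indicator = np.zeros((E, X.shape[1] - 1))
        indicator[j, :] = 1.0            # selects the offset of experiment j
        regressors.append(np.vstack([X[:, :-1], indicator]))
        targets.append(X[:, 1:])
    M = np.hstack(regressors)
    Y = np.hstack(targets)
    theta_T, *_ = np.linalg.lstsq(M.T, Y.T, rcond=None)
    theta = theta_T.T
    return theta[:, :n], theta[:, n:]    # A_d estimate, offsets as columns

# Hypothetical system and two different constant perturbations (assumptions).
A_d_true = np.array([[0.9, 0.1],
                     [-0.2, 0.7]])
offsets_true = [np.array([0.3, 0.0]), np.array([0.0, 0.4])]
experiments = [simulate(A_d_true, c, [1.0, 1.0], 3) for c in offsets_true]

A_d_hat, offsets_hat = estimate_from_experiments(experiments)
print(np.round(A_d_hat, 6))
```

Each experiment contributes only two transitions here, yet the pooled regression is well posed because the two perturbations excite the system in different directions.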
Fig. 1. Structure of the four-gene network. The interconnections represent the direct interactions between the genes. An arrow indicates a positive effect on the gene transcription, and a bar indicates a negative (inhibitory) effect.

Effect of discretization method

In order to illustrate the importance of the method employed for determination of time derivatives, we herein perform estimations for different sampling times $\Delta T$ and perturbation magnitudes, using two discretization methods (Eqns 4 and 5). The relative estimation error, $e$, is calculated as (Eqn 7)

$$e = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|, \qquad a_{ij} = \begin{cases} \dfrac{\hat{A}_{ij} - A_{ij}}{A_{ij}}, & A_{ij} \neq 0 \\ 0, & A_{ij} = 0 \end{cases} \qquad (7)$$

where $N$ denotes the number of nonzero elements in the nominal Jacobian $A$. The results are shown in Table 1. In the table, we also show the error introduced in the derivatives by using the Euler approximation. The error is determined using the nominal Jacobian $A$ and is computed as

$$g(\Delta T) = \frac{\left\| e^{A \Delta T} - (I + A \Delta T) \right\|_{\mathrm{sum}}}{\left\| e^{A \Delta T} \right\|_{\mathrm{sum}}}.$$

The results clearly demonstrate that the zero-order hold discretization in Eqn (5) leads to a considerable improvement in the network identification, compared to the commonly used Euler approximation.
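The two error measures can be written down directly. In this sketch the norm $\|\cdot\|_{\mathrm{sum}}$ is assumed to be the sum of the absolute values of all matrix entries, which is an interpretation rather than something stated explicitly above; the matrices and function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def relative_estimation_error(A_hat, A):
    """Eqn (7): mean relative error over the nonzero elements of A."""
    mask = A != 0
    rel = (A_hat[mask] - A[mask]) / A[mask]
    return np.abs(rel).sum() / mask.sum()

def euler_approximation_error(A, dT):
    """g(dT), with the 'sum' norm taken as the sum of absolute entries
    (an assumption of this sketch)."""
    exact = expm(A * dT)
    euler = np.eye(A.shape[0]) + A * dT
    return np.abs(exact - euler).sum() / np.abs(exact).sum()

# Hypothetical nominal and estimated Jacobians (assumptions).
A_nominal = np.array([[-2.0, 0.0],
                      [1.0, -4.0]])
A_estimate = np.array([[-2.2, 0.5],
                       [1.0, -3.0]])

e = relative_estimation_error(A_estimate, A_nominal)
g_small = euler_approximation_error(A_nominal, 0.001)
g_large = euler_approximation_error(A_nominal, 0.1)
print(e, g_small, g_large)
```

Note that, as in Eqn (7), the spurious value estimated for the zero element does not contribute to `e`.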
Impact of measurement uncertainty
We consider herein the effect of measurement uncertainty on the estimation of the Jacobian. The uncertainty is simulated in silico by adding noise to the absolute measurements $x_k$ as follows

$$x_k^{\mathrm{noise}} = x_k + W x_0$$

Here, $W$ denotes a diagonal matrix in which the entries are uniformly distributed random variables between $-0.02$ and $0.02$. These values may appear small compared to the uncertainty in realistic biological experiments. However, the noise levels relative to the measured deviations $\Delta x$ correspond to over 50% for some samples. This should also be seen in relation to the fact that measurements of gene expressions are often carried out in a relative manner, corresponding to the measurement of $\Delta x$.
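A minimal sketch of this noise model follows. It assumes that a new diagonal $W$ is drawn independently for every sample, which is not stated explicitly above; the function name, the data and the seed are hypothetical.

```python
import numpy as np

def add_measurement_noise(X, x0, level=0.02, rng=None):
    """Return X + W x0 per sample, with diagonal W drawn uniformly in
    [-level, level]. A fresh W per sample is an assumption of this sketch."""
    rng = np.random.default_rng() if rng is None else rng
    # Each column of W_diag holds the diagonal of W for one sample.
    W_diag = rng.uniform(-level, level, size=X.shape)
    return X + W_diag * np.asarray(x0, dtype=float)[:, None]

# Hypothetical steady-state and measurements (assumptions).
x0 = np.array([1.0, 2.0, 0.5])
X = np.column_stack([x0 * (1.0 + 0.01 * k) for k in range(1, 6)])

X_noisy = add_measurement_noise(X, x0, rng=np.random.default_rng(0))
print(np.abs(X_noisy - X).max(axis=1))
```

Because the noise scales with $x_0$ rather than with $\Delta x$, it can be large relative to the measured deviations even when it is small relative to the absolute measurements.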
The sampling time is $\Delta T = 0.01$ h as before, and the considered magnitudes of parameter perturbations are 20, 50 and 100%. The results for different numbers of measured time-steps per experiment can be seen in Fig. 2. In order to display the mean value and the standard deviation of the relative estimation error in the nonzero elements of the Jacobian, one hundred Monte-Carlo simulations have been conducted at each point. The results show that the relative estimation error (Eqn 7) and its standard deviation decrease for increasing numbers of measured time-steps. It is interesting to note that the estimation error also decreases for increasing perturbation magnitudes. This is explained by the fact that the signal-to-noise ratio improves for larger perturbations, which is reasonable also in practice. This serves to illustrate that, in general, there will exist a trade-off when choosing the size of parameter perturbations, between the effects of measurement uncertainty on the one hand and the effects of nonlinearities on the other.
Table 1. Comparison of estimation errors. We compared the estimation errors obtained for different sampling times, discretization methods, and magnitudes of the parameter perturbation. (Euler), the Euler approximation; (ZOH), a zero-order hold discretization. The last column displays the relative error introduced by using the Euler approximation.

Columns: $\Delta T$; Error (%) for 50% perturbation, e (ZOH) and e (Euler); Error (%) for 10% perturbation, e (ZOH) and e (Euler); Approximation error $g(\Delta T)$.

Fig. 2. Mean value and standard deviation of the relative estimation error (Eqn 7) in the nonzero elements of the Jacobian, obtained from 100 Monte-Carlo simulations. Horizontal axis: measured time-steps per experiment. Curves: 20%, 50% and 100% perturbation. Annotation: 21% error.

Impact of sampling time

In order to illustrate the problems occurring in networks with dynamic modes that are too fast to capture with the available sampling time, we considered identification of a network consisting of five genes. The network is a modification of the four gene network used in the previous example, obtained by adding a fifth gene with relatively fast dynamics. (The equations, parameters, and the nominal Jacobian are given in part 7 of the supplementary material.) The structure of the five gene network is shown in Fig. 3.

In the considered network, the degradation rate of the mRNA of gene five has been chosen to be much faster than the degradation rates for the other mRNAs, thereby introducing a relatively fast dynamic mode. The sampling time we employ is too large to capture this fast dynamic mode.
Data for the estimation of the Jacobian of the system are generated in silico in the following way: (a) five experiments are performed, in each of which a 50% repression of one of the genes is simulated. In the in silico implementation, this corresponds to a parameter perturbation of $-50\%$ in the maximal enzyme rate. We stress that the magnitude of the perturbation is assumed unknown when we apply the identification algorithm; (b) in each experiment the mRNA concentrations, corresponding to all five genes, are measured at four consecutive time-steps. The first sample is taken one time-step after the perturbation is applied to the system; (c) the perturbation is applied while the system is not in the steady-state. In the in silico environment, this is simulated by introducing the perturbation while all mRNA concentrations are 5% below their steady-state values. This reflects the fact that a biological system in general will not be in a steady-state when perturbations are applied in a real experiment. Furthermore, the steady-state is assumed to be unknown and thus not used in the identification; (d) the sampling time is chosen to be $\Delta T = 0.01$ h.
Following the approach discussed above, we collect the measurements and find that the smallest singular value of the measurement matrix $M$ is $\sigma_1 = 0.00026$, which is relatively close to 0 compared to the other singular values. Thus, we conclude that the chosen sampling time was too large with respect to the fastest dynamics of the system.
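The singular value check can be sketched as follows. The sketch assumes that the measurement matrix simply stacks the sampled states column by column (the exact definition of $M$ is given in the Method section), and it uses hypothetical data in which a third component follows the other two algebraically, mimicking a mode that is too fast to be resolved.

```python
import numpy as np

def weakest_direction(M):
    """Singular values of M and the left singular vector that belongs to
    the smallest one."""
    U, s, _ = np.linalg.svd(M)
    return s, U[:, len(s) - 1]

# Hypothetical data (assumption): two slowly varying components and one
# component that is a static function of them.
rng = np.random.default_rng(0)
slow = rng.normal(size=(2, 6))
fast = 0.3 * slow[0] + 0.2 * slow[1]
M = np.vstack([slow, fast])

s, u = weakest_direction(M)
dominant = int(np.argmax(np.abs(u)))
print("singular values:", s)
print("dominant component in weakest direction:", dominant + 1)
```

The largest entry of the singular vector points at the component whose measurements add almost no independent information, which is the component one would consider removing before identifying a reduced network.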
Using the zero-order hold discretization to determine $A$ from $A_d$, ignoring the fact that some modes have not been captured in the data, the following result is obtained:

$$\hat{A} = \begin{bmatrix} 5.27 & 4.29 & 0.03 & 2.49 & 5.57 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1.19 & 2.27 & 14.32 & 0.02 & 9.48 \\ 19.32 & 15.60 & 9.50 & 9.72 & 92.46 \\ 30.52 & 25.11 & 0.03 & 0.07 & 149.4 \end{bmatrix}.$$

As can be verified easily, this result does not capture the structure of the network in Fig. 3 correctly. For example, the estimate of the Jacobian shows a large direct effect of gene 1 on gene 4, which is incorrect. The singular vector $u_1$, corresponding to $\sigma_1$, shows that the fifth component in $x$, that is, the mRNA concentration corresponding to gene five, is the most dominant with respect to the singularity of the measurement matrix. Using the approach outlined in the Method section, neglecting the measurements of the fifth component, the following Jacobian for the reduced network is obtained:

$$\hat{A}_{1,2,3,4} = \begin{bmatrix} 6.43 & 3.34 & 0.01 & 2.49 \\ 0.02 & 8.01 & 0.03 & 3.76 \\ 0.78 & 3.89 & 14.28 & 0.02 \\ 0.04 & 0.17 & 9.12 & 9.72 \end{bmatrix}$$

The identified Jacobian is close to the true Jacobian for the reduced network, with relative errors in all nonzero elements being smaller than 20%, and all zero elements being identified as close to zero. It is important to point out that the Jacobian of the reduced network is not supposed to be equal to the Jacobian of the four gene network in the previous example. The dynamics of the reduced Jacobian also correspond reasonably well to the slow dynamics of the five gene network, as can be seen from the computed eigenvalues in Table 2. The structure of the identified reduced Jacobian reflects well the structure of the network in Fig. 3 when gene five is taken out. For instance, gene one directly affects gene three when the dynamics of gene five are neglected, or assumed to be infinitely fast.

The results presented above show that it is indeed possible to obtain a useful identification result even when fast dynamics are not captured correctly. Moreover, one can obtain information on which components are involved in the fast reactions, and on their static relationship with the other components of the network.
Fig. 3. Structure of the five-gene network. The interconnections determine the direct interactions between the genes. An arrow indicates a positive effect on the gene transcription, and a bar indicates a negative effect.
In this paper we have discussed the qualitative and quantitative identification of network interactions based on time-series measurements obtained from perturbation experiments and least-squares estimation. The proposed method is equally applicable to identification of gene, protein, and metabolic networks. Because the method requires at least $n + 1$ samples, where $n$ is the number of network components, it is relatively costly for large scale networks, and thus so far limited to the identification of smaller networks. However, as high throughput techniques are evolving fast, it is probable that high-frequency sampling can be obtained in the near future. Wet-lab based experimental verification of the proposed method remains a topic for future study.
The proposed approach has several advantages over other approaches: the steady-state of the system does not need to be known nor achieved prior to the perturbation; general type perturbations can be used; dynamics relatively fast compared to the sampling time can be detected and removed from the identification; linear dependencies due to moiety conservations can be identified and processed; samples from any number of experiments can be combined in the identification, as long as these experiments have been carried out around the same steady-state.
We have shown that measurement uncertainty can have a large effect on the identification result. Possible solutions for uncertainty and noise are to collect and use more measurement data, and to make use of available a priori structural knowledge. In addition, methods from identification theory on estimating and filtering noise can be incorporated. Furthermore, the signal-to-noise ratio can be increased by choosing larger perturbations. However, the latter can lead to increased nonlinear effects, and a trade-off between the two effects therefore has to be taken into consideration.
We have shown that the zero-order hold discretization, in general, results in a significantly improved estimation compared with the widely accepted Euler discretization, and that it should be used in all methods aimed at identifying dynamic biochemical networks.
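The effect of the discretization can be illustrated with a small numerical sketch. It assumes the standard relations A_d = exp(A T) for the zero-order hold and A_d ≈ I + A T for the Euler approximation, with sampling time T. The two-component Jacobian, the sampling time, and the helper names `expm_eig` and `logm_eig` are hypothetical and chosen only for illustration; they are not taken from the examples in this paper.

```python
import numpy as np

def expm_eig(A, T):
    """exp(A*T) via eigendecomposition (assumes A is diagonalizable)."""
    lam, V = np.linalg.eig(A)
    return np.real(V @ np.diag(np.exp(lam * T)) @ np.linalg.inv(V))

def logm_eig(Ad):
    """Principal matrix logarithm via eigendecomposition (assumes Ad is
    diagonalizable and has no eigenvalues on the negative real axis)."""
    mu, V = np.linalg.eig(Ad)
    return np.real(V @ np.diag(np.log(mu.astype(complex))) @ np.linalg.inv(V))

# Hypothetical continuous-time Jacobian (illustrative values only).
A = np.array([[-1.0, 0.5],
              [0.8, -2.0]])
T = 0.5  # sampling time

# Exactly sampled dynamics of the linear system.
Ad = expm_eig(A, T)

# Recover the continuous-time Jacobian from the discrete-time one.
A_zoh = logm_eig(Ad) / T          # zero-order hold: A = log(Ad) / T
A_euler = (Ad - np.eye(2)) / T    # Euler: A = (Ad - I) / T

err_zoh = np.linalg.norm(A_zoh - A)
err_euler = np.linalg.norm(A_euler - A)
print(err_zoh, err_euler)
```

In this example the zero-order hold relation recovers A to numerical precision, whereas the Euler relation underestimates the magnitude of the dynamics because the sampling time is not small compared with the time constants of the system.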
We have not discussed explicitly the effect of auto-regulation of biological systems by negative self-feedback. For example, certain components might be regulated by homeostatic effects and a response to perturbations might not be visible in the measurement data. However, under the assumption that these effects are significantly slower than the sampling time it is reasonable to assume that the proposed method will lead to an acceptable result. Furthermore, we have only considered the case of estimation around a stable steady-state of the network. In the case of oscillations, created within the network or affecting the network, one would have to deal with time-varying Jacobians, which is outside the scope of this paper.
Experimental procedures
Method
In this section, we present a method for the determination of the network Jacobian from time-series measurements through minimization of a least-squares criterion. Some related issues, such as the choice of the sampling time and how to deal with moiety conservations in metabolic and signaling networks, are also discussed. Least-squares based estimation is used widely within many areas of science and engineering, an important reason being that it is applicable even in the case where no statistical information about the measurements is available [14]; this is typically the case with measurement data from biological systems.
Excitation of a biochemical system is usually performed as a constant parameter perturbation, e.g. gene knockouts or the alteration of gene transcription rates. Especially in vivo, it is not possible to quantify the applied perturbations, meaning that the magnitude of the applied perturbations is unknown. Furthermore, for gene networks, it is usually also unknown which components the perturbations affect in a direct manner. Previously proposed methods often assume this information to be available, at least partially. Sontag et al., for example, assume that the magnitude of the perturbations is unknown but that the genes directly affected by the perturbed parameters are known [4]. In the following we consider both the magnitude and the direct effects of perturbations to be unknown. To keep the exposition relatively simple, we assume, however, that the parameter perturbations are constant. (In part 2 of the supplementary material we show how this assumption can be relaxed to take time-varying parameter perturbations, known or unknown, and pulse perturbations into account.)
Table 2. Comparison between the eigenvalues of the nominal Jacobian A of the five gene network and the eigenvalues of the estimated reduced Jacobian Â_{1,2,3,4}. i = √(−1).

Jacobian                   Eigenvalues
A (nominal)                −571.7    −13.28 ± i 3.16    −6.93    −5.15
Â_{1,2,3,4} (estimated)    None      −13.27 ± i 3.36    −7.12    −4.80

We assume that the network response to the applied perturbations is sufficiently small such that, in the time range of the measurements, the system can be regarded as linear. Equation (3) then describes the behavior of the system (Eqn 1) for variations around the steady-state. As neither the magnitude nor the direct effects of the perturbations are known, we replace the corresponding term by a constant unknown perturbation $\Delta u$, as follows:

$$\Delta x_{k+1} = A_d\, \Delta x_k + \Delta u \qquad (8)$$
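The response of the network to this perturbation is measured at the sampling instants, where $\Delta x_k$ denotes the concentrations of the network components relative to the steady-state concentrations obtained at time step k > 0. Measuring the response of the network until time-step n + 2, where n corresponds to the number of involved components in the network, and arranging these concentration vectors into matrices we obtain the following matrix version of Eqn (8):

$$R = [\,A_d \;\; \Delta u\,]\, M, \qquad R = [\,\Delta x_2 \;\cdots\; \Delta x_{n+2}\,], \qquad M = \begin{bmatrix} \Delta x_1 & \cdots & \Delta x_{n+1} \\ 1 & \cdots & 1 \end{bmatrix} \qquad (9)$$

If the measurement matrix M on the right hand side in (Eqn 9) can be inverted, the discrete time Jacobian and the perturbation can be determined from:

$$[\,A_d \;\; \Delta u\,] = R\, M^{-1}$$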
Invertibility of M can be guaranteed under a controllability condition from linear systems theory (see proof in part 3 of the supplementary material). The result is exact only for the ideal experiment where the system is linear and no measurement uncertainty is present. In the case of noisy measurements and a nonlinear system, only estimates of the unknowns can be obtained. It is then also important to measure and use more time-steps than the minimum required. In the case of more than n + 2 measured time-steps, the matrices M and R are constructed as above, but with more columns, corresponding to the measurements in the additional time-steps. Thus, M will no longer be a square matrix and the pseudoinverse needs to be used instead:

$$[\,\hat{A}_d \;\; \Delta\hat{u}\,] = R\, M^{\dagger}, \qquad M^{\dagger} = M^{T} (M M^{T})^{-1}$$
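The single-experiment estimation can be sketched numerically as follows. The sketch assumes NumPy, noise-free data generated from the linear model in Eqn (8), and a hypothetical two-component system whose values are chosen only for illustration. The function name `estimate` is likewise an assumption. Array indices are zero-based, so column 0 holds the first measured time-step.

```python
import numpy as np

def estimate(dx):
    """Least-squares estimate of A_d and the perturbation from one experiment.

    dx: array of shape (n, m); columns are deviations from the steady state
    at consecutive time-steps, with m >= n + 2.
    """
    n, m = dx.shape
    R = dx[:, 1:]                                  # measurements shifted by one step
    M = np.vstack([dx[:, :-1], np.ones(m - 1)])    # states plus a row of ones
    est = R @ np.linalg.pinv(M)                    # equals R @ inv(M) if M is square
    return est[:, :n], est[:, n]

# Hypothetical system (illustrative values only).
Ad_true = np.array([[0.7, 0.1],
                    [-0.2, 0.5]])
du_true = np.array([0.05, -0.02])

m = 6                        # more than the minimum of n + 2 = 4 time-steps
dx = np.zeros((2, m))        # the experiment starts at the steady state
for k in range(m - 1):
    dx[:, k + 1] = Ad_true @ dx[:, k] + du_true

Ad_hat, du_hat = estimate(dx)
print(Ad_hat)
print(du_hat)
```

With noise-free data the estimates coincide with the true values up to rounding. With noisy data the additional columns average out part of the measurement error, which is why more than the minimum number of time-steps is advisable.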
As the identification of the overall network structure requires a relatively large number of measurement samples, we consider combining data from several experiments. It has to be pointed out that these experiments should be performed around the same steady-state, as only then is an identical Jacobian valid for all experiments. Small variations of the initial state around the steady-state are admissible, as long as the system still can be seen as behaving linearly.
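If r experiments are performed, the result matrix R can be constructed as Eqn (11):

$$R = [\,R^1 \;\; R^2 \;\cdots\; R^r\,] \qquad (11)$$

with

$$R^i = [\,\Delta x^i_2 \;\; \Delta x^i_3 \;\cdots\; \Delta x^i_{m_i}\,] \qquad (12)$$

where $m_i$ is the number of measured time-steps and $\Delta x^i_k$ the measurement at time-step k in experiment i. Note that the measurements are assumed to be taken with the same sampling time in all experiments. The measurement matrix M is constructed as Eqn (13):

$$M = \begin{bmatrix} M^1 & M^2 & \cdots & M^r \\ \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \end{bmatrix} \qquad (13)$$

where $M^i$ contains the measurements corresponding to the i-th experiment, shifted by one time-step relative to $R^i$, and is constructed as Eqn (14):

$$M^i = [\,\Delta x^i_1 \;\; \Delta x^i_2 \;\cdots\; \Delta x^i_{m_i - 1}\,] \qquad (14)$$

The 1 and 0 elements in Eqn (13) denote row vectors with unity and zero entries, respectively. These vectors have the same width as the corresponding measurement matrix $M^i$. For a single experiment, M reduces to the measurement matrix in Eqn (9). However, in the case of several experiments, the perturbation in the i-th experiment can be different from the perturbation in the other experiments, and thus for each experiment one perturbation vector needs to be taken into account. (The construction of the matrices M and R is illustrated for a simple example in part 4 of the supplementary material.)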
Estimations for the discrete time Jacobian $A_d$ and the perturbations $\Delta u^i$ are then obtained from:

$$[\,\hat{A}_d \;\; \Delta\hat{u}^1 \;\cdots\; \Delta\hat{u}^r\,] = R\, M^{\dagger} \qquad (15)$$

In the case of combined experiments, the total number of columns of M should at least equal n + r. For the construction of R and M at least n + 2r measured time-steps are required. Note that Eqn (15) involves the pseudoinverse of M.
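The stacking of several experiments can be sketched as follows. The sketch assumes NumPy and noise-free data from a hypothetical two-component system. The function names `build_R_M` and `simulate` and all numerical values are illustrative assumptions. Two experiments with three time-steps each are used, which is exactly the minimum of n + 2r = 6 measured time-steps for n = 2 and r = 2.

```python
import numpy as np

def build_R_M(experiments):
    """Stack r experiments into the result matrix R and measurement matrix M.

    experiments: list of arrays of shape (n, m_i); columns are time-steps.
    """
    r = len(experiments)
    R = np.hstack([dx[:, 1:] for dx in experiments])
    top = np.hstack([dx[:, :-1] for dx in experiments])
    # One indicator row per experiment: ones over its own columns, zeros elsewhere.
    ind = np.zeros((r, top.shape[1]))
    col = 0
    for i, dx in enumerate(experiments):
        width = dx.shape[1] - 1
        ind[i, col:col + width] = 1.0
        col += width
    return R, np.vstack([top, ind])

def simulate(Ad, du, dx0, m):
    """Generate m time-steps of the linear model dx[k+1] = Ad dx[k] + du."""
    dx = np.zeros((len(dx0), m))
    dx[:, 0] = dx0
    for k in range(m - 1):
        dx[:, k + 1] = Ad @ dx[:, k] + du
    return dx

# Hypothetical system and perturbations (illustrative values only).
Ad_true = np.array([[0.7, 0.1],
                    [-0.2, 0.5]])
du1 = np.array([0.05, 0.0])     # perturbation applied in experiment 1
du2 = np.array([0.0, -0.03])    # perturbation applied in experiment 2

exps = [simulate(Ad_true, du1, np.zeros(2), 3),
        simulate(Ad_true, du2, np.zeros(2), 3)]

R, M = build_R_M(exps)
est = R @ np.linalg.pinv(M)
Ad_hat, du_hat = est[:, :2], est[:, 2:]   # one perturbation column per experiment
print(M.shape)
print(Ad_hat)
```

Neither experiment alone provides the n + 2 = 4 time-steps needed for a single-experiment estimate, but the combined measurement matrix has n + r = 4 columns and the Jacobian is recovered together with both perturbation vectors.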
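An important side effect of incorporating estimation of the applied perturbations using measurement data is that nonzero, or unsteady-state, initial conditions can also be handled. This follows from the fact that initial unknown deviations from the steady-state can in fact be represented as an unknown perturbation. Thus, the proposed method can also be applied when the steady-state is unknown. To see this, Eqn (8) is reformulated using the absolute concentrations $x_k = x_{ss} + \Delta x_k$, where $x_{ss}$ denotes the steady-state:

$$x_{k+1} = A_d\, x_k + \tilde{u}, \qquad \tilde{u} = \Delta u + (I - A_d)\, x_{ss}$$

Here $\tilde{u}$ is a lumped perturbation, consisting of the unknown perturbation $\Delta u$ and a term that depends on the unknown steady-state. Hence, the steady-state of the system does not need to be known. In order to use this approach, it is sufficient to replace the deviations in the matrices R and M by the measured concentrations:

$$\Delta x^i_k = x^i_k$$

where the only difference lies in the fact that now the lumped perturbations, rather than the applied perturbations, are estimated. Note, however, that the method is still based on the assumption that the network is behaving linearly around the same steady-state for all experiments. Hence, the initial states in all experiments should in general not be too far from the steady-state at which the Jacobian is to be determined.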
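The estimation without knowledge of the steady-state can be checked with a short sketch. It assumes NumPy, noise-free data, and a hypothetical two-component system. The symbols `x_ss` and `u_lumped` and all numerical values are illustrative assumptions. The estimator only sees the absolute concentrations.

```python
import numpy as np

# Hypothetical system (illustrative values only).
Ad_true = np.array([[0.7, 0.1],
                    [-0.2, 0.5]])
du_true = np.array([0.05, -0.02])
x_ss = np.array([1.0, 2.0])      # steady state, not used by the estimator

# Simulate absolute concentrations, starting slightly off the steady state.
m = 6
x = np.zeros((2, m))
x[:, 0] = x_ss + np.array([0.01, -0.01])
for k in range(m - 1):
    x[:, k + 1] = x_ss + Ad_true @ (x[:, k] - x_ss) + du_true

# Build R and M from the absolute measurements instead of the deviations.
R = x[:, 1:]
M = np.vstack([x[:, :-1], np.ones(m - 1)])
est = R @ np.linalg.pinv(M)
Ad_hat, u_lumped = est[:, :2], est[:, 2]

# The estimated perturbation is the lumped one, not the applied one.
u_expected = du_true + (np.eye(2) - Ad_true) @ x_ss
print(Ad_hat)
print(u_lumped, u_expected)
```

The Jacobian estimate is unaffected by the unknown steady-state, while the estimated perturbation absorbs the steady-state term and can therefore no longer be interpreted as the applied perturbation alone.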
The advantage of the approach proposed above is that very general types of perturbations can be applied to the system, and that information about the perturbations is not required. (As mentioned earlier in the text, in part 2 of the supplementary material we relax the assumption of constant parameter perturbations.)
Choice of sampling time and dealing with moiety conservations
Biochemical networks generally contain dynamic modes with a wide range of time constants. In order to identify the full Jacobian from time-series measurements, the sampling time must be short enough that even the fastest dynamics are captured. Due to experimental limitations, it may, however, not be possible to realize the required sampling time. Furthermore, as the dynamics of the system in general are unknown in advance, it is hard to determine the required sampling time in advance. Herein we will consider how interactions with dynamics significantly faster than the sampling time can be identified a priori from the collected data, and how these interactions then can be extracted from the data prior to identification of the network Jacobian. We also show that the same approach can be used to deal with moiety conservations within the considered network.
Assume the fastest mode of the linearized system (Eqn 2) has a time constant significantly smaller than the sampling time. The contribution of this mode will then essentially disappear between samples. This implies that there exists an almost linear dependency between the measurements of the sampled states, and hence that the measurement matrix M will be (almost) rank deficient. In general (and assuming the perturbations fulfil the controllability condition), the rank deficiency will be equal to the number of modes with time-constant significantly smaller than the sampling time. The linear dependency, corresponding to the interactions with dynamics significantly faster than the sampling time, can be determined directly from the collected measurements using a singular value decomposition (SVD) of the measurement matrix M: singular values close to zero indicate linear dependencies, which are described by the corresponding singular directions.
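The detection of a fast mode from the measurement matrix can be sketched as follows. The sketch assumes NumPy and a hypothetical three-component network in which component 3 equilibrates rapidly with component 1. The matrix `A`, the perturbation vector `b`, the sampling time, and the helper `zoh` are illustrative assumptions. The standard zero-order hold relation for a constant input, Δu = A⁻¹(A_d − I) b, is used to generate the data.

```python
import numpy as np

def zoh(A, T):
    """exp(A*T) via eigendecomposition (assumes A is diagonalizable)."""
    lam, V = np.linalg.eig(A)
    return np.real(V @ np.diag(np.exp(lam * T)) @ np.linalg.inv(V))

# Hypothetical Jacobian: component 3 tracks component 1 on a fast time scale.
A = np.array([[-1.0, 0.0, 0.5],
              [0.8, -2.0, 0.0],
              [200.0, 0.0, -200.0]])
b = np.array([1.0, -0.5, 0.0])     # constant perturbation of components 1 and 2
T = 0.5                            # sampling time, far slower than the fast mode

Ad = zoh(A, T)
du = np.linalg.solve(A, (Ad - np.eye(3)) @ b)

# Simulate from the steady state; measurements start one step after the perturbation.
dx = np.zeros((3, 9))
for k in range(8):
    dx[:, k + 1] = Ad @ dx[:, k] + du
meas = dx[:, 1:]

# Measurement matrix: states plus a row of ones.
M = np.vstack([meas[:, :-1], np.ones(meas.shape[1] - 1)])

U, s, Vt = np.linalg.svd(M)
rel = s / s[0]                 # singular values relative to the largest one
direction = U[:, -1]           # direction belonging to the smallest singular value
print(rel)
print(direction)
```

In this example the smallest singular value is numerically zero relative to the largest, and the associated direction has significant entries only for components 1 and 3, which identifies them as the components taking part in the fast dynamics.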
A possible solution to the problem with too slow sampling is to identify the components taking part in the fast dynamics, that is, components corresponding to nonzero elements in the singular directions associated with singular values close to zero, and to neglect one of them for each fast mode. Any component can in principle be chosen, but a reasonable choice is to neglect the one being most dominant with respect to the singularity, that is, the component with the largest element in the corresponding singular direction. Repeating this procedure for every singular value of M close to zero will lead to a measurement matrix with full rank, allowing determination of the Jacobian of the network, reduced by one component for each fast mode.
The presence of moiety conservations in metabolic or signaling networks has the same effect on the estimation of the Jacobian as dynamics significantly faster than the sampling time. In other words, some of the concentrations of the intermediates in the networks will be linearly dependent, resulting in a measurement matrix without full row rank. Thus, the same approach as presented above for dealing with linear dependencies due to a too large sampling time can be used to determine the components involved in