
Scientific report: Identification of small scale biochemical networks based on general type system perturbations


DOCUMENT INFORMATION

Basic information

Title: Identification of small scale biochemical networks based on general type system perturbations
Authors: Henning Schmidt, Kwang-Hyun Cho, Elling W. Jacobsen
Institution: Royal Institute of Technology – KTH
Field: Biochemical Networks
Type: Scientific report
Year of publication: 2005
City: Stockholm
Pages: 11
File size: 193 KB


Identification of small scale biochemical networks based on general type system perturbations

Henning Schmidt1, Kwang-Hyun Cho2,3 and Elling W. Jacobsen1

1 Signals, Sensors and Systems, Royal Institute of Technology – KTH, Stockholm, Sweden

2 College of Medicine, Seoul National University, Chongno-gu, Seoul, Korea

3 Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Korea

Keywords

biochemical networks; identification; Jacobian; time-series measurements

Correspondence

E. W. Jacobsen, Department of Automatic Control, Royal Institute of Technology – KTH, Osquldasvag 10, S-10044 Stockholm, Sweden
Fax: +46 8790 7329
Tel: +46 8790 7325
E-mail: jacobsen@s3.kth.se

K.-H. Cho, College of Medicine, Seoul National University, Chongno-gu, Seoul, 110–799, Korea, and Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Seoul, 151–818, Korea
Fax: +82 2887 2692
Tel: +82 2887 2650
E-mail: ckh-sb@snu.ac.kr

(Received 22 December 2004, accepted 8 February 2005)

doi:10.1111/j.1742-4658.2005.04605.x

New technologies enable acquisition of large data-sets containing genomic, proteomic and metabolic information that describe the state of a cell. These data-sets call for systematic methods enabling relevant information about the inner workings of the cell to be extracted. One important issue at hand is the understanding of the functional interactions between genes, proteins and metabolites. We here present a method for identifying the dynamic interactions between biochemical components within the cell, in the vicinity of a steady-state. Key features of the proposed method are that it can deal with data obtained under perturbations of any system parameter, not only concentrations of specific components, and that the direct effect of the perturbations does not need to be known. This is important as concentration perturbations are often difficult to perform in biochemical systems and the specific effects of general type perturbations are usually highly uncertain, or unknown. The basis of the method is a linear least-squares estimation, using time-series measurements of concentrations and expression profiles, in which system states and parameter perturbations are estimated simultaneously. An important side-effect of also employing estimation of the parameter perturbations is that knowledge of the system's steady-state concentrations, or activities, is not required and that deviations from steady-state prior to the perturbation can be dealt with. Time derivatives are computed using a zero-order hold discretization, shown to yield significant improvements over the widely used Euler approximation. We also show how network interactions with dynamics that are too fast to be captured within the available sampling time can be determined and excluded from the network identification. Known and unknown moiety conservation relationships can be processed in the same manner. The method requires that the number of samples equals at least the number of network components and, hence, is at present restricted to relatively small-scale networks. We demonstrate herein the performance of the method on two small-scale in silico genetic networks.

New high-throughput experimental technologies, i.e. for monitoring the expression levels of large gene sets and the concentrations of metabolites, are evolving rapidly. These data sets contain the information required to uncover the organization of biological systems on a genetic, proteomic, and metabolic level. However, in order to realize the translation of data into a system level understanding of cell functions, methods that can construct quantitative mathematical models from data are needed. In particular, determination of the quantitative interactions between the components within and across these levels is an important issue. These interactions lead to the notion of networks that can be represented by weighted, directed graphs, where the nodes correspond to the biochemical components and the edges, represented as arrows with weights attached, indicate the direct quantitative effect that a change in a certain component has on another component. The weights are in general nonlinear functions that represent reaction kinetics. Determination of these network structures will provide insight into the functional relationships between the involved components, so as to better understand the functions of biological systems, and will eventually lead to knowledge concerning how these systems can be manipulated in order to achieve a certain desired behavior.

Because the reaction kinetics are in general unknown, and because of the large number of parameters involved, it is in most cases unfeasible to determine the nonlinear weights directly from experimental data. Herein, however, a distinction has to be made between gene and metabolic networks. For metabolic networks, a good initial guess of the network structure is usually available from databases, such as KEGG [1], while the structures of gene networks are usually largely unknown in advance. Therefore, the approach presented in this paper probably has its greatest value for gene networks, but can be applied equally well to signaling and metabolic networks where, e.g., model validation and the determination of new, previously unknown, connections between intermediates are needed.

A common approach in structural identification is to consider the biochemical network behavior around some steady-state and assume that it behaves linearly for small deviations from this steady-state [2–4]. With this assumption, the network weights become constants, quantifying the interactions between the components in the neighborhood of the steady-state. Grouping these constant weights into a matrix yields an interaction matrix, the Jacobian, which quantifies the mutual effects of deviations from the steady-state on the various components of the system.

Several approaches to the determination of interaction matrices of biochemical systems have been published recently. These can be divided roughly into methods focusing on the determination of the qualitative structure of the interactions and those aimed at determining quantitative information about the interactions. Ross [5] reviews two approaches to determine the structure of reaction pathways from time-series measurements of metabolites and proteins. The first approach is based on small pulses of concentration changes applied to the different species around a stable steady-state. Depending on the relative behavior of the measured responses, the considered metabolic pathway can be determined [6,7]. The second approach is based on correlations between different species when periodically forcing the system by changing some input species over time. Using correlation and multidimensional scaling analysis, the structure of the considered pathway can be unravelled [8].

Kholodenko et al., Gardner et al. and Vance et al. propose methods for determining quantitative interaction matrices based on steady-state responses of perturbed genetic networks [2,3,9]. As the responses to the applied perturbations can often become relatively large in steady-state, these methods are potentially limited depending on the nonlinearity of the considered systems. Furthermore, the fact that Kholodenko et al. and Vance et al. determine the $n^2$ elements of the interaction matrix from $n^2$ measurements suggests that the results are potentially sensitive to measurement uncertainty [2,9].

In contrast to methods based on steady-state measurements, methods based on time-series measurements can cope better with the issue of nonlinearity and measurement uncertainty. Monitoring time-series also enables significantly more information to be extracted in each experiment.

A widespread method in the identification of reaction networks using time-series measurements is a least-squares estimation of the Jacobian. An interesting method is presented by Mihaliuk et al., in which the idea is to apply perturbations to all components in the network and to determine the Jacobian by measuring only one component, or a linear combination of components [10]. A drawback of this method, for the application to biological systems, is the fact that the perturbations are assumed to be instant changes of the concentrations of intermediates in the network. Furthermore, the magnitude of these perturbations is assumed to be known. The use of concentration shift experiments, that is, adding specific components to the system as pulses or steps, is a typical assumption in many previously proposed methods. However, while such perturbations are mainly feasible in chemical systems, they are usually hard to realize in vitro or in vivo [11].

To overcome the restriction to concentration shift experiments, Sontag et al. derive a method based on parameter perturbations, in which a separate experiment is performed for each network component so that the perturbation has no direct effect on this component, that is, the designed perturbation only works indirectly through other components in the network [4]. However, this requires substantial a priori structural knowledge, and furthermore causes problems with rank deficient measurement matrices. (The latter is discussed in more detail in the supplementary material.)


Herein, we study biochemical networks involving genes, proteins, and/or metabolites, and consider determination of the Jacobian using least-squares estimation from time-series measurements obtained in the vicinity of some steady-state. The method is able to deal with very general types of system perturbations. In this paper we assume the use of constant parameter perturbations, such as gene knockouts and inhibitor additions. For these types of perturbations, the exact size as well as the direct effect of the perturbations will in general be largely unknown. We therefore also consider incorporating a determination of the perturbation itself from the available data. However, the method can be applied equally when pulse perturbations are realizable for a given network, and in the case of known or unknown time-varying parameter perturbations. Furthermore, it is possible to combine pulse and parameter perturbations. (The use of the method for other types of perturbations is discussed in the supplementary material.)

Furthermore, we show that the effect of unsteady-state initial conditions can be considered an unknown perturbation and hence can be estimated in the same manner. Due to the latter feature, the proposed method does not, in contrast to most other methods, require the system to be in a steady-state when the perturbations are applied, nor does it require knowledge of the steady-state activities and concentrations. However, as the method is based on the assumption that the network is behaving linearly around the same steady-state for all experiments, the initial states of all experiments should, in general, not be too far from the steady-state at which the Jacobian is to be determined.

Network modelling based on time-series data requires estimation of time derivatives of the states. These are commonly calculated through the use of some Euler type finite difference approximation. Herein we employ a representation of time derivatives, commonly used in systems theory, that avoids any approximations, thereby leading to significantly improved estimation results. Finally, we address the issue of dynamics that are significantly faster than the sampling time, and show how such interactions can be identified and extracted from the data-sets prior to the network identification. Thus, a reduced network, with the fast dynamics replaced by algebraic relationships, can be identified. As we show, the same approach is also applicable in the case of moiety conservations in metabolic and signaling networks, and thus it is possible to determine the Jacobian expressed only in terms of the independent intermediates of the network.

The proposed method will in general uncover only a phenomenological interaction topology of the network, as not all intermediates can be measured. That is, we assume that only the measured components are part of the network to be modelled. This is a commonly used assumption [4]. It is relaxed somewhat by Mihaliuk et al. [10]. However, they assume that all components are known and can be perturbed. The case of unknown and unmeasurable components is, of course, a highly relevant topic, but outside the scope of this paper.

The outline of the paper is as follows. We first present the problem formulation and briefly outline the method used for network identification from measurement samples. The proposed method is then applied to in silico models of two small scale gene networks. Following the conclusions, we present a detailed description of the method for least-squares identification of the Jacobian, and discuss the impact of the sampling time and moiety conservations.

Results and Discussion

Problem formulation

We consider metabolic reactions, signaling networks, and gene networks that can be described by a system of nonlinear differential equations of the form in Eqn (1)

$$\dot{x} = f(x, p) \tag{1}$$

where $x = [x_1, \ldots, x_n]^T$ is the state vector containing the concentrations, activities, or expressions, of all components in the network and $p = [p_1, \ldots, p_q]^T$ is a vector of adjustable parameters within the considered biological system, such as kinetic rate constants and genes whose expression levels can be perturbed. The vector valued function $f$ determines the dynamics of the biochemical network given the states and parameters. The definition in Eqn (1) also incorporates the typical form of kinetic models, that is, $\dot{s} = N V(s, p)$ [12]. In cases of small molecular concentrations and/or low levels of diffusion, partial differential and stochastic equations may be required, but this is outside the scope of this paper.

Due to largely unknown reaction kinetics, and the large number of involved parameters, it is in general unfeasible to determine the nonlinear functions $f_i(x, p)$ using a 'top-down' approach, that is, determining all reaction mechanisms and involved parameters, such as rate constants, from measured responses of the perturbed network. We therefore consider the system (Eqn 1) in the neighborhood of some steady-state $(x_0, p_0)$ and assume that it behaves linearly for small variations around this state. This assumption allows us to represent the system as a linear time invariant system (Eqn 2)

$$\Delta\dot{x}(t) = \left.\frac{\partial f}{\partial x}\right|_{x_0, p_0} \Delta x(t) + \left.\frac{\partial f}{\partial p}\right|_{x_0, p_0} \Delta p(t) = A\,\Delta x(t) + B\,\Delta p(t) \tag{2}$$

where $\Delta x(t) = x(t) - x_0$ and $\Delta p(t) = p(t) - p_0$ denote deviations from the considered steady-state. Equation (2) is obtained by truncating the Taylor expansion of Eqn (1) after the linear terms. The constant matrix $A$ is the Jacobian matrix of the nonlinear system and represents the network connectivity and the interactions between the network components around the considered steady-state. For example, in the case of gene networks, a zero element $A_{ij}$ indicates that the expression level of gene $j$ does not directly affect the expression of gene $i$. Positive and negative elements within $A$ imply activation and inhibition, respectively, of the corresponding components.
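The linearization in Eqn (2) can be illustrated numerically. The sketch below is not taken from the paper: it assumes a hypothetical two-component system `f` (the rate laws, the parameter values and the helper name `numerical_jacobian` are all illustrative) and approximates $A$ by central finite differences at a steady-state.

```python
import numpy as np

def f(x, p):
    """Hypothetical two-component network (illustration only).

    Component 2 inhibits the production of component 1,
    and component 1 activates the production of component 2.
    """
    x1, x2 = x
    return np.array([p[0] / (1.0 + x2) - x1,
                     p[1] * x1 - x2])

def numerical_jacobian(f, x0, p0, h=1e-6):
    """Approximate A = df/dx at (x0, p0) by central differences."""
    n = len(x0)
    A = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = h
        A[:, j] = (f(x0 + dx, p0) - f(x0 - dx, p0)) / (2.0 * h)
    return A

p0 = np.array([2.0, 1.0])   # assumed parameter values
x0 = np.array([1.0, 1.0])   # steady state of the toy system: f(x0, p0) = 0
A = numerical_jacobian(f, x0, p0)
print(np.round(A, 3))
```

In this toy system the negative off-diagonal element $A_{12}$ corresponds to inhibition and the positive element $A_{21}$ to activation, in line with the sign convention described above.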

The aim here is to determine the Jacobian, or interaction matrix $A$, based on time-series measurements. We assume that the measurements are collected using a fixed sampling time $\Delta T$, and that at each sample the concentrations, or activity levels, of all $n$ components in $x$ are measured. Furthermore we assume that the perturbations are constant between two sampling instants. Due to the discrete nature of the measurements, we reformulate the continuous time system (Eqn 2) as a discrete time system (Eqn 3)

$$\Delta x_{k+1} = A_d\,\Delta x_k + B_d\,\Delta p_k, \tag{3}$$

where $\Delta x_k = \Delta x(k\Delta T)$ and $\Delta p_k = \Delta p(k\Delta T)$. Using Eqn (3) we will, in the following, show how an estimation $\hat{A}_d$ for the discrete time Jacobian $A_d$ can be determined. An estimation $\hat{A}$ for the continuous time Jacobian $A$ can then be calculated through a reverse transformation to continuous time using the Euler approximation or the so-called zero-order hold discretization.

The commonly used Euler approximation for the time derivatives of the states implies replacing the continuous derivatives by the finite difference $\Delta\dot{x}(t) = (\Delta x_{k+1} - \Delta x_k)/\Delta T$. The reverse transformation from discrete time $\hat{A}_d$ then yields the following approximation for the continuous time Jacobian (Eqn 4)

$$\hat{A}_{\mathrm{euler}} = \frac{1}{\Delta T}\left(\hat{A}_d - I\right) \tag{4}$$

The Euler discretization method is approximate, and the goodness of the approximation is in general highly sensitive to the choice of the sampling time $\Delta T$. This 'approximate' relationship between the continuous and discrete time models can be avoided completely under the assumption that the perturbations $\Delta p$ are constant between sampling instants. Then, an analytical solution for $\Delta x(t)$ can be derived and hence also the exact relationship between $\Delta x_{k+1}$ and $\Delta x_k$. This leads to the zero-order hold discretization [13] (Eqn 5)

$$\hat{A}_{\mathrm{zoh}} = \frac{1}{\Delta T}\,\mathrm{logm}\!\left(\hat{A}_d\right) \tag{5}$$

where $\mathrm{logm}(\hat{A}_d)$ denotes the matrix logarithm. Note there are no approximations involved in this transformation provided the parameter perturbations are constant between samples. (A more detailed discussion of the zero-order hold discretization and a comparison to the Euler discretization can be found in part 1 of the supplementary material.)

Having determined an estimation $\hat{A}_d$ for $A_d$, an estimation for the continuous time Jacobian $A$ can be obtained using the above transformations. We will demonstrate that Eqn (5), in general, leads to a significantly better estimation of the Jacobian than Eqn (4).
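The difference between the two reverse transformations can be checked on a small example. The following sketch assumes a hypothetical $2 \times 2$ Jacobian and sampling time (neither is from the paper) and uses SciPy's `expm` and `logm`; the discrete time matrix is taken to be exact, so any remaining error is due to the transformation alone.

```python
import numpy as np
from scipy.linalg import expm, logm

# Hypothetical continuous-time Jacobian and sampling time (illustration only).
A = np.array([[-1.0, -0.5],
              [1.0, -1.0]])
dT = 0.1

# Exact discrete-time Jacobian when perturbations are constant between samples.
A_d = expm(A * dT)

# Reverse transformations to continuous time.
A_euler = (A_d - np.eye(2)) / dT      # Euler approximation, Eqn (4)
A_zoh = np.real(logm(A_d)) / dT       # zero-order hold, Eqn (5)

err_euler = np.abs(A_euler - A).max()
err_zoh = np.abs(A_zoh - A).max()
print(f"max abs error, Euler: {err_euler:.2e}")
print(f"max abs error, ZOH:   {err_zoh:.2e}")
```

With an exact $A_d$ the zero-order hold transformation recovers $A$ up to rounding, whereas the Euler error grows with the sampling time, roughly as $A^2 \Delta T / 2$.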

In silico four-gene network example

We consider a genetic network containing four genes, which has been used previously [2,4] as a test case for identification of interaction matrices and Jacobians. The motivation behind choosing such a small scale network for illustration of the method is to keep the exposition complete and reasonably compact. (The model equations and parameters are given in part 6 of the supplementary material.) The nominal Jacobian at the considered steady-state is given by Eqn (6)

$$A = \begin{bmatrix} & \vdots & \\ 2.31 \quad 2.80 & 14.46 & 0 \\ & \vdots & \end{bmatrix} \tag{6}$$

and the corresponding network is illustrated in Fig. 1. From system identification theory it is well known that a good estimation result requires a sufficient excitation of the system. (In particular, part 3 of the supplementary material shows that perturbations have to be chosen such that the complete space of the network states is perturbed.) In the following we consider time-series data obtained from constant parameter perturbation experiments. The perturbed parameters correspond to the maximal enzyme rates involved in the transcription of the genes (part 6 of the supplementary material). Furthermore, the magnitudes, as well as the direct effects of the perturbations, are assumed to be unknown and thus not used in the identification algorithm. The estimations of $\hat{A}_d$ are obtained using absolute measurements and applying Eqn (15). Unless otherwise stated, the zero-order hold discretization is used in the following.

Estimation of the Jacobian

We first performed an in silico experiment in which the maximal enzyme rate corresponding to the transcription of gene number one is perturbed by 1%. The sampling time is chosen as $\Delta T = 0.01$ h, and we collect six samples, the minimal number required for estimating the Jacobian when the size of the perturbation is unknown. The first sample is taken one time-step after the perturbation has been applied to the system. It should be noted that the sampling time is chosen sufficiently small to enable the fastest dynamics of the system to be captured.

Applying the method proposed above, we obtain the following estimate for the Jacobian:

$$\hat{A} = \begin{bmatrix} 6.45 & 2.90 & 0.01 & 2.52 \\ 0.00 & 8.17 & 0.00 & 3.93 \\ 2.31 & 2.77 & 14.40 & 0.01 \\ 0.00 & 0.09 & 10.22 & 9.77 \end{bmatrix}$$

Except for the (4,2) element, the estimated Jacobian is very close to the nominal Jacobian, the largest relative error in the nonzero elements being less than 1%. This is not surprising, as the perturbation to the system was chosen so small that the nonlinearity of the system played a relatively modest role. The fact that the (4,2) element is relatively poorly estimated is probably explained by more severe nonlinear effects for this specific relationship with the chosen parameter perturbation. Note that the nonlinear effects in general will depend on the parameter chosen for perturbation.
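A minimal sketch of such an estimation from a single experiment is given below. It is an assumption about how the regression can be set up, not a reproduction of Eqn (15): for a constant unknown perturbation, Eqn (3) written in absolute measurements becomes $x_{k+1} = A_d x_k + c$, where the constant vector $c = (I - A_d) x_0 + B_d \Delta p$ lumps the unknown steady-state and the unknown perturbation. Each row of $[A_d \; c]$ then has $n + 1$ unknowns, so $n + 1$ transitions, that is, $n + 2$ samples, are needed (six samples for $n = 4$). The Jacobian `A_true`, the sampling time and the initial state are hypothetical, and the data are generated by an exactly linear system.

```python
import numpy as np
from scipy.linalg import expm, logm

n, dT = 4, 0.1
# Hypothetical stable Jacobian (not the network studied in the paper).
A_true = np.array([[-1.0, 0.5, 0.0, 0.0],
                   [0.0, -3.0, 0.8, 0.0],
                   [0.6, 0.0, -6.0, 0.0],
                   [0.0, 0.0, 1.0, -10.0]])
A_d = expm(A_true * dT)

# Constant offset lumping unknown steady state and unknown perturbation.
x_star = np.ones(n)                      # assumed post-perturbation steady state
c = (np.eye(n) - A_d) @ x_star

# Simulate n + 2 = 6 absolute measurements.
X = np.empty((n, n + 2))
X[:, 0] = [1.5, 0.6, 1.4, 0.7]           # assumed first sample
for k in range(n + 1):
    X[:, k + 1] = A_d @ X[:, k] + c

# Regressors [x_k; 1]; solve X[:, 1:] = [A_d  c] @ M in the least-squares sense.
M = np.vstack([X[:, :-1], np.ones((1, n + 1))])
Theta, *_ = np.linalg.lstsq(M.T, X[:, 1:].T, rcond=None)
A_d_hat = Theta[:n].T                    # estimated discrete-time Jacobian
c_hat = Theta[n]                         # estimated lumped offset

# Zero-order hold transformation back to continuous time, Eqn (5).
A_hat = np.real(logm(A_d_hat)) / dT
print(np.round(A_hat, 3))
```

Because the simulated data are exactly linear and noise-free, the estimate matches `A_true` up to numerical error; with data from a nonlinear model, errors such as the one in the (4,2) element above would be expected.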

In real experiments, inhibition efficiencies by small interfering RNA (siRNA) or chemical inhibitors are much higher than in the experiment above. A more realistic experimental setting is to assume perturbations of 50%, and to do several experiments, in which different parameters are perturbed. We performed four experiments and combined the obtained measurements. In each experiment, one of the maximal enzyme rates of the four genes is perturbed by 50%, and the minimum required number of samples is taken: three samples in each experiment. The sampling time is $\Delta T = 0.01$ h. The result of the estimation is given by:

$$\hat{A} = \begin{bmatrix} 6.45 & 2.69 & 0.01 & 2.59 \\ 0.00 & 8.14 & 0.02 & 4.06 \\ 2.19 & 2.90 & 14.38 & 0.01 \\ 0.07 & 0.08 & 8.51 & 9.72 \end{bmatrix}$$

Comparing with the 'true' linear Jacobian in Eqn (6), we see that the network has been identified with reasonable accuracy, the largest relative error in the nonzero elements being less than 20%. That we obtain such good results even for relatively large perturbations is partly explained by the fact that measurements from different experiments have been combined. This allows for different perturbations of the system and a reduction in the number of samples required in each experiment. The use of different perturbed parameters in each experiment leads to a better excitation of the system, which is beneficial for the estimation result. The reduction in the time span of each experiment reduces the deviation from the initial state, thereby reducing the effects of nonlinearities.

The above results demonstrate that it is theoretically possible to determine the Jacobian from one experiment only, but that in practice usually more than one experiment will be preferable. How to choose the perturbations in an optimal way is out of the scope of this paper and a topic for future work. Instead we will, in the experiments below, consistently perform four experiments. (In each experiment the transcription rate of a different gene is perturbed using the parameters given in part 6 of the supplementary materials.)
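Combining experiments can be sketched in the same regression framework. The code below is an illustrative assumption rather than the authors' implementation: every experiment $e$ gets its own unknown constant offset $c_e$, represented by a one-hot indicator column, so each row of the unknowns has $n + 4$ entries and the $4 \times 2 = 8$ transitions from four experiments with three samples each are just sufficient for $n = 4$. The Jacobian `A_true` and the perturbation vectors `u` are hypothetical, and the data come from an exactly linear system.

```python
import numpy as np
from scipy.linalg import expm, logm

n, dT = 4, 0.01
# Hypothetical stable Jacobian (not the network studied in the paper).
A_true = np.array([[-1.0, 0.5, 0.0, 0.0],
                   [0.0, -3.0, 0.8, 0.0],
                   [0.6, 0.0, -6.0, 0.0],
                   [0.0, 0.0, 1.0, -10.0]])
A_d = expm(A_true * dT)
x_ss = np.ones(n)                         # assumed unperturbed steady state

rows_M, rows_Y = [], []
for e in range(n):                        # one experiment per perturbed gene
    u = np.zeros(n)
    u[e] = 0.05                           # discrete-time perturbation effect, unknown to the estimator
    x = x_ss.copy()
    samples = []
    for _ in range(3):                    # first sample one step after the perturbation
        x = A_d @ x + (np.eye(n) - A_d) @ x_ss + u
        samples.append(x)
    for k in range(2):                    # two transitions per experiment
        rows_M.append(np.concatenate([samples[k], np.eye(n)[e]]))
        rows_Y.append(samples[k + 1])

M = np.array(rows_M)                      # 8 x 8: states plus experiment indicators
Y = np.array(rows_Y)                      # 8 x 4: next-step states
Theta, *_ = np.linalg.lstsq(M, Y, rcond=None)
A_d_hat = Theta[:n].T
A_hat = np.real(logm(A_d_hat)) / dT
print(np.round(A_hat, 3))
```

Perturbing a different component in each experiment makes the state increments span all directions of the state space, which is the excitation requirement mentioned above.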

Effect of discretization method

In order to illustrate the importance of the method employed for determination of time derivatives, we herein perform estimations for different sampling times $\Delta T$ and perturbation magnitudes, using two discretization methods (Eqns 4 and 5). The relative estimation error, $e$, is calculated as (Eqn 7)

$$e = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij}|, \qquad a_{ij} = \begin{cases} \dfrac{\hat{A}_{ij} - A_{ij}}{A_{ij}}, & A_{ij} \neq 0 \\[2mm] 0, & A_{ij} = 0 \end{cases} \tag{7}$$

where $N$ denotes the number of nonzero elements in the nominal Jacobian $A$. The results are shown in Table 1. In the table, we also show the error introduced in the derivatives by using the Euler approximation. The error is determined using the nominal Jacobian $A$ and is computed as

$$g(\Delta T) = \frac{\left\| e^{A\Delta T} - (I + A\Delta T) \right\|_{\mathrm{sum}}}{\left\| e^{A\Delta T} \right\|_{\mathrm{sum}}}.$$

The results clearly demonstrate that the zero-order hold discretization in Eqn (5) leads to a considerable improvement in the network identification, compared to the commonly used Euler approximation.

Fig. 1. Structure of the four-gene network. The interconnections represent the direct interactions between the genes. An arrow indicates a positive effect on the gene transcription, and a bar indicates a negative (inhibitory) effect.
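The two error measures can be written compactly in code. This is a sketch under one stated assumption: the "sum" norm is taken here to be the sum of the absolute values of all matrix entries. The function names and the example matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def relative_estimation_error(A_hat, A):
    """Eqn (7): mean relative error over the nonzero elements of A."""
    nonzero = A != 0
    N = np.count_nonzero(nonzero)
    rel = np.abs((A_hat[nonzero] - A[nonzero]) / A[nonzero])
    return rel.sum() / N

def euler_approximation_error(A, dT):
    """g(dT): relative error of I + A*dT with respect to expm(A*dT).

    The 'sum' norm is assumed to be the sum of absolute entries.
    """
    E = expm(A * dT)
    I = np.eye(A.shape[0])
    return np.abs(E - (I + A * dT)).sum() / np.abs(E).sum()

# Hypothetical nominal and estimated Jacobians.
A = np.array([[-2.0, 0.0],
              [1.0, -4.0]])
A_hat = np.array([[-2.2, 0.3],
                  [1.0, -4.0]])

e = relative_estimation_error(A_hat, A)
g_coarse = euler_approximation_error(A, 0.1)
g_fine = euler_approximation_error(A, 0.001)
print(e, g_coarse, g_fine)
```

Note that the spurious entry 0.3 in `A_hat` does not contribute to `e`, because Eqn (7) sets $a_{ij} = 0$ wherever the nominal element is zero.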

Impact of measurement uncertainty

We consider herein the effect of measurement uncertainty on the estimation of the Jacobian. The uncertainty is simulated in silico by adding noise to the absolute measurements $x_k$ as follows

$$x_k^{\mathrm{noise}} = x_k + W \cdot x_0$$

Here, $W$ denotes a diagonal matrix in which the entries are uniformly distributed random variables between $-0.02$ and $0.02$. These values may appear small compared to the uncertainty in realistic biological experiments. However, the noise levels relative to the measured deviations $\Delta x$ correspond to over 50% for some samples. This should also be seen in relation to the fact that measurements of gene expressions are often carried out in a relative manner, corresponding to the measurement of $\Delta x$.

The sampling time is $\Delta T = 0.01$ h as before, and the considered magnitudes of parameter perturbations are 20, 50 and 100%. The results for different numbers of measured time-steps per experiment can be seen in Fig. 2. In order to display the mean value and the standard deviation of the relative estimation error in the nonzero elements of the Jacobian, one hundred Monte-Carlo simulations have been conducted at each point. The results show that the relative estimation error (Eqn 7) and its standard deviation decrease for increasing numbers of measured time-steps. It is interesting to note that the estimation error also decreases for increasing perturbation magnitudes. This is explained by the fact that the signal-to-noise ratio improves for larger perturbations, which is reasonable also in practice. This serves to illustrate that in general, there will exist a trade-off, in terms of effects of measurement uncertainty on the one hand and the effects of nonlinearities on the other hand, when choosing the size of parameter perturbations.
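The noise model can be sketched as follows. The steady-state `x0`, the deviations `dX` and the function name are hypothetical, and drawing a new $W$ for every sample is an assumption; the example only illustrates why a 2% noise level on absolute measurements can be large relative to small deviations $\Delta x$.

```python
import numpy as np

def add_measurement_noise(X, x0, rng, level=0.02):
    """Return x_k + W x0 for every sample (column) of X.

    X is (n, K), x0 is (n,). The diagonal entries of W are drawn
    uniformly from [-level, level), independently for each sample.
    """
    W = rng.uniform(-level, level, size=X.shape)
    return X + W * x0[:, None]

rng = np.random.default_rng(0)
x0 = np.array([1.0, 2.0, 0.5, 4.0])                   # hypothetical steady state
dX = 0.03 * x0[:, None] * np.linspace(0.2, 1.0, 5)    # hypothetical small deviations
X = x0[:, None] + dX                                  # absolute measurements

# One hundred Monte-Carlo draws of the noise.
max_rel_noise = 0.0
for _ in range(100):
    X_noisy = add_measurement_noise(X, x0, rng)
    max_rel_noise = max(max_rel_noise, (np.abs(X_noisy - X) / x0[:, None]).max())

# Noise bound relative to the measured deviations.
worst_ratio = (0.02 * x0[:, None] / np.abs(dX)).max()
print(max_rel_noise, worst_ratio)
```

In this toy setting the noise bound exceeds the deviation itself for the earliest samples, where the response to the perturbation is still small.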

Impact of sampling time

In order to illustrate the problems occurring in networks with dynamic modes that are too fast to capture with the available sampling time, we considered identification of a network consisting of five genes. The network is a modification of the four gene network used in the previous example, obtained by adding a fifth gene with relatively fast dynamics. (The equations, parameters, and the nominal Jacobian are given in part 7 of the supplementary material.) The structure of the five gene network is shown in Fig. 3.

In the considered network, the degradation rate of the mRNA of gene five has been chosen to be much faster than the degradation rates for the other mRNAs, thereby introducing a relatively fast dynamic mode. The sampling time we employ is too large to capture this fast dynamic mode.

Table 1. Comparison of estimation errors. We compared the estimation errors obtained for different sampling times, discretization methods, and magnitudes of the parameter perturbation. (Euler), the Euler approximation; (ZOH), a zero-order hold discretization. The last column displays the relative error introduced by using the Euler approximation.
Columns: $\Delta T$ | Error (%), 50% perturbation: e (ZOH), e (Euler) | Error (%), 10% perturbation: e (ZOH), e (Euler) | Approximation error $g(\Delta T)$

Fig. 2. Mean value and standard deviation of the relative estimation error (Eqn 7) in the nonzero elements of the Jacobian, obtained from 100 Monte-Carlo simulations. Horizontal axis: measured time-steps per experiment; vertical axis: % error; curves: 20%, 50% and 100% perturbation.

Data for the estimation of the Jacobian of the system are generated in silico in the following way: (a) five experiments are performed, in each of which a 50% repression of one of the genes is simulated. In the in silico implementation, this corresponds to a parameter perturbation of $-50\%$ in the maximal enzyme rate. We stress that the magnitude of the perturbation is assumed unknown when we apply the identification algorithm; (b) in each experiment the mRNA concentrations, corresponding to all five genes, are measured at four consecutive time-steps. The first sample is taken one time-step after the perturbation is applied to the system; (c) the perturbation is applied while the system is not in the steady-state. In the in silico environment, this is simulated by introducing the perturbation while all mRNA concentrations are 5% below their steady-state values. This reflects the fact that a biological system in general will not be in a steady-state when perturbations are applied in a real experiment. Furthermore, the steady-state is assumed to be unknown and thus not used in the identification; (d) the sampling time is chosen to be $\Delta T = 0.01$ h.

Following the approach discussed above, we collect the measurements and find that the smallest singular value of the measurement matrix $M$ is $\sigma_1 = 0.00026$, which is relatively close to 0 compared to the other singular values. Thus, we conclude that the chosen sampling time was too large with respect to the fastest dynamics of the system.
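This kind of check can be sketched with a singular value decomposition. The example below does not reproduce the measurement matrix of the paper: it assumes a hypothetical data matrix whose rows are components and whose columns are samples, in which the fifth component is (up to a tiny residual) an algebraic combination of two others, as would be the case for a component with unresolved fast dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical deviations of four slow components over 12 samples.
X_slow = rng.normal(size=(4, 12))
# Fifth component follows the others almost algebraically (fast mode not resolved).
x5 = 0.5 * X_slow[0] + 0.2 * X_slow[2] + 1e-5 * rng.normal(size=12)
M = np.vstack([X_slow, x5])

U, s, Vt = np.linalg.svd(M)
sigma_min = s[-1]                 # smallest singular value
u_min = U[:, -1]                  # corresponding left singular vector
dominant = int(np.argmax(np.abs(u_min)))

print("singular values:", np.round(s, 5))
print("dominant component (0-based):", dominant)
```

A smallest singular value that is orders of magnitude below the others signals a near-linear dependency among the measured components, and the largest entry of the corresponding singular vector points to the component most involved in it.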

Using the zero-order hold discretization to determine $\hat{A}$ from $\hat{A}_d$, ignoring the fact that some modes have not been captured in the data, the following result is obtained:

$$\hat{A} = \begin{bmatrix} 5.27 & 4.29 & 0.03 & 2.49 & 5.57 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1.19 & 2.27 & 14.32 & 0.02 & 9.48 \\ 19.32 & 15.60 & 9.50 & 9.72 & 92.46 \\ 30.52 & 25.11 & 0.03 & 0.07 & 149.4 \end{bmatrix}.$$

As can be verified easily, this result does not capture the structure of the network in Fig. 3 correctly. For example, the estimate of the Jacobian shows a large direct effect of gene 1 on gene 4, which is incorrect. The singular vector $u_1$, corresponding to $\sigma_1$, shows that the fifth component in $x$, that is, the mRNA concentration corresponding to gene five, is the most dominant with respect to the singularity of the measurement matrix. Using the approach outlined in the Method section, neglecting the measurements of the fifth component, the following Jacobian for the reduced network is obtained:

$$A_{1,2,3,4} = \begin{bmatrix} 6.43 & 3.34 & 0.01 & 2.49 \\ 0.02 & 8.01 & 0.03 & 3.76 \\ 0.78 & 3.89 & 14.28 & 0.02 \\ 0.04 & 0.17 & 9.12 & 9.72 \end{bmatrix}$$

The identified Jacobian is close to the true Jacobian for the reduced network, with relative errors in all nonzero elements being smaller than 20%, and all zero elements being identified as close to zero. It is important to point out that the Jacobian of the reduced network is not supposed to be equal to the Jacobian of the four gene network in the previous example. The dynamics of the reduced Jacobian also correspond reasonably well to the slow dynamics of the five gene network, as can be seen from the computed eigenvalues in Table 2. The structure of the identified reduced Jacobian reflects well the structure of the network in Fig. 3 when gene five is taken out. For instance, gene one directly affects gene three when the dynamics of gene five are neglected, or assumed to be infinitely fast.

The results presented above show that it is indeed possible to obtain a useful identification result even in the case that fast dynamics are not captured correctly. Moreover, one can obtain the information on which components are involved in the fast reactions, and their static relationship with the other components of the network.
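How a fast component turns an indirect interaction into a direct one in the reduced network can be illustrated with a quasi-steady-state reduction of a known Jacobian. This is a standard technique that we use here for illustration; whether it coincides exactly with the procedure in the Method section is an assumption. The $3 \times 3$ Jacobian and the function name `reduce_fast` are hypothetical: component 0 acts on the fast component 2, which in turn acts on component 1.

```python
import numpy as np

def reduce_fast(A, fast):
    """Quasi-steady-state reduction: set the derivative of the fast components to zero.

    Returns the reduced Jacobian of the slow components and the static map K,
    where x_fast is approximated by K @ x_slow.
    """
    n = A.shape[0]
    slow = [i for i in range(n) if i not in fast]
    A_ss = A[np.ix_(slow, slow)]
    A_sf = A[np.ix_(slow, fast)]
    A_fs = A[np.ix_(fast, slow)]
    A_ff = A[np.ix_(fast, fast)]
    K = -np.linalg.solve(A_ff, A_fs)
    return A_ss + A_sf @ K, K

# Hypothetical Jacobian: 0 -> 2 (fast) -> 1, no direct link from 0 to 1.
A = np.array([[-1.0, 0.0, 0.0],
              [0.0, -2.0, 0.5],
              [2.0, 0.0, -100.0]])

A_red, K = reduce_fast(A, fast=[2])
eig_full = np.sort(np.linalg.eigvals(A).real)
eig_red = np.sort(np.linalg.eigvals(A_red).real)

print(A_red)       # element (1, 0) is now nonzero
print(K)           # static relationship of the fast component to the slow ones
print(eig_full, eig_red)
```

In the reduced Jacobian, component 0 appears to act directly on component 1, and the eigenvalues of the reduced matrix match the slow eigenvalues of the full one, mirroring the observations made for the five gene network.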

5

1 3

2 4

Fig 3 Structure of the five-gene network The interconnections

determine the direct interactions between the genes An arrow

indicates a positive effect on the gene transcription, and a bar

indi-cates a negative effect.


In this paper we have discussed the qualitative and quantitative identification of network interactions based on time-series measurements obtained from perturbation experiments and least-squares estimation. The proposed method is equally applicable to identification of gene, protein, and metabolic networks. Due to the fact that the method requires at least n + 1 samples, where n is the number of network components, the method is relatively costly for large scale networks, and thus so far limited to the identification of smaller networks. However, as high throughput techniques are evolving fast, it is probable that high-frequency sampling can be obtained in the near future. Thus, wet-lab based experimental verification of the proposed method remains as future study.

The proposed approach has several advantages over other approaches: the steady-state of the system does not need to be known nor achieved prior to the perturbation; general type perturbations can be used; dynamics relatively fast compared to the sampling time can be detected and removed from the identification; linear dependencies due to moiety conservations can be identified and processed; samples from any number of experiments can be combined in the identification, as long as these experiments have been carried out around the same steady-state.

We have shown that measurement uncertainty can have a large effect on the identification result. Possible solutions for uncertainty and noise are to collect and use more measurement data, and to make use of available a priori structural knowledge. In addition, methods from identification theory on estimating and filtering noise can be incorporated. Furthermore, the signal-to-noise ratio can be increased by choosing larger perturbations. However, the latter can lead to increased nonlinear effects and a trade-off between the two effects, therefore, has to be taken into consideration.

Instead of using the widely accepted Euler discretization, we have shown that the zero-order hold discretization, in general, results in a significantly improved estimation and should be used in all methods aimed at identifying dynamic biochemical networks.
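The difference between the two discretizations can be illustrated with a small sketch. It assumes the standard relations A_d = exp(A T) for the zero-order hold and A_d ≈ I + A T for the Euler approximation, with T the sampling time; the 2 × 2 Jacobian, the value of T, and all function names are hypothetical, and the matrix functions are computed by eigendecomposition under the assumption that the matrices are diagonalizable.

```python
import numpy as np

def _matrix_function(A, f):
    """Apply a scalar function to a diagonalizable matrix via its eigendecomposition."""
    w, V = np.linalg.eig(A)
    return (V @ np.diag(f(w)) @ np.linalg.inv(V)).real

def discretize_zoh(A, T):
    """Discrete-time Jacobian under a zero-order hold: A_d = exp(A T)."""
    return _matrix_function(A * T, np.exp)

def jacobian_from_zoh(A_d, T):
    """Continuous-time Jacobian recovered as log(A_d) / T (principal logarithm)."""
    return _matrix_function(A_d, np.log) / T

def jacobian_from_euler(A_d, T):
    """Continuous-time Jacobian under the Euler approximation A_d = I + A T."""
    return (A_d - np.eye(A_d.shape[0])) / T

# Hypothetical stable Jacobian and sampling time, for illustration only.
A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])
T = 0.2
A_d = discretize_zoh(A, T)

A_zoh = jacobian_from_zoh(A_d, T)      # agrees with A up to rounding
A_euler = jacobian_from_euler(A_d, T)  # biased unless T is very small
```

The Euler estimate approaches the true Jacobian only as T becomes small compared with the fastest time constant, whereas the zero-order hold relation is exact for the sampled linear system.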

We have not discussed explicitly the effect of auto-regulation of biological systems by self-negative feedback. For example, certain components might be regulated by homeostatic effects and a response to perturbations might not be visible in the measurement data. However, under the assumption that these effects are significantly slower than the sampling time, it is reasonable to assume that the proposed method will lead to an acceptable result. Furthermore, we have only considered the case of the estimation around a stable steady-state of the network. In the case of oscillations, created within the network or affecting the network, one would have to deal with time-varying Jacobians, which is outside the scope of this paper.

Experimental procedures

Method

In this section, we present a method for the determination of the network Jacobian from time-series measurements, based on the minimization of a least-squares criterion. Some related issues, such as the choice of the sampling time and how to deal with moiety conservations in metabolic and signaling networks, are also discussed. Least-squares based estimation is used widely within many areas of science and engineering, an important reason being that it is applicable even in the case where no statistical information about the measurements is available [14]; this is typically the case with measurement data from biological systems.

Excitation of a biochemical system is usually performed as a constant parameter perturbation, e.g. gene knockouts or the alteration of gene transcription rates. Especially in vivo, it is not possible to quantify the applied perturbations, meaning that the magnitude of the applied perturbations is unknown. Furthermore, for gene networks, it is usually also unknown which components the perturbations affect in a direct manner. Previously proposed methods often assume this information to be available, at least partially. Sontag et al., for example, assume that the magnitude of the perturbations is unknown but the genes that are directly affected by the perturbed parameters are known [4]. In the following we consider both the magnitude and the direct effects of perturbations to be unknown. To keep the exposition relatively simple, we assume, however, that the perturbations are constant parameter perturbations. (However, in part 2 of the supplementary material we show how this assumption can be relaxed to take time varying parameter perturbations, known or unknown, and pulse perturbations into account.)

Table 2. Comparison between the eigenvalues of the nominal Jacobian A of the five-gene network and the eigenvalues of the estimated reduced Jacobian Â_{1,2,3,4} (i = √-1).

Jacobian                   Eigenvalues
A (nominal)                -571.7    -13.28 ± i 3.16    -6.93    -5.15
Â_{1,2,3,4} (estimated)    None      -13.27 ± i 3.36    -7.12    -4.80

We assume that the network response to the applied perturbations is sufficiently small such that, in the time range of the measurements, the system can be regarded as linear. Equation (3) then describes the behavior of the system (Eqn 1) for variations around the steady-state

corresponding term by a constant unknown perturbation Δu, as follows:


network to this perturbation is measured at the

concentrations of the network components relative to the steady-state concentrations obtained at time step k > 0. Measuring the response of the network until time-step n + 2, where n corresponds to the number of involved components in the network, and arranging these concentration vectors into matrices, we obtain the following matrix version of Eqn (8):

measurement matrix M on the right-hand side in Eqn (9) can be

from:

Invertibility of M can be guaranteed under a controllability condition from linear systems theory (see proof in part 3 of the supplementary material)

experiment

where the system is linear and no measurement uncertainty is present. In the case of noisy measurements and a

unknowns can be obtained. It is then also important to measure and use more time-steps than the minimum required. In the case of more than n + 2 measured time-steps, the matrices M and R are constructed as above, but with more columns, corresponding to the measurements in the additional time-steps. Thus, M will no longer be a square matrix and the pseudoinverse needs to be used instead:
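A minimal sketch of such a pseudoinverse-based estimate from a single experiment is given below. It assumes that the discrete-time model has the form Δx_{k+1} = A_d Δx_k + Δu, so that R = [A_d Δu] M, with M holding the measured deviations stacked on a row of ones and R holding the same deviations shifted by one time-step. This model form, the function name, and the synthetic two-component network are assumptions made for illustration.

```python
import numpy as np

def identify_single_experiment(dx):
    """Estimate the discrete-time Jacobian and the perturbation from one experiment.

    dx : array of shape (n, N); columns are deviations from the steady state
         at consecutive time-steps (N >= n + 2).
    Assumed model (not taken verbatim from the paper):
        dx[:, k + 1] = A_d @ dx[:, k] + du
    """
    n, N = dx.shape
    M = np.vstack([dx[:, :-1], np.ones((1, N - 1))])  # (n + 1) x (N - 1)
    R = dx[:, 1:]                                     # n x (N - 1), shifted by one step
    theta = R @ np.linalg.pinv(M)                     # least-squares solution [A_d, du]
    return theta[:, :n], theta[:, n]

# Synthetic, noise-free data from a hypothetical two-component network.
A_d_true = np.array([[0.8, 0.1],
                     [-0.2, 0.7]])
du_true = np.array([0.05, -0.02])
dx = np.zeros((2, 6))
for k in range(5):
    dx[:, k + 1] = A_d_true @ dx[:, k] + du_true

A_d_est, du_est = identify_single_experiment(dx)
```

With noise-free data and a measurement matrix of full row rank, the estimate reproduces the true values; with noisy data the same expression gives the least-squares fit.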

As the identification of the overall network structure requires a relatively large number of measurement samples, we consider combining data from several experiments. It has to be pointed out that these experiments should be performed around the same steady-state, as only then an

Small variations of the initial state around the steady-state are admissible, as long as the system still can be seen as behaving linearly.

If r experiments are performed, the result matrix R can be constructed as Eqn (11):

m_i, Δx^i

experiment i. Note that the measurements are assumed to

The measurement matrix M is constructed as Eqn (13):


corresponding to the i-th experiment, shifted by one time-step

constructed as Eqn (14):

The 1 and 0 elements in Eqn (13) denote row vectors with unity and zero entries, respectively. These vectors have the same width as the corresponding measurement

in Eqn (9). However, in the case of several experiments, the perturbation in the i-th experiment can be different from the perturbation in the other experiments, and thus for each experiment one perturbation vector needs to be taken into account. (The construction of the matrices M and R is illustrated for a simple example in part 4 of the supplementary material.)
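The block structure described above can be sketched as follows, again assuming a model of the form Δx_{k+1} = A_d Δx_k + Δu^i for experiment i. Each experiment contributes its measured deviations to M, together with a row of ones in its own indicator row and zeros in the indicator rows of the other experiments, so that one perturbation vector per experiment is estimated. The helper names and the three-component example network are hypothetical.

```python
import numpy as np

def build_M_R(experiments):
    """Stack r experiments into the measurement matrix M and the result matrix R.

    experiments : list of arrays of shape (n, N_i) with deviations from the
                  common steady state at consecutive time-steps.
    """
    r = len(experiments)
    M_blocks, R_blocks = [], []
    for i, dx in enumerate(experiments):
        m_i = dx.shape[1] - 1                # columns contributed by experiment i
        indicator = np.zeros((r, m_i))
        indicator[i, :] = 1.0                # 1-vector for experiment i, 0-vectors elsewhere
        M_blocks.append(np.vstack([dx[:, :-1], indicator]))
        R_blocks.append(dx[:, 1:])           # same data shifted by one time-step
    return np.hstack(M_blocks), np.hstack(R_blocks)

def identify(experiments):
    """Return the estimated A_d (n x n) and one perturbation column per experiment."""
    n = experiments[0].shape[0]
    M, R = build_M_R(experiments)
    theta = R @ np.linalg.pinv(M)
    return theta[:, :n], theta[:, n:]

def simulate(A_d, du, steps):
    """Response to a constant perturbation du, starting at the steady state."""
    dx = np.zeros((A_d.shape[0], steps))
    for k in range(steps - 1):
        dx[:, k + 1] = A_d @ dx[:, k] + du
    return dx

# Hypothetical three-component network and two differently perturbed experiments.
A_d_true = np.array([[0.7, 0.1, 0.0],
                     [0.0, 0.6, 0.2],
                     [-0.1, 0.0, 0.8]])
du1 = np.array([0.1, 0.0, 0.0])
du2 = np.array([0.0, 0.1, 0.0])
experiments = [simulate(A_d_true, du1, 4), simulate(A_d_true, du2, 4)]

A_d_est, dU_est = identify(experiments)
```

Here n = 3 and r = 2, so the two experiments with four time-steps each supply six columns, which is more than the n + r = 5 rows of M.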


Estimations for the discrete time Jacobian A_d and the

from:

In the case of combined experiments, the total number of columns of M should at least equal n + r. For the construction of R and M at least n + 2r measured time-steps are required. Note that Eqn (15) involves the pseudoinverse

An important side effect of incorporating estimation of the applied perturbations using measurement data is that nonzero, or unsteady-state, initial conditions can also be handled. This follows from the fact that initial unknown deviations from the steady-state can in fact be represented as an unknown perturbation. Thus, the proposed method

unknown. To see this, Eqn (8) is reformulated using the

a lumped perturbation, consisting of the unknown

steady-state of the system does not need to be known. In order to use this approach, it is sufficient to replace the

$\Delta x^i_k = x^i_k$

where the only difference lies in the fact that now the

Note, however, that the method is still based on the assumption that the network is behaving linearly around the same steady-state for all experiments. Hence, the initial states in all experiments should in general not be too far from the steady-state at which the Jacobian is to be determined.
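The lumping argument can be written out in a short worked derivation. It assumes that the discrete-time model has the form $\Delta x_{k+1} = A_d \Delta x_k + \Delta u$ with $\Delta x_k = x_k - x_0$, where $x_0$ denotes the unknown steady-state; the symbol $\Delta\tilde{u}$ for the lumped perturbation is introduced here for illustration only.

```latex
\begin{aligned}
x_{k+1} - x_0 &= A_d\,(x_k - x_0) + \Delta u \\
x_{k+1} &= A_d\,x_k + \underbrace{(I - A_d)\,x_0 + \Delta u}_{\Delta\tilde{u}}
\end{aligned}
```

Under this assumption the absolute concentrations $x_k$ obey the same affine recursion as the deviations $\Delta x_k$, with the constant term $\Delta\tilde{u}$ absorbing both the steady-state and the applied perturbation, so that the estimate of $A_d$ is unaffected when absolute concentrations are used in M and R.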

The advantage of the approach proposed above is that very general types of perturbations can be applied to the system, and that information about the perturbations is not required. (As mentioned earlier in the text, in part 2 of the supplementary material we relax the assumption of constant parameter perturbations.)

Choice of sampling time and dealing with moiety conservations

Biochemical networks generally contain dynamic modes with a wide range of time constants. In order to identify the full Jacobian from time-series measurements, the sampling time needs to be chosen such that the fastest dynamics are captured. Due to experimental limitations, it may, however, not be possible to realize the required sampling time. Furthermore, as the dynamics of the system in general are unknown in advance, it is hard to determine the required sampling time in advance. Herein we will consider how interactions with dynamics significantly faster than the sampling time can be identified a priori from the collected data, and how these interactions then can be extracted from the data prior to identification of the network Jacobian. We also show that the same approach can be used to deal with moiety conservations within the considered network.

Assume the fastest mode of the linearized system (Eqn 2)

will essentially disappear between samples. This implies that there exists an almost linear dependency between the measurements of the sampled states, and hence that the measurement matrix M will be (almost) rank deficient. In general – and we assume the perturbations fulfil the

be equal to the number of modes with time-constant significantly smaller than the sampling time. The linear dependency, corresponding to the interactions with dynamics significantly faster than the sampling time, can be determined directly from the collected measurements using a singular value decomposition (SVD) of the measurement

singular directions

A possible solution to the problem with too slow sampling is to identify the components taking part in the fast dynamics, that is, components corresponding to nonzero

them for each fast mode. Any component can in principle be chosen, but a reasonable choice is to neglect the one being most dominant with respect to the singularity, that is,

Repeating this procedure for every singular value of M close to zero will lead to a measurement matrix with full rank, allowing determination of the Jacobian of the network, reduced by one component for each fast mode. The presence of moiety conservations in metabolic or signaling networks has the same effect on the estimation of

than the sampling time. In other words, some of the concentrations of the intermediates in the networks will be linearly dependent, resulting in a measurement matrix without full row rank. Thus, the same approach as presented above for dealing with linear dependencies due to a too large sampling time can be used to determine the components involved in
