
REVERSE ENGINEERING AND AUTOMATIC SYNTHESIS OF METABOLIC PATHWAYS FROM OBSERVED DATA USING GENETIC PROGRAMMING: SYMPOSIUM ON COMPUTATIONAL DISCOVERY OF COMMUNICABLE KNOWLEDGE


DOCUMENT INFORMATION

Basic information

Title: Reverse Engineering And Automatic Synthesis Of Metabolic Pathways From Observed Data Using Genetic Programming
Authors: John R. Koza, William Mydlowec, Guido Lanza, Jessen Yu, Martin A. Keane
Institution: Stanford University
Field: Biomedical Informatics
Type: symposium
Year of publication: 2001
City: Stanford
Pages: 69
File size: 2.49 MB


REVERSE ENGINEERING AND AUTOMATIC SYNTHESIS OF METABOLIC PATHWAYS FROM OBSERVED DATA USING GENETIC PROGRAMMING

SYMPOSIUM ON COMPUTATIONAL DISCOVERY OF COMMUNICABLE KNOWLEDGE

SUNDAY, MARCH 25, 2001

CSLI, STANFORD

John R. Koza
Stanford Biomedical Informatics, Department of Medicine
Department of Electrical Engineering
Stanford University, Stanford, California
koza@stanford.edu

Martin A. Keane
Econometrics Inc., Chicago, Illinois
makeane@ix.netcom.com

FROM CHAPTER 1 OF GENETIC PROGRAMMING III: DARWINIAN INVENTION AND PROBLEM SOLVING (KOZA, BENNETT, ANDRE, KEANE 1999)

"Most techniques of artificial intelligence, machine learning, neural networks, adaptive systems, reinforcement learning, or automated logic employ specialized structures in lieu of ordinary computer programs.

"These surrogate structures include if-then production rules, Horn clauses, decision trees, Bayesian networks, propositional logic, formal grammars, binary decision diagrams, frames, conceptual clusters, concept sets, numerical weight vectors (for neural nets), vectors of numerical coefficients for polynomials or other fixed expressions (for adaptive systems), genetic classifier system rules, fixed tables of values (as in reinforcement learning), or linear chromosome strings (as in the conventional genetic algorithm).

"Tellingly, except in unusual situations, the world's several million computer programmers do not use any of these surrogate structures for writing computer programs.

"Instead, for five decades, human programmers have persisted in writing computer programs that intermix a multiplicity of types of computations (e.g., arithmetic and logical) operating on a multiplicity of types of variables (e.g., integer, floating-point, and Boolean). Programmers have persisted in using internal memory to store the results of intermediate calculations in order to avoid repeating the calculation on each occasion when the result is needed. They have persisted in using iterations and recursions. They have similarly persisted for five decades in organizing useful sequences of operations into reusable groups (subroutines) so that they avoid reinventing the wheel on each occasion when they need a particular sequence of operations. Moreover, they have persisted in passing parameters to subroutines so that they can reuse their subroutines with different instantiations of values. And, they have persisted in organizing their subroutines into hierarchies.

"All of the above tools of ordinary computer programming have been in use since the beginning of the era of electronic computers in the 1940s. Significantly, none has fallen into disuse by human programmers. Yet, in spite of the manifest utility of these everyday tools of computer programming, these tools are largely absent from existing techniques of automated machine learning, neural networks, artificial intelligence, adaptive systems, reinforcement learning, and automated logic.

"On one of the relatively rare occasions when one or two of these everyday tools of computer programming is available within the context of one of these automated techniques, they are usually available only in a hobbled and barely recognizable form.

"In contrast, genetic programming draws on the full arsenal of tools that human programmers have found useful for five decades. It conducts its search for a solution to a problem overtly in the space of computer programs.

"Our view is that computer programs are the best representation of computer programs. We believe that the search for a solution to the challenge of getting computers to solve problems without explicitly programming them should be conducted in the space of computer programs."

THE TOPOLOGY OF A NETWORK OF CHEMICAL REACTIONS

• the total number of reactions in the network,
• the number of substrate(s) consumed by each reaction,
• the number of product(s) produced by each reaction,
• the pathways supplying the substrate(s) (either from external sources or other reactions in the network) to each reaction,
• the pathways dispersing each reaction's product(s) (either to other reactions or external outputs), and
• an indication of which enzyme (if any) acts as a catalyst for a particular reaction

THE SIZING FOR A NETWORK OF CHEMICAL REACTIONS

• all the numerical values associated with the network (e.g., the rates of each reaction)
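The split between topology and sizing can be illustrated with a small data structure. This is only a sketch: the `Reaction` class, its field names, and the substance and enzyme labels `S`, `P`, and `E` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Reaction:
    """One reaction in the network (hypothetical layout, for illustration)."""
    substrates: Tuple[str, ...]   # substances consumed
    products: Tuple[str, ...]     # substances produced
    enzyme: Optional[str]         # catalyst, if any
    rate: float                   # numerical value: part of the sizing


def topology(reactions: List[Reaction]):
    """Everything except the numerical values."""
    return [(r.substrates, r.products, r.enzyme) for r in reactions]


def sizing(reactions: List[Reaction]):
    """Only the numerical values (here, one rate per reaction)."""
    return [r.rate for r in reactions]


# Two networks with the same topology but different sizing.
slow = [Reaction(("S",), ("P",), "E", 1.0)]
fast = [Reaction(("S",), ("P",), "E", 2.5)]
```

Under this reading, a search procedure has to choose both parts: `topology(slow) == topology(fast)` holds, while the two networks differ in `sizing`.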

OUR APPROACH

• establishing a representation for chemical networks involving symbolic expressions (S-expressions) and program trees that can be progressively bred (and improved) by means of genetic programming,
• converting each individual program tree in the population into an analog electrical circuit representing the network of chemical reactions,
• obtaining the behavior of the individual network of chemical reactions by simulating the corresponding electrical circuit,
• defining a fitness measure that measures how well the behavior of an individual network matches the observed time-domain data concerning concentrations of final product substance(s), and
• using the fitness measure to enable genetic programming to breed an improved population of program trees
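A fitness measure of the kind described in the list above might be sketched as follows. The slides only say that fitness measures how well simulated behavior matches the observed time-domain data; the specific choice of a sum of absolute differences, and the function name `fitness`, are assumptions made for illustration.

```python
def fitness(simulated, observed):
    """Sum of absolute differences between two concentration time series.

    Assumption for illustration: both series are sampled at the same
    time points, and a smaller value means a better match (0 is perfect).
    """
    if len(simulated) != len(observed):
        raise ValueError("time series must have the same length")
    return sum(abs(s - o) for s, o in zip(simulated, observed))


# Observed concentrations of a final product and two candidate networks' outputs.
observed = [1.0, 2.5, 3.0]
exact_match = [1.0, 2.5, 3.0]
poor_match = [1.0, 2.0, 4.0]
```

With such a measure, the candidate whose simulated output is `exact_match` would be preferred over the one producing `poor_match`.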

FIVE DIFFERENT REPRESENTATIONS

Reaction Network: The blocks represent chemical reactions and the directed lines represent flows of substances between reactions.

Program Tree: A network of chemical reactions can also be represented as a program tree whose internal points are functions and external points are terminals. This representation enables genetic programming to breed a population of programs in a search for a network of chemical reactions whose time-domain behavior concerning concentrations of final product substance(s) closely matches observed data.

Symbolic Expression: A network of chemical reactions can also be represented as a symbolic expression (S-expression) in the style of the LISP programming language. This representation is used internally by the run of genetic programming.

System of Non-Linear Differential Equations: A network of chemical reactions can also be represented as a system of non-linear differential equations.

Analog Electrical Circuit: A network of chemical reactions can also be represented as an analog electrical circuit. Representation of a network of chemical reactions as a circuit facilitates simulation of the network's time-domain behavior.

ILLUSTRATIVE PROBLEM NO. 1: PHOSPHOLIPID CYCLE

• Acylglycerol lipase (EC 3.1.1.23), and
• Triacylglycerol lipase (EC 3.1.1.3)
• 2 intermediate substances
    • sn-Glycerol-3-Phosphate (C00093)
    • Monoacyl-glycerol (C01885)

ILLUSTRATIVE PROBLEM NO. 1: PHOSPHOLIPID CYCLE

INTERESTING TOPOLOGY

• 2 instances of a bifurcation point (where one substance is distributed to two different reactions)
    • External supply of fatty acid (C00162) is distributed to two reactions
• 1 accumulation point
    • glycerol (C00116) is externally supplied and
    • glycerol (C00116) is produced by the reaction catalyzed by Glycerol-1-phosphatase (EC 3.1.3.21)
• 1 internal feedback loop (in which a substance is both consumed and produced)
    • Glycerol (C00116) is consumed (in part) by the reaction catalyzed by Glycerol kinase (EC 2.7.1.30)
    • This reaction, in turn, produces an intermediate substance, sn-Glycerol-3-Phosphate (C00093)
    • This intermediate substance is, in turn, consumed by the reaction catalyzed by Glycerol-1-phosphatase (EC 3.1.3.21)
    • That reaction, in turn, produces glycerol (C00116)

FOUR REACTIONS FROM THE PHOSPHOLIPID CYCLE

ILLUSTRATIVE PROBLEM NO. 2: SYNTHESIS AND DEGRADATION OF KETONE BODIES

• 3-oxoacid CoA-transferase (EC 2.8.3.5)
• Hydroxymethylglutaryl-CoA synthase (EC 4.1.3.5)
• Hydroxymethylglutaryl-CoA lyase (EC 4.1.3.4)
• 1 intermediate substance
    • INT-1

ILLUSTRATIVE PROBLEM NO. 2: SYNTHESIS AND DEGRADATION OF KETONE BODIES

3 NOTEWORTHY TOPOLOGICAL FEATURES

• 1 instance of a bifurcation point (where one substance is distributed to two different reactions)
    • Acetoacetyl-CoA
• 2 accumulation points
    • Acetyl-CoA is an externally supplied substance and is produced by the reaction catalyzed by Hydroxymethylglutaryl-CoA lyase (EC 4.1.3.4)
    • Acetoacetate is produced by the reaction catalyzed by 3-oxoacid CoA-transferase (EC 2.8.3.5) and by the reaction catalyzed by Hydroxymethylglutaryl-CoA lyase (EC 4.1.3.4)
• 1 internal feedback loop (in which a substance is both consumed and produced)
    • Acetyl-CoA is consumed by the reaction catalyzed by Hydroxymethylglutaryl-CoA synthase (EC 4.1.3.5)
    • This reaction, in turn, produces an intermediate substance (INT-1)
    • This intermediate substance is, in turn, consumed by the reaction catalyzed by Hydroxymethylglutaryl-CoA lyase (EC 4.1.3.4)
    • That reaction, in turn, produces Acetyl-CoA
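The three kinds of topological feature listed above can be found mechanically once the network is written as a list of reactions. The following is a rough sketch, not the authors' procedure. The reaction list encodes what the bullets state, plus two labeled assumptions: that Acetoacetyl-CoA is externally supplied, and that it is consumed by the reactions catalyzed by EC 2.8.3.5 and EC 4.1.3.5. All function names are hypothetical.

```python
from collections import defaultdict

# (enzyme, substrates, products). Assumption: Acetoacetyl-CoA feeds the
# EC 2.8.3.5 and EC 4.1.3.5 reactions; the slides name it only as the
# bifurcation point without listing its two consumers.
REACTIONS = [
    ("EC 2.8.3.5", ["Acetoacetyl-CoA"], ["Acetoacetate"]),
    ("EC 4.1.3.5", ["Acetyl-CoA", "Acetoacetyl-CoA"], ["INT-1"]),
    ("EC 4.1.3.4", ["INT-1"], ["Acetyl-CoA", "Acetoacetate"]),
]
# Assumption: both of these are supplied from outside the network.
EXTERNAL_INPUTS = ["Acetyl-CoA", "Acetoacetyl-CoA"]


def bifurcation_points(reactions):
    """Substances distributed to two or more reactions."""
    consumers = defaultdict(int)
    for _, substrates, _ in reactions:
        for s in substrates:
            consumers[s] += 1
    return sorted(s for s, n in consumers.items() if n >= 2)


def accumulation_points(reactions, external_inputs):
    """Substances arriving from two or more sources."""
    sources = defaultdict(int)
    for s in external_inputs:
        sources[s] += 1
    for _, _, products in reactions:
        for p in products:
            sources[p] += 1
    return sorted(s for s, n in sources.items() if n >= 2)


def feedback_substances(reactions):
    """Substances that are (indirectly) consumed to produce themselves."""
    graph = defaultdict(set)
    for _, substrates, products in reactions:
        for s in substrates:
            graph[s].update(products)

    def reaches_itself(start):
        stack, seen = list(graph.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return False

    return sorted(s for s in list(graph) if reaches_itself(s))
```

Under these assumptions the functions report Acetoacetyl-CoA as the single bifurcation point, Acetoacetate and Acetyl-CoA as the two accumulation points, and Acetyl-CoA and INT-1 as the members of the one feedback loop, which agrees with the slide.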

THREE REACTIONS INVOLVED IN THE SYNTHESIS AND DEGRADATION OF KETONE BODIES

GENETIC PROGRAMMING

(1) Generate an initial population of compositions (typically random) of the functions and terminals of the problem.

(2) Iteratively perform the following substeps (referred to herein as a generation) on the population of programs until the termination criterion has been satisfied:

    (A) Execute each program in the population and assign it a fitness value using the fitness measure.

    (B) Create a new population of programs by applying the following operations. The operations are applied to program(s) selected from the population with a probability based on fitness (with reselection allowed).

        (i) Reproduction
        (ii) Crossover (sexual recombination)
        (iii) Mutation
        (iv) Architecture-altering operations

(3) Designate the individual program that is identified by result designation (e.g., the best-so-far individual) as the result of the run of genetic programming. This result may be a solution (or an approximate solution) to the problem.
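The steps above can be sketched as a toy run in Python. This is a minimal illustration under stated assumptions, not the system used in the talk: the problem (fitting x*x + 1), the function set, the operation probabilities, the size cap `MAX_NODES`, and every name below are invented for the example, and architecture-altering operations are left out.

```python
import operator
import random

FUNCTIONS = [operator.add, operator.sub, operator.mul]  # all binary
TERMINALS = ["x", 1.0, 2.0]
MAX_NODES = 40  # assumption: a size cap to keep evolved trees small


def random_tree(rng, depth):
    """Step (1): a random composition of functions and terminals."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        func, left, right = tree
        return func(evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]


def vary(rng, parent, donor):
    """Replace a random subtree of parent with a random subtree of donor."""
    target = rng.choice(list(paths(parent)))
    graft = get(donor, rng.choice(list(paths(donor))))
    child = put(parent, target, graft)
    return child if sum(1 for _ in paths(child)) <= MAX_NODES else parent


def run_gp(seed=0, pop_size=60, generations=20):
    rng = random.Random(seed)
    xs = [i / 10 for i in range(-10, 11)]
    target = [x * x + 1 for x in xs]

    def error(tree):  # step (2A): lower is better
        return sum(abs(evaluate(tree, x) - t) for x, t in zip(xs, target))

    population = [random_tree(rng, 3) for _ in range(pop_size)]
    best = min(population, key=error)
    history = [error(best)]
    for _ in range(generations):
        # Step (2B): fitness-based selection with reselection allowed.
        weights = [1.0 / (1.0 + error(t)) for t in population]

        def pick():
            return rng.choices(population, weights)[0]

        new_population = []
        while len(new_population) < pop_size:
            r = rng.random()
            if r < 0.1:
                child = pick()                                   # reproduction
            elif r < 0.9:
                child = vary(rng, pick(), pick())                # crossover
            else:
                child = vary(rng, pick(), random_tree(rng, 2))   # mutation
            new_population.append(child)
        population = new_population
        generation_best = min(population, key=error)
        if error(generation_best) < error(best):
            best = generation_best                               # step (3)
        history.append(error(best))
    return best, history
```

Calling `run_gp()` returns the best-so-far tree together with its error after each generation. Because the best-so-far individual is only replaced by a strictly better one, that error history never increases, although nothing guarantees an exact solution is found.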

ARCHITECTURE-ALTERING OPERATIONS

• The individual programs that are evolved by genetic programming are typically multi-branch programs consisting of one or more result-producing branches and zero, one, or more automatically defined functions (subroutines).

• The architecture of such a multi-branch program involves
    • the total number of automatically defined functions,
    • the number of arguments (if any) possessed by each automatically defined function, and
    • if there is more than one automatically defined function in a program, the nature of the hierarchical references (including recursive references), if any, allowed among the automatically defined functions

• Architecture-altering operations enable genetic programming to automatically determine
    • the number of automatically defined functions,
    • the number of arguments that each possesses, and
    • the nature of the hierarchical references, if any, among such automatically defined functions

AUTOMATIC SYNTHESIS OF ANALOG ELECTRICAL CIRCUITS

LOWPASS FILTER CIRCUIT

TIME DOMAIN BEHAVIOR OF A LOWPASS FILTER TO A 1,000 HZ SINUSOIDAL INPUT SIGNAL

TIME DOMAIN BEHAVIOR OF A LOWPASS FILTER TO A 2,000 HZ SINUSOIDAL INPUT SIGNAL

FREQUENCY DOMAIN BEHAVIOR OF A LOWPASS FILTER

LOWPASS FILTER CREATED BY GENETIC PROGRAMMING THAT INFRINGES ON GEORGE CAMPBELL'S PATENT

SQUARING COMPUTATIONAL CIRCUIT CREATED BY GENETIC PROGRAMMING

RISING RAMP: 1 OF 4 TIME-DOMAIN SIGNALS USED TO CREATE SQUARING COMPUTATIONAL CIRCUIT

OUTPUT FOR RISING RAMP INPUT FOR SQUARING CIRCUIT

AUTOMATIC SYNTHESIS OF CONTROLLERS

EVOLVED CONTROLLER THAT INFRINGES ON JONES' PATENT

AUTOMATIC SYNTHESIS OF ANTENNAS

ANTENNA DESIGN CREATED BY GENETIC PROGRAMMING

ONE-SUBSTRATE, ONE-PRODUCT CHEMICAL REACTION

One chemical (the substrate) is transformed into another chemical (the product) under control of a catalyst.

CHANGING CONCENTRATIONS OF SUBSTANCES IN AN ILLUSTRATIVE ONE-SUBSTRATE, ONE-PRODUCT REACTION

CHEMICAL REACTIONS

• The action of an enzyme (catalyst) in a one-substrate chemical reaction can be viewed as a two-step process in which the enzyme E first binds with the substrate S at a rate k1 to form ES. The formation of the product P from ES then occurs at a rate k2. The reverse reaction (for the binding of E with S), in which ES dissociates into E and S, occurs at a rate of k-1.

$$E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_2} E + P$$

• The concentrations of substrates, products, intermediate substances, and catalysts participating in reactions are modeled by various rate laws, including
    • first-order rate laws,
    • second-order rate laws,
    • power laws, and
    • Michaelis-Menten equations

• Michaelis-Menten rate law for a one-substrate chemical reaction is

$$\frac{d[P]}{dt} = \frac{k_2 [E]_0 [S]_t}{[S]_t + K_m}, \qquad K_m = \frac{k_{-1} + k_2}{k_1}$$

• Pseudo-first-order rate law

$$\frac{d[P]}{dt} = \frac{k_2 [E]_0 [S]_t}{K_m} = k_{new} [E]_0 [S]_t, \qquad k_{new} = \frac{k_2}{K_m}$$
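The two one-substrate rate laws can be compared numerically. The sketch below simply evaluates the Michaelis-Menten and pseudo-first-order expressions; the function names and the sample values of the rate constants are illustrative assumptions. It relies on the standard observation that the pseudo-first-order form approximates the Michaelis-Menten form when the substrate concentration is much smaller than K_m.

```python
def michaelis_constant(k1, k_minus1, k2):
    """K_m = (k_-1 + k_2) / k_1."""
    return (k_minus1 + k2) / k1


def michaelis_menten_rate(s, e0, k2, km):
    """d[P]/dt = k2 * [E]0 * [S]t / ([S]t + Km)."""
    return k2 * e0 * s / (s + km)


def pseudo_first_order_rate(s, e0, k2, km):
    """d[P]/dt = k_new * [E]0 * [S]t with k_new = k2 / Km."""
    k_new = k2 / km
    return k_new * e0 * s


# Illustrative values only (not taken from the slides).
km = michaelis_constant(k1=2.0, k_minus1=1.0, k2=3.0)
half_max = michaelis_menten_rate(s=km, e0=1.0, k2=3.0, km=km)
low_s_exact = michaelis_menten_rate(s=0.001 * km, e0=1.0, k2=3.0, km=km)
low_s_approx = pseudo_first_order_rate(s=0.001 * km, e0=1.0, k2=3.0, km=km)
```

At a substrate concentration equal to K_m the Michaelis-Menten rate is half of its maximum k2 [E]0, and at a concentration a thousand times smaller than K_m the two rate laws differ by about 0.1 percent.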

ELECTRICAL CIRCUIT REPRESENTING THE ILLUSTRATIVE ONE-SUBSTRATE, ONE-PRODUCT ENZYMATIC REACTION

SUM-INTEGRATOR

SUBCIRCUIT FOR ONE-SUBSTRATE MICHAELIS-MENTEN EQUATION MICH_1

Subcircuit definition in SPICE for the one-substrate Michaelis-Menten equation MICH_1

*NETLIST FOR MICHAELIS-MENTEN MICH_1
XXM4 4 3 2 XDIVV
XXM3 6 5 3 XADDV
XXM2 7 8 4 XMULTV
XXM1 9 5 8 XMULTV
.SAVE V(2) V(3) V(4) V(5) V(6) V(7) V(8) V(9)
.END

ONE-SUBSTRATE, TWO-PRODUCT REACTION

CIRCUIT FOR ILLUSTRATIVE ONE-SUBSTRATE, TWO-PRODUCT CHEMICAL REACTION

TWO-SUBSTRATE, ONE-PRODUCT REACTION

$$A + B + E \rightleftharpoons ABE \rightarrow E + P$$

• Michaelis-Menten rate law for a two-substrate chemical reaction is

$$\text{Rate}_t = \frac{[E]_0}{\dfrac{1}{K_0} + \dfrac{1}{K_A [A]_t} + \dfrac{1}{K_B [B]_t} + \dfrac{1}{K_{AB} [A]_t [B]_t}}$$

$$\text{Rate}_t = k_1 [A][B][E]$$

CIRCUIT FOR TWO-SUBSTRATE, ONE-PRODUCT CHEMICAL REACTION

TWO-SUBSTRATE MICHAELIS-MENTEN EQUATION MICH_2

$$\text{Rate}_t = \frac{[E]_0}{\dfrac{1}{K_0} + \dfrac{1}{K_A [A]_t} + \dfrac{1}{K_B [B]_t} + \dfrac{1}{K_{AB} [A]_t [B]_t}}$$
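The two-substrate rate law can be evaluated directly. The sketch below assumes the four-term form with constants K_0, K_A, K_B, and K_AB; the function name, the zero-concentration guard, and the numerical values are illustrative assumptions. It also shows one way to read the simpler product form Rate = k [A][B][E]: when the first three terms of the denominator are negligible, the rate approaches K_AB [A][B][E]_0.

```python
def two_substrate_rate(a, b, e0, k0, ka, kb, kab):
    """Rate = [E]0 / (1/K0 + 1/(KA[A]) + 1/(KB[B]) + 1/(KAB[A][B]))."""
    if a <= 0.0 or b <= 0.0:
        # Assumption: no reaction takes place without both substrates.
        return 0.0
    denominator = 1.0 / k0 + 1.0 / (ka * a) + 1.0 / (kb * b) + 1.0 / (kab * a * b)
    return e0 / denominator


# Illustrative values only: K0, KA and KB are made very large so that the
# K_AB term dominates the denominator.
rate = two_substrate_rate(a=0.5, b=0.2, e0=1.0, k0=1e9, ka=1e9, kb=1e9, kab=1.45)
product_form = 1.45 * 0.5 * 0.2 * 1.0
```

In this regime `rate` and `product_form` agree to many decimal places, which is consistent with the second-order terms used in the differential equations for the phospholipid cycle.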

REPERTOIRE OF FUNCTIONS IN THE PROGRAM TREE

returns the first of the one or two products produced by the chemical reaction function designated by its argument

returns the second of the two products (or, the first product, if the reaction produces only one product)

REPERTOIRE OF TERMINALS IN THE PROGRAM TREE

• Substances
    • externally supplied input substances
    • intermediate substances created by reactions
    • output substances
• Enzymes
• Numerical constants for the rate of the reactions
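How functions and terminals of this kind might combine into a program tree can be sketched with nested tuples. Everything named here is hypothetical: the function names on the slide did not survive, so `REACTION`, `FIRST_PRODUCT`, and `SECOND_PRODUCT` are stand-ins, as are the tuple layout and the generated product labels. The enzyme numbers, substance codes, and rates in the example are the ones that appear in the phospholipid cycle.

```python
from itertools import count


def compile_tree(tree):
    """Flatten a nested program tree into a list of reactions.

    Node forms (hypothetical):
      "C00162"                                   substance terminal
      ("REACTION", enzyme, rate, n_products, *substrates)
      ("FIRST_PRODUCT", reaction_node)
      ("SECOND_PRODUCT", reaction_node)
    """
    reactions = []
    ids = count(1)

    def visit(node):
        if isinstance(node, str):            # terminal: a substance name
            return node
        head = node[0]
        if head == "REACTION":
            _, enzyme, rate, n_products, *substrates = node
            inputs = [visit(s) for s in substrates]
            rid = next(ids)
            products = [f"R{rid}_P{i}" for i in range(1, n_products + 1)]
            reactions.append((enzyme, rate, inputs, products))
            return products
        if head in ("FIRST_PRODUCT", "SECOND_PRODUCT"):
            products = visit(node[1])
            # The second-product selector falls back to the first product
            # when the reaction produces only one product.
            use_first = head == "FIRST_PRODUCT" or len(products) == 1
            return products[0] if use_first else products[1]
        raise ValueError(f"unknown function: {head!r}")

    visit(tree)
    return reactions


# Two chained reactions: the product of the inner reaction is a substrate
# of the outer one.
inner = ("REACTION", "EC 3.1.1.23", 1.95, 1, "C00162", "C00116")
tree = ("FIRST_PRODUCT",
        ("REACTION", "EC 3.1.1.3", 1.45, 1, "C00162", ("FIRST_PRODUCT", inner)))
network = compile_tree(tree)
```

The selector functions are what let one reaction's output become another reaction's input inside a tree, so the flattened `network` lists the inner reaction first and the outer reaction with `R1_P1` among its substrates.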

PROGRAM TREE CORRESPONDING TO METABOLIC PATHWAY FOR PHOSPHOLIPID CYCLE

REPRESENTATION OF PHOSPHOLIPID CYCLE AS A SYMBOLIC EXPRESSION

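$$\frac{d[\text{C00165}]}{dt} = 1.45\,[\text{C00162}][\text{C01885}][\text{EC 3.1.1.3}]$$

$$\frac{d[\text{C01885}]}{dt} = 1.95\,[\text{C00162}][\text{C00116}][\text{EC 3.1.1.23}] - 1.45\,[\text{C00162}][\text{C01885}][\text{EC 3.1.1.3}]$$

• Supply and consumption of the intermediate substance sn-Glycerol-3-Phosphate (C00093) in the internal feedback loop

$$\frac{d[\text{C00093}]}{dt} = 1.69\,[\text{C00116}][\text{C00002}][\text{EC 2.7.1.30}] - 1.19\,[\text{C00093}][\text{EC 3.1.3.21}]$$

$$\frac{d[\text{ATP}]}{dt} = 1.5 - 1.69\,[\text{C00116}][\text{C00002}][\text{EC 2.7.1.30}]$$

$$\frac{d[\text{C00162}]}{dt} = 1.2 - 1.95\,[\text{C00162}][\text{C00116}][\text{EC 3.1.1.23}] - 1.45\,[\text{C00162}][\text{C01885}][\text{EC 3.1.1.3}]$$

$$\frac{d[\text{C00116}]}{dt} = 0.5 + 1.19\,[\text{C00093}][\text{EC 3.1.3.21}] - 1.69\,[\text{C00116}][\text{C00002}][\text{EC 2.7.1.30}] - 1.95\,[\text{C00162}][\text{C00116}][\text{EC 3.1.1.23}]$$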
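The system of differential equations for the phospholipid cycle can be integrated numerically to obtain time-domain concentrations. The sketch below is an illustration, not the circuit simulation used in the talk: it uses forward Euler integration, and the zero initial concentrations, the constant enzyme concentration of 1.0 shared by all four enzymes, the step size, and the time horizon are all assumptions. The rate constants and supply rates are the ones in the equations; C00002 is ATP.

```python
# Rate constants from the equations (one per reaction).
K_ACYL_LIPASE = 1.95      # EC 3.1.1.23
K_TRIACYL_LIPASE = 1.45   # EC 3.1.1.3
K_KINASE = 1.69           # EC 2.7.1.30
K_PHOSPHATASE = 1.19      # EC 3.1.3.21
# External supply rates from the equations.
SUPPLY = {"C00162": 1.2, "C00116": 0.5, "C00002": 1.5}
SUBSTANCES = ("C00165", "C01885", "C00093", "C00002", "C00162", "C00116")


def derivatives(c, enzyme=1.0):
    """Right-hand sides of the six equations (enzyme level is an assumption)."""
    r_acyl = K_ACYL_LIPASE * c["C00162"] * c["C00116"] * enzyme
    r_triacyl = K_TRIACYL_LIPASE * c["C00162"] * c["C01885"] * enzyme
    r_kinase = K_KINASE * c["C00116"] * c["C00002"] * enzyme
    r_phosphatase = K_PHOSPHATASE * c["C00093"] * enzyme
    return {
        "C00165": r_triacyl,
        "C01885": r_acyl - r_triacyl,
        "C00093": r_kinase - r_phosphatase,
        "C00002": SUPPLY["C00002"] - r_kinase,
        "C00162": SUPPLY["C00162"] - r_acyl - r_triacyl,
        "C00116": SUPPLY["C00116"] + r_phosphatase - r_kinase - r_acyl,
    }


def simulate(t_end=10.0, dt=0.001, enzyme=1.0):
    """Forward Euler integration starting from zero concentrations (assumed)."""
    c = {name: 0.0 for name in SUBSTANCES}
    trajectory = [dict(c)]
    for _ in range(round(t_end / dt)):
        d = derivatives(c, enzyme)
        c = {name: c[name] + dt * d[name] for name in SUBSTANCES}
        trajectory.append(c)
    return trajectory


trajectory = simulate()
final = trajectory[-1]
glycerol_backbone = sum(final[n] for n in ("C00116", "C00093", "C01885", "C00165"))
```

One sanity check on the equations is that the four glycerol-containing substances (C00116, C00093, C01885, C00165) together change at exactly the glycerol supply rate of 0.5, so `glycerol_backbone` should be close to 0.5 times the simulated time. The time series for the final product C00165 is the kind of output that a fitness measure would compare against observed data.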

Date posted: 18/10/2022
