META: An Expert System for the Prediction of Metabolic Transformations
GILLES KLOPMAN, MULTICASE Inc., Beachwood, Ohio, and Case Western Reserve University, Cleveland, Ohio, U.S.A.
ALEKSANDR SEDYKH, Department of Chemistry, Case Western Reserve University, Cleveland, Ohio, U.S.A.
1 OVERVIEW OF METABOLISM EXPERT SYSTEMS
As the number of chemicals used in industrial processes and in everyday life rapidly increases, the need to assess their impact on the human organism and the environment a priori becomes critically important. Two kinds of chemical interactions are of particular concern. The first is metabolic transformation in the human body and other relevant living organisms. The second is transformation driven by external factors (e.g., sunlight, hydrolysis, atmospheric oxidation, temperature, irradiation, and electric and magnetic fields). If both types of interactions could be modeled, the impact of any chemical on the biosphere could be predicted and, if necessary, controlled.
META refers to a copyrighted program owned by KCILLC.
Modeling human metabolism clearly takes precedence over the other tasks, since the human body is the ultimate concern of most evaluations. Virtually any industrially produced chemical, or a derivative of it, may end up in the human organism. The needs of risk assessment, and on the other hand the demands of drug design, therefore make a tool for predicting human metabolism indispensable. For obvious ethical reasons, experimental data on the human metabolism of various chemicals are not as abundant as desirable. Hence, animal models of human metabolism are used extensively.
Accordingly, the first application of the META expert system was to model the xenobiotic metabolism of common mammalian species (1).
2 THE META EXPERT SYSTEM
The META approach consists of compiling a separate model (called a "dictionary") for each system of interest. Such a system may be a group of similar organisms or an environmental compartment under certain conditions. These dictionaries are essentially databases of relevant chemical transformations accompanied by supplemental data. The META program ("META") can operate with any of the existing dictionaries and handles the entire work session with the user. Models are currently available for:
- generic mammalian metabolism (1);
- biodegradation under anaerobic (2) and aerobic (3,4) conditions;
- photodegradation in the upper layers of lakes (5).
A special program, META_TREE, was created to automate the validation and adjustment of META dictionaries. Based on the available experimental data, it finds discrepancies in a dictionary and optimizes the priorities of its transformations for best performance.
Thus, the complete META expert system can be represented by the following triad: the META program, the META dictionaries, and the META_TREE utility.
3 META DICTIONARY STRUCTURE
Aside from the structural data, each transformation record also has a priority number (X-field in Fig. 1). It reflects the rate or predominance of the chemical or enzymatic reaction that the transform represents. The transform may also include information on test compounds known to undergo this transformation. This information is stored in the D-field (Fig. 1), and META_TREE uses it to optimize the dictionary priorities.
Figure 1 Example of a transformation rule stored in the dictionary.
The R-field of the transformation record contains references in coded form. These codes point to the publications that describe or support the specified transformation. The full list of referenced publications is given explicitly at the end of the dictionary.
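The record layout can be pictured as a small Python data structure. This is a hypothetical sketch. The field names mirror the X-, D- and R-fields described above, but the actual META dictionary file syntax (shown in Fig. 1) is not reproduced. The `TransformRecord` class, the `resolve_references` helper, the reference codes and the citation text are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TransformRecord:
    """Hypothetical in-memory view of one dictionary transform (not META's file format)."""
    target: str                 # structural fragment searched for in a test molecule
    transformation: str         # fragment that replaces the target
    priority: int               # X-field: speed or predominance of the reaction
    test_compounds: list[str] = field(default_factory=list)   # D-field
    reference_codes: list[str] = field(default_factory=list)  # R-field, coded form


def resolve_references(record: TransformRecord, reference_list: dict[str, str]) -> list[str]:
    """Expand the coded R-field using the explicit list kept at the end of the dictionary.

    Codes without an entry in reference_list are skipped.
    """
    return [reference_list[code] for code in record.reference_codes if code in reference_list]


# Illustrative data only: codes, structures, and citation text are placeholders.
reference_list = {"R1": "Placeholder citation for code R1"}
record = TransformRecord(target="CO", transformation="C=O", priority=3,
                         test_compounds=["CCO"], reference_codes=["R1", "R2"])
print(resolve_references(record, reference_list))
```

Keeping only short codes in each record has one plausible benefit: a publication that supports many transforms needs to be written out only once, in the list at the end of the file.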
4 META METHODOLOGY
A detailed description of the META program algorithm has been given in a series of publications from the Klopman laboratory (1,6,7).
As mentioned above, META operates with a dictionary of transformations. Each transformation rule (a "transform") consists of two structural fragments: a "target" sequence and a "transformation" sequence.
META scans a test molecule for the presence of "target" fragments and replaces them, one at a time, with the corresponding "transformation" fragments. This produces a set of primary transformation products. At the same time, the program evaluates the thermodynamic stability of all generated molecules. It consults a spontaneous-reactions module that handles unstable structural moieties. Whenever a molecule is found to be unstable, it is transformed into a stable product via an appropriate spontaneous-reaction transform. On demand, the first-level products are processed further so that a complete tree of transformation products can be obtained.
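The scan-replace-stabilize cycle can be illustrated with a deliberately simplified Python sketch. It is not META's algorithm. Molecules here are plain SMILES strings and "target" matching is naive substring search, whereas a real implementation needs proper substructure matching. The following are assumptions for illustration:
- the `Transform` class;
- the priority ordering (a larger value is taken to mean a more predominant reaction and only affects result order here);
- the example rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transform:
    name: str
    target: str        # fragment searched for (plain substring in this toy)
    product: str       # fragment written in its place
    priority: int = 1  # assumption: larger value = more predominant reaction


def apply_at_each_site(molecule: str, t: Transform) -> list[str]:
    """Replace the target one occurrence at a time, giving one product per matching site."""
    products = []
    site = molecule.find(t.target)
    while site != -1:
        products.append(molecule[:site] + t.product + molecule[site + len(t.target):])
        site = molecule.find(t.target, site + 1)
    return products


def stabilize(molecule: str, spontaneous: list[Transform], max_steps: int = 50) -> str:
    """Apply spontaneous-reaction transforms until no unstable moiety is left."""
    for _ in range(max_steps):
        for t in spontaneous:
            if t.target in molecule:
                molecule = molecule.replace(t.target, t.product, 1)
                break
        else:  # no spontaneous rule matched: the molecule is treated as stable
            return molecule
    return molecule


def primary_products(molecule: str, dictionary: list[Transform],
                     spontaneous: list[Transform]) -> list[tuple[str, str]]:
    """First-level products as (transform name, stable product) pairs, highest priority first."""
    seen, results = set(), []
    for rule in sorted(dictionary, key=lambda r: r.priority, reverse=True):
        for raw in apply_at_each_site(molecule, rule):
            product = stabilize(raw, spontaneous)
            if product != molecule and product not in seen:
                seen.add(product)
                results.append((rule.name, product))
    return results


def expand_tree(molecule: str, dictionary: list[Transform],
                spontaneous: list[Transform], depth: int) -> dict:
    """Process products further, on demand, down to the requested depth."""
    if depth == 0:
        return {}
    return {product: expand_tree(product, dictionary, spontaneous, depth - 1)
            for _, product in primary_products(molecule, dictionary, spontaneous)}


rules = [Transform("alpha-hydroxylation", "CO", "C(O)O", priority=2)]
spontaneous_rules = [Transform("gem-diol dehydration", "C(O)O", "C=O")]
print(primary_products("CCO", rules, spontaneous_rules))
print(expand_tree("CCO", rules, spontaneous_rules, depth=2))
```

In this toy run, ethanol (CCO) is hydroxylated to an unstable gem-diol, which the spontaneous-reaction rule dehydrates to acetaldehyde (CC=O). Acetaldehyde matches no rule in this one-rule dictionary, so the tree stops after one level.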
In addition, the predicted products are evaluated for excretion and toxicity. Excretion is assessed from an estimate of log P, the n-octanol/water partition coefficient (8). Toxicity is assessed with a separate module of signal transforms whose "target" sequences are known to occur mainly in toxic compounds (9).
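A hedged sketch of how the two evaluations could be combined for each predicted product follows. META estimates log P with the CASE-based method of reference (8) and detects toxicity with its own signal-transform module. This sketch simplifies both:
- log P values are simply supplied as input;
- the excretion cutoff of 1.0 is an arbitrary placeholder;
- the alert fragments are hypothetical substrings, not META's signal transforms.

It relies only on the general principle that more hydrophilic (lower log P) compounds tend to be excreted more readily.

```python
from dataclasses import dataclass


@dataclass
class ProductAssessment:
    smiles: str
    log_p: float
    likely_excreted: bool
    toxicity_alerts: list[str]


def assess_product(smiles: str, log_p: float, alerts: dict[str, str],
                   excretion_cutoff: float = 1.0) -> ProductAssessment:
    """Flag a product as likely excreted if its log P is below the cutoff, and list matching alerts.

    excretion_cutoff is a placeholder, not a value taken from META.
    """
    # Naive substring test standing in for signal-transform "target" matching.
    matched = [name for name, fragment in alerts.items() if fragment in smiles]
    return ProductAssessment(smiles, log_p, log_p < excretion_cutoff, matched)


# Hypothetical alert and hand-supplied log P values, for illustration only.
alerts = {"epoxide": "C1OC1"}
print(assess_product("CC1OC1", log_p=0.0, alerts=alerts))
print(assess_product("CCCCCCCCCC", log_p=5.0, alerts=alerts))
```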
A typical META work session consists of the following steps:
Choosing a model of interest (mammalian metabolism, aerobic degradation, anaerobic degradation, or photodegradation)
Entering a test compound as a SMILES code (10), a MOL-file (11), or graphical input
Browsing through the tree of primary products and advancing to the next level of products when necessary
Retrieving the results (chemical structures, calculated properties, literature references, etc.)
META can also be operated in batch mode, which allows libraries of chemicals to be screened against a particular model. In this mode, META generates and evaluates the first-level products of each test compound and compiles them into a list. The filtered and formatted results are reported in a log file.
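Batch mode amounts to looping over a library and logging the first-level results. The sketch below is hypothetical:
- `generate_products` is a stand-in callable for META's product generation;
- the tab-separated log layout is invented;
- the filtering is simply a maximum number of products per compound.

```python
import io
from typing import Callable, Iterable, TextIO


def run_batch(library: Iterable[str],
              generate_products: Callable[[str], list[str]],
              log: TextIO,
              max_products: int = 10) -> int:
    """Screen each compound, write a filtered tab-separated log, and return the data-line count."""
    lines = 0
    log.write("compound\tproduct\n")
    for compound in library:
        products = generate_products(compound)[:max_products]  # simple filter
        if not products:
            log.write(f"{compound}\t(no products)\n")
            lines += 1
        for product in products:
            log.write(f"{compound}\t{product}\n")
            lines += 1
    return lines


# Toy product table standing in for real first-level predictions.
toy_products = {"CCO": ["CC=O"], "C": []}
buffer = io.StringIO()
count = run_batch(["CCO", "C"], lambda s: toy_products.get(s, []), buffer)
print(buffer.getvalue())
```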
5 META_TREE
As described above, META reads a dictionary of transforms and applies them to test molecules in order to identify possible products. A dictionary typically contains several hundred transformation rules. Most of them lack direct kinetic data, so assigning their priorities properly is laborious manual work. The META_TREE program addresses this problem. It automatically adjusts the priorities of transformations so that predictions follow experimental metabolic and/or degradation pathways as closely as possible. For that purpose, a basic "genetic algorithm" was implemented (6,12).
5.1 Basic Genetic Algorithm Methodology
Genetic algorithms (12) imitate the evolutionary process in nature: (1) generation of diversity, (2) survival of the fittest, and (3) reproduction. A typical task is to find, in a multiparameter system, the configuration of parameters that best satisfies certain requirements. In this setting, each parameter is called a "gene" and each configuration of parameters is called an "individual." A group of individuals forms a "population," a pool of approximate solutions to the problem. The so-called "D-function" acts as the driving factor of the evolution: it takes an individual and evaluates its "fitness."
The initial pool of individuals is generated randomly, and the fitness of every individual is evaluated. The next generation is then produced by "crossover" and "mutation" procedures, which mimic natural reproduction patterns. Two individuals at a time are selected from the population, either randomly or according to their fitness. Their genes are combined into a pair of "children." If crossover does not occur, the children are genetically identical to their respective parents; otherwise, each child carries a partial set of genes from each parent. Mutation is then applied with some probability, randomly modifying some of the children's genes. The process is repeated until the number of newly generated individuals reaches the size of the initial population.
This new generation replaces the previous one and becomes the current population, after which the whole cycle is repeated.
Successive populations differ from each other less and less, until the best individuals in each new population are practically the same. At this point, population stability is achieved and the optimization is deemed complete.
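The cycle described in this section can be condensed into a short, generic Python sketch. It is not META_TREE's implementation. Tournament selection, one-point crossover, per-gene mutation, the patience-based stopping rule and all parameter values are illustrative choices. The toy D-function in the example simply counts genes equal to 1.

```python
import random
from typing import Callable

Individual = list[int]


def evolve(d_function: Callable[[Individual], float], n_genes: int, gene_values: list[int],
           pop_size: int = 30, crossover_rate: float = 0.8, mutation_rate: float = 0.02,
           patience: int = 15, max_generations: int = 500, seed: int = 0) -> Individual:
    """Basic genetic algorithm; requires n_genes >= 2 and pop_size >= 2."""
    rng = random.Random(seed)
    # 1. Generation of diversity: a random initial population.
    population = [[rng.choice(gene_values) for _ in range(n_genes)] for _ in range(pop_size)]
    best_ever = max(population, key=d_function)
    previous_best, stable_generations = None, 0

    def select() -> Individual:
        # 2. Survival of the fittest: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if d_function(a) >= d_function(b) else b

    for _ in range(max_generations):
        children: list[Individual] = []
        while len(children) < pop_size:
            # 3. Reproduction: crossover of two parents, then mutation of the children.
            p1, p2 = select(), select()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]  # no crossover: copies of the parents
            for child in (c1, c2):
                for i in range(n_genes):
                    if rng.random() < mutation_rate:
                        child[i] = rng.choice(gene_values)
                children.append(child)
        population = children[:pop_size]  # the new generation replaces the old one
        best = max(population, key=d_function)
        if d_function(best) > d_function(best_ever):
            best_ever = best
        # Stop once the best individual has stayed the same for `patience` generations.
        stable_generations = stable_generations + 1 if best == previous_best else 0
        previous_best = best
        if stable_generations >= patience:
            break
    return best_ever


print(evolve(sum, n_genes=20, gene_values=[0, 1]))
```

The sketch also returns the best individual seen in any generation. The text describes only full generational replacement, so this is an added safeguard, not part of the described method.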
5.2 META_TREE Methodology
When a dictionary is processed, META_TREE first identifies the transforms to be optimized. Not every transform qualifies:
- some transforms do not have enough experimental data;
- others may be intentionally kept duplicates (for example, to show that the same transformation can proceed by a different mechanism).

Each optimizable transform is treated as one "gene," so each "individual" represents one possible configuration of the transformation priorities in the dictionary.
To calculate the fitness of each individual, the META_TREE D-function uses the compounds located at the end of the dictionary file together with the D-fields of the dictionary transforms (Fig. 1). For each such compound, META_TREE generates two lists:
- the transformations the compound is assigned to;
- the transformations the compound actually undergoes when the dictionary is applied to it.

The second list should include the first. The priorities of any unaccounted extra transformations ("side-transformations") should be lower than those of the assigned transformations.
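One way to turn this requirement into a fitness score is sketched below. It is an assumption, not META_TREE's published D-function. The `priorities` mapping gives every transform involved its current priority, and a larger number is taken to mean a more predominant reaction. Fitness is the fraction of assigned/side-transformation pairs that are correctly ordered. An assigned transformation that the compound does not actually undergo counts as a failure. The transform names T1 to T3 are hypothetical.

```python
def d_function(priorities: dict[str, int],
               compounds: list[tuple[set[str], set[str]]]) -> float:
    """Fraction of satisfied ordering requirements over all test compounds.

    Each compound is (assigned transforms from the D-fields, transforms it actually undergoes).
    Assumption: a larger priority value means a more predominant transformation.
    """
    satisfied = total = 0
    for assigned, undergone in compounds:
        side = undergone - assigned            # unaccounted extra transformations
        for a in assigned:
            if a not in undergone:             # "the latter should include the former"
                total += 1                     # counted as one unsatisfied requirement
                continue
            for s in side:
                total += 1
                if priorities[a] > priorities[s]:  # assigned must outrank side reactions
                    satisfied += 1
    return satisfied / total if total else 1.0


# Hypothetical priorities and test compounds.
priorities = {"T1": 3, "T2": 1, "T3": 2}
compounds = [({"T1"}, {"T1", "T2"}), ({"T3"}, {"T1", "T3"})]
print(d_function(priorities, compounds))
```

In a genetic algorithm, each gene would hold the priority of one optimizable transform. The gene vector would be mapped onto this dictionary, in a fixed transform order, before scoring.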
The flexibility of the META_TREE genetic algorithm rests on a set of adjustable parameters, which make it possible to run trial optimizations under different schemes with a small population size. These trial optimizations are compared, and the most promising direction is then used for an in-depth optimization. META_TREE has been used successfully to optimize the priorities of the photodegradation (5) and mammalian metabolism (1) dictionaries, which are particularly large (over 1300 transforms each).
The META_TREE utility also performs the auxiliary function of general dictionary maintenance. This helps a researcher or programmer develop and extend a transformation database without comprehensive knowledge of the dictionary design (6).
REFERENCES
1. Talafous J, Sayre LM, Mieyal JJ, Klopman G. META. 2. A dictionary model of mammalian xenobiotic metabolism. J Chem Inf Comput Sci 1994; 34:1326–1333.
2. Klopman G, Saiakhov R, Tu M, Pusca F, Rorije E. Computer-assisted evaluation of anaerobic biodegradation products. Pure Appl Chem 1998; 70:1385–1394.
3. Klopman G, Tu M. Structure-biodegradability study and computer-automated prediction of aerobic biodegradation of chemicals. Environ Toxicol Chem 1997; 16:1829–1835.
4. Klopman G, Zhang Z, Balthasar DM, Rosenkranz HS. Computer-automated predictions of aerobic biodegradation of chemicals. Environ Toxicol Chem 1995; 14:395–403.
5. Sedykh A, Saiakhov R, Klopman G. META. V. A model of photodegradation for the prediction of photoproducts of chemicals under natural-like conditions. Chemosphere 2001; 45:971–981.
6. Klopman G, Tu M, Talafous J. META. 3. A genetic algorithm for metabolic transform priorities optimization. J Chem Inf Comput Sci 1997; 37:329–334.
7. Klopman G, Dimayuga M, Talafous J. META. 1. A program for the evaluation of metabolic transformation of chemicals. J Chem Inf Comput Sci 1994; 34:1320–1325.
8. Klopman G, Wang S. A Computer Automated Structure Evaluation (CASE) approach to calculation of partition coefficient. J Comput Chem 1991; 12:1025–1032.
9. Klopman G. The META-CASETOX system for the prediction of the toxic hazard of chemicals deposited in the environment. NATO ASI Ser, Ser 2 1996; 23:27–40.
10. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 1988; 28:31–36.
11. Yao Jh. Computer treatment of chemical structures (II). Internal storage data structures and file formats of chemical structure representation in computers. Jisuanji Yu Yingyong Huaxue 1998; 15:65–69.
12. Wehrens R, Buydens LMC. Evolutionary optimization: a tutorial. Trends Anal Chem 1998; 17:193–203.
GLOSSARY
Biodegradation: Decomposition of a chemical by living systems (usually microorganisms).
D-function: Driving factor of the simulated evolutionary process in genetic algorithms; it evaluates the fitness of an individual.
Genetic algorithms: Optimization algorithms that imitate the evolutionary process in nature.
Hydrolysis: Decomposition of a chemical compound by reaction with water.
Log P: Decimal logarithm of a chemical's partition ratio between n-octanol and water.
MOL-file: Standard text file format (developed by MDL) for storing a molecular structure.
Photodegradation: Decomposition of a chemical compound under light.
SMILES code: Symbol-based representation of a chemical structure.