PREDICTIVE TOXICOLOGY
edited by
Christoph Helma
University of Freiburg, Germany
Published in 2005 by
Taylor & Francis Group
6000 Broken Sound Parkway NW
Boca Raton, FL 33487-2742
© 2005 by Taylor & Francis Group, LLC
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-10: 0-8247-2397-X (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Catalog record is available from the Library of Congress.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Contents
Contributors
1 A Brief Introduction to Predictive Toxicology
Christoph Helma
What Is Predictive Toxicology?
Ingredients of a Predictive Toxicology System
Concluding Remarks
2 Description and Representation of Chemicals
Wolfgang Guba
Introduction
Fragment-Based and Whole Molecule Descriptor Schemes
Fragment Descriptors
Topological Descriptors
3D Molecular Interaction Fields
Other Approaches
3 Computational Biology and Toxicogenomics
Kathleen Marchal, Frank De Smet, Kristof Engelen, and Bart De Moor
Introduction
Microarrays
Analysis of Microarray Experiments
Conclusions and Perspectives
4 Toxicological Information for Use in Predictive Modeling: Quality, Sources, and Databases
Mark T. D. Cronin
Introduction
Requirements for Toxicological Data for Predictive Toxicity
High Quality Data Sources for Predictive Modeling
Databases Providing General Sources of Toxicological Information
Databases Providing Sources of Toxicological Information for Specific Endpoints
Sources of Chemical Structures
Sources of Further Toxicity Data
Conclusions
5 The Use of Expert Systems for Toxicology Risk Prediction
Simon Parsons and Peter McBurney
Introduction
Expert Systems
Expert Systems for Risk Prediction
Systems of Argumentation
Summary
6 Regression- and Projection-Based Approaches in Predictive Toxicology
Lennart Eriksson, Erik Johansson, and Torbjörn Lundstedt
Introduction
Characterization and Selection of Compounds: Statistical Molecular Design
Data Analytical Techniques
Results for the First Example—Modeling and Predicting In Vitro Toxicity of Small Haloalkanes
Results for the Second Example—Lead Finding and QSAR-Directed Virtual Screening of Hexapeptides
Discussion
7 Machine Learning and Data Mining
Stefan Kramer and Christoph Helma
Introduction
Descriptive DM
Predictive DM
Literature and Tools/Implementations
Summary
8 Neural Networks and Kernel Machines for Vector and Structured Data
Paolo Frasconi
Introduction
Supervised Learning
The Multilayered Perceptron
Support Vector Machines
Learning in Structured Domains
Conclusion
9 Applications of Substructure-Based SAR in Toxicology
Herbert S. Rosenkranz and Bhavani P. Thampatty
Introduction
The Role of Human Expertise
Model Validation: Characterization and Interpretation
Congeneric vs. Non-congeneric Data Sets
Complexity of Toxicological Phenomena and Limitations of the SAR Approach
Mechanistic Insight from SAR Models
Application of SAR to a Dietary Supplement
SAR in the Generation of Mechanistic Hypotheses
Mechanisms: Data Mining Approach
An SAR-Based Data Mining Approach to Toxicological Discovery
Conclusion
10 OncoLogic: A Mechanism-Based Expert System for Predicting the Carcinogenic Potential of Chemicals
Yin-Tak Woo and David Y. Lai
Introduction
Mechanism-Based Structure–Activity Relationships Analysis
The OncoLogic Expert System
11 META: An Expert System for the Prediction of Metabolic Transformations
Gilles Klopman and Aleksandr Sedykh
Overview of Metabolism Expert Systems
The META Expert System
META Dictionary Structure
META Methodology
META_TREE
12 MC4PC—An Artificial Intelligence Approach to the Discovery of Quantitative Structure–Toxic Activity Relationships
Gilles Klopman, Julian Ivanov, Roustem Saiakhov, and Suman Chakravarti
Introduction
The MCASE Methodology
Recent Developments: The MC4PC Program
BAIA Plus
Development of Expert System Predictors Based on MCASE Results
Conclusion
13 PASS: Prediction of Biological Activity Spectra for Substances
Vladimir Poroikov and Dmitri Filimonov
Introduction
Brief Description of the Method for Predicting Biological Activity Spectra
Application of Predicted Biological Activity Spectra in Pharmaceutical Research and Development
Future Trends in Biological Activity Spectra Prediction
14 lazar: Lazy Structure–Activity Relationships for Toxicity Prediction
Christoph Helma
Introduction
Problem Definition
The Basic lazar Concept
Detailed Description
Results
Learning from Mistakes
Conclusion
Contributors
Suman Chakravarti Case Western Reserve University, Cleveland, Ohio, U.S.A.
Mark T. D. Cronin School of Pharmacy and Chemistry, John Moores University, Liverpool, U.K.
Bart De Moor ESAT-SCD, K.U. Leuven, Leuven, Belgium
Frank De Smet ESAT-SCD, K.U. Leuven, Leuven, Belgium
Kristof Engelen ESAT-SCD, K.U. Leuven, Leuven, Belgium
Lennart Eriksson Umetrics AB, Umeå, Sweden
Dmitri Filimonov Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russia
Paolo Frasconi Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy
Wolfgang Guba F. Hoffmann-La Roche Ltd, Pharmaceuticals Division, Basel, Switzerland
Christoph Helma Institute for Computer Science, Universität Freiburg, Georges Köhler Allee, Freiburg, Germany
Julian Ivanov MULTICASE Inc., Beachwood, Ohio, U.S.A.
Erik Johansson Umetrics AB, Umeå, Sweden
Gilles Klopman MULTICASE Inc., Beachwood, Ohio, and Department of Chemistry, Case Western Reserve University, Cleveland, Ohio, U.S.A.
Stefan Kramer Institut für Informatik, Technische Universität München, Garching, München, Germany
David Y. Lai Risk Assessment Division, Office of Pollution Prevention and Toxics, U.S. Environmental Protection Agency, Washington, D.C., U.S.A.
Torbjörn Lundstedt Acurepharma AB and BMC, Uppsala, Sweden
Peter McBurney Department of Computer Science, University of Liverpool, Liverpool, U.K.
Kathleen Marchal ESAT-SCD, K.U. BMC, Leuven, Leuven, Belgium
Simon Parsons Department of Computer and Information Science, Brooklyn College, City University of New York, Brooklyn, New York, U.S.A.
Vladimir Poroikov Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Moscow, Russia
Herbert S. Rosenkranz Department of Biomedical Sciences, Florida Atlantic University, Boca Raton, Florida, U.S.A.
Roustem Saiakhov MULTICASE Inc., Beachwood, Ohio, U.S.A.
Aleksandr Sedykh Department of Chemistry, Case Western Reserve University, Cleveland, Ohio, U.S.A.
Bhavani P. Thampatty Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, U.S.A.
Yin-Tak Woo Risk Assessment Division, Office of Pollution Prevention and Toxics, U.S. Environmental Protection Agency, Washington, D.C., U.S.A.
A Brief Introduction to Predictive Toxicology
CHRISTOPH HELMA Institute for Computer Science, Universität Freiburg, Georges Köhler Allee, Freiburg, Germany
1 WHAT IS PREDICTIVE TOXICOLOGY?
Public demand for the protection of human and environmental health has led to the establishment of toxicology as the science of the action of chemicals on biological systems. Toxicological research currently focuses largely on elucidating the cellular and molecular mechanisms of toxicity and on applying this knowledge in safety evaluation and risk assessment. This is essentially a predictive strategy (Fig. 1): toxicologists study the action of chemicals in simplified biological systems (e.g., cell cultures, laboratory animals) and try to use these results to predict the potential impact on human or environmental health.
Predictive toxicology, as we understand it in this book, does something very similar (Fig. 1): we try to develop procedures (algorithms, in computer science terms) that are capable of predicting toxic effects (the output) from chemical and biological information (the input).
Figure 1 also summarizes the key ingredients of a predictive toxicology system. First, we need a description of chemicals and biological systems as input for predictions. This information is processed by the prediction algorithm to generate a toxicity estimate as output. We can therefore distinguish between data (input and output) and algorithms.
Figure 1 Abstraction of the predictive toxicology process.
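The separation of data (input and output) from the algorithm can be made concrete in a few lines of Python. The following is a minimal sketch under our own assumptions. The class names `ChemicalDescription` and `ToxicityEstimate`, the fragment label `alert_X`, and the placeholder rule in `toy_algorithm` are all hypothetical and carry no toxicological meaning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


@dataclass
class ChemicalDescription:
    """Input data: a (hypothetical) description of one chemical."""
    name: str
    substructures: FrozenSet[str] = frozenset()
    properties: Dict[str, float] = field(default_factory=dict)


@dataclass
class ToxicityEstimate:
    """Output data: a predicted toxic effect for one endpoint."""
    endpoint: str
    toxic: bool


# The algorithm is anything that maps input data to output data.
PredictionAlgorithm = Callable[[ChemicalDescription], ToxicityEstimate]


def run_prediction(algorithm: PredictionAlgorithm,
                   chemical: ChemicalDescription) -> ToxicityEstimate:
    """Apply an interchangeable prediction algorithm to one chemical."""
    return algorithm(chemical)


def toy_algorithm(chemical: ChemicalDescription) -> ToxicityEstimate:
    # Placeholder rule for illustration only: flag chemicals that
    # contain the hypothetical fragment "alert_X".
    return ToxicityEstimate(endpoint="hypothetical_endpoint",
                            toxic="alert_X" in chemical.substructures)


if __name__ == "__main__":
    chem = ChemicalDescription(name="example", substructures=frozenset({"alert_X"}))
    print(run_prediction(toy_algorithm, chem))
```

Because the algorithm is only a callable, an expert system or a model learned from data could be swapped in without changing the input and output types.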
2 INGREDIENTS OF A PREDICTIVE TOXICOLOGY SYSTEM
2.1 Chemical, Biological, and Toxicological Data
Most of the research in predictive toxicology has been devoted to the development of algorithms, but for good performance the data aspect is at least equally important. In principle, many different types of information can be used to describe chemical and biological systems. The key problem in predictive toxicology is to identify the parameters that are relevant for a particular toxic effect. The situation is relatively easy if the underlying biochemical mechanisms are well known: in this case, we can determine a rather limited set of parameters that might be relevant for our purpose. In practice, however, biochemical mechanisms are frequently unknown and/or too complex to determine a suitable set of parameters a priori. Methods for parameter selection are therefore an important research topic in predictive toxicology.
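As one simple illustration of parameter selection, the sketch below ranks binary features (for instance, the presence or absence of a substructure) by the difference between how often they occur in toxic and in non-toxic training compounds. This filter criterion is our own choice for brevity, not a method prescribed here; statistical tests and wrapper methods are common alternatives. The function names and fragment labels are hypothetical.

```python
from typing import List, Set, Tuple


def rank_features(compounds: List[Set[str]],
                  labels: List[bool]) -> List[Tuple[str, float]]:
    """Rank binary features by |P(feature | toxic) - P(feature | non-toxic)|."""
    if len(compounds) != len(labels):
        raise ValueError("each compound needs exactly one label")
    toxic = [c for c, y in zip(compounds, labels) if y]
    nontoxic = [c for c, y in zip(compounds, labels) if not y]
    if not toxic or not nontoxic:
        raise ValueError("need at least one toxic and one non-toxic compound")
    scores = {}
    for feature in set().union(*compounds):
        p_toxic = sum(feature in c for c in toxic) / len(toxic)
        p_nontoxic = sum(feature in c for c in nontoxic) / len(nontoxic)
        scores[feature] = abs(p_toxic - p_nontoxic)
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def select_features(compounds: List[Set[str]], labels: List[bool],
                    k: int) -> List[str]:
    """Keep the k highest-ranked features."""
    return [feature for feature, _ in rank_features(compounds, labels)[:k]]


if __name__ == "__main__":
    # Hypothetical training data: each compound is a set of fragment labels.
    compounds = [{"frag_A", "frag_B"}, {"frag_A"}, {"frag_B"}, {"frag_C"}]
    labels = [True, True, False, False]
    print(select_features(compounds, labels, k=2))
```

A filter like this ignores interactions between features, which is one reason why mechanistic knowledge remains valuable when it is available.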
Toxicity data are needed for two purposes. First, we need them to validate prediction methods, which can be done by comparing the predictions with real-world measurements. Second, toxicity data can serve as input to data-driven approaches that generate prediction models automatically from empirical data (Fig. 2). In this case, the quality of the prediction model is largely determined by the quality of the input data.
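The comparison of predictions with measurements is commonly summarized with a confusion matrix. The following minimal sketch assumes binary (toxic/non-toxic) predictions and measurements; the names `validate` and `ValidationResult` are our own. For a data-driven model, the compounds used for validation should not also have been used to build the model, otherwise the estimate tends to be optimistic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ValidationResult:
    tp: int  # predicted toxic, measured toxic
    tn: int  # predicted non-toxic, measured non-toxic
    fp: int  # predicted toxic, measured non-toxic
    fn: int  # predicted non-toxic, measured toxic

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    @property
    def sensitivity(self) -> float:
        """Fraction of measured toxic compounds that were predicted toxic."""
        positives = self.tp + self.fn
        return self.tp / positives if positives else float("nan")

    @property
    def specificity(self) -> float:
        """Fraction of measured non-toxic compounds predicted non-toxic."""
        negatives = self.tn + self.fp
        return self.tn / negatives if negatives else float("nan")


def validate(predicted: Sequence[bool], measured: Sequence[bool]) -> ValidationResult:
    """Compare binary predictions with paired real-world measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need non-empty, paired predictions and measurements")
    pairs = list(zip(predicted, measured))
    return ValidationResult(
        tp=sum(1 for p, m in pairs if p and m),
        tn=sum(1 for p, m in pairs if not p and not m),
        fp=sum(1 for p, m in pairs if p and not m),
        fn=sum(1 for p, m in pairs if not p and m),
    )
```

Reporting sensitivity and specificity separately matters when toxic and non-toxic compounds are unevenly represented, because accuracy alone can then look deceptively good.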
Despite these many possibilities, practical applications have focused on a relatively small set of chemical and biological features. The most popular chemical features are closely related to the chemical structure (e.g., the presence/absence of certain substructures) or to properties that can be calculated from the chemical structure (e.g., physico-chemical properties). Because no experimental work is needed to obtain this type of data, the rationale for their choice is obvious, but other substance-related information (e.g., biological activities in screening assays, IR spectra) can be used as well.
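A common way to turn such structure-derived information into model input is a fixed-length vector: one 0/1 entry per substructure in a vocabulary, followed by calculated properties. The sketch below assumes that substructure detection and property calculation have already been done elsewhere, for example by a cheminformatics toolkit. The vocabulary entries, property names, and values are placeholders.

```python
from typing import Dict, List, Sequence, Set


def presence_absence(fragments: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """1 if the compound contains the vocabulary substructure, else 0."""
    return [1 if fragment in fragments else 0 for fragment in vocabulary]


def feature_vector(fragments: Set[str],
                   properties: Dict[str, float],
                   vocabulary: Sequence[str],
                   property_names: Sequence[str]) -> List[float]:
    """Concatenate substructure indicators with calculated properties.

    Raises KeyError if a requested property has not been calculated.
    """
    bits = [float(b) for b in presence_absence(fragments, vocabulary)]
    return bits + [float(properties[name]) for name in property_names]


# Hypothetical vocabulary and property names (assumptions, not a standard set).
VOCABULARY = ["aromatic_ring", "carboxylic_acid", "nitro_group"]
PROPERTY_NAMES = ["logP", "mol_weight"]

if __name__ == "__main__":
    vec = feature_vector({"aromatic_ring", "nitro_group"},
                         {"logP": 2.0, "mol_weight": 150.0},  # placeholder values
                         VOCABULARY, PROPERTY_NAMES)
    print(vec)
```

Fixing the vocabulary order once ensures that every compound is encoded with the same positions, which most learning algorithms require.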
Up to now, information about biological systems has rarely been considered in predictive toxicology. Biological systems have been treated as ensembles of uniform members (e.g., identical individuals), without any biological variance. The explicit consideration of the biological part of the equation will be an interesting research topic in the coming years.[a]
Chemical, biological, and toxicological data and their representation are the topics of the first section of this book. It contains the chapters Description and Representation of Chemicals by Guba (1), Computational Biology and Toxicogenomics by Marchal et al. (2), and Toxicological Information for Use in Predictive Modeling: Quality, Sources, and Databases by Cronin (3).
[a] The chapter by Marchal et al. (2) provides some examples of how to use biological information for predictive purposes.
Figure 2 Abstraction of a data driven approach in predictive toxicology.
2.2 Prediction Algorithms
For the prediction algorithm, we can choose between two strategies. We can try to mimic a human expert by building an expert system, or we can try to deduce a prediction model from empirical data by a data-driven approach, as in Fig. 2. The basics of expert systems and some exemplary applications are the topic of Parsons and McBurney’s chapter, The Use of Expert Systems for Toxicology Risk Prediction (4). Two of the programs discussed in the section Implementations of Predictive Toxicology Systems [META (5) and OncoLogic (6)] are also expert systems.
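Expert systems encode human knowledge as explicit rules. The sketch below shows a tiny ordered rule base in which the first matching rule determines the conclusion and the rule name serves as the explanation. The rules, the fragment name `alert_X`, and the molecular-weight threshold are hypothetical and are not taken from META, OncoLogic, or any other real system. Real expert systems use far richer knowledge bases and inference strategies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A condition inspects the compound's fragments and calculated properties.
Condition = Callable[[Set[str], Dict[str, float]], bool]


@dataclass
class Rule:
    name: str
    condition: Condition
    conclusion: str


def infer(rules: List[Rule], fragments: Set[str],
          properties: Dict[str, float]) -> Tuple[str, str]:
    """Return (conclusion, explanation) from the first rule that fires."""
    for rule in rules:
        if rule.condition(fragments, properties):
            return rule.conclusion, rule.name
    return "no prediction", "no rule fired"


# Hypothetical knowledge base. The more specific rule R0 comes first,
# so that it can override the general alert R1.
KNOWLEDGE_BASE = [
    Rule("R0",  # alert fragment in a very large molecule
         lambda fr, pr: "alert_X" in fr and pr.get("mol_weight", 0.0) > 1000.0,
         "low concern"),
    Rule("R1",  # alert fragment present
         lambda fr, pr: "alert_X" in fr,
         "concern"),
]
```

Returning the name of the rule that fired mirrors a key strength of expert systems: every prediction can be traced back to an explicit piece of expert knowledge.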
If we intend to generate a prediction model from experimentally determined toxicity data, as in Fig. 2, we can choose between many different methods. Statistical methods, for example, have been applied successfully in quantitative structure–activity relationships (QSAR) for decades. Eriksson et al. (7) describe statistical techniques in the chapter entitled Regression- and Projection-Based Approaches in Predictive Toxicology.
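A classical statistical QSAR relates a biological activity to one or more molecular descriptors by regression. The sketch below fits a single-descriptor linear model y = a + b·x by ordinary least squares. The descriptor (for example, a hydrophobicity parameter such as log P), the function names, and the synthetic data are illustrative assumptions. Practical QSAR models often use several descriptors, and projection-based methods such as partial least squares (PLS) are widely used when descriptors are numerous or correlated.

```python
from typing import Sequence, Tuple


def fit_linear_qsar(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x for one descriptor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_activity(a: float, b: float, x_new: float) -> float:
    """Predicted activity of an untested compound from its descriptor value."""
    return a + b * x_new


def r_squared(x: Sequence[float], y: Sequence[float], a: float, b: float) -> float:
    """Fraction of the activity variance explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict_activity(a, b, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("activities have no variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic data lying exactly on y = 0.5 + 0.8 x (not measured values).
    x = [1.0, 2.0, 3.0, 4.0]
    y = [1.3, 2.1, 2.9, 3.7]
    a, b = fit_linear_qsar(x, y)
    print(a, b, r_squared(x, y, a, b))
```

A high r² on the training data alone says little about predictive performance; the fitted model still has to be validated on compounds that were not used in fitting.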
More recently, techniques originating from artificial intelligence research have been used in predictive toxicology. These computer-science-oriented developments are summarized in two chapters: Machine Learning and Data Mining by Kramer and Helma (8) and Neural Networks and Kernel Machines for Vector and Structured Data by Frasconi (9). Three programs of the section Implementations of Predictive Toxicology Systems [MC4PC (10), PASS (11), lazar (12)] use such a data-driven approach.
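As a small illustration of a data-driven approach from machine learning, here is a minimal k-nearest-neighbour classifier that compares compounds by the Tanimoto (Jaccard) similarity of their substructure sets. It is a generic textbook technique chosen for brevity and is not meant as a description of MC4PC, PASS, or lazar. The function names and the fragment labels used with it are hypothetical.

```python
from typing import List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity of two substructure sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def knn_predict(query: Set[str],
                training: List[Tuple[Set[str], bool]],
                k: int = 3) -> Tuple[bool, float]:
    """Predict toxicity by majority vote of the k most similar training compounds.

    Returns (prediction, fraction of toxic neighbours). A tie counts as
    non-toxic because the toxic fraction must exceed one half.
    """
    if not training or k < 1:
        raise ValueError("need training data and k >= 1")
    # sorted() is stable, so equally similar compounds keep their input order.
    neighbours = sorted(training, key=lambda item: tanimoto(query, item[0]),
                        reverse=True)[:k]
    toxic_fraction = sum(label for _, label in neighbours) / len(neighbours)
    return toxic_fraction > 0.5, toxic_fraction
```

No explicit model is built in advance: all the work happens at prediction time, and the neighbours that determined the vote can be shown to the user as supporting evidence.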
I want to stress that similar predictions can be obtained with a variety of methods. The choice of method for a particular purpose will depend largely on the scope of the application, on current research trends, and on the personal preferences of the individual researcher.
2.3 Application Areas
The primary aim of predictive toxicology is, of course, the prediction of toxic activities of untested compounds. This enables chemical and pharmaceutical companies, for example, to evaluate potential side effects of candidate structures even without