Prediction models for drug-induced hepatotoxicity by using weighted molecular fingerprints

Drug-induced liver injury (DILI) is a critical issue in drug development because DILI causes failures in clinical trials and the withdrawal of approved drugs from the market. There have been many attempts to predict the risk of DILI based on in vivo and in silico identification of hepatotoxic compounds.


Eunyoung Kim and Hojung Nam*

From DTMBIO 2016: The Tenth International Workshop on Data and Text Mining in Biomedical Informatics

Indianapolis, IN, USA 24-28 October 2016

Abstract

Background: Drug-induced liver injury (DILI) is a critical issue in drug development because DILI causes failures in clinical trials and the withdrawal of approved drugs from the market. There have been many attempts to predict the risk of DILI based on in vivo and in silico identification of hepatotoxic compounds. In the current study, we propose an in silico model that predicts DILI using weighted molecular fingerprints.

Results: In this study, we used 881-bit molecular fingerprints as features describing the presence or absence of each substructure in a compound. Then, the Bayesian probability of each substructure was calculated for each label (positive or negative for DILI), and a weighted fingerprint was determined from the ratio of DILI-positive to DILI-negative probability values. Using the weighted fingerprint features, the prediction models were trained and evaluated with the Random Forest (RF) and Support Vector Machine (SVM) algorithms. The constructed models yielded accuracies of 73.8% and 72.6% and AUCs of 0.791 and 0.768 in cross-validation. In independent tests, the models achieved accuracies of 60.1% and 61.1% for RF and SVM, respectively. The results validated that weighted features helped increase the overall performance of the prediction models. The constructed models were further applied to natural compounds in herbs to identify DILI potential, and 13,996 unique herbal compounds were predicted as DILI-positive with the SVM model.

Conclusions: The prediction models with weighted features performed better than the non-weighted models. Moreover, we predicted the DILI potential of herbs with the best-performing model, and the prediction results suggest that many herbal compounds could have DILI potential. We can thus infer that taking natural products without detailed references about the relevant pathways may be dangerous. Considering the frequency of use of compounds in natural herbs and their increased application in drug development, DILI labeling would be very important.

Keywords: Drug toxicity prediction, Drug-induced liver injury, Machine learning, Data mining

Background

As the leading cause of development failure in clinical trials and withdrawal of drugs from the market, drug-induced liver injury (DILI) is one of the most important factors in drug development [1]. The severe adverse effects of DILI, which include acute liver failure and jaundice, must be considered in drug development. The toxicity of these drugs is attributable to their conversion in the liver to highly reactive metabolites that cause organ damage [2–4]. However, determining DILI potential is a very challenging task, primarily because animal studies do not efficiently predict DILI potential in humans. For example, in a phase II clinical trial, acute liver toxicity induced by fialuridine led to the deaths of five subjects, in contrast to its safe use in animal studies [5]. In a study of 221 pharmaceutical products, the rate of concordance of hepatotoxicity in humans and animals was low, approximately 55%, whereas the rate of concordance was much higher in other target organs, including the hematological systems [6]. In addition, clinical features or laboratory tests for predicting DILI potential have not been identified [7, 8]. Moreover, the statistical power of clinical trials is insufficient. Severe idiosyncratic hepatotoxicity occurs at very low frequency, and patient samples in clinical trials number only in the thousands. Due to this low statistical power, even well-controlled clinical trials can fail to predict DILI.

* Correspondence: hjnam@gist.ac.kr
School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Buk-gu, Gwangju 61005, Republic of Korea

© The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver

To overcome these problems, many researchers have sought to evaluate the toxicity of compounds in vitro and/or in vivo. However, considering the number of compounds, this approach is time-consuming and costly, and thus there has been much effort to develop prediction models to determine if a compound could cause liver toxicity. Computational modeling approaches have been adopted by pharmaceutical companies to help evaluate the efficacy, toxicity, and metabolism of pharmaceutical ingredients [9]. In the early stages of the development of prediction models, the predictive power of the constructed models was not satisfactory, and models often relied on experimental data for better performance. Some researchers used molecular signatures, such as for alanine transaminase (ALT), aspartate aminotransferase (AST), and alkaline phosphatase (ALP), all of which are commonly assessed in the diagnostic evaluation of hepatocellular damage [10]. In more recent years, machine-learning algorithms for prediction models have also been developed to obtain better predictions [11, 12]. However, experimental data are of limited utility in constructing prediction models. Therefore, several researchers have focused on computational predictions using compound properties

and structural characteristics. Greene et al. developed structure-activity relationships for potentially hepatotoxic compounds [13]. Compounds were categorized into four classes associated with hepatotoxicity: no evidence, weak evidence, animal hepatotoxicity, and human hepatotoxicity. The resultant hepatotoxicity alerts yielded a concordance of 56%, a specificity of 73%, and a sensitivity of 46%. Ekins et al. built a classification model based on the Bayesian modeling method with molecular descriptors and fingerprint descriptors [14]. The evaluation of the classifier demonstrated a concordance of 60% for internal validation and 64% for external validation. Rodgers et al. also developed a quantitative structure-activity relationship (QSAR) model using liver adverse effects of drugs (AEDs) as a dataset. They used information on enzyme markers of hepatotoxicity, but these markers can fluctuate due to other factors throughout the day [15]. Moreover, Huang et al. developed a prediction model based on QSAR using a variety of descriptors, including fingerprints. Their model performed well, with an accuracy of 79.1% in internal validation. They further predicted the potential hepatotoxicity of Traditional Chinese Medicines [16]. Zhang et al. also developed an in silico prediction model for DILI. They developed prediction models with various fingerprints and machine-learning algorithms and obtained a concordance of 66% using the Support Vector Machine algorithm and the FP4 fingerprint, in addition to identifying important substructure patterns related to liver toxicity [17]. Despite these extensive efforts to predict DILI, there are no standard QSAR models for DILI, in contrast to the availability of QSAR models for mutagens. Moreover, less is known about the substructures that are significantly associated with DILI [18–20].

Thus, in this study, we focused on improving DILI prediction models using Bayesian weighted substructures and identifying frequently appearing substructures that might be key for DILI (Fig. 1). First, datasets from the Liver Toxicity Knowledge Base (LTKB) and the DrugBank database were obtained and pre-processed [21]. We then extracted substructure feature information from 312 compounds. The weighted features were obtained by calculating the Bayesian probability for each substructure represented in a compound fingerprint. The prediction models were trained with two algorithms and evaluated with an independent test set of 398 unseen compounds. Finally, the constructed models were used to predict the hepatotoxic potential of herb-related compounds from herb databases. Moreover, several frequent substructures related to DILI-positive compounds were reported as alerts.

Methods

Data preparation

The Liver Toxicity Knowledge Base Benchmark Dataset (LTKB-BD) and the DrugBank database were used as training datasets. LTKB-BD is a benchmark dataset provided by the National Center for Toxicological Research (NCTR), U.S. FDA [21, 22]. This dataset contains a list of drugs with DILI potential in humans in accordance with FDA-approved prescription drug labels. Drugs in the dataset are categorized into one of three groups based on their description and severity: most-DILI-concern, less-DILI-concern, and no-DILI-concern. Drugs with a black box warning of hepatotoxicity or that were withdrawn from the market were classified into the most-DILI-concern category. The drugs in that class were labeled due to their fatal hepatotoxicity, including liver necrosis, jaundice, and acute liver failure. The less-DILI-concern drugs included those with moderate DILI warnings, and drugs without any DILI indication were classified as no-DILI-concern drugs. In this study, we began by labeling 222 DILI-concern drugs and 65 no-DILI-concern drugs from the LTKB-BD as positive and negative, respectively. We then retrieved simplified molecular-input line-entry system (SMILES) information by name matching using the ChemSpider Python API [23, 24]. The SMILES information was further used to obtain molecular fingerprints for use as features in model training and construction. Because the ChemSpider API offers a partial matching service, we selected only compounds with exactly one match for higher confidence. Finally, we obtained 180 positive and 53 negative compounds.

Moreover, we retrieved additional negative data from the DrugBank database to balance the data size. From the DrugBank database, we extracted FDA-approved drugs, with a focus on drugs approved for more than 10 years. The database provides a ‘started-market-date’ and an ‘ended-market-date’, and thus we set the limits to ‘2006’ for the started-market-date and to ‘none’ for the ended-market-date. We again queried the ChemSpider API to obtain the SMILES information for these drugs, and we removed the drugs overlapping with the LTKB dataset by comparing the SMILES information. Finally, we identified 79 negative compounds from the DrugBank database. In total, 180 positive compounds and 132 negative compounds were used as the training dataset, as listed in Table 1.

Molecular fingerprints

Molecular fingerprints are a representation of the structure of a compound. Fingerprints are widely used in chemical informatics because they consist of bitstrings, which facilitate molecule comparisons. Each bit of a fingerprint represents a specific substructure of a molecule, and the annotation of the substructure depends on the type of fingerprint. In the current study, we used PubChem fingerprints (ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.pdf), which have a

Fig. 1 Overview of prediction model construction. Panels: Dataset (LTKB-BD and DrugBank; pre-processing into DILI-positive and DILI-negative compounds, the latter from no-DILI-concern drugs and drugs FDA-approved for more than 10 years; 312 training compounds, 180 positive and 132 negative). Features (PubChem fingerprint, 881 substructures; Bayesian probability P(P|S) = P(P,S)/P(S); log2(P(P|S)/P(N|S)) separating substructures frequent in positive from those frequent in negative compounds; weight: × 10; weighted fingerprint). Training and validation (model construction with Random Forest and SVM; cross-validation; independent test on 398 compounds, 224 positive and 174 negative, with data from previous studies by Greene and Xu). Prediction (herb-related compounds extracted from the KAMPO, TCM-ID, and TCMID herb databases; 17,826 compounds; SVM: 13,996 positive and 3,830 negative).

Table 1 The number of compounds used in training and the independent test

Datasets                          DILI-positive   DILI-negative   Total
Independent test (Greene & Xu)    224             174             398


element, the count of a ring system, the atom pairs, the atom’s nearest neighbors, and the SMARTS patterns. The PubChem fingerprint was chosen for substructure reporting in the present study because it describes the structure of a molecule in detail with a long bit-vector. To retrieve fingerprint information, we used PaDEL-Descriptor, software that calculates molecular descriptors, including 1D, 2D, and 3D descriptors and 12 types of fingerprints, among them the PubChem fingerprint [25]. The software can be downloaded online and supports a graphical interface.
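As a small illustration of how a bitstring fingerprint can be handled in code, the following Python sketch converts an 881-character string of zeros and ones into a list of integers and lists the indices of the substructures that are present. The input format and the function names are assumptions made for this example; they do not reproduce the PaDEL-Descriptor output format, which is not described here.

```python
N_BITS = 881  # length of the PubChem fingerprint used in this study


def parse_fingerprint(bitstring, n_bits=N_BITS):
    """Convert a string of '0'/'1' characters into a list of ints."""
    if len(bitstring) != n_bits:
        raise ValueError(f"expected {n_bits} bits, got {len(bitstring)}")
    if set(bitstring) - {"0", "1"}:
        raise ValueError("fingerprint may contain only '0' and '1'")
    return [int(ch) for ch in bitstring]


def present_substructures(fingerprint):
    """Return the indices of the bits that are set to one."""
    return [i for i, bit in enumerate(fingerprint) if bit == 1]


# Toy fingerprint (hypothetical): only bits 0, 9, and 880 are set.
toy_string = "".join("1" if i in (0, 9, 880) else "0" for i in range(N_BITS))
toy_fp = parse_fingerprint(toy_string)
print(present_substructures(toy_fp))
```

Keeping the fingerprint as a fixed-length list makes the later element-by-element weighting a simple pairwise multiplication.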

Bayesian theory for feature weight calculation

A molecular fingerprint is a binary vector and thus is composed of zeros and ones. Each bit indicates the presence of a substructure in a molecule. In this study, we focused on substructure information in DILI-positive compounds, and therefore we used Bayesian theory to identify frequent substructures in DILI-positive compounds that might cause hepatotoxicity. First, we calculated the probability that a compound was DILI-positive/negative given that a substructure was present/absent (Formula 1), where P and N represent the positive and negative labels, respectively, and S indicates a substructure.

\[
P(P \mid S) = \frac{P(P, S)}{P(S)} = \frac{P(S \mid P)\,P(P)}{P(S \mid P)\,P(P) + P(S \mid N)\,P(N)} \tag{1}
\]
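Formula 1 can be evaluated directly from counts in a labeled fingerprint matrix. The Python sketch below is a minimal illustration under stated assumptions: the function name, the toy fingerprints, and the 1/0 label encoding are hypothetical, and no smoothing is applied, so a substructure that never occurs has no defined posterior.

```python
def posterior_positive(fingerprints, labels, bit):
    """Return P(P|S) for the substructure at index `bit` (Formula 1).

    Returns None when the substructure never occurs, because P(S) is zero.
    Assumes at least one positive (1) and one negative (0) compound.
    """
    n_total = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n_total - n_pos
    # Number of positive / negative compounds that contain the substructure.
    s_pos = sum(fp[bit] for fp, y in zip(fingerprints, labels) if y == 1)
    s_neg = sum(fp[bit] for fp, y in zip(fingerprints, labels) if y == 0)
    p_s_given_p = s_pos / n_pos  # P(S|P)
    p_s_given_n = s_neg / n_neg  # P(S|N)
    prior_p = n_pos / n_total    # P(P)
    prior_n = n_neg / n_total    # P(N)
    evidence = p_s_given_p * prior_p + p_s_given_n * prior_n  # P(S)
    if evidence == 0:
        return None
    return p_s_given_p * prior_p / evidence


# Toy data (hypothetical): four compounds, four substructure bits.
toy_fps = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
toy_labels = [1, 1, 0, 0]  # 1 = DILI-positive, 0 = DILI-negative

for bit in range(4):
    print(bit, posterior_positive(toy_fps, toy_labels, bit))
```

In the toy data, bit 0 occurs in both positive compounds and in one negative compound, so two of the three compounds carrying it are positive. Bit 3 never occurs, which is the undefined case that smoothing is meant to handle.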

However, if we calculate the Bayesian probability as in the equation above, a substructure will have a probability value of zero if it is absent from both positive and negative compounds. A zero probability does not indicate that a substructure is always absent in either case. If we increase the size of the dataset, those bits might appear. Therefore, to avoid zero probabilities, we used Laplace smoothing, a technique that pretends we observed every outcome k extra times (Formula 2).

\[
P_{LAP,k}(x) = \frac{c(x) + k}{N + k\,|X|}, \qquad
P_{LAP,k}(x \mid y) = \frac{c(x, y) + k}{c(y) + k\,|X|} \tag{2}
\]
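To make the effect of Formula 2 concrete, the following Python sketch applies the conditional form to a substructure that is absent from every positive compound. The values k = 1 and |X| = 2 (substructure present or absent) are assumptions chosen for illustration, since the smoothing constant used in the study is not stated; the class size of 180 is the number of positive training compounds reported above.

```python
def laplace_conditional(count_xy, count_y, k=1, n_outcomes=2):
    """Smoothed estimate P_LAP,k(x|y) = (c(x, y) + k) / (c(y) + k * |X|)."""
    return (count_xy + k) / (count_y + k * n_outcomes)


n_positive = 180  # positive training compounds reported in the text

# A substructure never seen among the positives no longer gets probability zero.
p_present = laplace_conditional(0, n_positive)
# The complementary outcome (substructure absent) under the same smoothing.
p_absent = laplace_conditional(n_positive, n_positive)

print(p_present, p_absent, p_present + p_absent)
```

With these assumed settings the unseen substructure receives a probability of 1/182 instead of zero, and the two smoothed outcomes still sum to one.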

We then calculated the log odds ratio for each substructure (Formula 3).

\[
\log_2 \frac{P(P \mid S)}{P(N \mid S)} \tag{3}
\]

If the ratio value of a substructure is high, the substructure appeared more frequently in DILI-positive compounds. We then set a threshold on the log odds ratio values to decide which substructures to weight. The values of the selected substructures, whose log odds ratios were greater than the threshold, were amplified by multiplication, whereas substructures with a log odds ratio below the threshold received a weight value of one. Here, we only gave weight to high log odds ratios because we wanted to predict DILI-positive compounds, which are toxic and therefore more critical to predict than negative compounds. The calculated weight vector was then multiplied element-by-element with the original fingerprint. The overall process of weight calculation is illustrated in Fig. 2.
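The weighting step described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: it assumes that every selected substructure receives the same constant multiplier, and the function names and toy values are hypothetical. The threshold of 1.5 and the multiplier of 15 are the settings reported later for the best RF model.

```python
import math


def log_odds(p_pos, p_neg):
    """Log odds ratio of Formula 3, log2(P(P|S) / P(N|S))."""
    return math.log2(p_pos / p_neg)


def weight_vector(log_odds_values, threshold, multiplier):
    """Weight is `multiplier` above the threshold and one otherwise."""
    return [multiplier if lo > threshold else 1 for lo in log_odds_values]


def apply_weights(fingerprint, weights):
    """Multiply a binary fingerprint element-by-element with the weights."""
    return [bit * w for bit, w in zip(fingerprint, weights)]


# Toy log odds ratios for four substructures (hypothetical values).
toy_log_odds = [2.3, 0.4, -1.0, 1.6]
weights = weight_vector(toy_log_odds, threshold=1.5, multiplier=15)

toy_fingerprint = [1, 1, 0, 0]
weighted = apply_weights(toy_fingerprint, weights)
print(weights, weighted)
```

Because the weight only multiplies bits that are already set, an absent substructure stays at zero regardless of how strongly it is associated with DILI-positive compounds.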

The Random Forest (RF) and the Support Vector Machine (SVM) algorithms were used to construct the classification and prediction models. The RF algorithm is an ensemble learning algorithm that operates by constructing a large number of decision trees and combining them. To make a prediction, it passes a new input through every decision tree, and the trees vote on how it is to be classified. The main advantage of the RF algorithm is that it helps avoid overfitting problems, which occur frequently when dealing with a small dataset. The implementation of the algorithm is found in the MATLAB Statistics and Machine Learning Toolbox (MATLAB and Statistics Toolbox Release 201#, The MathWorks, Inc., Natick, Massachusetts, United States). The TreeBagger function was used for the RF algorithm.

SVMs are among the most popular supervised machine-learning algorithms for pattern recognition and are also used for classification. An SVM constructs a hyperplane that is used for classification from specified training examples, each including a category label. The constructed model can then be used to predict the DILI potential of a new drug. The implementation of the SVM we used is A Library for Support Vector Machines (LIBSVM) [26]. When training a model, we used similarity matrices calculated using the Tanimoto coefficient, a similarity metric that uses the ratio of the intersecting set to the union set, because the constructed space would be very high-dimensional with 881 features. The use of similarity matrices reduces the dimensions to the data size.
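The similarity-matrix idea can be illustrated with a short Python sketch. It uses the generalized Tanimoto coefficient, x·y / (|x|² + |y|² − x·y), which equals the intersection-over-union ratio for binary vectors. Applying this generalized form to weighted fingerprints is an assumption made for the example, as are the function names and the toy vectors; the text does not state which variant was used.

```python
def tanimoto(x, y):
    """Generalized Tanimoto coefficient of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    # Convention assumed here: two all-zero vectors have similarity 0.
    return dot / denom if denom else 0.0


def similarity_matrix(vectors):
    """Pairwise similarities; the result is n x n for n compounds."""
    return [[tanimoto(a, b) for b in vectors] for a in vectors]


# Toy binary fingerprints (hypothetical), four bits each.
toy_vectors = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
matrix = similarity_matrix(toy_vectors)
for row in matrix:
    print(row)
```

The matrix has one row and one column per compound, so its size depends on the number of compounds rather than on the 881 fingerprint bits. A matrix of this kind can typically be supplied to an SVM implementation as a precomputed kernel.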

When training the models, we performed 10-fold cross-validation, which divides the training dataset into ten subsamples. In each round, nine subsamples are used for training and the remaining subsample is used for testing. We constructed each model with different thresholds and multiplication numbers, and we compared the performances to select the best model for prediction.
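The fold construction can be sketched as follows in Python. This is a plain, unstratified split written for illustration; whether the folds in the study were stratified by label is not stated, and the function name and the fixed random seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# 312 is the size of the training dataset reported in the text.
splits = list(k_fold_indices(312, k=10))
for train, test in splits:
    print(len(train), len(test))
```

Each compound appears in exactly one test fold, so every compound is predicted once by a model that did not see it during training.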

Independent test

The data from previous studies were used for further evaluation. We collected the independent test set from two studies: Greene et al. and Xu et al. [13, 27]. Greene’s dataset was categorized into four groups: HH (evidence of human hepatotoxicity); NE (no evidence of hepatotoxicity in any species); WE (weak evidence of human hepatotoxicity); and AH (evidence for animal hepatotoxicity

Fig. 2 The process of feature weight calculation. First, the Bayesian probabilities for each substructure were calculated. Then, substructures selected based on a log odds ratio threshold were weighted, while others remained binary. When calculating the weight vector, the feature values (x) of selected substructures were amplified by a user parameter n. The constructed weight vector was then multiplied with the original feature matrix.

compounds in the HH and NE categories as positive and negative, respectively. After combining the two datasets, we pre-processed the resultant dataset in the same manner as the training set. The SMILES information was retrieved from ChemSpider and was used to eliminate duplicates of the training set and to eliminate label contradictions between the two sets. In total, we obtained 398 compounds, including 224 positive and 174 negative.
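The clean-up of the combined test set can be sketched in Python as below. The sketch assumes that identical compounds have identical SMILES strings, which in practice requires canonical SMILES. The function name is hypothetical, and the toy SMILES strings carry arbitrary labels that say nothing about real toxicity.

```python
def build_test_set(labeled_sets, training_smiles):
    """Merge SMILES -> label dicts, dropping training overlaps and conflicts."""
    merged = {}
    conflicting = set()
    for dataset in labeled_sets:
        for smiles, label in dataset.items():
            if smiles in training_smiles:
                continue  # already used for training, so not unseen
            if smiles in merged and merged[smiles] != label:
                conflicting.add(smiles)  # the two sources disagree
            merged[smiles] = label
    return {s: y for s, y in merged.items() if s not in conflicting}


# Toy inputs (hypothetical labels: 1 = positive, 0 = negative).
set_a = {"CCO": 0, "c1ccccc1": 1, "CC(=O)O": 0}
set_b = {"c1ccccc1": 1, "CC(=O)O": 1, "CCN": 1}
training = {"CCO"}

test_set = build_test_set([set_a, set_b], training)
print(test_set)
```

In the toy run, "CCO" is dropped because it overlaps with the training set, and "CC(=O)O" is dropped because the two sources assign it different labels.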

Prediction of natural products

The constructed classification model was then applied to predict the potential hepatotoxicity of natural products. We collected herbal compound information from the TCMID, TCM-ID, and KAMPO databases [28–30], all of which contain information about the efficacy of herbs and their constituent compounds. The natural product dataset was also standardized by ChemSpider, and a fingerprint was obtained. Fingerprints could not be retrieved for a few compounds, primarily very complex, large molecules with a mass greater than 1000 Da. These compounds were excluded, resulting in a final total of 17,826 compounds.

Results

Frequent substructures in hepatotoxic compounds

One of the main purposes of this research was to identify important substructures in DILI-positive compounds. The frequently appearing substructures can be inferred from the weighted substructures. We first calculated the probability of each substructure occurring in positive and negative labeled compounds, respectively. Then, using the log odds ratio of positive to negative, we selected the substructures to be weighted. We determined the weighted substructures by high log odds ratio values, since we focused on substructures that are frequent in DILI-positive
identified 24 substructures. Substructures obtained with the other threshold values are described in Additional file 1: Tables S1–S3.

Model performance

We compared the model without weighted features to the model with weighted features to assess whether giving weights to the frequently appearing substructures affected performance. As shown in Fig. 3, models with weighted features performed better in both algorithms. Although the RF model previously performed poorly, with the weighted feature, the AUC, AUPR, and accuracy increased significantly to 0.79, 0.82, and 74%, respectively. Likewise, the SVM performance also increased, although the model without weighted features already classified quite well. The AUC, AUPR, and accuracy values were 0.77, 0.83, and 73%, respectively. All models with different thresholds and multiplication numbers were compared. The RF model performed best with a threshold of 1.5 and a multiplication number of 15, and the SVM model performed best with a threshold of 2 and a multiplication number of 15. A performance comparison using different thresholds can be found in Additional file 2: Figures S1–S2.

Furthermore, we compared the performance of the constructed models in an independent test to evaluate the performance with an unseen dataset. Figure 4 shows the increased performance with the weighted features. Although the sensitivities were high in the non-weighted models, the specificities were very poor. Using the weighted feature, the specificity of both models increased to greater than 0.4, and the overall accuracy values increased slightly.
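The relationship between sensitivity, specificity, and accuracy can be made explicit with a short Python sketch. The confusion-matrix counts below are hypothetical: they were chosen only so that the class totals match the 224 positive and 174 negative compounds of the independent test set, and they should not be read as the counts obtained in the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of positives predicted positive
    specificity = tn / (tn + fp)  # fraction of negatives predicted negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 159 + 65 = 224 positives, 80 + 94 = 174 negatives.
sens, spec, acc = classification_metrics(tp=159, fn=65, tn=80, fp=94)
print(round(sens, 3), round(spec, 3), round(100 * acc, 1))
```

With a majority of positive compounds in the test set, a model can reach a high sensitivity and still have an accuracy near 60% when its specificity is below 0.5, which is the pattern described above.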

We implemented a model from Zhang’s study for further performance comparison. They developed prediction models with various fingerprints and machine-learning algorithms. We constructed an SVM model with the dataset

Fig. 3 Performance of the models in cross-validation (panels: RF, cross-validation; SVM, cross-validation). Performance in both RF and SVM increased with weighted features.

provided by Zhang et al. using FP4 fingerprints and applied our proposed feature weight calculation method. Our method increased the accuracy from 75% to 87% (Fig. 5). Although the sensitivity decreased slightly, the specificity increased dramatically from 0.379 to 0.755, indicating that our method performs well in predicting both negative and positive compounds. As a more precise comparison, we randomly selected 59 positive and 29 negative compounds from the LTKB dataset a hundred times, and our method resulted in a higher average accuracy of 86.4%. This result indicates that our method exhibits superior classification and prediction of DILI compounds under the same conditions.

Prediction of hepatotoxic compounds in natural products

The hepatotoxic potential of the herb-related compounds was predicted using the constructed models. Since the parameters and algorithms in each model vary, the results differed slightly, but the models predicted that more than 60% of compounds in natural products have hepatotoxic potential. RF predicted 11,944 compounds as hepatotoxic, whereas SVM predicted 13,996 compounds as DILI-positive. Although the two prediction models yielded different outcomes, the predicted positive compounds greatly overlapped, as shown in Fig. 6.
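The proportions implied by these counts can be checked with a few lines of Python. All four input numbers are taken from this article (17,826 herbal compounds, 11,944 RF-positive, 13,996 SVM-positive, and 11,195 compounds predicted positive by both models); the variable names are only for illustration.

```python
total = 17826      # herbal compounds with a fingerprint
rf_pos = 11944     # predicted DILI-positive by RF
svm_pos = 13996    # predicted DILI-positive by SVM
both_pos = 11195   # predicted DILI-positive by both models


def percent(part, whole):
    """Share of `whole` covered by `part`, in percent."""
    return 100 * part / whole


rf_only = rf_pos - both_pos
svm_only = svm_pos - both_pos
either_pos = both_pos + rf_only + svm_only
neither = total - either_pos

print(round(percent(rf_pos, total), 1))    # share predicted positive by RF
print(round(percent(svm_pos, total), 1))   # share predicted positive by SVM
print(round(percent(both_pos, total), 1))  # share predicted positive by both
print(rf_only, svm_only, neither)
```

The counts are mutually consistent: the compounds that SVM calls negative (RF-only plus neither) add up to 3,830, and those that RF calls negative (SVM-only plus neither) add up to 5,882.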

Discussion

In the current study, we calculated the weighted feature using Bayesian theory and constructed DILI prediction models using the updated feature with two algorithms: RF and SVM. When calculating the weight vector, we focused on giving weight to those features that appeared more frequently in DILI-positive compounds than in DILI-negative compounds because it is more important to identify hepatotoxic compounds that might cause critical adverse reactions when developed into drugs. Therefore, we set a cutoff to select the substructures to be weighted by their log odds ratio values. The threshold ranged from 0.5 to 2.5 and resulted in different performances. With an excessively low threshold, the number of weighted substructures was too large, causing the overall values of the weight vector to increase without differentiating specific substructures and, consequently, poor model performance. By contrast, the use of an excessively high threshold would weight too few substructures, resulting in a decrease in performance. The parameter multiplied with the selected substructure also affected the performance, but the effect was not significant. This result indicates that amplification of

Fig. 4 Performance of the models in the independent test. The gap between sensitivity and specificity decreased and the accuracy increased with weighted features in both models. Panel values (non-weighted vs. weighted): RF, independent test: sensitivity 0.746 vs. 0.710, specificity 0.379 vs. 0.460, accuracy 58.5% vs. 60.1%; SVM, independent test: sensitivity 0.737 vs. 0.763, specificity 0.385 vs. 0.414, accuracy 58.3% vs. 61.1%.

Fig. 5 Performance comparison between the previous study and the proposed method (independent test performance). Our method increased the performance overall compared with that reported by Zhang. In particular, the specificity increased dramatically, although the sensitivity decreased slightly. Panel values (previous study vs. proposed method): sensitivity 0.932 vs. 0.906, specificity 0.379 vs. 0.755, accuracy 75% vs. 87.1%.

values is important but that the degree of amplification does not significantly affect model performance. Both constructed models resulted in good performance in cross-validation considering AUC and accuracy; however, the accuracy in the independent test (60.1% and 61.1%) was lower than in cross-validation (73.8% and 72.6%). The low accuracy was due to low specificity, indicating that the model tends to predict more compounds as positive than as negative. This problem occurred because we focused on predicting DILI-positive compounds by weighting the related substructures and used a sensitivity threshold of 0.8, which could be relatively high. Because it is safer to predict negative compounds as positive (classifying nontoxic compounds as toxic) than to classify toxic compounds as nontoxic, we did not lower the threshold but attempted to reduce the gap between sensitivity and specificity using a weighted feature. This approach helped increase the accuracy. Although the increase in accuracy was not dramatic, the model classified the independent test set more precisely, positive to positive and negative to negative. The results also demonstrated that the weighted substructures affected the prediction of DILI-positive compounds.

In this study, we also determined frequently occurring substructures in DILI-positive compounds. Although the substructures with the highest probability are general, as the threshold lowers, more details in the SMARTS patterns can be observed. We obtained general structures because of the characteristic of PubChem fingerprints, which divide a structure into lower levels.

The prediction of the DILI potential of natural products indicated that many compounds are related to drug-induced hepatotoxicity (Fig. 6). If compounds found in the intersection of the predicted results from the two algorithms are considered highly hepatotoxic, 63% of natural products from the herb databases have the potential to cause liver toxicity. We reported five of these 11,195 compounds as examples in Fig. 7, including the names, structures, and related herbs that contain each compound.

Fig. 6 The proportion of predicted compounds in herbs. a RF predicted 67% of compounds as DILI-positive (11,944 positive, 5,882 negative). b SVM predicted 79% of compounds as DILI-positive (13,996 positive, 3,830 negative). c The number of overlapping compounds predicted by the two algorithms.

Conclusions

We introduced a DILI prediction model with weighted features. The weighted features were calculated using Bayesian probability, which gives information on the frequency of each substructure in DILI-positive and DILI-negative compounds. As a result, the weighted features increased the model performance in both cross-validation and the independent test with an unseen dataset. Moreover, we applied the constructed model to the prediction of DILI potential in herbs. The large number of predicted positive compounds indicates that even compounds found in nature can be toxic and harmful to the human body. This finding is important because some people in Eastern countries rely on herbal medicine and believe it is safer than taking general drugs. However, natural products are not always beneficial to health. In addition, natural products have come to the forefront in drug discovery and development. Therefore, herbs that are used as home remedies or that are under development must be carefully administered, considering their toxic effects on the human body. In addition, we listed frequent substructures in DILI-positive compounds to facilitate drug screening in less time and at lower cost.

As an additional approach, we can improve the prediction models using structural information other than two-dimensional structural information. The frequent substructures we reported here based on the fingerprint annotation can be further developed to aid the identification of toxicophores using neural networks.

Additional files

Additional file 1: Table S1. Description of frequently appearing substructures in DILI-positive compounds (log odds ratio: 2.5). Table S2. Description of frequently appearing substructures in DILI-positive compounds (log odds ratio: 2). Table S3. Description of frequently appearing substructures in DILI-positive compounds (log odds ratio: 2). (PDF 55 kb)

Additional file 2: Figure S1. Performance change by different cutoff. Figure S2. Performance change by weight values. (PDF 326 kb)

Acknowledgments
None.

Funding
This work was supported by the Bio-Synergy Research Project (NRF-2014M3A9C4066449) of the Ministry of Science, ICT and Future Planning through the National Research Foundation, by the National Research Foundation of Korea grant funded by the Korea government (MSIP) (NRF-2015R1C1A1A01051578), and by the GIST Research Institute (GRI) in 2017. Publication charge for this work was funded by the Bio-Synergy Research Project (NRF-2014M3A9C4066449).

Availability of data and materials
The Liver Toxicity Knowledge Base Benchmark Dataset (LTKB-BD) is developed by NCTR scientists and is available from the U.S. Food and Drug Administration (http://www.fda.gov/ScienceResearch/BioinformaticsTools/LiverToxicityKnowledgeBase/). The additional negative dataset from DrugBank is also available online (https://www.drugbank.ca/).

Authors’ contributions
EK and HN conceived of the study. EK wrote the manuscript. HN helped draft the manuscript and participated in the editing of the manuscript. All authors have read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Fig. 7 Examples of predicted DILI-positive compounds and related herbs. Each compound is represented with its name, formula, structure, and related herbs.
a 2-(3,4-Dihydroxyphenyl)-5,7-dihydroxy-4-oxo-4H-chromen-3-yl L-ribopyranoside (C20H18O11). Herbs: Agrimonia pilosa, Phytolacca americana
b 7,7'-Dimethoxy-2H,2'H-6,8'-bichromene-2,2'-dione (C20H14O6). Herbs: Sophora subprostrata, Sophora flavescens
c Cimicifugoside (C35H52O9). Herb: Actaea simplex
d Avenanthramide A (C16H13NO5). Herb: Prunus armeniaca
e 2',6'-Dihydroxy-3',4'-dimethoxychalcone (C17H16O5). Herbs: Onychium auratum, Lindera umbellate, Didymocarpus pedicellata

About this supplement

This article has been published as part of BMC Bioinformatics Volume 18 Supplement 7, 2017: Proceedings of the Tenth International Workshop on Data and Text Mining in Biomedical Informatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-18-supplement-7.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Published: 31 May 2017

