SECTION FIVE STRUCTURE ACTIVITY RELATIONSHIPS
5 STRUCTURE ACTIVITY RELATIONSHIPS
5.1 Introduction
In the first part of this section, the selection and determination of the physicochemical parameters used to characterize the chalcones for quantitative structure-activity relationship studies are presented. This is followed by a discussion of the multivariate analysis of these properties and their relationship to antimalarial activity. The 2’-hydroxychalcones are not included in this part of the analysis. In the second part of this section, the structure-activity relationships for antimalarial and antileishmanial activities of the chalcones are compared in qualitative terms and by comparative molecular field analysis (CoMFA), with the aim of determining whether structure-activity correlations for these two activities are mutually exclusive or share some elements of similarity. All the chalcones (including the 2’-hydroxychalcones) are included in this analysis.
5.2 Selection and Determination of Physicochemical Parameters
5.2.1 Selection of parameters
Parameters for quantitative structure-activity relationships can be classified in several ways. They can be categorized as global or substituent parameters according to whether they describe the properties of the whole molecule (global) or those of a substructure (substituent). The parameters may describe various aspects of the molecule or substituent, namely its steric (size and shape) properties, lipophilicity (affinity for a lipophilic environment) and electronic (charge distribution, electron withdrawing/donating tendencies) characteristics. These properties are important in determining how a compound interacts with its chemical or biological environment. In this investigation, eleven descriptors representing size, lipophilicity and electronic properties are used to characterize the chalcones (Table 2, Appendix).
Lipophilicity is a property that has attracted more attention than other descriptors used in structure-activity correlation studies. This is because lipophilicity is an important determinant of a compound’s ability to traverse biological membranes and influences the rate and extent to which it interacts with its site of action. Lipophilicity is most commonly measured as the partition coefficient (P), which is experimentally determined by a variety of methods such as the traditional shake flask method using octanol and buffer, chromatographic methods and potentiometry.119 It can also be determined in silico. Most molecular modeling programs are capable of assigning “log P” values to energy-minimized structures in a fraction of a second.
In this investigation, the lipophilicity was evaluated by experimental and in silico methods. The experimental method involves determining the retention time of a chalcone as it elutes from a hydrophobic stationary phase using a mobile phase comprising methanol-water in different proportions. The retention time reflects the partitioning of the compound between the non-polar stationary phase and the polar mobile phase. A lipophilic compound is preferentially adsorbed onto the non-polar column and exhibits a longer retention time than a less lipophilic compound for the same mobile phase. Retention time is also influenced by the proportion of methanol (less polar than water) in the mobile phase: as the proportion of methanol in the mobile phase increases, the retention time decreases. Retention times are corrected for the dead-space in the column and expressed as the logarithm of the capacity factor (log k’). Plotting log k’ on the y-axis against the % composition of methanol (x-axis) gives a straight line, and extrapolation to the y-axis gives the log capacity factor of the chalcone at 0% methanol (= 100% water or buffer solution). This value (log kw) is taken as a measure of the lipophilicity of the compound at pH 7.0, which is the pH of the buffer. There are some missing log kw values, notably among the hydroxylated chalcones, due to difficulties in the experimental determination.
Capacity factors were not determined at pH values other than pH 7. A determination at pH 5 may be warranted in view of the hypothesis that the chalcones interfere with hemoglobin degradation, a process which occurs within the acidic food vacuole of the parasite. However, the chalcones investigated in this study are neither acids nor bases. Their ionization characteristics should therefore be similar at pH 7 and pH 5, and the choice was made to carry out determinations at physiological pH.
The in silico determination of lipophilicity is made using the Sybyl software, which gives the ClogP of the chalcone in question. The correlation between the experimentally determined and in silico lipophilicity values is examined using Pearson correlation. The two values are significantly correlated with one another at the level of p < 0.05 (n = 80, r = 0.256) (Table 3, Appendix), despite the fact that the log kw values were determined experimentally at pH 7.0 while the ClogP values are for the non-ionized compounds. Since the chalcones are essentially neutral compounds [with the exception of the quinolinyl (e.g. 27, 28) and pyridinyl (e.g. 207, 209) chalcones and members carrying aromatic amino substituents on ring B (e.g. 4, 103)], ionization differences may not matter in this case.
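The significance statement above can be checked with a short calculation. The sketch below is illustrative only and is not the SPSS procedure used in this work: it computes Pearson's r for two lists and converts a given r and n into the usual t statistic with n - 2 degrees of freedom. The function names are assumptions made for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Values reported in the text: r = 0.256 for n = 80 compounds.
t = t_statistic(0.256, 80)
print(round(t, 2))  # 2.34
```

For 78 degrees of freedom the two-tailed critical t value at p = 0.05 is roughly 1.99, so a t of about 2.34 is consistent with the reported significance, although an r of 0.256 still corresponds to a weak correlation.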
Size parameters are generally intrinsic descriptors of molecular shape and bulk. Typical examples are molecular weight, volume and surface area. In this investigation, Connolly surface area and volume parameters (log V, log A), molar refractivity (MR), and a “dummy” parameter to indicate the occurrence of disubstitution in ring A are used to model the bulk and shape of the compounds. Except for the “dummy” parameter, the parameters are determined in silico. The Connolly parameters measure the solvent accessible molecular surface. Molar refractivity is obtained from the Lorentz-Lorenz equation:
Molar refractivity = [(n^2 - 1) × molecular weight] / [(n^2 + 2) × density]
Since n (refractive index) does not vary significantly for organic compounds and molecular weight / density is equivalent to volume, molar refractivity is often used as a crude steric parameter characterizing bulk (but not shape) of the molecule.
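As a worked illustration, the standard Lorentz-Lorenz form (with n^2 + 2 in the denominator) can be evaluated directly. The sketch below is not part of the original calculations, which were done in silico. Benzene is used only as a familiar example, with approximate literature values assumed for its refractive index, molecular weight and density.

```python
def molar_refractivity(n, mol_weight, density):
    """Lorentz-Lorenz molar refractivity in cm^3/mol.

    n          : refractive index (dimensionless)
    mol_weight : molecular weight in g/mol
    density    : density in g/cm^3
    """
    n2 = n * n
    return (n2 - 1.0) / (n2 + 2.0) * mol_weight / density

# Assumed example values for benzene (not taken from this study).
mr = molar_refractivity(1.5011, 78.11, 0.8765)
print(round(mr, 1))  # 26.3
```

Because the refractive index term changes little between organic compounds, the result is dominated by the molar volume (molecular weight / density), which is why MR behaves as a bulk descriptor.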
The inclusion of the “dummy” parameter is prompted by the observation that about one quarter of the chalcones are disubstituted on ring A, which may influence the overall size and shape of the ring. The presence of disubstitution is assigned a value of 1, while monosubstituted or unsubstituted rings are assigned a value of 0.
As seen from Table 3 in the Appendix, the size parameters (with the exception of the dummy parameter) are significantly correlated with one another (p < 0.01). In addition, they are well correlated with the lipophilicity parameters, which is commonly observed in QSAR studies. The dummy parameter is well correlated with ClogP and charge on the carbonyl oxygen (an electronic parameter) (p < 0.01) but not with the other steric descriptors.
The electronic characteristics of the chalcones were characterized by four parameters, namely total dipole moment, the energies of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO), and charge on the carbonyl oxygen. All these parameters are determined in silico. In the case of the trimethoxychalcones, an additional electronic parameter, the chemical shift difference of the carbonyl carbon (∆δ), was determined experimentally to assess the electron donating/withdrawing character of ring A, as described in Section 3.3.3.
The global polarity of a chalcone is given by its dipole moment. HOMO and LUMO are measures of the ease of electron loss and electron gain of the compound respectively. When the HOMO of a compound has a low energy value, the tendency for electron donation is poor. On the other hand, when the LUMO has a low energy value, electron gain into this orbital is facilitated. The charge on the carbonyl oxygen reflects the degree of polarization of the carbonyl linkage. A highly polarized carbonyl bond will give a higher negative charge on the oxygen atom. The degree of polarization will be influenced by the nature of rings A and B on the chalcone. Table 3 in the Appendix shows that the electronic parameters are highly correlated with one another and with the size parameters as well. The reasons for these correlations are not within the scope of this investigation.
5.2.2 Determination of physicochemical parameters of chalcones by experimental methods
5.2.2.1 Determination of lipophilicity by reversed phase HPLC
The lipophilicity of the chalcones was determined from their capacity factors (k’) by a reversed phase HPLC method. Separation was achieved on a LiChrosorb RP-18 (10 µm) stationary phase with a methanol-0.02 M phosphate buffer (pH 7.0) mobile phase. At least four mobile phase compositions were investigated for each compound, with the methanol content ranging from 50 to 85% w/w. Determinations were carried out at 30 °C, with the flow rate adjusted to 1.0-1.5 ml/min depending on the mobile phase composition, and with UV detection set at 280 and 330 nm. A stock solution (10 mg/ml) of the compound was prepared in methanol. For each mobile phase composition, equal volumes (20 µl) of the stock solution and an acetone stock solution (10% v/v acetone in the mobile phase) were diluted to 200 µl with the mobile phase and an aliquot (10 µl) was injected for the determination of retention time. Triplicate determinations were done for each concentration of test compound at each mobile phase composition. The capacity factor (k’) was determined from log k’ = log [(Vs – Vo) / Vo], where Vs and Vo are the retention volumes of the test compound and acetone respectively. Linear regression of log k’ of each compound against mobile phase composition, and extrapolation to 100% aqueous phase, gave log kw of the compound at pH 7.0.
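The calculation described above can be sketched in a few lines. This is a minimal illustration, not the procedure or software actually used: the retention volumes below are hypothetical, and the function names are chosen for this example only. It converts retention volumes to log k’, fits a straight line against % methanol by least squares, and reads off the intercept at 0% methanol as log kw.

```python
import math

def log_capacity_factor(v_s, v_0):
    """log k' = log10[(Vs - Vo) / Vo], with Vo the retention volume of acetone."""
    return math.log10((v_s - v_0) / v_0)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def log_kw(methanol_pct, retention_volumes, dead_volume):
    """Extrapolate log k' to 0% methanol; returns (log kw, slope)."""
    log_k = [log_capacity_factor(v, dead_volume) for v in retention_volumes]
    slope, intercept = fit_line(methanol_pct, log_k)
    return intercept, slope

# Hypothetical retention volumes (ml) at four mobile phase compositions.
methanol_pct = [55.0, 65.0, 75.0, 85.0]
retention_ml = [24.0, 11.5, 6.2, 3.9]
dead_volume_ml = 2.0

kw, slope = log_kw(methanol_pct, retention_ml, dead_volume_ml)
print(f"log kw = {kw:.2f}, slope = {slope:.4f}")
```

The slope is expected to be negative, since log k’ falls as the methanol content rises. In practice, the triplicate determinations would be averaged before the fit.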
5.2.2.2 Determination of the chemical shift of the carbonyl carbon
The method is described in Section 3.3.3.
5.2.2.3 Determination of physicochemical parameters of chalcones by molecular method
Orbital energies for HOMO and LUMO were calculated from MOPAC (QCPE program #455, version 6.0), which is interfaced with SYBYL.
5.3 Multivariate Analysis of Structure Activity Relationships (SAR) of Antimalarial Chalcones
5.3.1 Introduction
Data collected in science and technology are increasingly multivariate in character, involving multiple variables (K) measured on multiple observations (N). One consequence of this trend is that data sets tend to assume a “short and fat” appearance, because a disproportionately large number of variables are collected for a relatively small number of observations. For example, the difficulties encountered in chemical synthesis tend to keep the number of compounds (N) in the range of 10 to 100. However, the number of variables (K) that can be collected for these compounds may run to hundreds or even thousands, being greatly facilitated by molecular modeling methods like comparative molecular field analysis (CoMFA) and GRID.120 This creates problems during the analysis of the data set. Multiple regression techniques like Hansch analysis are of limited use in this situation as they have been developed to handle few and independent x-variables recorded for a relatively large number of compounds (that is, “long and thin” data sets). In a situation where K >> N and where collinearity is likely to exist among the x-variables, multivariate data analysis is the preferred option.
In this investigation, conventional Hansch analysis was initially employed to derive structure-activity relationships for the antimalarial chalcones. This was prompted by the fact that the available data set has a “long and thin” rather than “short and fat” appearance, as there are only 11 x-variables and more than 100 compounds being considered. However, no reasonable correlation could be obtained. The presence of correlated physicochemical parameters (Table 3, Appendix) and missing descriptors for some compounds are probable reasons for the failure to adequately represent activity by a single regression equation. In fact, these conditions make analysis by multivariate tools more appropriate.
In this investigation, two multivariate data analytical methods are used. Principal component analysis (PCA) is a multivariate projection method designed to give an overview of the dominant patterns and trends in the x-data matrix. PCA is essentially a pattern recognition technique. The x-variables (physicochemical descriptors) are reduced to a smaller number of latent variables (or principal components) that retain the maximum information from the original data matrix. Compounds that are characterized by common principal components will cluster together, indicating that they are “similar” with respect to the variables considered.120
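To make the idea of a latent variable concrete, the following sketch extracts the first principal component of a small descriptor table with the NIPALS iteration. It is an illustration under stated assumptions, not a reproduction of the SIMCA-P analysis: the descriptor values are hypothetical, the columns are assumed to be non-constant, and unit-variance scaling is assumed as the pretreatment.

```python
import math

def autoscale(rows):
    """Mean-centre each column and scale it to unit variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def first_component(x, n_iter=500, tol=1e-12):
    """NIPALS: scores t and unit-length loadings p of the first component."""
    t = [row[0] for row in x]  # start from the first column
    p = []
    for _ in range(n_iter):
        tt = sum(v * v for v in t)
        # Loadings: projection of every column onto the current scores.
        p = [sum(row[j] * ti for row, ti in zip(x, t)) / tt
             for j in range(len(x[0]))]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # Scores: projection of every compound onto the loadings.
        t_new = [sum(a * b for a, b in zip(row, p)) for row in x]
        change = sum((a - b) ** 2 for a, b in zip(t_new, t))
        t = t_new
        if change < tol:
            break
    return t, p

# Hypothetical descriptors (rows = compounds; columns = e.g. ClogP, MR, dipole).
descriptors = [
    [3.1, 75.0, 2.9],
    [3.6, 80.2, 3.4],
    [4.0, 86.1, 2.2],
    [2.7, 71.3, 4.1],
    [4.4, 90.5, 1.8],
]
scores, loadings = first_component(autoscale(descriptors))
print([round(s, 2) for s in scores])
```

Compounds with similar scores on the first components lie close together in a score plot. A second component would be obtained by subtracting the outer product of the scores and loadings from the scaled data and repeating the iteration.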
The second technique employed is partial least squares projection to latent structures (PLS). PLS is a regression extension of PCA that is widely employed when there is a need to connect the information in two blocks of variables, which in this case comprise the x-variables (summarized as principal components) and biological activity (y-variable). The usefulness of PLS lies in its ability to analyze data with several noisy, collinear and incomplete variables in both x and y data matrices.120 PCA and PLS are carried out using the SIMCA-P (Umetrics AB, version 8.0, Umea, Sweden, 1999) software.
5.3.2 Statistical methods
Multiple linear regression and Pearson correlation analyses were carried out using SPSS 10 (SPSS Inc., Chicago, IL). The following statistical parameters were determined for each regression equation: 95% confidence intervals of the variables, measure of explained variance r2, Fisher significance ratio F at P = 0.05 and standard error SE. Cross-validated r2 and SE were determined using the QSAR module of SYBYL 6.6. Multivariate data analyses were performed with SIMCA-P (Umetrics AB, version 8.0, Umea, Sweden, 1999) using default settings.
5.3.3 Results and discussion
5.3.3.1 Principal component analysis (PCA)
Principal component analysis (PCA) was carried out separately on the alkoxylated and hydroxylated chalcones. The outcome of each analysis was viewed in a score plot, which provides a summary of the relationships among the compounds. Compounds close to each other have similar physicochemical properties (which have been summarized as principal components) while compounds far away from each other have dissimilar profiles. The Hotelling ellipse is also depicted in the score plot. Compounds outside the ellipse are outliers, that is, they lie outside the confidence region defined by Hotelling T2 at the 0.05 significance level.
The score plot of the alkoxylated chalcones is given in Figure 5.1. It can be seen that the trimethoxychalcones (∆) are clustered in the upper right quadrant of the Hotelling ellipse. In contrast, most of the methoxychalcones are found in the lower left quadrant, while members of the ethoxy and dimethoxychalcones are distributed over the upper left and lower right quadrants. The clustering of specific ring B alkoxylated chalcones suggests that these compounds share similar physicochemical properties that are encoded in the principal components. Such a pattern implies that QSAR modeling of the alkoxylated chalcones as a whole would be of little value. An appropriate approach is to distinguish the chalcones according to their clustering patterns and to investigate the QSAR of these clusters separately. Such an approach has been applied by others 121 with good results. Thus, the relationship between activity and physicochemical properties is explored separately for the methoxy and trimethoxychalcones, and jointly for the ethoxy and dimethoxychalcones.
In the case of the hydroxylated chalcones, the score plot shows a homogeneous distribution of the hydroxy and dihydroxy members throughout the 4 quadrants (Figure 5.2). Therefore, it is reasonable to consider both series together as they appear to share common characteristics.
Figure 5.1 Score plot of principal components t1 against t2 for alkoxylated chalcones.
4’-Ethoxychalcones (∆); 2’,4’-dimethoxychalcones (O); 4’-methoxychalcones (×); 2’,3’,4’-trimethoxychalcones (∆). The location of 2’,4’-dimethoxy-4’-butoxychalcone (41) is indicated in the score plot. The ellipse corresponds to the confidence region based on Hotelling T2 (0.05).
Figure 5.2 Score plot of principal components t1 against t2 for hydroxylated chalcones.
4’-Hydroxychalcones (∆); 2’,4’-dihydroxychalcones (Ο). The ellipse corresponds to the confidence region based on Hotelling T2 (0.05).
To summarize, PCA has been employed to categorize the chalcones according to shared physicochemical properties as revealed by their positions in the score plots. The alkoxylated chalcones appear to be a heterogeneous class with three distinguishable groupings: trimethoxychalcones, methoxychalcones and a combined group of dimethoxy and ethoxychalcones. The hydroxylated chalcones are more homogeneous and it was not considered necessary to distinguish between the dihydroxy and 4’-hydroxy members for the next stage of analysis.
5.3.3.2 Projection to latent structures (PLS) analysis
Whereas the main role of PCA is to classify and summarize a data set, the primary function of PLS is to relate two data matrices to each other. In this case, the two data matrices are the biological activity (Y) and the principal components summarizing the descriptors of the compounds under study. Following the lead given by PCA, the chalcones are divided into groups (trimethoxychalcones, methoxychalcones, dimethoxy- and ethoxychalcones, hydroxylated chalcones) and PLS is carried out separately on each group. A summary of the models derived for each class and their relevant statistical characteristics is given in Table 5.1.
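Table 5.1 Summary of PLS models used to establish SAR for alkoxylated and hydroxylated chalcones

Series                                          Model   q2      RMSEP (n)
Trimethoxychalcones                             1       0.00    -
                                                2       0.377   -
                                                3       0.595   -
                                                4       0.723   0.276 (8) e
Methoxychalcones                                5       0.321   -
                                                6       0.585   -
                                                7       0.762   0.283 (6) e
Dimethoxychalcones                              8       0.486   -
                                                9       0.606   -
                                                10      0.779   0.689 (7) e
Hydroxylated chalcones                          11      0.214   -
                                                12      0.602   0.425 (13) f
Active hydroxylated and alkoxylated chalcones           0.563   -
                                                        0.598   -
                                                        0.833   0.299 (13) f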
a Number of significant components obtained in the PLS analyses.
b N = number of compounds used in computation of model.
c RMSEE = Root mean square error of estimation, which is obtained from the plot of predicted against observed activity for the model under consideration.
d Training set.
e RMSEP (root mean square error of prediction) for all compounds except those in the training set and compounds identified as outliers.
f RMSEP for all compounds including those identified as outliers, but excluding compounds in the training set.
5.3.3.2.1 Trimethoxychalcones
[…] antimalarial activity (IC50 15.8 µM). It ranks high (6th) in terms of in vitro activity among the 18 compounds of this series, indicating that substitution of the phenyl ring A, in most instances, resulted in a fall in activity. As to which A-ring substituents enhanced activity, a casual examination suggests that these would be electron-withdrawing groups (refer to Table 4.1 in Section 4.3.1). Another noteworthy point relates to the location of the butoxy derivative 41 in the PCA score plot of the entire series of alkoxylated chalcones (Figure 5.1). This compound, which has been reported by others 70 to have outstanding antimalarial activity, is located in the same quadrant (upper left) as the trimethoxychalcones. This is taken to imply that the physicochemical properties of the butoxy analogue 41 are more like those of the trimethoxychalcones than those of the other alkoxylated series. Therefore, 41 is considered together with the trimethoxychalcones in the PLS analysis.
The first model derived from PLS analysis of the trimethoxychalcones (including 41 and all 11 physicochemical descriptors) was not significant (Model 1, r2 = 0.497, q2 = 0.000). The cross-validated r2 (q2) has a value of zero, which indicates that this model is no better than having no model at all. In general, models with q2 values of 0.3 to 0.5 are considered to reflect acceptable predictive abilities. In order to find out why the model failed, the t-u score plot was examined. The score plot displays the observations in the projected X (t) and Y (u) space and shows how well the Y space (biological activity) correlates with the X space (descriptors). Ideally, a straight-line plot through the origin should be obtained. In this case, 5 outlier compounds (4, 11, 13, 28, 129) could be identified from the score plot. These outliers were omitted and the PLS repeated using a smaller number of compounds (n = 14). This time, an improved model (2), which explained 77% (r2 = 0.766) and predicted 38% (q2 = 0.377) of the activity, was derived (Table 5.1).
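The distinction between explained variance (r2) and predicted variance (q2) can be illustrated with a one-descriptor regression. The sketch below is a simplified stand-in for the PLS cross-validation: it assumes leave-one-out cross-validation and ordinary least squares, whereas the scheme used by SIMCA-P or SYBYL may differ, and the data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def explained_fraction(observed, predicted):
    """1 - (sum of squared errors) / (sum of squares about the mean)."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press / total

def leave_one_out(x, y):
    """Predict each compound from a model fitted without it."""
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(slope * x[i] + intercept)
    return preds

# Hypothetical descriptor and activity values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]

slope, intercept = fit_line(x, y)
r2 = explained_fraction(y, [slope * v + intercept for v in x])
q2 = explained_fraction(y, leave_one_out(x, y))
print(f"r2 = {r2:.3f}, q2 = {q2:.3f}")
```

A q2 of zero means that the cross-validated predictions are no better than predicting the mean activity for every compound, which is why such a model is treated as equivalent to having no model.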
With 11 descriptors in model (2), it is highly probable that some descriptors are more important than others. Removal of the less important variables is one means of improving the predictability of the model. The relative importance of each variable can be assessed from its VIP (variable importance in the projection) value. VIP is the weighted sum of squares of the PLS weights w*, taking into account the amount of explained Y variance of each dimension.122 The most attractive property of VIP is its intrinsic parsimony: it provides the most condensed summary of a PLS model. For a given model and problem, there is only one set of VIP values, with variables having VIP > 1 generally considered to make a greater contribution to activity. However, omission of variables based on the VIP values must be done cautiously, as it is possible to omit too many variables and derive an apparently good model without sound predictive power. One way to ensure that this does not happen is to view the VIP plot and the accompanying values concurrently. Variables with closely similar values will have the appearance of “steps” and these may be omitted if they fall below the threshold value (usually set at 0.7 to 0.8). On the other hand, important variables will have the appearance of descending steps, indicating their relative importance, an example of which is given in Figure 5.3.
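The VIP definition can be written out explicitly. The sketch below uses the commonly quoted formula, VIP_j = sqrt(K × Σ_a [SSY_a × (w_aj / ‖w_a‖)^2] / Σ_a SSY_a), where K is the number of variables and SSY_a is the Y sum of squares explained by component a. The weights and explained sums of squares are hypothetical inputs, and the function is an illustration rather than the SIMCA-P implementation.

```python
import math

def vip_scores(weights, ssy_explained):
    """Variable importance in the projection.

    weights[a][j]    : PLS weight of variable j in component a
    ssy_explained[a] : Y sum of squares explained by component a
    """
    k = len(weights[0])
    total = sum(ssy_explained)
    vip = []
    for j in range(k):
        acc = 0.0
        for w_a, ss_a in zip(weights, ssy_explained):
            norm2 = sum(w * w for w in w_a)
            acc += ss_a * w_a[j] ** 2 / norm2
        vip.append(math.sqrt(k * acc / total))
    return vip

# Hypothetical two-component model with four descriptors.
weights = [
    [0.70, 0.60, 0.30, 0.24],
    [0.10, -0.50, 0.80, 0.31],
]
ssy_explained = [6.0, 1.5]

vip = vip_scores(weights, ssy_explained)
print([round(v, 2) for v in vip])
```

By construction, the squared VIP values average to 1 over all variables, which is why VIP > 1 is read as an above-average contribution.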
Figure 5.3 VIP plot from PLS analysis of data from model 3.
LUMO = lowest unoccupied molecular orbital;
HOMO = highest occupied molecular orbital;
Charges = charge on carbonyl oxygen;
log kw = hydrophobicity parameter obtained from reversed phase HPLC, where kw is the capacity factor in 100% water;
TDM = total dipole moment.
Based on these criteria, disubstitution, molar refractivity, difference in 13C NMR chemical shift of the carbonyl carbon, log A and log V were omitted because of their low VIP values and location on the same “low” step in the plot. Of the remaining descriptors, the lipophilicity parameters (ClogP and log kw) have comparable VIP values (0.935, 0.911) and it was decided that only one parameter needed to be retained in the final model. The experimentally determined log kw was preferred. Model 2 was thus reevaluated with a smaller number (5) of physicochemical parameters. The new model 3 shows a marked improvement in predictive ability (q2 = 0.595, Table 5.2) and is taken as the final PLS model for the trimethoxychalcones. The score plot of model 3 is given in Figure 5.4. It is used to derive a training set of compounds and to obtain structure-activity correlations for the trimethoxychalcones.
Figure 5.4 Coefficients plot from PLS analysis of data from model 3.
Parameters with positive coefficient values relate directly to activity, while those with negative coefficients are inversely related to activity. In this plot, only dipole moment (TDM) is directly related to activity.
To test the predictive power of model 3, a “training set” of compounds was selected from the existing 14 compounds. Selection is made by viewing the score plot of model 3 (Figure 5.4) and choosing compounds that are evenly distributed throughout the plot. Six compounds (3, 35, 40, 41, 128, 134) were selected and evaluated as a model (Model 4). The statistical parameters of model 4 were found to be reasonable (r2 = 0.985, q2 = 0.723) and the model was used to predict the activity of the remaining 8 compounds (test set). The outcome of this prediction is assessed from the root mean square error of prediction (RMSEP), which had an acceptable value of 0.276. Model 4 is also used to predict the activity of all the trimethoxychalcones, including 41 and the 5 outliers which had been omitted earlier. This time, a higher RMSEP value (0.514) was obtained, indicating that model 4 is less successful in this respect.
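For reference, the prediction error can be computed as follows. This is a minimal sketch with hypothetical observed and predicted activities. It divides by the number of test compounds, which is the usual definition of RMSEP; the exact convention used by SIMCA-P is not stated in the text and is assumed here.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction over a test set."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical activities for a test set of 8 compounds.
observed = [4.80, 5.10, 4.35, 4.95, 5.30, 4.60, 4.72, 5.05]
predicted = [4.95, 4.85, 4.60, 5.10, 5.05, 4.90, 4.55, 5.30]

print(round(rmsep(observed, predicted), 3))
```

Because RMSEP is expressed in the same units as the activity values, it can be compared directly across test sets, as is done here for the test set alone and for the full series including outliers.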
Next, model 3 is used to explain structure-activity correlations among the trimethoxychalcones. It is seen from the VIP and regression coefficient plots that electronic parameters are the main contributors to antimalarial activity (Figure 5.3, Figure 5.4). Like the VIP plot, the regression coefficient plot identifies the most important variables, but it also indicates their relationship to activity (direct or inverse). The 5 parameters, in order of increasing importance, are dipole moment < log kw < charge on carbonyl oxygen < HOMO < LUMO. These parameters are inversely related to activity as seen from the coefficient plot (Figure 5.4). Thus, one would expect good activity in a trimethoxychalcone which has low energies for its LUMO and HOMO, a weakly polarized carbonyl function which would result in a small charge on the carbonyl oxygen, low lipophilicity and a small total dipole moment.
The energies of HOMO and LUMO serve as indicators of the electron donating and electron accepting abilities of the molecule respectively. The higher the energy of HOMO, the better its electron donating ability, while a lower energy LUMO predisposes towards good electron acceptor ability. Since the inverse relationship associates LUMO and HOMO of lower energies with better activity, one can conclude that good electron acceptor ability (low LUMO) and poor electron donor ability (low HOMO) are important for good activity. Noting that the substitution pattern on ring B is constant, the substitution pattern on ring A that would satisfy the low LUMO / low HOMO requirement would be that of electron withdrawing substituents. Such substituents will make ring A electron deficient and a better electron acceptor (or poorer donor) and, according to the present structure-activity relationship, will give rise to a more active antimalarial chalcone. Moreover, the inductive/mesomeric effect of the electron deficient ring A can be transmitted along the α,β-unsaturated carbonyl chain, resulting in a less polarized carbonyl linkage. This fits in with the requirement for a small negative charge on the carbonyl oxygen (less polarized carbonyl bond).
5.3.3.2.2 Methoxychalcones
The most active compound in this series is 1-(4’-methoxyphenyl)-3-(3-quinolinyl)-2-propen-1-one (31), with an IC50 of 4.8 µM. Interestingly, the most active trimethoxychalcone (27) also has 3-quinolinyl as ring A. This is a recurring feature observed among all four series of alkoxylated chalcones. The methoxychalcone with an unsubstituted A ring (135) has poor antimalarial activity (IC50 = 55.5 µM). When ring A is substituted with polar, electron withdrawing groups like nitro (115) and cyano (117), activity is reduced. Substitution with other groups resulted in either improvement or little change in activity.
A significant PLS model 5 was obtained only after omission of outlier compounds from the series. The outliers 19, 31 and 112 were detected visually from the PLS score plot of the 14 methoxychalcones and from their large residual values when the observed and model-predicted values were compared. There is some concern over the omission of 19 and 31, as these compounds are comparatively more active than the rest (IC50 < 10 µM). However, their omission was necessary to give a significant one-component model, which accounted for 63% and predicted 32% of antimalarial activity. As before, the model was improved by removing less important descriptors identified from the VIP and coefficient plots. The improved model (6) could predict the activity of 11 methoxychalcones using 5 descriptors with a q2 of 0.585. The predictive power of the model was further confirmed by selecting a training set of 5 compounds (model 7) and using it to predict the activity of the remaining 6 compounds as a test set (RMSEP = 0.283). The training set was also used to predict the activity of all the remaining methoxychalcones (n = 9, including 3 outliers). Prediction was only satisfactory, with an RMSEP of 0.520.
The VIP plot of model 6 identified disubstitution of ring A (most important), log kw and HOMO (least important) as parameters contributing to activity. Activity is directly related to disubstitution of ring A and HOMO, but inversely related to log kw. The emphasis on disubstitution may be biased as there are only 3 compounds (111, 22, 113) with disubstituted rings A in model 6 (n = 11). The lipophilicity parameter log kw is important in determining activity. As in the case of the trimethoxychalcones, the relationship is inverse, i.e. lower lipophilicity in the compound favors good antimalarial activity. The direct correlation between activity and HOMO suggests that the electron donating ability of the chalcone is important for good activity.
5.3.3.2.3 Dimethoxychalcones and ethoxychalcones
The overlapping distribution of dimethoxy and ethoxychalcones in the PCA score plot (Figure 5.1) suggests that there is good reason to consider these 2 series together in PLS analyses. However, no significant PLS model could be obtained for the combined series and it was decided that they should be evaluated separately.
The ethoxychalcones are singular in that no active compound (IC50 < 10 µM) is identified from this series. The unsubstituted ring A ethoxychalcone (136) has poor activity (IC50 = 43 µM) and activity is further depressed by substitution / replacement with 4-cyano (127), 2,4-dichloro (121) and 4-quinolinyl (34). The most active compounds are the 3-quinolinyl (33) and 4-trifluoromethyl (122) derivatives, closely mirroring what is observed for the trimethoxy series. No significant PLS model could be obtained even after removing outliers or reducing the number of descriptors. This implies that the present descriptors cannot adequately describe the variation in activity of the ethoxychalcones. New variables are necessary if a successful PLS model is to be obtained.
The dimethoxy series is outstanding in having the greatest representation of active as well as poorly active (IC50 > 100 µM) chalcones. As the unsubstituted ring A dimethoxychalcone was not synthesized, it is not possible to draw conclusions on the effect of substitution on ring A. A significant PLS model 8 was derived after omitting 5 outlier compounds (5, 103, 106, 104, 110) (r2 = 0.694, q2 = 0.486) and this was further improved when descriptors which contributed little to activity (identified through coefficient and VIP plots) were omitted from the model. The final model 9 accounted for 68% and predicted 61% of activity. As before, a training set of 5 compounds (model 10) was selected and used to predict the activity of the remaining 7 compounds (test set). Only a satisfactory prediction was obtained in this case (RMSEP = 0.689). Prediction was poorer (RMSEP = 1.346) when applied to the entire cohort of dimethoxychalcones (n = 12, including outliers).
The main parameters influencing activity of the dimethoxychalcones are the size parameters (log A, log V), followed by log kw, molar refractivity and HOMO (least important). The effect of log kw on activity is inverse, as observed for the methoxy and trimethoxychalcones. The other parameters affected activity directly.
5.3.3.2.4 Hydroxy and dihydroxychalcones
Since the hydroxylated chalcones (n = 30) are evenly distributed throughout the four quadrants of the PCA score plot (Figure 5.2), these two series should be considered together for PLS analysis. Nevertheless, attempts were made to analyze the 2 series separately, but these were not successful. The predictive q2 values of the PLS models were poor, even after removal of outlying compounds. Therefore, it was decided that the 2 series should be considered together.
Applying PLS to the combined hydroxylated chalcones necessitated the removal of nearly half of the compounds as outliers, a clearly undesirable move. To avoid this, a training set of compounds was directly selected from the 30 hydroxylated chalcones (model 11). The number of descriptors in this model was then reduced to give a final training set of 17 compounds and 4 descriptors (dipole moment, disubstitution, molar refractivity, log kw) (model 12: r2 = 0.818, q2 = 0.602). It was able to predict the activity of the remaining 13 compounds with a satisfactory level of accuracy (RMSEP = 0.425).
The coefficient and VIP plots of model 12 identified 4 descriptors as being important determinants of activity. These are dipole moment, disubstitution, molar refractivity and log kw, in order of decreasing importance. Of these 4 descriptors, dipole moment and log kw influenced activity directly, while an inverse relationship was noted for disubstitution and molar refractivity. Molar refractivity is significantly correlated with the size descriptors for this series (p < 0.01). Thus, an active hydroxylated chalcone should have polar and small substituent(s) on ring A.
5.3.3.2.5 Active hydroxylated and alkoxylated chalcones
In a final analysis, the active members of the hydroxylated and alkoxylated chalcones (n = 19) are gathered to form a separate series, a move prompted by observations that many of the active compounds are outliers. For example, 2 of the 3 outliers in the methoxy series are “actives” (IC50 < 10 µM). In the dimethoxy series, one active compound was identified as an outlier. Even when the “actives” are not discarded as outliers, it is noted from the test sets that many of the active compounds