Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/chroma
models
A. Gisbert-Alonso, A. Navarro-Martínez, J.A. Navarro-Huerta, J.R. Torres-Lapasió∗,
M.C. García-Alvarez-Coque
Department of Analytical Chemistry, Faculty of Chemistry, University of Valencia, C/ Dr Moliner 50, Burjassot 46100, Spain
Article info
Article history:
Received 23 February 2022
Revised 30 March 2022
Accepted 11 October 2022
Available online 13 October 2022
Keywords:
Medicinal plants
Global retention models
Bandwidth models
Multi-linear gradient elution
Prediction of chromatographic fingerprints
Abstract
Medicinal plants contain a large variety of chemical compounds in highly variable concentrations, so the quality control of these materials is especially complex. For this purpose, regulatory institutions have accepted chromatographic fingerprints as a valid tool to perform the analyses. In order to improve the results, separation conditions that maximise the number of detected peaks in these chromatograms are needed. This work reports the extension of a simulation strategy, based on global retention models previously developed for selected compounds, to all detected peaks in the full chromatogram. Global models contain characteristic parameters for each component in the sample, while other parameters are common to all components and describe the combined effects of column and solvent. The approach begins by automatically detecting and measuring the position of all peaks in a chromatogram, obtained preferably with the slowest gradient. Then, the retention time for each detected component is fitted to find the corresponding solute parameter in the global model, which leads to the best agreement with the measured experimental value. The process is completed by developing bandwidth models for the selected compounds used to build the global retention model based on gradient data, which are applied to all peaks in the chromatogram. The usefulness of the simulation approach is demonstrated by predicting chromatographic fingerprints for three medicinal plants with specific separation problems (green tea, lemon balm and linden), using several multi-linear gradients that lead to problematic predictions.
© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1 Introduction
In traditional medicine, preparations derived from plant tissues have been used for thousands of years in the prevention and treatment of diseases. The therapeutic activity of medicinal plants is due to the presence of biologically active chemical compounds, which can act synergistically [1,2]. Due to the efficacy of treatments based on these natural products and their low toxicity, their use has been extended in recent years [3]. Several factors can affect the quality of medicinal plants, such as soil type, geographical location, environmental conditions during growth, harvest season and methods, storage conditions, and preparation procedures. Therefore, the products must be subjected to a quality control that certifies their safety and pharmacological efficacy to the consumer.
∗ Corresponding author
E-mail address: jrtorres@uv.es (J.R. Torres-Lapasió)
However, the high chemical diversity of natural products, present in very different concentrations, makes quality control extremely difficult [2]. To solve the problems found in the sanitary control of medicinal plants, due to their complex composition, the World Health Organization (WHO), the United States Food and Drug Administration (FDA), and the State Food and Drug Administration of China (SFDA) have accepted chromatographic fingerprints as a valid tool to guarantee their quality [4–7].

Probably, the most problematic aspect that prevents the development of methods to optimise fingerprint resolution is finding retention models that describe all the components in the samples, in situations where there are no standards [8–11]. Recently, we have developed an approach to describe the retention behaviour of unknown compounds in a chromatogram using global models [12,13]. The purpose is to get a set of model parameters to predict the behaviour of a group of compounds (known or unknown), as an alternative to the use of parameters specific to each compound. In global models, some parameters are specific to each solute, while other parameters describe the combined effects of column and solvent, and are common to all solutes.

https://doi.org/10.1016/j.chroma.2022.463561
Our proposal is as follows. Once the chromatograms of the sample are obtained according to a certain experimental design described in Part I, the peaks for several compounds (which we have called “reference peaks”) are selected to get the chromatographic information required to build the global model. The peaks for the reference compounds are preferably those with the highest intensity, or at least peaks that can be tracked in all training gradients. For this purpose, the identity of these compounds is not needed. The selection of the reference peaks is subject to a single condition: the equivalent peaks should be easily recognisable in all gradients. For instance, reference peaks could be very intense peaks that stand out from the others due to their intensity or position, or that give rise to easily identifiable patterns with their neighbouring peaks. The presence of outliers or abnormal scattering in the correlation plots of the individual models reveals incidental mistakes in peak identification.
Part I of this work [13] reports the construction of global retention models for the reference compounds in chromatographic fingerprints of extracts of medicinal plants, using the information obtained from appropriate experimental designs. The applied designs were based on a common scouting linear gradient and consisted of several related multi-linear gradients, which also facilitated peak tracking [14].
In Part II, the global retention models obtained with the reference compounds are extended to include all other components in the chromatogram giving rise to detectable peaks. The information required to update the global retention model is preferably obtained from the chromatogram corresponding to the slowest experimental condition amongst those in the training design, after baseline correction [15]. The extended model, including all detectable peaks in that chromatogram, was used to predict full chromatograms at any new arbitrary experimental condition. The construction of bandwidth models for the reference compounds allows full chromatogram predictions in gradient elution. Simulated chromatograms were tested with extracts of Camellia sinensis (green tea), Melissa officinalis (lemon balm), and Tilia platyphyllos (linden), with satisfactory results.
2 Experimental
2.1 Preparation of extracts of medicinal plants
The reversed-phase liquid chromatographic (RPLC) separation of extracts of three medicinal plants (green tea, lemon balm and linden) was studied. Lemon balm and linden were purchased in bulk from a local store, while green tea was purchased in individual bags in a supermarket. The extracts of the three plants were processed following the recommendations of Alvarez-Segura et al. [16]. Due to sample heterogeneity, dry portions of each plant were ground. One gram of the powder was weighed and transferred to a Falcon tube, to which 15 ml of a solution prepared with nanopure water (Adrona B30 Trace, Burladingen, Germany) and 70% (v/v) methanol (Scharlau, Barcelona, Spain) was added. The Falcon tube content was sonicated for 60 min at 80 °C. Finally, the solution was centrifuged at 3000 rpm for 5 min.
2.2 Chromatographic separation
The supernatant was taken from the Falcon tube with a syringe, and filtered through a 0.45 μm pore size Nylon membrane (Micron Separations, Westboro, MA, USA) into a vial before injection. The separation was performed using gradient elution with hydro-organic mixtures, prepared by mixing nanopure water and HPLC grade acetonitrile (Scharlau), both containing 0.1% (v/v) formic acid (Acros Organics, Fair Lawn, NJ, USA). Peak monitoring was carried out between 210 and 280 nm with 10 nm increments. Other details are given in Part I [13].
To establish the acetonitrile working limits in the experimental design for each medicinal plant, a preliminary scouting gradient was used, where the modifier concentration was increased linearly from 5 to 100% (v/v) in 60 min [13]. Sets of training gradients were proposed according to the peak distribution in the chromatograms observed with the scouting gradient. All the gradients included the necessary additional steps for column cleaning to remove the most hydrophobic components, and re-equilibration before the next injection.

For each medicinal plant, a training experimental design, consisting of 6–7 multi-linear gradients with an intermediate node of variable position (Fig. 1), was used. These designs allowed exploring extreme compositions, without giving rise to excessive retention times for the most hydrophobic components, or too short for the most hydrophilic ones. A further advantage is that this type of design facilitates tracking the identity of the peaks of the reference compounds when the elution conditions are varied. In Fig. 1, it can be seen that the modifier concentration ranges covered by the gradients for each medicinal plant are rather different, reflecting the differences in the nature of the components in each sample and, consequently, in the distribution of chromatographic peaks.

The construction of the training experimental design for each type of sample, as well as other details of the chromatographic separation, are given in Part I [13]. To verify the prediction performance of the global models, several gradients not included in the experimental design (validation gradients) were used (gradients tagged as E in Fig. 1).
2.3 Software
All data treatment was carried out with Matlab 2020a (The MathWorks Inc., Natick, MA, USA). Baseline subtraction in the experimental chromatograms was done with the BEADS algorithm [15]. Automatic peak detection and measurement were carried out using Matlab functions developed in our laboratory [16]. These functions automatically analyse baseline-free signals to locate the peaks and obtain the values of retention times, half-widths and peak areas, together with other additional information.
3 Theory
3.1 Global retention models for the reference compounds
The approach proposed in this work to simulate chromatographic fingerprints needs previous fitting of a global model for a set of selected compounds, with peaks distributed along the chromatogram (i.e., the so-called “reference compounds”). Knowledge of the chemical nature of these compounds is not needed, but their identity should be established unequivocally in the chromatograms run with all training gradients. Also, the peaks should be intense enough for a proper detection under weak elution conditions. Guidelines for selecting the peaks for the reference compounds are given in Part I of this research [13]. There, the performance of global retention models based on the equations proposed by Snyder [17], Schoenmakers [18], and Neue-Kuss [19] was compared. Of these, the Neue-Kuss equation (Eq. (1)) offered the best results. Therefore, only this equation is considered in Part II of this work, reformulated as Eq. (2) to get model parameters less dissimilar in scale, which facilitates convergence [13].

Fig. 1. Training (G) and validation (E) gradients, used to obtain the global models and evaluate the accuracy of the predictions of chromatographic fingerprints, respectively, for: (a) green tea, (b) lemon balm, and (c) linden.
The global model can be represented by the [ b, , log k 0,1, log k 0,2, …, log k 0,ns] vector, where b and are the common column/solvent parameters, and log k 0,i, the specific solute parameters. The steps needed to fit the global model are briefly outlined below (see Part I for more details):
(i) First, the retention data for each reference compound i are individually fitted to Eq. (2), in order to obtain the values of the b i, i and log k 0,i parameters. For this purpose, the whole set of experimental retention times measured with all training gradients is used.
(ii) The medians of the parameters that describe the behaviour of column and solvent for each reference compound, obtained in step (i) ( b m and m), are taken as initial estimates of the global parameters, while the log k 0,i values for each compound i are fitted.
(iii) Parameters b and are then fitted, this time keeping fixed the values of log k 0,i found in the previous step, and attending simultaneously to the prediction of all solutes and training gradients.
(iv) Finally, all parameters defining the [ b, , log k 0,1, log k 0,2, …, log k 0,ns] vector in the global retention model are optimised together using all available data.
(v) If necessary, the process is repeated from step (ii) until convergence.
3.2 Extension of global retention models to all detected peaks in the medicinal plants
The global retention models obtained for the reference compounds allow predictions exclusively involving the reference compounds, for any arbitrary gradient. However, the goal of this research is the prediction of full chromatograms for the medicinal plants, which can include several hundred compounds. Therefore, we developed an approach to extend the global models fitted with the data of the reference compounds to the prediction of retention for all detected peaks in the chromatograms.
The global retention model, initially established with the reference compounds, was modified to include other components in the chromatogram, as follows:
(i) First, a chromatogram obtained with a gradient belonging to the training design is selected, preferably the one with the largest number of detectable peaks, which is usually the gradient with the lowest initial slope in the design. Before being processed, the baseline is subtracted from the experimental chromatogram using an adequate algorithm. This chromatogram will be referred to as the “base chromatogram”.
(ii) Next, the position of all detected peaks in the base chromatogram is measured, using an automatic analysis function. These peaks are those exceeding certain acceptability thresholds, such as a critical height or bandwidth. The autodetection software developed in our laboratory was applied for this purpose [16].
(iii) The retention times for all detected peaks ( t R,i) (the reference peaks or any other exceeding the detection thresholds) are obtained, together with other measurements that define the bandwidths and areas.
(iv) The process followed to extend the global model to all detected peaks consists of least-squares fitting, where the column and solvent parameters ( b and ) are kept fixed to the values found with the reference peaks, whereas the specific parameters log k 0,i (related to solute hydrophobicity) are fitted to describe the experimental retention times ( t R,i) for all solutes (reference compounds or any other), when they elute with the gradient associated with the base chromatogram.
(v) With this information ( b and and log k 0,i), the chromatogram for any other arbitrary gradient can be predicted.
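Steps (iv) and (v) can be illustrated with a minimal Python sketch. Several things here are assumptions and not taken from the paper: the retention model is written in the usual Neue-Kuss notation, k = k0 (1 + S2 ϕ)² exp(−S1 ϕ/(1 + S2 ϕ)), instead of the reformulated Eq. (2); the dwell time is neglected; the fit uses plain bisection rather than the authors' least-squares routine; and all names (`s1`, `s2`, `fit_log_k0`, the gradient nodes) and numerical values are hypothetical.

```python
import math

def retention_factor(phi, log_k0, s1, s2):
    """Retention factor k at modifier fraction phi (assumed Neue-Kuss form)."""
    return 10.0 ** log_k0 * (1.0 + s2 * phi) ** 2 * math.exp(-s1 * phi / (1.0 + s2 * phi))

def phi_at(t, nodes):
    """Modifier fraction at time t for a multi-linear gradient of (time, phi) nodes."""
    if t <= nodes[0][0]:
        return nodes[0][1]
    for (t_a, p_a), (t_b, p_b) in zip(nodes, nodes[1:]):
        if t <= t_b:
            return p_a + (p_b - p_a) * (t - t_a) / (t_b - t_a)
    return nodes[-1][1]  # the final composition is held after the last node

def gradient_retention(log_k0, s1, s2, nodes, t0, dt=0.01, t_max=300.0):
    """Solve integral_0^(tR - t0) dt / k(phi(t)) = t0 by midpoint integration.

    Returns (tR, phi_exit). Dwell time is neglected in this sketch.
    """
    acc, t = 0.0, 0.0
    while t < t_max:
        step = dt / retention_factor(phi_at(t + 0.5 * dt, nodes), log_k0, s1, s2)
        if acc + step >= t0:
            t += dt * (t0 - acc) / step  # linear interpolation inside the last step
            break
        acc += step
        t += dt
    return t0 + t, phi_at(t, nodes)

def fit_log_k0(t_obs, s1, s2, nodes, t0, lo=-2.0, hi=4.0, tol=1e-5):
    """Bisection on log k0 (tR grows with log k0); s1 and s2 stay fixed."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gradient_retention(mid, s1, s2, nodes, t0)[0] < t_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative numbers only (not taken from the paper).
S1, S2, T0 = 20.0, 1.0, 1.0
base_gradient = [(0.0, 0.05), (20.0, 0.20), (40.0, 0.60)]  # slow first ramp
new_gradient = [(0.0, 0.05), (10.0, 0.30), (30.0, 0.60)]   # steeper first ramp

# Synthetic "measured" peak in the base chromatogram, generated with log k0 = 2.
t_obs, _ = gradient_retention(2.0, S1, S2, base_gradient, T0)
log_k0 = fit_log_k0(t_obs, S1, S2, base_gradient, T0)                    # step (iv)
t_new, phi_exit = gradient_retention(log_k0, S1, S2, new_gradient, T0)  # step (v)
```

Repeating `fit_log_k0` for every detected peak would give the full set of solute parameters, with the column/solvent parameters left untouched.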
Following this protocol, the effect of the modifier was determined with several gradients with very different profiles and some representative solutes, whereas the effect of the solute hydrophobicity (which ideally should be mobile-phase independent) was obtained only with the gradient in the design showing a maximal number of peaks. A vector gathering the parameters of the global model [ b, , log k 0,1, log k 0,2, …] is thus obtained. This vector can be rearranged into a collection of smaller [ b, , log k 0,i] vectors, each of them associated with the individual retention model for solute i.
In order to speed up and favour the convergence of the extended global model, several options were tried. The best one was carrying out a sequential fitting, where the specific solute parameters are determined solute-by-solute in decreasing hydrophobicity order, so that the log k 0 value found for solute i is used as an initial estimate for solute i – 1. This operation mode considerably accelerates the regression process, and increases the chances of obtaining a good fitting in a single attempt. Other options that were tried with less success were: (i) independent fittings using the same initial estimate (log k 0) for all solutes, and (ii) sequential fittings, where the solution found for solute i was used in increasing hydrophobicity order.
3.3 Global bandwidth models for the reference compounds
To be realistic and practical, the simulation of chromatograms requires not only the prediction of peak location for each component in the sample as the elution conditions change, but also the peak bandwidths. Although some peaks present anomalous bandwidths, often due to partial co-elution or other phenomena, what really matters is that most peaks in fingerprints are well predicted.

In this work, chromatographic peak profiles were simulated using a modified Gaussian model, where the standard deviation depends on the distance to the retention time [20,21] (see Supplementary material). The parameters of the Gaussian model can be related to the peak retention time, area and widths (or half-widths). In turn, the bandwidths can be correlated with the retention times, giving rise to a family of global models based on the generalisation of the concept of chromatographic efficiency (N) [22–24]. Bandwidth models describe the trend of chromatographic peaks to broaden as the retention time increases. In this work, bandwidths were measured at 10% of the maximal peak height.
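The peak model just described can be sketched in Python as follows. The exact equation used by the authors is given in their Supplementary material and is not reproduced here, so this sketch assumes the simplest variant, in which the standard deviation varies linearly with the distance to the retention time, σ(t) = s0 + s1 (t − tR). The function names and the example peaks are hypothetical.

```python
import math

def lmg_parameters(left_hw, right_hw, height_ratio=0.10):
    """Convert left/right half-widths, measured at height_ratio of the peak
    maximum, into s0 and s1 of sigma(t) = s0 + s1 * (t - t_r)."""
    c = math.sqrt(-2.0 * math.log(height_ratio))
    s0 = 2.0 * left_hw * right_hw / (c * (left_hw + right_hw))
    s1 = (right_hw - left_hw) / (c * (left_hw + right_hw))
    return s0, s1

def lmg_peak(t, t_r, height, s0, s1):
    """Signal of one peak at time t (zero where sigma would become non-positive)."""
    sigma = s0 + s1 * (t - t_r)
    if sigma <= 0.0:
        return 0.0
    z = (t - t_r) / sigma
    return height * math.exp(-0.5 * z * z)

def simulate(times, peaks):
    """Sum the contributions of all peaks, given as (t_r, height, s0, s1) tuples."""
    return [sum(lmg_peak(t, *peak) for peak in peaks) for t in times]

# Two illustrative, slightly tailing peaks (times and half-widths in min).
s0, s1 = lmg_parameters(left_hw=0.10, right_hw=0.15)
peaks = [(3.0, 100.0, s0, s1), (6.0, 40.0, s0, s1)]
times = [i * 0.01 for i in range(1001)]
signal = simulate(times, peaks)
```

With these definitions, the simulated profile falls to 10% of its maximum exactly at the measured left and right half-widths, which is the property that links the measured bandwidths to the model parameters.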
If the starting data are isocratic, the experimental bandwidths are directly correlated with the respective retention times. Parabolic trends are usually obtained [23]:

$$ w = \omega_0 + \omega_1 t_{\mathrm{iso}} + \omega_2 t_{\mathrm{iso}}^2 \qquad (3) $$

which can often be approximated by a linear behaviour. In Eq. (3), w can be the peak width (or the left or right half-width), and t iso is the isocratic retention time.
For gradient elution, the relationship between the bandwidths and retention time is not direct. However, enough accuracy can be obtained by applying the Jandera approximation [25], although it is only strictly valid for linear gradients. This approximation postulates that, under gradient elution, the bandwidth of a solute i is the same as that obtained if it migrated isocratically using a mobile phase at the instant composition ϕj, reached by gradient j when the solute leaves the column. Although the source data come from gradient experiments, the prediction of gradient retention times provides collaterally the instant composition when the solute leaves the column, and hence, isocratic retention times can be calculated. The isocratic time corresponding to ϕj will be referred to here as the “equivalent isocratic time”.
The sequence of operations needed to obtain the parameters of the global bandwidth models (ω0, ω1 and ω2 in Eq. (3)) is the following:
(i) The retention data for each solute and gradient are calculated by solving the fundamental equation for gradient elution [26–28], with either analytical or numerical integration. Once the time along the gradient that makes the sum of integrals match the dead time is found, the instant composition ϕj at which the solute leaves the column is collaterally obtained.
(ii) The equivalent isocratic retention time (at which each solute would leave the column if it migrated at ϕj) can be determined by substituting the composition into the retention model (e.g., Eq. (2)).
(iii) The gradient bandwidth for solute i in gradient j is obtained straightforwardly by introducing t iso in Eq. (3).
(iv) Finally, the global bandwidth model is fitted by adjusting the parameters in Eq. (3), trying to obtain the best match between the observed bandwidths and the corresponding predictions, using the reference compounds and all training gradients.
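Steps (ii) to (iv) above can be sketched in Python as follows. This is only an illustration under stated assumptions: the retention model is written in the usual Neue-Kuss notation (parameters `s1`, `s2`) rather than the reformulated Eq. (2), the equivalent isocratic time is taken as t0 (1 + k), the fit is an ordinary unweighted least-squares solution of Eq. (3), and the data are synthetic. All function names are hypothetical.

```python
import math

def equivalent_isocratic_time(phi_exit, log_k0, s1, s2, t0):
    """t_iso = t0 * (1 + k), with k evaluated at the composition phi_exit that
    the gradient has reached when the solute leaves the column."""
    k = (10.0 ** log_k0 * (1.0 + s2 * phi_exit) ** 2
         * math.exp(-s1 * phi_exit / (1.0 + s2 * phi_exit)))
    return t0 * (1.0 + k)

def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def fit_bandwidth_model(t_iso, widths):
    """Least-squares estimate of (w0, w1, w2) in w = w0 + w1*t_iso + w2*t_iso**2."""
    basis = [[1.0, t, t * t] for t in t_iso]
    normal = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(b[i] * w for b, w in zip(basis, widths)) for i in range(3)]
    return solve_linear(normal, rhs)

def predict_width(t_iso, omegas):
    """Evaluate Eq. (3) for one equivalent isocratic time."""
    w0, w1, w2 = omegas
    return w0 + w1 * t_iso + w2 * t_iso * t_iso

# Synthetic example: widths generated from known coefficients, then recovered.
true_omegas = (0.05, 0.02, 0.001)
t_iso_values = [2.0, 4.0, 6.0, 9.0, 13.0]
observed = [predict_width(t, true_omegas) for t in t_iso_values]
omegas = fit_bandwidth_model(t_iso_values, observed)
t_example = equivalent_isocratic_time(0.0, 1.0, 20.0, 1.0, 1.0)
```

In practice, the pairs of equivalent isocratic time and observed width would come from every reference compound in every training gradient, and the same fit can be repeated separately for the left and right half-widths.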
4 Results and discussion
4.1 Measurement of the chromatographic signal
As indicated in Section 2.2, peak monitoring was carried out in the wavelength range between 210 and 280 nm (using eight acquisition channels separated from each other by 10 nm). The detection wavelength was selected according to two approaches. The first one made use of the “total chromatogram”, where the maximal absorbance in a certain wavelength domain is plotted versus the retention time. This chromatogram can be processed and used further as a conventional chromatogram. In the second approach, a compromise wavelength was selected balancing detectability and noise. This approach was finally preferred, and the most suitable wavelength was found to be 230 nm. At higher values, the chromatograms showed fewer peaks (i.e., the absorption was more selective), while below 230 nm the background was too disturbing, making peak tracking more difficult.
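As a small illustration of the first approach, the “total chromatogram” can be built by taking, at every time point, the maximal absorbance across the acquisition channels. The Python sketch below assumes that all channels share the same time axis; the function name and the three-point signals are hypothetical.

```python
def total_chromatogram(channels):
    """Return, for every time point, the maximal absorbance across channels.

    channels maps a wavelength (nm) to a list of absorbances, all sampled
    on the same time axis.
    """
    lengths = {len(signal) for signal in channels.values()}
    if len(lengths) != 1:
        raise ValueError("all channels must share the same time axis")
    return [max(values) for values in zip(*channels.values())]

# Tiny illustrative data set: three channels, three time points.
channels = {
    210: [0.0, 0.8, 0.1],
    230: [0.1, 0.5, 0.4],
    280: [0.0, 0.2, 0.6],
}
total = total_chromatogram(channels)
```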
Before processing the chromatograms, the baseline was removed using a Matlab function developed in our laboratory, which automates and applies the BEADS (Baseline Estimation and Denoising using Sparsity) algorithm [15]. BEADS performs a frequency-based signal decomposition to obtain three contributions: baseline, noise and net signal. The in-house software applies the algorithm in a very flexible way, allowing a successful treatment of highly complex chromatograms.
Fig. 2 shows a representative chromatogram for the linden extract, obtained with gradient G3 (see Fig. 1c). As can be observed, the assisted BEADS algorithm was successful for baseline suppression, removing almost completely the perturbation associated with the sudden increase in the gradient slope at 40 min. Fig. 3 depicts the chromatogram for the linden extract, once processed by the automatic detection algorithm after eliminating the baseline. The simulated signals included the real peak size, which was automatically measured with the Matlab function developed for signal analysis.
4.2 Construction of global bandwidth models to simulate chromatograms
As commented, the simulation of chromatograms requires, besides the availability of retention models (Section 3.2), the construction of bandwidth models to describe the peak profiles of the sample components. In this work, bandwidths were predicted based on correlations with the isocratic retention times (see Section 3.3). However, there is no direct correspondence between the bandwidths and the retention times for gradient elution; thus, an inner relationship should be established with the times the solute would experience if it migrated isocratically at the solvent composition when it leaves the column under a given gradient (the equivalent isocratic times).

Fig. 2. Chromatogram obtained for the linden extract using gradient G3 (see Fig. 1c), before (a) and after (b) baseline subtraction with the assisted BEADS algorithm.

Fig. 3. Peak detection analysis carried out with the automatic algorithm developed in the laboratory, for one of the fingerprint replicates obtained with gradient G3 for linden, after subtracting the baseline. The abscissa axis corresponds to the indices of the time vector (data acquisition frequency of five points per second).
Section 3.3 describes the protocol to obtain the parameters ω0, ω1 and ω2 in the global bandwidth model (Eq. (3)), based on gradient data. Similarly to isocratic data, the bandwidths of a set of compounds eluted under several gradients offer a parabolic trend when represented versus the equivalent isocratic retention times. Fig. 4a to c shows the bandwidth trends for the peaks of the reference compounds in the chromatograms of the extracts of the three medicinal plants. The data represented in the figure correspond to the whole set of reference compounds, eluted using all gradients in the training designs. For comparison purposes, the bandwidth data for some structurally-related compounds (a set of sulphonamides), eluted under isocratic elution, have been represented in Fig. 4d. As will be shown, the plots built for the reference compounds show trends that can be useful for the prediction of peak profiles for the chromatographic fingerprints, in spite of the intrinsically larger scattering.

Fig. 4. Width plots for: (a) green tea, (b) lemon balm, (c) linden, and (d) a set of sulphonamides. See text for details.
Medicinal plants contain compounds with a high diversity in chemical nature, which gives rise to diverse interaction kinetics with the chromatographic column. This is one of the reasons for the larger scattering observed in the correlations, compared to sulphonamides. The second reason that explains the larger scattering is that, in gradient elution, the isocratic retention times correspond to the instant the solutes leave the column. It should be noted that this happens at the beginning of the gradient, at short times, for solutes of low hydrophobicity, and at the end of the gradient for solutes of high hydrophobicity, where the elution strength is higher, giving rise to a reduction in retention times. Thus, the shorter retention times, characteristic of gradient elution, make the scattering more apparent. Note, however, that the simulations show good agreement with the experimental peaks (see Figs. 5 to 7).

It should be noted that the global bandwidth models for the reference peaks are valid for any peak in the chromatogram (the reference peaks or any other). This is not the case for the global retention models, which are initially obtained with reference peaks and must be adapted to predict the retention of any other component in the sample, as explained in Section 3.2.
Fig. 5. Comparison of the experimental chromatographic fingerprint for lemon balm, corresponding to gradient G1 (b), with the chromatograms predicted using two different base chromatograms: (a) gradient G7, and (c) gradient G3, which include a faster and a slower initial step, respectively. See Fig. 1 for the identity of gradient profiles.
4.3 Some factors affecting the simulation of chromatograms based on global models
The quality of the predictions using global models was checked by comparing experimental and predicted chromatograms for:
(i) Multi-linear gradient programs belonging to the experimental training design (Fig. 1, gradients G).
(ii) External validation gradients, with compositions exceeding the range covered by the training design (Fig. 1, gradients E). These gradients were also multi-linear, with profiles very different from those in the training design. In some cases, isocratic segments were included.
Validation gradients were used to check the prediction performance of the global models under unfavourable prediction conditions. This is the case for gradients where the program starts at modifier concentrations exceeding those used in the training design, or gradients that include isocratic segments, which are more prone to prediction errors.
4.3.1 Influence of the base chromatogram on the predictions
The construction of a global retention model, valid to predict the retention for all the components in a sample, requires the arbitrary selection of an experimental chromatogram with the maximal number of peaks (the base chromatogram, see Section 3.2). The choice of the base chromatogram critically affects the quality of predictions. If the selected chromatogram were associated with the gradient with the highest initial slope in the experimental design (e.g., gradient G7 for lemon balm, Fig. 1b), the smallest signals in the chromatogram would be higher due to the compression effect of the gradient. However, this would also favour the undesirable co-elution of neighbouring peaks. Conversely, if the chromatogram with the slowest gradient were used (e.g., gradient G3, again for lemon balm), the peaks would be better resolved, but the longer analysis time could make the smallest signals less detectable. However, if the slow ramp were followed by a steeper linear segment (as in gradient G3), the loss of detectability for the most hydrophobic components in the chromatogram would not happen. There are other factors to consider when choosing the base chromatogram, such as the differences in the prediction uncertainty of peaks eluting close to sections of the gradient with strong changes in slope.

The specific log k 0,i parameters in the global models, used to predict the chromatographic fingerprints, were calculated from the values of the retention times for all the peaks found in the base chromatogram, using the automatic peak detection and signal analysis function. The set of log k 0,i solute parameters and the parameters associated with column and solvent (which are common to all solutes) can be used to predict the chromatograms under any other gradient included inside the experimental region covered by the training design. It is interesting to note that, in total, 162, 205 and 203 peaks were detected for green tea, lemon balm and linden, respectively, with the respective base chromatograms (i.e., those obtained with the slowest gradients in their experimental designs).

Fig. 5 shows the experimental chromatogram for the lemon balm extract eluted with gradient G1, together with two predicted chromatograms (also for G1) obtained with the global model, but using two different base chromatograms: G7 and G3 (Fig. 1b). Figs. 5a and 5c show the respective predictions for both base chromatograms: the fastest (G7) and the slowest (G3) gradients in the experimental design. In general terms, the predictions were more accurate with the global model developed with G3. As can be observed, the agreement between the experimental and predicted chromatograms is excellent.
It should be indicated that the acquisition of chromatograms was carried out over a period of two months. In all the experiments, a vial containing the same extract was used, so that any chemical change in the sample produced by degradation or formation of new compounds during this period would be beyond the fitted model. Another factor to consider is that the number of peaks in the predicted chromatogram depends on the peaks detected in the base chromatogram. Thus, in the experimental chromatogram obtained with gradient G7 (where the peaks are closer), only two intermediate peaks are shown in region 4 (Fig. 5a). Consequently, if this chromatogram is used as base chromatogram, any prediction would include only two peaks within this region. However, the experimental chromatogram with gradient G1 shows at least seven peaks in region 4 (Fig. 5b). If the base chromatogram had been that obtained with gradient G3 (Fig. 5c), it would have been possible to predict the seven peaks for gradient G1.
4.3.2 Prediction of signals not associated to retained solutes
The automatic function for signal analysis naturally does no dis-
tinguish between genuine peaks and some other signals not asso-
ciated to retained solutes:
(i) Signals close to the hold-up time: These are present at the start of the chromatogram as refractive fluctuations, or as signals appearing before the hold-up time region, which are associated with carry-over phenomena or incomplete column stabilisation from a previous injection. If these signals are not discarded, they will be processed as corresponding to a fictitious solute. Since they do not follow the global retention model, the incidental prediction will fail (e.g., see region 1 in Fig. 5).
(ii) Signals associated with the sudden stop of the ramp at the end of the gradient: The sudden stabilisation of the slope at the end of the gradient (e.g., region 6 in Fig. 5) also produces refractive fluctuations, which appear at a fixed position. These signals do not correspond to the elution of any solute, but to the sudden stop of the modifier increase at the end of the gradient. Therefore, they are insensitive to changes in the gradient, as long as the gradient time t G remains constant. However, when the peaks in this region are incorrectly associated with fictitious solutes, their position becomes susceptible to changes when a gradient different from that of the base chromatogram is used. Therefore, these signals should be ignored or removed from the simulation. Analogously, sudden changes in slope in multi-linear gradients may give rise to fake peaks that should be removed.
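Removing the two kinds of signals listed above before fitting can be sketched as a simple filter on the detected peak times. This is only an illustration in Python under stated assumptions: the function name, the margins and the position of the end-of-gradient disturbance are hypothetical, and in practice that position would have to be read from the chromatogram or derived from the gradient program.

```python
def drop_non_solute_signals(peak_times, t0, disturbance_time,
                            holdup_margin=0.2, end_margin=0.5):
    """Return the peak times that are neither near the hold-up time
    nor inside the window of the end-of-gradient disturbance.

    All arguments are in minutes; the margins are hypothetical tolerances.
    """
    kept = []
    for t in peak_times:
        near_holdup = t <= t0 + holdup_margin
        near_gradient_end = abs(t - disturbance_time) <= end_margin
        if not (near_holdup or near_gradient_end):
            kept.append(t)
    return kept

# Invented peak list: two early signals, three solutes, two refractive signals.
peaks = [0.9, 1.1, 4.8, 17.3, 42.0, 61.4, 61.8]
solute_peaks = drop_non_solute_signals(peaks, t0=1.0, disturbance_time=61.5)
```

A fixed time window is adequate only while the gradient time stays constant, since the disturbance moves with the end of the ramp. Fake peaks at the nodes of multi-linear gradients would need one such window per node.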
4.3.3 Peaks with abnormal bandwidth
Some peaks have bandwidths wider than expected from their retention. These are often associated with the co-elution of two or more unresolved components, although they can have another origin. Since the bandwidth model is established with the information of peaks for single compounds, an abnormally wide peak will be predicted according to the common width trend for a single compound eluting at that position. Consequently, when global bandwidth models are applied, to keep the same area the simulated peaks will appear with a larger height than their experimental counterparts (compare the experimental and simulated peaks in Fig. 5).
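The trade-off between width and height at constant area can be made explicit with a short Python sketch. It assumes an ideal Gaussian peak, which is not necessarily the peak model used in this work, and the area and standard deviations are invented values.

```python
import math

def gaussian_height(area, sigma):
    """Height of a Gaussian peak with the given area and standard deviation."""
    return area / (sigma * math.sqrt(2.0 * math.pi))

area = 100.0               # arbitrary units, kept equal for both peaks
sigma_experimental = 0.20  # min, hypothetical broadened (co-eluting) peak
sigma_model = 0.10         # min, hypothetical width from the bandwidth trend

h_exp = gaussian_height(area, sigma_experimental)
h_sim = gaussian_height(area, sigma_model)
ratio = h_sim / h_exp      # equals sigma_experimental / sigma_model
```

Under this assumption the height is inversely proportional to the width, so a peak simulated with half the experimental width appears twice as tall.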
In order to evaluate the quality of the predictions of bandwidths, removing the consequences of possible biases in the prediction of retention times, the chromatogram for a selected gradient was predicted using itself as base chromatogram. Therefore, the peak positions were not actually predicted, only the peak profiles. According to this idea, the chromatograms associated with gradients G3 and G7 were predicted with the global retention models that included all peaks present in the experimental signal.
The experimental and predicted chromatograms are compared for both gradients G3 and G7 in Fig. 6a and b, respectively. As expected, abnormally wide peaks are predicted narrower and more intense. This is the case of regions 2 and 5 in Fig. 5, and peaks 1 and 2 in Fig. 6b. On the other hand, the refractive signals that appear at the end of the gradient (region 3 in Fig. 6b) are displaced when the gradient composition changes, as they are processed as genuine sample components. Consequently, a fictitious value of log k 0,i is assigned to these signals, and changes in composition affect their location. In the example, the simulation only includes positive areas, and therefore, both refractive peaks are positive. These signals can be easily identified and removed if wished.
4.4 Validation of chromatograms obtained with external multi-linear gradients
Experimental chromatograms corresponding to multi-linear gradients outside the training design (i.e., not used to build the global models) were also simulated with the aim of verifying the prediction performance under less favourable conditions. These validation gradients are shown in Fig. 1 for the samples of green tea (gradient E8), lemon balm (E8) and linden (E7 and E8). The external validation runs were carried out after the acquisition and modelling steps, usually two weeks after the experimental design was completed. For a more realistic comparison, the baseline contribution, initially subtracted by the BEADS algorithm, was added to the predicted chromatograms (Fig. 7).
In the chromatogram for green tea, some experimental peaks are predicted abnormally narrow (e.g., peaks 1 and 2 in Fig. 7a), since they are processed as genuine peaks associated with a single component when they are predicted with the global bandwidth model. Observe that the bandwidths of these experimental signals differ from the trend observed for the neighbouring peaks. Therefore, the abnormally broad peaks may be the result of co-elution of two or more components. Other medicinal plants and gradients also showed sporadic broader peaks (e.g., peak 4 in gradient E8 for linden, in Fig. 7d). The shift towards shorter times of the peaks associated with the refractive signals, at the end of the gradient, is equally perceptible in the chromatograms. The profile and position of the experimental refractive disturbance R 1, for the three plants, must be compared with the R 2 + R 3 signals in the predicted chromatograms. These chromatograms were obtained by adding the fictitious peaks that model the refractive disturbance to the baseline found by BEADS. Some differences observed between experimental and predicted chromatograms may be attributed to a slow degradation of the samples over the weeks, which would have been solved by the periodic renewal of the solutions. It should be noted that the base chromatograms were acquired several days before performing the validation experiments. Therefore, certain peaks are present in some experimental chromatograms, but not in others. However, most peaks retain their original presence and intensity.
It should also be taken into account that the validation gradients include isocratic segments, followed by other segments with strong increases in slope. This type of configuration makes the position of the signals more uncertain, and the effects are cumulative along the gradient. Region 3 in the chromatogram of linden, obtained with gradient E8 (Fig. 7d), illustrates this behaviour as a shift in the sequence of peaks. The magnitude and sign of the shift depend on the particular gradient configuration.
A similar effect (region 3 in Fig. 7b), but amplified due to a steeper gradient slope (gradient E8, see Fig. 1b), is observed around the node for lemon balm, close to 40 min. This strong variation in the eluent composition, together with the progressively higher uncertainties in peak position (typical of slower solutes), results in dissimilar bandwidths for relatively close peaks. It can be seen that the first two peaks in region 3 for the experimental chromatogram (Fig. 7b), which elute in the isocratic segment of the gradient program (before the change in slope), give rise to broader bandwidths. According to the global model, the compounds associated with these peaks are slightly more hydrophobic with regard to the experimental ones; therefore, they are predicted with longer retention. However, since these peaks are located close to a steep change in gradient slope, the slightly higher value of the predicted log k 0,i (related to solute hydrophobicity) means that they are reached by the next segment of steeper slope in the gradient when they leave the column. This accelerates the elution of these peaks, and consequently, they are compressed. Therefore, the five peaks in region 3 for gradient E8 are correctly predicted considering their bandwidth, but experience gradual biases in position.
Fig. 6. Comparison between the experimental (above, blue) and predicted (below, red) chromatograms for lemon balm, obtained with gradients: (a) G7, and (b) G3 (see Fig. 1). The same gradients were also used as base chromatograms.
Finally, it should be noted that for green tea and lemon balm, the composition range scanned by the validation set at the beginning of the gradient is out of the domain covered by the training design (16.4% acetonitrile for green tea and 23% for lemon balm, see gradient E8 in Fig. 1a and 1b). This means that for the least retained compounds, the gradients will not reach such high concentrations in the first few minutes, and therefore, prediction of the retention for these compounds will be based on extrapolations.
The more polar components in the samples, which elute at the start of the gradient, are more sensitive to the lack of information, being thus affected by larger uncertainties. Since the validation gradients for green tea and lemon balm start with isocratic elution, this problem is magnified. Nevertheless, in spite of this limitation, the predicted and experimental chromatograms show good agreement.
5 Conclusions
This work deals with the suitability of global models to simulate chromatograms containing hundreds of components, which can be useful for optimisation purposes. In Part II, the global retention models, obtained in Part I [13] for selected compounds in chromatographic fingerprints, are extended to include all components in the sample. To do this, the retention data for all detected peaks, found in the chromatogram associated with the assayed gradient containing the lowest initial slope, were included in the model. Global models allow the prediction of highly complex chromatograms under different gradient conditions, with a remarkable level of approximation to reality. The approach has been verified with excellent results for the extracts of three medicinal plants, with chromatograms affected by specific problems. In order to get safer detection of the smallest peaks, a baseline correction algorithm was applied, followed by an unsupervised, laboratory-built MATLAB function for peak detection.
In the construction of conventional individual retention models, all the parameters obtained by fitting the retention data are specific to a given solute, since each is fitted independently. As a consequence, when the specific solute parameters (log k 0,i) are compared, these are unevenly affected by their chemical nature. In contrast, in global models, the regression process isolates the common column/solvent effects from those specific to each solute. This makes the estimation of solute hydrophobicity less dependent on the particular interactions of the analytes. Consequently, the contribution of each solute to retention is better ranked [13].
Although the prediction of the retention behaviour using a global model implies losing some solute specificity, which is distinctive of the individual models, the loss in prediction performance is acceptable. The main limitation of our proposal (and in general of global models in their current state) is that changes in the elution order of the components in the sample, with the com-
Fig. 7. Comparison between the experimental (above, blue) and predicted (below, red) chromatograms obtained for the three medicinal plants, corresponding to validation gradients: (a) green tea obtained with gradient E8 (see Fig. 1), (b) lemon balm with gradient E8, (c) linden with gradient E7, and (d) linden with gradient E8.