simulations
Ronald Colin Jäpel a, Johannes Felix Buyel a, b, ∗
a Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Forckenbeckstrasse 6, Aachen 52074, Germany
b Institute for Molecular Biotechnology, RWTH Aachen University, Worringerweg 1, Aachen 52074, Germany
Article history:
Received 24 April 2022
Revised 27 July 2022
Accepted 29 July 2022
Available online 9 August 2022
Keywords:
Ion-exchange chromatography
Mechanistic model
Numeric optimization
Parameter estimation
Steric mass action (SMA) model
The modeling of chromatographic separations can speed up downstream process development, reducing the time to market and corresponding development costs for new products such as pharmaceuticals. However, calibrating such models by identifying suitable parameter values for mass transport and sorption is a major, time-consuming challenge that can hinder model development and improvement. We therefore designed a new approach based on Bayesian optimization (BayesOpt) and Gaussian processes that reduced the time required to compute relevant chromatography parameters by up to two orders of magnitude compared to a multistart gradient descent and a genetic algorithm. We compared the three approaches side by side to process several internal and external datasets for ion exchange chromatography (based on a steric mass action isotherm) and hydrophobic interaction chromatography (a modified version of a recently published five-parameter isotherm) as well as different input data types (gradient elution data alone vs. gradient elution and breakthrough data). We found that BayesOpt computation was consistently faster than the other approaches when using either single-core or 12-core computer processing units. The error of the BayesOpt parameter estimates was higher than that of the competing algorithms, but still two orders of magnitude less than the variability of our experimental data, indicating BayesOpt's applicability for chromatography modeling. The low computational demand of BayesOpt will facilitate rapid model development and improvement even for large datasets (e.g., > 100 proteins) and increase its suitability for research laboratories or small and medium enterprises lacking access to dedicated mainframe computers.
© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1 Introduction
Chromatography is widely used for the purification of biopharmaceutical proteins [1–3] but can be a major cost driver during production and process development [4]. Such costs can be limited by the model-driven optimization of chromatographic separation, which reduces experimental screening to the most relevant operational conditions. Modeling can also improve process understanding and can facilitate adaptive process control [5].
Chromatography models often consist of a mass transport component that can be simulated using the general rate model [6,7] and an adsorption component, describing protein interaction with a stationary phase, using isotherms such as the steric mass action (SMA) model [8]. Both components require the calibration of several parameter values so that the model results match the experimental data. However, current calibration algorithms, such as multistart gradient descent, genetic algorithms and Markov chain Monte Carlo, require extensive computational time to identify appropriate sets of parameter values. This is a bottleneck hindering the widespread application of model-based process development because the necessary computational infrastructure is often available only to specialized research facilities or large companies. Accordingly, research laboratories as well as small and medium enterprises would be empowered to use chromatography modeling tools if the computational time could be reduced. This could be achieved by combining Gaussian processes (GP) and Bayesian optimization (BayesOpt).

∗ Corresponding author at: Institute for Molecular Biotechnology, RWTH Aachen University, Worringerweg 1, Aachen 52074, Germany.
E-mail address: johannes.buyel@rwth-aachen.de (J.F. Buyel)
https://doi.org/10.1016/j.chroma.2022.463408
A GP is a likelihood distribution over functions covering (multiple) continuous dimensions, such that every collection of values drawn from the GP is a multivariate Gaussian distribution [9,10]. In GP regression, a GP can be used to approximate an unknown function by estimating the expected function values and the associated uncertainties based on a (small) set of known data points in the parameter space. During BayesOpt, GP regression can therefore be used to identify extrema of unknown functions, which are called objective functions in this context. First, a GP is fitted to a set of initial data points, i.e. parameter combinations at which the objective function has been evaluated. Then the mean and variance predicted by the GP for each point in the parameter space are combined using an acquisition function to select the next point at which the objective function should be evaluated [11]. The acquisition function can balance exploitation, i.e. focusing parameter improvement near the current optimal region, and exploration, i.e. focusing on regions of the objective function where uncertainty is high and global optima might be hidden.
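The fit–predict–acquire loop described above can be sketched with scikit-learn's GaussianProcessRegressor. The toy objective, grid-based acquisition search and value of the exploration–exploitation factor κ are illustrative assumptions, not the implementation used in this work:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic

def objective(x):
    # Stand-in for an expensive simulation; true optimum at x = 0.3.
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(5, 1))      # initial design points
y = np.array([objective(v[0]) for v in X])  # initial evaluations

gp = GaussianProcessRegressor(kernel=RationalQuadratic(),
                              normalize_y=True, alpha=1e-6)
grid = np.linspace(0.0, 1.0, 501).reshape(-1, 1)
kappa = 2.0  # exploration-exploitation tradeoff

for _ in range(20):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    acq = mu - kappa * sigma            # lower confidence bound (minimize)
    x_next = grid[np.argmin(acq)]       # next point to evaluate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

x_best = X[np.argmin(y), 0]
```

In a real chromatography application the cheap quadratic above would be replaced by a full column simulation, which is exactly why each additional evaluation is worth the overhead of refitting the GP.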
We reasoned that BayesOpt can therefore be applied to the inverse fitting of chromatography parameters to be used in simulations in a multi-step process. First, one or more objective functions are defined that can capture the performance of a parameter fit. For example, the time offset between the maxima of an experimentally determined protein elution peak and the corresponding simulated peak can reveal how well the parameters of the underlying model were estimated (e.g., isotherm and mass transport parameters). These objective functions are then evaluated for an initial set of points distributed across the parameter search space. In the context of chromatography, an objective function evaluation is equivalent to simulating protein binding and elution, and each point of evaluation corresponds to a combination of parameter value estimates for isotherm and mass transport. Then, a GP is created for each objective function and each GP is fitted to the initial set of evaluation results. Thereafter, the mean and variance estimates of the GPs are combined in a single acquisition function and a new set of parameter values is selected to be evaluated next. The resulting new values of the objective functions (e.g., the offset between the experimental and simulated peak maxima) are then added to the data collection available to the respective GPs, and the mean and variance estimates are updated and used to choose the next combination of chromatography parameters for evaluation through simulation.
This BayesOpt procedure has been shown to be advantageous over other regression methods and converges to the global optimum faster than these if a set of prerequisites is met [12,13]. Specifically, (i) there are few data points available because the creation of data (i.e., objective function evaluation, here: simulating a chromatographic separation) is time consuming, (ii) estimates of uncertainty are of interest, and/or (iii) the shape of the objective functions is unknown but their smoothness is similar in all dimensions [11]. In contrast, the performance of BayesOpt may suffer (i) if the number of data points increases, because the computation of the GPs scales with O(n³), and/or (ii) if the objective functions are not smooth or their smoothness varies locally [14].
Here we present a novel method for the calibration of chromatography models using GPs. Specifically, we propose three new approaches for BayesOpt to mitigate the performance issues that arise if the objective function is not smooth, has regions varying in smoothness, or if large numbers of data points must be considered simultaneously. First, we developed the concept of directional objective functions. Second, we aggregated multiple directional objective functions into a combined objective function. Third, we incorporated dimensional trimming to reduce the calculation time as the number of data points in the GP increases. We applied these approaches to the simultaneous determination of mass transport and isotherm parameters in the context of protein chromatography simulations. As an isotherm, we used either the well-established steric mass action (SMA) model for ion exchange chromatography [8] or a novel isotherm for hydrophobic interaction chromatography (HIC).
2 Materials and methods
2.1 Computational hardware
All computations were run on Intel Xeon E5-2630 v3 computer processing units (CPUs) with 3.5 GB random access memory (RAM) per CPU core.
2.2 Chromatography simulations
All chromatography simulations were computed using CADET software [15,16,34,35]. We compiled the binaries based on CADET release 3.1.2, adding a hydrophobic interaction isotherm modified from the original version [17]. Individual simulations were set up in CADET as a three-unit-operation model comprising the inlet, column and outlet. Target chromatograms were generated in CADET using the parameter values specified in Table S1. The calculated protein concentration at the outlet unit (mol per cubic meter over time) was saved to .csv files. The CADETMatch package v0.6.23 (commit 873a81c3b6f593313212c243018b7e5122d770c3) obtained from https://github.com/modsim/CADET-Match/releases was the latest available version at the time of this study and was used to handle genetic algorithm parameter fitting and multistart gradient descent parameter fitting [18,17]. Hyper-parameters for these algorithms were taken from the examples distributed with CADETMatch in the "Example 2" folder. The dextran example from the same source was used for the non-pore-penetrating tracer datasets, the NonBindingProtein example was used for the pore-penetrating tracer datasets, and the Isotherm example was used for the SMA datasets. For HIC parameter fitting, the hyper-parameters from the "Example 2/Isotherm" example were used with a genetic algorithm generation-population size of 50 instead of 20, based on the software creator's advice.
We maintained two separate conda virtual environments for (i) our BayesOpt and gradient descent algorithms as well as (ii) the CADETMatch package, to prevent package conflicts. All calculations were started in the BayesOpt virtual environment. For CADETMatch evaluations, we used the Popen class of the subprocess module to start a new process in which we activated the second environment and ran the CADETMatch calls in that process. The additional overhead time (∼0.2 s) was subtracted from all CADETMatch results before comparing the performance with other algorithms.
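A minimal sketch of this pattern, assuming the second environment is addressed by the path of its Python interpreter (the helper name and error handling are our own, not the code used in the paper):

```python
import subprocess
import sys

def run_in_env(python_exe, code):
    """Run a snippet of Python code with a different interpreter, e.g.
    the one installed in a second conda environment, and return stdout.
    `python_exe` would be the path to that environment's interpreter."""
    proc = subprocess.Popen(
        [python_exe, "-c", code],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"subprocess failed: {err}")
    return out

# Example with the current interpreter standing in for the second one:
result = run_in_env(sys.executable, "print('CADETMatch call would run here')")
```

Spawning a fresh process this way keeps the two environments' package sets fully isolated at the cost of the small startup overhead noted above.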
2.3 Algorithm performance comparison
Algorithm performance was compared based on (i) the duration, (ii) the parameter estimation error calculated as the Euclidean distance of fitted-to-target parameter values in a normalized (0–1) multi-parameter space, and (iii) the relative sum squared error (rSSE) of a simulation using the fitted parameter values compared to the target curves. The rSSE was calculated by taking the sum of squared errors (SSE) and dividing it by the total sum of squares (TSS) of the target curve (Eqs. (1)–(3)).

rSSE = SSE / TSS    (1)

SSE = Σ_{i=0}^{n} (y_i − ŷ_i)²    (2)

TSS = Σ_{i=0}^{n} (y_i − ȳ)²    (3)

(ȳ denotes the mean of the target curve.)
where y represents the target values and ŷ represents the simulation results using the fitted parameter values for all n data points. The sample size n ranged from 206 to 20,001 depending on the simulation.

Dividing the SSE by the TSS compensated for differences in signal scale between the elution and breakthrough experiments. From this point onward, SSE always refers to the sum of squared errors between a target chromatogram and a simulated chromatogram.
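The rSSE metric can be computed directly from the two curves; this sketch assumes the TSS is taken about the mean of the target curve, as in a conventional coefficient-of-determination denominator:

```python
import numpy as np

def rsse(y_target, y_sim):
    """Relative sum of squared errors (Eqs. 1-3): SSE between target and
    simulated curves, normalized by the total sum of squares of the
    target curve (taken here about the target mean)."""
    y_target = np.asarray(y_target, dtype=float)
    y_sim = np.asarray(y_sim, dtype=float)
    sse = np.sum((y_target - y_sim) ** 2)
    tss = np.sum((y_target - y_target.mean()) ** 2)
    return sse / tss
```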
2.4 Example isotherms for performance testing
Ion exchange chromatography was simulated using the SMA isotherm [8,15,16] that describes the change in protein bound to the stationary phase dq_i over time dt while accounting for the salt concentration, the number of protein–ligand interactions and shielded binding sites (Eqs. (4) and (5)).

dq_i/dt = k_a,i · c_p,i · (q̄_0)^ν_i − k_d,i · q_i · c_s^ν_i    (4)

q̄_0 = Λ − Σ_{j=1}^{N_comp−1} (ν_j + σ_j) · q_j    (5)

where t is the time, q_i is the concentration of the ith protein bound to the stationary phase, k_a,i is the adsorption constant of the ith protein, c_p,i is the soluble concentration in the particle pores of the ith protein, q̄_0 is the number of free binding sites on the stationary phase, ν_i is the characteristic charge of the ith protein, k_d,i is the desorption constant of the ith protein, c_s is the salt concentration in the mobile phase, Λ is the total ionic capacity of the stationary phase, and σ_i is the shielding (steric) factor of the ith protein.

It is useful to divide Eq. (4) by k_d and define k_a/k_d as k_eq and the reciprocal of k_d as k_kin, which results in Eq. (6):

k_kin,i · dq_i/dt = k_eq,i · c_p,i · (q̄_0)^ν_i − q_i · c_s^ν_i    (6)
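The right-hand side of the reformulated kinetics (Eq. (6)) can be evaluated as follows; the function and argument names are illustrative assumptions, not CADET's internal interface:

```python
import numpy as np

def sma_dq_dt(q, c_p, c_s, Lambda, nu, sigma, k_eq, k_kin):
    """Right-hand side of the reformulated SMA kinetics (Eq. 6) for n
    protein components; q, c_p, nu and sigma are length-n arrays, c_s
    and Lambda are scalars, k_eq and k_kin may be scalars or arrays."""
    q, c_p = np.asarray(q, float), np.asarray(c_p, float)
    nu, sigma = np.asarray(nu, float), np.asarray(sigma, float)
    q0_bar = Lambda - np.sum((nu + sigma) * q)  # free binding sites (Eq. 5)
    return (k_eq * c_p * q0_bar ** nu - q * c_s ** nu) / k_kin
```

In a simulation this function would be handed to an ODE integrator for the bound-phase concentrations, coupled to the mass transport equations of the general rate model.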
For the simulation of HIC, a previously described isotherm [17] was modified (Eq. (7)) to enable its integration into the CADET framework, which will be published separately.

k_kin · dq/dt = k_eq · c_p · (1 − q/q_max)^m − q · β^m    (7)

where m is the number of binding sites, q_max is the maximum binding capacity, and β is the number of bulk-like water molecules that stabilize all m binding sites. The parameter β is calculated using Eq. (8):

β = β_0 · e^(c_s · β_1)    (8)

where β_0 is the number of bulk-like water molecules at infinite dilution and β_1 is a scaling factor that describes the influence of the salt concentration on the number of bulk-like water molecules.
2.5 Statistical testing
All groups of replicated results were assessed for normality using a Shapiro–Wilk test (α ≤ 0.05) as computed with scipy.stats.shapiro [20]. Normally distributed data were analyzed using a two-sample, two-sided Welch's t-test computed with scipy.stats.ttest_ind, whereas non-normally distributed data were analyzed using a Kruskal–Wallis H-test computed with scipy.stats.kruskal (α ≤ 0.05 in both cases). The sample size was n = 6 when comparing durations and n = 12 when comparing parameter estimation errors. In all figures, asterisks indicate significance: ∗ p ≤ 0.05 (significant), ∗∗ p ≤ 0.01 (highly significant), ∗∗∗ p ≤ 0.001 (most significant). In the figures and tables, a superscript w indicates the application of Welch's t-test whereas a superscript k indicates the application of the Kruskal–Wallis H-test.
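The test-selection logic of this section can be sketched as follows (the helper name and return convention are our own assumptions):

```python
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Pick the test as in Section 2.5: Shapiro-Wilk normality check on
    both groups, then Welch's t-test if both pass, otherwise a
    Kruskal-Wallis H-test. Returns the test name and its p-value."""
    normal = (stats.shapiro(a)[1] > alpha) and (stats.shapiro(b)[1] > alpha)
    if normal:
        return "welch", stats.ttest_ind(a, b, equal_var=False)[1]
    return "kruskal", stats.kruskal(a, b)[1]
```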
2.6 Calculation of objective functions
The agreement between simulated and target chromatograms was quantified using three case-dependent objective functions. Tracers and gradient elution peaks were assessed based on (i) the retention time difference between the peaks, (ii) the height difference at the peak maximum, and (iii) the peak skew difference (Fig. 1A). The retention time difference was evaluated by calculating the cross-correlation between the target peak and the simulated peak using scipy.signal.correlate [20]. The offset with the highest correlation coefficient was used as the time difference, as previously explained [18,19].
The height difference was calculated using Eq. (9):

Δheight = (max(y_target) − max(y_sim)) / max(y_target)    (9)

where max(y_sim) is the maximum value of the simulated peak and max(y_target) is the maximum value of the target peak. The peak skew was calculated by first treating the peaks as probability distributions, dividing them by the area under the curve, and then applying Eq. (10):

skew = (μ − ν) / σ    (10)

where μ is the distribution mean, ν is its median and σ is the standard deviation. The difference in skewness was calculated as the skew of the simulated chromatogram minus the skew of the target chromatogram, as shown in Eq. (11):

Δskew = skew_sim − skew_target    (11)
Breakthrough curves were compared based on (i) the difference in the maximum concentration, (ii) the difference in the time required to reach 50% of the maximum concentration, and (iii) the difference in the time required to increase from 50% to 98% of the maximum concentration (Fig. 1B). All three values were calculated as percent differences relative to the target chromatograms (Eq. (12)).

Δ = (t_target − t_sim) / t_target    (12)

where Δ is the value of the objective function, t_target is the metric for the target chromatogram (e.g., the time taken to reach 50% of the maximum concentration) and t_sim is the same metric for the simulated chromatogram. The independent variables of all objective functions were scaled to [−1, 1] using Eq. (13) to improve the numerical stability of the algorithms.

x′ = 2 · (x − x_min) / (x_max − x_min) − 1    (13)

where x′ is the scaled independent variable of an objective function, x is the original unscaled variable, and x_min and x_max are the bounds of the search space in that dimension. The source code is available on GitHub (https://github.com/ronald-jaepel/ChromBayesOpt).
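Two of the peak-based objective functions can be sketched as below. The sample-index offset convention and helper names are illustrative assumptions, not the exact ChromBayesOpt implementation:

```python
import numpy as np
from scipy.signal import correlate

def time_offset(y_target, y_sim):
    """Signed offset (in samples) at which the cross-correlation of the
    simulated and target peaks is maximal; positive values mean the
    simulated peak appears later than the target peak."""
    corr = correlate(y_sim, y_target, mode="full")
    return int(np.argmax(corr)) - (len(y_target) - 1)

def height_diff(y_target, y_sim):
    """Relative peak-height difference (Eq. 9)."""
    return (np.max(y_target) - np.max(y_sim)) / np.max(y_target)
```

Note that both functions are directional in the sense introduced in Section 3.1: their optimum is zero and their sign indicates the direction of the mismatch.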
2.7 Calculation of GPs in Python
We used the GaussianProcessRegressor class from sklearn.gaussian_process to calculate all GPs [21]. To aggregate multiple objective functions, we modified a previously published [22] BayesOpt algorithm by overwriting its BayesianOptimization class with a class that can handle multidimensional objective functions. This new class was created to hold a GP for each objective function while exposing only a single GP fit and a single GP predict method. Python's duck-typing allows the new class to seamlessly replace the regular GaussianProcessRegressor class from sklearn.gaussian_process. We also extended the new class to allow the transformation of the independent variable (x) space to a unit hypercube, which improves the numeric stability as discussed above. The rational quadratic kernel was chosen for all subsequent optimizations because it generated the highest log marginal likelihood compared to all other available kernels, as calculated using the log_marginal_likelihood method of the GaussianProcessRegressor class of the scikit-learn python package [21] on several sample datasets [23,24]. This is desirable because the log marginal likelihood describes the probability of the observed data given the assumed model, i.e., the kernel.

Fig. 1. Graphical representation of the six objective functions used to assess the quality of chromatographic simulation results, specifically the coincidence of experimental and simulated (gradient elution) peaks and breakthrough curves. A: Gradient elution peaks were compared based on differences in peak retention time, peak height and skew. B: Breakthrough curves were compared based on the difference in the maximum concentration peak height, the time to reach 50% of that concentration, and the time required to increase from 50% to 98% of the maximum concentration.
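This kernel-selection procedure can be reproduced on a synthetic dataset; the data and the kernel shortlist below are assumptions standing in for the sample datasets used in the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

# Synthetic sample data standing in for the datasets used in the paper.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=20)

scores = {}
for name, kernel in [("RBF", RBF()), ("Matern", Matern()),
                     ("RationalQuadratic", RationalQuadratic())]:
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    # Log marginal likelihood at the fitted hyperparameters.
    scores[name] = gp.log_marginal_likelihood(gp.kernel_.theta)

best_kernel = max(scores, key=scores.get)
```

Because the log marginal likelihood already penalizes overly flexible kernels, comparing it across candidates is a cheap form of model selection that requires no held-out data.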
2.8 Hyperparameter optimization
Hyperparameters (Table 1) for the dimensional trimming algorithm (Section 3.4) were optimized using an I-optimal design of experiments (DoE) approach with 382 runs of third-order polynomial complexity built in Design-Expert v13 [25]. DoE parameter ranges were chosen based on a set of scouting and screening experiments (data not shown) and the response was the computational time required by BayesOpt to estimate all SMA isotherm parameters (Section 4.2). The "IEX Preliminary test" dataset was used as a reference task (Table S1). Non-significant parameters were eliminated from the model by automated backwards selection using a p-value threshold of 0.05. The final model achieved an R² of 0.699, an adjusted R² of 0.694 and a predicted R² of 0.688, indicating a suitable model quality (Table S2). Numerical minimization of the response (computational time) resulted in the optimal parameter settings shown in Table 1.
2.9 Testing BayesOpt with experimental chromatography data
Exocellobiohydrolase 1 (CBH1; UniProt ID P62694) derived from the Trichoderma reesei preparation Celluclast 1.5 (Novozymes A/S, Bagsværd, Denmark) was purified from a 1:20 v v−1 dilution with equilibration buffer (25 mM sodium phosphate, pH 7.5). Purification was performed using a 46 mL Q Sepharose HP XK26/20 column (GE Healthcare, Chicago, USA) mounted to an ÄKTA pure 25 M system (Cytiva, Marlborough, USA). The column was equilibrated with five column volumes (cv) of equilibration buffer, followed by loading 0.2 L (∼5 cv) of the Celluclast dilution. We then applied 5 cv of equilibration buffer for washing, followed by a step-wise elution (25 mM sodium phosphate, 1.0 M sodium chloride (∼50 mS cm−1), pH 7.5) including elution steps at 23.0, 26.0, and 50.0 mS cm−1. The flow rate was 10.0 mL min−1 (11.6 m h−1) and 4.0 mL fractions were analyzed by lithium dodecylsulfate polyacrylamide gel electrophoresis (LDS-PAGE) [26,1]. Fractions containing CBH1 were pooled and had a purity of 98% as per densitometric analysis. The pooled sample was buffer exchanged into sample buffer (25 mM sodium phosphate, 25 mM sodium chloride, pH 7.0, 7.00 mS cm−1) using a Vivaspin filter (Sartorius, Göttingen, Germany), and the CBH1 concentration was 3.78 mg L−1 based on a microtiter-plate Bradford assay (Thermo Fisher Scientific Inc., USA) [27,2]. We loaded 1.0 or 35.2 mL of purified CBH1 for gradient elution and frontal experiments, respectively, using a 1 mL Q Sepharose HP pre-packed column (Cytiva) mounted to a dedicated ÄKTA pure 25 L system (Cytiva). The column had been equilibrated for 10 cv in the modeling equilibration buffer (25 mM sodium phosphate, 25 mM sodium chloride, pH 7.0) before sample loading and was washed for 5 cv using the same buffer after sample loading. Linear gradient elutions of CBH1 were carried out over 5, 30 or 120 cv up to 100% elution buffer (25 mM sodium phosphate, 500 mM sodium chloride, pH 7.0). Protein elution was monitored as ultraviolet light absorption at 280 nm. The flow rate was 0.50 mL min−1 (7.80 m h−1) at all times.

Table 1
DoE for hyperparameter optimization of dimensional trimming applied during BayesOpt.
Starting value of the exploration–exploitation factor κ — κ_start — Numeric — 0.0 — 1.0 — 0.0
The resulting chromatograms were preprocessed by removing the salt-induced drift in the UV measurements. A linear correlation between the UV absorption and the conductivity signal was estimated based on the data points during the wash steps both before and after the gradient elutions. Using this correlation, the UV signal was corrected for each data point according to the conductivity measured at that point. An exponentially modified Gaussian distribution (Eqs. (14) and (15)) [28,3] was fitted to the chromatogram to remove noise and impurities from the signal.
f(x; h, μ, σ, τ) = h · (σ/τ) · √(π/2) · exp(−0.5 · ((x − μ)/σ)²) · erfcx((1/√2) · (σ/τ − (x − μ)/σ))    (14)

erfcx(x) = exp(x²) · (2/√π) · ∫_x^∞ e^(−θ²) dθ    (15)

where x is the retention time, f(x) is the UV signal, μ is the mean of the Gaussian component, σ is the standard deviation of the Gaussian component, h is the height of the Gaussian component, τ is the exponential relaxation time, θ is the pseudo-variable over which erfcx is integrated and erfcx is the scaled complementary error function.
The distribution parameters were estimated using the curve_fit method from scipy.optimize [20]. The resulting distribution was used as a concentration profile and was subjected to the same parameter fitting described for synthetic data above.
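Eqs. (14) and (15) combine into a numerically stable form built on scipy.special.erfcx, which can then be fitted with curve_fit; the synthetic peak and parameter values below are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erfcx

def emg(x, h, mu, sigma, tau):
    """Exponentially modified Gaussian in the numerically stable erfcx
    form (Eqs. 14 and 15)."""
    z = (sigma / tau - (x - mu) / sigma) / np.sqrt(2.0)
    return (h * sigma / tau * np.sqrt(np.pi / 2.0)
            * np.exp(-0.5 * ((x - mu) / sigma) ** 2) * erfcx(z))

# Fit the distribution to a synthetic, noise-free peak.
x = np.linspace(0.0, 20.0, 400)
true_params = (2.0, 8.0, 1.0, 2.0)  # h, mu, sigma, tau (assumed values)
y = emg(x, *true_params)
popt, _ = curve_fit(emg, x, y, p0=(1.5, 7.0, 1.5, 1.5))
```

The erfcx formulation avoids the overflow that the naive exp·erfc product suffers in the peak tails, which is why it is preferred for fitting measured chromatograms.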
3 Theory and calculation
3.1 Directional objective functions
As stated above, BayesOpt performs best using smooth objective functions. The objective function most often chosen for the inverse fitting of chromatography models is a minimization of the SSE of the protein elution concentration profile between the experiment and the simulation [19]. The SSE objective function has multiple local minima and multiple abrupt changes in slope. For example, running simulations with varying k_eq and a true k_eq of 1.00 (other parameters follow the "IEX Preliminary test" in Table S1) resulted in a local minimum of the SSE at a k_eq of 10⁻⁴ and a sharp drop towards the global minimum at a k_eq of 1.00 (Fig. 2A and D; note the log10 scale of the x-axis). Accordingly, the SSE objective function is not well suited for BayesOpt.

Alternatively, the absolute value (i.e., the magnitude) of the time difference between the simulated and target peaks may be used to assess the quality of fitted parameter values (Fig. 2B and E). This function contains only a single global minimum to which minimizing algorithms will converge regardless of the starting conditions. However, the function cannot be differentiated in that minimum, which is characterized by an abrupt change in slope. This property compromises the objective function's smoothness and thus impedes the performance of GPs, as discussed above.

In contrast, the actual value of the time offset forms a smooth objective function (Fig. 2C and F) and has the additional benefit of providing information on whether a simulated peak appears "earlier" or "later" than the target peak, with the optimum being zero. Here, we introduce the term 'directional objective function' for objective functions whose optimum is zero and that yield suboptimal values in both the negative and positive number space. Hence, they provide additional information showing in which direction a parameter value should be modified for optimization. However, directional objective functions introduce a computational challenge because their optimum is not a minimum or maximum and thus cannot be identified effectively using any optimizer available to us. A new optimization algorithm is therefore required to identify the optimum, specifically the parameter value(s) that optimize the agreement between simulated and experimental data. We therefore developed an approach to construct such an algorithm.
3.2 Adapting the acquisition function to directional objective functions
Fig. 2. Objective functions for parameter estimation and their approximations using a Gaussian process during Bayesian optimization, with the SMA parameter k_eq as an example. A and D: Sum squared error (SSE) between the elution concentration profiles of simulated and target peaks. B and E: Absolute value of the relative time difference between the simulated and target peaks. C and F: Relative time difference between the simulated and target peaks. The top row represents the beginning of the parameter fitting when coverage of the parameter space is sparse (four data points), whereas the bottom row represents a state close to the end of the optimization with nine data points (including three close to the optimum) added to the Gaussian processes.

During BayesOpt, an acquisition function is used to choose the next point in the parameter space for evaluation using the objective function(s). Common examples include the upper confidence bound, the expected improvement, and the probability of improvement [11]. The probability of improvement is the likelihood that the objective function at a point in the parameter space that has previously not been evaluated will yield a better parameter value estimate than the best value known up to that iteration in the optimization process. The expected improvement quantifies the result by multiplying the likelihood by the relative improvement that can be gained compared to the previous optimum. Both probability functions return zeros for large fractions of the parameter space if
BayesOpt is close to completion (i.e., the actual optimum), because the probabilities of improvement in certain regions approach zero. Specifically, the ratio of the GP uncertainty to the objective function's output range becomes very small. Accordingly, the function in these regions does not have a slope that would point to the optimum, which is therefore difficult to identify at this stage because new points for evaluation are identified inefficiently. In contrast, the upper confidence bound does not suffer from this limitation because it returns non-zero values even within undesirable regions. We therefore used the upper confidence bound as an acquisition function for the BayesOpt algorithm but replaced its default formula (Eq. (16)) with that of the lower confidence bound (LCB) as shown in Eq. (17), which allowed us to construct the objective functions as minimization tasks (Section 2.6).

UCB(μ, σ) = μ + σ · κ    (16)

LCB(μ, σ) = μ − σ · κ    (17)

where μ is the mean of the GP, σ is the GP standard deviation and κ is the exploration–exploitation tradeoff factor, with high values of κ favoring the exploration of regions with high uncertainties over regions close to the values currently yielding the best results with respect to the objective function(s). However, in the form of Eq. (17), the mean GP value can be negative for sub-optimal parameter conditions, e.g. a simulated peak appearing earlier than
its experimental counterpart. Using the absolute value of the GP mean would create a minimum at the optimal function value, but this absolute value function cannot be differentiated when the dependent variable is zero, effectively impeding the performance of the gradient-based local optimization of the acquisition function. We therefore approximated the absolute value function by Eq. (18), which is differentiable at each point and has a maximum deviation from the true absolute value of 5 × 10⁻⁹. This difference was considered negligible because the range of the objective functions was scaled to span from −1.0 to 1.0 in order to maximize the numerical stability of the algorithm (see above).

f(μ, σ) = μ² / √(μ² + 10⁻⁸) − σ · κ ≈ |μ| − σ · κ    (18)

We chose not to transform the standard deviation of the objective functions into the statistically correct folded normal distribution because the latter compromised key aspects of the acquisition function when the normal distribution's range crossed below zero. Specifically, regions with high uncertainty were deemed less favorable for exploration by the acquisition function when using the folded normal distribution, effectively contradicting the purpose of exploration (Fig. 3A). Instead, we used the untransformed uncertainty of the objective function(s). Even though this caused some results to predict negative error values that should be impossible in theory (Fig. 3, shaded areas below zero), the acquisition function successfully chose the expected locations of interest and the algorithm converged to the correct parameter values.
Fig. 3. Evaluation of the lower confidence bound (LCB) acquisition function in three scenarios of Gaussian processes (GP) using either an untransformed uncertainty (normal distribution, green) or the formally correct folded normal distribution (orange). A: Scenario with a constant mean (blue line) and varying uncertainty (shaded area). The LCB with a folded normal distribution disregards regions of high uncertainty in the GP and is therefore not useful to identify the next parameter value to be evaluated. B: Scenario with a varying mean and constant uncertainty. Both acquisition functions correctly identify the location where the next parameter values should be evaluated. C: Scenario with a varying mean and varying uncertainty. The minimum of the LCB with unmodified uncertainty (normal distribution) is closer to where the mean approaches zero than the minimum of the LCB with folded normal distribution. Note that a scenario with constant mean and constant uncertainty is not shown because the GP starts after an initial iteration has been performed and there is a non-uniform prior of the objective function.
3.3 Multiple objective functions and their aggregation
A single directional objective function is typically capable of identifying an optimum for only a single independent parameter to be fitted. However, when multiple parameter values need to be optimized, a single directional objective function will probably result in a set of indistinguishable optima: instead of a single root (intersection with the objective function at zero value) there will be a line or area of roots in the multi-dimensional parameter space for which the objective function adopts a zero value. Combining multiple directional objective functions can resolve this ambiguity when multiple parameters need to be optimized at the same time, which is the case for an SMA isotherm, especially when mass transport is also considered.
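The ambiguity described above can be demonstrated with a toy example (the objective functions below are purely illustrative and not taken from the paper): a single objective over two parameters has an entire curve of roots, while adding a second, independent objective isolates a unique optimum.

```python
# Toy directional objective over two parameters: f(p1, p2) = p1 * p2 - 1.
# Every point on the hyperbola p1 * p2 = 1 is a root, so the optimum is
# not unique when only this one objective is used.
def objective(p1: float, p2: float) -> float:
    return p1 * p2 - 1.0

roots = [(p1, 1.0 / p1) for p1 in (0.5, 1.0, 2.0, 4.0)]
assert all(abs(objective(p1, p2)) < 1e-12 for p1, p2 in roots)

# A second, independent directional objective (e.g. one targeting peak height
# in addition to retention time) resolves the ambiguity:
def objective2(p1: float, p2: float) -> float:
    return p1 - p2

unique = [(p1, p2) for p1, p2 in roots if abs(objective2(p1, p2)) < 1e-12]
print(unique)  # -> [(1.0, 1.0)]: only one point satisfies both objectives
```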
We therefore built a new optimizer that maintains individual GPs for each objective function and combines all GP estimates of the objective functions into a single, aggregated objective function during the evaluation step performed by the acquisition function. We selected the arithmetic mean to aggregate the individual objective functions (Eq. (19)), with the option to add weightings to the individual objective functions present in the code but unused for the results in this paper (Eq. (20)). The weightings can help to fine-tune the fitting process, for example by placing emphasis on peak height and skew over retention time (Fig. 1). Similarly, alternative aggregation functions such as the geometric mean or the harmonic mean may also be used to introduce a weighting between individual directional objective functions.
f(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i^2}{\sqrt{x_i^2 + 10^{-8}}} \approx \frac{1}{n}\sum_{i=1}^{n}\left|x_i\right| \quad (19)

f(w, x) = \frac{1}{n}\sum_{i=1}^{n}\frac{(w_i \cdot x_i)^2}{\sqrt{(w_i \cdot x_i)^2 + 10^{-8}}} \approx \frac{1}{n}\sum_{i=1}^{n}\left|w_i \cdot x_i\right| \quad (20)

where f(x) is the aggregated objective function, n is the number of individual objective functions to be aggregated, x_i is the value of the i-th objective function and w_i is the weighting assigned to the i-th objective function.
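A compact sketch of this aggregation (our own helper, not the paper's code) shows how the smoothed term x_i^2 / sqrt(x_i^2 + 10^-8) approximates the mean absolute deviation while remaining differentiable at zero:

```python
import numpy as np

def aggregate(x, w=None):
    """Aggregate directional objective values into one score (cf. Eqs. (19)/(20)):
    the mean of the smoothed absolute value x_i^2 / sqrt(x_i^2 + 1e-8),
    which approximates mean(|x_i|) while staying smooth at zero."""
    x = np.asarray(x, dtype=float)
    if w is not None:
        x = np.asarray(w, dtype=float) * x  # optional per-objective weighting
    return float(np.mean(x**2 / np.sqrt(x**2 + 1e-8)))

scores = [0.02, -0.05, 0.01]   # e.g. +2 %, -5 %, +1 % deviations
print(aggregate(scores))        # close to mean(|x_i|) = 0.0267
```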
An estimate of the combined uncertainty of the aggregated objective function is also required to solve the acquisition function (Eq. (18)). Calculating this uncertainty in a closed form was impractical, because the form depends on the number of objective functions involved and would require adaptation if the number and/or nature of the functions change. Estimating the combined uncertainty using a Monte Carlo method instead [29] increased the calculation costs about 40-fold (data not shown). Therefore, the individual standard deviations were combined using the rules of error propagation (Eq. (21)), which can also be adapted to the use of weightings (Eq. (22)).
f(\sigma) = \sqrt{\sum_{i=1}^{n}\sigma_i^2} \quad (21)

f(w, \sigma) = \sqrt{\sum_{i=1}^{n}(w_i \cdot \sigma_i)^2} \quad (22)

where n is the number of individual objective functions with uncertainties to be aggregated and \sigma_i is the standard deviation of the i-th objective function.
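Assuming the root-sum-of-squares form of Eqs. (21)/(22), the uncertainty aggregation reduces to a few lines (an illustrative helper, not the paper's implementation):

```python
import math

def aggregate_sigma(sigmas, w=None):
    """Combine per-objective GP standard deviations by error propagation:
    the root sum of (optionally weighted) squared standard deviations."""
    if w is None:
        w = [1.0] * len(sigmas)
    return math.sqrt(sum((wi * si) ** 2 for wi, si in zip(w, sigmas)))

print(aggregate_sigma([3.0, 4.0]))        # -> 5.0
print(aggregate_sigma([3.0, 4.0], [1, 0]))  # -> 3.0: weighting removes the second term
```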
3.4 Dimensional trimming
As described in the introduction, a caveat of BayesOpt is the increasing computational cost of the fitting and evaluation of GPs as the number of data points increases. For example, the time required to fit and evaluate the GPs for each search step, compared to the time required for the CADET simulations during each step, increased substantially over the course of parameter estimation runs (Fig. 4A and B). Therefore, it would take more time to compute the parameter values used to execute the next chromatography simulation using CADET than to conduct that simulation. We therefore modified our algorithm to trim down the parameter dimensions after a certain number of GP evaluation steps, effectively limiting the duration of GP computation (Fig. 4C and D).
The trimming procedure (Fig. 5) used a pre-optimized set of hyperparameters (Table 1, Section 2.8) and started by filling the parameter space with an initial set of candidate points (n_cp) to be evaluated using CADET. These points were distributed throughout the parameter space using the Enhanced Stochastic Evolutionary algorithm from the python surrogate modeling toolbox (SMT) package, which produces low-discrepancy Latin hypercube sampling distributions [30]. Once these points had been evaluated by CADET, BayesOpt used the GP estimates to select a fixed number of search points (n_sp) with κ decreasing from a starting value (κ_start) to zero, effectively shifting the focus from exploration to exploitation during search point selection. Thereafter, the n_p × n_bp best data points were identified, where n_p is the number of parameters to be fitted, and the boundaries of the parameter space were shrunk to the ranges spanned by these points. The procedure then entered the next iteration until a termination-threshold score of 0.005 was achieved, which was equivalent to an average error of 0.5% across the multiple objective functions. This threshold can be reduced if higher precision is required, at the cost of longer computation times. Alternative termination criteria may be specified, such as a fixed number of CADET evaluations. The method can rapidly shrink the parameter space in the case of simple optima (Fig. 5). Should multiple local optima exist for one or several of the parameters to be fitted, the range of the corresponding parameter(s) will shrink only as far as possible while still including these optima.
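The boundary-shrinking step can be sketched as follows. This is a minimal stand-in: the CADET evaluation is replaced by a placeholder quadratic objective, uniform sampling stands in for the LHS/GP-guided point selection, and all names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def shrink_bounds(points, scores, n_keep):
    """Shrink parameter bounds to the range spanned by the n_keep best points."""
    best = points[np.argsort(scores)[:n_keep]]
    return best.min(axis=0), best.max(axis=0)

# Placeholder objective standing in for a CADET evaluation (illustrative only);
# its optimum sits at (0.3, 0.3) in a unit-square parameter space.
def objective(p):
    return float(np.sum((p - 0.3) ** 2))

lo, hi = np.zeros(2), np.ones(2)   # initial parameter bounds
n_cp, n_p, n_bp = 50, 2, 5         # candidate points, parameters, best-point threshold
for iteration in range(5):
    pts = rng.uniform(lo, hi, size=(n_cp, 2))  # stand-in for LHS / GP-guided points
    scores = np.array([objective(p) for p in pts])
    lo, hi = shrink_bounds(pts, scores, n_keep=n_p * n_bp)
    if scores.min() < 0.005:       # termination-threshold score
        break

print(lo, hi)  # bounds have contracted around the optimum at (0.3, 0.3)
```

If several separated optima existed, the n_p × n_bp best points would straddle them, so `shrink_bounds` could only contract to the range still enclosing all of them, mirroring the behavior described above.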
3.5 Algorithm termination condition
We defined a termination criterion for the algorithm based on a stall threshold (here 0.001): if the combined score functions improved by less than this threshold over the last n_stall data points, the run was stopped. Because the score functions were formulated as percentage differences between the target values and the simulated values, a delta of 0.001 corresponded to an error of 0.1%, which we deemed acceptable. For n_stall we chose n_sp, the number of points determined for the dimensional shrinking section. As a result, if an entire iteration of the dimensional shrinking procedure elapsed without further improvement, the algorithm ended, as it had most likely converged to the best possible solution given the respective data input.
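The stall check amounts to comparing the best score seen so far against the best score excluding the last n_stall evaluations (a hypothetical helper, written to match the criterion described above):

```python
def stalled(history, n_stall: int, eps_stall: float = 0.001) -> bool:
    """True if the best aggregated score improved by less than eps_stall
    over the last n_stall evaluations."""
    if len(history) <= n_stall:
        return False  # not enough data points to judge stalling yet
    best_before = min(history[:-n_stall])
    best_now = min(history)
    return best_before - best_now < eps_stall

scores = [0.50, 0.20, 0.10, 0.0995, 0.0993, 0.0991]
print(stalled(scores, n_stall=3))  # -> True: < 0.001 improvement over the last 3
```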
4 Results and discussion
4.1 Inverse fitting of transport and porosity parameter values
Transport parameters and porosities must be determined to set the boundary conditions for the modeling of packed-bed chromatography columns [31]. We assumed that experimental conditions such as the column length and volumetric flow rate would be known. We used the lumped rate model with pores to fit values for the column porosity (i.e., inter-particle porosity), particle porosity (i.e., intra-particle porosity), axial dispersion coefficient, and film diffusion coefficient [7,32]. We used two types of input data to fit these mass transport parameters: (i) non-pore-penetrating tracer data to determine the column porosity and the axial dispersion coefficient, and (ii) pore-penetrating tracer data to determine the particle porosity and the film diffusion coefficient. For subsequent experiments to determine the transport parameters, the adsorption constant k_a was set to zero to eliminate interactions between the components and the stationary phase. For the experiments with non-pore-penetrating tracers, the particle porosity and film diffusion were also set to zero.
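The two tracer setups differ only in which parameters are active. The sketch below uses an illustrative parameter layout with hypothetical names and values; it does not follow any particular CADET interface.

```python
# Illustrative lumped-rate-with-pores parameter layout; names and example
# values are hypothetical, not a real CADET configuration schema.
def tracer_config(pore_penetrating: bool) -> dict:
    cfg = {
        "column_porosity": 0.37,     # inter-particle porosity (to be fitted)
        "axial_dispersion": 1.0e-7,  # axial dispersion coefficient (to be fitted)
        "particle_porosity": 0.75,   # intra-particle porosity (to be fitted)
        "film_diffusion": 1.0e-6,    # film diffusion coefficient (to be fitted)
        "k_a": 0.0,                  # adsorption switched off for tracer runs
    }
    if not pore_penetrating:
        # Non-pore-penetrating tracer: only column porosity and axial
        # dispersion remain; pore transport is disabled entirely.
        cfg["particle_porosity"] = 0.0
        cfg["film_diffusion"] = 0.0
    return cfg

print(tracer_config(pore_penetrating=False))
```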
We compared the performance of BayesOpt, a multi-start gradient descent algorithm, and a genetic algorithm using four datasets (Figs. 6A, S1, Table S4) that captured the variability of single-protein peak shapes that we have previously encountered during the determination of SMA parameters [33]. If restricted to one CPU core, BayesOpt was on average 15% faster than the multi-start gradient descent algorithm and 4.3-fold faster than the genetic algorithm (Fig. 6B). When parallelizing over 12 CPU cores, the BayesOpt algorithm was on average 37% slower than gradient descent and 7% faster than the genetic algorithm (Fig. 6C). Overall, the time required for BayesOpt calculations was less than 5 min and was thus compatible with model updating on a daily basis even for large collections of chromatography data featuring more than 100 individual calculations, for example representing different proteins and chromatography conditions. The parameter estimation error in
Fig. 4. Computation time required for chromatography parameter estimation and its dependence on isotherm complexity and the size of the parameter search space. A SMA parameter estimation (k_a, k_d, ν and σ) and the resulting change in duration of CADET and GP computation times using a fixed parameter space throughout the process. B HIC parameter estimation (k_eq, k_kin, m, β_0 and β_1) and the resulting change in duration of CADET and GP computation times using a fixed parameter space throughout the process. C As in A, but including a dimensional trimming step for the GP. D As in B, but including a dimensional trimming step for the GP. The trimming procedure (Fig. 5) causes abrupt changes in the GP duration of operation in panels C and D.
the BayesOpt method was at least 100-fold lower than the standard deviation of the same parameters in replicated experiments [33] (Table S3). Overall, we deemed the BayesOpt error acceptable for the estimation of transport parameters in chromatography models, even though it was significantly (p ≤ 0.001) higher than the multi-start gradient descent error for all datasets except the external pore-penetrating dataset (Fig. 6D).
4.2 Inverse fitting of SMA isotherm parameter values
Calibrating an SMA model based on experimental data can be achieved by (i) estimating k_a, k_d, ν and σ based on gradient elution and breakthrough data, or (ii) estimating k_eq (i.e., the ratio of k_a and k_d) and ν based on several gradient elutions [8,31]. When testing the three algorithms on in silico generated datasets in the first scenario (gradient elution and breakthrough data, Fig. 7), we found that convergence was achieved on a single CPU core on average ∼12-fold faster using BayesOpt compared to the multistart gradient descent algorithm and ∼22-fold faster compared to the genetic algorithm (Fig. 7C, Table S6). When parameter fitting was executed on 12 CPU cores in parallel, BayesOpt was still 3-fold faster than the multi-start gradient descent algorithm and 4-fold faster than the genetic algorithm (Fig. 7D).
Similarly to the results for the transport parameters, BayesOpt generated higher parameter estimation errors and larger rSSE values compared to the multistart gradient descent algorithm on all datasets except for the external dataset (Fig. 7E and F). We cannot compare the parameter estimation errors to experimental results because the actual parameter values of real proteins are unknown. However, we can compare the standard deviations of the parameter estimates, produced by the algorithms on artificial data, to the standard deviation of the same parameters obtained from replicated experiments. We found that at worst the standard deviation of BayesOpt (4.14 × 10⁻³, n = 12) was 'only' two orders of magnitude lower than the experimental standard deviation obtained for ribulose-1,5-bisphosphate carboxylase-oxygenase (RuBisCO) (7.21 × 10⁻¹, n = 3) on a 1 mL Q Sepharose HP column fitted by gradient descent (Table S5). Therefore, the error introduced by BayesOpt was only 0.6% of the experimental uncertainty, which we consider acceptable. Furthermore, the differences between the predicted and target chromatograms were marginal in all cases (Figs. S2–S5). Therefore, we deemed BayesOpt suitable for the estimation of SMA parameters in chromatography models based on combined breakthrough and gradient elution data, but concede that multi-start gradient descent and the genetic algorithm can achieve higher parameter certainties.
4.3 Inverse fitting of SMA isotherm parameters k_eq and ν based on gradient elution data
More than 500 mg of pure protein is typically required for breakthrough curve experiments [33], which is difficult to obtain during early downstream process development. Because the information derived from these curves (i.e., an estimate of σ and thus column capacity) is not usually required at that development stage, estimating k_eq (i.e., the ratio of k_a and k_d) and ν based on gradient
Fig. 5. Graphical representation of the steps in the algorithm used for dimensional trimming. A An initial set of candidate points (n_cp) is distributed throughout the parameter space using the Enhanced Stochastic Evolutionary algorithm. B GP estimates are used to select additional search points (n_sp), initially focusing on exploration – the sampling of high-uncertainty regions in the parameter space. C By iteratively reducing the exploration–exploitation factor κ for each search point, GP selections favor exploitation (i.e., investigate regions close to the current optimum) in the course of search point selection. D The top n_p × n_bp points in terms of the objective function value are identified to form the basis of a new parameter range, where n_p is the number of parameters to be optimized and n_bp is the best point threshold. E The parameter range spanned by the n_p × n_bp points is used to define new boundaries for the values of the parameters to be fitted. F–H The new boundaries are applied and the search is reiterated until a termination condition is reached (see Section 3.5). Each dot in the panels represents an aggregated objective function score for a given set of parameter values (e.g., SMA parameters). Objective function scores were calculated as described in Section 2.6 and aggregated as described in Section 3.3.
elution experiments alone is another relevant task in chromatography modeling. We evaluated all three algorithms on three in silico datasets using three elution curves of 5, 30 and 120 column volumes (cv) each (Fig. 8). The external dataset was not used because it contained only a single gradient elution profile. On a single CPU core, we found that BayesOpt was on average 25.0-fold faster than the multi-start gradient descent algorithm and 37.9-fold faster than the genetic algorithm. On 12 CPU cores, BayesOpt was on average ∼7-fold faster than both alternative algorithms (Table S7).
As before, the peak shapes of the simulated and target chromatograms were very similar for all approaches (Figs. S6–S8), even though BayesOpt had significantly (p ≤ 0.001) higher parameter estimation errors and rSSE values compared to the gradient descent algorithm on two of the three in silico datasets and significantly lower (p ≤ 0.001) errors on the internal dataset 3. The variability introduced by BayesOpt never exceeded 1.3% of the standard deviation of the parameter values experimentally determined by replicated measurements (Table S5). We therefore considered BayesOpt