simulations
Ronald Colin Jäpel a, Johannes Felix Buyel a, b, ∗
a Fraunhofer Institute for Molecular Biology and Applied Ecology IME, Forckenbeckstrasse 6, Aachen 52074, Germany
b Institute for Molecular Biotechnology, RWTH Aachen University, Worringerweg 1, Aachen 52074, Germany
Article history:
Received 24 April 2022
Revised 27 July 2022
Accepted 29 July 2022
Available online 9 August 2022
Keywords:
Ion-exchange chromatography
Mechanistic model
Numeric optimization
Parameter estimation
Steric mass action (SMA) model
The modeling of chromatographic separations can speed up downstream process development, reducing the time to market and corresponding development costs for new products such as pharmaceuticals. However, calibrating such models by identifying suitable parameter values for mass transport and sorption is a major, time-consuming challenge that can hinder model development and improvement. We therefore designed a new approach based on Bayesian optimization (BayesOpt) and Gaussian processes that reduced the time required to compute relevant chromatography parameters by up to two orders of magnitude compared to a multistart gradient descent and a genetic algorithm. We compared the three approaches side by side to process several internal and external datasets for ion exchange chromatography (based on a steric mass action isotherm) and hydrophobic interaction chromatography (a modified version of a recently published five-parameter isotherm) as well as different input data types (gradient elution data alone vs. gradient elution and breakthrough data). We found that BayesOpt computation was consistently faster than the other approaches when using either single-core or 12-core computer processing units. The error of the BayesOpt parameter estimates was higher than that of the competing algorithms, but still two orders of magnitude less than the variability of our experimental data, indicating BayesOpt's applicability for chromatography modeling. The low computational demand of BayesOpt will facilitate rapid model development and improvement even for large datasets (e.g., > 100 proteins) and increase its suitability for research laboratories or small and medium enterprises lacking access to dedicated mainframe computers.
© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1 Introduction
Chromatography is widely used for the purification of biopharmaceutical proteins [1–3] but can be a major cost driver during production and process development [4]. Such costs can be limited by the model-driven optimization of chromatographic separation, which reduces experimental screening to the most relevant operational conditions. Modeling can also improve process understanding and can facilitate adaptive process control [5].
Chromatography models often consist of a mass transport component that can be simulated using the general rate model [6,7] and an adsorption component, describing protein interaction with a stationary phase, using isotherms such as the steric mass action (SMA) model [8]. Both components require the calibration of several parameter values so that the model results match the experimental data. However, current calibration algorithms, such as multistart gradient descent, genetic algorithms and Markov chain Monte Carlo, require extensive computational time to identify appropriate sets of parameter values. This is a bottleneck hindering the widespread application of model-based process development because the necessary computational infrastructure is often available only to specialized research facilities or large companies. Accordingly, research laboratories as well as small and medium enterprises would be empowered to use chromatography modeling tools if the computational time could be reduced. This could be achieved by combining Gaussian processes (GP) and Bayesian optimization (BayesOpt).

∗ Corresponding author at: Institute for Molecular Biotechnology, RWTH Aachen University, Worringerweg 1, Aachen 52074, Germany.
E-mail address: johannes.buyel@rwth-aachen.de (J.F. Buyel)
https://doi.org/10.1016/j.chroma.2022.463408
A GP is a likelihood distribution over functions covering (multiple) continuous dimensions, such that every collection of values drawn from the GP is a multivariate Gaussian distribution [9,10]. In GP regression, a GP can be used to approximate an unknown function by estimating the expected function values and the associated uncertainties based on a (small) set of known data points in the parameter space. During BayesOpt, GP regression can therefore be used to identify extrema of unknown functions, which are called objective functions in this context. First, a GP is fitted to a set of initial data points, i.e. parameter combinations at which the objective function has been evaluated. Then the mean and variance predicted by the GP for each point in the parameter space are combined using an acquisition function to select the next point at which the objective function should be evaluated [11]. The acquisition function can balance exploitation, i.e. focusing parameter improvement near the current optimal region, and exploration, i.e. focusing on regions of the objective function where uncertainty is high and global optima might be hidden.
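The fit–predict–acquire loop described above can be sketched with scikit-learn's GaussianProcessRegressor. The toy objective, grid-based acquisition search and value of the exploration–exploitation factor κ are illustrative assumptions, not the implementation used in this work:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RationalQuadratic

def objective(x):
    # Stand-in for an expensive simulation; true optimum at x = 0.3.
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(5, 1))      # initial design points
y = np.array([objective(v[0]) for v in X])  # initial evaluations

gp = GaussianProcessRegressor(kernel=RationalQuadratic(),
                              normalize_y=True, alpha=1e-6)
grid = np.linspace(0.0, 1.0, 501).reshape(-1, 1)
kappa = 2.0  # exploration-exploitation tradeoff

for _ in range(20):
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    acq = mu - kappa * sigma            # lower confidence bound (minimize)
    x_next = grid[np.argmin(acq)]       # next point to evaluate
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

x_best = X[np.argmin(y), 0]
```

In a real chromatography application the cheap quadratic above would be replaced by a full column simulation, which is exactly why each additional evaluation is worth the overhead of refitting the GP.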
We reasoned that BayesOpt can therefore be applied to the inverse fitting of chromatography parameters to be used in simulations in a multi-step process. First, one or more objective functions are defined that can capture the performance of a parameter fit. For example, the time offset between the maxima of an experimentally determined protein elution peak and the corresponding simulated peak can reveal how well the parameters of the underlying model were estimated (e.g., isotherm and mass transport parameters). These objective functions are then evaluated for an initial set of points distributed across the parameter search space. In the context of chromatography, an objective function evaluation is equivalent to simulating protein binding and elution, and each point of evaluation corresponds to a combination of parameter value estimates for isotherm and mass transport. Then, a GP is created for each objective function and each GP is fitted to the initial set of evaluation results. Thereafter, the mean and variance estimates of the GPs are combined in a single acquisition function and a new set of parameter values is selected to be evaluated next. The resulting new values of the objective functions (e.g., the offset between the experimental and simulated peak maxima) are then added to the data collection available to the respective GPs, and the mean and variance estimates are updated and used to choose the next combination of chromatography parameters for evaluation through simulation.
This BayesOpt procedure has been shown to be advantageous over other regression methods and converges to the global optimum faster than these if a set of prerequisites is met [12,13]. Specifically, (i) there are few data points available because the creation of data (i.e., objective function evaluation, here: simulating a chromatographic separation) is time consuming, (ii) estimates of uncertainty are of interest, and/or (iii) the shape of the objective functions is unknown but their smoothness is similar in all dimensions [11]. In contrast, the performance of BayesOpt may suffer (i) if the number of data points increases, because the computation of the GPs scales with O(n³), and/or (ii) if the objective functions are not smooth or their smoothness varies locally [14].
Here we present a novel method for the calibration of chromatography models using GPs. Specifically, we propose three new approaches for BayesOpt to mitigate the performance issues that arise if the objective function is not smooth, has regions varying in smoothness, or if large numbers of data points must be considered simultaneously. First, we developed the concept of directional objective functions. Second, we aggregated multiple directional objective functions into a combined objective function. Third, we incorporated dimensional trimming to reduce the calculation time as the number of data points in the GP increases. We applied these approaches to the simultaneous determination of mass transport and isotherm parameters in the context of protein chromatography simulations. As an isotherm, we used either the well-established steric mass action (SMA) model for ion exchange chromatography [8] or a novel isotherm for hydrophobic interaction chromatography (HIC).
2 Materials and methods
2.1 Computational hardware
All computations were run on Intel Xeon E5-2630 v3 computer processing units (CPUs) with 3.5 GB random access memory (RAM) per CPU core.
2.2 Chromatography simulations
All chromatography simulations were computed using CADET software [15,16,34,35]. We compiled the binaries based on CADET release 3.1.2, adding a hydrophobic interaction isotherm modified from the original version [17]. Individual simulations were set up in CADET as a three-unit-operation model comprising the inlet, column and outlet. Target chromatograms were generated in CADET using the parameter values specified in Table S1. The calculated protein concentration at the outlet unit (mol per cubic meter over time) was saved to .csv files. The CADETMatch package v0.6.23 (commit 873a81c3b6f593313212c243018b7e5122d770c3) obtained from https://github.com/modsim/CADET-Match/releases was the latest available version at the time of this study and was used to handle genetic algorithm parameter fitting and multistart gradient descent parameter fitting [18,17]. Hyper-parameters for these algorithms were taken from the examples distributed with CADETMatch in the "Example 2" folder. The dextran example from the same source was used for the non-pore-penetrating tracer datasets, the NonBindingProtein example was used for the pore-penetrating tracer datasets, and the Isotherm example was used for the SMA datasets. For HIC parameter fitting, the hyper-parameters from the "Example 2/Isotherm" example were used with a genetic algorithm generation-population size of 50 instead of 20, based on the software creator's advice.
We maintained two separate conda virtual environments for (i) our BayesOpt and gradient descent algorithms as well as (ii) the CADETMatch package, to prevent package conflicts. All calculations were started in the BayesOpt virtual environment. For CADETMatch evaluations, we used the Popen class of the subprocess module to start a new process in which we activated the second environment and ran the CADETMatch calls in that process. The additional overhead time (∼0.2 s) was subtracted from all CADETMatch results before comparing the performance with other algorithms.
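A minimal sketch of this pattern, assuming the second environment is addressed by the path of its Python interpreter (the helper name and error handling are our own, not the code used in the paper):

```python
import subprocess
import sys

def run_in_env(python_exe, code):
    """Run a snippet of Python code with a different interpreter, e.g.
    the one installed in a second conda environment, and return stdout.
    `python_exe` would be the path to that environment's interpreter."""
    proc = subprocess.Popen(
        [python_exe, "-c", code],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"subprocess failed: {err}")
    return out

# Example with the current interpreter standing in for the second one:
result = run_in_env(sys.executable, "print('CADETMatch call would run here')")
```

Spawning a fresh process this way keeps the two environments' package sets fully isolated at the cost of the small startup overhead noted above.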
2.3 Algorithm performance comparison
Algorithm performance was compared based on (i) the duration, (ii) the parameter estimation error calculated as the Euclidean distance of fitted-to-target parameter values in a normalized (0–1) multi-parameter space, and (iii) the relative sum squared error (rSSE) of a simulation using the fitted parameter values compared to the target curves. The rSSE was calculated by taking the sum of squared errors (SSE) and dividing it by the total sum of squares (TSS) of the target curve (Eqs. (1)–(3)).

rSSE = SSE / TSS    (1)

SSE = Σ_{i=0}^{n} (y_i − ŷ_i)²    (2)

TSS = Σ_{i=0}^{n} (y_i − ȳ)²    (3)

(ȳ denotes the mean of the target curve.)
where y represents the target values and ŷ represents the simulation results using the fitted parameter values for all n data points. The sample size n ranged from 206 to 20,001 depending on the simulation.

Dividing the SSE by the TSS compensated for differences in signal scale between the elution and breakthrough experiments. From this point onward, SSE always refers to the sum of squared errors between a target chromatogram and a simulated chromatogram.
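The rSSE metric can be computed directly from the two curves; this sketch assumes the TSS is taken about the mean of the target curve, as in a conventional coefficient-of-determination denominator:

```python
import numpy as np

def rsse(y_target, y_sim):
    """Relative sum of squared errors (Eqs. 1-3): SSE between target and
    simulated curves, normalized by the total sum of squares of the
    target curve (taken here about the target mean)."""
    y_target = np.asarray(y_target, dtype=float)
    y_sim = np.asarray(y_sim, dtype=float)
    sse = np.sum((y_target - y_sim) ** 2)
    tss = np.sum((y_target - y_target.mean()) ** 2)
    return sse / tss
```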
2.4 Example isotherms for performance testing
Ion exchange chromatography was simulated using the SMA isotherm [8,15,16] that describes the change in protein bound to the stationary phase dq_i over time dt while accounting for the salt concentration, the number of protein–ligand interactions and shielded binding sites (Eqs. (4) and (5)).

dq_i/dt = k_a,i · c_p,i · (q̄_0)^ν_i − k_d,i · q_i · c_s^ν_i    (4)

q̄_0 = Λ − Σ_{j=1}^{N_comp−1} (ν_j + σ_j) · q_j    (5)

where t is the time, q_i is the concentration of the ith protein bound to the stationary phase, k_a,i is the adsorption constant of the ith protein, c_p,i is the soluble concentration in the particle pores of the ith protein, q̄_0 is the number of free binding sites on the stationary phase, ν_i is the characteristic charge of the ith protein, k_d,i is the desorption constant of the ith protein, c_s is the salt concentration in the mobile phase, Λ is the total ionic capacity of the stationary phase, and σ_i is the shielding (steric) factor of the ith protein.

It is useful to divide Eq. (4) by k_d and define k_a/k_d as k_eq and the reciprocal of k_d as k_kin, which results in Eq. (6):

k_kin,i · dq_i/dt = k_eq,i · c_p,i · (q̄_0)^ν_i − q_i · c_s^ν_i    (6)
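The right-hand side of the reformulated kinetics (Eq. (6)) can be evaluated as follows; the function and argument names are illustrative assumptions, not CADET's internal interface:

```python
import numpy as np

def sma_dq_dt(q, c_p, c_s, Lambda, nu, sigma, k_eq, k_kin):
    """Right-hand side of the reformulated SMA kinetics (Eq. 6) for n
    protein components; q, c_p, nu and sigma are length-n arrays, c_s
    and Lambda are scalars, k_eq and k_kin may be scalars or arrays."""
    q, c_p = np.asarray(q, float), np.asarray(c_p, float)
    nu, sigma = np.asarray(nu, float), np.asarray(sigma, float)
    q0_bar = Lambda - np.sum((nu + sigma) * q)  # free binding sites (Eq. 5)
    return (k_eq * c_p * q0_bar ** nu - q * c_s ** nu) / k_kin
```

In a simulation this function would be handed to an ODE integrator for the bound-phase concentrations, coupled to the mass transport equations of the general rate model.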
For the simulation of HIC, a previously described isotherm [17] was modified (Eq. (7)) to enable its integration into the CADET framework, which will be published separately.

k_kin · dq/dt = k_eq · c_p · (1 − q/q_max)^m − q · β^m    (7)

where m is the number of binding sites, q_max is the maximum binding capacity, and β is the number of bulk-like water molecules that stabilize all m binding sites. The parameter β is calculated using Eq. (8):

β = β_0 · e^(c_s · β_1)    (8)

where β_0 is the number of bulk-like water molecules at infinite dilution and β_1 is a scaling factor that describes the influence of the salt concentration on the number of bulk-like water molecules.
2.5 Statistical testing
All groups of replicated results were assessed for normality using a Shapiro–Wilk test (α ≤ 0.05) as computed with scipy.stats.shapiro [20]. Normally distributed data were analyzed using a two-sample, two-sided Welch's t-test computed with scipy.stats.ttest_ind, whereas non-normally distributed data were analyzed using a Kruskal–Wallis H-test computed with scipy.stats.kruskal (α ≤ 0.05 in both cases). The sample size was n = 6 when comparing durations and n = 12 when comparing parameter estimation errors. In all figures, asterisks indicate significance: ∗ p ≤ 0.05 (significant), ∗∗ p ≤ 0.01 (highly significant), ∗∗∗ p ≤ 0.001 (most significant). In the figures and tables, a superscript w indicates the application of Welch's t-test whereas a superscript k indicates the application of the Kruskal–Wallis H-test.
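The test-selection logic of this section can be sketched as follows (the helper name and return convention are our own assumptions):

```python
from scipy import stats

def compare_groups(a, b, alpha=0.05):
    """Pick the test as in Section 2.5: Shapiro-Wilk normality check on
    both groups, then Welch's t-test if both pass, otherwise a
    Kruskal-Wallis H-test. Returns the test name and its p-value."""
    normal = (stats.shapiro(a)[1] > alpha) and (stats.shapiro(b)[1] > alpha)
    if normal:
        return "welch", stats.ttest_ind(a, b, equal_var=False)[1]
    return "kruskal", stats.kruskal(a, b)[1]
```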
2.6 Calculation of objective functions
The agreement between simulated and target chromatograms was quantified using three case-dependent objective functions. Tracers and gradient elution peaks were assessed based on (i) the retention time difference between the peaks, (ii) the height difference at the peak maximum, and (iii) the peak skew difference (Fig. 1A). The retention time difference was evaluated by calculating the cross-correlation between the target peak and the simulated peak using scipy.signal.correlate [20]. The offset with the highest correlation coefficient was used as the time difference, as previously explained [18,19].
The height difference was calculated using Eq. (9):

Δheight = (max(y_target) − max(y_sim)) / max(y_target)    (9)

where max(y_sim) is the maximum value of the simulated peak and max(y_target) is the maximum value of the target peak. The peak skew was calculated by first treating the peaks as probability distributions, dividing them by the area under the curve, and then applying Eq. (10):

skew = (μ − ν) / σ    (10)

where μ is the distribution mean, ν is its median and σ is the standard deviation. The difference in skewness was calculated as the skew of the simulated chromatogram minus the skew of the target chromatogram, as shown in Eq. (11):

Δskew = skew_sim − skew_target    (11)
Breakthrough curves were compared based on (i) the difference in the maximum concentration, (ii) the difference in the time required to reach 50% of the maximum concentration, and (iii) the difference in the time required to increase from 50% to 98% of the maximum concentration (Fig. 1B). All three values were calculated as percent differences relative to the target chromatograms (Eq. (12)).

Δ = (t_target − t_sim) / t_target    (12)

where Δ is the value of the objective function, t_target is the metric for the target chromatogram (e.g., the time taken to reach 50% of the maximum concentration) and t_sim is the same metric for the simulated chromatogram. The independent variables of all objective functions were scaled to [−1, 1] using Eq. (13) to improve the numerical stability of the algorithms.

x′ = 2 · (x − x_min) / (x_max − x_min) − 1    (13)

where x′ is the scaled independent variable of an objective function, x is the original unscaled variable, and x_min and x_max are the bounds of the search space in that dimension. The source code is available on GitHub (https://github.com/ronald-jaepel/ChromBayesOpt).
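Two of the peak-based objective functions can be sketched as below. The sample-index offset convention and helper names are illustrative assumptions, not the exact ChromBayesOpt implementation:

```python
import numpy as np
from scipy.signal import correlate

def time_offset(y_target, y_sim):
    """Signed offset (in samples) at which the cross-correlation of the
    simulated and target peaks is maximal; positive values mean the
    simulated peak appears later than the target peak."""
    corr = correlate(y_sim, y_target, mode="full")
    return int(np.argmax(corr)) - (len(y_target) - 1)

def height_diff(y_target, y_sim):
    """Relative peak-height difference (Eq. 9)."""
    return (np.max(y_target) - np.max(y_sim)) / np.max(y_target)
```

Note that both functions are directional in the sense introduced in Section 3.1: their optimum is zero and their sign indicates the direction of the mismatch.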
2.7 Calculation of GPs in Python
We used the GaussianProcessRegressor class from sklearn.gaussian_process to calculate all GPs [21]. To aggregate multiple objective functions, we modified a previously published [22] BayesOpt algorithm by overwriting its BayesianOptimization class with a class that can handle multidimensional objective functions. This new class was created to hold a GP for each objective function while exposing only a single GP fit and a single GP predict method. Python's duck-typing allows the new class to seamlessly replace the regular GaussianProcessRegressor class from sklearn.gaussian_process. We also extended the new class to allow the transformation of the independent variable (x) space to a unit hypercube, which improves the numeric stability as discussed above. The rational quadratic kernel was chosen for all subsequent optimizations because it generated the highest log marginal likelihood compared to all other available kernels, as calculated using the log_marginal_likelihood method of the GaussianProcessRegressor class of the scikit-learn python package [21] on several sample datasets [23,24]. This is desirable because the log marginal likelihood describes the probability of the observed data given the assumed model, i.e., the kernel.

Fig. 1. Graphical representation of the six objective functions used to assess the quality of chromatographic simulation results, specifically the coincidence of experimental and simulated (gradient elution) peaks and breakthrough curves. A: Gradient elution peaks were compared based on differences in peak retention time, peak height and skew. B: Breakthrough curves were compared based on the difference in the maximum concentration peak height, the time to reach 50% of that concentration, and the time required to increase from 50% to 98% of the maximum concentration.
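This kernel-selection procedure can be reproduced on a synthetic dataset; the data and the kernel shortlist below are assumptions standing in for the sample datasets used in the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

# Synthetic sample data standing in for the datasets used in the paper.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=20)

scores = {}
for name, kernel in [("RBF", RBF()), ("Matern", Matern()),
                     ("RationalQuadratic", RationalQuadratic())]:
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    # Log marginal likelihood at the fitted hyperparameters.
    scores[name] = gp.log_marginal_likelihood(gp.kernel_.theta)

best_kernel = max(scores, key=scores.get)
```

Because the log marginal likelihood already penalizes overly flexible kernels, comparing it across candidates is a cheap form of model selection that requires no held-out data.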
2.8 Hyperparameter optimization
Hyperparameters (Table 1) for the dimensional trimming algorithm (Section 3.4) were optimized using an I-optimal design of experiments (DoE) approach with 382 runs of third-order polynomial complexity built in Design-Expert v13 [25]. DoE parameter ranges were chosen based on a set of scouting and screening experiments (data not shown) and the response was the computational time required by BayesOpt to estimate all SMA isotherm parameters (Section 4.2). The "IEX Preliminary test" dataset was used as a reference task (Table S1). Non-significant parameters were eliminated from the model by automated backwards selection using a p-value threshold of 0.05. The final model achieved an R² of 0.699, an adjusted R² of 0.694 and a predicted R² of 0.688, indicating a suitable model quality (Table S2). Numerical minimization of the response (computational time) resulted in the optimal parameter settings shown in Table 1.
2.9 Testing BayesOpt with experimental chromatography data
Exocellobiohydrolase 1 (CBH1; UniProt ID P62694) derived from the Trichoderma reesei preparation Celluclast 1.5 (Novozymes A/S, Bagsværd, Denmark) was purified from a 1:20 v v−1 dilution with equilibration buffer (25 mM sodium phosphate, pH 7.5). Purification was performed using a 46 mL Q Sepharose HP XK26/20 column (GE Healthcare, Chicago, USA) mounted to an ÄKTA pure 25 M system (Cytiva, Marlborough, USA). The column was equilibrated with five column volumes (cv) of equilibration buffer, followed by loading 0.2 L (∼5 cv) of the Celluclast dilution. We then applied 5 cv of equilibration buffer for washing, followed by a step-wise elution (25 mM sodium phosphate, 1.0 M sodium chloride (∼50 mS cm−1), pH 7.5) including elution steps at 23.0, 26.0, and 50.0 mS cm−1. The flow rate was 10.0 mL min−1 (11.6 m h−1) and 4.0 mL fractions were analyzed by lithium dodecylsulfate polyacrylamide gel electrophoresis (LDS-PAGE) [26,1]. Fractions containing CBH1 were pooled and had a purity of 98% as per densitometric analysis. The pooled sample was buffer exchanged into sample buffer (25 mM sodium phosphate, 25 mM sodium chloride, pH 7.0, 7.00 mS cm−1) using a Vivaspin filter (Sartorius, Göttingen, Germany), and the CBH1 concentration was 3.78 mg L−1 based on a microtiter-plate Bradford assay (Thermo Fisher Scientific Inc., USA) [27,2]. We loaded 1.0 or 35.2 mL of purified CBH1 for gradient elution and frontal experiments, respectively, using a 1 mL Q Sepharose HP pre-packed column (Cytiva) mounted to a dedicated ÄKTA pure 25 L system (Cytiva). The column had been equilibrated for 10 cv in the modeling equilibration buffer (25 mM sodium phosphate, 25 mM sodium chloride, pH 7.0) before sample loading and was washed for 5 cv using the same buffer after sample loading. Linear gradient elutions of CBH1 were carried out over 5, 30 or 120 cv up to 100% elution buffer (25 mM sodium phosphate, 500 mM sodium chloride, pH 7.0). Protein elution was monitored as ultraviolet light absorption at 280 nm. The flow rate was 0.50 mL min−1 (7.80 m h−1) at all times.

Table 1
DoE for hyperparameter optimization of dimensional trimming applied during BayesOpt.
Starting value of the exploration–exploitation factor κ — κ_start — Numeric — 0.0 — 1.0 — 0.0
The resulting chromatograms were preprocessed by removing the salt-induced drift in the UV measurements. A linear correlation between the UV absorption and the conductivity signal was estimated based on the data points during the wash steps both before and after the gradient elutions. Using this correlation, the UV signal was corrected for each data point according to the conductivity measured at that point. An exponentially modified Gaussian distribution (Eqs. (14) and (15)) [28,3] was fitted to the chromatogram to remove noise and impurities from the signal.
f(x; h, μ, σ, τ) = h · (σ/τ) · √(π/2) · exp(−0.5 · ((x − μ)/σ)²) · erfcx((1/√2) · (σ/τ − (x − μ)/σ))    (14)

erfcx(x) = exp(x²) · (2/√π) · ∫_x^∞ e^(−θ²) dθ    (15)

where x is the retention time, f(x) is the UV signal, μ is the mean of the Gaussian component, σ is the standard deviation of the Gaussian component, h is the height of the Gaussian component, τ is the exponential relaxation time, θ is the pseudo-variable over which erfcx is integrated and erfcx is the scaled complementary error function.
The distribution parameters were estimated using the curve_fit method from scipy.optimize [20]. The resulting distribution was used as a concentration profile and was subjected to the same parameter fitting described for synthetic data above.
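Eqs. (14) and (15) combine into a numerically stable form built on scipy.special.erfcx, which can then be fitted with curve_fit; the synthetic peak and parameter values below are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erfcx

def emg(x, h, mu, sigma, tau):
    """Exponentially modified Gaussian in the numerically stable erfcx
    form (Eqs. 14 and 15)."""
    z = (sigma / tau - (x - mu) / sigma) / np.sqrt(2.0)
    return (h * sigma / tau * np.sqrt(np.pi / 2.0)
            * np.exp(-0.5 * ((x - mu) / sigma) ** 2) * erfcx(z))

# Fit the distribution to a synthetic, noise-free peak.
x = np.linspace(0.0, 20.0, 400)
true_params = (2.0, 8.0, 1.0, 2.0)  # h, mu, sigma, tau (assumed values)
y = emg(x, *true_params)
popt, _ = curve_fit(emg, x, y, p0=(1.5, 7.0, 1.5, 1.5))
```

The erfcx formulation avoids the overflow that the naive exp·erfc product suffers in the peak tails, which is why it is preferred for fitting measured chromatograms.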
3 Theory and calculation
3.1 Directional objective functions
As stated above, BayesOpt performs best using smooth objective functions. The objective function most often chosen for the inverse fitting of chromatography models is a minimization of the SSE of the protein elution concentration profile between the experiment and the simulation [19]. The SSE objective function has multiple local minima and multiple abrupt changes in slope. For example, running simulations with varying k_eq and a true k_eq of 1.00 (other parameters follow the "IEX Preliminary test" in Table S1) resulted in a local minimum of the SSE at a k_eq of 10⁻⁴ and a sharp drop towards the global minimum at a k_eq of 1.00 (Fig. 2A and D; note the log10 scale of the x-axis). Accordingly, the SSE objective function is not well suited for BayesOpt.

Alternatively, the absolute value (i.e., the magnitude) of the time difference between the simulated and target peaks may be used to assess the quality of fitted parameter values (Fig. 2B and E). This function contains only a single global minimum to which minimizing algorithms will converge regardless of the starting conditions. However, the function cannot be differentiated in that minimum, which is characterized by an abrupt change in slope. This property compromises the objective function's smoothness and thus impedes the performance of GPs, as discussed above.

In contrast, the actual value of the time offset forms a smooth objective function (Fig. 2C and F) and has the additional benefit of providing information on whether a simulated peak appears "earlier" or "later" than the target peak, with the optimum being zero. Here, we introduce the term 'directional objective function' for objective functions whose optimum is zero and that yield suboptimal values in both the negative and positive number space. Hence, they provide additional information showing in which direction a parameter value should be modified for optimization. However, directional objective functions introduce a computational challenge because their optimum is not a minimum or maximum and thus cannot be identified effectively using any optimizer available to us. A new optimization algorithm is therefore required to identify the optimum, specifically the parameter value(s) that optimize the agreement between simulated and experimental data. We therefore developed an approach to construct such an algorithm.
3.2 Adapting the acquisition function to directional objective functions
Fig. 2. Objective functions for parameter estimation and their approximations using a Gaussian process during Bayesian optimization, with the SMA parameter k_eq as an example. A and D: Sum squared error (SSE) between the elution concentration profiles of simulated and target peaks. B and E: Absolute value of the relative time difference between the simulated and target peaks. C and F: Relative time difference between the simulated and target peaks. The top row represents the beginning of the parameter fitting when coverage of the parameter space is sparse (four data points), whereas the bottom row represents a state close to the end of the optimization with nine data points (including three close to the optimum) added to the Gaussian processes.

During BayesOpt, an acquisition function is used to choose the next point in the parameter space for evaluation using the objective function(s). Common examples include the upper confidence bound, the expected improvement, and the probability of improvement [11]. The probability of improvement is the likelihood that the objective function at a point in the parameter space that has previously not been evaluated will yield a better parameter value estimate than the best value known up to that iteration in the optimization process. The expected improvement quantifies the result by multiplying the likelihood by the relative improvement that can be gained compared to the previous optimum. Both probability functions return zeros for large fractions of the parameter space if
BayesOpt is close to completion (i.e., the actual optimum), because the probabilities of improvement in certain regions approach zero. Specifically, the ratio of the GP uncertainty to the objective function's output range becomes very small. Accordingly, the function in these regions does not have a slope that would point to the optimum, which is therefore difficult to identify at this stage because new points for evaluation are identified inefficiently. In contrast, the upper confidence bound does not suffer from this limitation because it returns non-zero values even within undesirable regions. We therefore used the upper confidence bound as an acquisition function for the BayesOpt algorithm but replaced its default formula (Eq. (16)) with that of the lower confidence bound (LCB) as shown in Eq. (17), which allowed us to construct the objective functions as minimization tasks (Section 2.6).

UCB(μ, σ) = μ + σ · κ    (16)

LCB(μ, σ) = μ − σ · κ    (17)

where μ is the mean of the GP, σ is the GP standard deviation and κ is the exploration–exploitation tradeoff factor, with high values of κ favoring the exploration of regions with high uncertainties over regions close to the values currently yielding the best results with respect to the objective function(s). However, in the form of Eq. (17), the mean GP value can be negative for sub-optimal parameter conditions, e.g. a simulated peak appearing earlier than
its experimental counterpart. Using the absolute value of the GP mean would create a minimum at the optimal function value, but this absolute value function cannot be differentiated when the dependent variable is zero, effectively impeding the performance of the gradient-based local optimization of the acquisition function. We therefore approximated the absolute value function by Eq. (18), which is differentiable at each point and has a maximum deviation from the true absolute value of 5 × 10⁻⁹. This difference was considered negligible because the range of the objective functions was scaled to span from −1.0 to 1.0 in order to maximize the numerical stability of the algorithm (see above).

f(μ, σ) = μ² / √(μ² + 10⁻⁸) − σ · κ ≈ |μ| − σ · κ    (18)

We chose not to transform the standard deviation of the objective functions into the statistically correct folded normal distribution because the latter compromised key aspects of the acquisition function when the normal distribution's range crossed below zero. Specifically, regions with high uncertainty were deemed less favorable for exploration by the acquisition function when using the folded normal distribution, effectively contradicting the purpose of exploration (Fig. 3A). Instead, we used the untransformed uncertainty of the objective function(s). Even though this caused some results to predict negative error values that should be impossible in theory (Fig. 3, shaded areas below zero), the acquisition function successfully chose the expected locations of interest and the algorithm converged to the correct parameter values.
Fig. 3. Evaluation of the lower confidence bound (LCB) acquisition function in three scenarios of Gaussian processes (GP) using either an untransformed uncertainty (normal distribution, green) or the formally correct folded normal distribution (orange). A: Scenario with a constant mean (blue line) and varying uncertainty (shaded area). The LCB with a folded normal distribution disregards regions of high uncertainty in the GP and is therefore not useful to identify the next parameter value to be evaluated. B: Scenario with a varying mean and constant uncertainty. Both acquisition functions correctly identify the location where the next parameter values should be evaluated. C: Scenario with a varying mean and varying uncertainty. The minimum of the LCB with unmodified uncertainty (normal distribution) is closer to where the mean approaches zero than the minimum of the LCB with folded normal distribution. Note that a scenario with constant mean and constant uncertainty is not shown because the GP starts after an initial iteration has been performed and there is a non-uniform prior of the objective function.
3.3 Multiple objective functions and their aggregation
A single directional objective function is typically capable of identifying an optimum for only a single independent parameter to be fitted. However, when multiple parameter values need to be optimized, a single directional objective function will probably result in a set of indistinguishable optima: instead of a single root (intersection with the objective function at zero value) there will be a line or area of roots in the multi-dimensional parameter space for which the objective function adopts a zero value. Combining multiple directional objective functions can resolve this ambiguity when multiple parameters need to be optimized at the same time, which is the case for an SMA isotherm, especially when mass transport is also considered.
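The ambiguity described above can be demonstrated with a toy example (the objective functions below are purely illustrative and not taken from the paper): a single objective over two parameters has an entire curve of roots, while adding a second, independent objective isolates a unique optimum.

```python
# Toy directional objective over two parameters: f(p1, p2) = p1 * p2 - 1.
# Every point on the hyperbola p1 * p2 = 1 is a root, so the optimum is
# not unique when only this one objective is used.
def objective(p1: float, p2: float) -> float:
    return p1 * p2 - 1.0

roots = [(p1, 1.0 / p1) for p1 in (0.5, 1.0, 2.0, 4.0)]
assert all(abs(objective(p1, p2)) < 1e-12 for p1, p2 in roots)

# A second, independent directional objective (e.g. one targeting peak height
# in addition to retention time) resolves the ambiguity:
def objective2(p1: float, p2: float) -> float:
    return p1 - p2

unique = [(p1, p2) for p1, p2 in roots if abs(objective2(p1, p2)) < 1e-12]
print(unique)  # -> [(1.0, 1.0)]: only one point satisfies both objectives
```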
We therefore built a new optimizer that maintains individual GPs for each objective function and combines all GP estimates of the objective functions into a single, aggregated objective function during the evaluation step performed by the acquisition function. We selected the arithmetic mean to aggregate the individual objective functions (Eq. (19)), with the option to add weightings to the individual objective functions present in the code but unused for the results in this paper (Eq. (20)). The weightings can help to fine-tune the fitting process, for example by placing emphasis on peak height and skew over retention time (Fig. 1). Similarly, alternative aggregation functions such as the geometric mean or the harmonic mean may also be used to introduce a weighting between individual directional objective functions.
f(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i^2}{\sqrt{x_i^2 + 10^{-8}}} \approx \frac{1}{n}\sum_{i=1}^{n}\left|x_i\right| \quad (19)

f(w, x) = \frac{1}{n}\sum_{i=1}^{n}\frac{(w_i \cdot x_i)^2}{\sqrt{(w_i \cdot x_i)^2 + 10^{-8}}} \approx \frac{1}{n}\sum_{i=1}^{n}\left|w_i \cdot x_i\right| \quad (20)

where f(x) is the aggregated objective function, n is the number of individual objective functions to be aggregated, x_i is the value of the i-th objective function and w_i is the weighting assigned to the i-th objective function.
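A compact sketch of this aggregation (our own helper, not the paper's code) shows how the smoothed term x_i^2 / sqrt(x_i^2 + 10^-8) approximates the mean absolute deviation while remaining differentiable at zero:

```python
import numpy as np

def aggregate(x, w=None):
    """Aggregate directional objective values into one score (cf. Eqs. (19)/(20)):
    the mean of the smoothed absolute value x_i^2 / sqrt(x_i^2 + 1e-8),
    which approximates mean(|x_i|) while staying smooth at zero."""
    x = np.asarray(x, dtype=float)
    if w is not None:
        x = np.asarray(w, dtype=float) * x  # optional per-objective weighting
    return float(np.mean(x**2 / np.sqrt(x**2 + 1e-8)))

scores = [0.02, -0.05, 0.01]   # e.g. +2 %, -5 %, +1 % deviations
print(aggregate(scores))        # close to mean(|x_i|) = 0.0267
```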
An estimate of the combined uncertainty of the aggregated objective function is also required to solve the acquisition function (Eq. (18)). Calculating this uncertainty in a closed form was impractical, because the form depends on the number of objective functions involved and would require adaptation if the number and/or nature of the functions change. Estimating the combined uncertainty using a Monte Carlo method instead [29] increased the calculation costs about 40-fold (data not shown). Therefore, the individual standard deviations were combined using the rules of error propagation (Eq. (21)), which can also be adapted to the use of weightings (Eq. (22)).
f(\sigma) = \sqrt{\sum_{i=1}^{n}\sigma_i^2} \quad (21)

f(w, \sigma) = \sqrt{\sum_{i=1}^{n}(w_i \cdot \sigma_i)^2} \quad (22)

where n is the number of individual objective functions with uncertainties to be aggregated and \sigma_i is the standard deviation of the i-th objective function.
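Assuming the root-sum-of-squares form of Eqs. (21)/(22), the uncertainty aggregation reduces to a few lines (an illustrative helper, not the paper's implementation):

```python
import math

def aggregate_sigma(sigmas, w=None):
    """Combine per-objective GP standard deviations by error propagation:
    the root sum of (optionally weighted) squared standard deviations."""
    if w is None:
        w = [1.0] * len(sigmas)
    return math.sqrt(sum((wi * si) ** 2 for wi, si in zip(w, sigmas)))

print(aggregate_sigma([3.0, 4.0]))        # -> 5.0
print(aggregate_sigma([3.0, 4.0], [1, 0]))  # -> 3.0: weighting removes the second term
```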
3.4 Dimensional trimming
As described in the introduction, a caveat of BayesOpt is the increasing computational cost of the fitting and evaluation of GPs as the number of data points increases. For example, the time required to fit and evaluate the GPs for each search step, compared to the time required for the CADET simulations during each step, increased substantially over the course of parameter estimation runs (Fig. 4A and B). Therefore, it would take more time to compute the parameter values used to execute the next chromatography simulation using CADET than to conduct that simulation. We therefore modified our algorithm to trim down the parameter dimensions after a certain number of GP evaluation steps, effectively limiting the duration of GP computation (Fig. 4C and D).
The trimming procedure (Fig. 5) used a pre-optimized set of hyperparameters (Table 1, Section 2.8) and started by filling the parameter space with an initial set of candidate points (n_cp) to be evaluated using CADET. These points were distributed throughout the parameter space using the Enhanced Stochastic Evolutionary algorithm from the python surrogate modeling toolbox (SMT) package, which produces low-discrepancy Latin hypercube sampling distributions [30]. Once these points had been evaluated by CADET, BayesOpt used the GP estimates to select a fixed number of search points (n_sp) with κ decreasing from a starting value (κ_start) to zero, effectively shifting the focus from exploration to exploitation during search point selection. Thereafter, the n_p × n_bp best data points were identified, where n_p is the number of parameters to be fitted, and the boundaries of the parameter space were shrunk to the ranges spanned by these points. The procedure then entered the next iteration until a termination-threshold score of 0.005 was achieved, which was equivalent to an average error of 0.5% across the multiple objective functions. This threshold can be reduced if higher precision is required, at the cost of longer computation times. Alternative termination criteria may be specified, such as a fixed number of CADET evaluations. The method can rapidly shrink the parameter space in the case of simple optima (Fig. 5). Should multiple local optima exist for one or several of the parameters to be fitted, the range of the corresponding parameter(s) will shrink only as far as possible while still including these optima.
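The boundary-shrinking step can be sketched as follows. This is a minimal stand-in: the CADET evaluation is replaced by a placeholder quadratic objective, uniform sampling stands in for the LHS/GP-guided point selection, and all names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def shrink_bounds(points, scores, n_keep):
    """Shrink parameter bounds to the range spanned by the n_keep best points."""
    best = points[np.argsort(scores)[:n_keep]]
    return best.min(axis=0), best.max(axis=0)

# Placeholder objective standing in for a CADET evaluation (illustrative only);
# its optimum sits at (0.3, 0.3) in a unit-square parameter space.
def objective(p):
    return float(np.sum((p - 0.3) ** 2))

lo, hi = np.zeros(2), np.ones(2)   # initial parameter bounds
n_cp, n_p, n_bp = 50, 2, 5         # candidate points, parameters, best-point threshold
for iteration in range(5):
    pts = rng.uniform(lo, hi, size=(n_cp, 2))  # stand-in for LHS / GP-guided points
    scores = np.array([objective(p) for p in pts])
    lo, hi = shrink_bounds(pts, scores, n_keep=n_p * n_bp)
    if scores.min() < 0.005:       # termination-threshold score
        break

print(lo, hi)  # bounds have contracted around the optimum at (0.3, 0.3)
```

If several separated optima existed, the n_p × n_bp best points would straddle them, so `shrink_bounds` could only contract to the range still enclosing all of them, mirroring the behavior described above.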
3.5 Algorithm termination condition
We defined a termination criterion for the algorithm based on a stall threshold (here 0.001): if the combined score functions improved by less than this threshold over the last n_stall data points, the run was stopped. Because the score functions were formulated as percentage differences between the target values and the simulated values, a delta of 0.001 corresponded to an error of 0.1%, which we deemed acceptable. For n_stall we chose n_sp, the number of points determined for the dimensional shrinking section. As a result, if an entire iteration of the dimensional shrinking procedure elapsed without further improvement, the algorithm ended, as it had most likely converged to the best possible solution given the respective data input.
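The stall check amounts to comparing the best score seen so far against the best score excluding the last n_stall evaluations (a hypothetical helper, written to match the criterion described above):

```python
def stalled(history, n_stall: int, eps_stall: float = 0.001) -> bool:
    """True if the best aggregated score improved by less than eps_stall
    over the last n_stall evaluations."""
    if len(history) <= n_stall:
        return False  # not enough data points to judge stalling yet
    best_before = min(history[:-n_stall])
    best_now = min(history)
    return best_before - best_now < eps_stall

scores = [0.50, 0.20, 0.10, 0.0995, 0.0993, 0.0991]
print(stalled(scores, n_stall=3))  # -> True: < 0.001 improvement over the last 3
```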
4 Results and discussion
4.1 Inverse fitting of transport and porosity parameter values
Transport parameters and porosities must be determined to set the boundary conditions for the modeling of packed-bed chromatography columns [31]. We assumed that experimental conditions such as the column length and volumetric flow rate would be known. We used the lumped rate model with pores to fit values for the column porosity (i.e., inter-particle porosity), particle porosity (i.e., intra-particle porosity), axial dispersion coefficient, and film diffusion coefficient [7,32]. We used two types of input data to fit these mass transport parameters: (i) non-pore-penetrating tracer data to determine the column porosity and the axial dispersion coefficient, and (ii) pore-penetrating tracer data to determine the particle porosity and the film diffusion coefficient. For subsequent experiments to determine the transport parameters, the adsorption constant k_a was set to zero to eliminate interactions between the components and the stationary phase. For the experiments with non-pore-penetrating tracers, the particle porosity and film diffusion were also set to zero.
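The two tracer setups differ only in which parameters are active. The sketch below uses an illustrative parameter layout with hypothetical names and values; it does not follow any particular CADET interface.

```python
# Illustrative lumped-rate-with-pores parameter layout; names and example
# values are hypothetical, not a real CADET configuration schema.
def tracer_config(pore_penetrating: bool) -> dict:
    cfg = {
        "column_porosity": 0.37,     # inter-particle porosity (to be fitted)
        "axial_dispersion": 1.0e-7,  # axial dispersion coefficient (to be fitted)
        "particle_porosity": 0.75,   # intra-particle porosity (to be fitted)
        "film_diffusion": 1.0e-6,    # film diffusion coefficient (to be fitted)
        "k_a": 0.0,                  # adsorption switched off for tracer runs
    }
    if not pore_penetrating:
        # Non-pore-penetrating tracer: only column porosity and axial
        # dispersion remain; pore transport is disabled entirely.
        cfg["particle_porosity"] = 0.0
        cfg["film_diffusion"] = 0.0
    return cfg

print(tracer_config(pore_penetrating=False))
```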
We compared the performance of BayesOpt, a multi-start gradient descent algorithm, and a genetic algorithm using four datasets (Figs. 6A, S1, Table S4) that captured the variability of single-protein peak shapes that we have previously encountered during the determination of SMA parameters [33]. If restricted to one CPU core, BayesOpt was on average 15% faster than the multi-start gradient descent algorithm and 4.3-fold faster than the genetic algorithm (Fig. 6B). When parallelizing over 12 CPU cores, the BayesOpt algorithm was on average 37% slower than gradient descent and 7% faster than the genetic algorithm (Fig. 6C). Overall, the time required for BayesOpt calculations was less than 5 min and was thus compatible with model updating on a daily basis even for large collections of chromatography data featuring more than 100 individual calculations, for example representing different proteins and chromatography conditions. The parameter estimation error in
Fig. 4. Computation time required for chromatography parameter estimation and its dependence on isotherm complexity and the size of the parameter search space. A SMA parameter estimation (k_a, k_d, ν and σ) and the resulting change in duration of CADET and GP computation times using a fixed parameter space throughout the process. B HIC parameter estimation (k_eq, k_kin, m, β_0 and β_1) and the resulting change in duration of CADET and GP computation times using a fixed parameter space throughout the process. C As in A, but including a dimensional trimming step for the GP. D As in B, but including a dimensional trimming step for the GP. The trimming procedure (Fig. 5) causes abrupt changes in the GP duration of operation in panels C and D.
the BayesOpt method was at least 100-fold lower than the standard deviation of the same parameters in replicated experiments [33] (Table S3). Overall, we deemed the BayesOpt error acceptable for the estimation of transport parameters in chromatography models, even though it was significantly (p ≤ 0.001) higher than the multi-start gradient descent error for all datasets except the external pore-penetrating dataset (Fig. 6D).
4.2 Inverse fitting of SMA isotherm parameter values
Calibrating an SMA model based on experimental data can be achieved by (i) estimating k_a, k_d, ν and σ based on gradient elution and breakthrough data, or (ii) estimating k_eq (i.e., the ratio of k_a and k_d) and ν based on several gradient elutions [8,31]. When testing the three algorithms on in silico generated datasets in the first scenario (gradient elution and breakthrough data, Fig. 7), we found that convergence was achieved on a single CPU core on average ∼12-fold faster using BayesOpt compared to the multistart gradient descent algorithm and ∼22-fold faster compared to the genetic algorithm (Fig. 7C, Table S6). When parameter fitting was executed on 12 CPU cores in parallel, BayesOpt was still 3-fold faster than the multi-start gradient descent algorithm and 4-fold faster than the genetic algorithm (Fig. 7D).
Similarly to the results for the transport parameters, BayesOpt generated higher parameter estimation errors and larger rSSE values compared to the multistart gradient descent algorithm on all datasets except for the external dataset (Fig. 7E and F). We cannot compare the parameter estimation errors to experimental results because the actual parameter values of real proteins are unknown. However, we can compare the standard deviations of the parameter estimates, produced by the algorithms on artificial data, to the standard deviation of the same parameters obtained from replicated experiments. We found that at worst the standard deviation of BayesOpt (4.14 × 10⁻³, n = 12) was 'only' two orders of magnitude lower than the experimental standard deviation obtained for ribulose-1,5-bisphosphate carboxylase-oxygenase (RuBisCO) (7.21 × 10⁻¹, n = 3) on a 1 mL Q Sepharose HP column fitted by gradient descent (Table S5). Therefore, the error introduced by BayesOpt was only 0.6% of the experimental uncertainty, which we consider acceptable. Furthermore, the differences between the predicted and target chromatograms were marginal in all cases (Figs. S2–S5). Therefore, we deemed BayesOpt suitable for the estimation of SMA parameters in chromatography models based on combined breakthrough and gradient elution data, but concede that multi-start gradient descent and the genetic algorithm can achieve higher parameter certainties.
4.3 Inverse fitting of SMA isotherm parameters k_eq and ν based on gradient elution data
More than 500 mg of pure protein is typically required for breakthrough curve experiments [33], which is difficult to obtain during early downstream process development. Because the information derived from these curves (i.e., an estimate of σ and thus column capacity) is not usually required at that development stage, estimating k_eq (i.e., the ratio of k_a and k_d) and ν based on gradient
Fig. 5. Graphical representation of the steps in the algorithm used for dimensional trimming. A An initial set of candidate points (n_cp) is distributed throughout the parameter space using the Enhanced Stochastic Evolutionary algorithm. B GP estimates are used to select additional search points (n_sp), initially focusing on exploration – the sampling of high-uncertainty regions in the parameter space. C By iteratively reducing the exploration–exploitation factor κ for each search point, GP selections favor exploitation (i.e., investigate regions close to the current optimum) in the course of search point selection. D The top n_p × n_bp points in terms of the objective function value are identified to form the basis of a new parameter range, where n_p is the number of parameters to be optimized and n_bp is the best point threshold. E The parameter range spanned by the n_p × n_bp points is used to define new boundaries for the values of the parameters to be fitted. F–H The new boundaries are applied and the search is reiterated until a termination condition is reached (see Section 3.5). Each dot in the panels represents an aggregated objective function score for a given set of parameter values (e.g., SMA parameters). Objective function scores were calculated as described in Section 2.6 and aggregated as described in Section 3.3.
elution experiments alone is another relevant task in chromatography modeling. We evaluated all three algorithms on three in silico datasets using three elution curves of 5, 30 and 120 column volumes (cv) each (Fig. 8). The external dataset was not used because it contained only a single gradient elution profile. On a single CPU core, we found that BayesOpt was on average 25.0-fold faster than the multi-start gradient descent algorithm and 37.9-fold faster than the genetic algorithm. On 12 CPU cores, BayesOpt was on average ∼7-fold faster than both alternative algorithms (Table S7).
As before, the peak shapes of the simulated and target chromatograms were very similar for all approaches (Figs. S6–S8), even though BayesOpt had significantly (p ≤ 0.001) higher parameter estimation errors and rSSE values compared to the gradient descent algorithm on two of the three in silico datasets and significantly lower (p ≤ 0.001) errors on the internal dataset 3. The variability introduced by BayesOpt never exceeded 1.3% of the standard deviation of the parameter values experimentally determined by replicated measurements (Table S5). We therefore considered BayesOpt