


Open Access

Vol 10 No 3

Research

A multivariate Bayesian model for assessing morbidity after coronary artery surgery

Bonizella Biagioli1, Sabino Scolletta1, Gabriele Cevenini1, Emanuela Barbini2, Pierpaolo Giomarelli1 and Paolo Barbini1

1 Department of Surgery and Bioengineering, University of Siena, Viale Bracci, 53100 Siena, Italy

2 Department of Physiopathology, Experimental Medicine and Public Health, University of Siena, Via Aldo Moro, 53100 Siena, Italy

Corresponding author: Paolo Barbini, barbini@biolab.med.unisi.it

Received: 27 Jan 2006 Revisions requested: 13 Mar 2006 Revisions received: 4 May 2006 Accepted: 17 May 2006 Published: 17 Jul 2006

Critical Care 2006, 10:R94 (doi:10.1186/cc4951)

This article is online at: http://ccforum.com/content/10/3/R94

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Introduction: Although most risk-stratification scores are derived from preoperative patient variables, there are several intraoperative and postoperative variables that can influence prognosis. Higgins and colleagues previously evaluated the contribution of preoperative, intraoperative and postoperative predictors to the outcome. We developed a Bayes linear model to discriminate morbidity risk after coronary artery bypass grafting and compared it with three different score models: Higgins' original scoring system, derived from the patient's status on admission to the intensive care unit (ICU), and two models designed and customized to our patient population.

Methods: We analyzed 88 operative risk factors in 1,090 consecutive adult patients who underwent coronary artery bypass grafting. Training and testing data sets of 740 and 350 patients, respectively, were used. A stepwise approach enabled selection of an optimal subset of predictor variables. Model discrimination was assessed by receiver operating characteristic (ROC) curves, whereas calibration was measured using the Hosmer-Lemeshow goodness-of-fit test.

Results: A set of 12 preoperative, intraoperative and postoperative predictor variables was identified for the Bayes linear model. The Bayes and the locally customized score models fitted the data well according to the Hosmer-Lemeshow test. However, comparison of the areas under the ROC curve showed that the Bayes linear classifier had a significantly higher discrimination capacity than the score models. Calibration and discrimination were both much worse with Higgins' original scoring system.

Conclusion: Most prediction rules use sequential numerical risk scoring to quantify prognosis and are an advanced form of audit. Score models are very attractive tools because their application in routine clinical practice is simple. If locally customized, they also predict patient morbidity in an acceptable manner. The Bayesian model seems to be a feasible alternative: it has better discrimination and can be tailored more easily to individual institutions.

Introduction

Since the mid-1980s, many predictive models for the assessment of cardiac postoperative mortality have gained popularity in the medical community [1]. Because much has changed in the field of cardiac surgery in recent years, mortality is now low, and morbidity has been suggested as both a valid end point and a more attractive target for developing operative risk models [2]. General severity-of-illness models can be inaccurate when applied to specific groups of patients, even if they are valid for comparing outcomes in large numbers of patients [3], and this inaccuracy makes them inappropriate for predicting individual outcome [4,5]. Predictive models, therefore, provide significant advantages in clinical decision-making only if they are customized to the specific population of patients to be investigated. Moreover, although most risk-stratification variables are derived from preoperative patient characteristics [6-10], there are several intraoperative and postoperative physiological variables that can influence morbidity and mortality [11,12].

Abbreviations: AUC = area under the ROC curve; CABG = coronary artery bypass graft; CPB = cardiopulmonary bypass; CPDF = conditional probability density function; DO2I = oxygen delivery index; FC = fully customized; FPF = false-positive fraction; IABP = intra-aortic balloon pump; ICU = intensive care unit; LOO = leave one out; PC = partially customized; ROC = receiver operating characteristic; SE = sensitivity; SP = specificity; TPF = true-positive fraction.

Higgins and colleagues previously evaluated the relative contribution of preoperative conditions, operating theater events and physiological parameters on admission to the intensive care unit (ICU) to outcome, describing a sequential model derived from the patient's status on admission to the ICU [11]. This model is complementary to the preoperative score of the same study group [13]. Higgins' models, like certain other models, use univariate and multivariate logistic regression to quantify prognosis by a numerical scoring system, but caution is needed in applying scores to individuals [14-16].

Algorithms for classification derived from the Bayes theorem can be valid alternatives to logistic regression in discrimination problems. The measured set of individual features serves as input to a decision rule by which the patient is assigned to a morbidity risk class. A key characteristic of this approach is that, given complete knowledge of the statistics of the patterns to be classified, the Bayes rule defines the optimum classifier that minimizes the probability of classification error or the expected cost of an incorrect decision [17]. A Bayes linear classifier is the simplest approach but, in the Bayes sense, it is optimal only for normal distributions with equal covariance matrices in the classification groups. However, in many cases, the simplicity and robustness of the linear classifier compensate for the loss of performance caused by nonnormality or nonhomoscedasticity [17-19]. In clinical decision-making it is easy to implement and locally customize, because the statistics of the patterns to be classified require knowledge only of the group means and the pooled within-sample covariance matrix, which can be estimated from a training set of correctly classified cases [18]. The simplicity of a linear classifier, which enables it to be easily tailored and updated to the patient population of a given institution, is a significant advantage of this approach in clinical practice compared with multiple logistic regression. The Bayes approach also provides a decision rule for prognosis derived from the whole set of measured predictor variables rather than from scores obtained with logistic regression from group characteristics [20]. These aspects have led to widespread use of the Bayes decision rule instead of logistic regression in discrimination problems [21].

The aims of this study were as follows:

1) to develop an ICU–Bayes model by selecting the preoperative, intraoperative and postoperative risk factors that best predict postoperative morbidity in coronary artery bypass graft (CABG) patients;

2) to evaluate the reliability of score models in our population of patients;

3) to compare these different models as predictors of morbidity risk.

Materials and methods

Patient population

This is an observational study approved by the Ethics Committee of our institution. All patients gave their written, informed consent. All data were entered into a prospectively collected database and retrospectively analyzed for the purposes of this study. The computerized database files of 1,090 consecutive adult CABG patients were analyzed. The database was divided into two subsets: the first 740 consecutive patients, who underwent CABG surgery between 1 January 2002 and 31 December 2003, served as a training set to develop the Bayesian risk model and to customize score models to our population of patients; the next 350 consecutive patients (the testing set), who underwent CABG surgery between 1 January 2004 and 31 December 2004, were used to test the predictive performance of the models on new data. Standard preoperative and postoperative management and cardiopulmonary bypass (CPB) were performed [22].

Risk predictor variables included in the model

From the literature, we selected 88 preoperative, intraoperative and postoperative variables that could be associated with postoperative morbidity, and we analyzed the influence of each predictor on outcome. The variable set included all the predictors of both Higgins' models [11,13]. CABG procedures were divided into three periods: pre-CPB, during CPB, and post-CPB. Preoperative and intraoperative data were collected under the anesthesiologist's supervision. The post-CPB period comprised two data-collection phases: data were collected in the first three hours after admission to the ICU, and postoperative outcome data were retrieved from the medical records after discharge from the ICU.

According to the definitions of Higgins and colleagues [11,13], emergency cases were defined as unstable angina, unstable hemodynamics or ischemic valve dysfunction that could not be controlled medically. Left ventricular ejection fractions <35% were considered severely impaired. Diabetes or chronic obstructive pulmonary disease was diagnosed only if the patient was maintained on appropriate medication. CPB time was the total of all bypass runs if a second or subsequent period of bypass was conducted. Re-operation was considered as a separate predictor variable in the analysis [11].

Outcome variables

The primary outcome in this study was morbidity, which was defined as one or more of the following events:

1) cardiovascular complications: myocardial infarction (documented by electrocardiography and enzyme criteria); low cardiac output requiring inotropic support for >24 hours, an intra-aortic balloon pump (IABP) or a ventricular assist device; or severe arrhythmias requiring treatment or cardiopulmonary resuscitation;

2) respiratory complications: prolonged ventilatory support (defined as mechanical ventilatory support for >24 hours); re-intubation; tracheostomy; clinical evidence of pulmonary embolism or edema; or adult respiratory distress syndrome;

3) neurological complications: central nervous system complications, defined as a focal brain lesion (confirmed by clinical findings, computed tomographic scan, or both), diffuse encephalopathy with >24 hours of severely altered mental status, or unexplained failure to awaken within 24 hours of the operation;

4) renal complications: acute renal failure (need for dialysis);

5) infectious complications: serious infection, defined as culture-proven pneumonia, mediastinitis, wound infection, septicemia (with appropriate clinical findings) or septic shock;

6) hemorrhagic complications: bleeding requiring re-operation.

Different authors tend to use their own criteria to compare the performance of different risk models for predicting morbidity [23-25]. To evaluate the reliability of Higgins' scores for predicting complications in our patient population, we preferred the outcome criteria for morbidity used in the original database of the Cleveland Clinic Foundation, from which the scoring systems were derived [11,13].

Bayes linear model

For discriminating patients at risk of morbidity (M) from those with a normal clinical course (N; or at low risk of morbidity), a Bayes classification scheme was used [20,25,26]. Using the set of measured variables (x) for a patient, the Bayes rule enables morbidity risk to be evaluated directly through the posterior conditional probability of morbidity:

$$P(M\mid x) = \frac{p(x\mid M)\,P(M)}{p(x\mid M)\,P(M) + p(x\mid N)\,P(N)} \tag{1}$$

where P(M) is the prior probability of morbidity, P(N) = 1 - P(M) is the prior probability of normal course, and p(x|M) and p(x|N) are the conditional probability density functions (CPDFs) of morbid patients and of normally recovering patients, respectively.

Similarly, the posterior conditional probability of normal course is as follows:

$$P(N\mid x) = \frac{p(x\mid N)\,P(N)}{p(x\mid M)\,P(M) + p(x\mid N)\,P(N)} \tag{2}$$

A reasonable discrimination criterion would be to assign patient x to the population with the largest posterior probability, but the decision rule can also be chosen using somewhat different reasoning [17].
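Equations 1 and 2 and the largest-posterior rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: the function names are hypothetical and the density values in the usage lines are invented. The `threshold` parameter is an addition for illustration; its default of 0.5 reproduces the largest-posterior rule (ties assigned to M), while a different value, such as the 0.427 probability threshold adopted later in the paper, shifts the decision.

```python
def posterior_morbidity(p_x_m: float, p_x_n: float, prior_m: float = 0.5) -> float:
    """Equation 1: posterior probability of morbidity P(M|x)."""
    prior_n = 1.0 - prior_m          # P(N) = 1 - P(M)
    joint_m = p_x_m * prior_m        # p(x|M) P(M)
    joint_n = p_x_n * prior_n        # p(x|N) P(N)
    return joint_m / (joint_m + joint_n)


def posterior_normal(p_x_m: float, p_x_n: float, prior_m: float = 0.5) -> float:
    """Equation 2: posterior probability of a normal course P(N|x)."""
    prior_n = 1.0 - prior_m
    joint_m = p_x_m * prior_m
    joint_n = p_x_n * prior_n
    return joint_n / (joint_m + joint_n)


def classify(p_x_m: float, p_x_n: float, prior_m: float = 0.5,
             threshold: float = 0.5) -> str:
    """Return 'M' (high risk) if P(M|x) >= threshold, otherwise 'N'."""
    return "M" if posterior_morbidity(p_x_m, p_x_n, prior_m) >= threshold else "N"


# Hypothetical density values (not from the study): P(M|x) = 0.45 with equal priors.
print(classify(4.5e-5, 5.5e-5), classify(4.5e-5, 5.5e-5, threshold=0.427))
# prints: N M
```

The same patient is assigned to N by the largest-posterior rule but to M once the threshold is lowered below 0.45, which is why the choice of threshold matters clinically.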

If no assumptions at all are made about the form of the CPDFs, these functions must be estimated from the training set (correctly classified cases for both classes of patient) by nonparametric methods [18]. Despite recent interest in these nonparametric methods, the overwhelming majority of applications of discrimination and classification still rely on various parametric assumptions. In this paper, we assumed normal CPDFs with equal covariance matrices, because in many cases this choice provides a simple and robust method of discrimination, especially when many variables are available and have to be selected [17-19]. The practical benefit of this assumption is that the discriminant function and allocation rule become very simple. In particular, under these hypotheses, the decision boundary for discrimination is a linear function of x, and the corresponding model is therefore described as linear [17]. In addition, the CPDFs are easily estimated and locally tuned, because they require only the calculation of group means and the pooled within-sample covariance matrix. Indeed, the CPDF of group i (i = M or N) is given by the well-known multivariate normal probability density:

$$p(x\mid i) = \frac{1}{(2\pi)^{q/2}\,|\Sigma|^{1/2}}\exp\!\left[-\frac{1}{2}(x-\mu_i)^{T}\,\Sigma^{-1}(x-\mu_i)\right] \tag{3}$$

where µi is the mean of class i, Σ is the covariance matrix (assumed to be the same for M and N) and |Σ| its determinant, q is the number of predictor variables used for discrimination, and the superscript T indicates matrix transposition.
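Why the equal-covariance assumption yields a linear decision boundary can be checked by taking the logarithm of the ratio of the two CPDFs of equation 3. Because Σ is shared, the normalizing constants cancel, and the quadratic terms in x cancel as well (using the symmetry of Σ⁻¹). The derivation below is a standard worked step added for illustration; the symbols w, w₀ and t are introduced here and are not part of the original paper.

```latex
\begin{aligned}
\ln\frac{p(x\mid M)}{p(x\mid N)}
  &= -\tfrac{1}{2}(x-\mu_M)^{T}\Sigma^{-1}(x-\mu_M) + \tfrac{1}{2}(x-\mu_N)^{T}\Sigma^{-1}(x-\mu_N) \\
  &= (\mu_M-\mu_N)^{T}\Sigma^{-1}x - \tfrac{1}{2}\left(\mu_M^{T}\Sigma^{-1}\mu_M - \mu_N^{T}\Sigma^{-1}\mu_N\right) \\
  &= w^{T}x + w_0, \qquad w = \Sigma^{-1}(\mu_M-\mu_N), \quad w_0 = -\tfrac{1}{2}\,w^{T}(\mu_M+\mu_N)
\end{aligned}

P(M\mid x) = \frac{1}{1 + \exp\!\left[-\left(w^{T}x + w_0 + \ln\frac{P(M)}{P(N)}\right)\right]},
\qquad
P(M\mid x) \ge t \iff w^{T}x + w_0 + \ln\frac{P(M)}{P(N)} \ge \ln\frac{t}{1-t}
```

With P(M) = P(N) = 0.5 the prior term vanishes, so a probability threshold t = 0.427 corresponds to the linear boundary wᵀx + w₀ = ln(0.427/0.573) ≈ -0.294.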

In our model, µM, µN and Σ were estimated from the means and covariance matrix calculated from the training set of patients in classes M and N. The prior probabilities P(M) and P(N) were both assumed to be 0.5.
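A minimal sketch of fitting such a classifier follows, assuming each patient is a list of q numeric predictor values. It estimates µM and µN as group means and Σ as the pooled within-sample covariance matrix (summed scatter of both groups divided by n_M + n_N - 2, the usual unbiased pooled estimator; the paper does not state the divisor it used). It then evaluates equation 1 through the algebraically equivalent logistic form with w = Σ⁻¹(µM - µN) and w0 = -½ wᵀ(µM + µN), rather than computing the two densities explicitly. All names and the toy data are hypothetical; the original computations were done in MATLAB, and a real implementation would use a numerical library instead of the hand-written solver.

```python
from math import exp, log


def mean_vector(rows):
    """Column means of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(group_m, group_n):
    """Pooled within-sample covariance: summed scatter of both groups / (n_M + n_N - 2)."""
    q = len(group_m[0])
    scatter = [[0.0] * q for _ in range(q)]
    for rows in (group_m, group_n):
        mu = mean_vector(rows)
        for row in rows:
            d = [row[j] - mu[j] for j in range(q)]
            for i in range(q):
                for j in range(q):
                    scatter[i][j] += d[i] * d[j]
    dof = len(group_m) + len(group_n) - 2
    return [[v / dof for v in r] for r in scatter]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_bayes_linear(group_m, group_n, prior_m=0.5):
    """Return a function x -> P(M|x) for normal CPDFs with a shared covariance matrix."""
    mu_m, mu_n = mean_vector(group_m), mean_vector(group_n)
    sigma = pooled_covariance(group_m, group_n)
    w = solve(sigma, [a - b for a, b in zip(mu_m, mu_n)])  # Sigma^-1 (mu_M - mu_N)
    mid = [(a + b) / 2 for a, b in zip(mu_m, mu_n)]
    w0 = -sum(wi * mi for wi, mi in zip(w, mid))           # -1/2 w^T (mu_M + mu_N)
    log_prior_ratio = log(prior_m / (1.0 - prior_m))        # 0 when P(M) = P(N)

    def posterior(x):
        z = sum(wi * xi for wi, xi in zip(w, x)) + w0 + log_prior_ratio
        return sigmoid(z)

    return posterior


# Hypothetical two-variable training data (not from the study).
morbid = [[2.0, 1.0], [3.0, 2.0], [4.0, 2.5], [3.5, 1.5]]
normal = [[0.0, 0.0], [1.0, 0.5], [-1.0, -0.5], [0.5, 1.0]]
posterior = fit_bayes_linear(morbid, normal)
print(round(posterior([1.625, 1.0]), 3))  # midpoint of the group means: 0.5
```

With equal priors the boundary passes through the midpoint of the two group means, so a patient exactly halfway between them receives P(M|x) = 0.5; only the means and one pooled matrix need to be re-estimated to tailor the model to a new institution.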

A stepwise approach was used to select an optimal subset of predictor variables to be included in the Bayesian model [3,25,26]. The capacity of the model to discriminate between patients who will have complications after surgery and patients who will have a normal clinical course was assessed from receiver operating characteristic (ROC) curves [16]. The goodness of fit of the Bayesian model was evaluated using the Hosmer-Lemeshow χ² statistic [15]. Finally, testing data were used to evaluate the model's generalization capacity. All computer calculations for the Bayesian model were performed using the MATLAB® software package (The MathWorks Inc., Natick, MA, USA) [27].

ROC curves

It is well known that ROC curves give a graphic representation of the relationship between the true-positive fraction (TPF) and the false-positive fraction (FPF). ROC curves can be used to study the effect of changing the discrimination criterion, that is, of selecting a probability threshold to be compared with the model's predicted probability of morbidity [28]. Using the sensitivity (SE) and specificity (SP) values, the ROC curve is obtained by plotting SE = TPF against 1 - SP = FPF in a unit square, and the area under the ROC curve (AUC) is commonly used to measure the predictive power of the statistical discrimination model [28-30].
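The empirical ROC curve and its AUC can be computed directly from predicted probabilities and observed outcomes. The sketch below is a minimal illustration with hypothetical function names and data: it sweeps every distinct predicted probability as a threshold, calls a case positive (high risk) when its probability is at least the threshold, and integrates with the trapezoidal rule. It does not reproduce the binormal maximum-likelihood fit used for the estimated curves in this paper.

```python
def empirical_roc(scores, labels):
    """(FPF, TPF) points for every distinct threshold; label 1 = morbidity.
    A case is called positive when its score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))  # (1 - SP, SE)
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve whose points are ordered by FPF."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities and observed outcomes (1 = morbidity).
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
outcome = [1, 1, 0, 1, 0, 0]
print(trapezoid_auc(empirical_roc(probs, outcome)))  # 8/9, about 0.889
```

With this tie handling, the trapezoidal area equals the fraction of (morbid, normal) patient pairs that the model ranks correctly, counting tied pairs as one half.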

The discrimination criterion assumed in our model was set by choosing the point on the ROC curve where SE = SP. If the two classes of patients have equal prior probabilities and normal distributions with equal covariance matrices, this choice provides an optimal discrimination rule, minimizing the probability of error [17]. The corresponding decision probability was taken as the threshold to discriminate between high and low risk of morbidity. Different criteria (that is, different pairs of SE and SP) can of course be chosen, depending on the clinical cost of a wrong decision [17].
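Selecting the operating point where SE = SP can be sketched in the same spirit. As an assumption for illustration, candidate thresholds are restricted to the observed predicted values and the empirical (not the fitted binormal) ROC is used, so the chosen threshold only approximates the point where SE and SP are exactly equal. Function names and data are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """SE and SP when cases with score >= threshold are called high risk (label 1 = morbid)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def equal_se_sp_threshold(scores, labels):
    """Among the observed scores, return (threshold, SE, SP) minimizing |SE - SP|."""
    best = None
    for t in sorted(set(scores)):
        se, sp = sensitivity_specificity(scores, labels, t)
        if best is None or abs(se - sp) < best[0]:
            best = (abs(se - sp), t, se, sp)
    return best[1:]


probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # hypothetical predicted probabilities
outcome = [1, 1, 0, 1, 0, 0]              # hypothetical outcomes (1 = morbidity)
print(equal_se_sp_threshold(probs, outcome))  # threshold 0.7, with SE = SP = 2/3
```

Replacing the `abs(se - sp)` criterion with a weighted error would implement the cost-dependent alternatives mentioned above.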

Table 1

Demographics, baseline patient characteristics, main operative data and morbidity outcomes

BSA, body surface area; CHF, congestive heart failure; COPD, chronic obstructive pulmonary disease; CPB, cardiopulmonary bypass; Hct, hematocrit; IABP, intra-aortic balloon pump; LVEF, left ventricular ejection fraction; MI, myocardial infarction; N, number of cases; PVD/CD, peripheral vascular disease/carotid disease; REDO, re-operation; SD, standard deviation; TIA, transient ischemic attack.


The discrimination performance of the model was evaluated by analyzing the ROC curve and its 95% confidence interval, which was fitted from the training set using a maximum-likelihood estimation procedure assuming a binormal distribution of the data [30].
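Under the binormal assumption the ROC curve is commonly parametrized by two numbers, a and b, such that TPF = Φ(a + b·Φ⁻¹(FPF)), where Φ is the standard normal cumulative distribution function; the area under this curve is Φ(a/√(1 + b²)). The sketch below only evaluates these closed-form expressions for given a and b, using Python's standard-library `statistics.NormalDist`. It does not implement the maximum-likelihood estimation of a and b, or of the confidence band, described by Metz and colleagues [30]; the parameter values are illustrative, not the study's fitted values.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def binormal_tpf(fpf: float, a: float, b: float) -> float:
    """TPF on the binormal ROC curve; fpf must lie strictly between 0 and 1."""
    return STD_NORMAL.cdf(a + b * STD_NORMAL.inv_cdf(fpf))


def binormal_auc(a: float, b: float) -> float:
    """Closed-form area under the binormal ROC curve."""
    return STD_NORMAL.cdf(a / (1.0 + b * b) ** 0.5)


# Illustrative parameters chosen so that the AUC equals 0.80.
a_example, b_example = 2 ** 0.5 * STD_NORMAL.inv_cdf(0.80), 1.0
print(round(binormal_auc(a_example, b_example), 3))  # 0.8
```

Fitting a smooth two-parameter curve, rather than joining empirical points, is what allows a confidence interval for the whole curve to be drawn from the training data.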

Model generalization

A key element in statistical discrimination is the model's generalization capacity, which is estimated by the model's performance on a test set that is not used for training. The model generalizes well if errors in the testing and training sets do not differ significantly. A well-known source of loss of generalization power is the use of too many predictor variables [31]. A greater number (q) of predictor variables requires a greater number of parameters (q for each group mean and q(q + 1)/2 for the pooled within-sample covariance matrix) to be estimated to define normal CPDFs. With a fixed set of training data, the accuracy of the model's parameter estimates rapidly worsens as q increases, leading to a significant loss in generalization capacity. A minimum subset of predictor variables (also called 'features') that provides high generalization power to the Bayes linear classifier should, therefore, be sought using an optimization criterion. We used a computer-aided stepwise technique [32] combined with the leave-one-out (LOO) method of cross-validation [33] to check the model generalization directly during the feature-selection process. At each step of the process, a variable was entered into or removed from the predictor subset and the significance of its contribution to the AUC was evaluated. The stepwise process stopped when no variable satisfied the criterion for inclusion or removal.

The LOO method is particularly useful in biomedical applications, where little data is usually available, because it enables all the data to be used efficiently both for training the classification model and for testing its predictive performance. For n available input–output data, it considers n distinct training sessions, each using a different subset of (n - 1) cases. The n cases left out, one per session, were used to calculate the testing discrimination performance, which was evaluated by the AUC.
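The stepwise feature selection with leave-one-out validation described above can be sketched as follows. This is a simplified illustration under several stated assumptions: to keep the block short, a hypothetical nearest-mean scorer stands in for the Bayes linear classifier; a variable is accepted when it raises the LOO AUC by at least a fixed amount (`min_gain`, an arbitrary value) instead of passing a significance test; and only the entry step is shown, whereas the procedure in the paper also allowed variables to be removed. All function names and the toy data are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (morbid, normal) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nearest_mean_fit(rows, labels):
    """Hypothetical stand-in classifier: squared distance to the normal-course mean
    minus squared distance to the morbidity mean (higher = riskier).
    Assumes both classes remain non-empty after a case is left out."""
    q = len(rows[0])

    def mean(group):
        return [sum(r[k] for r in group) / len(group) for k in range(q)]

    mu_m = mean([r for r, y in zip(rows, labels) if y == 1])
    mu_n = mean([r for r, y in zip(rows, labels) if y == 0])

    def score(x):
        d_n = sum((a - b) ** 2 for a, b in zip(x, mu_n))
        d_m = sum((a - b) ** 2 for a, b in zip(x, mu_m))
        return d_n - d_m

    return score


def loo_auc(rows, labels, columns, fit=nearest_mean_fit):
    """Leave-one-out AUC on the given feature columns: n fits on n - 1 cases,
    each one scoring the single held-out case."""
    sub = [[r[c] for c in columns] for r in rows]
    held_out_scores = []
    for i in range(len(sub)):
        model = fit(sub[:i] + sub[i + 1:], labels[:i] + labels[i + 1:])
        held_out_scores.append(model(sub[i]))
    return auc_mann_whitney(held_out_scores, labels)


def forward_stepwise(rows, labels, min_gain=0.005):
    """Greedy forward selection on LOO AUC; stops when no variable adds >= min_gain."""
    selected, best = [], float("-inf")
    remaining = list(range(len(rows[0])))
    while remaining:
        auc, col = max((loo_auc(rows, labels, selected + [c]), c) for c in remaining)
        if auc - best < min_gain:
            break
        selected.append(col)
        remaining.remove(col)
        best = auc
    return selected, best


# Hypothetical data: column 0 separates the classes, column 1 is noise.
X_toy = [[1.0, 0.3], [1.2, -0.1], [0.9, 0.2], [1.1, 0.0],
         [0.0, 0.1], [0.2, -0.2], [-0.1, 0.3], [0.1, 0.0]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]
print(forward_stepwise(X_toy, y_toy))  # ([0], 1.0)
```

Because every AUC in the selection loop is computed on held-out cases, a variable that only fits noise in the training data is unlikely to raise the criterion, which is the point of combining stepwise selection with LOO.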

Comparison of score models with the Bayesian model

Because comparison of a locally customized model with a previously published model can be unfair, the method proposed by Higgins and colleagues for assessing morbidity risk in the ICU [11] was employed to design score models customized to our patient sample. The following two different approaches were used:

1) exactly following Higgins' procedure, including the selection of variables;

2) tailoring a score model to our data set using the variables selected for the Bayesian model.

The first choice, the fully customized (FC) score model, enables comparison of our proposed Bayes model with the best possible score model designed from our patient sample by mimicking Higgins' method. The second choice, the partially customized (PC) score model, was built to evaluate differences in model performance when the same predictor variables were used.

Bayesian and score models were compared for discrimination and calibration [29]. Model discrimination was tested by analyzing the ROC curves derived from the technique developed by Metz et al [30]. Model calibration was evaluated using the Hosmer-Lemeshow goodness-of-fit test [15].

All computer calculations for score models were performed using the SPSS® statistical package (SPSS Inc., Chicago, IL, USA) [34].
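The Hosmer-Lemeshow calibration check used for all models can be sketched as follows, assuming the common variant with about ten groups of roughly equal size formed from cases sorted by predicted risk (the paper does not state its grouping). The statistic is referred to a chi-square distribution with G - 2 degrees of freedom. To stay within the Python standard library, the p-value helper uses the closed-form chi-square tail that holds only for even degrees of freedom (8 when G = 10). Function names are hypothetical; the study itself used MATLAB and SPSS.

```python
from math import exp


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic: sort cases by predicted risk, split them into
    roughly equal groups, and sum (O - E)^2 / (n_g * pbar_g * (1 - pbar_g))."""
    ranked = sorted(zip(probs, outcomes), key=lambda t: t[0])  # stable sort by risk
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # n_g * mean predicted risk
        observed = sum(y for _, y in chunk)   # number of morbid cases in the group
        pbar = expected / n_g                 # assumes 0 < pbar < 1
        stat += (observed - expected) ** 2 / (n_g * pbar * (1.0 - pbar))
    return stat


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k                      # (x/2)^k / k!
        total += term
    return exp(-half) * total
```

A large p-value from `chi2_sf_even_df(hosmer_lemeshow(probs, outcomes), 8)` indicates no evidence of poor calibration, which is how a result such as P = 0.35 is read.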


Table 2

Stepwise selection of variables for discriminating morbidity with the Bayesian model

Step   Variable                                   Area under ROC curve
1      DO2I (mL/minute/m²) after 3
2      Inotropic support after CPB                0.7407
4      Preoperative creatinine (mg/dL)            0.7692
8      Duration of CPB (minutes)                  0.7923
10     WBC (10³/mm³) after 3 hours in ICU         0.7966

The area under the receiver operating characteristic (ROC) curve represents discrimination performance. The stepwise technique was employed in combination with the leave-one-out method of cross-validation. At each step, the variable that adds the most to or subtracts the least from the area under the ROC curve is entered or removed, respectively. The stepwise process stopped when no further significant improvement was found. CABG, coronary artery bypass graft; CD, carotid disease; CPB, cardiopulmonary bypass; DO2I, oxygen delivery index; IABP, intra-aortic balloon pump; ICU, intensive care unit; PVD, peripheral vascular disease; REDO, re-operation; WBC, white blood cells.

Results

Demographics, morbidity and mortality rate

Demographics, baseline patient characteristics, main operative data and morbidity outcomes for both the training (740 patients) and testing (350 patients) sets are shown in Table 1. In-ICU morbidity was around 21%, whereas a mortality rate of about 2% was recorded within 30 days of the operation. This small mortality group was regarded as unsuitable for validating the score risk models and for performing a Bayes analysis with respect to the mortality end point.

Bayes linear model

The stepwise procedure selected 12 variables (Table 2). Univariate analysis indicated that the oxygen delivery index (DO2I) was the most discriminating variable, and it was therefore selected at the first step of the stepwise procedure. In the following steps, addition of other variables increased the AUC, which at the final step reached a value about 13% greater than at the first step (Table 2). After calibrating the Bayesian model with this set of predictor variables, the Hosmer-Lemeshow test showed a good fit (P = 0.35).

Figure 1 shows the ROC curve (bold line) and its 95% confidence interval (fine lines), which were estimated from the training set by assuming a binormal distribution of the data. The corresponding AUC was 0.80 (95% confidence interval 0.75 to 0.83). The empirical ROC curve obtained from the testing set is the dashed line in the same figure, and the corresponding AUC is 0.79. The ROC curve and AUC for the testing set are very close to the training set estimates, indicating that the Bayes linear model designed from the training set maintains very similar discrimination performance with new data.

The cross on the estimated ROC curve indicates the point at which SE = SP (72%). This choice corresponded to a probability threshold of 0.427: patients with an estimated posterior probability of morbidity greater than or equal to 0.427 were classified as at high risk. With this decision criterion, the percentage of correctly predicted cases in the testing set was 70.6% (247 out of 350 patients): the Bayesian model correctly recognized 61 out of 86 morbidity cases (70.9%) and 186 out of 264 uncomplicated cases (70.5%). These values lie within the confidence interval of the ROC curve estimated from the training data (Figure 1).
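As a quick arithmetic check on these testing-set figures, the reported counts can be turned into sensitivity, specificity and overall accuracy. The helper name is hypothetical; the counts are exactly those quoted in the text.

```python
def summarize(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return se, sp, accuracy


# Testing-set counts: 61 of 86 morbid and 186 of 264 uncomplicated cases correct.
se, sp, acc = summarize(tp=61, fn=86 - 61, tn=186, fp=264 - 186)
print(f"SE = {se:.1%}, SP = {sp:.1%}, accuracy = {acc:.1%}")
# prints: SE = 70.9%, SP = 70.5%, accuracy = 70.6%
```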

More specifically, all patients in the testing set who developed infections were correctly identified by the model as at high risk. The performance of the prediction model deteriorated slightly for patients with other types of complications (91%, 79%, 77%, 74% and 62% for renal, cardiovascular, respiratory, neurological and hemorrhagic complications, respectively). The high percentages obtained for most complications are not surprising, because high-risk patients often had matching complications. In fact, algorithm performance improved sharply as the number of concomitant complications increased: in the test set, 61%, 79%, 89% and 100% of morbidity cases were correctly discriminated when the number of complications was one, two, three or more than three, respectively.

Fully customized score model

Locally selected variables and corresponding scores of the FC model are shown in Table 3. The Hosmer-Lemeshow test showed that the FC score model fits our data well. Regarding discrimination capacity, Figure 2 shows both the ROC curve estimated from the training set using the binormal method (bold line) and the empirical ROC curve obtained with the testing set (dashed line). The 95% confidence interval of the estimated ROC curve is bounded by the two fine lines in the same figure. The AUC corresponding to the estimated ROC curve was 0.76 (95% confidence interval 0.72 to 0.80), whereas the AUC calculated from the testing data was 0.74. For this model, too, discrimination performance was similar in the testing data and the training set estimates. As for the Bayesian model, we chose a decision criterion that made SE and SP as similar as possible. This corresponded to a threshold score of 4 (that is to say, patients with a score greater than or equal to 4 were classified as at high risk of morbidity).
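The score-model decision rule amounts to adding integer points and comparing the total with a cut-off. A minimal sketch, assuming the points are held in a simple dictionary: the variable names and point values below are invented for illustration (they are not the FC model's Table 3 entries), and only the cut-off of 4 comes from the text.

```python
# Hypothetical point weights for illustration only (not the study's Table 3 values).
HYPOTHETICAL_POINTS = {
    "emergency_case": 3,
    "reoperation": 2,
    "cpb_time_over_threshold": 1,
    "inotropic_support_after_cpb": 2,
}


def risk_score(findings, points=HYPOTHETICAL_POINTS):
    """Additive score: sum the points of every risk factor present in the patient."""
    return sum(points[name] for name, present in findings.items() if present)


def is_high_risk(score, threshold=4):
    """Score-model decision rule: a score >= threshold means high risk of morbidity."""
    return score >= threshold


patient = {"emergency_case": False, "reoperation": True,
           "cpb_time_over_threshold": True, "inotropic_support_after_cpb": True}
s = risk_score(patient)
print(s, is_high_risk(s))  # prints: 5 True
```

The appeal of this rule is that it needs only addition at the bedside; the price is that continuous measurements must first be reduced to a few integer-weighted categories.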

Figure 1

Receiver operating characteristic (ROC) curve for the Bayes linear model. The ROC curve (bold continuous line) and its 95% confidence interval (fine continuous lines) were derived from the training set by a maximum-likelihood estimation procedure assuming a binormal distribution of the data. The corresponding area under the ROC curve was 0.80 (with a 95% confidence interval ranging from 0.75 to 0.83). The cross indicates the point where SE and SP are equal. The dashed line represents the empirical ROC curve obtained from testing data. FPF, false-positive fraction; SE, sensitivity; SP, specificity; TPF, true-positive fraction.


The FC score model seems to have a lower discrimination capacity than the Bayes linear classifier, because its AUC was lower than that of the Bayes model. The Metz technique for comparison of ROC areas showed a significant difference between the AUCs of the two models (P < 0.05).

Partially customized score model

Table 4 shows the scores associated with the PC score model: according to the model construction criterion, the variables in the PC score model were the same as those in the Bayes model. The Hosmer-Lemeshow goodness-of-fit test demonstrated good calibration.

Figure 3 shows the ROC curve and its 95% confidence interval estimated from the training set, assuming a binormal distribution of the data, together with the empirical ROC curve derived from the testing set. The AUC of the estimated ROC curve was 0.75 (95% confidence interval 0.71 to 0.80), whereas the AUC computed from the testing data was 0.73. The discrimination capacity of the PC model was, therefore, clearly worse than that of the Bayesian model: the Metz test for ROC area comparison indicated high statistical significance (P < 0.01). The threshold score corresponding to approximately equal SE and SP was the same as for the FC score model (that is, exactly 4).

Higgins' score model

The performance of the score model became radically worse when morbidity risk in our ICU patients was discriminated using the standard scoring system, as originally proposed by Higgins et al [11]. This is evident when comparing Figures 1, 2 and 3 with Figure 4, which shows the ROC curves obtained for Higgins' original scoring system with preoperative conditions, operating theater events and measurements on admission to the ICU. In this case, AUC values were 0.70 and 0.69 when calculated from the estimated and empirical ROC curves, respectively. A score threshold of 6 was chosen to obtain approximately equal SE and SP.

Bayesian model application

An example of the application of the Bayesian model is shown in Table 5. Two patients were classified: one who did not develop complications (case A) and one who developed only a cardiovascular complication (case B). Table 5 shows the values of the 12 predictor variables (x) for both cases and those of the CPDFs for morbidity and normal conditions (p(x|M) and p(x|N), respectively). p(x|M) and p(x|N) were calculated on a personal computer using equation 3 (above), which required knowledge of the group means and covariance matrix (calculated from the training set). Assuming P(M) = P(N) = 0.5, the Bayes posterior probability of morbidity was derived from equation 1 (above). Because this posterior probability was less than the selected probability threshold (0.427) in case A, whereas in case B it was greater, both patients were correctly classified. By contrast, cases A and B were both classified as high risk by all score models (Table 5).

Table 3

Fully customized score model

CABG, coronary artery bypass graft; CD, carotid disease; CI, confidence interval; CPB, cardiopulmonary bypass; DO2I, oxygen delivery index; IABP, intra-aortic balloon pump; LVEF, left ventricle ejection fraction; OR, odds ratio; PAH, pulmonary artery hypertension; PVD, peripheral vascular disease; RC, regression coefficient; SE, standard error; SvO2, mixed venous oxygen saturation; T, body temperature; VCO2, carbon dioxide production.

Discussion

A model that predicts the outcome of patients in the ICU with good discrimination can be useful because the risk prediction enables better allocation of resources, for example, and can aid decisions about the appropriateness of continuing treatment [35]. Most studies have concentrated on short-term mortality, and there is a lack of easy-to-use models predicting the risk of complications (morbidity). However, mortality by itself might not be an adequate indicator of quality of care or resource use [2,14]. By contrast, morbidity might be more informative, being a more frequent event than mortality and enabling statistical inferences to be drawn from smaller populations. Finally, morbidity can be measured in terms of postoperative complications and length of stay in the ICU [1,14]. Several authors have developed predictive indices of stay in the ICU after heart surgery. Most of these studies included preoperative variables and, generally, did not take into account events affecting patient outcome in the operative or immediate postoperative period. Quantifying risk and assessing outcome in the ICU after cardiac surgery according to preoperative variables alone could lead to incorrect conclusions about the true morbidity risk [1,11,14]. We chose to consider the contribution of preoperative conditions, operating theater events and physiological measurements on admission to the ICU, and we selected an optimal subset of predictor variables using a stepwise technique. The aim of the study was to compare two approaches for risk discrimination in ICU patients after heart surgery: a Bayes linear classifier developed in our specialized ICU, and score models designed in our training set using the method proposed by Higgins and colleagues [11].

Figure 2

Receiver operating characteristic (ROC) curve for the fully-customized score model. The ROC curve (bold continuous line) and its 95% confidence interval (fine continuous lines) were derived from the training set by a maximum-likelihood estimation procedure assuming a binormal distribution of the data. The corresponding area under the ROC curve was 0.76 (95% confidence interval 0.72 to 0.80). The cross indicates the point where SE and SP are equal. The dashed line represents the empirical ROC curve obtained from the testing data. FPF, false-positive fraction; SE, sensitivity; SP, specificity; TPF, true-positive fraction.

Figure 3

Receiver operating characteristic (ROC) curve for the partially-customized score model. The ROC curve (bold continuous line) and its 95% confidence interval (fine continuous lines) were derived from the training set by a maximum-likelihood estimation procedure assuming a binormal distribution of the data. The corresponding area under the ROC curve was 0.75 (95% confidence interval 0.71 to 0.80). The cross indicates the point where SE and SP are equal. The dashed line represents the empirical ROC curve obtained from the testing data. FPF, false-positive fraction; SE, sensitivity; SP, specificity; TPF, true-positive fraction.

Both approaches have strengths and weaknesses. The greatest benefit of a score model is that it only requires summing integer factors and is therefore very simple to apply in routine clinical practice. However, the Higgins approach first requires the development of a logistic regression model. Although continuous and categorical predictor variables can be mixed, model development can be problematic because logistic regression is very sensitive to correlations between predictors in the model [16]. If the predictor variables are highly correlated during local application, the result is a loss of information. To overcome this problem, we used a stepwise procedure, similar to that employed with the Bayes model, to select the variables entered in the logistic regression model. A weakness of the scoring system is the difficulty of locally customizing this type of model if training sets collected at a different institution are used. The design of the scoring system requires a complex process, which can have low interobserver reproducibility. In particular, to refit the logistic model using all predictors as categorical variables, Higgins and colleagues used a locally weighted scatterplot smoothing (LOWESS) procedure, which involves subjective choices, to identify cut-off points. Similar difficulties might also be encountered when the model is updated with new data, such as improved results arising from technological advances. Easy updating is a crucial feature. In fact, the acquisition of correctly classified new patients enables the training set to be enlarged day by day, with a corresponding improvement in the discrimination performance of the model. Progress in medical techniques also makes it necessary to be able to change decision-making models continuously. For example, the dramatic decrease in postoperative mortality after cardiac surgery means that morbidity is now used as the new end point for developing operative risk models. Bayes linear discrimination provides much more flexible models, because tuning them to new data sets is a rapid and objective procedure that only requires calculating the means of the predictor variables in the two risk classes and the pooled variances and covariances in the whole training set. A weakness of this approach is that it is optimal only if the CPDFs of the two classes can be assumed normal with equal covariance matrices. However, this type of classifier is used in a wide range of clinical applications because its simplicity and robustness compensate for the loss of performance resulting from incomplete adherence to the above statistical hypotheses [17-19].

Our results show that the Bayes linear classifier can predict all types of complications, especially infection and renal failure. Discrimination improves as the number of complications increases; in particular, the model correctly classified all patients with more than three complications.
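A minimal sketch of a two-class Bayes linear classifier, written in plain Python so it needs no third-party packages. It assumes normal class-conditional densities with a common covariance matrix (the condition under which the rule is optimal), estimates that matrix by pooling the within-class scatter, and by default uses the training-set class frequencies as prior probabilities. The function names and the toy data are hypothetical illustrations, not the authors' implementation or study data. Refitting on an enlarged training set only means rerunning `fit_bayes_linear`, which is the easy-updating property discussed above.

```python
import math


def class_mean(rows):
    """Mean vector of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def pooled_covariance(rows0, rows1):
    """Pooled within-class covariance: scatter about each class mean / (n0 + n1 - 2)."""
    p = len(rows0[0])
    scatter = [[0.0] * p for _ in range(p)]
    for rows in (rows0, rows1):
        m = class_mean(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
    dof = len(rows0) + len(rows1) - 2
    return [[s / dof for s in row] for row in scatter]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_bayes_linear(rows0, rows1, prior1=None):
    """Return (w, w0) such that w . x + w0 is the log posterior odds of class 1."""
    mu0, mu1 = class_mean(rows0), class_mean(rows1)
    if prior1 is None:  # assumption: priors taken from training-set class frequencies
        prior1 = len(rows1) / (len(rows0) + len(rows1))
    diff = [b - a for a, b in zip(mu0, mu1)]
    w = solve(pooled_covariance(rows0, rows1), diff)  # w = S^-1 (mu1 - mu0)
    midpoint = [(a + b) / 2 for a, b in zip(mu0, mu1)]
    w0 = -sum(wi * mi for wi, mi in zip(w, midpoint)) + math.log(prior1 / (1 - prior1))
    return w, w0


def morbidity_probability(x, w, w0):
    """Posterior probability of the morbidity class for predictor vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage with made-up two-variable data (not study data).
no_morbidity = [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5], [2.0, 2.0]]
morbidity = [[3.0, 4.0], [4.0, 3.0], [3.5, 3.5], [4.0, 4.0]]
w, w0 = fit_bayes_linear(no_morbidity, morbidity)
print(morbidity_probability([2.625, 2.625], w, w0))  # midpoint of the class means: 0.5
```

With equal priors, a patient lying exactly halfway between the two class means gets a posterior probability of 0.5; changing `prior1` shifts only the intercept `w0`, not the direction `w`.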

Table 4

Partially customized score model

CABG, coronary artery bypass graft; CD, carotid disease; CI, confidence interval; CPB, cardiopulmonary bypass; DO2I, oxygen delivery index; IABP, intra-aortic balloon pump; OR, odds ratio; PVD, peripheral vascular disease; RC, regression coefficient; REDO, re-operation; SE, standard error; WBC, white blood cells.

The area under the ROC curve, estimated by a maximum-likelihood procedure assuming a binormal distribution of the data, was significantly higher for the Bayes linear model. Similar results were obtained by evaluating the empirical ROC curves obtained from the testing set. According to the Hosmer and Lemeshow criterion [15], all locally customized models had acceptable discrimination capacities in the testing data set, because their AUCs were above 0.7 and below 0.8. By contrast, the AUC of Higgins' standard scoring system calculated with the testing data set did not reach 0.7, indicating poor discrimination capacity for this model in our patients. With regard to calibration, the Hosmer-Lemeshow test showed good fit for all models except Higgins' scoring system. Table 5 summarizes the discrimination and calibration performance of the Bayes linear classifier and the FC, PC and Higgins' standard scoring models. It shows that the two locally customized score models had significantly lower discrimination capacities than the Bayes linear classifier. The statistical significance of the difference in AUCs between the Bayes linear classifier and the score models increased when passing from the FC to the PC approach, indicating that score model performance worsened considerably when the set of variables identified as optimal by the Bayes classifier was used as predictors. Furthermore, Table 5 shows the weak points of Higgins' standard scoring system applied in our specialized ICU, confirming that any comparison of a locally customized model with a previously published model is unfair, regardless of the method by which the model was developed.
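The discrimination and calibration measures discussed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software used in the study: `empirical_auc` computes the empirical ROC area as the Mann-Whitney probability that a patient with morbidity receives a higher score than one without; `binormal_auc` only evaluates the closed-form area of a binormal ROC curve from already-estimated parameters a and b (the maximum-likelihood fitting of a and b is not shown); and `hosmer_lemeshow` uses equal-size groups of sorted predicted risk with the usual g - 2 degrees of freedom. The chi-square tail probability uses the closed form that holds only for an even number of degrees of freedom, so the sketch accepts only an even group count. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def empirical_auc(scores_pos, scores_neg):
    """Empirical ROC area: P(positive score > negative score), ties counted as 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def binormal_auc(a, b):
    """Area under the binormal ROC curve TPF = Phi(a + b * Phi^-1(FPF))."""
    return NormalDist().cdf(a / math.sqrt(1.0 + b * b))


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow C-hat over equal-size groups of sorted predicted risk."""
    df = groups - 2
    if df <= 0 or df % 2:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        mean_p = expected / len(chunk)
        stat += (observed - expected) ** 2 / (len(chunk) * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, df)
```

A large Hosmer-Lemeshow p-value indicates no evidence of poor fit, which is how "good fit" is read in the text; the empirical AUC is the quantity summarized by the dashed curves in the ROC figures.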

In our data set, model performance dropped sharply when logistic regression models were converted into scoring systems using the procedure suggested by Higgins et al. [11]. In fact, when we customized logistic models without transforming regression coefficients into integer scores, we obtained discrimination performance only slightly worse than that of the Bayesian model, and in this case statistical comparison of ROC areas did not indicate significant differences. This fully agrees with the results of previous studies [21,36] and suggests that attempts to obtain a very simple clinical model that reduces computational difficulty could lead to a significant loss of performance. Despite the immediacy and simplicity of scoring systems derived from weighted variables, sequential summing of integer factors can distort the multivariate characteristics of outcome prediction. The Bayesian model does not use a weighted scoring system; it uses a decision rule that enables the probability of morbidity in patients undergoing CABG surgery to be assessed according to the multivariate statistics of the predictor variables used for discrimination (12 variables were selected in our model).
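To make the point about integer scoring concrete, the sketch below converts hypothetical logistic regression coefficients into integer points by dividing each coefficient by the smallest one and rounding. That rule is an assumption chosen for illustration, not Higgins' exact procedure, and the predictor names, coefficients and intercept are invented. Two hypothetical patients receive the same additive score even though the unrounded logistic model assigns them different risks, which is one way rounding can blur multivariate information.

```python
import math

# Hypothetical coefficients for three binary predictors (not from the study).
INTERCEPT = -3.0
COEFS = {"predictor_a": 0.45, "predictor_b": 0.70, "predictor_c": 1.30}


def to_integer_points(coefs):
    """Assumed rule: scale by the smallest |coefficient| and round to an integer."""
    base = min(abs(c) for c in coefs.values())
    return {name: round(c / base) for name, c in coefs.items()}


def logistic_risk(patient, coefs=COEFS, intercept=INTERCEPT):
    """Predicted probability from the unrounded logistic model."""
    z = intercept + sum(coefs[k] * patient[k] for k in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def additive_score(patient, points):
    """Integer score obtained by summing the points of the predictors present."""
    return sum(points[k] * patient[k] for k in points)


points = to_integer_points(COEFS)  # {'predictor_a': 1, 'predictor_b': 2, 'predictor_c': 3}
patient_1 = {"predictor_a": 1, "predictor_b": 1, "predictor_c": 0}
patient_2 = {"predictor_a": 0, "predictor_b": 0, "predictor_c": 1}
for patient in (patient_1, patient_2):
    # prints "3 0.136" for patient_1, then "3 0.154" for patient_2
    print(additive_score(patient, points), round(logistic_risk(patient), 3))
```

Both patients score 3 points, yet their modeled risks differ; a probability-based rule such as the Bayes classifier keeps that distinction.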

Many papers have tested the validity of the preoperative scoring system [37-42], but to our knowledge, no study on validation of Higgins' ICU-admission score has been published. The present study is the first to locally customize this ICU scoring system and to test its validity using external data. In the original version of the ICU-admission morbidity model, Higgins and associates used an additive scoring system comprising 13 weighted predictors graded from 1 to 7, giving a maximum total score of 44 points [11]. In the FC version, the same method of model development led to a different choice of 13 weighted predictors. Most risk predictors in Higgins' score also appear in other North American and European mortality risk models (such as the Parsonnet and EuroSCORE models) [43-45]. Similar risk factors were identified by Higgins and colleagues and by our Bayes model (Table 2): emergency procedure, age, elevated serum creatinine level, prior heart operation, history of vascular disease, weight, CPB time, use of IABP after CPB, and low postoperative flow state (low cardiac index and low DO2I).

Although a low preoperative ejection fraction is a known predictor of poor immediate postoperative outcome after cardiac surgery, it was not a risk factor in our study. This is in line with the findings of Zaroff and colleagues [46], who showed that in some high-risk cases left ventricular function can improve greatly after operation because of successful revascularization. Not all patients with a low preoperative ejection fraction required inotropic support, and a low ejection fraction was not a risk factor for outcome in the whole population [46]. However, we found that morbidity was associated with the need for preoperative and postoperative IABP and with the use of inotropes after the operation, and these variables are strongly correlated with poor cardiac function.

The idea of developing a risk model derived from the Bayes rule is not new. In 1985, Edwards and colleagues began to use a Bayesian model of operative mortality associated with CABG procedures [20]. The Society of Thoracic Surgeons National Cardiac Surgery Database model, developed by Edwards and colleagues, incorporates 23 risk factors and is the most widely used model in the USA [47]. The Society of Cardiothoracic Surgeons of Great Britain and Ireland also proposed a Bayesian model for CABG patients in the UK [48,49]. However, both these models focused on postoperative mortality, not morbidity. In the present study, we developed and tested a Bayesian discrimination model for assessing morbidity risk after coronary artery surgery. Some practical aspects need to be considered when this discrimination technique is chosen to support clinical decision-making. First, this approach requires the use of a computer. Moreover, the initial retrospective study needed to derive the model might be time-consuming and tedious. If detailed records are not available, it might not be possible to obtain the whole set of variables for each patient. Finally, many groups have found it necessary to establish physician training programs to ensure that all users

Figure 4

Receiver operating characteristic (ROC) curve for the standard (uncustomized) Higgins score model. The ROC curve (bold continuous line) and its 95% confidence interval (fine continuous lines) were derived from the training set by a maximum-likelihood estimation procedure assuming a binormal distribution of the data. The corresponding area under the ROC curve was 0.70 (95% confidence interval 0.65 to 0.75). The cross indicates the point where SE and SP are equal. The dashed line represents the empirical ROC curve obtained from the testing data. FPF, false-positive fraction; SE, sensitivity; SP, specificity; TPF, true-positive fraction.
