Development and validation of risk assessment models for
diabetes-related complications based on the DCCT/EDIC data
Vincenzo Lagani a,⁎, Franco Chiarugi a, Shona Thomson b, Jo Fursse c, Edin Lakasing c,
Russell W. Jones c, Ioannis Tsamardinos a,d
a Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Greece
b Herts Valley Clinical Commission Group, Hertfordshire, United Kingdom
c Chorleywood Health Center, Chorleywood, United Kingdom
d Computer Science Department, University of Crete, Heraklion, Greece
Abstract
Article info
Article history:
Received 28 November 2014
Received in revised form 10 February 2015
Accepted 1 March 2015
Available online xxxx
Keywords:
Risk assessment models
Risk stratification
Risk factors
Diabetes complications
Risk model external validation
Aim: To derive and validate a set of computational models able to assess the risk of developing complications and experiencing adverse events for patients with diabetes. The models are developed on data from the Diabetes Control and Complications Trial (DCCT) and the Epidemiology of Diabetes Interventions and Complications (EDIC) studies, and are validated on an external, retrospectively collected cohort.
Methods: We selected fifty-one clinical parameters measured at baseline during the DCCT as potential risk factors for the following adverse outcomes: Cardiovascular Diseases (CVD), Hypoglycemia, Ketoacidosis, Microalbuminuria, Proteinuria, Neuropathy and Retinopathy. For each outcome we applied a data-mining analysis protocol in order to identify the best-performing signature, i.e., the smallest set of clinical parameters that, considered jointly, are maximally predictive for the selected outcome. The predictive models built on the selected signatures underwent both an internal validation on the DCCT/EDIC data and an external validation on a retrospective cohort of 393 diabetes patients (49 Type I and 344 Type II) from the Chorleywood Medical Center, UK.
Results: The selected predictive signatures contain five to fifteen risk factors, depending on the specific outcome. Internal validation performances, as measured by the Concordance Index (CI), range from 0.62 to 0.83, indicating good predictive power. The models achieved comparable performances for the Type I and, quite surprisingly, Type II external cohort.
Conclusions: Data-mining analyses of the DCCT/EDIC data allow the identification of accurate predictive models for diabetes-related complications. We also present initial evidence that these models can be applied to a more recent, European population.
© 2015 Published by Elsevier Inc.
1. Introduction
Computational models for assessing the risk of diabetes-related complications are becoming more and more prevalent in diabetes clinical research (Palmer, 2013). Risk assessment models can be defined as mathematical tools that evaluate the risk of experiencing an adverse outcome on the basis of the patient's clinical profile. These models are employed in clinical practice to assist clinicians in stratifying patients according to the gravity of their conditions and the possible evolution of their clinical trajectories. Moreover, devising risk assessment models usually leads to the identification of novel risk factors associated with a given complication. In turn, this knowledge potentially grants a better understanding of diabetes pathophysiology (Ajmera, Swat, Laibe, Le, & Chelliah, 2013).
We analyzed the information collected during the Diabetes Control and Complications Trial (DCCT) (The Diabetes Control and Complications Trial Research Group, 1993) and the Epidemiology of Diabetes Interventions and Complications study (EDIC) (Nathan et al., 2005) in order to derive risk assessment models for seven different diabetes-related complications and adverse events: Cardiovascular Diseases (CVD), Hypoglycemia, Ketoacidosis, Microalbuminuria, Proteinuria, Neuropathy and Retinopathy. In particular, for each complication we tried to identify the minimal set of clinical parameters that, considered jointly, are maximally predictive. Identifying such minimal sets of risk factors leads to models that are easier to interpret, possibly providing intuitions into the mechanisms originating the disease, while discarded factors are either irrelevant or redundant given
Journal of Diabetes and Its Complications xxx (2015) xxx–xxx
Conflicts of interest: The authors declare that there are no conflicts of interest.
⁎ Corresponding author N Plastira 100, Vassilika Vouton, GR-700 13 Heraklion, Crete,
Greece Tel.: +30 2810 391070; fax: +30 2810 391428.
E-mail address: vlagani@ics.forth.gr (V Lagani).
http://dx.doi.org/10.1016/j.jdiacomp.2015.03.001
1056-8727/© 2015 Published by Elsevier Inc.
the selected ones. Borrowing a notation commonly used in genomic research (Subramanian & Simon, 2010), hereafter we will refer to such parsimonious, predictive sets of risk factors as predictive signatures.
During our analyses we employed a complex machine-learning protocol (Lagani & Tsamardinos, 2010) in order to simultaneously (a) identify the predictive signatures, (b) derive the best models over the selected signatures and (c) unbiasedly assess the performances of the models on the DCCT/EDIC data (internal validation). Moreover, we retrospectively collected data from 393 diabetes patients (49 Type I and 344 Type II) followed at the Chorleywood Medical Center (CHC), United Kingdom (UK), in the period 2004–2014. The models were evaluated on this external cohort in order to assess their transferability to a population with characteristics different from those of the one followed in the DCCT/EDIC study.
The results of the validation indicate that models trained on a USA/Canada cohort of diabetes patients enrolled in the 1980s can actually transfer to a cohort of contemporary European patients. Transferability increases when the models are re-calibrated on the new data while conserving the original predictive signature. This suggests that while the effect size of each risk factor may change over time and across different geographical areas, factors that were highly predictive in the 1980s can still help clinicians in correctly stratifying diabetes patients according to their risk.
2. Research design and methods
2.1. DCCT/EDIC data
The DCCT design has been described elsewhere (The Diabetes Control and Complications Trial Research Group, 1993). Briefly, 1441 Type I diabetes patients (13 to 39 years of age) were enrolled in the study from 1983 to 1989 and followed, on average, for 6.5 years. The study was designed as a randomized controlled trial, with patients randomly assigned to conventional or intensive insulin therapy. Two distinct cohorts were enrolled: the primary intervention cohort was composed of patients with albumin concentration ≤ 40 mg/24 h, no retinopathy and having diabetes for 1 to 5 years, while the secondary intervention cohort comprised subjects with a longer history of diabetes (1 to 15 years), mild to moderate non-proliferative diabetic retinopathy, and albumin excretion rate ≤ 200 mg/24 h. An exhaustive clinical examination was performed at baseline (including medical history, physical examination, electrocardiogram, and laboratory analyses), while patients' conditions and risk factors were re-assessed annually (with glycosylated hemoglobin measured quarterly (The DCCT Research Group, 1987)).
In 1994, 1394 subjects out of the original 1441 DCCT patients (97%) agreed to participate in a long-term follow-up, the EDIC study, whose main objective was to collect prospective data on the evolution of macrovascular and microvascular complications (Epidemiology of Diabetes Interventions and Complications (EDIC) Research Group, 1999). The EDIC followed the same methods as the DCCT, with only minor modifications in the schedule of the measurements of glycosylated hemoglobin (measured annually), fasting lipid levels and renal function (re-assessed every two years).
For our analyses we selected fifty-one clinical parameters measured at DCCT baseline (see Table 1 in the Supplementary Material). These clinical parameters were selected by a panel of clinical practitioners as the ones commonly used to date in the treatment of diabetes. The remaining parameters were either measured solely during the DCCT for research purposes or are no longer employed in clinical practice. This selection was performed in order to enhance the conformity of our results with the medical procedures followed in modern clinical settings.
2.2. Outcomes definition
We have defined seven different outcomes, each one corresponding to a severe diabetes-related complication or adverse event. Several studies (Nathan et al., 2005; The Diabetes Control and Complications Trial Research Group, 1993, 1995a, 1995b, 1995c, 1995d, 1997) have defined and studied similar diabetes-related complications on the DCCT/EDIC data. Whenever possible, we have adopted the same definitions suggested by these previous works.
2.2.1. Cardiovascular disease (CVD)
Following the work presented in (Nathan et al., 2005), we define CVD as the first occurrence of any of the following events: Cardiovascular death, Acute Myocardial Infarction, Bypass graft/Angioplasty, Angina Pectoris, Cardiac Arrhythmia, Major ECG abnormality, Silent Myocardial Infarction, Congestive Heart Failure, Transient Ischemic Attack, Arterial Event requiring surgery.
The relatively young age of the subjects included in the DCCT study led to a particularly low incidence of CVD events: only twenty-eight subjects (1.94%) experienced any macro or microvascular complications. One of the main objectives of the EDIC study was to record and study the incidence of CVD complications in the DCCT cohort after the end of the DCCT follow-up. We decided to define two distinct outcomes for cardiovascular diseases: the first one, hereafter named CVD-DCCT, takes into consideration the DCCT follow-up and includes only the CVD events that occurred during the DCCT study; the second outcome, namely CVD-EDIC, considers the combined follow-up period of both DCCT and EDIC and includes the CVD events that occurred during either study.
2.2.2. Hypoglycemia and ketoacidosis
The Hypoglycemia and Ketoacidosis outcomes were defined as any serious hypoglycemic and ketoacidosis event, respectively, requiring hospitalization, as reported by the patients in each quarterly visit.
2.2.3. Microalbuminuria and proteinuria
Microalbuminuria was defined as albumin/creatinine ratio (ACR) greater than or equal to 2.5 mg/mmol (men) or 3.5 mg/mmol (women) (The National Collaborating Centre for Chronic Conditions, 2008), or albumin concentration greater than or equal to 20 mg/l, while Proteinuria was identified by an albumin/creatinine ratio greater than or equal to 30 mg/mmol or albumin concentration greater than or equal to 200 mg/l.
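The thresholds above can be illustrated with a minimal Python sketch. The function name, its arguments and the rule that Proteinuria takes precedence over Microalbuminuria when both criteria are met are assumptions made for illustration; the text only states the thresholds themselves.

```python
def classify_albuminuria(acr_mg_mmol=None, albumin_mg_l=None, sex="male"):
    """Classify a urine measurement using the thresholds quoted in the text.

    Hypothetical helper: either measurement may be missing (None).
    Assumption: Proteinuria is checked first, so it overrides Microalbuminuria.
    """
    # Sex-specific ACR threshold for Microalbuminuria (mg/mmol).
    acr_micro = 2.5 if sex == "male" else 3.5

    def at_least(value, threshold):
        return value is not None and value >= threshold

    if at_least(acr_mg_mmol, 30.0) or at_least(albumin_mg_l, 200.0):
        return "proteinuria"
    if at_least(acr_mg_mmol, acr_micro) or at_least(albumin_mg_l, 20.0):
        return "microalbuminuria"
    return "normal"
```

For example, an ACR of 3.0 mg/mmol is classified as Microalbuminuria for a man but as normal for a woman, because of the sex-specific threshold.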
2.2.4. Neuropathy
The Neuropathy outcome was defined as the presence of abnormalities in the autonomic function. During the DCCT, Neuropathy was diagnosed on the basis of “physical examination and history confirmed by unequivocal abnormality of either nerve conduction or autonomic nervous system” (The Diabetes Control and Complications Trial Research Group, 1995d). In the CHC validation cohort we used an alternative definition based on the presence of dysfunctions in bowel/bladder or erectile dysfunction.
2.2.5. Retinopathy
The presence and severity of retinopathy were assessed in the DCCT study according to a scale derived from the Early Treatment Diabetic Retinopathy Study Scale (ETDRS) (see Tables 1–2 in The Diabetes Control and Complications Trial Research Group, 1995e). Currently, the UK Retinopathy Severity (UKRS) scale (The Royal College of Ophthalmologists, 2012) is usually employed in clinical practice in the UK. We translated the DCCT–ETDRS measurements into UKRS values, according to the conversion schema reported in Table 1.1 of the Diabetic Retinopathy Guidelines (The Royal College of Ophthalmologists, 2012) (see also Table 3 in the Supplementary Material). After the conversion, we adopted an approach similar to (The Diabetes Control and Complications Trial Research Group, 1995e) and we defined a “retinopathy event” as any worsening in the retina condition that lasted at least six months.
2.3. Derivation of the computational models and internal validation
The goals of our analyses are (a) identifying the best predictive signature for each outcome, (b) fitting a computational risk-assessment model over each signature and (c) assessing the predictive performances of these models. The presence of censoring in the DCCT/EDIC data requires the adoption of specialized methods for achieving these goals. “Censoring” in this context means that the information about the outcome can be partial; in particular, the data used in this work are affected by right-censoring, i.e., for some subjects the exact time-to-event is not known, and the only available information is that they were event-free up to a given point (follow-up time).
More formally, the baseline visit of the DCCT data can be represented as a dataset $D$ containing $m = 1441$ diabetes patients, where each patient is represented as a vector of measurements $x_i$ defined over a set of $n = 51$ risk factors $X = \{X_1, \ldots, X_j, \ldots, X_n\}$. Each outcome $k$ is represented by a tuple $O_k = \{(t_i, \delta_i)\}$, where $\delta_i$ is a binary variable indicating whether subject $i$ experienced the specific event ($\delta_i = 1$) or not ($\delta_i = 0$), while $t_i$ is the recorded time-to-event or follow-up time. The best signature and predictive model for each outcome are indicated as $X_k^* \subseteq X$ and $M_k^*$, respectively.
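As an illustration of this representation, the following minimal Python sketch encodes a right-censored outcome as a pair of follow-up time and event indicator. The class and function names are hypothetical and do not come from the original analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """One (t_i, delta_i) pair for a single subject and a single outcome."""
    time: float   # t_i: time-to-event if event == 1, otherwise follow-up time
    event: int    # delta_i: 1 if the event was observed, 0 if right-censored


def make_outcome(event_time: Optional[float], followup_time: float) -> Outcome:
    """Build the pair from a (possibly unobserved) event time.

    Assumption for illustration: an event counts as observed only if it
    happened within the follow-up period.
    """
    if event_time is not None and event_time <= followup_time:
        return Outcome(time=event_time, event=1)
    # Right-censored: we only know the subject was event-free up to follow-up.
    return Outcome(time=followup_time, event=0)
```

A subject who experienced the event after 3 years of a 6.5-year follow-up yields `Outcome(3.0, 1)`, while an event-free subject yields `Outcome(6.5, 0)`.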
Survival Max–Min Parent Children (SMMPC, (Lagani & Tsamardinos, 2010)), Lasso Cox Regression (Tibshirani, 1997), Bayesian Variable Selection (BVS, (Faraggi & Simon, 1998)), and Forward and Univariate Selection (Bøvelstad et al., 2007) were employed as feature selection methods for identifying the best performing signatures. These feature selection methods are based on different theoretical foundations and assumptions; however, they all attempt to identify a set $X^* \subseteq X$ that is highly predictive with respect to the outcome. Notably, while all methods try to keep $X^*$ parsimonious, only SMMPC provides theoretical guarantees about retrieving a minimal-size $X^*$ (Tsamardinos, Brown, & Aliferis, 2006).
Once a signature $X^*$ is identified, predictive models can be fitted over it. Cox regression (Cox, 1972), Ridge Cox regression (Van Houwelingen, Bruinsma, Hart, Van’t Veer, & Wessels, 2006), Accelerated Failure Time (AFT) models (Kalbfleisch & Prentice, 1980), Random Survival Forest (RSF (Ishwaran, Kogalur, Eugene, & Blackstone, 2008)) and Support Vector Machine Censored Regression (SVCR, (Shivaswamy, Chu, & Jansche, 2007)) were employed as regression algorithms for model fitting. All regression methods provide models that are able to calculate a single-point risk estimate for any new subject $x_{m+1}$, in the form $r_{m+1,k} = M_k^*(x_{m+1})$. These estimates can then be used for ranking patients according to their relative risk. In particular, for (Ridge) Cox Regression and AFT models the risk estimates are given by $r_i = \sum_j \beta_j x_{ij}$, where $\beta_j$ is the coefficient of the $j$-th risk factor provided by the regression procedure. SVCR and RSF predictions are given by weighted combinations of kernel-function products and single survival-tree predictions, respectively.
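The linear risk estimate used by the (Ridge) Cox and AFT models, and the ranking it induces, can be sketched as follows. This is only an illustrative example: the function names are hypothetical and the coefficient values in the usage note are invented, not taken from the fitted models.

```python
import numpy as np


def linear_risk(X, beta):
    """Risk estimates r_i = sum_j beta_j * x_ij for every row x_i of X."""
    return np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)


def rank_by_risk(risks):
    """Indices of the subjects ordered from highest to lowest estimated risk."""
    return np.argsort(-np.asarray(risks, dtype=float), kind="stable")
```

With the made-up inputs `X = [[1, 0], [0, 1], [1, 1]]` and `beta = [0.2, 0.1]`, the third subject receives the highest risk estimate and is ranked first, followed by the first and then the second subject.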
Each of these feature selection and regression algorithms requires the user to provide one or more “hyper-parameters”, i.e., parameters that are not directly estimated from the data and that must be specified a priori. For example, the hyper-parameter λ in the Lasso Cox Regression regulates the level of shrinkage for the coefficients and, indirectly, the number of variables to be included in the regression model. SVCR models require the specification of an appropriate kernel function and cost-parameter C. The hyper-parameters used for each method are listed in the Supplementary Material.
We employed a complex experimentation protocol in order to (a) find for each outcome the best combination of feature selection and regression algorithms, along with their respective optimal hyper-parameters (model selection) and (b) provide an unbiased assessment of the predictive performance of the selected model (internal validation/performance estimation). Model selection was performed through cross validation. In cross validation, the data are partitioned into $N$ separate folds, and each fold is in turn held out for performance estimation purposes (test set) while the rest of the data (training set) is employed for deriving predictive models. When $N$ is equal to the number of samples, the procedure is named leave-one-out. The configuration that obtains the best average performance over the $N$ folds is then applied on the whole set of data, in order to obtain the final predictive signature $X^*$ and the corresponding model $M^*$.
The predictive performances of the final models were assessed through nested-cross validation (Statnikov, Aliferis, Tsamardinos, Hardin, & Levy, 2005). Nested-cross validation is an extension of the common cross validation procedure, where an inner loop of cross validation is performed within each training set. The inner loop serves for selecting the best combination of algorithms and hyper-parameters, while the $N$ test sets of the outer cross validation are used exclusively for performance estimation. The procedure provides a vector $P = \{P_1, \ldots, P_N\}$ of estimated performances, whose average value $\bar{P}$ is typically taken as the single-point estimate. Notably, nested-cross validation estimates are usually conservative (Tsamardinos, Lagani, & Rakhshani, 2014). Figs. 1 and 2 in the Supplementary Material provide a visual representation of both procedures.
All performances are measured in terms of Concordance Index (CI (Uno, Cai, Pencina, D’Agostino, & Wei, 2011)). The CI metric is specific to right-censored survival data, and it can be interpreted as the probability that the model will correctly rank two randomly selected subjects in accordance with their actual risk of experiencing a given event. Similarly to the Area Under the Receiver Operating Characteristic Curve metric for binary classification problems (AUC (Fawcett, 2006)), a CI value equal to one indicates a perfect ranking in terms of relative risk, while a value of 0.5 indicates a random ordering.
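To make the interpretation of the CI concrete, the sketch below computes a simple pairwise concordance index for right-censored data. It is a Harrell-style estimator chosen for brevity, not the Uno et al. (2011) estimator cited in the text, and it makes two simplifying assumptions: pairs with tied times are ignored and tied risk estimates count as half-concordant.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs that the risk estimates order correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still event-free at that time (times[j] > times[i]).
    The pair is concordant when subject i has the higher risk estimate.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Risk estimates that decrease with survival time give a CI of one, the reversed ordering gives zero, and identical risk estimates for all subjects give 0.5.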
In both nested and standard cross validation the variables of each training set are standardized to have zero mean and unit standard deviation. Test sets are standardized according to the mean and standard deviation values of the corresponding training set. Moreover, categorical variables are transformed into sets of binary variables, one binary variable for each category. In this way the feature selection methods are free to include in each model only the categories that are relevant for the outcome at hand.
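The nested cross-validation scheme and the per-fold standardization described above can be sketched as follows. To keep the example short and self-contained, it uses deliberately simplified stand-ins that are not part of the original protocol: a ridge least-squares model instead of survival models, negative mean squared error instead of the CI, and a single hyper-parameter (`lam`) instead of the full grid of feature selection and regression configurations. All function names are hypothetical.

```python
import numpy as np


def kfold_indices(n, n_folds, rng):
    """Split range(n) into n_folds disjoint arrays of test indices."""
    return np.array_split(rng.permutation(n), n_folds)


def standardize(train, test):
    """Scale both sets with the mean and standard deviation of the training set."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    return (train - mu) / sd, (test - mu) / sd


def fit_ridge(X, y, lam):
    """Toy stand-in model: ridge least squares (NOT a survival model)."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ (y - y.mean()))
    return beta, y.mean()


def score(model, X, y):
    """Toy stand-in metric: negative mean squared error (higher is better)."""
    beta, intercept = model
    return -np.mean((X @ beta + intercept - y) ** 2)


def cv_score(X, y, lam, n_folds, rng):
    """Average held-out performance of one hyper-parameter configuration."""
    scores = []
    for test_idx in kfold_indices(len(y), n_folds, rng):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        Xtr, Xte = standardize(X[train], X[test_idx])
        scores.append(score(fit_ridge(Xtr, y[train], lam), Xte, y[test_idx]))
    return float(np.mean(scores))


def nested_cv(X, y, lambdas, n_folds=5, seed=0):
    """Return the vector P of outer-fold performances."""
    rng = np.random.default_rng(seed)
    outer_scores = []
    for test_idx in kfold_indices(len(y), n_folds, rng):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        Xtr_raw, ytr = X[train], y[train]
        # Inner loop: choose the configuration using the training data only.
        best = max(lambdas, key=lambda lam: cv_score(Xtr_raw, ytr, lam, n_folds, rng))
        # Outer test fold: used exclusively for performance estimation.
        Xtr, Xte = standardize(Xtr_raw, X[test_idx])
        outer_scores.append(score(fit_ridge(Xtr, ytr, best), Xte, y[test_idx]))
    return np.array(outer_scores)
```

The mean of the returned vector plays the role of the single-point estimate. The outer test folds never influence the choice of `lam`, and the test data are always scaled with training-set statistics, which is what keeps the estimate from being optimistically biased.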
2.4. External validation
Validation data were retrospectively collected from 393 diabetes patients who were admitted to the CHC premises between 2004 and 2014. Forty-nine patients (12.5%) had Type I diabetes, while the remaining ones were diagnosed with Type II diabetes. For each patient and for each outcome we considered the first visit where the risk factors included in the corresponding predictive signature were measured. Patients who had already developed a specific complication at the time of the first visit were not employed for the validation of the respective predictive model. Missing values were replaced with the average or mode values of the respective predictors, as calculated on the DCCT baseline data. The data collection procedure produced seven distinct datasets, one per outcome, with a number of included subjects ranging between 274 and 343 and with an average follow-up between 37.6 and 69.4 months. Table 2 in the Supplementary Material describes the distribution of the validation data and compares it with the DCCT cohort.
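The imputation rule described above (fill values computed on the reference data, then applied to the validation data) can be sketched in a few lines of Python. The record layout (one dictionary per patient, with `None` marking a missing value) and the function names are assumptions made for illustration.

```python
from statistics import mean, mode


def reference_fill_values(reference_rows, numeric_keys, categorical_keys):
    """Compute fill values on the reference (e.g., DCCT baseline) data only."""
    fills = {}
    for key in numeric_keys:
        fills[key] = mean(r[key] for r in reference_rows if r[key] is not None)
    for key in categorical_keys:
        fills[key] = mode(r[key] for r in reference_rows if r[key] is not None)
    return fills


def impute(row, fills):
    """Replace missing entries of a validation record with the reference values."""
    return {k: (fills[k] if v is None and k in fills else v) for k, v in row.items()}
```

Using fill values from the reference data means that an imputed validation patient looks like an "average" training patient for that predictor, so the imputation does not depend on the composition of the validation cohort.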
3. Results
3.1. The predictive signatures and their interplay
The final risk assessment models are reported in Table 1. Each model is composed of a number of risk factors ranging from five to ten, for a total of twenty-five risk factors included in at least one model. For each outcome a different regression algorithm was chosen by the model selection procedure: Ridge Cox Regression for the CVD-DCCT, Ketoacidosis and Proteinuria outcomes; Accelerated Failure Time models for CVD-EDIC, Neuropathy and Retinopathy; and linear-kernel Support Vector Machines and Random Survival Forest for Hypoglycemia and Microalbuminuria, respectively. The corresponding optimal feature selection methods are reported in Supplementary Table 4.
Each regression algorithm produces coefficients with a specific interpretation. In particular, Ridge Cox Regression coefficients represent a hazard ratio change on the logarithmic scale. This means that for a one standard-deviation increase (i.e., 1.594%, DCCT scale) in glycated hemoglobin (HbA1c) the hazard of a CVD complication becomes $e^{0.204} \approx 1.23$ times higher. AFT and linear-kernel SVCR coefficients act as linear multipliers for the expected time to event. This means that for the same increase in HbA1c the expected time before developing Neuropathy decreases by 4.812 months. RSF usually provides highly non-linear models, where the effect of each
Table 1
Risk assessment models.
Clinical parameters CVD-DCCT (Cox
Regression)
CVD-EDIC (Accelerated Failure Model)
Hypoglycemia (Support Vector Machine)
Ketoacidosis (Ridge Cox Regression)
Microalbuminuria (Random Survival Forest)
Proteinuria (Ridge Cox Regression)
Neuropathy (Accelerated Failure Model)
Retinopathy (Accelerated Failure Model)
# Models
Marital Status −0.146
(Never Married) 0.095 (Divorced)
−80.667 (Married)
5.24E-006 (Widowed)
|0.043|
(Married)
−0.26 (Married)
5
Albumin-urine
value (mg/24 h)
Insulin Regime
(Strict/Standard
control)
380.5 (Strict) |0.054| (Strict) −0.036
(Strict)
3
Retinopathy level
(R0, R1, R2, R3)
|0.091| (R2) 16.113 (R2) 1.437 (R0)
−0.931 (R2)
3
Total Insulin Daily
Dosage (Units/
Weight)
Post Pubescent
diabetes duration
(in months)
Total diabetes
duration (in
months)
Presence of
neuropathy
Patient’s occupation 0.056 (Manager)
0.074 (Clerical) 0.031 (Laborer)
−0.088 (Student)
−2.61E-005 (Manager)
2
Smoke
(never/ex-smoker/current)
−0.114 (Never) 0.128 (Current)
−431.822 (ex-smoker)
2
Patient's body mass index (kg/m²)
0.096 1
Patient attempted
suicide
Creatinine
Clearance (ml/min)
Family history of
IDDM
Family History
of NIDDM
HDL serum
cholesterol (mg/dl)
Systolic Blood
Pressure
Past history of
severe
hypoglycemia
Glomerular filtration rate (ml/min)
Gender specific
ideal body weight
Hospitalization(s)
due to ketoacidosis
in past year
Each row represents a risk factor, while each column reports a single model. The header shows the outcome of interest for each model along with the regression algorithm selected by the model-selection procedure (see the Methods section). Cells report model coefficients, with empty cells indicating risk factors not included in the corresponding model. Categorical risk factors can have multiple coefficients, one for each category included in the model. The semantics of the coefficients depends on the regression algorithm used: log-hazard ratio for Ridge Cox Regression, survival time multipliers for Accelerated Failure Time models and (linear kernel) Support Vector Machines, relative variable importance for Random Survival Forest (see text for more details). The signs of the original AFT and SVCR coefficients have been switched so that positive values indicate an increase in risk in all models. Microalbuminuria coefficients are reported as absolute values whose signs do not reflect an increase or decrease of the risk.
single risk factor can vary depending on the values of the other predictors. Consequently, covariates in an RSF model do not have a univocal coefficient, i.e., it is not generally possible to assess whether a factor has a protective or deleterious effect. However, a method has been developed for estimating Variable IMPortance (VIMP) in the RSF models, where the VIMP is proportional to the contribution of the variable to the predictive performance of the model (Ishwaran, 2007). The VIMP values for the Microalbuminuria model in Table 1 have been scaled in order to sum up to one, for ease of comparison.
Given these different interpretations, it is not possible to compare effect sizes across different models. However, within each model the absolute value of each coefficient is directly proportional to the effect size of the corresponding predictor, and can be used for ranking the factors against each other. We further set the signs of all coefficients such that positive values indicate an increment in risk while negative values indicate a decrease (except for the VIMP values of the RSF model, which are reported in absolute value).
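The two coefficient interpretations discussed in this section can be reproduced with a short Python sketch. The log-hazard coefficient 0.204 is the HbA1c value quoted in the text; the function names and the raw importance values in the usage note are hypothetical.

```python
import math


def hazard_ratio(log_hr_coefficient, sd_units=1.0):
    """Multiplicative change in hazard for an increase of sd_units standard deviations."""
    return math.exp(log_hr_coefficient * sd_units)


def scale_to_unit_sum(values):
    """Rescale absolute importance values so that they sum to one."""
    total = sum(abs(v) for v in values)
    return [abs(v) / total for v in values]


print(round(hazard_ratio(0.204), 2))  # 1.23
```

For instance, made-up raw importances `[2, 1, 1]` are rescaled to `[0.5, 0.25, 0.25]`, which makes the relative contribution of each variable easier to compare within the RSF model.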
3.2. Internal and external validation
Table 2 reports and contrasts the results of both internal and external validation. For the internal validation, we report the average CI values obtained on the DCCT data through the nested-cross validation procedure. These values represent our expectations on the performances that the models should achieve when applied on a validation cohort coming from the same population as the training data, i.e., a hypothetical validation cohort collected in the same years, in the same geographical area and with characteristics similar to those of the DCCT data (Tsamardinos et al., 2014). The models' results lie in the range [0.6204–0.8330], meaning that we expect all models to provide a relevant improvement with respect to a random ranking of the patients (CI = 0.5). For all models, the CI is statistically significantly greater than 0.5 (p-value ≤ 0.001, as calculated with a one-tail t-test). For each model, we also report the interval spanned by the CI values calculated over the outer folds of the nested-cross validation procedure.
For the external validation, the final models were separately applied on the Type I and Type II diabetes patients of the Chorleywood cohort. The resulting CI values estimate the predictive ability of the models on a UK-based population collected in recent times. Interestingly, the models perform surprisingly well, with several of them reaching performances statistically significantly different from random guessing. For each model we also report the bootstrapped estimates (Efron & Tibshirani, 1986) of the 95% confidence interval and a permutation-based p-value assessing the null hypothesis H0: CI ≤ 0.5. These permutation p-values are obtained by comparing the observed CI value with the null distribution obtained by randomly permuting the order of the predictions 10,000 times.
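The permutation test and the bootstrap interval described above can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the CI is a simple Harrell-style estimator (not the Uno et al. estimator), the p-value uses an add-one correction, and the confidence interval is a plain percentile bootstrap. Function names and default arguments are hypothetical.

```python
import numpy as np


def concordance_index(times, events, risks):
    """Pairwise concordance for right-censored data (ties in risk count 0.5)."""
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=bool)
    r = np.asarray(risks, dtype=float)
    # Pair (i, j) is comparable if i had an event and j outlived i.
    comparable = e[:, None] & (t[None, :] > t[:, None])
    n_pairs = comparable.sum()
    if n_pairs == 0:
        return float("nan")
    diff = r[:, None] - r[None, :]
    pair_score = (diff > 0) + 0.5 * (diff == 0)
    return float((pair_score * comparable).sum() / n_pairs)


def permutation_p_value(times, events, risks, n_perm=10000, seed=0):
    """One-sided p-value for H0: CI <= 0.5, obtained by shuffling the predictions."""
    rng = np.random.default_rng(seed)
    r = np.asarray(risks, dtype=float)
    observed = concordance_index(times, events, r)
    null = np.array([concordance_index(times, events, rng.permutation(r))
                     for _ in range(n_perm)])
    # Add-one correction (an assumption) keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


def bootstrap_interval(times, events, risks, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the CI."""
    t, e, r = (np.asarray(a) for a in (times, events, risks))
    rng = np.random.default_rng(seed)
    n = len(t)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        stats.append(concordance_index(t[idx], e[idx], r[idx]))
    lo, hi = np.nanpercentile(stats, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return float(lo), float(hi)
```

With very few events, as in some of the external cohorts, bootstrap resamples may contain few comparable pairs, which is one reason why the resulting intervals are wide.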
For Type I diabetes, several models manage to achieve a relevant and statistically significant predictive performance, particularly the Microalbuminuria, Neuropathy and Retinopathy models. The CVD-DCCT and Hypoglycemia models are also borderline significant. The validation cohorts for the remaining models contain fewer than 5 events, and the respective results should be considered carefully.
The external validation on Type II patients brought positive results as well. In particular, both CVD models, as well as the Microalbuminuria and Proteinuria models, achieve statistically significant results on a relatively large number of events. The results of the Hypoglycemia model are barely significant, but it is interesting to note that this model achieves almost identical results in both Type I and II external cohorts. The Neuropathy and Retinopathy models did not prove to be better than random, and the Ketoacidosis model was not applicable to Type II diabetes patients.
3.3. Calibration and re-assessment of the risk models
The effect of risk factors on the probability of developing diabetes-related complications may differ across geographical areas or over time, for several reasons. For example, the association between a given risk factor and the outcome may be (partially) mediated by a third, unknown and unmeasured quantity. If the value of this third quantity changes across different places, or over time, then the association between the risk factor and the outcome also changes or even ceases. It is worthwhile to underline that the DCCT and Chorleywood cohorts were collected in different countries, and the DCCT data collection started in 1983, while the earliest recorded visit in Chorleywood was performed in 2004 (>20 years difference). Moreover, treatment options for diabetes patients (Franz et al., 2003; Gallen, 2004) and nutritional habits (Kuklina, Carrol, Shaw, & Hirsch, 2013) have demonstrably undergone considerable changes during this period. This implies that the models derived from the DCCT data may need to be re-calibrated or revised in order to provide accurate predictions on the Chorleywood cohorts, since the effects of the risk factors may differ between the two populations.
We follow the approach suggested by Van Houwelingen (2000) for assessing the calibration of the single-point risk estimates $r$ against a known outcome $O = \{(\delta_i, t_i)\}$. The approach consists in fitting a Cox regression model $h(t|r) = h_0(t)\exp(\alpha \cdot r)$, where $h(t|r)$ is the hazard at time $t$ given $r$, $h_0$ is the baseline hazard function, and $\alpha$ is the single coefficient of the model. A perfectly calibrated model would produce
Table 2
Results of the internal and external validation of the models.

Type I Diabetes, Internal Validation
Model name         # Events   Average CI   Cross-Validation CI Interval   p-value (H0: Aver. CI ≤ 0.5)
CVD-DCCT           28         0.7257       [0.50962–0.8629]               0.0001
CVD-EDIC           127        0.6204       [0.5549–0.69224]               ≤0.0001
Hypoglycemia       408        0.6694       [0.58766–0.75118]              ≤0.0001
Ketoacidosis       130        0.6745       [0.59412–0.75479]              ≤0.0001
Microalbuminuria   299        0.7421       [0.6751–0.77652]               ≤0.0001
Proteinuria        44         0.8330       [0.53521–0.96223]              ≤0.0001
Neuropathy         149        0.6661       [0.54626–0.74187]              ≤0.0001
Retinopathy        969        0.6564       [0.60826–0.6745]               ≤0.0001

Type I Diabetes, External Validation
Model name         # Events   CI       CI 95% Confidence Interval   p-value (H0: CI ≤ 0.5)
CVD-DCCT           5          0.6887   [0.4923–0.86207]             0.0932
CVD-EDIC           5          0.4862   [0.18084–0.81984]            0.5246
Hypoglycemia       8          0.6903   [0.5–0.8691]                 0.0584
Ketoacidosis       3          0.8182   [0.23077–1]                  0.0367
Microalbuminuria   6          0.824    [0.66234–0.96875]            0.0078
Proteinuria        0          –        –                            –
Neuropathy         6          0.735    [0.55102–0.90754]            0.0429
Retinopathy        17         0.7201   [0.58669–0.8745]             0.0025

Type II Diabetes, External Validation
Model name         # Events   CI       CI 95% Confidence Interval   p-value (H0: CI ≤ 0.5)
CVD-DCCT           32         0.7143   [0.62384–0.80563]            <0.0001
CVD-EDIC           33         0.6099   [0.50211–0.71809]            0.0165
Hypoglycemia       5          0.7002   [0.19012–0.97115]            0.0084
Ketoacidosis       –          –        –                            –
Microalbuminuria   116        0.5701   [0.52144–0.62193]            0.0058
Proteinuria        28         0.6569   [0.53261–0.77125]            0.0027
Neuropathy         20         0.4359   [0.32132–0.56216]            0.8239
Retinopathy        70         0.5451   [0.47399–0.6189]             0.119

External validation was separately performed on Type I and Type II diabetes patients, while internal validation was performed only on Type I patients (as the DCCT study focused exclusively on Type I diabetes). For the internal validation and for each model (rows) we report the total number of events, the predictive performance expressed as nested-cross validated Concordance Index (CI), the interval spanned by the CI values obtained in the external loop of the nested-cross validation, and a p-value assessing the null hypothesis that the CI is less than or equal to 0.5, i.e., that the risk stratification provided by the model is not better than random. For the external validations we report the CI values obtained by applying the final models on the external cohorts, along with the 95% confidence interval estimated through bootstrapping. The p-values for the internal evaluation are calculated through a one-tail t-test, while for the external evaluation they are obtained through a permutation-based test (see text for more detail).
α ≈ 1, while higher or lower values would indicate an
under-estimation or over-estimation of the actual risk, respectively.
Table 3 shows the calibration Cox regression coefficients for each
outcome. The best-calibrated models appear to be the ones
corresponding to CVD-DCCT, Proteinuria and Retinopathy (the latter on the
Type I cohort only), while all the other models seem to provide
predictions that are somewhat overly optimistic or pessimistic. These
results suggest that the models should be revised and re-evaluated on
the new data in order to provide more accurate predictions. We thus
decided to re-fit the coefficients of the models on the external cohorts
and to assess the predictive performance of the revised models
through cross-validation. Specifically, for each outcome and external
cohort we performed a ten-fold cross-validation using the same
signature, regression method and hyper-parameter configuration
selected on the DCCT/EDIC data. For outcomes with fewer than 10 recorded
events we employed a leave-one-out cross-validation scheme, and
the performance was calculated on all predictions pooled together.
The adoption of this revision procedure implies the assumption that
the signatures selected on the data from the DCCT baseline visits
retain valuable predictive power for the Chorleywood cohort as well.
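The calibration check described at the start of this passage can be illustrated with a short Python sketch that fits a Cox regression with the risk score as its only covariate and returns the coefficient α. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `cox_calibration_slope`, the plain Newton-Raphson solver, the absence of tied event times, and the toy data are all introduced here for illustration.

```python
import math

def cox_calibration_slope(times, events, risk_scores, max_iter=50, tol=1e-9):
    """Fit h(t) = h0(t) * exp(alpha * r) by Newton-Raphson on the Cox
    partial likelihood and return alpha (assumes no tied event times)."""
    alpha = 0.0
    for _ in range(max_iter):
        gradient, information = 0.0, 0.0
        for t_i, e_i, r_i in zip(times, events, risk_scores):
            if not e_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, r_j in zip(times, risk_scores):
                if t_j >= t_i:  # subject j is still at risk at time t_i
                    w = math.exp(alpha * r_j)
                    s0 += w
                    s1 += w * r_j
                    s2 += w * r_j * r_j
            mean = s1 / s0
            gradient += r_i - mean                   # score contribution
            information += s2 / s0 - mean * mean     # weighted variance
        if information <= 0.0:
            break
        step = gradient / information
        alpha += step
        if abs(step) < tol:
            break
    return alpha

# Hypothetical toy cohort: follow-up times, event indicators (1 = event,
# 0 = censored) and risk scores produced by some previously fitted model.
times = [2, 3, 5, 7, 8, 11, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 0]
risk_scores = [1.2, 0.4, 0.9, 0.8, -0.1, 0.3, -0.5, -0.9]

alpha = cox_calibration_slope(times, events, risk_scores)
```

Doubling every risk score halves the fitted coefficient, which is the sense in which a value of α below one signals risk scores that are more extreme than the data support.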
The results of model revision are reported in Table 3. All the models
showed at least a slight improvement in terms of average CI, except for
the CVD-DCCT and Hypoglycemia models in the Type II diabetes cohort
and for Neuropathy in the Type I cohort. Some models achieve a perfect
score (CI = 1), although the limited number of events available for
these outcomes means that these results should be interpreted with caution.
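The evaluation behind these revised results can be sketched in Python as follows. The sketch computes a concordance index on held-out predictions pooled across folds and switches to leave-one-out when fewer than 10 events are available. It is illustrative only: `concordance_index`, `pooled_cv_ci`, the `fit` callable and the toy data are hypothetical names and values, folds are formed by simple interleaving rather than by any procedure stated in the paper, and predictions are pooled in both settings for brevity, whereas the paper reports fold-averaged CI values for the ten-fold case.

```python
def concordance_index(times, events, scores):
    """Fraction of comparable pairs in which the subject who fails
    earlier has the higher risk score (score ties count one half)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")

def pooled_cv_ci(times, events, features, fit, n_folds=10, min_events=10):
    """Cross-validate `fit` and score all held-out predictions together."""
    n = len(times)
    if sum(events) < min_events:
        n_folds = n  # leave-one-out for outcomes with few events
    n_folds = min(n_folds, n)
    pooled = [0.0] * n
    for k in range(n_folds):
        test_idx = list(range(k, n, n_folds))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        # `fit` is a placeholder: it receives training data and must
        # return a function mapping one feature vector to a risk score.
        model = fit([features[i] for i in train_idx],
                    [times[i] for i in train_idx],
                    [events[i] for i in train_idx])
        for i in test_idx:
            pooled[i] = model(features[i])
    return concordance_index(times, events, pooled)

# Toy data and a trivial stand-in "model" that returns the first feature.
times = [2, 4, 6, 8, 10, 12]
events = [1, 1, 0, 1, 0, 0]
features = [[0.9], [0.7], [0.8], [0.3], [0.2], [0.1]]
first_feature_fit = lambda X, t, e: (lambda x: x[0])

ci = pooled_cv_ci(times, events, features, first_feature_fit)
```

With only three events, the toy cohort yields just eleven comparable pairs, so a single re-ordered pair moves the CI by about 0.09. This is one reason why CI values obtained on few events deserve cautious interpretation.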
4 Discussion
4.1 Main findings
The main contribution of the present work consists of the
derivation of a set of computational models for assessing the risk of
developing diabetes-related complications. The models have been
derived on the basis of the baseline-visit data of the DCCT study and of
the DCCT/EDIC follow-up information. Furthermore, the derivation of
the models led to the identification of the minimal-size, maximally
predictive set of features for each considered outcome, out of an initial
set of fifty-one clinical parameters measured in the DCCT baseline
visit. Table 3 reports the clinical parameters included in each risk
assessment model, along with their respective coefficients. Negative
coefficients indicate protective factors, while factors with positive
coefficients are associated with increasing risk.
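The sign convention above can be made concrete with a small Python sketch. It assumes a log-linear (Cox-type) model, in which a coefficient β multiplies the hazard by exp(β) per unit increase of the predictor. Whether every model in the paper has this form is not confirmed by this section, and the coefficient values below are hypothetical, chosen only to show the two directions of effect.

```python
import math

def hazard_ratio(coefficient, delta=1.0):
    """Multiplicative change in hazard for a `delta` increase in one
    predictor, all other predictors held fixed (log-linear model assumed)."""
    return math.exp(coefficient * delta)

# Hypothetical coefficients, not taken from the paper's tables.
example_coefficients = {"risk_factor": 0.30, "protective_factor": -0.25}
ratios = {name: hazard_ratio(beta) for name, beta in example_coefficients.items()}
```

A ratio above one corresponds to a risk factor and a ratio below one to a protective factor, while a coefficient of zero leaves the hazard unchanged.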
The level of glycated hemoglobin (HbA1c) proved to be the most relevant risk factor, being included in seven models out of eight. In particular, high values of HbA1c are associated with increased risk of developing diabetes-related complications. This is in line with the current literature (Huang, Liu, Moffet, John, & Karter, 2011; Marcovecchio, Dalton, Chiarelli, & Dunger, 2011; Weber & Schnell, 2009) and in particular with the previous studies on the DCCT cohort (The Diabetes Control and Complications Trial Research Group, 1996).
Our analyses also point out the relevance of marital status for predicting the probability of developing diabetes-related complications and adverse events. Being married is associated with a lower risk of experiencing hypoglycemia or retinopathy worsening. The presence of a spouse is known to have a beneficial effect in different pathologies (Chung, Moser, Lennie, & Riegel, 2006; Goodwin, Hunt, Key, & Samet, 1987; Sugarman, Bauer, Barber, Hayes, & Hughes, 1993), and a recent work has demonstrated that, in heart failure patients, this beneficial effect is mediated by medication adherence (Wu et al., 2014). Thus, a possible explanation for our results is that being married increases adherence to medication or diet, and this in turn improves the patient’s prognosis. For the CVD and Ketoacidosis models, being divorced or widowed, respectively, increases the risk of experiencing an adverse event. In this case the marital status may act as a proxy for the patient’s age, since both the divorced and the widowed DCCT sub-cohorts are characterized by an older age than the rest.
The baseline value of the urine-albumin excretion rate turns out to be predictive of renal complications (i.e., Microalbuminuria and Proteinuria), a result already known in the medical literature (Newman et al., 2005), and of the development of cardiovascular diseases and Neuropathy.
The CVD-DCCT and CVD-EDIC models are in agreement with the CVD risk factors previously identified on the DCCT/EDIC data; in particular, all elements in the signature of the DCCT-EDIC model are listed among the clinical characteristics at DCCT baseline that were significantly associated with cardiovascular disease over the course of the DCCT/EDIC Study (Nathan et al., 2005).
The predictive signatures of both CVD-DCCT and CVD-EDIC models closely resemble the results of different studies focusing on identifying relevant risk factors for cardiovascular complications in diabetes patients. In particular, our results are in good agreement with the results of the UK Prospective Diabetes Study (UKPDS).
The UKPDS was a landmark randomized controlled trial, conducted over a period of 14 years (1977–1991), which involved 5102 patients followed, on average, for a period of 10.7 years. The study showed that strict control of blood glucose and blood pressure can lower the risk of diabetes-related complications in individuals recently diagnosed with Type II diabetes (Turner & Holman, 1996). Several risk assessment models were developed on the basis of the UKPDS data. The first UKPDS model (Stevens, Kothari, Adler, & Stratton,
Table 3
Results of models’ recalibration and re-assessment.
Columns (left to right): Model name | Type I diabetes revised models: # Events | Calibration α | Average CI | Cross-validation CI interval | p-value (H0: average CI ≤ 0.5) | Type II diabetes revised models: # Events | Calibration α | Average CI | Cross-validation CI interval | p-value (H0: average CI ≤ 0.5)
CVD-DCCT | 5 | 0.4989 | 1 | – | < 0.0001 | 32 | 1.2238 | 0.6757 | [0.54386–0.92683] | 0.003
CVD-EDIC | 5 | −0.0011 | 0.6422 | – | 0.2712 | 33 | 0.0013 | 0.6621 | [0.54348–0.83871] | 0.0002
Microalbuminuria | 6 | 0.0975 | 1 | – | < 0.0001 | 116 | 0.0368 | 0.5932 | [0.41146–0.67516] | 0.0023
Neuropathy | 6 | 0.0096 | 0.6496 | – | 0.2436 | 20 | −0.0036 | 0.5285 | [0.076923–0.87234] | 0.3833
Retinopathy | 17 | 0.6625 | 0.7521 | [0.5–1] | 0.0039 | 70 | 0.0822 | 0.5664 | [0.42063–0.76238] | 0.0381
For each outcome and external cohort, the calibration of the corresponding model is assessed (a) by applying the model on the external cohort and (b) by using the resulting vector of risk scores r_i as a predictor in a Cox regression. Cox coefficients close to one indicate well calibrated models. The predictive capabilities of the selected signatures are then re-assessed using only the external cohort data. Specifically, for each outcome and external cohort the predictive performance of the selected signature, regression method and hyper-parameter configuration is assessed through ten-fold cross-validation. For each model and each cohort the number of events, the calibration Cox regression coefficient α, and the cross-validated CI value along with its corresponding interval over the cross-validation folds are reported. The statistical significance of the CI values is assessed through a one-tailed t-test. Outcomes with fewer than 10 recorded events were evaluated with a leave-one-out cross-validation scheme, which allows better performance estimation.
2001) included Age, Gender, Race, Smoking, HbA1c, Systolic Blood
Pressure and Total Cholesterol/HDL Cholesterol ratio as predictors, and
focused on assessing the probability of developing Coronary Heart
Disease (CHD). The second version of the model (Clarke et al., 2004)
provides seven different mathematical equations for predicting as many
diabetes-related complications (stroke, heart failure, fatal or non-fatal
MI, other IHD, amputation, renal failure and blindness) and three
different equations for assessing the risk of mortality. This second model
is based on the same predictors as the first one, but it also includes
information about the patients’ medical history (previous occurrences
of diabetes-related adverse events) and physiology (Body Mass Index,
BMI). The latest version of the UKPDS model was published recently
(Hayes, Leal, Gray, Holman, & Clarke, 2013), and it slightly modifies
the previous versions by including information about micro- or
macro-albuminuria, estimated GFR, heart rate, white blood cell count
and hemoglobin.
Interestingly, both our CVD models include a subset of UKPDS
predictors, namely Age, Smoking and HbA1c. The CVD-DCCT model
also includes Systolic Blood Pressure and Weight, both considered in
the latest version of the UKPDS engine. Moreover, our CVD models
and the UKPDS models are fully in agreement regarding the direction
of effect of the common predictors, i.e., all common predictors act as
risk factors, and never as protective factors.
The Hypoglycemia model suggests that being married and having
a family history of non-insulin dependent diabetes have a protective
effect against hypoglycemic events, while a past history of severe
hypoglycemia, strict glucose control and an elevated number of
insulin units per kg of weight significantly increase the patient’s risk. It is
worthwhile to note that the negative effect of strict glucose control on
the probability of experiencing hypoglycemia was one of the main
outcomes of the DCCT study. In particular, strict glucose control is
known to lower the risk of several diabetes-related complications
except hypoglycemia (The Diabetes Control and Complications Trial
Research Group, 1995a).
The Ketoacidosis model includes several factors, the most relevant
ones being (according to the magnitude of their respective coefficients)
HbA1c, Total Insulin Dosage, Post-Pubescent diabetes duration,
Cholesterol, Hospitalization(s) due to ketoacidosis in past year (risk factors)
and Gender specific ideal body weight (protective factor). To the best of
our knowledge this is the first study providing a predictive model for
assessing the risk of experiencing ketoacidosis. Studies investigating the
association of clinical parameters with ketoacidosis exist (Egger, Davey
Smith, Stettler, & Diem, 1997); however, they do not provide quantitative
models for the estimation of the risk of ketoacidosis. These studies
generally point out that an intensified treatment is associated with the
probability of experiencing ketoacidosis, which is in agreement with
our results.
The two models related to renal complications (Microalbuminuria
and Proteinuria) share several predictive factors, whose relevance in the
development of renal complications in diabetes patients is already
known in the literature and was even assessed on the DCCT data
(Lopes-Virella et al., 2013): HbA1c (The Diabetes Control and
Complications Trial Research Group, 1996), Albumin-urine value over
24 h (Newman et al., 2005), Insulin Regime (The Diabetes Control and
Complications Trial Research Group, 1995c) and Total diabetes duration.
A recent study (Vergouwe et al., 2010) conducted on 1115 Type I
diabetes patients also confirms the relevance of HbA1c and
Albumin-urine value for predicting the progression of microalbuminuria, while
another study (Elley et al., 2013) conducted on a large New Zealand
cohort (25,736 Type II diabetes patients) and focusing on End-Stage
Renal Disease (ESRD) also identifies HbA1c and Total diabetes duration
as relevant risk factors.
The Neuropathy and Retinopathy models also share part of their
predictors, particularly HbA1c, the Retinopathy level at baseline, and
Post-pubescent diabetes duration, all factors that were found to be
associated with low peripheral nerve conduction (an indicator of
neuropathy) in a study of 456 Type I diabetes individuals (Charles et al., 2010). The association between HbA1c and Retinopathy progression has already been studied and established (The Diabetes Control and Complications Trial Research Group, 1995b).
One further relevant contribution of our study is the validation of the models on the retrospective cohort collected at the Chorleywood Health Center. For the Type I diabetes external cohort, four models out of seven achieved statistically significant (p-value < 0.05) results, while two models (CVD-DCCT and Hypoglycemia) achieved appreciable CI performance (0.6887 and 0.6903, respectively) with borderline statistical significance. In the case of the Type II diabetes cohort, five models out of seven achieved results statistically significantly better than random guessing (CI > 0.5).
Models’ transferability generally increases when the models are re-calibrated on the new data while the original predictive factors are conserved. All revised models perform better in terms of CI than the original models, with the exception of Neuropathy for the Type I cohort and of CVD-DCCT and Hypoglycemia for the Type II cohort. However, we note that for these models the revised CI values are within the 95% confidence interval of the CI results of the original models. In general, these results support our hypothesis that the predictive signatures selected on the DCCT/EDIC data are able to give accurate predictions on the cohorts collected in Chorleywood.
4.2 Study limitations
The first relevant limitation of this study is the relatively restricted number of subjects and adverse events in the external validation cohorts. In some cases the scarcity of recorded events did not allow a precise estimation of the models’ performance and of the respective confidence intervals. Thus, our results only suggest that our models successfully transfer across populations, and more extensive studies on larger cohorts of Type I and Type II diabetes patients are needed in order to gather further evidence.
One more limitation concerns the Hypoglycemia and Ketoacidosis models. Accurately evaluating the probability of experiencing these adverse events would require some short-term information about nutrition and physical activity, which is not present in the list of considered predictors. Despite this limitation, both models achieve a good level of predictive performance in both the internal and external validation.
5 Conclusions
We use the DCCT/EDIC data for deriving a set of computational models for assessing the risk of developing diabetes-related complications in diabetes patients. Each model is defined over a parsimonious set of predictors (clinical parameters) with maximal predictive power for its specific outcome. Predictors included in the models are generally in agreement with the current literature regarding risk factors for diabetes-related complications. When applied on a retrospective validation cohort collected in the UK, the models often provide predictions that are significantly better than random, supporting the hypothesis that the models transfer to a population that is geographically distant from, and more recent than, the one originally examined in the DCCT/EDIC studies. Future work will focus on the validation of the models on larger cohorts of diabetes patients, both Type I and Type II, in order to further strengthen the results presented here.
Acknowledgements
This work was performed in the framework of the FP7 Integrated Project REACTION (Remote Accessibility to Diabetes Management and Therapy in Operational Healthcare Networks), partially funded by the European Commission under Grant Agreement 248590.
The work was also partially funded by the EPILOGEAS GSRT ARISTEIA II project, No 3446.
The Diabetes Control and Complications Trial (DCCT) and its
follow-up, the Epidemiology of Diabetes Interventions and
Complications (EDIC) study, were conducted by the DCCT/EDIC Research Group
and supported by National Institute of Health grants and contracts and
by the General Clinical Research Center Program, NCRR. The data (and
samples) from the DCCT/EDIC study were supplied by the NIDDK
Central Repositories. This manuscript was not prepared under the
auspices of the DCCT/EDIC study and does not represent analyses or
conclusions of the DCCT/EDIC study group, the NIDDK Central
Repositories, or the NIH.
The authors would also like to thank the medical and technical
personnel of the Chorleywood Health Center for their
indispensable assistance.
Appendix A Supplementary data
Supplementary data and methods to this article can be found
online at http://dx.doi.org/10.1016/j.jdiacomp.2015.03.001
References
Ajmera, I., Swat, M., Laibe, C., Le Novère, N., & Chelliah, V. (2013). The impact of mathematical modeling on the understanding of diabetes and related complications. CPT: Pharmacometrics & Systems Pharmacology, 2, e54 (Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3731829&tool=pmcentrez&rendertype=abstract and http://www.scopus.com/inward/record.url?eid=2-s2.0-84881162079&partnerID=tZOtx3y1 ).
Bøvelstad, H. M., Nygård, S., Størvold, H. L., Aldrin, M., Borgan, Ø., Frigessi, A., et al. (2007). Predicting survival from microarray data – A comparative study. Bioinformatics, 23, 2080–2087.
Charles, M., Soedamah-Muthu, S. S., Tesfaye, S., Fuller, J. H., Arezzo, J. C., Chaturvedi, N., et al. (2010). Low peripheral nerve conduction velocities and amplitudes are strongly related to diabetic microvascular complications in type 1 diabetes: The EURODIAB Prospective Complications Study. Diabetes Care, 33(12), 2648–2653 ([cited 2014 Nov 11] Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2992206&tool=pmcentrez&rendertype=abstract ).
Chung, M. L., Moser, D. K., Lennie, T. A., & Riegel, B. (2006). Abstract 2509: Spouses enhance medication adherence in patients with heart failure. Circulation, 114(18_MeetingAbstracts), II_518 ([cited 2014 Nov 9] Available from: http://circ.ahajournals.org/cgi/content/meeting_abstract/114/18_MeetingAbstracts/II_518 ).
Clarke, P. M., Gray, A. M., Briggs, A., Farmer, A. J., Fenn, P., Stevens, R. J., et al. (2004). A model to estimate the lifetime health outcomes of patients with type 2 diabetes: The United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS no. 68). Diabetologia, 47(10), 1747–1759 ([cited 2014 Nov 11] Available from: http://www.ncbi.nlm.nih.gov/pubmed/15517152 ).
Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34, 187–220.
Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54–75 (Institute of Mathematical Statistics; [cited 2014 Oct 21]).
Egger, M., Davey Smith, G., Stettler, C., & Diem, P. (1997). Risk of adverse effects of intensified treatment in insulin-dependent diabetes mellitus: A meta-analysis. Diabetic Medicine, 14(11), 919–928 ([cited 2014 Nov 8] Available from: http://www.ncbi.nlm.nih.gov/pubmed/9400915 ).
Elley, C. R., Robinson, T., Moyes, S. A., Kenealy, T., Collins, J., Robinson, E., et al. (2013). Derivation and validation of a renal risk score for people with type 2 diabetes. Diabetes Care, 36, 3113–3120.
Epidemiology of Diabetes Interventions and Complications (EDIC) Research Group (1999). Design, implementation, and preliminary results of a long-term follow-up of the Diabetes Control and Complications Trial cohort. Diabetes Care, 22(1), 99–111 ([cited 2014 Sep 13] Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2745938&tool=pmcentrez&rendertype=abstract ).
Faraggi, D., & Simon, R. (1998). Bayesian variable selection method for censored survival data. Biometrics, 54, 1475–1485.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874.
Franz, M. J., Warshaw, H., Daly, A. E., Green-Pastors, J., Arnold, M. S., & Bantle, J. (2003). Evolution of diabetes medical nutrition therapy. Postgraduate Medical Journal, 79(927), 30–35 ([cited 2014 Oct 17] Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1742592&tool=pmcentrez&rendertype=abstract ).
Gallen, I. (2004). Review: The evolution of insulin treatment in type 1 diabetes: The advent of analogues. The British Journal of Diabetes & Vascular Disease, 4(6), 378–381 ([cited 2014 Oct 17]).
Goodwin, J. S., Hunt, W. C., Key, C. R., & Samet, J. M. (1987). The effect of marital status on stage, treatment, and survival of cancer patients. JAMA, 258, 3125–3130.
Hayes, A. J., Leal, J., Gray, A. M., Holman, R. R., & Clarke, P. M. (2013). UKPDS outcomes model 2: A new version of a model to simulate lifetime health outcomes of patients with type 2 diabetes mellitus using data from the 30 year United Kingdom Prospective Diabetes Study: UKPDS 82. Diabetologia, 56(9), 1925–1933 ([cited 2014 Nov 11] Available from: http://www.ncbi.nlm.nih.gov/pubmed/23793713 ).
Huang, E. S., Liu, J. Y., Moffet, H. H., John, P. M., & Karter, A. J. (2011). Glycemic control, complications, and death in older diabetic patients. Diabetes Care, 34, 1329–1336, http://dx.doi.org/10.2337/dc10-2377.
Ishwaran, H. (2007). Variable importance in binary regression trees and forests. Electronic Journal of Statistics, 1, 519–537 (Institute of Mathematical Statistics; [cited 2014 Oct 10]).
Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forest. Annals of Applied Statistics, 2(3), 841–860 ([cited 2014 Feb 27]).
Kalbfleisch, J. D., & Prentice, R. L. (1980). The statistical analysis of failure time data. New York: John Wiley and Sons (Available from: http://proquest.umi.com/pqdweb?did=745641091&Fmt=7&clientId=3748&RQT=309&VName=PQD ).
Kuklina, E. V., Carrol, M. D., Shaw, K. M., & Hirsch, R. (2013). Trends in high LDL cholesterol, cholesterol-lowering medication use, and dietary saturated-fat intake: United States, 1976–2010. p. 7. Available from: http://www.cdc.gov/nchs/data/databriefs/db117.pdf
Lagani, V., & Tsamardinos, I. (2010). Structure-based variable selection for survival data. Bioinformatics, 26(15), 1887–1894 (Available from: http://www.ncbi.nlm.nih.gov/pubmed/20519286 ).
Lopes-Virella, M. F., Baker, N. L., Hunt, K. J., Cleary, P. A., Klein, R., & Virella, G. (2013). Baseline markers of inflammation are associated with progression to macroalbuminuria in type 1 diabetic subjects. Diabetes Care, 36, 2317–2323 (Available from: http://www.ncbi.nlm.nih.gov/pubmed/23514730 ).
Marcovecchio, M. L., Dalton, R. N., Chiarelli, F., & Dunger, D. B. (2011). A1C variability as an independent risk factor for microalbuminuria in young people with type 1 diabetes. Diabetes Care, 34, 1011–1013.
Nathan, D. M., Cleary, P. A., Backlund, J.-Y. C., Genuth, S. M., Lachin, J. M., Orchard, T. J., et al. (2005). Intensive diabetes treatment and cardiovascular disease in patients with type 1 diabetes. The New England Journal of Medicine, 353, 2643–2653.
Newman, D. J., Mattock, M. B., Dawnay, A. B. S., Kerry, S., McGuire, A., Yaqoob, M., et al. (2005). Systematic review on urine albumin testing for early detection of diabetic complications. Health Technology Assessment, 9(30), iii–vi, xiii–163 ([cited 2014 Nov 9] Available from: http://www.ncbi.nlm.nih.gov/pubmed/16095545 ).
Palmer, A. J. (2013). Computer modeling of diabetes and its complications: A report on the fifth Mount Hood challenge meeting. Value Health, 16, 670–685.
Shivaswamy, P. K., Chu, W. C. W., & Jansche, M. (2007). A support vector approach to censored targets. Seventh IEEE Int. Conf. Data Min. (ICDM 2007).
Statnikov, A., Aliferis, C. F., Tsamardinos, I., Hardin, D., & Levy, S. (2005). A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 21(5), 631–643 ([cited 2014 Jan 19] Available from: http://www.ncbi.nlm.nih.gov/pubmed/15374862 ).
Stevens, R. J., Kothari, V., Adler, A. I., & Stratton, I. M. (2001). The UKPDS risk engine: A model for the risk of coronary heart disease in type II diabetes (UKPDS 56). Clinical Science (London), 101, 671–679.
Subramanian, J., & Simon, R. (2010). What should physicians look for in evaluating prognostic gene-expression signatures? Nature Reviews Clinical Oncology, 7(6), 327–334, http://dx.doi.org/10.1038/nrclinonc.2010.60 (Nature Publishing Group; [cited 2014 Aug 3]).
Sugarman, J. R., Bauer, M. C., Barber, E. L., Hayes, J. L., & Hughes, J. W. (1993). Factors associated with failure to complete treatment for diabetic retinopathy among Navajo Indians. Diabetes Care, 16(1), 326–328 ([cited 2014 Nov 9] Available from: http://www.ncbi.nlm.nih.gov/pubmed/8422803 ).
The DCCT Research Group (1987). Feasibility of centralized measurements of glycated hemoglobin in the Diabetes Control and Complications Trial: A multicenter study. Clinical Chemistry, 33(12), 2267–2271 ([cited 2014 Sep 13] Available from: http://www.ncbi.nlm.nih.gov/pubmed/3319291 ).
The Diabetes Control and Complications Trial Research Group (1993). The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. The New England Journal of Medicine, 329(14), 977–986, http://dx.doi.org/10.1056/NEJM199309303291401 ([cited 2014 Jul 22] Available from: http://www.ncbi.nlm.nih.gov/pubmed/8366922 ).
The Diabetes Control and Complications Trial Research Group (1995a). Adverse events and their association with treatment regimens in the diabetes control and complications trial. Diabetes Care, 18, 1415–1427 (Available from: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=8722064&retmode=ref&cmd=prlinks ).
The Diabetes Control and Complications Trial Research Group (1995b). The Relationship of Glycemic Exposure (HbA1c) to the Risk of Development and Progression of Retinopathy in the Diabetes Control and Complications Trial. Diabetes, 44, 968–983.
The Diabetes Control and Complications Trial Research Group (1995c). Effect of intensive therapy on the development and progression of diabetic nephropathy in the Diabetes Control and Complications Trial. Kidney International, 47, 1703–1720.
The Diabetes Control and Complications Trial Research Group (1995d). The effect of intensive diabetes therapy on the development and progression of neuropathy. Annals of Internal Medicine, 122(8), 561–568 ([cited 2014 Sep 15] Available from: http://www.ncbi.nlm.nih.gov/pubmed/7887548 ).
The Diabetes Control and Complications Trial Research Group (1995e). The effect of intensive diabetes treatment on the progression of diabetic retinopathy in insulin-dependent diabetes mellitus. Archives of Ophthalmology, 113(1), 36–51 ([cited 2014 Sep 15] Available from: http://www.ncbi.nlm.nih.gov/pubmed/7826293 ).
The Diabetes Control and Complications Trial Research Group (1996). The absence of a glycemic threshold for the development of long-term complications: The perspective of the Diabetes Control and Complications Trial. Diabetes, 45(10), 1289–1298 ([cited 2014 Nov 9] Available from: http://www.ncbi.nlm.nih.gov/pubmed/8826962 ).
The Diabetes Control and Complications Trial Research Group (1997). Clustering of long-term complications in families with diabetes in the diabetes control and complications trial. Diabetes, 46, 1829–1839.
The National Collaborating Centre for Chronic Conditions (2008). Type 2 Diabetes, National clinical guideline for management in primary and secondary care (update).
The Royal College of Ophthalmologists (2012). Diabetic Retinopathy Guidelines. London.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31–78.
Tsamardinos, I., Lagani, V., & Rakhshani, A. (2014). Performance-Estimation Properties of Cross-Validation-Based Protocols with Simultaneous Hyper-Parameter Optimization. SETN’14 Proceedings of the 8th Hellenic Conference on Artificial Intelligence.
Turner, R. C., & Holman, R. R. (1996). The UK Prospective Diabetes Study. UK Prospective Diabetes Study Group. Annals of Medicine, 439–444.
Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B., & Wei, L. J. (2011). On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 30, 1105–1117.
Van Houwelingen, H. C. (2000). Validation, calibration, revision and combination of prognostic survival models. Statistics in Medicine, 19, 3401–3415.
Van Houwelingen, H. C., Bruinsma, T., Hart, A. A. M., Van’t Veer, L. J., & Wessels, L. F. A. (2006). Cross-validated Cox regression on microarray gene expression data. Statistics in Medicine, 25, 3201–3216.
Vergouwe, Y., Soedamah-Muthu, S. S., Zgibor, J., Chaturvedi, N., Forsblom, C., Snell-Bergeon, J. K., et al. (2010). Progression to microalbuminuria in type 1 diabetes: Development and validation of a prediction rule. Diabetologia, 53, 254–262.
Weber, C., & Schnell, O. (2009). The assessment of glycemic variability and its impact on diabetes-related complications: An overview. Diabetes Technology & Therapeutics, 11, 623–633.
Wu, J.-R., Lennie, T. A., Chung, M. L., Frazier, S. K., Dekker, R. L., Biddle, M. J., et al. (2014). Medication adherence mediates the relationship between marital status and cardiac event-free survival in patients with heart failure. Heart & Lung, 41(2), 107–114 ([cited 2014 Nov 9] Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3288268&tool=pmcentrez&rendertype=abstract ).