Figure 3.13 Curves of log[-log S(t)] for the two hypertension groups.
Table 3.9 Significant Variables (at 0.05 Level) Identified by Proportional Hazards Model
Relative Risk Ratio [b]
[a] Variables are listed in order of entry into model with a p-value limit for entry of 0.05.
[b] Favorable categories are 40 years of age, no hypertension, duration of diabetes 5 years, fasting plasma glucose 130 mg/dL, BMI 35, no proteinuria, and no diuretics use. Unfavorable categories are 60 years of age, hypertensive, duration of diabetes 14 years, fasting plasma glucose 200 mg/dL, BMI 25, having proteinuria, and diuretics use.
developed the eye disease during the 10- to 16-year follow-up period (average follow-up time 12.7 years). Twelve potential factors (assessed at the time of the baseline examination) were examined by univariate and multivariate methods for their relationship to retinopathy (RET): age, gender, duration of diabetes (DUR), fasting plasma glucose (GLU), initial treatment (TRT), systolic (SBP) and diastolic blood pressure (DBP), body mass index (BMI), plasma cholesterol (TC), plasma triglyceride (TG), and presence of macrovascular disease (LVD) or renal disease (RD). Table 3.10 gives the data for the first 40 patients. Among other things, the authors related these variables to the development of retinopathy.
1. Examine the individual relationship of each variable to the development of diabetic retinopathy. Table 3.11 gives some summary statistics of the eight continuous variables for patients who have developed retinopathy and for those who have not. Notice that patients who have developed the disease were younger at baseline and had much higher fasting plasma glucose, systolic and diastolic blood pressure, and plasma triglyceride than did patients who have not. Table 3.12 summarizes the contingency table analysis of retinopathy incidence rates. The number of patients at risk of developing retinopathy and the number of patients who developed the disease (and rate) are given by subcategory of each potential risk factor. Using the chi-square test at a significance level of 0.05, a significant difference in the retinopathy rate is found among the subcategories of several variables: duration of diabetes, fasting plasma glucose, systolic and diastolic blood pressure, and treatment. It appears that patients with poor glucose control or high blood pressure, or those treated with oral agents or insulin, have a higher incidence of retinopathy. In addition, patients with high triglyceride levels tend to have a higher incidence of retinopathy (p = 0.064).

However, patients who had developed macrovascular disease at the time of the baseline examination had a lower retinopathy incidence. The authors state that this may be due to the fact that 68% of the patients who had macrovascular disease either died during the follow-up period (54%) or were lost to follow-up (14%). Many of these patients may have developed retinopathy, particularly the patients who died, but were not included. Therefore, the lower incidence of retinopathy in patients who had macrovascular disease at baseline is probably the result of a selection bias. Similarly, the large number of deaths plus the losses to follow-up may also contribute to the drop in retinopathy rate in patients who had had diabetes for more than 12 years at baseline. Among the 80 patients in this duration-of-diabetes category, 56% have died and 10% did not participate in the follow-up examination. The large number of deaths may also be responsible for the finding that patients who survived long enough to develop retinopathy were younger at baseline. The deceased patients were significantly older (mean 57 years) than the survivors who participated in the follow-up examination (mean 48 years).
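The chi-square comparison of incidence rates between two subcategories of a risk factor can be sketched as follows. This is only an illustration: the counts are hypothetical (they are not taken from Table 3.12), the function name `chi_square_2x2` is an assumption, and the statistic is the ordinary Pearson chi-square without continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

                 developed   did not develop
    category 1       a              b
    category 2       c              d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts, invented for illustration only.
stat = chi_square_2x2(30, 70, 15, 85)

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CRITICAL_0_05_DF1 = 3.841
significant = stat > CRITICAL_0_05_DF1
print(round(stat, 3), significant)
```

For a factor with more than two subcategories the same idea applies with more rows and correspondingly more degrees of freedom.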
Table 3.11 Summary Statistics for Eight Variables by Retinopathy Status at Follow-up
2. Examine the simultaneous relationship of the variables to the development of retinopathy. Univariate analysis of each variable using the contingency table or the chi-square test gives a preliminary idea of which individual variables might be of prognostic importance. The simultaneous effect of all the variables can be analyzed by the linear logistic regression model (discussed in Section 14.2) to determine the relative importance of each.
The 12 variables were fitted to the linear logistic regression model using a stepwise selection procedure. The variables most significantly related to the development of retinopathy were found to be initial treatment, fasting plasma glucose, age, and diastolic blood pressure (p < 0.001). Table 3.13 gives the regression coefficients of the four most significant variables (p < 0.05), the standard errors, and adjusted odds ratios [exp(coefficient)]. The p values used here are the significance levels based on the likelihood ratio test, that is, on the improvement in the maximum likelihood due to the addition of the variable in the stepwise procedure. This method is more powerful than the Wald test, which is based on the standardized regression coefficients (Chapter 14). The results are consistent with those in the univariate analysis.

On the basis of the regression coefficients, the probability of developing retinopathy during a 10- to 16-year follow-up can be estimated by substituting values of the risk factors into the regression equation.
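The general form of such a calculation can be sketched as follows, assuming the standard linear logistic model in which the probability is 1 / (1 + exp(-(intercept + sum of coefficient times value))). The intercept, coefficients, and risk factor values below are hypothetical placeholders, not the fitted values of Table 3.13, and the function names are assumptions made for illustration.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Probability from a linear logistic model.

    coefficients and values are dicts keyed by variable name.
    """
    linear = intercept + sum(coefficients[k] * values[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(coefficient):
    """Adjusted odds ratio for a one-unit increase: exp(coefficient)."""
    return math.exp(coefficient)


# Hypothetical coefficients and patient values, for illustration only.
coef = {"GLU": 0.01, "DBP": 0.02}
patient = {"GLU": 150, "DBP": 80}
p = logistic_probability(-3.0, coef, patient)
print(round(p, 3), round(odds_ratio(coef["DBP"]), 3))
```

A positive coefficient gives an odds ratio above 1, so higher values of that variable raise the estimated probability.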
Table 3.12 Cumulative Incidence Rates of Retinopathy by Baseline Variables
Table 3.13 Results of Logistic Regression Analysis (columns: Variable, Coefficient, Standard Error, exp(coefficient), Coefficient/S.E.)
Bibliographical Remarks
It is impossible to cite all the published examples of survival data analysis similar to those in this chapter. Other similar studies can be found in the literature: for example, Biometrics, Biometrika, Cancer, Journal of Chronic Disease, Journal of the National Cancer Institute, American Journal of Epidemiology, Journal of the American Medical Association, and New England Journal of Medicine. An easy way to find examples is to use the National Library of Medicine's Web site and search the file PubMed with appropriate keywords.
EXERCISES
The four sets of data below are taken from actual research situations. Although the data can be used for various analyses throughout the book, the reader is asked here only to describe in detail how the data can be analyzed. The data appear in examples and other exercises in subsequent chapters.
3.1 Thirty-three patients with hypernephroma were treated with combined chemotherapy, immunotherapy, and hormonal therapy. Exercise Table 3.1 gives the age, gender, date treatment began, response status, date of death or last follow-up, survival status, and results of five pretreatment skin tests. The investigator is interested in the response and survival of the patients and in identifying prognostic factors. How would you analyze the data?
3.2 In a study undertaken to compare the treatments given to hypernephroma patients and to relate response and survival to surgery, metastasis, and treatment time, data from 58 patients were collected (Exercise Table 3.2). How would you analyze the data to answer these questions?
(a) Do patients who had nephrectomy have a higher response rate?
(b) Is the time of nephrectomy related to response and survival?
(c) Are there significant differences between the treatments?
(d) What are the most important variables related to response and survival?
3.3 Exercise Table 3.3 gives the age, gender, family history of melanoma, remission duration, survival time, stage, and results of six pretreatment skin tests (the larger diameter is given) of 102 stage 3 and 4 melanoma patients (Lee et al., 1982).
(a) Study the immunocompetence of melanoma patients by investigating skin test results.
(b) Determine if age, gender, or pretreatment skin test results are predictive to remission and survival time.
(c) Find theoretical distributions that describe the survival and remission patterns.
3.4 One hundred and forty-nine diabetic patients were followed for 17 years (a subset of data from Lee et al., 1988). Exercise Table 3.4 gives the survival time from baseline examination, survival status, and several potential prognostic factors at baseline: age, body mass index (BMI), age at diagnosis of diabetes, smoking status, systolic blood pressure (SBP), diastolic blood pressure (DBP), electrocardiogram reading (ECG), and whether the patient had any coronary heart disease (CHD). Identify the important prognostic factors that are associated with survival.
CHAPTER 4
Nonparametric Methods of Estimating Survival Functions
In this chapter we discuss methods of estimating the three survival (survivorship, density, and hazard) functions for censored data. Unfortunately, the simple method of Example 2.1 cannot be applied if some of the patients are alive at the time of analysis and therefore their exact survival times are unknown. Nonparametric or distribution-free methods are quite easy to understand and apply. They are less efficient than parametric methods when survival times follow a theoretical distribution and more efficient when no suitable theoretical distributions are known. Therefore, we suggest using nonparametric methods to analyze survival data before attempting to fit a theoretical distribution. If the main objective is to find a model for the data, estimates obtained by nonparametric methods and graphs can be helpful in choosing a distribution.

Of the three survival functions, survivorship or its graphical presentation, the survival curve, is the most widely used. Section 4.1 introduces the product-limit (PL) method of estimating the survivorship function developed by Kaplan and Meier (1958). With the increased availability of computers, this method is applicable to small, moderate, and large samples. However, if the data have already been grouped into intervals, or the sample size is very large, say in the thousands, or the interest is in a large population, it may be more convenient to perform a life-table analysis. Section 4.2 is devoted to the discussion of population and clinical life tables. The PL estimates and life-table estimates of the survivorship function are essentially the same. Many authors use the term life-table estimates for the PL estimates. The only difference is that the PL estimate is based on individual survival times, whereas in the life-table method, survival times are grouped into intervals. The PL estimate can be considered as a special case of the life-table estimate where each interval contains only one observation.

In Section 4.3 we discuss three other measures that describe the survival experience: the relative survival rate, the five-year survival rate, and the corrected survival rate. In Section 4.4 we describe two methods, direct and indirect standardization, to adjust rates to eliminate the effect of differences in population composition with respect to age and other variables. In addition, that section introduces the standardized mortality rate and standardized incidence rate.
4.1 PRODUCT-LIMIT ESTIMATES OF SURVIVORSHIP FUNCTION

Let us first consider the simple case where all the patients are observed to death so that the survival times are exact and known. Let $t_1, t_2, \ldots, t_n$ be the exact survival times of the $n$ individuals under study. Conceptually, we consider this group of patients as a random sample from a much larger population of similar patients. We relabel the $n$ survival times $t_1, t_2, \ldots, t_n$ in ascending order such that $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$. Following (2.1.2) and (2.1.3), the survivorship function at $t_{(i)}$ can be estimated as

$$\hat{S}(t_{(i)}) = \frac{n - i}{n} = 1 - \frac{i}{n} \tag{4.1.1}$$

where $n - i$ is the number of people in the sample surviving longer than $t_{(i)}$. If two or more $t_{(i)}$ are equal (tied observations), the largest $i$ value is used. For example, if $t_{(2)} = t_{(3)} = t_{(4)}$, then

$$\hat{S}(t_{(2)}) = \hat{S}(t_{(3)}) = \hat{S}(t_{(4)}) = \frac{n - 4}{n}$$

This gives a conservative estimate for the tied observations.

Since every person is alive at the beginning of the study (denoted here by $t_{(0)}$) and no one survives longer than $t_{(n)}$,

$$\hat{S}(t_{(0)}) = 1 \quad \text{and} \quad \hat{S}(t_{(n)}) = 0 \tag{4.1.2}$$

In practice, $\hat{S}(t)$ is computed at every distinct survival time. We do not have to worry about the intervals between the distinct survival times in which no one dies and $\hat{S}(t)$ remains constant. Equations (4.1.1) and (4.1.2) show that $\hat{S}(t)$ is a step function starting at 1.0 and decreasing in steps of $1/n$ (if there are no ties) to zero. When $\hat{S}(t)$ is plotted versus $t$, the various percentiles of survival time can be read from the graph or calculated from $\hat{S}(t)$. The following example illustrates the method.
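As a minimal sketch of the rule just described, the estimate (n - i)/n with the largest rank i used for tied times can be computed as below. The function name `survivorship_estimates` and the four survival times are hypothetical, chosen only to show how a tie is handled; all observations are assumed uncensored.

```python
def survivorship_estimates(times):
    """Estimate S(t) = (n - i) / n at each distinct uncensored survival time.

    For tied times the largest rank i is used, so the estimate at a tied
    time equals the proportion of subjects surviving strictly longer.
    """
    n = len(times)
    estimates = {}
    for i, t in enumerate(sorted(times), start=1):
        # A later (larger) rank for the same time overwrites the earlier one.
        estimates[t] = (n - i) / n
    return estimates


# Hypothetical survival times with one tie at t = 5.
s_hat = survivorship_estimates([5, 3, 9, 5])
print(s_hat)
```

With these four times the estimate drops by 1/4 at t = 3, by 2/4 at the tied time t = 5, and reaches zero at the largest time.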
Example 4.1 Consider a clinical trial in which 10 lung cancer patients are followed to death. Table 4.1 lists the survival times t in months. The function
Table 4.1 Computation of $\hat{S}(t)$ for 10 Lung Cancer
estimate can be obtained using linear interpolation:

$$m = 8 - \frac{2(0.1)}{0.3} = 7.3 \text{ (months)}$$
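The interpolation can be checked with a short sketch. The bracketing points used here, (6, 0.7) and (8, 0.4), are inferred from the terms of the expression above (an interval of width 2 ending at 8, and a drop of 0.3 in the survivorship estimate), since the entries of Table 4.1 are not reproduced in this text; the function name is an assumption.

```python
def interpolate_survival_time(t_low, s_low, t_high, s_high, level=0.5):
    """Time at which a straight line through (t_low, s_low) and
    (t_high, s_high) reaches the given survival level.

    Assumes s_low > level > s_high, i.e. the level is crossed in the interval.
    """
    return t_high - (t_high - t_low) * (level - s_high) / (s_low - s_high)


# Bracketing points inferred from the interpolation formula (an assumption).
median = interpolate_survival_time(6, 0.7, 8, 0.4)
print(round(median, 1))
```

The same function gives other percentiles by changing `level`, provided the two points bracket that level.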
Theoretically, $\hat{S}(t)$ should be plotted as a step function since it remains constant between two observed exact survival times. However, when the median survival time must be estimated from a survival curve, a smooth curve (such as Figure 4.1b) may give a much better estimate than a step function, as indicated in the example.

Figure 4.1 Function $\hat{S}(t)$ of lung cancer patients in Example 4.1.

This method can be applied only if all the patients are followed to death. If some of the patients are still alive at the end of the study, a different method of estimating $\hat{S}(t)$, such as the PL estimate given by Kaplan and Meier (1958), is required. The rationale can be illustrated by the following simple example. Suppose that 10 patients join a clinical study at the beginning of 2000; during that year 6 patients die and 4 survive. At the end of the year, 20 additional patients join the study. In 2001, 3 patients who entered at the beginning of 2000 and 15 patients who entered later die, leaving one and five survivors, respectively. Suppose that the study terminates at the end of 2001 and you want to estimate the proportion of patients in the population surviving for two years or more, that is, S(2).
The first group of patients in this example is followed for two years; the second group is followed for only one year. One possible estimate, the
who are followed only for one year. Kaplan and Meier believe that the second sample, under observation for only one year, can contribute to the estimate of S(2).

Patients who survived two years may be considered as surviving the first year and then surviving one more year. Thus, the probability of surviving for two years or more is equal to the probability of surviving the first year and then surviving one more year. That is,
$$S(2) = P(\text{surviving first year and then surviving one more year})$$

which can be written as

$$S(2) = P(\text{surviving two years given patient has survived first year}) \times P(\text{surviving first year}) \tag{4.1.3}$$

The Kaplan-Meier estimate of S(2) following (4.1.3) is

$$\hat{S}(2) = (\text{proportion of patients surviving two years given they survive for one year}) \times (\text{proportion of patients surviving one year}) \tag{4.1.4}$$
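The product of conditional proportions can be worked through with the counts from the two-group example above (10 patients entering at the beginning of 2000, 20 entering at the end of that year). This is a sketch of the arithmetic only; the variable names are assumptions made for illustration.

```python
# Counts taken from the two-group example in the text.
first_group_size = 10        # entered at the beginning of 2000
first_alive_after_1yr = 4    # 6 of the 10 died during 2000
first_alive_after_2yr = 1    # 3 of the remaining 4 died during 2001
second_group_size = 20       # entered at the end of 2000
second_alive_after_1yr = 5   # 15 of the 20 died during 2001

# Estimate using only the patients followed for the full two years.
first_group_only = first_alive_after_2yr / first_group_size

# Proportion surviving one year, using both groups.
p_first_year = (first_alive_after_1yr + second_alive_after_1yr) / (
    first_group_size + second_group_size
)

# Proportion surviving a second year among those who survived the first
# and were observed for a second year (first group only).
p_second_given_first = first_alive_after_2yr / first_alive_after_1yr

km_estimate = p_second_given_first * p_first_year
print(first_group_only, p_first_year, p_second_given_first, km_estimate)
```

The second group contributes to the one-year proportion even though none of its members is observed for a second year, which is the point of the product form.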