Usually the time variable in a survival analysis measures follow-up time from some event. This event may be recruitment into a cohort, diagnosis of cancer, et cetera. In such studies everyone is at risk at time zero, when they enter the cohort. Sometimes, however, we may wish to use the patient’s age as the time variable rather than follow-up time. Both Kaplan–Meier survival curves and hazard regression analyses can be easily adapted to this situation. The key difference is that when age is the time variable, patients
are not observed to fail until after they reach the age when they enter the cohort. Hence, it is possible that no one will enter the study at age zero, and that subjects will enter the analysis at different “times” when they reach their age at recruitment. These analyses must be interpreted as the effect of age and other covariates on the risk of failure conditioned on the fact that each patient had not failed prior to her age of recruitment.
7.9.1. Kaplan–Meier Survival Curve and the Logrank Test with Ragged Entry
In Section 6.3, we defined the Kaplan–Meier survival curve Ŝ(t) to be a product of probabilities p_i on each death day prior to time t. Each probability p_i = (n_i − d_i)/n_i, where n_i is the number of patients at risk at the beginning of the ith death day and d_i is the number of deaths on this day. In a traditional survival curve, n_i must decrease with increasing i, since the entire cohort is at risk at time 0 and death or censoring can only decrease this number with time. With ragged entry, Ŝ(t) is calculated in the same way, only now the number of patients at risk can increase as well as decrease; n_i equals the total number of people recruited before time t minus the total number of people who die or are censored prior to this time. The cumulative mortality curve is D̂(t) = 1 − Ŝ(t), as was the case in Section 6.3.
The logrank test is performed in exactly the same way as in Section 6.8. The only difference is that now the number of patients at risk at the beginning of each death day equals the number of patients recruited prior to that day minus the number of patients who have previously died or been censored.
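The risk-set bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the method used by any particular package: the Subject record, the function names, and the five-person cohort are all made up for the example, and a subject is assumed to be at risk at age t when entry < t ≤ exit.

```python
from collections import namedtuple

# Hypothetical record: group label, age at entry, age at exit, and whether
# the exit was a failure (True) or a censoring (False).
Subject = namedtuple("Subject", "group entry exit failed")

def at_risk(subjects, t):
    """Subjects recruited before age t who have not yet failed or been censored."""
    return [s for s in subjects if s.entry < t <= s.exit]

def failure_ages(subjects):
    """Distinct ages at which at least one failure occurs, in increasing order."""
    return sorted({s.exit for s in subjects if s.failed})

def kaplan_meier(subjects):
    """Return (age, n_i, d_i, S) for each failure age, allowing ragged entry."""
    survival, rows = 1.0, []
    for t in failure_ages(subjects):
        risk = at_risk(subjects, t)
        n = len(risk)
        d = sum(1 for s in risk if s.failed and s.exit == t)
        survival *= (n - d) / n
        rows.append((t, n, d, survival))
    return rows

def logrank(subjects, group):
    """Observed and expected failures in `group`, and the logrank chi-squared."""
    observed = expected = variance = 0.0
    for t in failure_ages(subjects):
        risk = at_risk(subjects, t)
        n = len(risk)
        n1 = sum(1 for s in risk if s.group == group)
        d = sum(1 for s in risk if s.failed and s.exit == t)
        d1 = sum(1 for s in risk
                 if s.failed and s.exit == t and s.group == group)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the risk-set margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed, expected, (observed - expected) ** 2 / variance

# Made-up cohort: nobody is under observation before age 30.
cohort = [
    Subject("M", 30, 45, True),
    Subject("F", 35, 60, False),
    Subject("M", 46, 50, True),
    Subject("F", 48, 70, True),
    Subject("F", 55, 75, False),
]

for age, n, d, s in kaplan_meier(cohort):
    print(f"age {age}: {n} at risk, {d} failed, S = {s:.3f}")
print(logrank(cohort, "M"))
```

In this toy cohort the number at risk is 2, 3, and 2 on the three failure ages, so it rises as late entrants join the cohort before falling again.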
7.9.2. Age, Sex, and CHD in the Framingham Heart Study
Figure 7.4 shows that the distribution of age at entry in the Framingham Heart Study was very wide. This means that at any specific follow-up time in Figure 7.3, we are comparing men and women with a wide variation in ages. Figure 7.6 shows the cumulative CHD morbidity in men and women as a function of age rather than years since recruitment. This figure reveals an aspect of CHD epidemiology that is missed in Figure 7.3. The morbidity curves for men and women diverge most rapidly prior to age sixty. Thereafter, they remain relatively parallel. This indicates that the protective effects of female gender on CHD are greatest in the pre- and perimenopausal ages, and that this protective effect is largely lost a decade or more after the menopause. This interaction between age and sex on CHD is not apparent
Figure 7.6 Cumulative coronary heart disease (CHD) morbidity with increasing age among men and women from the Framingham Heart Study (Levy et al., 1999). [Plot omitted: cumulative CHD morbidity (0 to 0.7) against age (30 to 90), with separate curves for men and women.]
in the Kaplan–Meier curves in Figure 7.3 that were plotted as a function of time since recruitment.
7.9.3. Proportional Hazards Regression Analysis with Ragged Entry
Proportional hazards regression analysis also focuses on the number of patients at risk and the number of deaths on each death day. For this reason, it is easily adapted to analyses of data with ragged study entry. A simple example of such a proportional hazards model is
λ_i[age] = λ_0[age] exp[β × male_i],    (7.16)
where age is a specific age for the ith subject, λ_0[age] is the CHD hazard for women at this age, male_i equals 1 if the ith subject is a man and equals 0 if she is a woman, and λ_i[age] is the CHD hazard for the ith study subject at the indicated age. Model (7.16) differs from model (7.6) only in that in model (7.6) t represents time since entry while in model (7.16) age represents the subject’s age. Under model (7.16), a man’s hazard is λ_0[age] exp[β].
Hence, the age-adjusted relative risk of CHD in men compared to women is exp [β]. Applying model (7.16) to the Framingham Heart Study data gives this relative risk of CHD for men to be 2.01 with a 95% confidence interval of (1.8–2.2). Note, however, that model (7.16) assumes that the relative risk of CHD between men and women remains constant with age. This assumption is rather unsatisfactory in view of the evidence from Figure 7.6 that this relative risk diminishes after age 60.
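To make the role of the age-based risk sets concrete, the following Python sketch fits β in model (7.16) by Newton-Raphson on the Breslow partial likelihood, using only the subjects under observation at each failure age. It is an illustration under stated assumptions, not a description of how any statistical package is implemented: the function name, the record layout, and the eight records are hypothetical, and the code handles only a single 0/1 covariate.

```python
import math

def fit_sex_effect(records, tol=1e-10, max_iter=50):
    """Newton-Raphson fit of beta in model (7.16) on the age scale.

    records: (entry_age, exit_age, failed, male) tuples with male coded 0/1.
    Returns (beta, standard error of beta).
    """
    ages = sorted({exit_age for _, exit_age, failed, _ in records if failed})

    def score_and_information(beta):
        score = information = 0.0
        for t in ages:
            # Risk set on the age scale: recruited before t, still followed at t.
            risk = [r for r in records if r[0] < t <= r[1]]
            failures = [r for r in risk if r[2] and r[1] == t]
            d = len(failures)
            d_male = sum(r[3] for r in failures)
            weight_all = sum(math.exp(beta * r[3]) for r in risk)
            weight_men = sum(math.exp(beta * r[3]) for r in risk if r[3] == 1)
            share = weight_men / weight_all  # hazard-weighted share of men
            score += d_male - d * share
            information += d * share * (1 - share)
        return score, information

    beta = 0.0
    for _ in range(max_iter):
        score, information = score_and_information(beta)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    # Re-evaluate the information at the final estimate for the standard error.
    _, information = score_and_information(beta)
    return beta, 1 / math.sqrt(information)

# Made-up records: (entry age, exit age, failed, male).
records = [
    (30, 45, True, 1),
    (32, 50, True, 0),
    (35, 60, False, 1),
    (40, 55, True, 1),
    (42, 70, False, 0),
    (50, 65, True, 0),
    (52, 75, False, 1),
    (58, 72, True, 1),
]

beta, se = fit_sex_effect(records)
print(f"hazard ratio = {math.exp(beta):.3f}, se(beta) = {se:.3f}")
```

For these made-up records the score equation reduces to 3 exp[2β] = 2, so the fitted hazard ratio is √(2/3), or about 0.82. The failure at age 72 contributes nothing to the fit because only men are at risk at that age.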
7.9.4. Survival Analysis with Ragged Entry using Stata
The following log file and comments illustrate how to perform the analyses discussed above using Stata.
. * 7.9.4.Framingham.log
. *
. * Plot Kaplan-Meier cumulative CHD morbidity curves as a function of age.
. * Patients from the Framingham Heart Study enter the analysis when they
. * reach the age of their baseline exam.
. *
. set memory 2000
(2000k)
. use C:\WDDtext\2.20.Framingham.dta, clear
. set textsize 120
. graph age, bin(39) xlabel(30,35 to 65) ylabel(0,.01 to .04) gap(3)
{Graph omitted. See Figure 7.4}
. generate time = followup/365.25
. label variable time "Follow-up in Years"
. generate exitage = age + time                                        {1}
. stset exitage, enter(time age) failure(chdfate)                      {2}
     failure event:  chdfate ~= 0 & chdfate ~= .
obs. time interval:  (0, exitage]
 enter on or after:  time age
 exit on or before:  failure
------------------------------------------------------------------------------
     4699  total obs.
        0  exclusions
------------------------------------------------------------------------------
     4699  obs. remaining, representing
     1473  failures in single record/single failure data
 103710.1  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =        30
                                  last observed exit t =        94
. sts graph , by(sex) tmin(30) xlabel(30 40 to 90) ylabel(0,.1 to .8) failure
> l1title("Cumulative CHD Morbidity") gap(3) noborder                  {3}
         failure _d:  chdfate
   analysis time _t:  exitage
  enter on or after:  time age
. *
. * Calculate the logrank test corresponding to these morbidity functions
. *
. sts test sex                                                         {4}
         failure _d:  chdfate
   analysis time _t:  exitage
  enter on or after:  time age
Log-rank test for equality of survivor functions
------------------------------------------------
      |   Events
sex   |  observed       expected
------+-------------------------
Men   |       823         571.08
Women |       650         901.92
------+-------------------------
Total |      1473        1473.00
            chi2(1) =     182.91
            Pr>chi2 =     0.0000
. *
. * Calculate the relative risk of CHD for men relative to women using age as
. * the time variable.
. *
. generate male = sex == 1
. stcox male                                                           {5}
         failure _d:  chdfate
   analysis time _t:  exitage
  enter on or after:  time age
{Output omitted}
Cox regression -- Breslow method for ties
No. of subjects =         4699                     Number of obs   =      4699
No. of failures =         1473
Time at risk    =  103710.0914
                                                   LR chi2(1)      =    177.15
Log likelihood  =   -11218.785                     Prob > chi2     =    0.0000
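------------------------------------------------------------------------------
      _t |
      _d | Haz. Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |   2.011662   .1060464     13.259   0.000       1.814192   2.230626
------------------------------------------------------------------------------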
Comments
1 We define exitage to be the patient’s age at exit. This is the age when she either suffered CHD or was censored.
2 This command specifies the survival-time and fate variables for the subsequent survival commands. It defines exitage to be the time (age) when the subject’s follow-up ends, age to be the time (age) when she is recruited into the cohort, and chdfate to be her fate at exit. Recall that age is the patient’s age at her baseline exam and that she was free of CHD at that time (see Section 3.10).
3 This command plots cumulative CHD morbidity as a function of age for men and women. Strictly speaking, these plots are for people who are free of CHD at age 30, since this is the earliest age at recruitment. However, since CHD is rare before age 30, these plots closely approximate the cumulative morbidity curves from birth.
4 The logrank test is performed in exactly the same way as in Section 7.7. Changing the survival-time variable from years of follow-up to age increases the statistical significance of this test.
5 This command performs the proportional hazards regression defined by model (7.16).
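As a consistency check on the stcox output above, the z statistic and confidence limits can be recovered from the reported hazard ratio and its standard error. The sketch below assumes that the standard error is reported on the hazard-ratio scale via the delta method and that the interval is computed on the log scale with the normal quantile 1.96; the variable names are illustrative.

```python
import math

hazard_ratio = 2.011662  # "Haz. Ratio" for male in the stcox output
se_hr = 0.1060464        # "Std. Err." for male, assumed on the hazard-ratio scale

beta = math.log(hazard_ratio)
# Delta method: se(exp(beta)) is approximately exp(beta) * se(beta).
se_beta = se_hr / hazard_ratio
z = beta / se_beta
lower = math.exp(beta - 1.96 * se_beta)
upper = math.exp(beta + 1.96 * se_beta)

print(f"beta = {beta:.4f}, se(beta) = {se_beta:.4f}, z = {z:.3f}")
print(f"95% CI for the hazard ratio: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the recomputed values agree with the reported z of 13.259 and the interval from 1.814 to 2.231 to within rounding.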