For large data sets, Poisson regression is much faster than hazard regression analysis with time-dependent covariates. If we have reason to believe that the proportional hazards assumption is false, it makes sense to do our exploratory analyses using Poisson regression. Before we can do this, we must first convert the data from survival format to person-year format.
8.8.1. Recoding Survival Data on Patients as Patient-Year Data
Table 8.2 shows a survival data set consisting of entry age, exit age, treatment, and fate on five patients. The conversion of this survival data set to a patient-year data set is illustrated in Figure 8.2. Individual patients
277 8.8. Poisson regression and survival analysis
Table 8.2. This table shows survival data for five hypothetical patients. Each patient contributes person-years of follow-up to several strata defined by age and treatment. The conversion of this data set into a person-year data set for Poisson regression analysis is depicted in Figure 8.2.
Patient ID   Entry age   Exit age   Treatment   Fate
A            1           4          1           Alive
B            3           5          1           Dead
C            3           6          2           Alive
D            2           3          2           Dead
E            1           3          2           Dead
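[Figure 8.2. Left panel: a graph of the follow-up of patients A through E, with axes Age and Years of Follow-up; markers distinguish Treatment 1 from Treatment 2 and patients who died from those alive at exit. Right panel: the person-year table below.]

        Treatment 1                Treatment 2
Age     Person-years   Deaths      Person-years   Deaths
0       0              0           0              0
1       1              0           1              0
2       1              0           2              0
3       2              0           3              2
4       2              0           1              0
5       1              1           1              0
6       0              0           1              0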
Figure 8.2 The survival data from Table 8.2 is depicted in the graph on the left of this figure.
As the study subjects age during follow-up, they contribute person-years of observation to strata defined by age and treatment. Before performing Poisson regression, the survival data must be converted into a table of person-year data such as that given on the right of this figure. For example, three patients (B, C, and A) contribute follow-up to four-year-old patients. Two of them (B and A) are in Treatment 1 and one (C) is in Treatment 2. No deaths were observed at this age. Patients D and E do not contribute to this age because they died at age three.
278 8. Introduction to Poisson regression
contribute person-years of follow-up to a number of different ages. For example, Patient B enters the study at age 3 and dies at age 5. She contributes one year of follow-up at age 3, one at age 4, and one at age 5. To create the corresponding person-year data set we need to determine the number of patient-years of follow-up and number of deaths for each age in each treatment. This is done by summing across the rows of Figure 8.2. For example, consider age 3. There are three person-years of follow-up in Treatment 2 at this age that are contributed by patients C, D, and E. Deaths occur in two of these patient-years (Patients D and E). In Treatment 1 there are two person-years of follow-up for age 3 and no deaths (Patients B and A). The remainder of the table on the right side of Figure 8.2 is completed in a similar fashion. Note that the five patient survival records are converted into 14 records in the person-year file.
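As a quick check on this arithmetic, the data in Table 8.2 can be typed directly into Stata. The following is a minimal sketch rather than part of the text's program: the variable yrs is introduced here purely for illustration, the other variable names are assumed to match those used in the next section, and fate is coded 1 for dead and 0 for alive. Each patient contributes exit age minus entry age plus one person-years of follow-up, which gives 16 person-years and 3 deaths in all.

```stata
* Enter the five survival records of Table 8.2 by hand.
* fate: 1 = dead, 0 = alive (censored) at exit.
clear
input str1 id age_in age_out treat fate
A 1 4 1 0
B 3 5 1 1
C 3 6 2 0
D 2 3 2 1
E 1 3 2 1
end

* yrs (a hypothetical helper variable) counts each patient's
* person-years, including both the entry and the exit age.
generate yrs = age_out - age_in + 1
list id treat yrs fate

* Totals by treatment: person-years of follow-up and deaths.
table treat, contents(sum yrs sum fate)
```

The last command uses the older `contents()` syntax of `table`; in recent Stata releases `tabstat yrs fate, by(treat) statistics(sum)` should give the same totals.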
8.8.2. Converting Survival Records to Person-Years of Follow-Up using Stata
The following program may be used as a template to convert survival records on individual patients into records giving person-years of follow-up. It also demonstrates many of the ways in which data may be manipulated with Stata.
. * 8.8.2.Survival_to_Person-Years.log
. *
. * Convert survival data to person-year data.
. * The survival data set must have the following variables:
. *     id      = patient id,
. *     age_in  = age at start of follow-up,
. *     age_out = age at end of follow-up,
. *     fate    = fate at exit: censored = 0, dead = 1,
. *     treat   = treatment variable.
. *
. * The person-year data set created below will contain one
. * record per unique combination of treatment and age.
. *
. * Variables in the person-year data set that must not be in the
. * original survival data set are
. *     age_now = an age of people in the cohort,
. *     pt_yrs  = number of patient-years of observations of people
. *               who are age_now years old,
. *     deaths  = number of events (fate=1) occurring in pt_yrs of
. *               follow-up for this group of patients.
. *
. use C:\WDDtext\8.8.2.Survival.dta, clear
. list
       id   age_in   age_out   treat   fate
  1.    A        1         4       1      0
  2.    B        3         5       1      1
  3.    C        3         6       2      0
  4.    D        2         3       2      1
  5.    E        1         3       2      1
. expand age_out - age_in + 1 {1}
(11 observations created)
. sort id {2}
. list if id == "B" {3}
id age_in age_out treat fate
5. B 3 5 1 1
6. B 3 5 1 1
7. B 3 5 1 1
. generate first = id[_n] ~= id[_n-1] {4}
. generate age_now = age_in
. replace age_now = age_now[_n-1]+1 if ~first {5}
(11 real changes made)
. generate last = id[_n] ~= id[_n+1] {6}
. generate observed = fate*last {7}
. generate one = 1 {8}
. list id age_in age_out first age_now if id == "B" {9}
       id   age_in   age_out   first   age_now
  5.    B        3         5       1         3
  6.    B        3         5       0         4
  7.    B        3         5       0         5
. list id treat fate last observed one if id == "B" {10}
       id   treat   fate   last   observed   one
  5.    B       1      1      0          0     1
  6.    B       1      1      0          0     1
  7.    B       1      1      1          1     1
. sort treat age_now {11}
. collapse (sum) pt_yrs = one deaths = observed, by(treat age_now) {12}
. list treat age_now pt_yrs deaths {13}
treat age_now pt_yrs deaths
1. 1 1 1 0
2. 1 2 1 0
3. 1 3 2 0
4. 1 4 2 0
5. 1 5 1 1
6. 2 1 1 0
7. 2 2 2 0
8. 2 3 3 2
9. 2 4 1 0
10. 2 5 1 0
11. 2 6 1 0
. save C:\WDDtext\8.8.2.Person-Years.dta, replace {14}
file C:\WDDtext\8.8.2.Person-years.dta saved
Comments
1 We expand the number of records per patient so that each patient has as many records as years of follow-up.
2 The file is sorted by id to make all records on the same patient contiguous.
3 For example, patient B enters the study at age 3 and exits at age 5. Therefore we create three records for this patient corresponding to ages 3, 4, and 5.
4 The variable first is set equal to 1 on the first record for each patient and equals zero on subsequent records. This is done by setting first = TRUE if the value of id in the preceding record is not equal to its value in the current record; first = FALSE otherwise. Recall that the numeric values for TRUE and FALSE are 1 and 0, respectively.
5 Increment the value of age_now by 1 for all but the first record of each patient. In the i-th record for each patient, age_now equals the patient’s age in her i-th year of follow-up.
6 The variable last = 1 in the last record for each patient; last = 0 otherwise.
7 The variable observed = 1 if the patient dies during the current year; observed = 0 otherwise. Since the patient must have survived the current year if she has an additional record, observed = 1 if and only if the patient dies in her last year of follow-up (fate = 1) and we have reached her last year (last = 1).
8 We will use the variable one = 1 to count patient-years of follow-up.
9 For example, patient B is followed for three years. Her age in these years is recorded in age_now, which is 3, 4, and 5 years, respectively.
10 Patient B dies at age 5, her third year of follow-up. She was alive at the end of her first and second years of follow-up (ages 3 and 4). Hence, observed equals 0 in her first two records and equals 1 in her last.
11 We now sort by treat and age_now to make all records of patients with the same treatment and age contiguous.
12 This statement collapses all records with identical values of treat and age_now into a single record. The variable pt_yrs is set equal to the number of records collapsed (the sum of one over these records) and deaths is set equal to the number of deaths (the sum of observed over these records). All variables are deleted from memory except treat, age_now, pt_yrs, and deaths.
13 The data set now corresponds to the right-hand side of Figure 8.2. Note, however, that the program only creates records for which there is at least one person-year of follow-up. The reason why there are 11 rather than 14 records in the file is that there are no person-years of follow-up for 6-year-old patients on treatment 1 or for patients on either treatment in their first year of life.
14 The data set is saved for future Poisson regression analysis.
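The steps in the log above can also be written more compactly with Stata's by-group processing. The following is an alternative sketch, not the author's program: it assumes the same input file and variable names, and it replaces the first, last, observed, and one variables with expressions based on _n (the record number within each patient) and _N (the number of records for that patient).

```stata
* Alternative sketch of the survival to person-year conversion.
use C:\WDDtext\8.8.2.Survival.dta, clear

* One record per patient per year of follow-up.
expand age_out - age_in + 1

* Within each patient, the i-th record has age age_in + i - 1.
bysort id: generate age_now = age_in + _n - 1

* A death is counted only on the patient's last record.
by id: generate deaths = fate * (_n == _N)

* Each record represents one person-year of follow-up.
generate pt_yrs = 1

* Sum person-years and deaths within each treatment and age.
collapse (sum) pt_yrs deaths, by(treat age_now)
list treat age_now pt_yrs deaths
```

Because the expanded records for a patient are identical, the order in which bysort arranges them within id does not matter. For the five patients in this example the result should again be 11 records with 16 person-years and 3 deaths in total.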
N.B. If you are working on a large data set with many covariates, you can reduce the computing time by keeping only those covariates that you will need in your model(s) before you start to convert to patient-year data. It is a good idea to check that you have not changed the number of deaths or number of years of follow-up in your program. See the 8.9.Framingham.log file in the next section for an example of how this can be done.
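For the small example of this section, such a check might look like the sketch below. It is an illustration only and is not taken from the 8.9.Framingham.log file: the scalar names yrs_before and deaths_before and the variable yrs are hypothetical, and the two data files are assumed to exist as saved above. The assert command stops with an error if a total has changed.

```stata
* Totals in the original survival data.
use C:\WDDtext\8.8.2.Survival.dta, clear
keep id age_in age_out treat fate
generate yrs = age_out - age_in + 1
quietly summarize yrs
scalar yrs_before = r(sum)
quietly summarize fate
scalar deaths_before = r(sum)

* Totals in the person-year data must agree with them.
use C:\WDDtext\8.8.2.Person-Years.dta, clear
quietly summarize pt_yrs
assert r(sum) == scalar(yrs_before)
quietly summarize deaths
assert r(sum) == scalar(deaths_before)
```

The keep command also illustrates the first point of the note: dropping unneeded variables before expand reduces the size of the expanded file.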