Stata can analyze hazard regression models with time-dependent covariates that are step-functions. To do this, we must first define multiple data records per patient in such a way that the covariate functions for the patient are constant for the period covered by each record. This is best explained by an example. Suppose that we wished to analyze model (7.18). In the Framingham data set, patient 18 is a man who entered the study at age 42 and exited with CHD at age 63. For this patient, id = 18, age = 42, exitage = 63, and chdfate = 1. We replace the record for this patient with two records. The first of these records describes his covariates from age 42 to age 57 while the other describes his covariates from age 57 to age 63. We define new variables male1, male2, enter, exit and fate whose values are shown in Table 7.5.

Table 7.5. Reformatting of the data for Framingham patient 18 needed to analyze model (7.18). For time-dependent hazard regression models the data must be split into multiple records per patient in such a way that the covariates remain constant over the time period covered by each record (see text).

  id   male1   male2   enter   exit   fate
  18       1       0      42     57      0
  18       0       1      57     63      1

The variable enter equals his true entry age (42) in the first record and equals 57 in the second; exit equals 57 in the first record and equals his true exit age (63) in the second. The variable fate denotes his CHD status at the age given by exit. In the first record, fate = 0, indicating that he had not developed CHD by age 57. In the second record, fate = 1, indicating that he did develop CHD at age 63. The variable male1 equals the age-dependent covariate male_i1(age). In the first record male1 = 1 since male_i1(age) = 1 from entry until age 57. In the second record male1 = 0 since male_i1(age) = 0 after age 57, even though patient 18 is a man. Similarly, male2 equals male_i2(age), which equals 0 before age 57 and 1 afterwards.
We need two records only for patients whose follow-up spans age 57. Patients who exit the study before age 57 or enter after age 57 will have a single record. In this case, enter will equal the patient's entry age and exit will equal his or her age at the end of follow-up; fate will equal chdfate, and male1 and male2 will be defined according to the patient's gender and age during follow-up. Time-dependent analyses must have an identification variable that allows Stata to keep track of which records belong to which patients. In this example, this variable is id.
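As a small illustration of this record layout, the two rows of Table 7.5 can be typed in directly and declared as survival data. This is a sketch rather than part of the log file discussed in this section: it contains only patient 18, and _t0, _t and _d are the variables that stset itself creates to hold each record's start time, end time and failure indicator.

```stata
* Sketch only: enter the two records of Table 7.5 by hand
clear
input id male1 male2 enter exit fate
18 1 0 42 57 0
18 0 1 57 63 1
end

* Declare the records as survival data, with age as the time scale
stset exit, id(id) enter(time enter) failure(fate)

* Inspect the interval and failure indicator that Stata assigns to each record
list id male1 male2 _t0 _t _d, noobs
```

If the records are set up as intended, the first record should cover the interval from age 42 to age 57 with no event and the second the interval from age 57 to age 63 with one event, so that the patient contributes 21 years of follow-up in total.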
The only tricky part of a time-dependent hazard regression analysis is defining the data records as described above. Once this is done, the analysis is straightforward. We illustrate how to modify the data file and do this analysis below. The log file 7.9.4.Framingham.log continues as follows.
. *
. * Perform hazard regression with time dependent covariates for sex
. *
. tabulate chdfate male                                                  {1}

  Coronary |
     Heart |         male
   Disease |         0          1 |     Total
-----------+----------------------+----------
  Censored |      2000       1226 |      3226
       CHD |       650        823 |      1473
-----------+----------------------+----------
     Total |      2650       2049 |      4699

. generate records = 1
. replace records = 2 if age < 57 & exitage > 57                         {2}
(3331 real changes made)
. expand records                                                         {3}
(3331 observations created)
. sort id                                                                {4}
. generate enter = age
. replace enter = 57 if id == id[_n-1]                                   {5}
(3331 real changes made)
. generate exit = exitage
. replace exit = 57 if id == id[_n+1]                                    {6}
(3331 real changes made)
. generate fate = chdfate
. replace fate = 0 if id == id[_n+1]                                     {7}
(855 real changes made)
. tabulate fate male                                                     {8}

           |         male
      fate |         0          1 |     Total
-----------+----------------------+----------
         0 |      3946       2611 |      6557
         1 |       650        823 |      1473
-----------+----------------------+----------
     Total |      4596       3434 |      8030
. generate male1 = male*(enter < 57)                                     {9}
. generate male2 = male*(exit > 57)                                      {10}
. tabulate male1 male2                                                   {11}

           |         male2
     male1 |         0          1 |     Total
-----------+----------------------+----------
         0 |      4596       1672 |      6268
         1 |      1762          0 |      1762
-----------+----------------------+----------
     Total |      6358       1672 |      8030

. stset exit, id(id) enter(time enter) failure(fate)                     {12}

                id:  id
     failure event:  fate ~= 0 & fate ~= .
obs. time interval:  (exit[_n-1], exit]
 enter on or after:  time enter
 exit on or before:  failure

------------------------------------------------------------------------------
     8030  total obs.
        0  exclusions
------------------------------------------------------------------------------
     8030  obs. remaining, representing
     4699  subjects
     1473  failures in single failure-per-subject data
 103710.1  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =        30
                                  last observed exit t =        94

. stcox male1 male2                                                      {13}

         failure _d:  fate
   analysis time _t:  exit
  enter on or after:  time enter
                 id:  id
{Output omitted}
Cox regression -- Breslow method for ties

No. of subjects =         4699                     Number of obs   =      8030
No. of failures =         1473
Time at risk    =  103710.0914
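                                                   LR chi2(2)      =    209.41
Log likelihood  =   -11202.652                     Prob > chi2     =    0.0000

------------------------------------------------------------------------------
      _t |
      _d | Haz. Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   male1 |   3.636365   .4457209     10.532   0.000       2.859782    4.623831
   male2 |   1.718114   .1024969      9.072   0.000       1.528523     1.93122
------------------------------------------------------------------------------

. lincom male1-male2                                                     {14}

 ( 1)  male1 - male2 = 0.0

------------------------------------------------------------------------------
      _t |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     (1) |   .7497575   .1363198      5.500   0.000       .4825755     1.01694
------------------------------------------------------------------------------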
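The lincom estimate at the end of this log is on the log hazard scale, so it can be checked by hand against the two hazard ratios printed by stcox. The arithmetic below is a worked check using only the numbers in the output above; the symbols for the estimated coefficients of male1 and male2 follow the beta notation of model (7.18), and the exponentiated values are computed here rather than taken from the log.

```latex
\begin{align*}
\hat\beta_1 - \hat\beta_2
  &= \log(3.636365) - \log(1.718114)
   \approx 1.2910 - 0.5412 \approx 0.7498, \\
z &= \frac{0.7497575}{0.1363198} \approx 5.50, \\
\text{95\% CI}
  &= 0.7497575 \pm 1.96 \times 0.1363198 \approx (0.4826,\ 1.0169), \\
\exp(\hat\beta_1 - \hat\beta_2)
  &= \frac{3.636365}{1.718114} \approx 2.12,
  \qquad
  \bigl(e^{0.4826},\ e^{1.0169}\bigr) \approx (1.62,\ 2.76).
\end{align*}
```

Read this way, the relative risk of CHD for men compared with women is estimated to be roughly 2.1 times larger before age 57 than after it.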
Comments
1 The next few commands will create the multiple records that we need. It is prudent to be cautious doing this and to create before and after tables to confirm that we have done what we intended to do.
2 We define records to equal the number of records needed for each patient. This is either 1 or 2. Two records are needed only if the subject is recruited before age 57 and exits after age 57.
3 The expand command creates identical copies of records in the active data set. It creates one fewer new copy of each record than the value of the variable records. Hence, after this command has been executed there will be either 1 or 2 records in the file for each patient depending on whether records equals 1 or 2. The new records are appended to the bottom of the data set.
4 The 2.20.Framingham.dta data set contains a patient identification variable called id. Sorting by id brings all of the records on each patient together. It is prudent to open the Stata editor frequently during data manipulations to make sure that your commands are having the desired effect. You should also make sure that you have a back-up copy of your original data as it is all too easy to replace the original file with one that has been modified.
5 Stata allows us to refer to the values of records adjacent to the current record. The value _n always equals the record number of the current record; id[_n-1] equals the value of id in the record preceding the current record, while id[_n+1] equals the value of id in the record following the current record.
If id == id[_n-1] is true then we are at the second of two records for the current patient. The previous command defined enter to equal age. This command replaces enter with the value 57 whenever we are at a patient's second record. Hence enter equals the patient's entry age in the first record for each patient, and equals 57 in the second. If there is only one record for the patient, then enter equals the patient's entry age.
6 Similarly, exit is set equal to exitage unless we are at the first of two records for the same patient, in which case exit equals 57.
7 The variable fate equals chdfate unless we are at the first of two records for the same patient. If a second record exists, then the first record must be for the first age interval and the patient's follow-up must extend beyond age 57. Hence, the patient must not have developed CHD by age 57. For this reason we set fate = 0 whenever we encounter the first of two records for the same patient.
8 This table shows that there are 650 records for women showing CHD and 823 such records for men. This is the same as the number of women and men who had CHD. Thus, we have not added or removed any CHD events by the previous manipulations.
9 We set male1 = 1 if and only if the subject is male and the record describes a patient in the first age interval. Otherwise, male1 = 0. (Note that if enter < 57 then we must have that exit ≤ 57.)
10 Similarly, male2 = 1 if and only if the subject is male and the record describes a patient in the second age interval.
11 No records have both male1 and male2 equal to 1. There are 4596 records of women with both male1 and male2 equal to 0, which agrees with the preceding table.
12 We define exit to be the exit time, id to be the patient identification variable, enter to be the entry time, and fate to be the fate indicator. The stset command also checks the data for errors or inconsistencies in the definition of these variables. Note that the total number of subjects has not been changed by our data manipulation.
13 Finally, we perform a hazard regression analysis with the time-dependent covariates male1 and male2. The age-adjusted relative risks of CHD for men prior to, and after, age 57 (3.64 and 1.72, respectively) agree with those given in Section 7.10.
14 This lincom statement tests the null hypothesis that β1 = β2 in model (7.18) (see Section 7.10.2).
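The record splitting done by hand in comments 2 through 7 can also be sketched with Stata's stsplit command, in releases that provide it. This is an alternative to the approach in the log above, not the method used there. It assumes that the original one-record-per-patient Framingham data (id, male, age, exitage, chdfate) are in memory, and the variable name agegrp is chosen here purely for illustration.

```stata
* Sketch only: start from the original data, one record per patient
stset exitage, id(id) enter(time age) failure(chdfate)

* Split each patient's follow-up at age 57. The new variable agegrp
* (a name chosen for this sketch) records the lower bound of the age
* interval for each record: 0 before age 57 and 57 afterwards.
stsplit agegrp, at(57)

* Step-function covariates for men before and after age 57
generate male1 = male*(agegrp == 0)
generate male2 = male*(agegrp == 57)

* Same model and same test of equal coefficients as in the log above
stcox male1 male2
lincom male1-male2
```

Because stsplit divides follow-up at the same age and stcox is fitted to the same covariates, this should reproduce the hazard ratios shown in the log. The manual approach has the advantage of making each step visible and easy to check with before and after tables, as recommended in comment 1.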