Using Stata to Derive Survival Functions and the Logrank Test

. * 6.9.Hemorrhage.do
. *
. * Plot Kaplan-Meier survival functions for recurrent lobar intracerebral
. * hemorrhage in patients who are, or are not, homozygous for the epsilon3
. * allele of the apolipoprotein E gene (O’Donnell et al. 2000).
. *

. use C:\WDDtext\6.9.Hemorrhage.dta, clear

. summarize {1}

Variable |     Obs        Mean   Std. Dev.        Min        Max
---------+------------------------------------------------------
genotype |      70    .5428571   .5017567           0          1
    time |      71    22.50051   15.21965    .2299795   53.88091
   recur |      71    .2676056   .4458618           0          1

. table genotype recur, col row {2}

-----------+------------------------
Apolipopro |
tein E     |       Recurrence
Genotype   |     No     yes   Total
-----------+------------------------
     e3/e3 |     28       4      32
e2+ or e4+ |     24      14      38
           |
     Total |     52      18      70
-----------+------------------------

. stset time, failure(recur) {3}

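     failure event:  recur ~= 0 & recur ~= .
obs. time interval:  (0, time]
 exit on or before:  failure

------------------------------------------------------------------
       71  total obs.
        0  exclusions
------------------------------------------------------------------
       71  obs. remaining, representing
       19  failures in single record/single failure data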

216 6. Introduction to survival analysis

 1597.536  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =  53.88091

. set textsize 120

. *
. * Graph survival function by genotype
. *

. sts graph, by(genotype) ylabel(0 .2 to 1) ytick(.1 .3 to .9)               {4}
> xlabel(0 10 to 50) xtick(5 15 to 45)
> l1title("Probability of Hemorrhage-Free Survival")
> b2title("Months of Follow-up") gap(2) noborder

{Graph omitted, see Figure 6.4}

         failure _d:  recur
   analysis time _t:  time

. *
. * List survival statistics
. *

. sts list, by(genotype) {5}

         failure _d:  recur
   analysis time _t:  time

           Beg.            Net    Survivor      Std.
  Time    Total   Fail    Lost    Function     Error     [95% Conf. Int.]
--------------------------------------------------------------------------
e3/e3
   .23       32      1       0      0.9688    0.0308     0.7982    0.9955
 1.051       31      0       1      0.9688    0.0308     0.7982    0.9955
 1.511       30      0       1      0.9688    0.0308     0.7982    0.9955
 3.055       29      1       0      0.9353    0.0443     0.7651    0.9835
 8.082       28      0       1      0.9353    0.0443     0.7651    0.9835
 12.32       27      1       0      0.9007    0.0545     0.7224    0.9669
{Output omitted}
 24.77       20      1       0      0.8557    0.0679     0.6553    0.9441
{Output omitted}
 53.88        1      0       1      0.8557    0.0679     0.6553    0.9441
e2+ or e4+
  1.38       38      0       1      1.0000         .          .         .
 1.413       37      1       0      0.9730    0.0267     0.8232    0.9961
 1.577       36      1       1      0.9459    0.0372     0.8007    0.9862
 3.318       34      1       0      0.9181    0.0453     0.7672    0.9728
 3.515       33      1       0      0.8903    0.0518     0.7335    0.9574
 3.548       32      1       0      0.8625    0.0571     0.7005    0.9404
 4.041       31      0       1      0.8625    0.0571     0.7005    0.9404
 4.632       30      0       1      0.8625    0.0571     0.7005    0.9404
 4.764       29      1       0      0.8327    0.0624     0.6646    0.9213
 8.444       28      0       1      0.8327    0.0624     0.6646    0.9213
 9.528       27      1       0      0.8019    0.0673     0.6280    0.9005
 10.61       26      0       1      0.8019    0.0673     0.6280    0.9005
 10.68       25      0       1      0.8019    0.0673     0.6280    0.9005
 11.86       24      0       1      0.8019    0.0673     0.6280    0.9005
 13.27       23      0       1      0.8019    0.0673     0.6280    0.9005
{Output omitted}
 46.88        1      0       1      0.3480    0.1327     0.1174    0.5946
--------------------------------------------------------------------------

. *

. * Graph survival functions by genotype with 95% confidence intervals.

. * Show loss to follow-up.

. *

. sts graph, by(genotype) lost gwood ylabel(0 .2 to 1) ytick(.1 .3 to .9)    {6}
> xlabel(0 10 to 50) xtick(5 15 to 45)
> l1title("Probability of Hemorrhage-Free Survival")
> b2title("Months of Follow-up") gap(2) noborder

{Graph omitted, see Figure 6.5}

         failure _d:  recur
   analysis time _t:  time

. *
. * Calculate cumulative morbidity for homozygous epsilon3 patients
. * together with 95% confidence intervals for this morbidity.
. *

. sts generate s0 = s if genotype == 0                                       {7}
. sts generate lb_s0 = lb(s) if genotype == 0
. sts generate ub_s0 = ub(s) if genotype == 0
. generate d0 = 1 - s0                                                       {8}
(39 missing values generated)
. generate lb_d0 = 1 - ub_s0                                                 {9}
(39 missing values generated)
. generate ub_d0 = 1 - lb_s0
(39 missing values generated)


. *

. * Plot cumulative morbidity for homozygous epsilon3 patients.
. * Show 95% confidence intervals and loss to follow-up
. *
. graph lb_d0 ub_d0 d0 time, symbol(iiO) connect(JJJ) ylabel(0 0.05 to 0.35) {10}
> xlabel(0 10 to 50) xtick(5 15 to 45) l1title("Probability of Hemorrhage") gap(3)

{Graph omitted, see Figure 6.6}

. *

. * Compare survival functions for the two genotypes using the logrank test.

. *

. sts test genotype {11}

         failure _d:  recur
   analysis time _t:  time

Log-rank test for equality of survivor functions
------------------------------------------------
           |   Events
genotype   |  observed       expected
-----------+-------------------------
e3/e3      |         4           9.28
e2+ or e4+ |        14           8.72
-----------+-------------------------
Total      |        18          18.00

                 chi2(1) =       6.28
                 Pr>chi2 =     0.0122

Comments

1 The hemorrhage data set contains three variables on 71 patients. The variable time denotes length of follow-up in months; recur records whether the patient had a hemorrhage (recur = 1) or was censored (recur = 0) at the end of follow-up; genotype divides the patients into two groups determined by their genotype. The value of genotype is missing on one patient who did not give a blood sample.

2 This command tabulates study subjects by hemorrhage recurrence and genotype. The value labels of these two variables are shown.

3 This stset command specifies that the data set contains survival data.

Each patient’s follow-up time is denoted by time; her fate at the end of follow-up is denoted by recur. Stata interprets recur = 0 to mean that the patient is censored and recur ≠ 0 to mean that she suffered the event of interest at exit. A stset command must be specified before other survival commands such as sts list, sts graph, sts test or sts generate.
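The failure rule that stset reports in the log (recur ~= 0 & recur ~= .) can be tried on a toy data set. The sketch below uses three made-up records, not the hemorrhage data, and relies on the _d variable that stset creates to flag failures.

```stata
* Toy records (hypothetical values, not from 6.9.Hemorrhage.dta)
clear
input time recur
1.5 1
2.0 0
3.2 1
end

* Declare the data to be survival data, as in the log above
stset time, failure(recur)

* _d is created by stset: 1 = failure at exit, 0 = censored
count if _d == 1
count if _d == 0
```

The two counts should correspond to the number of records with recur equal to 1 and 0, respectively.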

4 The sts graph command plots Kaplan–Meier survival curves; by(genotype) specifies that separate plots will be generated for each value of genotype. By default, the sts graph command does not title the y-axis and titles the x-axis “analysis time”. The l1title and b2title options provide the titles “Probability of Hemorrhage-Free Survival” and “Months of Follow-up” for the y- and x-axes, respectively. The noborder option prevents a border from being drawn on the top and right side of the graph. The resulting graph is similar to Figure 6.4.

5 This command lists the values of the survival functions that are plotted by the preceding command. The by(genotype) option specifies that a separate survival function is to be calculated for each value of genotype.

The number of patients at risk prior to each failure or loss to follow-up is also given, together with the 95% confidence interval for the survival function. The highlighted values agree with the hand calculations in Sections 6.4 and 6.5.
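The survivor values in the e3/e3 listing can be checked with the Kaplan–Meier product-limit rule: at each failure time the previous estimate is multiplied by the fraction of at-risk patients who did not fail. A minimal sketch, using the Beg. Total and Fail counts from the sts list output (the scalar names are illustrative only):

```stata
* Product-limit estimate for e3/e3 at the first three failure times.
* Numbers at risk and failures are read from the sts list output.

* t = .23: 32 at risk, 1 failure
scalar s1 = (32 - 1)/32
* t = 3.055: 29 at risk, 1 failure
scalar s2 = scalar(s1) * (29 - 1)/29
* t = 12.32: 27 at risk, 1 failure
scalar s3 = scalar(s2) * (27 - 1)/27

display %6.4f scalar(s1)
display %6.4f scalar(s2)
display %6.4f scalar(s3)
```

Up to rounding in the fourth decimal place, the three results should match the Survivor Function column at those times (0.9688, 0.9353 and 0.9007).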

6 Stata also permits users to graph confidence bounds for Ŝ(t) and indicate the number of subjects lost to follow-up. This is done with the gwood and lost options, respectively. In this example, a separate plot is generated for each value of genotype. Figure 6.5 is similar to one of these two plots.
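The Std. Error column of the sts list output is numerically consistent with Greenwood's formula, which the gwood option name appears to refer to. That link is an assumption here, since this section does not state the formula. A sketch for the first two e3/e3 failure times, with illustrative scalar names:

```stata
* Greenwood's formula (assumed):
*   se = S(t) * sqrt( sum over failure times of d/(n*(n - d)) )
* n = number at risk, d = number of failures, from the sts list output.

* t = .23: n = 32, d = 1
scalar v1 = 1/(32*31)
* t = 3.055: n = 29, d = 1, added to the running sum
scalar v2 = scalar(v1) + 1/(29*28)

scalar se1 = (31/32) * sqrt(scalar(v1))
scalar se2 = (31/32)*(28/29) * sqrt(scalar(v2))

display %6.4f scalar(se1)
display %6.4f scalar(se2)
```

These should agree with the listed standard errors 0.0308 and 0.0443. The listed confidence limits are not symmetric about the estimate, so they are evidently not computed as the estimate plus or minus a multiple of this standard error.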

7 The sts generate command creates regular Stata variables from survival analyses. Here, s0 = s defines s0 to equal the survival function for patients with genotype = 0 (i.e. homozygous ε3/ε3 patients). In the next two commands, lb_s0 = lb(s) and ub_s0 = ub(s) define lb_s0 and ub_s0 to be the lower and upper bounds of the 95% confidence interval for s0.

8 The variable d0 is the cumulative morbidity function (see equation (6.3)).

9 The variables lb_d0 and ub_d0 are the lower and upper bounds of the 95% confidence interval for d0. Note that the lower bound equals one minus the upper bound for s0 and vice versa.
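The swap of bounds is easy to see on a single row. The sketch below takes the e3/e3 values at time .23 from the sts list output and forms the morbidity and its interval; the scalar names are illustrative and are not the variables created in the log.

```stata
* Survival estimate and 95% bounds at t = .23 for e3/e3,
* copied from the sts list output
scalar s    = 0.9688
scalar lb_s = 0.7982
scalar ub_s = 0.9955

* Cumulative morbidity and its bounds: the upper survival bound
* gives the lower morbidity bound, and vice versa
scalar d    = 1 - scalar(s)
scalar lb_d = 1 - scalar(ub_s)
scalar ub_d = 1 - scalar(lb_s)

display %6.4f scalar(lb_d) "  " %6.4f scalar(d) "  " %6.4f scalar(ub_d)
```

The morbidity estimate lies between its two bounds only because the survival bounds are exchanged when they are subtracted from one.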

10 This command produces a graph that is similar to Figure 6.6. The J symbol in the connect option produces the stepwise connections that are needed for a morbidity or survival function. The O symbol in the symbol option produces dots at times when patients have hemorrhages or are censored.

11 Perform a logrank test for the equality of survivor functions in patient groups defined by different values of genotype. In this example, patients who are homozygous for the ε3 allele are compared to other patients.

The highlighted chi-squared statistic and P value agree with our hand calculations for the uncorrected test.
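The P value follows from the chi-squared statistic and its one degree of freedom. As a quick check, Stata's chi2tail() function returns the upper-tail probability; the remaining lines simply confirm that the observed and expected event counts in the table each sum to the 18 recurrences among genotyped patients.

```stata
* Upper-tail probability of a chi-squared variate with 1 df
display %6.4f chi2tail(1, 6.28)

* Observed and expected events both total 18
display 4 + 14
display %5.2f 9.28 + 8.72
```

The statistic itself cannot be recovered from the observed and expected counts alone, because it also depends on a variance term that the sts test output does not print.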
