
CHAPTER 11

Survival Analysis:

Glioma Treatment and Breast Cancer Survival

11.1 Introduction

Grana et al. (2002) report results of a non-randomised clinical trial investigating a novel radioimmunotherapy in malignant glioma patients. The overall survival, i.e., the time from the beginning of the therapy to the disease-caused death of the patient, is compared for two groups of patients. A control group underwent the standard therapy and another group of patients was treated with radioimmunotherapy in addition. The data, extracted from Tables 1 and 2 in Grana et al. (2002), are given in Table 11.1. The main interest is to investigate whether the patients treated with the novel radioimmunotherapy have, on average, longer survival times than patients in the control group.

Table 11.1: glioma data. Patients with glioma treated with the standard therapy or a novel radioimmunotherapy (RIT).

age  sex  histology  group  event  time

The effects of hormonal treatment with Tamoxifen in women suffering from node-positive breast cancer were investigated in a randomised clinical trial as reported by Schumacher et al. (1994). Data from randomised patients from this trial and additional non-randomised patients (from the German Breast Cancer Study Group 2, GBSG2) are analysed by Sauerbrei and Royston (1999). Complete data on seven prognostic factors for 686 women are used in Sauerbrei and Royston (1999) for prognostic modelling. The observed hypothetical prognostic factors are age, menopausal status, tumour size, tumour grade, number of positive lymph nodes, progesterone receptor, estrogen receptor and whether or not a hormonal therapy was applied. We are interested in an assessment of the impact of the covariates on the survival time of the patients. A subset of the patient data is shown in Table 11.2.

Table 11.2: GBSG2 data (package ipred). Randomised clinical trial data from patients suffering from node-positive breast cancer. Only the data of the first 20 patients are shown here.
Source: From Sauerbrei, W. and Royston, P., J. Roy. Stat. Soc. A, 162, 71–94, 1999. With permission.

11.2 Survival Analysis

In many medical studies, the main outcome variable is the time to the occurrence of a particular event. In a randomised controlled trial of cancer, for example, surgery, radiation and chemotherapy might be compared with respect to time from randomisation and the start of therapy until death. In this case, the event of interest is the death of a patient, but in other situations, it might be remission from a disease, relief from symptoms or the recurrence of a particular condition. Other censored response variables are the time to credit failure in financial applications or the time a robot needs to successfully perform a certain task in engineering. Such observations are generally referred to by the generic term survival data even when the endpoint or event being considered is not death but something else. Such data generally require special techniques for analysis for two main reasons:

1. Survival data are generally not symmetrically distributed: they will often appear positively skewed, with a few people surviving a very long time compared with the majority, so assuming a normal distribution will not be reasonable.

2. At the completion of the study, some patients may not have reached the endpoint of interest (death, relapse, etc.). Consequently, the exact survival times are not known. All that is known is that the survival times are greater than the amount of time the individual has been in the study. The survival times of these individuals are said to be censored (more precisely, they are right-censored).
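In R, right-censored survival times are commonly stored as a Surv object from package survival, the representation used in Section 11.3. The following minimal sketch uses made-up follow-up times (hypothetical data) and assumes the survival package is installed; a status of 0 marks a censored observation, which the print method flags with a trailing +.

```r
library("survival")

# Hypothetical follow-up times (months) for five patients
time  <- c(5, 8, 12, 20, 31)
# 1 = event (e.g. death) observed, 0 = right-censored (still event-free at last contact)
event <- c(1, 0, 1, 1, 0)

s <- Surv(time, event)
print(s)  # censored times are printed with a trailing "+"

# The underlying matrix keeps the observed time and the status indicator side by side
status <- unclass(s)[, "status"]
n_censored <- sum(status == 0)
n_censored
```

For a censored patient the stored time is only a lower bound on the true survival time, which is exactly the information described in point 2.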

Of central importance in the analysis of survival time data are two functions used to describe their distribution, namely the survival (or survivor) function and the hazard function.

11.2.1 The Survivor Function

The survivor function, $S(t)$, is defined as the probability that the survival time, $T$, is greater than or equal to some time $t$, i.e.,

$$S(t) = P(T \geq t).$$

A plot of an estimate $\hat{S}(t)$ of $S(t)$ against the time $t$ is often a useful way of describing the survival experience of a group of individuals. When there are no censored observations in the sample of survival times, a non-parametric survivor function can be estimated simply as

$$\hat{S}(t) = \frac{\text{number of individuals with survival times} \geq t}{n},$$

where $n$ is the total number of observations. Because this is simply a proportion, confidence intervals can be obtained for each time $t$ by using the variance estimate

$$\hat{S}(t)\,\bigl(1 - \hat{S}(t)\bigr)/n.$$
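For uncensored data the estimator and its variance are simple enough to compute directly. The sketch below uses hypothetical survival times and forms a normal-approximation (Wald) 95% interval from the variance estimate above; the normal approximation is our assumption and can give limits outside [0, 1] when the estimate is close to 0 or 1.

```r
# Hypothetical uncensored survival times (months)
times <- c(2, 3, 3, 5, 7, 8, 11, 12, 15, 20)
n <- length(times)

# Empirical survivor function: proportion of survival times >= t
S_hat <- function(t) sum(times >= t) / n

t0  <- 8
s0  <- S_hat(t0)
se0 <- sqrt(s0 * (1 - s0) / n)               # square root of S(t)(1 - S(t))/n
ci0 <- s0 + c(-1, 1) * qnorm(0.975) * se0    # normal-approximation 95% interval
c(estimate = s0, lower = ci0[1], upper = ci0[2])
```

Five of the ten times are at least 8 months, so the estimate at t = 8 is 0.5.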

The simple method used to estimate the survivor function when there are no censored observations cannot be used for survival times when censored observations are present. In the presence of censoring, the survivor function is typically estimated using the Kaplan-Meier estimator (Kaplan and Meier, 1958). This involves first ordering the survival times from the smallest to the largest such that $t_{(1)} \leq t_{(2)} \leq \cdots \leq t_{(n)}$, where $t_{(j)}$ is the $j$th smallest unique survival time. The Kaplan-Meier estimate of the survival function is obtained as

$$\hat{S}(t) = \prod_{j:\, t_{(j)} \leq t} \left(1 - \frac{d_j}{r_j}\right),$$

where $r_j$ is the number of individuals at risk just before $t_{(j)}$ (including those censored at $t_{(j)}$), and $d_j$ is the number of individuals who experience the event of interest (death, etc.) at time $t_{(j)}$. So, for example, the survivor function at the second death time, $t_{(2)}$, is equal to the estimated probability of not dying at time $t_{(1)}$ multiplied by the estimated probability of not dying at time $t_{(2)}$, conditional on the individual still being at risk at time $t_{(2)}$, i.e., $\hat{S}(t_{(2)}) = (1 - d_1/r_1)(1 - d_2/r_2)$. The estimated variance of the Kaplan-Meier estimate of the survivor function is found from

$$\widehat{\mathrm{Var}}\bigl(\hat{S}(t)\bigr) = \hat{S}(t)^2 \sum_{j:\, t_{(j)} \leq t} \frac{d_j}{r_j\,(r_j - d_j)}.$$
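The two formulas above translate directly into a few lines of R. The sketch below uses a small hypothetical data set with an event and a censoring at the same time t = 5 (the censored individual is still counted in the risk set there), and cross-checks the survival probabilities against survfit() from package survival, which is assumed to be installed. The helper name km_by_hand is ours, not part of any package.

```r
library("survival")

# Kaplan-Meier estimate and Greenwood standard error computed from the formulas above
km_by_hand <- function(time, status) {
  tj <- sort(unique(time[status == 1]))                       # distinct event times t_(j)
  rj <- sapply(tj, function(t) sum(time >= t))                # at risk just before t_(j)
  dj <- sapply(tj, function(t) sum(time == t & status == 1))  # events at t_(j)
  surv <- cumprod(1 - dj / rj)
  # Greenwood: Var(S(t)) = S(t)^2 * sum d_j / (r_j (r_j - d_j)); undefined once r_j == d_j
  se <- sqrt(surv^2 * cumsum(dj / (rj * (rj - dj))))
  data.frame(time = tj, n.risk = rj, n.event = dj, surv = surv, std.err = se)
}

time   <- c(3, 5, 5, 8, 10, 12, 15)
status <- c(1, 1, 0, 1, 0, 1, 0)
km <- km_by_hand(time, status)
km

# Cross-check the survival column against survfit()
fit <- survfit(Surv(time, status) ~ 1)
all.equal(km$surv, summary(fit)$surv)
```

Here the second event time is 5, and the estimate there is (1 - 1/7)(1 - 1/6) = 5/7, matching the worked reading of the formula given above.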

A formal test of the equality of the survival curves for the two groups can be made using the log-rank test. First, the expected number of deaths is computed for each unique death time, or failure time, in the data set, assuming that the chances of dying, given that subjects are at risk, are the same for both groups. The total number of expected deaths is then computed for each group by adding the expected number of deaths for each failure time. The test then compares the observed number of deaths in each group with the expected number of deaths using a chi-squared test. Full details and formulae are given in Therneau and Grambsch (2000) or Everitt and Rabe-Hesketh (2001), for example.
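The log-rank computation just described can be sketched by hand for two groups. At each distinct event time with n subjects at risk (n1 of them in the first group) and d deaths, the expected number of deaths in the first group is d n1/n, with hypergeometric variance d (n1/n)(1 - n1/n)(n - d)/(n - 1); the statistic (O1 - E1)^2/V1 is referred to a chi-squared distribution with one degree of freedom. The data and the helper name logrank_by_hand are hypothetical, and the survival package is assumed to be installed for the cross-check against survdiff().

```r
library("survival")

logrank_by_hand <- function(time, status, group) {
  g1 <- group == levels(factor(group))[1]   # first factor level is the reference group
  event_times <- sort(unique(time[status == 1]))
  O1 <- 0; E1 <- 0; V1 <- 0
  for (tj in event_times) {
    at_risk <- time >= tj
    n  <- sum(at_risk)
    n1 <- sum(at_risk & g1)
    d  <- sum(time == tj & status == 1)
    d1 <- sum(time == tj & status == 1 & g1)
    O1 <- O1 + d1
    E1 <- E1 + d * n1 / n                  # expected deaths under equal hazards
    if (n > 1) V1 <- V1 + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
  }
  chisq <- (O1 - E1)^2 / V1
  c(observed = O1, expected = E1, chisq = chisq,
    p = pchisq(chisq, df = 1, lower.tail = FALSE))
}

time   <- c(4, 6, 7, 9, 10, 11, 13, 15, 18, 19, 22, 25, 30, 34)
status <- c(1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1)
group  <- c("A", "B", "A", "A", "B", "A", "B", "B", "A", "B", "A", "B", "B", "A")

res   <- logrank_by_hand(time, status, group)
sdiff <- survdiff(Surv(time, status) ~ group)
c(by_hand = unname(res["chisq"]), survdiff = sdiff$chisq)
```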

11.2.2 The Hazard Function

In the analysis of survival data it is often of interest to assess which periods have high or low chances of death (or whatever the event of interest may be), among those still active at the time. A suitable approach to characterise such risks is the hazard function, $h(t)$, defined as the probability that an individual experiences the event in a small time interval of length $s$, given that the individual has survived up to the beginning of the interval, divided by $s$, in the limit as $s$ approaches zero; mathematically this is written as

$$h(t) = \lim_{s \to 0} \frac{P(t \leq T \leq t + s \mid T \geq t)}{s},$$

where $T$ is the individual's survival time. The conditioning feature of this definition is very important. For example, the probability of dying at age 100 is very small because most people die before that age; in contrast, the probability of a person dying at age 100 who has reached that age is much greater.
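To see the limit at work, note that for a continuous survival time the conditional probability in the definition equals (S(t) - S(t + s))/S(t). The sketch below assumes, for illustration only, a Weibull distribution with shape 2 and scale 10, whose hazard (shape/scale)(t/scale)^(shape - 1) equals exactly 0.1 at t = 5, and shows the ratio approaching that value as s shrinks.

```r
shape <- 2
scale <- 10

# Survivor function P(T >= t) of the assumed Weibull distribution
S <- function(t) pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
# Exact Weibull hazard, for comparison
h_exact <- function(t) (shape / scale) * (t / scale)^(shape - 1)

t0 <- 5
s  <- 10^-(1:6)
# P(t <= T <= t + s | T >= t) / s for decreasing interval lengths s
ratio_s <- (S(t0) - S(t0 + s)) / (s * S(t0))
cbind(s = s, ratio = ratio_s, exact = h_exact(t0))
```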

Figure 11.1: 'Bath tub' shape of a hazard function, plotted against Time.

The hazard function and survivor function are related by the formula

$$S(t) = \exp\bigl(-H(t)\bigr),$$

where $H(t)$ is known as the integrated hazard or cumulative hazard, and is defined as follows:

$$H(t) = \int_0^t h(u)\, du;$$

details of how this relationship arises are given in Everitt and Pickles (2000).
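A quick numerical check of the relationship, assuming purely for illustration a Weibull distribution with shape 2 and scale 10: the cumulative hazard is obtained by numerical integration of the hazard with integrate() and compared with the Weibull survivor function from pweibull().

```r
shape <- 2
scale <- 10

# Weibull hazard h(u) = (shape/scale) * (u/scale)^(shape - 1); vectorised in u
h <- function(u) (shape / scale) * (u / scale)^(shape - 1)
# Cumulative hazard H(t) = integral of h from 0 to t, by quadrature
H <- function(t) integrate(h, lower = 0, upper = t)$value

t_grid   <- c(1, 5, 10, 20)
S_from_H <- exp(-sapply(t_grid, H))
S_direct <- pweibull(t_grid, shape = shape, scale = scale, lower.tail = FALSE)
cbind(t = t_grid, S_from_H = S_from_H, S_direct = S_direct)
```

For this choice H(t) = (t/10)^2, so at t = 20 both columns equal exp(-4).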

In practice the hazard function may increase, decrease, remain constant or have a more complex shape. The hazard function for death in human beings, for example, has the 'bath tub' shape shown in Figure 11.1. It is relatively high immediately after birth, declines rapidly in the early years and then remains approximately constant before beginning to rise again during late middle age. The hazard function can be estimated as the proportion of individuals experiencing the event of interest in an interval per unit time, given that they have survived to the beginning of the interval, that is

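$$\hat{h}(t) = \frac{d_j}{n_j\,\bigl(t_{(j+1)} - t_{(j)}\bigr)}, \qquad t_{(j)} \leq t < t_{(j+1)},$$

where $d_j$ is the number of events at $t_{(j)}$ and $n_j$ is the number of individuals at risk at $t_{(j)}$. The sampling variation in the estimate of the hazard function within each interval is usually considerable and so it is rarely plotted directly. Instead, the integrated hazard is used. Everitt and Rabe-Hesketh (2001) show that this can be estimated as follows:

$$\hat{H}(t) = \sum_{j:\, t_{(j)} \leq t} \frac{d_j}{n_j}.$$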
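Both estimators can be sketched on a small hypothetical data set, with n_j computed as the number at risk at t_(j), in the same way as the risk sets in the Kaplan-Meier formula. The interval hazard estimate is left undefined (NA) after the last event time because there is no t_(j+1).

```r
# Hypothetical data: status 1 = event, 0 = censored
time   <- c(3, 5, 5, 8, 10, 12, 15)
status <- c(1, 1, 0, 1, 0, 1, 0)

tj <- sort(unique(time[status == 1]))                       # distinct event times
nj <- sapply(tj, function(t) sum(time >= t))                # number at risk at t_(j)
dj <- sapply(tj, function(t) sum(time == t & status == 1))  # events at t_(j)

# Interval hazard d_j / (n_j (t_(j+1) - t_(j))); no next event time after the last one
last  <- length(tj)
h_hat <- c(dj[-last] / (nj[-last] * diff(tj)), NA)

# Integrated (cumulative) hazard: running sum of d_j / n_j
H_hat <- cumsum(dj / nj)

data.frame(time = tj, n.risk = nj, n.event = dj, hazard = h_hat, cumhaz = H_hat)
```

Plotting H_hat as a step function is much smoother than plotting h_hat, which is the point made in the text; exp(-H_hat) also gives a survivor function estimate close to the Kaplan-Meier one.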

11.2.3 Cox’s Regression

When the response variable of interest is a possibly censored survival time, we need special regression techniques for modelling the relationship of the response to explanatory variables of interest. A number of procedures are available, but the most widely used by some margin is that known as Cox's proportional hazards model, or Cox regression for short. Introduced by David Cox in 1972 (see Cox, 1972), the method has become one of the most commonly used in medical statistics and the original paper one of the most heavily cited.

The main vehicle for modelling in this case is the hazard function rather than the survivor function, since it does not involve the cumulative history of events. But modelling the hazard function directly as a linear function of explanatory variables is not appropriate since $h(t)$ is restricted to being positive. A more suitable model might be

$$\log\bigl(h(t)\bigr) = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q. \qquad (11.1)$$

But this would only be suitable for a hazard function that is constant over time; this is very restrictive since hazards that increase or decrease with time, or have some more complex form, are far more likely to occur in practice. In general it may be difficult to find the appropriate explicit function of time to include in (11.1). The problem is overcome in the proportional hazards model proposed by Cox (1972) by allowing the form of dependence of $h(t)$ on $t$ to remain unspecified, so that

$$\log\bigl(h(t)\bigr) = \log\bigl(h_0(t)\bigr) + \beta_1 x_1 + \cdots + \beta_q x_q,$$

where $h_0(t)$ is known as the baseline hazard function, being the hazard function for individuals with all explanatory variables equal to zero. The model can be rewritten as

$$h(t) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_q x_q).$$

Written in this way, we see that the model forces the hazard ratio between two individuals to be constant over time since

$$\frac{h(t \mid \mathbf{x}_1)}{h(t \mid \mathbf{x}_2)} = \frac{\exp(\boldsymbol{\beta}^\top \mathbf{x}_1)}{\exp(\boldsymbol{\beta}^\top \mathbf{x}_2)},$$

where $\mathbf{x}_1$ and $\mathbf{x}_2$ are vectors of covariate values for two individuals and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_q)^\top$; the baseline hazard $h_0(t)$ cancels, so the ratio does not depend on $t$. In other words, if an individual has a risk of death at some initial time point that is twice as high as another individual, then at all later times the risk of death remains twice as high. Hence the term proportional hazards.
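A numerical illustration of why the ratio is constant: whatever positive baseline hazard is used, the factor h0(t) cancels. The baseline h0(t) = 0.05 sqrt(t), the coefficients and the covariate vectors below are hypothetical values chosen only for the demonstration.

```r
beta <- c(0.7, -0.3)                        # hypothetical regression coefficients
h0   <- function(t) 0.05 * sqrt(t)          # hypothetical baseline hazard
hazard <- function(t, x) h0(t) * exp(sum(beta * x))  # proportional hazards model

x1 <- c(1, 2)                               # covariates of individual 1
x2 <- c(0, 1)                               # covariates of individual 2
t_grid <- c(0.5, 1, 5, 20)

# Hazard ratio at several time points
ratio <- sapply(t_grid, function(t) hazard(t, x1) / hazard(t, x2))
ratio
exp(sum(beta * (x1 - x2)))                  # exp(beta' (x1 - x2)), free of t
```

Every entry of ratio equals exp(0.4), because the two individuals differ by one unit in each covariate.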

In the Cox model, the baseline hazard describes the common shape of the survival time distribution for all individuals, while the relative risk function, $\exp(\boldsymbol{\beta}^\top \mathbf{x})$, gives the level of each individual's hazard. The interpretation of the parameter $\beta_j$ is that $\exp(\beta_j)$ is the relative risk, i.e., the factor by which the hazard is multiplied, for an increase of one unit in covariate $x_j$, all other explanatory variables remaining constant.

The parameters in a Cox model can be estimated by maximising what is known as a partial likelihood. Details are given in Kalbfleisch and Prentice (1980). The partial likelihood is derived by assuming continuous survival times. In reality, however, survival times are measured in discrete units and there are often ties. There are three common methods for dealing with ties, which are described briefly in Everitt and Rabe-Hesketh (2001).
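For a single covariate and no tied event times, the partial log-likelihood is the sum over observed events i of beta x_i - log(sum of exp(beta x_j) over the individuals j still at risk at t_i). The sketch below (hypothetical data; assumes the survival package is installed) maximises it with optimize() and compares the result with coxph(). The coxph() argument ties selects one of "efron" (the default), "breslow" or "exact"; with no ties, as here, they give the same estimate.

```r
library("survival")

# Hypothetical data with distinct event times (no ties)
d <- data.frame(
  time   = c(6, 7, 10, 15, 19, 25, 30, 34, 41, 50),
  status = c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1),
  x      = c(1, 0, 1, 0, 1, 1, 0, 0, 1, 0)
)

# Cox partial log-likelihood for one covariate
partial_loglik <- function(beta, time, status, x) {
  ll <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]          # risk set R(t_i)
    ll <- ll + beta * x[i] - log(sum(exp(beta * x[at_risk])))
  }
  ll
}

opt <- optimize(partial_loglik, interval = c(-5, 5), maximum = TRUE,
                time = d$time, status = d$status, x = d$x)
beta_hat <- opt$maximum

fit         <- coxph(Surv(time, status) ~ x, data = d)
fit_breslow <- coxph(Surv(time, status) ~ x, data = d, ties = "breslow")
c(by_hand = beta_hat, efron = unname(coef(fit)), breslow = unname(coef(fit_breslow)))
```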

11.3 Analysis Using R

11.3.1 Glioma Radioimmunotherapy

The survival times for patients from the control group and the group treated with the novel therapy can be compared graphically by plotting the Kaplan-Meier estimates of the survival times. Here, we plot the Kaplan-Meier estimates stratified for patients suffering from grade III glioma and glioblastoma (GBM, grade IV) separately; the results are given in Figure 11.2. The Kaplan-Meier estimates are computed by the survfit function from package survival (Therneau and Lumley, 2009), which takes a model formula of the form

Surv(time, event) ~ group

where time are the survival times, event is a logical variable being TRUE when the event of interest, death for example, has been observed and FALSE in case of censoring. The right hand side variable group is a grouping factor. Figure 11.2 leads to the impression that patients treated with the novel radioimmunotherapy survive longer, regardless of the tumour type. In order to assess if this informal finding is reliable, we may perform a log-rank test via

R> survdiff(Surv(time, event) ~ group, data = g3)

Call:

survdiff(formula = Surv(time, event) ~ group, data = g3)

N Observed Expected (O-E)^2/E (O-E)^2/V

which indicates that the survival times are indeed different in both groups. However, the number of patients is rather limited and so it might be dangerous to rely on asymptotic tests. As shown in Chapter 4, conditioning on the data and computing the distribution of the test statistics without additional assumptions is one alternative.

R> data("glioma", package = "coin")
R> library("survival")
R> layout(matrix(1:2, ncol = 2))
R> g3 <- subset(glioma, histology == "Grade3")
R> plot(survfit(Surv(time, event) ~ group, data = g3),
+      main = "Grade III Glioma", lty = c(2, 1),
+      ylab = "Probability", xlab = "Survival Time in Month")
R> legend("bottomleft", legend = c("Control", "Treated"),
+         lty = c(2, 1), bty = "n")
R> g4 <- subset(glioma, histology == "GBM")
R> plot(survfit(Surv(time, event) ~ group, data = g4),
+      main = "Grade IV Glioma", ylab = "Probability",
+      lty = c(2, 1), xlab = "Survival Time in Month",
+      xlim = c(0, max(glioma$time) * 1.05))

Figure 11.2: Kaplan-Meier estimates for Control and Treated patients. Panels: Grade III Glioma, Grade IV Glioma; axes: Survival Time in Month (x), Probability (y).

The function surv_test from package coin (Hothorn et al., 2006a, 2008b) can be used to compute an exact conditional test answering the question whether the survival times differ for grade III patients. For all possible permutations of the groups on the censored response variable, the test statistic is computed, and the fraction of permutations whose statistic is at least as extreme as the observed statistic defines the exact p-value:

R> library("coin")

R> surv_test(Surv(time, event) ~ group, data = g3,

+ distribution = "exact")


Exact Logrank Test

Z = 2.1711, p-value = 0.02877

alternative hypothesis: two.sided

which, in this case, confirms the above results. The same exercise can be performed for patients with grade IV glioma:

R> surv_test(Surv(time, event) ~ group, data = g4,

+ distribution = "exact")

Exact Logrank Test

Z = 3.2215, p-value = 0.0001588

alternative hypothesis: two.sided

which shows a difference as well. However, it might be more appropriate to answer the question whether the novel therapy is superior for both groups of tumours simultaneously. This can be implemented by stratifying, or blocking, with respect to tumour grading:

R> surv_test(Surv(time, event) ~ group | histology,

+ data = glioma, distribution = approximate(B = 10000))

Approximative Logrank Test

group (Control, RIT) stratified by histology

Z = 3.6704, p-value = 1e-04

alternative hypothesis: two.sided

Here, we need to approximate the exact conditional distribution since the exact distribution is hard to compute. The result supports the initial impression implied by Figure 11.2.
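The idea behind the approximation can be sketched with survdiff() from package survival: group labels are reshuffled at random within each stratum, the stratified log-rank statistic is recomputed, and the p-value is the proportion of reshuffled statistics at least as large as the observed one. This is a simplified stand-in, not coin's implementation: it uses survdiff's chi-squared statistic rather than coin's standardised statistic, simulated hypothetical data instead of the glioma data, and B = 999 resamples chosen for speed.

```r
library("survival")

set.seed(29)
# Hypothetical data: two strata (playing the role of tumour grades), two treatment groups
n <- 40
stratum <- rep(c("S1", "S2"), each = n / 2)
group   <- rep(c("Control", "Treated"), times = n / 2)
time    <- rexp(n, rate = ifelse(group == "Treated", 0.05, 0.1))
status  <- rbinom(n, size = 1, prob = 0.8)
dat0 <- data.frame(time, status, group, stratum, stringsAsFactors = FALSE)

# Stratified log-rank chi-squared statistic
lr_stat <- function(dat) {
  survdiff(Surv(time, status) ~ group + strata(stratum), data = dat)$chisq
}

observed <- lr_stat(dat0)

B <- 999
perm <- replicate(B, {
  dp <- dat0
  # Reshuffle group labels within each stratum only (blocking on stratum)
  dp$group <- ave(dp$group, dp$stratum, FUN = sample)
  lr_stat(dp)
})

# Monte Carlo p-value, counting the observed statistic as one of the resamples
p_value <- (1 + sum(perm >= observed)) / (B + 1)
c(statistic = observed, p.value = p_value)
```

Permuting only within strata keeps the number of treated patients per stratum fixed, which is what blocking with respect to tumour grading means.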

11.3.2 Breast Cancer Survival

Before fitting a Cox model to the GBSG2 data, we again derive a Kaplan-Meier estimate of the survival function of the data, here stratified with respect to whether a patient received a hormonal therapy or not (see Figure 11.3). Fitting a Cox model follows roughly the same rules as shown for linear models in Chapter 6, with the exception that the response variable is again coded as a Surv object. For the GBSG2 data, the model is fitted via

R> GBSG2_coxph <- coxph(Surv(time, cens) ~ ., data = GBSG2)

and the results as given by the summary method are given in Figure 11.4. Since we are especially interested in the relative risk for patients who underwent a hormonal therapy, we can compute an estimate of the relative risk and a corresponding confidence interval via
