In this section, a new approach to the analysis of survival data with non-proportional hazards is proposed. Models are re-parameterised so that, rather than being interpreted in terms of a hazard ratio, they are assessed in terms of a dispersion parameter, measuring the magnitude of divergence due to a covariate, and an asymmetry parameter, measuring departure from proportionality.
Assume that time-to-event models are expressed in terms of a counting process [40] and consider the semi-transformational models [64, 65], which allow the form of the survival model to be interpreted on scales other than the hazard scale. Following this notation, each patient is defined via a counting process, $N(t)$, which records the number of events up to time $t$. Also let $Y(t)$ be the at-risk process defined as $Y(t) = I(T \geq t)$, where $I(\cdot)$ is the indicator function. Let $Z$ be an $n \times p$ covariate matrix with associated parameters $\beta$. A conditional survival function is defined as
$$S(t \mid Z) = \Psi\{\exp(\beta^T Z)\,\Lambda(t)\},$$
where $\Lambda(t)$ is a non-decreasing function, $\int_0^t \lambda(u)\,du$, with $\lambda$ the intensity function of the associated counting process. Note that this conflicts slightly with the notation used in Section 2.3.4 and is not to be confused with the rate parameters used in the exponential and piecewise exponential models. Allowing the transformation $\Psi(\cdot)$ to be either the Box-Cox or logarithmic family of transformations allows both the proportional hazards and proportional odds models to be estimated as special cases. Specifically, $\Psi(x) = \exp(-x)$ gives a proportional hazards model and $\Psi(x) = (1 + x)^{-1}$ gives the proportional odds model. In the special case of the proportional hazards model, note that $\lambda$ is an analogue of the hazard function. This class of models is extended by introducing a second 'asymmetry' parameter which allows for departure from the assumption of proportionality. The asymmetry parameter, denoted $\alpha$, acts on a set of covariates $U$ which may contain elements of $Z$. As an example, when modelling a single two-level covariate, the definition would be $U = Z$. The relationship for conditional survival is re-defined as
$$S(t \mid Z, U) = \Psi\{\exp(\beta^T Z)\,\Lambda(t)^{\exp(\alpha^T U)}\}. \tag{4.2}$$
In this form, model (4.2) looks similar to the relationship obtained by assuming a Weibull distribution in which both the scale and shape parameters are allowed to vary with covariates. The similarity ends here, however, as this approach requires no parametric form for a baseline hazard function. A similar formulation was proposed by Quantin et al. [124], who propose a test on $\alpha$ as a means of assessing proportionality. Full details describing the behaviour of $\theta = (\beta, \alpha)$ are given in Section 4.5.2. Justification for the derivation of the asymmetry parameter with respect to the proportional odds model is given in Section 4.5.1.
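As a minimal sketch of model (4.2), the survival function can be evaluated under the two special-case transformations. The function names, the scalar linear predictors and the numerical values below are illustrative assumptions and are not taken from the code in the Appendix.

```python
import math

def psi_hazards(x):
    """Transformation exp(-x): the proportional hazards special case."""
    return math.exp(-x)

def psi_odds(x):
    """Transformation 1 / (1 + x): the proportional odds special case."""
    return 1.0 / (1.0 + x)

def survival(cum_intensity, beta_z, alpha_u, psi):
    """Model (4.2): psi{ exp(beta'Z) * Lambda(t) ** exp(alpha'U) }.

    cum_intensity is Lambda(t) > 0; beta_z and alpha_u are the linear
    predictors beta'Z and alpha'U, taken as scalars in this sketch.
    """
    return psi(math.exp(beta_z) * cum_intensity ** math.exp(alpha_u))

def cumulative_hazard_ratio(cum_intensity, beta_z, alpha_u):
    """Ratio of -log S against the baseline group (hazards transformation)."""
    s1 = survival(cum_intensity, beta_z, alpha_u, psi_hazards)
    s0 = survival(cum_intensity, 0.0, 0.0, psi_hazards)
    return math.log(s1) / math.log(s0)

def failure_odds_ratio(cum_intensity, beta_z, alpha_u):
    """Ratio of (1 - S) / S against the baseline group (odds transformation)."""
    s1 = survival(cum_intensity, beta_z, alpha_u, psi_odds)
    s0 = survival(cum_intensity, 0.0, 0.0, psi_odds)
    return ((1.0 - s1) / s1) / ((1.0 - s0) / s0)

# Hypothetical values: exp(beta'Z) = 0.5, with and without asymmetry.
beta_z = math.log(0.5)
for lam in (0.2, 1.0, 2.5):
    print(lam,
          cumulative_hazard_ratio(lam, beta_z, 0.0),   # constant in Lambda(t)
          cumulative_hazard_ratio(lam, beta_z, 0.4))   # varies with Lambda(t)
```

With the asymmetry predictor set to zero the ratio is constant across values of the cumulative intensity, which is the proportional case; a non-zero value makes the ratio depend on the cumulative intensity and hence on time.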
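With respect to model estimation, given (4.2), a hazard function is defined as
$$h(t \mid Z, U) = -\frac{\Psi'\{\exp(\beta^T Z)\,\Lambda_0(t)^{\exp(\alpha^T U)}\}}{\Psi\{\exp(\beta^T Z)\,\Lambda_0(t)^{\exp(\alpha^T U)}\}} \left[\exp(\beta^T Z + \alpha^T U)\,\Lambda_0(t)^{(\exp\{\alpha^T U\} - 1)}\,\lambda_0(t)\right].$$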
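Here $\lambda_0$ is a baseline intensity process. A log likelihood function is given by
$$l(t \mid Z, U, \theta) = \sum_{i=1}^{n} \Big\{ \nu_i \Big[ \log(\lambda_0(t)) + \beta^T Z + \alpha^T U + (\exp\{\alpha^T U\} - 1)\log(\Lambda_0(t)) + \log\big(-\Psi'\{\exp(\beta^T Z)\,\Lambda_0(t)^{\exp(\alpha^T U)}\}\big) \Big] + (1 - \nu_i) \Big[ \log\big(\Psi\{\exp(\beta^T Z)\,\Lambda_0(t)^{\exp(\alpha^T U)}\}\big) \Big] \Big\}, \tag{4.3}$$
where $\nu_i = 1$ represents an observed event and $\nu_i = 0$ represents a censored observation.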
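The hazard function and the two kinds of likelihood contribution in (4.3) can be checked numerically. The sketch below assumes the proportional odds transformation and a purely hypothetical baseline cumulative intensity of $t^2$, chosen only so that the quantities have closed forms; none of the names come from the Appendix code.

```python
import math

def psi(x):
    """Proportional odds transformation 1 / (1 + x)."""
    return 1.0 / (1.0 + x)

def dpsi(x):
    """Derivative of psi; negative because psi is decreasing."""
    return -1.0 / (1.0 + x) ** 2

def cum_intensity0(t):
    """Hypothetical baseline cumulative intensity (assumption of the sketch)."""
    return t ** 2

def intensity0(t):
    """Derivative of the hypothetical baseline cumulative intensity."""
    return 2.0 * t

def inner(t, beta_z, alpha_u):
    """Argument of the transformation in (4.2)."""
    return math.exp(beta_z) * cum_intensity0(t) ** math.exp(alpha_u)

def survival(t, beta_z, alpha_u):
    return psi(inner(t, beta_z, alpha_u))

def hazard(t, beta_z, alpha_u):
    """Hazard implied by (4.2): -psi'/psi times the derivative of the argument."""
    x = inner(t, beta_z, alpha_u)
    dx_dt = (math.exp(beta_z + alpha_u)
             * cum_intensity0(t) ** (math.exp(alpha_u) - 1.0)
             * intensity0(t))
    return -dpsi(x) / psi(x) * dx_dt

def loglik_term(t, event, beta_z, alpha_u):
    """Contribution of one observation to (4.3)."""
    x = inner(t, beta_z, alpha_u)
    if event:
        return (math.log(intensity0(t)) + beta_z + alpha_u
                + (math.exp(alpha_u) - 1.0) * math.log(cum_intensity0(t))
                + math.log(-dpsi(x)))
    return math.log(psi(x))

# Hypothetical evaluation point and linear predictors.
t, beta_z, alpha_u = 1.4, -0.3, 0.25
eps = 1e-6
# Central finite difference of -log S(t) as an independent check of the hazard.
numeric_hazard = -(math.log(survival(t + eps, beta_z, alpha_u))
                   - math.log(survival(t - eps, beta_z, alpha_u))) / (2.0 * eps)
print(hazard(t, beta_z, alpha_u), numeric_hazard)
```

The event contribution equals the log hazard plus the log survival, and the censored contribution is the log survival alone, which is the usual decomposition of a censored-data likelihood.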
In order to maximise the likelihood, it is necessary to express the intensity function in terms of a step function and replace $\lambda(t)$ with the step size $\Lambda\{t\}$. Maximisation of (4.3) by non-parametric maximum likelihood estimation (NPMLE) can be carried out using standard maximisation techniques available in statistical packages. Some simplification of (4.3) can be achieved when all event times are unique. Note that an estimate of the cumulative intensity function is obtained by
$$\hat{H}(t \mid Z, U) = -\log\big(\hat{S}(t \mid Z, U)\big).$$
As a cumulative intensity process can be defined for all patients at observed time-points, an estimate of the intensity process is obtained via
$$\hat{h}(t \mid Z, U) = \hat{H}(t \mid Z, U) - \hat{H}(t^- \mid Z, U), \tag{4.4}$$
where $\hat{H}(t^- \mid Z, U)$ is the cumulative hazard function at the observed time point immediately prior to $t$. The log likelihood under this formulation is estimated by
$$\hat{l}(t \mid Z, U) = \sum_{i=1}^{n} \Big[ \nu_i \log\{\hat{h}_i(t \mid Z, U)\} + \log\{\hat{S}_i(t \mid Z, U)\} \Big]. \tag{4.5}$$
In this form, only a definition of a survival function is required in order to produce parameter estimates. Use of the NPMLE is straightforward for small datasets. For larger datasets, however, the routine can be difficult to compute due to the need to invert a large matrix in order to obtain standard errors for the fitted parameters. More attractive in this case may be an approach similar to that taken by Yin and Zeng [125], who provide an efficient algorithm which uses a Lagrange multiplier to allow the step sizes given by $\Lambda\{t\}$ to be calculated via a set of recursive equations. Parameter estimation can then be reduced to maximising over $\rho + 2$ parameters, where $\rho$ is the number of parameters of interest. Furthermore, it is illustrated that by evaluating the model via a profile likelihood, taking the cumulative hazard function to be a nuisance parameter, standard errors of the key parameters of interest can still be obtained. More details are provided by Murphey [126]. As a guide, the model will fit to datasets of approximately 100 observations and provide standard error estimates within a few minutes. For larger datasets, of more than 250 observations say, parameter estimates can be found relatively quickly but standard error estimates via a Hessian matrix may take a few hours. Models are fitted using the 'optim' function in R [127]. Code is provided in the Appendix for reference.
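The estimated log likelihood (4.5) can be sketched directly from a survival function and a set of baseline step sizes. The version below is written in Python rather than R and is not the Appendix code; it assumes a single two-level covariate with $U = Z$, the proportional hazards transformation by default, observations already ordered by unique observed time, and hypothetical names throughout. A function of this kind would serve as the objective passed to a general-purpose optimiser.

```python
import math

def psi_hazards(x):
    """Proportional hazards transformation exp(-x)."""
    return math.exp(-x)

def estimated_loglik(events, z, steps, beta, alpha, psi=psi_hazards):
    """Log likelihood (4.5) for observations ordered by observed time.

    events[k] is 1 for an observed event and 0 for censoring, z[k] is a
    two-level covariate (with U = Z), and steps[k] is the step of the
    baseline cumulative intensity at the k-th observed time. Steps at
    event times must be positive so that the log of the intensity exists.
    """
    total = 0.0
    previous_cum = 0.0
    cum = 0.0
    for nu, zk, step in zip(events, z, steps):
        previous_cum, cum = cum, cum + step
        phi = math.exp(beta * zk)
        gamma = math.exp(alpha * zk)
        surv_now = psi(phi * cum ** gamma)
        surv_prev = psi(phi * previous_cum ** gamma)
        # (4.4): difference of cumulative hazards at t and at the prior time point
        h_hat = -math.log(surv_now) + math.log(surv_prev)
        if nu:
            total += math.log(h_hat)
        total += math.log(surv_now)
    return total

# Hypothetical data: four patients, the second one censored.
events = [1, 0, 1, 1]
z = [0, 1, 0, 1]
steps = [0.1, 0.0, 0.3, 0.4]
print(estimated_loglik(events, z, steps, beta=0.0, alpha=0.0))
print(estimated_loglik(events, z, steps, beta=-0.5, alpha=0.2))
```

When both parameters are zero under the proportional hazards transformation, the result reduces to the sum of the log step sizes at event times minus the sum of the cumulative intensities, which gives a simple check on the implementation.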
4.5.1 Derivation of the asymmetry parameter
Here, the derivation of the asymmetry parameter with respect to the proportional odds model is given.
It has been noted by Chen [64] that a transformation of $S(t) = 1/(1 + \Lambda(t))$ will yield a proportional odds model. Considering only the case where survival is compared between two groups, the proportional odds model is defined as satisfying the condition
$$\operatorname{logit}\{S_1(t)\} = \operatorname{logit}\{S_0(t)\} + \psi,$$
where $\psi$ is the log odds ratio between the two survival functions. As the two survival functions are naturally bounded by (0, 1), use can be made of the analysis of parametric ROC curves, where there has been much work on comparing similarly bounded functions with the inclusion of asymmetry parameters. Define the following structures
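$$V = \operatorname{logit}\{S_1(t)\} - \operatorname{logit}\{S_0(t)\},$$
$$W = \operatorname{logit}\{S_1(t)\} + \operatorname{logit}\{S_0(t)\}.$$
The relationship between the survival functions is then estimated via the regression formula
$$V = \varrho + \vartheta W.$$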
The above can be rearranged to provide a solution for $S_1(t)$ in terms of $S_0(t)$ such that
$$S_1(t) = \operatorname{inv.logit}\left\{\frac{\varrho + \operatorname{logit}(S_0(t))(1 + \vartheta)}{1 - \vartheta}\right\}.$$
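Recalling $S(t) = (1 + \Lambda(t))^{-1}$ and noting that $\operatorname{inv.logit}(x) = (1 + \exp(-x))^{-1}$, re-write the above as
$$\Lambda_1(t) = \exp\left\{-\frac{\varrho + \operatorname{logit}(1/(1 + \Lambda_0))(1 + \vartheta)}{(1 - \vartheta)}\right\}.$$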
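Lastly, note that $\operatorname{logit}\{1/(1 + \Lambda_0)\} = -\log(\Lambda_0)$ and rearrange to
$$\Lambda_1(t) = \exp\left\{-\frac{\varrho}{(1 - \vartheta)}\right\} \exp\left\{\frac{\log(\Lambda_0(t))(1 + \vartheta)}{(1 - \vartheta)}\right\}. \tag{4.6}$$
From (4.6) define
$$\Lambda_1(t) = \omega\{\Lambda_0(t)\}^{\xi},$$
where $\omega = \exp\{-\varrho/(1 - \vartheta)\}$ and $\xi = \frac{1 + \vartheta}{1 - \vartheta}$. It follows that when measuring the difference between two levels of a covariate in terms of their relative survival odds, an asymmetric (or non-proportional) model can be formulated in terms of a divergence parameter $\omega$ and an asymmetry parameter $\xi$.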
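The algebra leading to (4.6) can be verified numerically. The sketch below uses hypothetical values for the regression intercept and slope and checks that the inverse-logit form of the second survival function agrees with the power form in the divergence and asymmetry parameters; the variable names are illustrative only.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical intercept and slope of the regression of V on W.
intercept, slope = 0.4, 0.2

# Divergence and asymmetry parameters defined from (4.6).
omega = math.exp(-intercept / (1.0 - slope))
xi = (1.0 + slope) / (1.0 - slope)

max_gap = 0.0          # largest disagreement between the two forms of S1
max_residual = 0.0     # largest departure from V = intercept + slope * W
for lam0 in (0.05, 0.5, 1.0, 3.0):
    s0 = 1.0 / (1.0 + lam0)
    # S1 from the rearranged regression formula
    s1_regression = inv_logit((intercept + logit(s0) * (1.0 + slope)) / (1.0 - slope))
    # S1 from the power relationship between cumulative intensities
    s1_power = 1.0 / (1.0 + omega * lam0 ** xi)
    max_gap = max(max_gap, abs(s1_regression - s1_power))
    v = logit(s1_power) - logit(s0)
    w = logit(s1_power) + logit(s0)
    max_residual = max(max_residual, abs(v - (intercept + slope * w)))

print(max_gap, max_residual)
```

Both quantities should be zero up to floating-point error. Setting the slope to zero gives an asymmetry parameter of one and recovers the proportional odds model.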
4.5.2 Illustration of the parameter of asymmetry
To simplify the notation, define $\phi = \exp\{\beta^T Z\}$ as a function of covariates that defines the departure away from $H_0$ due to $Z$, and $\gamma = \exp\{\alpha^T U\}$ as a function of asymmetry due to $U$. Model (4.2) becomes
$$S(t \mid Z, U) = \Psi\{\phi(H_0(t)^{\gamma})\}.$$
Here, consider $\phi$ to be a parameter which measures the divergence due to the parameters for $Z$. Further, $\gamma$ acts on the survival/hazard function, with values of $\gamma > 1$ resulting in greater divergence at lower probabilities and values of $\gamma < 1$ giving greater divergence at higher probabilities. Here the dispersion parameter can be thought of as acting proportionally on an adjusted cumulative hazard function. Under the special case of $\gamma = 1$, $\phi$ is interpreted as the standard proportional hazards/odds parameter if the transformations given by [64] are followed.
The behaviour of the model parameters when modelling a single two-level covariate is illustrated via the PP-plot. This method has the advantage that it does not require time to be included on the plot; this may be particularly attractive as the Cox proportional hazards method for estimating covariate effects does not directly include time either. Figures 4.7, 4.8 and 4.9 show traditional Kaplan-Meier plots of survival functions alongside the PP-plot so that the reader may make direct comparisons.
In Figure 4.6, the behaviour of $\phi$ and $\gamma$ is illustrated for the special cases of the proportional hazards and proportional odds models. In each plot, the diagonal is referred to as the null line, as a curve that follows the diagonal would represent two identical survival functions. In both plots, the solid line in the upper triangle represents the standard proportional hazards/odds line. The upper triangle in each plot illustrates the effect of the asymmetry parameter given a fixed value for $\phi$. Conversely, the lower triangle shows the different relationships that can be modelled by fixing $\gamma$ and allowing $\phi$ to vary. These plots illustrate the wide range of flexible models that can be achieved from the two parameters.
[Figure 4.6: two PP-plot panels, "Hazards Models" and "Odds Models", each with axes Treatment 1 and Treatment 2 running from 0.0 to 1.0. Hazards Models legend: $\phi = 0.5$ with $\gamma = 1, 0.5, 1.5$; $\gamma = 0.5$ with $\phi = 2, 1.5, 3$. Odds Models legend: $\phi = 0.33$ with $\gamma = 1, 0.5, 1.5$; $\gamma = 1.5$ with $\phi = 4, 2, 8$.]
Figure 4.6: Figure to illustrate the flexibility of proportional hazards and proportional odds models with the inclusion of asymmetry parameters.
Figure 4.6 shows that a value of $\phi \neq 1$ has the effect of dragging the fitted model away from the null diagonal line, but only with a value of $\gamma = 1$ is proportionality attained. With a value of $\gamma \neq 1$, $\phi$ can no longer be regarded as a hazard ratio (or an odds ratio) and thus $\phi$ is referred to as a dispersion parameter. In the presence of non-proportionality, $\phi$ can still be interpreted as a parameter which measures the magnitude of the overall difference between two treatments in a similar fashion to a hazard ratio. The effect of the asymmetry parameter $\gamma$ is also illustrated, with values of $\gamma > 1$ resulting in greater divergence at higher probabilities and values of $\gamma < 1$ giving greater divergence at lower probabilities.
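The coordinates of PP-plot curves of this kind can be tabulated directly from the dispersion and asymmetry parameters. The sketch below is an illustration under stated assumptions: the baseline survival probability is placed on one axis and the comparison group on the other (which axis corresponds to which treatment in Figure 4.6 is not assumed), the parameter values are the upper-triangle values from the hazards panel legend, and the function name is hypothetical.

```python
import math

def pp_point(p0, phi, gamma, model="hazards"):
    """Comparison-group survival probability for baseline probability p0.

    phi is the dispersion parameter and gamma the asymmetry parameter;
    model selects the proportional hazards or proportional odds special case.
    """
    if model == "hazards":
        base = -math.log(p0)            # baseline cumulative hazard
        return math.exp(-phi * base ** gamma)
    if model == "odds":
        base = (1.0 - p0) / p0          # baseline odds of failure
        return 1.0 / (1.0 + phi * base ** gamma)
    raise ValueError("model must be 'hazards' or 'odds'")

grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for gamma in (0.5, 1.0, 1.5):
    curve = [round(pp_point(p, 0.5, gamma), 3) for p in grid]
    print("phi = 0.5, gamma =", gamma, curve)
```

Because one raised to any power is one, hazards curves sharing a value of the dispersion parameter coincide where the baseline cumulative hazard equals one, and odds curves coincide where the baseline survival probability is 0.5; the asymmetry parameter changes the shape of the curve on either side of that point.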
Note that only the special cases of the proportional hazards and proportional odds models are included here. A wider range of models can be achieved by allowing a model intermediate between a hazards model and an odds model, using either the Box-Cox or logarithmic class of functions as described by Chen [64].