Here a method is proposed for the estimation of baseline hazard parameters where only summary information is available.
Assume initially that prior information, $D_p = (\vartheta_p, t_p)$, is available in the form of survival probabilities $\vartheta_p$ at associated time points $t_p$. This information could be taken from existing datasets, from published material such as Kaplan-Meier curves, or from expert opinion.
It is further assumed at the design stage that the data will be modelled using a member of the parametric family of models, i.e. a model which includes some parametric description of the baseline hazard function as well as a hazard ratio upon which the trial will be assessed. The methodology presented here is specific to the piecewise exponential model (PEM), as this provides a flexible modelling approach with practical applications. Adaptations to other parametric forms are not presented here but can be easily obtained.
Assume a PEM, with the prior data $D_p$ taking the form of a set of survival probabilities $\{\vartheta_p\}$ along with associated time points $\{t_p\}$.
Given $D_p$, the objective is to convert the prior survival probabilities $\{\vartheta_p\}$ into survival probabilities $\{\phi_j\}$ at the time points of the grid $\{a_j\}$ used in the analysis of the PEM. Estimates of $\{\phi_j\}$ could be obtained via simple linear interpolation, or by fitting a spline model with knots $\kappa$ such that $\vartheta_p = f(t_p, \kappa)$; estimates of $\phi_j$ are then taken as the fitted values of the spline function at the partitions of the time grid. It is the values $\{\phi_j\}$ and the time grid $\{a_j\}$ which are used to estimate $\{\gamma_j\}$, the hyperparameters which define the point estimates of the prior distributions $\Pr(\lambda)$.
From the definition of the PEM given by (2.2), the survival function is
$$S(t) = \exp\left\{-\left[\lambda_j (t - a_{j-1}) + \sum_{g=1}^{j-1} \lambda_g (a_g - a_{g-1})\right]\right\}.$$
Replacing $\lambda_j$ with $\gamma_j$ and evaluating the survival function at the partitions of the time grid gives
$$\phi_j = \exp\{-\gamma_j (a_j - a_{j-1})\}\,\phi_{j-1}. \qquad (7.1)$$
Consequently, point estimates for the prior distributions on the baseline hazard function are derived via
$$\gamma_j = \frac{-\log(\phi_j / \phi_{j-1})}{a_j - a_{j-1}}.$$
A graphical illustration of the process of obtaining point estimates for the prior densities is given in Figure 7.1.
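The conversion from prior survival probabilities to hazard point estimates can be sketched in a few lines of Python. This is a minimal illustration only: it uses linear interpolation rather than a spline, and it assumes a grid starting at $a_0 = 0$ with $S(0) = 1$ and a flat survival curve beyond the last prior time point. The function names `interpolate_survival` and `prior_hazards`, and the example numbers, are hypothetical and not taken from the source.

```python
import math
from bisect import bisect_right


def interpolate_survival(times, probs, grid):
    """Linearly interpolate prior survival probabilities onto the time grid.

    times, probs: prior time points t_p and survival probabilities, sorted by time.
    grid: upper end points a_1 < ... < a_J of the PEM intervals (a_0 = 0 assumed).
    """
    times, probs = list(times), list(probs)
    if times[0] > 0:
        # Assumption: everyone is alive at time zero, S(0) = 1.
        times, probs = [0.0] + times, [1.0] + probs
    phi = []
    for a in grid:
        if a >= times[-1]:
            # Assumption: hold the last prior value beyond the final time point.
            phi.append(probs[-1])
            continue
        k = bisect_right(times, a)  # times[k-1] <= a < times[k]
        w = (a - times[k - 1]) / (times[k] - times[k - 1])
        phi.append(probs[k - 1] + w * (probs[k] - probs[k - 1]))
    return phi


def prior_hazards(phi, grid):
    """Point estimates gamma_j = -log(phi_j / phi_{j-1}) / (a_j - a_{j-1})."""
    gammas = []
    prev_phi, prev_a = 1.0, 0.0  # phi_0 = S(a_0) = 1 with a_0 = 0
    for p, a in zip(phi, grid):
        gammas.append(-math.log(p / prev_phi) / (a - prev_a))
        prev_phi, prev_a = p, a
    return gammas


# Hypothetical prior summary: survival of 0.8, 0.6, 0.4 at months 6, 12, 24.
grid = [3, 6, 12, 24]
phi = interpolate_survival([6, 12, 24], [0.8, 0.6, 0.4], grid)
gammas = prior_hazards(phi, grid)
print(phi)
print(gammas)
```

Applying (7.1) recursively to the returned hazards reproduces the interpolated survival values, which is a useful sanity check on any chosen grid.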
The full form of the prior distributions depends on whether hazard ratios are measured on the standard or log scale. With baseline hazard parameters on the nominal scale, priors are set using a Gamma distribution for each interval individually, such as $\Pr(\lambda_j) \sim \Gamma(\eta_j, \zeta_j)$, which is constrained by $\gamma_j = \eta_j/\zeta_j$. For practical and computational convenience, throughout this thesis all baseline hazard parameters are defined on the log scale, with $\varpi = \log(\gamma)$. Prior distributions are defined such that
[Figure 7.1: four panels, a) to d), each plotting S(t) against t.]
Figure 7.1: Illustration of the process of deriving parameters for informative prior distributions on a baseline hazard function. a) The prior estimates of survival probabilities and associated times are obtained. b) A spline function is fitted to the prior estimates. c) Data are observed (rug plot) and the time grid is set. d) Prior parameter estimates $\gamma$ are obtained and the resulting piecewise model estimate is given.
$$\Pr(\log(\lambda)) \sim MVN(\varpi, \Sigma). \qquad (3)$$
Here $\Sigma$ is a $J \times J$ covariance matrix with elements $\rho_{i,j}$, which quantifies the degree of confidence in the prior parameters. It is assumed a priori that all baseline hazard parameters are independent, so that $\rho_{i,j} = 0$ if $i \neq j$. This is not to assume that the baseline hazard parameters themselves are independent, only that there is no prior knowledge of any correlation between prior parameters. This assumption is made here for convenience but may be relaxed by assuming a structured correlation matrix. Full definitions of the diagonal elements are non-trivial and are discussed further in Section 7.3.1.
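As a brief worked consequence of the independence assumption, written under the assumption that there are $J$ intervals and that the diagonal elements $\rho_{j,j}$ are the prior variances of the log hazards, the multivariate normal prior factorises into $J$ univariate normal priors:

```latex
\Sigma = \operatorname{diag}(\rho_{1,1}, \dots, \rho_{J,J})
\quad\Longrightarrow\quad
\log \lambda_j \sim N(\varpi_j, \rho_{j,j})
\quad \text{independently for } j = 1, \dots, J.
```

In practice this means each interval's prior can be specified and sampled separately.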
7.3.1 Prior precision for the baseline hazard function
A particular challenge in this approach is to set the precision of the prior distributions. Fully data-dependent methods, such as those proposed by Neuenschwander [205], are based on predictive distributions which allow for both between-study and within-study variability. Deriving prior distributions from summary information, however, makes it difficult to obtain reliable estimates of either source of variability.
In deriving prior distributions from summary information, clinicians and statisticians need to determine the degree to which a future trial can be considered comparable to the historical information. Whilst this may be no easy task, it does offer an advantage over fully data-dependent approaches: prior distributions can be amended to reflect the degree to which future data are believed to relate to the historical information, or the extent of any scepticism over its validity. For example, if the historical information is taken from data collected a number of years previously, clinicians may wish to inflate prior variability to account for the fact that medical standards may have progressed.
Two approaches are explored here: a graphical approach and an approach based on the effective number of events. Both approaches are adapted to place more prior weight on the earlier partitions, as the prior survival estimates there are expected to be more reliable.
For the graphical approach, each of the $J$ diagonal elements of the matrix $\Sigma$ is defined by $\rho_{jj} = c\,ẻ_j$, where $c$ is an overall level of variability and $ẻ$ is a variance inflation function. In practice, appropriate values of $c$ and $ẻ$ will depend on the amount of data being analysed and on the time grid that is set, and it is for this reason that a graphical approach is proposed. Examples of prior survival functions are given in Figure 7.3. These figures are obtained by taking samples from the design prior distributions based on informative priors and converting them into survival functions using equation (5.2).
Figures such as these can also be useful in explaining to medical professionals how the prior information is likely to impact on future data analysis.
As an example, a useful choice is for $ẻ$ to increase from one to two in equal steps over the $J$ categories. Other approaches are applicable, for example letting $ẻ = 1/ễ_{a_{j-1}}$, which has the property of allowing prior precision to decrease at a rate proportional to the assumed survival function.
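The graphical approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce Figure 7.3: it assumes independent normal priors on the log hazards, a variance inflation that rises from one to two in equal steps, a grid starting at $a_0 = 0$, and the PEM survival function in place of equation (5.2), which is not reproduced in this section. All function names and the example values of the hazards, grid and $c$ are hypothetical.

```python
import math
import random


def inflation_steps(J, start=1.0, stop=2.0):
    """Variance inflation rising from `start` to `stop` in equal steps over J intervals."""
    if J == 1:
        return [start]
    step = (stop - start) / (J - 1)
    return [start + k * step for k in range(J)]


def pem_survival(t, hazards, grid):
    """Survival S(t) under a PEM; grid holds a_1 < ... < a_J and a_0 = 0 is assumed."""
    cum_hazard, prev_a = 0.0, 0.0
    for lam, a in zip(hazards, grid):
        if t <= a:
            return math.exp(-(cum_hazard + lam * (t - prev_a)))
        cum_hazard += lam * (a - prev_a)
        prev_a = a
    # Assumption: beyond the last grid point the final hazard continues to apply.
    return math.exp(-(cum_hazard + hazards[-1] * (t - prev_a)))


def sample_prior_curves(gammas, grid, c, times, n_draws=200, seed=1):
    """Draw survival curves from the prior: log(lambda_j) ~ N(log(gamma_j), c * inflation_j)."""
    rng = random.Random(seed)
    variances = [c * e for e in inflation_steps(len(gammas))]
    curves = []
    for _ in range(n_draws):
        hazards = [math.exp(rng.gauss(math.log(g), math.sqrt(v)))
                   for g, v in zip(gammas, variances)]
        curves.append([pem_survival(t, hazards, grid) for t in times])
    return curves


# Hypothetical prior hazards on a four-interval grid, evaluated at a few time points.
curves = sample_prior_curves([0.035, 0.039, 0.048, 0.034], [3, 6, 12, 24],
                             c=0.1, times=[0, 6, 12, 18, 24])
print(curves[0])
```

Plotting the sampled curves for several values of `c` gives the kind of picture that can be discussed with clinicians before the prior variability is fixed.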
A more practical approach may be to express the prior information as an effective number of events. For example, given a prior baseline function, a clinician may state that they want this function to have an effect equivalent to that of 20 events in an upcoming trial.
Following this approach, consider that for each individual interval in the PEM the hazard rate parameter follows a gamma distribution with parameters $\eta_j$ and $\zeta_j$, where $\eta_j$ can be taken as the number of events observed within the interval and $\zeta_j$ as the patient time at risk. This distribution has mean
$$\gamma_j = \eta_j \zeta_j^{-1},$$
which is estimated using (7.1). As prior distributions are defined on the log scale, the variability about $\log(\gamma_j)$ can be obtained via the delta method:
$$\begin{aligned}
Var(\log\gamma) &= Var(\gamma)\,[(\log\gamma)']^2 \\
&= \eta\zeta^{-2} \left(\frac{\zeta}{\eta}\right)^2 \\
&= \eta^{-1}, \qquad (7.2)
\end{aligned}$$
where $'$ denotes the first derivative. On the log scale, therefore, the variability about a baseline hazard rate is determined using only the number of events within each interval. Given that a hazard rate and an effective number of events ($E_p$) have been pre-specified, the effective number of events attributed to each partition, $\eta_j$, which sets the prior variance through (7.2), can be defined as
$$\eta_j = \frac{E_p(\phi_{j+1} + \phi_j)}{2\sum_{i=1}^{J} \phi_i}.$$
This again weights the available events so that more prior events are attributed to the earlier time partitions, as these are the partitions whose estimates are considered to have greater reliability. For the remainder of this thesis, prior distributions are formed using the effective number of events approach.
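The allocation of effective events can be illustrated with a short sketch. It rests on assumptions that should be read carefully: survival is supplied at the grid edges $a_0, \dots, a_J$, each interval is weighted by the average survival over its two edges, and the weights are normalised so that the allocations sum exactly to $E_p$. This is a simplification in the spirit of the formula above, whose indexing and denominator may differ slightly, and the function names and example values are hypothetical.

```python
def allocate_events(phi_edges, total_events):
    """Split an effective number of prior events across the PEM intervals.

    phi_edges: survival at the grid edges a_0, ..., a_J (so phi_edges[0] = 1).
    Assumption: interval j is weighted by the mean survival over its two edges,
    and weights are normalised so the allocations sum to `total_events`.
    """
    weights = [(lo + hi) / 2 for lo, hi in zip(phi_edges[:-1], phi_edges[1:])]
    total = sum(weights)
    return [total_events * w / total for w in weights]


def log_hazard_variances(events):
    """Prior variance of each log hazard via (7.2): Var(log gamma_j) = 1 / eta_j."""
    return [1.0 / e for e in events]


def gamma_rates(events, gammas):
    """Rate parameters zeta_j = eta_j / gamma_j implied by the mean gamma_j = eta_j / zeta_j."""
    return [e / g for e, g in zip(events, gammas)]


# Hypothetical example: 20 effective events over four intervals.
phi_edges = [1.0, 0.9, 0.8, 0.6, 0.4]
events = allocate_events(phi_edges, 20)
print(events)                        # more events on the earlier intervals
print(log_hazard_variances(events))  # so smaller prior variance early on
```

Because survival decreases over time, the earlier intervals receive more events and hence tighter priors, which matches the weighting described above.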
It is lastly noted that it may be reasonable in practice to set more than one set of prior distributions. Ultimately however, a single scenario may need to be defined on which a future trial will be assessed.
7.3.2 Definition of the time grid
A brief note is included here to highlight that the time grid, which is generally assumed fixed for the PEM, will play an important role in both the derivation of prior parameters and the analysis of the data. It is possible to have one time grid that is used for deriving prior distributions, perhaps in a design setting, and a second, data-dependent time grid that is used for analysing trial data. One complication in this approach, however, is that prior distributions based on the 'design' time grid would have to be amended in light of the 'analysis' time grid.
Whilst this approach is both feasible and in some respects desirable, as the 'design' time grid may be sub-optimal for analysis of trial data, it does introduce an extra layer of complexity. For this reason, a fixed time grid as proposed by Kalbfleisch is used throughout the remainder of this thesis for both the design and analysis of trial data.