Example of the CHW Method

Part of the document Modern Approaches to Clinical Trials Using SAS: Classical, Adaptive, and Bayesian Methods (pages 159-168)

3.3.3 Methods Based on Combination Tests

3.3.3.3 Example of the CHW Method

Let us consider a hypothetical two-arm trial with equal allocation, designed to test the efficacy of an experimental new treatment against a control, which is the standard of care. The primary endpoint is a binary outcome, with a higher rate indicating improvement of the disease state. We denote by p_t the rate of response in the treatment arm and by p_c the rate of response in the control arm. Based on findings from previous studies, the control treatment is expected to have a response rate of 9%, while the new experimental treatment is hypothesized to increase the response by 40%.

Though an improvement of 40% is very desirable, in practice even an improvement of 32% has clinical value. Hence, the sponsor would like an insurance policy that covers improvements across the range of clinical relevance.

While designing this study, we consider a few design options including a classical design (fixed sample size), a group sequential design, and an adaptive design with sample size re-estimation. There are pros and

cons for each of the design options; however, we will specifically explore scenarios under which adaptive design with sample size re-estimation would be beneficial.

Classical Design

For a classical design with a single look, a total sample size of 2,326 subjects is needed to achieve 80% power to detect a 40% improvement at a one-sided α = 0.025. If the true improvement is 32%, this trial is underpowered, with power dropping to as low as 62%. Conversely, if the total sample size is increased to 3,532 and the true improvement is 40%, the trial is overpowered, with power around 93%. Table 3.1 shows the sample size and the power across the range of scenarios from a clinically meaningful improvement of 32% to a desirable improvement of 40%.

Table 3.1 Power for Classical Design

True Percent    Optimistic Improvement    Pessimistic Improvement
Improvement     (N = 2,326)               (N = 3,532)
32%             62%                       80%
34%             67%                       84%
36%             72%                       88%
38%             76%                       91%
40%             80%                       93%
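The entries in Table 3.1 follow from the standard normal approximation for a one-sided two-sample test of proportions. As a rough cross-check, here is a Python sketch (our illustration, not part of the book's SAS workflow; the function name is ours):

```python
from math import sqrt
from statistics import NormalDist

def power_two_prop(n_total, p_control, pct_improve, alpha=0.025):
    """Approximate power of a one-sided two-sample test of proportions
    with equal allocation, using the normal approximation."""
    n = n_total / 2                       # subjects per arm
    p_trt = p_control * (1 + pct_improve)
    delta = p_trt - p_control
    se = sqrt(p_control * (1 - p_control) / n + p_trt * (1 - p_trt) / n)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(delta / se - z_alpha)

# N = 2,326 is powered at 80% for a 40% improvement over the 9% control rate,
# but gives only about 62% power if the true improvement is 32%:
print(round(power_two_prop(2326, 0.09, 0.40), 2))
print(round(power_two_prop(2326, 0.09, 0.32), 2))
# N = 3,532 is powered for 32% and overpowered (about 93%) at 40%:
print(round(power_two_prop(3532, 0.09, 0.40), 2))
```

Evaluating the function over the 32%-40% grid reproduces both power columns of Table 3.1.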

There are some ways to deal with this situation:

1. Power the trial conservatively at 32% with a classical design, protecting the trial in case the true percent improvement is as low as 32%, or

2. Power the trial conservatively at 32% with a group sequential design giving the option to decrease the average sample size through realistic futility or efficacy stopping boundaries, or

3. Start with the optimistic scenario of 40% and increase the sample size if the futility or efficacy boundary is not crossed.

Group Sequential Design

While considering flexible designs, a group sequential design is the first natural consideration for any pivotal trial, due to regulatory comfort with these designs. We consider a GSD with one interim look to detect an improvement of 32%, with the possibility of stopping early for efficacy or futility. Group sequential designs can be a very safe and attractive option. However, the study population, the time at which the endpoint is observed, and the enrollment rate should be taken into consideration. In other words, the potential savings in study duration should be balanced against the savings in sample size.

In our hypothetical example, we perform an interim analysis when approximately 50% of the subjects have observed the primary endpoint. We next choose the error spending function to evaluate the interim look. In most cases, the type I or type II error is spent very conservatively at the early look, so that a decision of efficacy or futility is triggered only if the results are compelling for or against the new treatment.

The stopping boundaries can be chosen in several different ways and should be aligned with the project objectives. In this example, we choose the interim futility boundary so that the study can be stopped at the interim if there is no difference between control and the test treatment, i.e., no improvement in the condition, while the interim efficacy boundary is chosen so that the trial can be stopped at the interim if there is overwhelming efficacy. To achieve this objective, Example Code 3.1 uses an error spending function with the conservative O'Brien-Fleming-like boundary to control type I error and an error spending function with a gamma boundary (γ = −4) to control type II error. In the program, proc seqdesign uses the method(alpha)=errfuncobf option for the type I error and the method(beta)=errfuncgamma(gamma=-4) option for the type II error. Hwang, Shih, and DeCani have shown that gamma error spending with γ = −4 is similar to using the conservative O'Brien-Fleming boundary [31,32]. The option stop=both(betaboundary=nonbinding) gives a non-binding futility boundary. Non-binding futility means that the trial may be continued even if the futility boundary is crossed. There is a slight penalty to be paid at the final analysis when the futility rule is non-binding rather than binding, but the penalty is usually negligible; for instance, with the boundaries above, the final critical value is 1.969 for non-binding futility versus 1.962 for binding futility.

Example Code 3.1 Sequential Design with Efficacy and Futility Boundaries

proc seqdesign altref=0.0288 errspend;
   OneSidedErrorSpending: design nstages=2
      method(alpha)=errfuncobf
      method(beta)=errfuncgamma(gamma=-4)
      alt=upper stop=both(betaboundary=nonbinding)
      alpha=0.025 beta=0.20;
   samplesize model=twosamplefreq(nullprop=0.09 test=prop);
   ods output Boundary=Bnd_Prop;
run;
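The two spending functions named in the program can be written out directly. The following Python sketch uses the forms commonly given for the Lan-DeMets O'Brien-Fleming-type function and the Hwang-Shih-DeCani gamma family (our notation and code; SAS evaluates these internally, so this is an illustrative cross-check rather than the book's workflow):

```python
from math import exp, sqrt
from statistics import NormalDist

nd = NormalDist()

def spend_obf(t, alpha=0.025):
    """O'Brien-Fleming-type spending: cumulative alpha spent at
    information fraction t (the form behind errfuncobf)."""
    return 2 * (1 - nd.cdf(nd.inv_cdf(1 - alpha / 2) / sqrt(t)))

def spend_gamma(t, err=0.20, gamma=-4):
    """Hwang-Shih-DeCani gamma-family spending (errfuncgamma)."""
    return err * (1 - exp(-gamma * t)) / (1 - exp(-gamma))

# At the 50% look, only about 0.0015 of the 0.025 alpha is spent;
# the implied interim efficacy boundary is roughly z = 2.963:
a1 = spend_obf(0.5)
print(round(a1, 4), round(nd.inv_cdf(1 - a1), 3))
# The gamma(-4) function spends only about 0.024 of the 0.20 beta early:
print(round(spend_gamma(0.5), 3))
```

Both functions are conservative at t = 0.5 and spend the full error by t = 1, which is what makes early stopping require compelling evidence.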

Here, the sample size is presented based on completers who are available for the primary endpoint evaluation. However, sample size savings should also account for overruns: patients who are enrolled between the time the last first-stage patient is enrolled and the completion of the interim analysis. The group sequential design does reduce the expected sample size; however, for populations and indications where the study enrolls very rapidly, or where the endpoint is observed after a delay, the potential benefit of the interim analysis is severely diminished. Table 3.2 summarizes the operating characteristics for the GSD with an initial total sample size of 3,532.

Table 3.2 Group Sequential Design with Improvement of 32%

True Percent   Prob. of Early           Prob. of Early           Average Sample       Overall
Improvement    Efficacy Stopping        Futility Stopping        Size                 Power
               Under H0    Under H1     Under H0    Under H1     Under H0  Under H1
32%            0.0016      0.160        0.518       0.026        2614      3205       80.0%
34%            0.0015      0.190        0.519       0.019        2611      3162       83.9%
36%            0.0017      0.224        0.522       0.014        2607      3111       87.6%
38%            0.0017      0.261        0.521       0.012        2606      3051       90.6%
40%            0.0017      0.297        0.522       0.008        2605      2992       93.1%
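The average sample size column is just the early stopping probabilities applied to the interim and final sample sizes. For instance, with no true improvement, a quick check in Python (numbers taken from the table; the variable names are ours):

```python
# 50% information look on an initial total of 3,532 subjects
n_interim, n_final = 1766, 3532
# probability of any early stop under H0: efficacy + futility
p_stop_h0 = 0.0016 + 0.518
# expected total sample size under H0
avg_n = p_stop_h0 * n_interim + (1 - p_stop_h0) * n_final
print(round(avg_n))   # 2614, matching Table 3.2
```

The same arithmetic with the stopping probabilities under the alternative recovers the second average sample size column (up to simulation rounding).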

Based on Table 3.2, this group sequential design has decent properties: it has a low probability of stopping the trial for efficacy under the null hypothesis, while there is more than a 50% probability of stopping for futility if there is no improvement. Readers are encouraged to assess different stopping boundaries to ensure operating characteristics that meet program needs.

Adaptive Design with Unblinded Sample Size Re-estimation

In this design there is an opportunity to start with the optimistic total sample size of 2,326 and invest adaptively if the trial is not stopped for futility or efficacy. In this specific example, neither of the classical designs gives us any flexibility: a trial with 2,326 subjects (powered for a 40% improvement) would be underpowered if the true improvement is 32%, while a trial with 3,532 subjects would be overpowered if the true improvement is 40%. As discussed earlier, the GSD suffers from a large initial commitment of cost, and insufficient gain if the trial recruits rapidly with a delayed endpoint. Some of these drawbacks of a GSD can be mitigated by an adaptive design with sample size re-estimation. It provides an opportunity to start with an initial sample size of 2,326 subjects, as in the optimistic scenario (design delta of 40%) of the classical design, and adapt at an interim analysis. The interim analysis will be conducted when about 50% of the subjects complete the induction period.

At the interim analysis, one of the following decisions would be made:

1. The trial would be stopped for futility (z1 < b1),

2. The trial would be stopped for efficacy (z1 > c1), or

3. The trial would continue with a sample size re-estimation if neither the futility nor the efficacy boundary is crossed (b1 ≤ z1 < c1).

Here z1 is the interim test statistic, and b1 and c1 are the futility and efficacy boundaries on the standardized scale, as in the %SSR_EFF macro below.

In this example, as in the GSD, we choose the interim futility boundary so that the study can be stopped at the interim if there is no observed difference between control and the test treatment, i.e., no improvement in the condition, while the interim efficacy boundary is chosen so that the trial can be stopped at the interim if the improvement (i.e., the interim estimate) is at least 40%. If the interim statistic falls between the futility and efficacy boundaries, the sample size is increased using the interim estimate, up to a maximum total sample size of 5,000 subjects. Type I error control at the final analysis is achieved using the weighted z-statistic as outlined in Cui, Hung, and Wang [22]. Table 3.3 summarizes the results for this design with an initial total sample size of 2,326.

Table 3.3 Sample Size Re-Estimation

True Percent   Prob. of Early           Prob. of Early           Average Sample       Overall
Improvement    Efficacy Stopping        Futility Stopping        Size                 Power
               Under H0    Under H1     Under H0    Under H1     Under H0  Under H1
32%            0.0017      0.087        0.523       0.059        2870      3377       80.0%
34%            0.0017      0.102        0.523       0.046        2869      3293       83.2%
36%            0.0023      0.119        0.526       0.038        2856      3196       86.5%
38%            0.0023      0.137        0.526       0.031        2856      3101       88.9%
40%            0.0023      0.158        0.526       0.0257       2855      2994       91.3%
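The re-estimation rule and the CHW combination behind Table 3.3 can be stated compactly: the total sample size is rescaled by the squared ratio of the design effect to the observed interim effect (bounded by the original plan and the 5,000-subject cap), and the final test combines the stage-wise z statistics with weights fixed by the original design. A Python sketch of both pieces (function names are ours, mirroring the %SSR_EFF logic):

```python
from math import sqrt

def reestimated_n(delta_design, delta_hat, n_fixed, n_max):
    """Sample size re-estimation: scale the planned total sample size by
    (design effect / observed effect)^2, bounded below by the original
    plan and above by the committed maximum."""
    ratio = abs(delta_design / (abs(delta_hat) + 1e-7))
    return min(n_max, max(n_fixed, ratio ** 2 * n_fixed))

def z_chw(z1, z2, t):
    """CHW weighted statistic with prespecified weights sqrt(t) and
    sqrt(1 - t), where t is the planned interim information fraction.
    Because the weights never change, the statistic is N(0, 1) under
    the null and the type I error is preserved."""
    return sqrt(t) * z1 + sqrt(1 - t) * z2

# An interim effect half the design effect would quadruple the sample
# size, truncated at the 5,000-subject cap:
print(reestimated_n(0.0288, 0.0144, 2326, 5000))   # -> 5000
# Equal-weight combination at a 50% interim:
print(round(z_chw(1.0, 2.0, 0.5), 4))              # -> 2.1213
```

Note that the weights come from the planned stage sizes, not the re-estimated ones; using the observed second-stage size in the weights is what would inflate the type I error.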

Comparing Table 3.2 and Table 3.3, the properties are similar in that the GSD's sample size saving comes through the average sample size; however, it requires a larger upfront commitment. Sample size re-estimation gives a little more flexibility with a smaller initial sample size commitment. The properties of the adaptive sample size re-estimation design can be improved further through the promising zone approach, which is discussed in subsequent sections. Tables 3.2 and 3.3 are generated using the SAS macro %SSR_EFF in Example Code 3.2.

Below we briefly summarize the macro. Its parameters are the following:

1. nSamp = number of simulations

2. alpha = desired type I error level

3. beta = desired type II error level

4. cont = control response to be simulated

5. trt = treatment response to be simulated

6. pc = assumed control response

7. ptrt = assumed treatment response

8. delta = difference between treatment and control response

9. Nmax = maximum sample size to be committed

10. r = randomization ratio of treatment and control

11. t = timing of interim analysis

12. c1 = efficacy boundary at the interim analysis on the standardized scale

13. c2 = efficacy boundary at the final analysis on the standardized scale

14. b1 = futility boundary at the interim analysis on the standardized scale

In the %SSR_EFF macro, Nmax is set to the initial total sample size of 3,532 for the group sequential design, while it is set to a total sample size of 5,000 for the sample size re-estimation. Download the sample code from the companion site to see the explicit macro calls that reproduce Tables 3.2 and 3.3.

Example Code 3.2 SAS Macro SSR_EFF

%macro SSR_EFF(nSamp=, alpha=, beta=, cont=, trt=, pc=, ptrt=, delta=,
               Nmax=, r=, t=, c1=, c2=, b1=, titl=);
data SSR_EFF;
   /*** r = randomization allocation: for total sample size n, the group
        sizes are n1 = (1 - r)*n and n2 = r*n, so that the fraction of
        subjects in the active arm is r ***/
   eSize=abs((&delta.)/((&pc.*(1-&pc.)+&ptrt.*(1-&ptrt.))/2)**0.5);
   nFixed=2*ceil(2*((probit(1-&alpha.)+probit(1-&beta.))/eSize)**2);
   /** Total sample size at the interim analysis **/
   n1=ceil(&t.*nFixed);
   n2=nFixed-n1;
   /* Prespecified CHW weights, fixed by the planned stage sizes */
   w1=sqrt(n1/(n1+n2));
   w2=sqrt(n2/(n1+n2));
   c_seed1=1736; t_seed1=6214;
   c_seed2=7869; t_seed2=9189; /* defined but not used below */
   do i=1 to &nSamp;
      /* Stage I data */
      n11=round((1-&r.)*n1);
      n12=round(&r.*n1);
      cont1=ranbin(c_seed1,n11,&cont.)/n11;
      trt1=ranbin(t_seed1,n12,&trt.)/n12;
      deltahat1=trt1-cont1;
      pbar1=(cont1*n11+trt1*n12)/(n11+n12);
      se1=sqrt(pbar1*(1-pbar1)*(1/n11+1/n12));
      z1=deltahat1/se1;
      improve=((trt1-cont1)/cont1)*100;
      rejectho=0; estop=0; fstop=0; power=0;
      if z1 > &c1. then do;            /* early efficacy stop */
         rejectho=1;
         estop=1;
         nfinal=n1;
      end;
      if z1 < &b1. then do;            /* early futility stop */
         rejectho=0;
         fstop=1;
         nfinal=n1;
      end;
      if &b1. <= z1 < &c1. then do;    /* continue: re-estimate sample size */
         eRatio=abs(&delta/(abs(deltahat1)+0.0000001));
         n_adj=(eRatio**2)*nFixed;
         nFinal=min(&Nmax,max(nFixed,n_adj));
      end;
      /***** Simulate data for Stage II *****/
      cont2=.;
      trt2=.;
      z2=.;
      zchw=z1;
      if nfinal > n1 then do;
         n21=round((1-&r.)*(nfinal-n1));
         n22=round(&r.*(nfinal-n1));
         cont2=ranbin(c_seed1,n21,&cont.)/n21;
         trt2=ranbin(t_seed1,n22,&trt.)/n22;
         deltahat2=trt2-cont2;
         pbar2=(cont2*n21+trt2*n22)/(n21+n22);
         se2=sqrt(pbar2*(1-pbar2)*(1/n21+1/n22));
         z2=deltahat2/se2;
         zchw=w1*z1+w2*z2;             /* CHW weighted statistic */
      end;
      if zchw > &c2. then rejectho=1;
      output;
   end;
run;

title "&titl";
proc means data=SSR_EFF;
   var rejectho nfinal estop fstop nfixed;
run;
%mend SSR_EFF;

%SSR_EFF(nSamp=100000, alpha=0.025, beta=0.2,
         cont=0.1188, trt=0.1188, pc=0.09, ptrt=0.1188,
         delta=0.0288, Nmax=3532, r=0.5, t=0.5,
         c1=2.963, c2=1.969, b1=0.011,
         titl=GSD with 32% Improvement under H0);
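For readers who want to cross-check the simulation outside SAS, the loop can be mirrored in plain Python. This is our condensed translation of the %SSR_EFF logic, not code from the book: binomial proportions are drawn via a normal approximation rather than RANBIN, and only the rejection rate and average final sample size are reported.

```python
from math import ceil, sqrt
from statistics import NormalDist
import random

def ssr_eff(n_samp, alpha, beta, cont, trt, pc, ptrt, delta, n_max,
            r=0.5, t=0.5, c1=2.963, c2=1.969, b1=0.011, seed=1736):
    """Condensed Python mirror of %SSR_EFF (an illustrative sketch)."""
    rng = random.Random(seed)
    nd = NormalDist()

    def phat(p, n):
        # observed response rate, normal approximation to the binomial
        return p + rng.gauss(0, 1) * sqrt(p * (1 - p) / n)

    def ztest(p_c, p_t, n_c, n_t):
        # pooled two-sample z statistic for proportions
        pbar = (p_c * n_c + p_t * n_t) / (n_c + n_t)
        se = sqrt(pbar * (1 - pbar) * (1 / n_c + 1 / n_t))
        return (p_t - p_c) / se

    e_size = abs(delta) / sqrt((pc * (1 - pc) + ptrt * (1 - ptrt)) / 2)
    n_fixed = 2 * ceil(2 * ((nd.inv_cdf(1 - alpha) + nd.inv_cdf(1 - beta)) / e_size) ** 2)
    n1 = ceil(t * n_fixed)                 # interim total sample size
    w1, w2 = sqrt(t), sqrt(1 - t)          # prespecified CHW weights
    rejections = total_n = 0
    for _ in range(n_samp):
        n11, n12 = round((1 - r) * n1), round(r * n1)
        pc1, pt1 = phat(cont, n11), phat(trt, n12)
        z1 = ztest(pc1, pt1, n11, n12)
        if z1 > c1:                        # early efficacy stop
            reject, n_final = True, n1
        elif z1 < b1:                      # early futility stop
            reject, n_final = False, n1
        else:                              # continue with re-estimation
            ratio = abs(delta / (abs(pt1 - pc1) + 1e-7))
            n_final = min(n_max, max(n_fixed, ratio ** 2 * n_fixed))
            n21 = round((1 - r) * (n_final - n1))
            n22 = round(r * (n_final - n1))
            z2 = ztest(phat(cont, n21), phat(trt, n22), n21, n22)
            reject = w1 * z1 + w2 * z2 > c2    # CHW combination test
        rejections += reject
        total_n += n_final
    return rejections / n_samp, total_n / n_samp

# Null configuration from the book's GSD macro call (both arms at 11.88%):
rate, avg_n = ssr_eff(2000, 0.025, 0.2, 0.1188, 0.1188,
                      0.09, 0.1188, 0.0288, n_max=3532)
print(f"empirical type I error {rate:.3f}, average N {avg_n:.0f}")
```

With the null configuration above, the empirical rejection rate stays near (in fact slightly below, since the futility stop is enforced) the nominal one-sided 0.025, and the average sample size lands close to the corresponding Table 3.2 entry.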

