In most clinical trials the group of patients (or sample) who participate is just a small portion of a heterogeneous patient population with the intended disease. As indicated earlier, a well-controlled randomized clinical trial is necessary to provide an unbiased and valid assessment of the study medicine. Such a trial is conducted under well-controlled experimental conditions, which are usually very different from a physician’s best clinical practice. It is therefore a concern whether the clinical results observed in a well-controlled randomized clinical trial can be applied to the patient population with the disease. As a result, the feasibility and generalization of well-controlled randomized trials have become an important issue in public health (Rubins, 1994). For illustration purposes, consider the following two examples.
In the early 1970s, a high cholesterol level was known to be a risk factor for developing coronary heart disease. To confirm this, a trial known as the Lipids Research Clinics Coronary Primary Prevention Trials (CPPT) was initiated by the National Heart, Lung, and Blood Institute to test the hypothesis that lowering cholesterol can prevent the development of coronary heart disease. In the CPPT trial, a total of 4000 healthy, middle-aged males were randomized to receive either the cholesterol-lowering agent cholestyramine or its matching placebo (Lipids Research Clinics Program, 1984). The primary endpoint was the incidence of coronary heart disease after a seven-year follow-up. A statistically significant reduction of 1.7 percentage points in the 7-year incidence of coronary heart disease was observed for the cholestyramine group as compared to the placebo group (8.1% versus 9.8%). An expert panel recommended extrapolating the results to the treatment of high cholesterol in populations that had never been studied and in which the benefit had not yet been demonstrated (The Expert Panel, 1989; Recommendations for the Treatment of Hypercholesterolemia, 1984). Moore (1989), however, raises a serious doubt regarding the expert panel’s recommendation for the treatment of patients with high cholesterol levels. Moore points out that the CPPT trial was conducted on middle-aged males, so its results cannot be applied to a general patient population with hypercholesterolemia. Another example concerning the generalization of controlled randomized trials is the U.S. Physician’s Health Study described earlier. The question is whether the benefit regarding fatal and nonfatal coronary heart disease, which was observed in 22,000 highly educated males aged over 40 years, can also be observed in an average individual regardless of gender, race, education, and socioeconomic background. This question is indeed a tough one to answer.
We can address the question in part by performing a subgroup analysis with respect to the composition of the patients in the trial. This study led to the United States Congress passing legislation (National Institutes of Health Reauthorization Bill, 1993) which requires that the composition of the study population be specified for any human study sponsored by the NIH. More details can be found in Wittes (1994).
GENERALIZATION OF CONTROLLED RANDOMIZED TRIALS
One way to ensure the generalization of controlled randomized trials is to understand the process for drawing statistical and clinical inference. Basically, statistical and clinical inference for the generalization of results obtained from clinical trials to other patients is a two-step process. The first step is to internally apply the statistical and clinical inference on the targeted population to other patients within the population. The second step is to externally generalize the statistical/clinical inference made on the targeted population to another patient population with different characteristics. These steps involve the concepts of population efficacy (or safety), individual efficacy (or safety), reproducibility, and generalizability, which will be illustrated below.
Note that the current practice in clinical trials is to compare the distribution of the clinical responses observed from patients under a test therapy with that observed under a standard (or reference) therapy or a placebo. This concept is referred to as population efficacy (or safety).
Suppose that the distribution of a clinical response can be adequately described by a normal probability distribution. Then the population efficacy can be assessed through the comparison of the first two moments of the distributions between the test and the reference therapies. This is because a normal distribution is uniquely determined by its first two moments.
The comparison of the first moments of the efficacy endpoints for the two therapies is usually referred to as average efficacy, while the comparison of the second moments is called the variability of efficacy. To provide a better understanding of average efficacy and variability of efficacy, comparisons of averages and variabilities are illustrated in Figures 4.5.1 through 4.5.3. For example, consider comparing the reduction in diastolic blood pressure to evaluate a new antihypertensive agent against a placebo. Figure 4.5.1 shows two distributions that are very close in both average and variability, which indicates that there is no difference between the new agent and the placebo in either the average or the variability of the reduction in diastolic blood pressure. Therefore the new agent may not be efficacious. On the other hand, Figure 4.5.2 demonstrates that the new agent is more effective in reducing blood pressure. Note that in most clinical trials with continuous primary endpoints, the objectives are often formulated as hypotheses for testing the average efficacy. As a result, the population efficacy of the new therapy is often assessed through the average efficacy under the assumption of equal variability of efficacy. This assumption, which should be verified, is often ignored by both clinicians and biostatisticians. As illustrated in Figure 4.5.3, it is not uncommon that the new agent shows a better efficacy than the placebo and yet exhibits a much larger variability. Since the large variability of the new agent may cause a safety concern, it is recommended that the possible causes of the large variability be carefully examined. A large variability may be due to differences in the composition of patients, such as biological variation between two populations. This will certainly have an impact on the generalization of the results to other populations.
Figure 4.5.1 Population efficacy in averages and variabilities.
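The comparison of averages and variabilities described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `population_efficacy_summary`, the use of Welch's t statistic (chosen here because it does not rely on equal variabilities), and the blood pressure values are all hypothetical and are not taken from the text.

```python
from math import sqrt
from statistics import mean, variance


def population_efficacy_summary(test, reference):
    """Compare two samples of a continuous response by their first two moments."""
    n_t, n_r = len(test), len(reference)
    var_t, var_r = variance(test), variance(reference)  # sample variances (n - 1)

    # Average efficacy: difference in sample means (test minus reference).
    mean_diff = mean(test) - mean(reference)
    # Variability of efficacy: ratio of sample variances (test over reference).
    var_ratio = var_t / var_r

    # Welch's t statistic, which does not assume equal variances.
    se = sqrt(var_t / n_t + var_r / n_r)
    welch_t = mean_diff / se
    # Welch-Satterthwaite approximate degrees of freedom.
    welch_df = (var_t / n_t + var_r / n_r) ** 2 / (
        (var_t / n_t) ** 2 / (n_t - 1) + (var_r / n_r) ** 2 / (n_r - 1)
    )
    return {
        "mean_diff": mean_diff,
        "var_ratio": var_ratio,
        "welch_t": welch_t,
        "welch_df": welch_df,
    }


# Hypothetical reductions in diastolic blood pressure (mmHg), made up for illustration.
new_agent = [12.0, 4.0, 18.0, 9.0, 15.0, 2.0, 20.0, 8.0]
placebo = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 4.0, 3.0]
summary = population_efficacy_summary(new_agent, placebo)
print(summary)
```

A variance ratio far from 1, as in these made-up data, corresponds to the situation of Figure 4.5.3: the new agent looks better on average but is much more variable, so the equal-variability assumption behind a pooled analysis would be questionable.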
For population efficacy (or safety), we might first generalize the results to similar but slightly different populations and then, in stages, to much different populations. This concept of generalization is illustrated in Figure 4.5.4 as similarity circles. The strength of the generalization is assessed by the distance between any two points within the circle. Note that the distance is a measure of similarity between populations, which is a function of factors such as basic science, animal models, biological variation, and results from other types of studies.
Figure 4.5.2 Population efficacy in averages and variabilities. Unequal averages and equal variabilities.
Figure 4.5.3 Population efficacy in averages and variabilities. Superior average efficacy and unequal variabilities.
Figure 4.5.4 Generalization of clinical results as similarity circle.
Note that the establishment of population efficacy does not guarantee that the results can be generalized to a patient with his or her own biological and genetic makeup, educational status, and socioeconomic status who is cared for by a particular physician at a different geographical location. The reason is that the efficacy is not established within the patient. The comparison between the two distributions of the primary efficacy (or safety) endpoints obtained from the same patient under repeated administrations of the new agent and the reference is called individual efficacy (or safety). The concept of individual efficacy is not new and has been advocated by many clinical researchers. See, for example, Guyatt et al. (1986) and Sackett (1989).
Guyatt et al. (1986) attempt to evaluate the individual efficacy of theophylline through an N-of-1 randomized trial concerning a patient with asthma. The N-of-1 randomized trial was conducted under the following assumptions and procedures. The patient and his or her attending physician first determined symptoms such as shortness of breath during ordinary daily activities, nocturnal spasms of dyspnea, and coughing as the primary clinical responses to treatment with theophylline. The patient agreed to record standardized measures of the severity of these symptoms. It was also decided that a 10-day treatment would be long enough to evaluate the effectiveness of the treatments. The N-of-1 randomized trial was performed in a double-blind fashion with randomization of the order of treatments. At the end of each pair of treatment periods, the patient and physician met to examine the results (also in a blinded fashion) and decided whether to stop or to continue with another pair of treatments. After administration of two pairs of treatments in a blinded and random fashion, the analysis detected a statistically significant difference between the treatments. When the randomization codes were unblinded, it was found that the patient was better on placebo than on theophylline.
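The randomization of treatment order within each pair of treatment periods can be sketched as follows. This is a minimal illustration, not the procedure actually used by Guyatt et al. (1986); the function name `randomize_treatment_pairs`, the treatment labels, and the seed are assumptions introduced here.

```python
import random


def randomize_treatment_pairs(n_pairs, treatments=("theophylline", "placebo"), seed=None):
    """Return a list of treatment-period pairs, each in an independently randomized order."""
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible
    schedule = []
    for _ in range(n_pairs):
        pair = list(treatments)
        rng.shuffle(pair)  # randomize which treatment comes first in this pair
        schedule.append(tuple(pair))
    return schedule


# Two pairs of treatment periods, as in the example described above; the seed is arbitrary.
schedule = randomize_treatment_pairs(n_pairs=2, seed=2024)
for pair_number, (first, second) in enumerate(schedule, start=1):
    print(f"Pair {pair_number}: period 1 = {first}, period 2 = {second}")
```

In a double-blind trial the resulting schedule would be kept from both the patient and the physician until the codes are unblinded.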
Repeated administrations of the test and reference therapies within the same patient make it possible to compare the distributions of the primary clinical endpoints within the same individual. If we perform this type of trial over N patients in a similar manner, then a total of N pairs of distributions of the test and reference therapies can be generated.
Consequently both population and individual efficacy can be assessed based on these N pairs of distributions. First, within each individual, the individual average efficacy of the test therapy is assessed as the difference between the averages of the two distributions. In addition, the individual variability of efficacy can be evaluated as the ratio of the intrapatient variabilities of the distributions obtained under the two treatments from the same patient. As a result, individual efficacy can be evaluated by comparing the averages and variabilities of the two distributions obtained from the same patient. Since the individual average efficacy and variability of efficacy are obtained from all N patients, we can perform a statistical test to see whether the individual average efficacy and individual intrapatient variability are homogeneous across these N patients. A lack of homogeneity of individual average efficacy or of individual intrapatient variability is referred to as patient-by-treatment interaction for average or variability, respectively. A patient-by-treatment interaction implies that the relative efficacy of the test therapy varies from patient to patient. Therefore, if a patient-by-treatment interaction is found, then the relative efficacy of the test agent must be assessed individually for each patient, that is, as individual efficacy. On the other hand, if the relative efficacy is homogeneous across patients, then the information on individual average efficacy and individual intrapatient variability can be combined over the N patients to provide a basis for population efficacy. The concepts of population and individual efficacy are motivated by population and individual bioequivalence (e.g., see Chow and Liu, 1995b, 2000), which are important concepts for the evaluation of bioequivalence between a brand-name drug product and its generic copies.
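The per-patient quantities described above, and one possible check of their homogeneity, can be sketched as follows. The text does not specify which homogeneity test is used, so the Cochran-type Q statistic below is a substituted, commonly used choice and should be read as an assumption. The function names and the symptom scores are hypothetical, and only the homogeneity of the average differences is examined; the variance ratios would need a separate test.

```python
from statistics import mean, variance


def individual_efficacy(test, reference):
    """Per-patient summary from repeated measurements under each treatment."""
    return {
        # Individual average efficacy: difference between the two averages.
        "mean_diff": mean(test) - mean(reference),
        # Individual variability of efficacy: ratio of intrapatient variances.
        "var_ratio": variance(test) / variance(reference),
        # Squared standard error of the mean difference (independent periods assumed).
        "se2": variance(test) / len(test) + variance(reference) / len(reference),
    }


def homogeneity_q(summaries):
    """Cochran-type Q statistic for homogeneity of the individual mean differences."""
    weights = [1.0 / s["se2"] for s in summaries]
    diffs = [s["mean_diff"] for s in summaries]
    pooled = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, diffs))
    return pooled, q, len(summaries) - 1  # pooled estimate, Q, degrees of freedom


# Hypothetical symptom scores for N = 3 patients: (test, reference) per patient.
patients = [
    ([5.0, 6.0, 7.0], [3.0, 4.0, 5.0]),
    ([6.0, 8.0, 7.0], [4.0, 5.0, 6.0]),
    ([2.0, 3.0, 4.0], [6.0, 7.0, 8.0]),
]
summaries = [individual_efficacy(t, r) for t, r in patients]
pooled, q, df = homogeneity_q(summaries)
print(pooled, q, df)
```

Under homogeneity, Q is approximately chi-square distributed with N - 1 degrees of freedom, although the approximation is crude with so few measurements per patient. A large Q relative to that reference distribution would suggest a patient-by-treatment interaction for the average, in which case pooling the individual differences into a single population estimate would not be appropriate.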
However, the concept of individual efficacy (or safety) has not yet been accepted by the clinical/medical community. Guyatt et al. (1986) point out that the limitations of individual efficacy include that (1) it cannot be applied to a disease that can be cured in a short period of time, and (2) it cannot be assessed with hard clinical endpoints such as death or other irreversible conditions.