TPA was initially tested against a placebo; later, citing ethical reasons, the researchers dropped the placebo and now all heart patients in the TIMI trial are receiving TPA.
It is for these reasons that we call TPA the most noteworthy unavailable drug in the U.S. The FDA may believe it is already moving faster than usual with the manufacturer's new-drug application. Nonetheless, bureaucratic progress [sic] must be measured against the real-world costs of keeping this substance out of the nation's emergency rooms. The personal, social and economic consequences of heart disease in this country are immense. The American Heart Association estimates the total costs of providing medical services for all cardiovascular disease at $71 billion annually.

By now more than 4,000 patients have been treated with TPA in clinical trials. With well over a thousand Americans going to their deaths each day from heart attack, it is hard to see what additional data can justify the government's further delay in making a decision about this drug. If tomorrow's meeting of the FDA's cardio-renal advisory committee only results in more temporizing, some in Congress or at the White House should get on the phone and demand that the American public be given a reason for this delay.
The publicity before the meeting of the advisory committee was quite unusual since companies are prohibited from preapproval advertising; thus the impetus presumably came from other sources.

The cardiorenal advisory committee members met and considered the two thrombolytic drugs, streptokinase and tPA. They voted to recommend approval of streptokinase but felt that further data were needed before tPA could be approved. The reactions to the decision were extreme, but probably predictable given the positions expressed prior to the meeting.
The Wall Street Journal responded with an editorial on Tuesday, June 2, 1987, entitled
“Human Sacrifice.” It follows in its entirety:
Last Friday an advisory panel of the Food and Drug Administration decided to sacrifice thousands
of American lives on an altar of pedantry.
Under the klieg lights of a packed hearing room at the FDA, an advisory panel picked by the agency's Center for Drugs and Biologics declined to recommend approval of TPA, a drug that dissolves blood clots after heart attacks. In a 1985 multicenter study conducted by the U.S. National Heart, Lung and Blood Institute, TPA was so conclusively effective at this that the trial was stopped. The decision to withhold it from patients should be properly viewed as throwing U.S. medical research into a major crisis.
Heart disease dwarfs all other causes of death in the industrialized world, with some 500,000 Americans killed annually; by comparison, some 20,000 have died of AIDS. More than a thousand lives are being destroyed by heart attacks every day. In turning down treatment with TPA, the committee didn't dispute that TPA breaks up the blood clots impeding blood flow to the heart. But the committee asked that Genentech, which makes the genetically engineered drug, collect some more mortality data. Its submission didn't include enough statistics to prove to the panel that dissolving blood clots actually helps people with heart attacks.

Yet on Friday, the panel also approved a new procedure for streptokinase, the less effective clot dissolver—or thrombolytic agent—currently in use. Streptokinase previously had been approved for use in an expensive, specialized procedure called intracoronary infusion. An Italian study, involving 11,712 randomized heart patients at 176 coronary-care units in 1984–1985, concluded that administering streptokinase intravenously reduced deaths by 18%. So the advisory panel decided to approve intravenous streptokinase, but not approve the superior thrombolytic TPA. This is absurd.
Indeed, the panel's suggestion that it is necessary to establish the efficacy of thrombolysis stunned specialists in heart disease. Asked about the committee's justification for its decision, Dr. Eugene Braunwald, chairman of Harvard Medical School's department of medicine, told us: "The real question is, do you accept the proposition that the proximate cause of a heart attack is a blood clot in the coronary artery? The evidence is overwhelming, overwhelming. It is sound, basic medical knowledge. It is in every textbook of medicine. It has been firmly established in the past decade beyond any reasonable question. If you accept the fact that a drug [TPA] is twice as effective as streptokinase in opening closed vessels, and has a good safety profile, then I find it baffling how that drug was not recommended for approval."
Patients will die who would otherwise live longer. Medical research has allowed statistics to become the supreme judge of its inventions. The FDA, in particular its bureau of drugs under Robert Temple, has driven that system to its absurd extreme. The system now serves itself first and people later. Data supersede the dying.
The advisory panel's suggestion that TPA's sponsor conduct further mortality studies poses grave ethical questions. On the basis of what medicine already knows about TPA, what U.S. doctor will give a randomized placebo or even streptokinase? We'll put it bluntly: Are American doctors going to let people die to satisfy the bureau of drugs' chi-square studies?
Friday's TPA decision should finally alert policy makers in Washington and the medical-research community that the theories and practices now controlling drug approval in this country are significantly flawed and need to be rethought. Something has gone grievously wrong in the FDA bureaucracy. As an interim measure FDA Commissioner Frank Young, with Genentech's assent, could approve TPA under the agency's new experimental drug rules. Better still, Dr. Young should take the matter in hand, repudiate the panel's finding and force an immediate reconsideration. Moreover, it is about time Dr. Young received the clear, public support of Health and Human Services Secretary Dr. Otis Bowen in his efforts to fix the FDA.
If on the other hand Drs. Young and Bowen insist that the actions of bureaucrats are beyond challenge, then perhaps each of them should volunteer to personally administer the first randomized mortality trials of heart-attack victims receiving the TPA clot buster or nothing. Alternatively, coronary-care units receiving heart-attack victims might use a telephone hotline to ask Dr. Temple to randomize the trial himself by flipping a coin for each patient. The gods of pedantry are demanding more sacrifice.

Soon after joining the Cardiovascular and Renal Drugs Advisory Committee, L.F. noticed that a number of people left the room at what seemed inappropriate times, near the end of some advisory deliberations. I was informed that often, stock analysts with expertise in the pharmaceutical industry attended meetings about key drugs; when the analysts thought they knew how the vote was going to turn out, they went out to the phones to send instructions. That was the case during the tPA deliberations (and made it particularly appropriate that the Wall Street Journal take an interest in the result). Again we convey the effect of the deliberations through quotations taken from the press. On June 1, 1987, the Wall Street Journal had an article under the heading "FDA Panel Rejection of Anti-Clot Drug Set Genentech Back Months, Perils Stock." The article said in part:
A Food and Drug Administration advisory panel rejected licensing the medication TPA, spoiling the summer debut of what was touted as biotechnology's first billion-dollar drug. Genentech's stock—which reached a high in March of $64.50 following a 2-for-1 split—closed Friday at $48.25, off $2.75, in national over-the-counter trading, even before the close of the FDA panel hearing attended by more than 400 watchful analysts, scientists and competitors. Some analysts expect the shares to drop today. Wall Street bulls will also be rethinking their forecasts. For example, Kidder Peabody & Co.'s Peter Drake, confident of TPA's approval, last week predicted sales of $51 million in the second half of 1987, rising steeply to $205 million in 1988, $490 million in 1989 and more thereafter. On Monday one analyst cut his rating on the group from buy to "unattractive." His reasoning: The stocks are driven by "a blend of psychology and product possibilities. And right now, the psychology is terrible."
Biotechnology stocks as a group dropped with the Genentech panel vote. This seemed strange to me because the panel had not indicated that the drug, tPA, was bad but only that in a number of areas the data needed to be gathered and analyzed more appropriately (as described below). The panel was certainly not down on thrombolysis (as the streptokinase approval showed); it felt that the risk/benefit ratio of tPA needed to be clarified before approval could be made.
The advisory committee members replied to the Wall Street Journal editorials both individually and in groups, explaining the reasons for the decision [Borer, 1987; Kowey et al., 1988; Fisher et al., 1987]. This last response to the Wall Street Journal was submitted with the title "The Prolongation of Human Life"; however, after the review of the article by the editor, the title was changed by the Wall Street Journal to "The FDA Cardio-Renal Committee Replies." The reply:
The evaluation and licensing of new drugs is a topic of legitimate concern to not only the medical profession but our entire populace. Thus it is appropriate when the media, such as the Wall Street Journal, take an interest in these matters. The Food and Drug Administration recognizes the public interest by holding open meetings of advisory committees that review material presented by pharmaceutical companies, listen to expert opinions, listen to public comment from the floor and then give advice to the FDA. The Cardiovascular and Renal Drugs Advisory Committee met on May 29 to consider two drugs to dissolve blood clots causing heart attacks. The Journal published editorials prior to the meeting ("The TPA Decision," May 28) and after the meeting ("Human Sacrifice," June 2, and "The Flat Earth Committee," July 13). The second editorial began with the sentence: "Last Friday an advisory committee of the Food and Drug Administration decided to sacrifice thousands of American lives on an altar of pedantry." How can such decisions occur in our time? This reply by members of the advisory panel presents another side to the story. In part the reply is technical, although we have tried to simplify it. We first discuss drug evaluation in general and then turn to the specific issues involved in the evaluation of the thrombolytic drugs streptokinase and TPA.

The history of medicine has numerous instances of well-meaning physicians giving drugs and treatments that were harmful rather than beneficial. For example, the drug thalidomide was widely marketed in many countries—and in West Germany without a prescription—in the late 1950s and early 1960s. The drug was considered a safe and effective sleeping pill and tranquilizer. Marketing was delayed in the U.S. despite considerable pressure from the manufacturer upon the FDA. The drug was subsequently shown to cause birth defects and thousands of babies world-wide were born with grotesque malformations, including seal-like appendages and lack of limbs. The FDA physician who did not approve the drug in the U.S. received an award from President Kennedy. One can hardly argue with the benefit of careful evaluation in this case. We present this, not as a parallel to TPA, but
to point out that there are two sides to the approval coin—early approval of a good drug, with minimal supporting data, looks wise in retrospect; early approval, with minimal supporting data, of a poor drug appears extremely unwise in retrospect. Without adequate and well-controlled data one cannot distinguish between the two cases. Even with the best available data, drugs are sometimes found to have adverse effects that were not anticipated. Acceptance of unusually modest amounts of data, based on assumptions and expectations rather than actual observation, is very risky. As will be explained below, the committee concluded there were major gaps in the data available to evaluate TPA.

The second editorial states that "Medical research has allowed statistics to become the supreme judge of its inventions." If this means that data are required, we agree; people evaluate new therapies with the hope that they are effective—again, before licensing, proof of effectiveness and efficacy is needed. If the editorial meant that the TPA decision turned on some arcane mathematical issue, it is incorrect. Review of the transcript shows that statistical issues played no substantial role.
We now turn to the drug of discussion, TPA. Heart attacks are usually caused by a "blood clot in an artery supplying the heart muscle with blood." The editorial quotes Dr. Eugene Braunwald, "The real question is, do you accept the proposition that the proximate cause of a heart attack is a blood clot in the coronary artery?" We accept the statement, but there is still a significant question: "What can one then do to benefit the victim?" It is not obvious that modifying the cause after the event occurs is in the patient's best interest, especially when the intervention has toxicity of its own. Blood clots cause pulmonary embolism; it is the unusual patient who requires dissolution of the clot by streptokinase. Several trials show the benefit does not outweigh the risk.
On May 29 the Cardiovascular and Renal Drugs Advisory Committee reviewed two drugs that "dissolve" blood clots. The drug streptokinase had been tested in a randomized clinical trial in Italy involving 11,806 patients. The death rate in those treated with streptokinase was 18% lower than in patients not given streptokinase; patients treated within six hours did even better. Review of 10 smaller studies, and early results of a large international study, also showed improved survival. It is important to know that the 18% reduction in death rate is a reduction of a few percent of the patients studied. The second drug considered—recombinant tissue plasminogen activator (TPA)—which also was clearly shown to dissolve blood clots, was not approved. Why? At least five issues contributed, to a greater or lesser amount, to the vote not to recommend approval for TPA at this time. These issues were: the safety of the drug, the completeness and adequacy of the data presented, the manufacturing process used, the dose to be used, and the mechanism of action by which streptokinase (and hopefully TPA) saves lives.

Safety was the first and most important issue concerning TPA. Two formulations of TPA were studied at various doses; the highest dose was 150 milligrams. At this dose there was an unacceptable incidence of cerebral hemorrhage (that is, bleeding in the brain), in many cases leading to both severe stroke and death. The incidence may be as high as 4% or as low as 1.5% to 2% (incomplete data at the meeting made it difficult to be sure of the exact figure), but in either case it is disturbingly high; this death rate due to side effects is of the same magnitude as the lives saved by streptokinase. This finding led the National Heart, Lung and Blood Institute to stop the 150-milligram treatment in a clinical trial. It is important to realize that this finding was unexpected, as TPA was thought to be relatively unlikely to cause such bleeding. Because of bleeding, the dose of TPA recommended by Genentech was reduced to 100 milligrams. The safety profile at doses of 100 milligrams looks better, but there were questions of exactly how many patients had been treated and evaluated fully. Relatively few patients getting this dose had been reported in full. Without complete reports from the studies there could be smaller strokes not reported and uncertainty as to how patients were examined. The committee felt a substantially larger database was needed to show safety.
The TPA used to evaluate the drug was manufactured by two processes. Early studies used the double-stranded (roller bottle) form of the drug; the sponsor then changed to a predominantly single-stranded form (suspension culture method) for marketing and production reasons. The second drug differed from the first in how long the drug remained in the blood, in peak effect, in the effect on fibrinogen and in the dose needed to cause lysis of clots. Much of the data was from the early form; these data were not considered very helpful with respect to the safety of the recommended dose of the suspension method drug. This could perhaps be debated, but the intracranial bleeding makes the issue an important one. The excessive bleeding may well prove to be a simple matter of excessive dose, but this is not yet known unequivocally.
Data were incomplete in that many of the patients' data had not been submitted yet and much of the data came from treatment with TPA made by the early method of manufacture. There was uncertainty about the data used to choose the 100-milligram dose, i.e., perhaps a lower dose is adequate. When there is a serious dose-related side effect it is crucial that the dose needed for effectiveness has been well-defined and has acceptable toxicity.
Let us turn to the mechanism of action, the means by which the beneficial effect occurs. There may be a number of mechanisms. The most compelling is clot lysis (dissolution). However, experts presented data that streptokinase changes the viscosity of the blood, which could improve the blood flow; the importance is uncertain. Streptokinase also lowers blood pressure, which may decrease tissue damage during a heart attack. While there is convincing evidence that TPA (at least by the first method of manufacture) dissolves clots faster than streptokinase (at least after a few hours from the onset of the heart attack), we do not have adequate knowledge to know what portion of the benefit of streptokinase comes from dissolving the clot. TPA, thus, may differ in its effect on the heart or on survival. The drugs could differ in other respects, such as how often after opening a vessel they allow reclosure, and, of course, the frequency of important adverse effects.
These issues delay possible approval. Fortunately, more data are being collected. It is our sincere hope that the drug lives up to its promise, but should the drug prove as valuable as hoped, that would not imply the decision was wrong. The decision must be evaluated as part of the overall process of drug approval.
The second editorial suggests that if the drug is not approved, Dr. Temple (director of the Bureau of Drugs, FDA), Dr. Young (FDA commissioner) and Dr. Bowen (secretary of health and human services) should administer "randomized mortality trials of heart-attack victims receiving the TPA clot buster or nothing." This indignant rhetoric seems inappropriate on several counts. First, the advisory committee has no FDA members; our votes are independent and in the past, on occasion, we have voted against the FDA's position. It is particularly inappropriate to criticize Drs. Temple and Young for the action of an independent group. The decision (by a vote of eight against approval, one for and two abstaining) was made by an independent panel of experts in cardiovascular medicine and research from excellent institutions. These unbiased experts reviewed the data presented and arrived at this decision; the FDA deserves no credit or blame. Second, we recommend approval of streptokinase; we are convinced that the drug saves lives of heart-attack victims (at least in the short term). To us it would be questionable to participate in a trial without some treatment in patients of the type shown to benefit from streptokinase. A better approach is to use streptokinase as an active control drug in a randomized trial. If TPA is as efficacious as or better than streptokinase, we will rejoice. We have spent our adult lives in the care of patients and/or research to develop better methods for treatment. Both for our patients and our friends, our families and ourselves, we want proven beneficial drugs available.
In summary, with all good therapeutic modalities the benefits must surely outweigh the risks of treatment. In interpreting the data presented by Genentech in May 1987, the majority of the Cardiovascular and Renal Drugs Advisory Committee members could not confidently identify significant benefits without concomitant significant risk. The review was clouded by issues of safety, manufacturing process, dose size and the mechanism of action. We are hopeful these issues will be addressed quickly, allowing more accurate assessment of TPA's risk–benefit ratio with conclusive evidence that treatment can be recommended that allows us to uphold the physician's credo, primum non nocere (first do no harm).
The July 28, 1987, USA Today Life section carried an article on the first page entitled "FDA Speeds Approval of Heart Drug." The article mentioned that FDA commissioner Frank Young was involved in the data gathering. Within a few months of the advisory committee meeting, tPA was approved for use in treating myocardial infarctions. The drug was 5 to 10 times more expensive than streptokinase; however, it opened arteries faster and that was thought to be a potential advantage. A large randomized comparison of streptokinase and tPA was performed (ISIS-3); the preliminary results were presented at the November 1990 American Heart Association meeting. The conclusion was that the efficacy of the two drugs was essentially equivalent. Thus, even in retrospect, approving streptokinase while taking the time needed to clear up the questions about tPA did not leave patients without a clearly superior drug. This experience shows that biostatistical collaboration has consequences above and beyond the scientific and humanitarian aspects; large political and financial issues are often involved as well.
20.4 OH, MY ACHING BACK!

One of the most common maladies in the industrialized world is the occurrence of low-back problems. By the age of 50, nearly 85% of humans can recall back symptoms; and as someone has said, the other 15% probably forgot. Among persons in the United States, back and spine impairments are the chronic conditions that most frequently cause activity limitation. The occurrence of industrial back disability is one of the most expensive health problems afflicting industry and its employees. The cost associated with back injury in 1976 was $14 billion; the costs are greatly skewed, with a relatively low percent of the cost accrued by a few chronic back injury cases [Spengler et al., 1986]. The costs and human price associated with industrial back injury prompted the Boeing Company to contact the orthopedics department at the University of Washington to institute a collaborative study of back injury at a Boeing factory in western Washington State. Collaboration was obtained from the Boeing company management, the workers and their unions, and a research group at the University of Washington (including one of the authors, L.F.). The study was supported financially by the National Institutes of Health, the National Institute for Occupational Safety and Health, the Volvo Foundation, and the Boeing Company.

The study was designed in two phases. The first phase was a retrospective analysis of past back injury reports and insurance costs from already existing Boeing records; the second phase was a prospective study looking at a variety of possible predictors (described below) of industrial back injury. The retrospective Boeing data were analyzed and presented in a series of three papers [Spengler et al., 1986; Bigos et al., 1986a,b]. The analysis covered 31,200 employees who reported 900 back injuries among 4645 claims filed by 3958 different employees. The data emphasized the cost to Boeing of this malady and, as in previous studies, showed that a small percentage of the back injury reports lead to most of the cost; for example, 10% of the cases accounted for 79% of the cost. The incurred costs of back injury claims were 41% of the Boeing total, although only 19% of the claims were for the back. The most expensive 10% of the back injury claims accounted for 32% of all the Boeing injury claims. Workers were more likely to have reported an acute back injury if they had a poor employee appraisal rating from their supervisor within 6 months prior to the injury.
The prospective study was unique and had some very interesting findings (the investigators were awarded the highest award of the American Academy of Orthopedic Surgeons, the Kappa Delta award, for excellence in orthopedic research). Based on previously published results and investigator conjectures, data were collected in a number of areas with potential ability to predict reports of industrial back injury. Among the information obtained prospectively from the 3020 aircraft employees who volunteered to participate in the study were the following:

• Demographics: race, age, gender, total education, marital status, number in family, and method of, and time spent in, commuting to work.
• Medical history: questions about treatment for back pain by physicians and by chiropractors; hospitalization for back pain; surgery for back injury; smoking status.
• Physical examination: flexibility; spinal canal size by ultrasonography; and anthropometric measures such as height and weight.
• Physical capacities: arm strength; leg strength; and aerobic capacity measured by a submaximal treadmill test.
• Psychological testing: the MMPI (Minnesota Multiphasic Personality Inventory) and its subscales; a schedule of recent life change events; a family questionnaire about interactions at home; a health locus of control questionnaire.
• Job satisfaction: subjects were asked a number of questions about their job: did they enjoy their job almost always, some of the time, hardly ever; do they get along well with their supervisor; do they get along well with their fellow employees; etc.
The details of the design and many of the study results may be found in Battie et al. [1989, 1990a,b] and Bigos et al. [1991, 1992a,b]. The extensive psychological questionnaires were given to the employees to be taken home and filled out; 54% of the 3020 employees returned completed questionnaires, and some data analyses were necessarily restricted to those who completed the questionnaire(s). Figure 20.4 summarizes graphically some of the important predictive results.

The results of several stepwise, step-up multivariate Cox models are presented in Table 20.1. There are some substantial risk gradients among the employees. However, the predictive power is not such that one can conclusively identify employees likely to file an acute industrial back injury report. Of more importance, given the traditional approaches to this field, which have been largely biomechanical, work perception and psychological variables are important predictors, and the problem cannot be addressed effectively with only one factor in mind. This is emphasized in Figure 20.5, which represents the amount of information (in a formal sense) in each of the categories of variables as given above. The figure is a Venn diagram of the estimated amount of predictive information for variables in each of the data collection areas [Fisher and Zeh, 1991]. The job perception and psychological areas are about as important as the medical history and physical examination areas. To truly understand industrial back injury, a multifactorial approach must be used.
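To make the modeling concrete, the following is a minimal sketch in Python of a proportional hazards fit of the same general form as the models reported in Table 20.1. The data are simulated; the variable names (prior_back_treatment, mmpi_scale3, enjoys_job_hardly_ever) and all effect sizes are invented stand-ins for the study's predictors, not its actual coding or results.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Simulated cohort loosely mimicking the structure of the prospective study:
# one row per employee, follow-up time in months, and an indicator for an
# acute back injury report.  All names and effects are invented.
rng = np.random.default_rng(1)
n = 3020
df = pd.DataFrame({
    "prior_back_treatment": rng.integers(0, 2, n),      # medical history item
    "mmpi_scale3": rng.normal(50, 10, n),                # psychological score
    "enjoys_job_hardly_ever": rng.integers(0, 2, n),     # job satisfaction item
})
linpred = (0.5 * df["prior_back_treatment"]
           + 0.02 * (df["mmpi_scale3"] - 50)
           + 0.6 * df["enjoys_job_hardly_ever"])
event_time = rng.exponential(200.0 / np.exp(linpred))    # months to injury report
censor_time = 48.0                                       # roughly four years of follow-up
df["months_at_risk"] = np.minimum(event_time, censor_time)
df["injury_report"] = (event_time <= censor_time).astype(int)

cph = CoxPHFitter()
cph.fit(df, duration_col="months_at_risk", event_col="injury_report")
cph.print_summary()   # hazard ratios with 95% confidence intervals
```

A stepwise, step-up selection of the kind described above would add candidate predictors one at a time, retaining those that improve the fit appreciably.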
Figure 20.4 Product-limit curves for the time to a subsequent back injury report. Panel (a) shows the curves for those reporting previous treated back problems and for those who did not report such problems. Panel (b) divides the MMPI scale 3 (hysteria) values by cut points taken from the quintiles of those actually reporting events. Panel (c) divides the subjects by their response to the question: "Do you enjoy your job (1) almost always; (2) some of the time; or (3) hardly ever?" Panel (d) gives the results of the multivariate Cox model of Table 20.1, dividing subjects into quintiles from lowest to highest predicted risk; the predictive equation uses the variables from the first three panels. The horizontal axis in each panel is exposure time in months. (From Bigos et al. [1991].)

[Table 20.1 Predicting acute back injury reports: univariate and multivariate analyses (with 95% confidence intervals) for the entire population (n = 1326, 117 injuries).]

Figure 20.5 Predictive information by type of variable collected. Note that the job satisfaction and psychological areas contribute the same order of magnitude as the more classical medical history and physical examination variables. The relative lack of overlap in predictive information means that at least these areas must be considered if the problem is to be fully characterized. Capacities and demography variables added no information and so have no boxes.

Among the more interesting aspects of the study is speculation on the meaning and implications of the findings. Since, as mentioned above, most people experience back problems at some time in their lives, could legitimate back discomfort be used as an escape if one does not enjoy his or her job? Can the problem be reduced by taking measures to make workers more satisfied with their employment, or do a number of people tend to be unhappy no matter what? Is the problem a mixture of these? The results invite systematic, randomized intervention studies. Because of the magnitude of the problem, such approaches may be effective in both human and financial terms; however, this remains for the future.
20.5 SYNTHESIZING INFORMATION ABOUT MANY COMPETING TREATMENTS
Randomized controlled trials, discussed in Chapter 19, are the gold standard for deciding if a drug is effective and are required before new drugs are marketed. These trials may compare a new treatment to a placebo or to an accepted treatment. When many different treatments are available, however, it is not enough to know that they are all better than nothing, and it is often not feasible to compare all possible pairs of treatments in large randomized trials.
Clinicians would find it helpful to be able to use information from "indirect" comparisons. For example, if drug A reduces mortality by 20% compared to placebo, and drug B reduces mortality by 10% compared to drug A, it would be useful to conclude that B was better than placebo. However, indirect comparisons may not be reliable. The International Conference on Harmonisation, a project of European, Japanese, and U.S. regulators and industry experts, says in its document E10 on choice of control groups [2000, Sec. 2.1.7.4]:

"Placebo-controlled trials lacking an active control give little useful information about comparative effectiveness, information that is of interest and importance in many circumstances. Such information cannot reliably be obtained from cross-study comparisons, as the conditions of the studies may have been quite different."
The major concern with cross-study comparisons is that the populations being studied may be importantly different. People who participate in a trial of drug A when no other treatment is available may be very different from those who participate in a trial comparing drug A, as an established treatment, with a new experimental drug B. For example, people for whom drug A is less effective may be more likely to participate in the hope of getting a better treatment. The ICH participants are certainly correct that cross-study comparisons may be misleading, but it would be very useful to know if they are actually misleading in a particular case.
An important example of this comes from the treatment of high blood pressure. There are many classes of drugs to treat high blood pressure, working in different ways on the body's regulation of blood pressure: among them beta-blockers, alpha-blockers, calcium channel blockers, angiotensin-converting enzyme (ACE) inhibitors, angiotensin receptor blockers, and diuretics. The availability of multiple treatments is useful because they have different side effects and because a single drug may not reduce blood pressure sufficiently. Some of the drug classes have the advantage of also treating other conditions that may be present in the same patient (for example, relieving symptoms of prostatic hyperplasia). However, in many cases it is not obvious which drug class to try first.
Many clinical trials have been done, but these usually compare a single pair of treatments, and many important comparisons have not been made. For example, until late 2002, there had been only one trial in previously healthy people designed to measure clinical outcomes comparing ACE inhibitors with diuretics, although these drug classes are both useful in congestive heart failure and so seem a natural comparison. In a situation such as this, where there is reliable information from within-study comparisons of many, but not all, pairs of drugs, it should be possible to assess the reliability of cross-study comparisons and decide whether they can be used. That is, the possible cross-study comparisons of, say, ACE inhibitors and calcium channel blockers can be compared with each other and with any direct within-study comparisons. The better the agreement, the more confidence we will have in the cross-study comparisons. This technique is called network metaanalysis [Lumley, 2002]. The name comes from thinking of each randomized trial as a link connecting two treatments. A cross-study comparison is a path between two treatments composed of two or more links. If there are many possible paths joining two treatments, we can obtain an estimate along each path and see how well they agree.
The statistical model behind network metaanalysis is similar to the random-effects models used in ordinary metaanalysis. Write Y_{ijk} for the estimated log relative risk comparing drug i to drug j in trial k. If we assumed that trials were comparable, we could model this log relative risk by

Y_{ijk} = β_i − β_j + ε_{ijk}

where β_i and β_j measure the effects of drugs i and j, and ε_{ijk} is sampling error. When we say that trials of different sets of treatments are not comparable, we mean precisely that the true difference between drugs i and j in a particular set of trials is not simply β_i − β_j: there is some extra systematic difference. These differences can be modeled as random intercepts belonging to each pair of drugs:

Y_{ijk} = β_i − β_j + ξ_{ij} + ε_{ijk}

The variability of the random intercepts ξ_{ij} (their standard deviation), called the incoherence, measures how large these systematic biases are, averaged over all the trials. If the incoherence is large, the metaanalysis should not be done. If the incoherence is small, the metaanalysis may be worthwhile. Confidence intervals for comparisons between treatments are close to those from an ordinary metaanalysis if the incoherence is very small, and substantially longer if the incoherence is moderately large.
Clearly, it would be better to have a single large trial that compared all the treatments, but this may not be feasible. There is no particular financial incentive for the pharmaceutical companies to conduct such a trial, and the cost would make even the National Institutes of Health think twice. In the case of antihypertensive treatments, a trial of many of the competing treatments was eventually done. This trial, ALLHAT [ALLHAT, 2002], compared a diuretic, a calcium channel blocker, an ACE inhibitor, and an alpha-blocker. It found that the alpha-blocker was inferior (that portion of the trial was stopped early), and that diuretics were perhaps slightly superior to the other treatments.
Before the results of ALLHAT were available, Psaty et al. performed a network metaanalysis of the available randomized trials, giving much the same conclusions. The metaanalysis, updated to include the results of ALLHAT, strengthens the conclusion that diuretics are probably slightly superior to the other options in preventing serious cardiovascular events [Psaty et al., 2003]. The cross-study comparisons showed good agreement except for the outcome of congestive heart failure, where there seemed to be substantial disagreement (perhaps due to different definitions over time). The network metaanalysis methodology incorporates this disagreement into confidence intervals, so the conclusions are weaker than they would otherwise be, but still valid. The most important limitation of network metaanalysis is that it requires many paths and many links to assess the reliability of the cross-study comparisons. If each new antihypertensive drug had been compared only to placebo, there would be only a single path between any two treatments, and no cross-checking would be possible. Reliability of cross-study comparisons would then be an unsupported (and unsupportable) assumption.
20.6 SOMETHING IN THE AIR?

Fine particles in the air have long been known to be toxic in sufficiently high doses. Recently, there has been concern that even the relatively low exposures permitted by European and U.S. law may be dangerous to sensitive individuals. These fine particles come from smoke (wood smoke, car exhaust, power stations), dust from roads or fields, and haze formed by chemical reactions in the air. They have widely varying physical and chemical characteristics, which are incompletely understood, but the legal limits are based simply on the total mass per cubic meter of air.

Most of the recent concern has come from time-series studies, which are relatively easy and inexpensive to carry out. These studies examine the associations between the total number of deaths, hospital admissions, or emergency room visits in a city and the average pollution levels.
As the EPA requires regular monitoring of air pollution and other government agencies collect information on deaths and hospital attendance, the data merely need to be extracted from the relevant databases.
This description glosses over some important statistical issues, many of which were pointed out by epidemiologists when the first studies were published:

• Pollution monitors are not necessarily located where people spend their time (and might even be placed in areas so as not to detect problems).
• Cities with high pollution levels differ in many other ways from cities with low pollution levels, which could distort the results.
• The pollution measured at an outdoor monitor is not the same as the exposure of any individual.
• Both mortality and air pollution vary strongly with season.
• The latency between exposure and death or illness is not known.

You should be able to think of several other potential problems, but a more useful exercise for the statistician is to classify the problems by whether they are important and whether they are soluble. It turns out that the first two are not important because they are more or less constant from day to day and so cancel out of our comparisons. The third problem is potentially important and led to some interesting statistical research, but it turns out that addressing it does not alter the results.
The fourth problem, seasonal variation, is important, as Figure 20.6 shows. In Seattle, mortality and air pollution peak in the winter. In many other cities the pattern is slightly different, with double peaks in winter and summer, but some form of strong seasonality is the rule. The solution to this confounding problem is to include these seasonal effects in our regression model. This is complicated: As gardeners and skiers well know, the seasons are not perfectly regular from year to year. Epidemiologists found a statistical solution, the generalized additive model (GAM), which had been developed for completely different problems, and adapted it to these time series. The GAM allows the seasonal variation to be modeled simply by saying how smooth it should be; the smooth seasonal curve is then removed, leaving the day-to-day fluctuations for evaluating the relationship between air pollution and mortality with the regression ideas summarized in Chapter 3.
With the problem of seasonal variation classified as important but soluble, analyses proceeded using data from many different U.S. cities and cities around the world. Shortly after the EPA had compiled a review of all the relevant research as a prelude to setting new standards, some bad news was revealed. Researchers at Johns Hopkins School of Public Health, who had compiled the largest and most systematic set of time-series studies, reported that they and everyone else had been using the GAM software incorrectly. The software had been written many years before, when computers were much slower, and had been intended for simpler examples than these time-series studies. The computations for a GAM involve iterative improvements to an estimate until it stops changing, and the default criterion for "stops changing" was not tight enough for the air pollution time-series models. At about the same time, researchers in Canada noticed that one of the approximations used in computing the standard errors, although adequate for simpler problems, was not good enough in these time-series models [Ramsay et al., 2003]. When the dust settled, it became clear that the problem of seasonal variation was still soluble—fixes were found for these two problems, many studies were reanalyzed, and the conclusions remained qualitatively the same.
The final problem, the fact that the latency is not known, is just one special case of the problem of model uncertainty: choosing a regression model is much harder than fitting it. It is easy to estimate the association between mortality and today's pollution, or yesterday's pollution, or the previous day's, or the average of the past week, or any other choice. It is very hard to choose between these models. Simply reporting the best results is clearly biased, but is sometimes done. Fitting all the possible models may obscure the true associations among all the random noise. Specifying a particular model a priori allows valid inference but risks missing the true association. This final problem is important, but there is no simple mathematical solution.
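Continuing with the simulated data frame and imports from the sketch above (this fragment is not self-contained), a short loop makes the point: every choice of lag yields its own estimate, and reporting only the most impressive one is a form of selection bias.

```python
# Requires df, seasonal_df, sm, and smf from the previous sketch.
estimates = {}
for lag in range(0, 3):
    d = df.assign(exposure=df["pm10"].shift(lag)).dropna()
    fit = smf.glm(f"deaths ~ exposure + bs(time, df={seasonal_df})",
                  data=d, family=sm.families.Poisson()).fit()
    estimates[f"lag {lag}"] = fit.params["exposure"]

d = df.assign(exposure=df["pm10"].rolling(7).mean()).dropna()
estimates["mean of past week"] = smf.glm(
    f"deaths ~ exposure + bs(time, df={seasonal_df})",
    data=d, family=sm.families.Poisson()).fit().params["exposure"]

print(estimates)   # several defensible answers to the same question
```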
20.7 ARE TECHNICIANS AS GOOD AS PHYSICIANS?

The neuropathological diagnosis of Alzheimer's disease (AD) is time consuming and difficult, even for experienced neuropathologists. Work in the late 1960s and early 1970s found that the presence of senile neuritic plaques in the neocortex and hippocampus justified a neuropathological diagnosis of Alzheimer's disease [Tomlinson et al., 1968, 1970]. Plaques are proteins associated with degenerating nerve cells in the brain; they tend to be located near the points of contact between cells. Typically, they are found in the brains of older persons.

These studies also found that large numbers of neurofibrillary tangles were often present in the neocortex and the hippocampus of brains from Alzheimer's disease victims. A tangle is another protein in the shape of a paired helical fragment found in the nerve cell. Neurofibrillary tangles are also found in other diseases. Later studies showed that plaques and tangles could be found in the brains of elderly persons with preserved mental status. Thus, the quantity and distribution of plaques and tangles, rather than their mere presence, are important in distinguishing Alzheimer's brains from the brains of normal aging persons.
A joint conference in 1985 [Khachaturian, 1985] stressed the need for standardized clinical and neuropathological diagnoses for Alzheimer's disease. We wanted to find out whether subjects with minimal training can count plaques and tangles in histological specimens from patients with Alzheimer's disease and controls [van Belle et al., 1997]. Two experienced neuropathologists trained three student helpers to recognize plaques and tangles in slides obtained from autopsy material. After training, the students and pathologists examined coded slides from patients with Alzheimer's disease and controls. Some of the slides were repeated to provide an estimate of reproducibility. Each reader read four fields, which were then averaged.

Ten sequential cases with a primary clinical and neuropathological diagnosis of Alzheimer's disease were chosen from the Alzheimer's Disease Research Center's (ADRC) brain autopsy registry. Age at death ranged from 67 years to 88 years, with a mean of 75.7 years and a standard deviation of 5.9 years.

Ten controls were examined for this study. Nine controls were selected from the ADRC registry of patients with brain autopsy, representing all subjects in the registry with no neuropathological evidence of AD. Four of these did have a clinical diagnosis of Alzheimer's disease, however. One additional control was drawn from files at the University of Washington's Department of Neuropathology. This control, aged 65 years at death, had no clinical history of Alzheimer's disease.
For each case and control, sections from the hippocampus and from the temporal, parietal, and frontal lobes were viewed by two neuropathologists and three technicians. The three technicians were a first-year medical school student, a graduate student in biostatistics with previous histological experience, and a premedical student. The technicians were briefly trained (for several hours) by a neuropathologist. The training consisted of looking at brain tissue (both Alzheimer's cases and normal brains) with a double-headed microscope and at photographs of tissue. The neuropathologist trained the technicians to identify plaques and tangles in the tissue samples viewed. The training ended when the neuropathologist was satisfied that the technicians would be able to identify plaques and tangles in brain tissue samples on their own for the purposes of this study. The slides were masked to hide patient identity and were arbitrarily divided into batches of five subjects, with cases and controls mixed. Each viewer was asked to scan the entire slide to find the areas of the slide with the highest density of plaques and tangles (implied by Khachaturian [1985]). The viewer then chose the four fields on the slide that appeared to contain the highest density of plaques and tangles when viewed at 25×. Neurofibrillary tangles and senile plaques were counted in these four fields at 200×. If the field contained more than 30 plaques or tangles, the viewer scored the number of lesions in that field as 30.
The most important area in the brain for the diagnosis of Alzheimer's disease is the hippocampus, and the results are presented for that region. Results for other regions were similar. In addition, we deal here only with cases and plaques. Table 20.2 contains results for the estimated number of plaques per field for cases; each reading is the average of readings from four fields. The estimated number of plaques varied considerably, ranging from zero to more than 20. Inspection of Table 20.2 suggests that technician 3 tends to read higher than the other technicians and the neuropathologists, that is, tends to see more plaques. An analysis of variance confirms this impression:

[Table 20.2 Estimated number of plaques per field for the Alzheimer's cases, by reader; averages are over four fields. The table is followed by an analysis-of-variance table partitioning the variation among readers.]
You will recognize from Chapter 10 the idea of partitioning the variance attributable to observers into three components; there are many ways of partitioning this variance, and the table above contains one useful way of doing it. The analysis suggests that the average levels of response do not vary within neuropathologists. There is a highly significant difference among technicians. We would conclude that technician 3 is high, rather than technician 1 being low, because of the values obtained by the two neuropathologists. Note also that the residual variability is relatively small because the values represent averages of four readings; using a single reading as a basis produces a residual standard deviation √4 = 2 times as large.
But how shall agreement be measured or evaluated? Equality of the mean levels suggests only that the raters tended to count the same number of plaques on average. We need a more precise formulation of the issue. A correlation between the technicians and the neuropathologists will provide some information but is not sufficient because the correlation is invariant under changes in location and scale. In Chapter 4 we distinguished between precision and accuracy. Precision is the degree to which the observations cluster around a line; accuracy is the degree to which the observations are close to some standard. In this case the standard is the score of the neuropathologist, and accuracy can be measured by the extent to which a technician's readings agree with it. In our case, the data are analyzed according to five criteria: location shift, scale shift, precision, accuracy, and concordance. Location shift refers to the degree to which the means of the data differ between technician and neuropathologist. A scale shift measures the differences in variability. Precision is quantified by a measure of correlation (Pearson's in our case). Accuracy is a function of both the means and the standard deviations, and the concordance is defined as the product of the precision and the accuracy. In symbols, denote two raters by subscripts 1 and 2; then we define:
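(A sketch using the standard concordance-correlation components, which these five criteria appear to follow; the exact notation is assumed rather than quoted. Let μ₁, μ₂ be the rater means, σ₁, σ₂ the standard deviations, and ρ the Pearson correlation.)

```latex
u = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1 \sigma_2}} \quad\text{(location shift)}, \qquad
v = \frac{\sigma_1}{\sigma_2} \quad\text{(scale shift)}, \qquad
\text{precision} = \rho, \qquad
\text{accuracy} = \chi_a = \frac{2}{v + 1/v + u^2}, \qquad
\text{concordance} = \rho_c = \rho \, \chi_a .
```

If there is no location or scale shift (u = 0 and v = 1), the accuracy is 1 and the concordance equals the correlation, which is consistent with the properties noted in the next paragraph.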
[Table 20.3 Characteristics of the ratings of the three technicians and two neuropathologists: for each technician–pathologist pair (and for the two neuropathologists compared with each other), the location shift, scale shift, precision, accuracy, and concordance.]
We discuss these briefly. The location shift is a standardized estimate of the difference in means between the two raters; if there is no location difference between the two raters, this quantity is centered around zero. The scale shift is a ratio; if there is no scale shift, this quantity is centered around 1. The precision is the usual correlation coefficient; if the paired data fall on a straight line, the correlation is 1. The accuracy is made up of a mixture of the means and the standard deviations. Note that if there is no location or scale shift, the accuracy is 1, the upper limit for this statistic. The concordance is the product of the accuracy and the precision; it is also bounded by 1. The data in Table 20.2 are analyzed according to the criteria above and displayed in Table 20.3. This table suggests that all the associations between technicians and neuropathologists are comparable. In addition, the comparisons between neuropathologists provide an internal measure of consistency. The "location shift" column indicates that, indeed, technician 3 tended to see more plaques than the neuropathologists. Technician 3 was also more variable, as indicated in the "scale shift" column. Technician 1 tended to be less variable than the neuropathologists. The precision of the technicians was comparable to that of the two neuropathologists compared with each other. The neuropathologists also displayed very high accuracy, almost matched by technicians 1 and 2. The concordance of the technicians with the neuropathologists, the product of the precision and the accuracy, averaged over the two neuropathologists, is comparable to the neuropathologists' concordance with each other. As usual, it is very important to confirm these analytical results by a graphical display; Figure 20.7 displays the seven possible graphs.
In summary, we conclude that it is possible to train relatively naive observers to count plaques in a manner comparable to that of experienced neuropathologists, as defined by the measures above. By this methodology, we have also been able to isolate the strengths and weaknesses of each technician.
20.8 RISKY BUSINESS
Every day of our lives we meet many risks: the risk of being struck by lightning, getting into a car accident on the way to work, eating contaminated food, or getting hepatitis. Many risks have associated moral and societal values. For example, what is the risk of being infected with AIDS through an HIV-positive health practitioner? How does this risk compare with getting infectious hepatitis from an infected worker? What is the risk to the health practitioner in being identified as HIV positive? As we evaluate risks, we may ignore them, despite their being real.

What is a risk? A risk is usually an event or the probability of the event. Thus, the risk of being hit by lightning is defined to be the probability of this event. The word risk has an unfavorable connotation; we usually do not speak of the risk of winning the lottery. For purposes of this chapter, we relate the risk of an event to the probability of the occurrence of the event.
In Chapter 3 we stated that all probabilities are conditional probabilities. When we talk about the risk of breast cancer, we usually refer to its occurrence among women. Probabilities are modified as we define different groups at risk. R. A. Fisher talked about relevant subsets, that is, what group or set of events is intended when a probability is specified.
In the course of thinking about environmental and occupational risks, one of us (G.vB.) wanted to develop a scale of risks similar to the Richter scale for earthquakes. The advantage of such a scale is to present risks numerically in such a way that the public would have an intuitive understanding of the risks, despite not understanding the full basis of the scale (it turns out to be fairly difficult to find a complete description of the Richter scale).

What should be the characteristics of such a scale? It became clear very quickly that the scale would have to be logarithmic. Second, it seemed that increasing risks should be associated with increasing values of the scale. It would also be nice to have the scale cover roughly the same numerical range as the Richter scale; most of its values are in the range 3 to 7. The risk scale for events is defined as follows: Let P(E) be the probability of an event; then the risk units of the event are

RU(E) = 10 + log10 P(E).

This scale has several nice properties. First, the scale is logarithmic. Second, if the event is made 10 times more (or less) probable, the scale value goes up (or down) by one unit, so that unit steps correspond to 10-fold changes in probabilities. Events with risk units of the order of 1 to 4 are associated with relatively rare events. Note that the scale can go below zero.
As with the Richter scale, familiarity with common events will help you get a feeling for the scale. Let us start by considering some random events; next we deal with some common risks and locate them on the scale; finally, we give you some risks and ask you to place them on the scale (the answers are given at the end of the chapter). The simplest case is the coin toss. The probability of, say, a head is 0.5; hence the risk units associated with observing a head are 10 + log10(0.5) = 9.7 (one decimal place is usually enough). For a second example, a pair of sevens with two dice has a probability of 1/36 and an RU value of 8.4. Now consider some very small probabilities. Suppose that you dial at random; what is the chance of dialing your own phone number? Assume that we are talking about the seven-digit code and we allow all zeros as a possible number. The RU value is 3. If you throw in the area code as well, you must deduct three more units to get the value RU = 0. There are clearly more efficient ways to make phone calls.
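A few lines of Python reproduce the examples above, taking the definition RU(E) = 10 + log10 P(E) at face value:

```python
import math

def risk_units(p):
    """Risk units for an event of probability p: RU = 10 + log10(p)."""
    return 10 + math.log10(p)

print(round(risk_units(0.5), 1))         # a head on one coin toss      -> 9.7
print(round(risk_units(1 / 36), 1))      # a pair of sevens, two dice   -> 8.4
print(round(risk_units(1e-7), 1))        # dialing your 7-digit number  -> 3.0
print(round(risk_units(1e-10), 1))       # including the area code      -> 0.0
```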
The idea of a logarithmic-like scale for probabilities appears in the literature quite frequently. In a delightful, little-noticed book, Risk Watch, Urquhart and Heilmann [1984] defined a scale of this kind in terms of safety. The drawback of this definition is that it calibrates events in terms of safety rather than risk. People are more inclined to think in terms of risk; they are "risk avoiders" rather than safety seekers.
Table 20.5 The Risk Unit Scale and Some Associated Risks

Risk units   Event
9            Pick the number 3 at random from 0 to 3
8            Car accident with injury (annual)
7            Killed in hang gliding (annual)
6            EPA action (lifetime risk)
5            Cancer from 4 tbsp peanut butter/day (annual)
4            Cancer from one transcontinental trip
3            Killed by falling aircraft
2            Dollar bill has a specified set of eight numbers
1            Pick a spot on earth at random and land within 1 mile of your house
0            Your phone number picked at random (+ area code)
−0.5         Killed by falling meteorite (annual)
Table 20.6 Events to Be Ranked and Placed on the Risk Unit Scale

a  Accidental drowning
b  Amateur pilot death
c  Appear on the Johnny Carson Show (1991)
d  Death due to smoking
e  Die in a mountain climbing accident
f  Fatality due to insect bite or sting
g  Hit by lightning (in lifetime)
h  Killed in college football
i  Lifetime risk of cancer due to chlorination
j  Cancer from one diet cola per day with saccharin
k  Ace of spades in one draw from a 52-card deck
l  Win the Reader's Digest Sweepstakes
m  Win the Washington State lottery grand prize (with one ticket)
The U.S. Environmental Protection Agency uses a lifetime risk of death of 1 in 10,000 as an action level; that is, if the lifetime probability of death is 1/10,000, the agency will take some action. This may seem rather anticonservative, but there are many risks, and some selection has to be made. All these probabilities are estimates with varying degrees of precision. Crouch and Wilson [1982] include references to the data sets upon which the estimates are based and also indicate whether the risk is changing. Table 20.6 describes some events for which you are asked to estimate the risk units; the answers are given in Table 20.8, preceding the References.
Table 20.7 Activities Estimated to Increase the Annual Probability of Death by One in a Million (a)

Living 2 months with a cigarette smoker: cancer, heart disease
Living 2 days in New York or Boston: air pollution
Drinking Miami drinking water for 1 year: cancer from chloroform
Living 150 years within 5 miles of a nuclear power plant: cancer from radiation

(a) All events have a risk unit value of 4.
How do we evaluate risks? Why do we take action on some risks but not on others? The study of risks has become a separate science with its own journals and society. The Borgen [1990] and Slovic [1986] articles in the journal Risk Analysis are worth examining. The following dimensions of risk evaluation have been mentioned in the literature:

• the magnitude of the risk;
• whether the exposure is voluntary or involuntary;
• whether the effect is immediate or delayed;
• whether the exposure is essential (for example, occupational) or discretionary;
• whether the hazard is a common one or a "dread" hazard;
• whether those affected are a group with which we identify;
• whether the effects are reversible.
We discuss these briefly. Voluntary risks are accepted at far higher levels than involuntary ones: recreational scuba diving has an annual probability of death of 4/10,000, or a risk unit of 6.6 [Crouch and Wilson, 1982, Table 7.4]. Compare this with some of the risks in Table 20.5. Another dimension is the timing of the effect. If the effect is delayed, we are usually willing to take a bigger risk; the most obvious example is smoking (which is also a voluntary behavior). If the exposure is essential, as part of one's occupation, then again, larger risks are acceptable. A "dread" hazard is often perceived as a greater risk than a common hazard; the most conspicuous example is an airplane crash vs. an automobile accident. But perversely, we are less likely to be concerned about hazards that affect special groups to which we are not immediately linked. For example, migrant workers have high exposures to pesticides and resulting increased immediate risks of neurological damage and long-term risks of cancer. As a society, we are not vigorous in reducing those risks. Finally, if the effects of a risk are reversible, we are willing to take larger risks.
Table 20.7 lists some risks with the same estimated value: each one increases the annual risk of death by 1 in a million; that is, all events have a risk unit value of 4. These examples illustrate that we do not judge risks to be the same even though the probabilities are equal. Some of the risks are avoidable; others may not be. It may be possible to avoid drinking Miami drinking water by drinking bottled water or by moving to Alaska. Most of the people who live in New York or Boston are not aware of the risk of living in those cities, but even if they were, it is unlikely that they would move. A risk of 1 in a million is too small to act on.
How can risks be ranked? There are many ways. The primary one is by the probability of occurrence, as we have discussed so far. Another is by the expected loss (or gain). For example, the probability of a fire destroying your home is fairly small, but the loss is so great that it pays to make the unfair bet with the insurance company. An unfair bet is one where the expected gain is negative. Another example is the lottery. A typical state lottery takes more than 50 cents from every dollar that is bet (compared to about 4 cents for roulette play in a casino). But the reward is so large (and the investment apparently small) that many people gladly play this unfair game.
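As a rough numerical illustration of an unfair bet, the expected gain is simply probability times payoff minus the stake; the lottery and insurance figures below are hypothetical, chosen only to match the takeout described above.

    def expected_gain(stake, prob_win, payout):
        """Expected net gain of a bet costing `stake` that pays `payout` with probability `prob_win`."""
        return prob_win * payout - stake

    # A hypothetical $1 lottery ticket returning 50 cents per dollar bet:
    # a 1-in-2,000,000 chance at a $1,000,000 prize.
    print(expected_gain(1.00, 1 / 2_000_000, 1_000_000))   # -0.5: an unfair bet

    # Homeowner's insurance is also an unfair bet for the buyer, yet a sensible one:
    # a hypothetical $500 annual premium against a 1-in-1,000 chance of a $300,000 loss.
    print(expected_gain(500.0, 1 / 1_000, 300_000))        # -200.0

Both bets have a negative expected gain; they differ in whether the rare outcome is a windfall or a catastrophic loss, which is why only one of them is usually worth making.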
Table 20.8 Answers to Evaluation of Risks in Table 20.6
Event   Risk Units   Source/Comments
  a        5.6       Crouch and Wilson [1982, Table 7.2]
  b        7.0       Crouch and Wilson [1982, Table 7.4]
  c        4.3       Siskin et al. [1990]
  d        7.5       Slovic [1986, Table 1]
  e        6.8       Crouch and Wilson [1982, Table 7.4]
  f        3.4       Crouch and Wilson [1982, Table 7.2]
  g        4.2       Siskin et al. [1990]
  h        5.5       Crouch and Wilson [1982, Table 7.4]
  i        4.0       Crouch and Wilson [1982, Table 7.5 and
... of the behavior. But a great deal of risk reduction can be effected by changes in behavior. It behooves each one of us to assess the risks we take and to decide whether they are worth it.

The Journal of the Royal Statistical Society, Series A, devoted its June 2003 issue (Volume 166) to statistical issues in risk communication. The journal Risk Analysis addresses risk analysis, risk assessment, and risk communication.
REFERENCES
Alderman, E. L., Bourassa, M. G., Cohen, L. S., Davis, K. B., Kaiser, G. C., Killip, T., Mock, M. B., Pettinger, M., and Robertson, T. L. [1990]. Ten-year follow-up of survival and myocardial infarction in the randomized Coronary Artery Surgery Study. Circulation, 82: 1629–1646.
ALLHAT Officers and Coordinators [2002]. Major outcomes in high-risk hypertensive patients randomized to angiotensin-converting enzyme inhibitor or calcium channel blocker vs. diuretic: the antihypertensive and lipid-lowering treatment to prevent heart attack trial (ALLHAT). JAMA, 288: 2981–2997.
Battie, M. C., Bigos, S. J., Fisher, L. D., Hansson, T. H., Nachemson, A. L., Spengler, D. M., Wortley, M. D., and Zeh, J. [1989]. A prospective study of the role of cardiovascular risk factors and fitness in industrial back pain complaints. Spine, 14: 141–147.
Battie, M. C., Bigos, S. J., Fisher, L. D., Spengler, D. M., Hansson, T. H., Nachemson, A. L., and Wortley, M. D. [1990a]. Anthropometric and clinical measures as predictors of back pain complaints in industry: a prospective study. Journal of Spinal Disorders, 3: 195–204.
Battie, M. C., Bigos, S. J., Fisher, L. D., Spengler, D. M., Hansson, T. H., Nachemson, A. L., and Wortley, M. D. [1990b]. The role of spinal flexibility in back pain complaints within industry: a prospective study. Spine, 15: 768–773.
Bigos, S. J., Spengler, D. M., Martin, N. A., Zeh, J., Fisher, L., Nachemson, A., and Wang, M. H. [1986a]. Back injuries in industry: a retrospective study: II. Injury factors. Spine, 11: 246–251.
Bigos, S. J., Spengler, D. M., Martin, N. A., Zeh, J., Fisher, L., Nachemson, A., and Wang, M. H. [1986b]. Back injuries in industry: a retrospective study: III. Employee-related factors. Spine, 11: 252–256.
Bigos, S. J., Battie, M. C., Spengler, D. M., Fisher, L. D., Fordyce, W. E., Hansson, T. H., Nachemson, A. L., and Wortley, M. D. [1991]. A prospective study of work perceptions and psychosocial factors affecting the report of back injury. Spine, 16: 1–6.
Bigos, S. J., Battie, M. C., Fisher, L. D., Fordyce, W. E., Hansson, T. H., Nachemson, A. L., and Spengler, D. M. [1992a]. A longitudinal, prospective study of industrial back injury reporting in industry. Clinical Orthopaedics, 279: 21–34.
Bigos, S. J., Battie, M. C., Fisher, L. D., Hansson, T. H., Spengler, D. M., and Nachemson, A. L. [1992b]. A prospective evaluation of commonly used pre-employment screening tools for acute industrial back pain. Spine, 17: 922–926.
Borer, J. S. [1987]. t-PA and the principles of drug approval (editorial). New England Journal of Medicine, 317: 1659–1661.
Borgen, K. T. [1990]. Of apples, alcohol, and unacceptable risks. Risk Analysis, 10: 199–200.
CASS Principal Investigators and Their Associates [1981]. National Heart, Lung, and Blood Institute Coronary Artery Surgery Study, T. Killip, L. D. Fisher, and M. B. Mock (eds.). American Heart Association Monograph 79. Circulation, 63(pt. II): I-1 to I-81.
CASS Principal Investigators and Their Associates: Coronary Artery Surgery Study (CASS) [1983a]. A randomized trial of coronary artery bypass surgery: survival data. Circulation, 68: 939–950.
CASS Principal Investigators and Their Associates: Coronary Artery Surgery Study (CASS) [1983b]. A randomized trial of coronary artery bypass surgery: quality of life in patients randomly assigned to treatment groups. Circulation, 68: 951–960.
CASS Principal Investigators and Their Associates: Coronary Artery Surgery Study (CASS) [1984a]. A randomized trial of coronary artery bypass surgery: comparability of entry characteristics and survival in randomized patients and nonrandomized patients meeting randomization criteria. Journal of the American College of Cardiology, 3: 114–128.
CASS Principal Investigators and Their Associates [1984b]. Myocardial infarction and mortality in the Coronary Artery Surgery Study (CASS) randomized trial. New England Journal of Medicine, 310:
Crouch, E. A. C., and Wilson, R. [1982]. Risk Benefit Analysis. Ballinger, Cambridge, MA.
Fisher, L. D., Giardina, E.-G., Kowey, P. R., Leier, C. V., Lowenthal, D. T., Messerli, F. H., Pratt, C. M., and Ruskin, J. [1987]. The FDA Cardio-Renal Committee replies (letter to the editor). Wall Street Journal, Wed., Aug. 12, p. 19.
Fisher, L. D., Kaiser, G. C., Davis, K. B., and Mock, M. [1989]. Crossovers in coronary bypass grafting trials: desirable, undesirable, or both? Annals of Thoracic Surgery, 48: 465–466.
Fisher, L. D., Dixon, D. O., Herson, J., and Frankowski, R. F. [1990]. Analysis of randomized clinical trials: intention to treat. In Statistical Issues in Drug Research and Development, K. E. Peace (ed.). Marcel Dekker, New York, pp. 331–344.
Fisher, L. D., and Zeh, J. [1991]. An information theory approach to presenting predictive value in the Cox proportional hazards regression model (unpublished).
International Conference on Harmonisation [2000]. ICH Harmonised Tripartite Guideline: E10, Choice of Control Group and Related Issues in Clinical Trials. http://www.ich.org
Kaiser, G. C., Davis, K. B., Fisher, L. D., Myers, W. O., Foster, E. D., Passamani, E. R., and Gillespie, M. J. [1985]. Survival following coronary artery bypass grafting in patients with severe angina pectoris (CASS) (with discussion). Journal of Thoracic and Cardiovascular Surgery, 89: 513–524.
Khachaturian, Z. S. [1985]. Diagnosis of Alzheimer's disease. Archives of Neurology, 42: 1097–1105.
Kowey, P. R., Fisher, L. D., Giardina, E.-G., Leier, C. V., Lowenthal, D. T., Messerli, F. H., and Pratt, C. M. [1988]. The TPA controversy and the drug approval process: the view of the Cardiovascular and Renal Drugs Advisory Committee. Journal of the American Medical Association, 260: 2250–2252.
Lin, L. I. [1989]. A concordance correlation coefficient to evaluate reproducibility. Biometrics, 45: 255–268.
... Coronary Artery Surgery Study (CASS) registry. Journal of Thoracic and Cardiovascular Surgery, 97: 487–495.
Passamani, E., Davis, K. B., Gillespie, M. J., Killip, T., and the CASS Principal Investigators and Their Associates [1985]. A randomized trial of coronary artery bypass surgery: survival of patients with a low ejection fraction. New England Journal of Medicine, 312: 1665–1671.
Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson, K., Peto, J., and Smith, P. G. [1977]. Design and analysis of randomized clinical trials requiring prolonged observation of each patient: II. Analysis and examples. British Journal of Cancer, 35: 1–39.
Preston, T. A. [1977]. Coronary Artery Surgery: A Critical Review. Raven Press, New York.
Psaty, B., Lumley, T., Furberg, C., Schellenbaum, G., Pahor, M., Alderman, M. H., and Weiss, N. S. [2003]. Health outcomes associated with various anti-hypertensive therapies used as first-line agents: a network meta-analysis. Journal of the American Medical Association, 289: 2532–2542.
Ramsay, T. O., Burnett, R. T., and Krewski, D. [2003]. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology, 14: 18–23.
Rogers, W. J., Coggin, C. J., Gersh, B. J., Fisher, L. D., Myers, W. O., Oberman, A., and Sheffield, L. T. [1990]. Ten-year follow-up of quality of life in patients randomized to receive medical therapy or coronary artery bypass graft surgery. Circulation, 82: 1647–1658.
Siskin, B., Staller, J., and Rornik, D. [1990]. What Are the Chances? Risk, Odds and Likelihood in Everyday Life. Crown Publishers, New York.
Slovic, P. [1986]. Informing and educating the public about risk. Risk Analysis, 6: 403–415.
Spengler, D. M., Bigos, S. J., Martin, N. A., Zeh, J., Fisher, L. D., and Nachemson, A. [1986]. Back injuries in industry: a retrospective study: I. Overview and cost analysis. Spine, 11: 241–245.
Takaro, T., Hultgren, H., Lipton, M., Detre, K., and participants in the Veterans Administration Cooperative Study Group [1976]. VA cooperative randomized study for coronary arterial occlusive disease: II. Left main disease. Circulation, 54(suppl. 3): III-107.
Tomlinson, B. E., Blessed, G., and Roth, M. [1968]. Observations on the brains of non-demented old people. Journal of Neurological Science, 7: 331–356.
Tomlinson, B. E., Blessed, G., and Roth, M. [1970]. Observations on the brains of demented old people. Journal of Neurological Science, 11: 205–242.
Urquhart, J., and Heilmann, K. [1984]. Risk Watch: The Odds of Life. Facts on File Publications, New York.
van Belle, G., Gibson, K., Nochlin, D., Sumi, M., and Larson, E. B. [1997]. Counting plaques and tangles in Alzheimer's disease: concordance of technicians and pathologists. Journal of Neurological Science, 145: 141–146.
Wall Street Journal [1987a]. The TPA decision (editorial). Wall Street Journal, Thurs., May 28, p. 26.
Wall Street Journal [1987b]. Human sacrifice (editorial). Wall Street Journal, Tues., June 2, p. 30.
Weinstein, G. S., and Levin, B. [1989]. Effect of crossover on the statistical power of randomized studies. Annals of Thoracic Surgery, 48: 490–495.
Wilkinson, L. [1989]. SYGRAPH: The System for Graphics. SYSTAT, Inc., Evanston, IL.
Wynne, B. [1991]. Public perception and communication of risk: what do we know? NIH Journal of Health, 3: 65–71.
Table A.1 Standard Normal Distribution
Let Z be a normal random variable with mean zero and variance 1. For selected values of z, three values are tabled: (1) the two-sided p-value, or P[|Z| ≥ z]; (2) the one-sided p-value, or P[Z ≥ z]; and (3) the cumulative distribution function at z, or P[Z ≤ z].

z    Two-sided    One-sided    Cum. dist.
(values tabulated for z from 0.00 to 3.79)
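The tabled quantities can also be reproduced in software; a minimal sketch, assuming SciPy is available, with z = 1.96 used only as an example:

    from scipy.stats import norm

    z = 1.96
    two_sided = 2 * norm.sf(z)   # P[|Z| >= z]
    one_sided = norm.sf(z)       # P[Z >= z]
    cum_dist = norm.cdf(z)       # P[Z <= z]
    print(round(two_sided, 4), round(one_sided, 4), round(cum_dist, 4))   # 0.05 0.025 0.975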
Table A.2 Critical Values (Percentiles) for the Standard Normal Distribution
The fourth column is the N(0, 1) percentile for the percent given in column one. It is also the upper one-sided N(0, 1) critical value and the two-sided N(0, 1) critical value for the significance levels given in columns two and three, respectively.

Percent    One-sided    Two-sided    z
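Equivalently, the critical values themselves are quantiles of N(0, 1); a brief sketch, again assuming SciPy, with the 95th percentile used only as an example:

    from scipy.stats import norm

    percent = 95
    one_sided_alpha = 1 - percent / 100     # 0.05
    two_sided_alpha = 2 * one_sided_alpha   # 0.10
    z = norm.ppf(percent / 100)             # about 1.645
    print(round(z, 4), one_sided_alpha, two_sided_alpha)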
Table A.3 Critical Values (Percentiles) for the Chi-Square Distribution
For each degree of freedom (d.f.) in the first column, the table entries are the critical values for the upper one-sided significance levels in the column headings or, equivalently, the percentiles for the corresponding percentages.
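A corresponding sketch for chi-square critical values (SciPy assumed; the degrees of freedom and significance level below are arbitrary examples):

    from scipy.stats import chi2

    df = 1
    alpha = 0.05
    critical_value = chi2.ppf(1 - alpha, df)   # upper one-sided critical value
    print(round(critical_value, 3))            # 3.841 for 1 d.f. at the 0.05 level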
Trang 29822 APPENDIX
Table A.4 Critical Values (Percentiles) for the t-Distribution
The table entries are the critical values (percentiles) for the t-distribution. The column headed d.f. (degrees of freedom) gives the degrees of freedom for the values in that row. The columns are labeled by "percent," "one-sided," and "two-sided." "Percent" is 100 × the cumulative distribution function; the table entry is the corresponding percentile. "One-sided" is the significance level for the one-sided upper critical value; the table entry is the critical value. "Two-sided" gives the two-sided significance level; the table entry is the corresponding two-sided critical value.

One-sided α: .25, .10, .05, .025, .01, .005, .0025, .001, .0005, .00025, .0001, .00005 (critical values tabulated by d.f.)
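The percent / one-sided / two-sided correspondence described above can be checked the same way (SciPy assumed; 10 degrees of freedom and a two-sided α of 0.05 are arbitrary examples):

    from scipy.stats import t

    df = 10
    two_sided_alpha = 0.05
    one_sided_alpha = two_sided_alpha / 2             # 0.025
    percent = 100 * (1 - one_sided_alpha)             # 97.5
    critical_value = t.ppf(1 - one_sided_alpha, df)   # about 2.228 for 10 d.f.
    print(percent, round(critical_value, 3))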
Table A.7 Fisher's Exact Test for 2 × 2 Tables
Consider a 2 × 2 table:

    a    A − a  |  A
    b    B − b  |  B

with rows and/or columns exchanged so that (1) A ≥ B and (2) a/A ≥ b/B. The table entries are ordered lexicographically by A (ascending), B (descending), and a (descending). For each triple (A, B, a) the table presents critical values for one-sided tests of the hypothesis that the true proportion corresponding to a/A is greater than the true proportion corresponding to b/B. Significance levels of 0.05, 0.025, and 0.01 are considered. For A ≤ 15, all values for which critical values exist are tabulated. For each significance level, two columns give (1) the nominal critical value for b (i.e., reject the null hypothesis if the observed b is less than or equal to the table entry) and (2) the p-value corresponding to the critical value (this is less than the nominal significance level in most cases due to the discreteness of the distribution).
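The same one-sided test is also available in software, which avoids the restriction to A ≤ 15; a minimal sketch, assuming SciPy, with made-up 2 × 2 counts:

    from scipy.stats import fisher_exact

    # Rows are (a, A - a) and (b, B - b); test whether the first proportion exceeds the second.
    a, A = 9, 10
    b, B = 2, 10
    table = [[a, A - a], [b, B - b]]
    odds_ratio, p_value = fisher_exact(table, alternative="greater")
    print(round(p_value, 4))   # one-sided p-value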
Table A.8 Sample Sizes for Comparing Two Proportions with a One-Sided Fisher's Exact Test in