4.19.1. Example: The Ille-et-Vilaine Study of Esophageal Cancer and Alcohol

Tuyns et al. (1977) conducted a case-control study of alcohol, tobacco and esophageal cancer in men from the Ille-et-Vilaine district of Brittany.
Breslow and Day (1980) subsequently published these data. The cases in this study were 200 esophageal cancer patients who had been diagnosed at a district hospital between January 1972 and April 1974. The controls were 775 men who were drawn from local electoral lists. Study subjects were interviewed concerning their consumption of alcohol and tobacco as well as other dietary risk factors. Table 4.2 shows these subjects divided according to whether they were moderate or heavy drinkers.
132 4. Simple logistic regression
Table 4.2. Cases and controls from the Ille-et-Vilaine study of esophageal cancer, grouped by level of daily alcohol consumption. Subjects were considered heavy drinkers if their daily consumption was ≥ 80 grams (Breslow and Day, 1980).
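| Esophageal cancer | Daily alcohol consumption ≥ 80 g | Daily alcohol consumption < 80 g | Total |
|---|---|---|---|
| Yes (cases) | $d_1 = 96$ | $c_1 = 104$ | $m_1 = 200$ |
| No (controls) | $d_0 = 109$ | $c_0 = 666$ | $m_0 = 775$ |
| Total | $n_1 = 205$ | $n_0 = 770$ | $N = 975$ |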
4.19.2. Review of Classical Case-Control Theory
Let $\pi_0$ and $\pi_1$ denote the prevalence of heavy drinking among controls and cases in the Ille-et-Vilaine case-control study, respectively. That is, $\pi_i$ is the probability that a control ($i = 0$) or a case ($i = 1$) is a heavy drinker. Then the odds that a control is a heavy drinker is $\pi_0/(1-\pi_0)$, and the odds that a case is a heavy drinker is $\pi_1/(1-\pi_1)$. The odds ratio for heavy drinking among cases relative to controls is

$$\psi = \frac{\pi_1/(1-\pi_1)}{\pi_0/(1-\pi_0)}. \tag{4.26}$$

Let $m_0$ and $m_1$ denote the number of controls and cases, respectively. Let $d_0$ and $d_1$ denote the number of controls and cases who are heavy drinkers.
Let $c_0$ and $c_1$ denote the number of controls and cases who are moderate or non-drinkers. (Note that $m_i = c_i + d_i$ for $i = 0$ or 1.) Then the observed prevalence of heavy drinkers is $d_0/m_0 = 109/775$ for controls and $d_1/m_1 = 96/200$ for cases. The observed prevalence of moderate or non-drinkers is $c_0/m_0 = 666/775$ for controls and $c_1/m_1 = 104/200$ for cases. The observed odds that a control or case will be a heavy drinker is

$$\frac{d_i/m_i}{c_i/m_i} = \frac{d_i}{c_i},$$

which equals 109/666 for controls and 96/104 for cases. The observed odds ratio for heavy drinking in cases relative to controls is
$$\hat{\psi} = \frac{d_1/c_1}{d_0/c_0} = \frac{96/104}{109/666} = 5.64.$$
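The arithmetic above can be reproduced in a few lines. The following is a minimal Python sketch rather than code from the original text; the helper name `odds_ratio` and its argument order (following the notation of Table 4.2) are our own illustrative choices.

```python
def odds_ratio(d1: int, c1: int, d0: int, c0: int) -> float:
    """Observed odds ratio for exposure in cases relative to controls.

    Hypothetical helper. d1, c1: heavy and moderate drinkers among cases;
    d0, c0: heavy and moderate drinkers among controls (Table 4.2 notation).
    """
    odds_cases = d1 / c1      # observed odds that a case is a heavy drinker
    odds_controls = d0 / c0   # observed odds that a control is a heavy drinker
    return odds_cases / odds_controls


# Ille-et-Vilaine data from Table 4.2
d1, c1, d0, c0 = 96, 104, 109, 666
print(f"prevalence of heavy drinking: cases {d1 / (d1 + c1):.3f}, "
      f"controls {d0 / (d0 + c0):.3f}")
print(f"odds ratio = {odds_ratio(d1, c1, d0, c0):.2f}")  # odds ratio = 5.64
```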
If the cases and controls are representative samples from their respective underlying populations then:
1. $\hat{\psi}$ is an appropriate estimate of the true odds ratio $\psi$ for heavy drinking in cases relative to controls in the underlying population.
133 4.19. Simple 2×2 case-control studies
2. This true odds ratio also equals the true odds ratio for esophageal cancer in heavy drinkers relative to moderate drinkers.
3. If, in addition, the disease under study is rare (as is the case for esophageal cancer), then $\hat{\psi}$ also estimates the relative risk of esophageal cancer in heavy drinkers relative to moderate drinkers.
It is the second of the three facts listed above that makes case-control studies worth doing. We really are not particularly interested in the odds ratio for heavy drinking among cases relative to controls. However, we are very interested in the relative risk of esophageal cancer in heavy drinkers compared to moderate drinkers. It is, perhaps, somewhat surprising that we can estimate this relative risk from the prevalence of heavy drinking among cases and controls. Note that we are unable to estimate the incidence of cancer in either heavy drinkers or moderate drinkers. See Hennekens and Buring (1987) for an introduction to case-control studies. A more mathematical explanation of this relationship is given in Breslow and Day (1980).
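The following derivation is a brief sketch of our own, not the argument of Breslow and Day (1980), indicating why facts 2 and 3 hold. It assumes that cases are sampled from men with the disease with probability $f_1$ and controls from men without the disease with probability $f_0$, regardless of drinking status. It also introduces hypothetical notation: $A_j$ and $B_j$ are the numbers of men in the underlying population with and without esophageal cancer, where $j = 1$ for heavy drinkers and $j = 0$ for moderate drinkers. Because $\hat{\psi}$ is a cross-product ratio, it can be rearranged as the ratio of the odds of being a case among heavy drinkers to the odds among moderate drinkers, and the sampling fractions then cancel:

$$
\hat{\psi} = \frac{d_1/c_1}{d_0/c_0} = \frac{d_1 c_0}{c_1 d_0} = \frac{d_1/d_0}{c_1/c_0}
\approx \frac{f_1 A_1/(f_0 B_1)}{f_1 A_0/(f_0 B_0)} = \frac{A_1/B_1}{A_0/B_0}.
$$

The right-hand side is the population odds ratio for esophageal cancer in heavy drinkers relative to moderate drinkers (fact 2). If the disease is rare, so that $A_1 \ll B_1$ and $A_0 \ll B_0$, then

$$
\frac{A_1/(A_1 + B_1)}{A_0/(A_0 + B_0)} \approx \frac{A_1/B_1}{A_0/B_0},
$$

so the relative risk is approximately equal to this odds ratio (fact 3).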
4.19.3. 95% Confidence Interval for the Odds Ratio: Woolf’s Method
An estimate of the standard error of the log odds ratio is

$$\mathrm{se}_{\log(\hat{\psi})} = \sqrt{\frac{1}{d_0} + \frac{1}{c_0} + \frac{1}{d_1} + \frac{1}{c_1}}, \tag{4.27}$$
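and the distribution of $\log(\hat{\psi})$ is approximately normal. Hence, if we let

$$\hat{\psi}_L = \hat{\psi}\exp\left[-1.96\,\mathrm{se}_{\log(\hat{\psi})}\right] \tag{4.28}$$

and

$$\hat{\psi}_U = \hat{\psi}\exp\left[1.96\,\mathrm{se}_{\log(\hat{\psi})}\right], \tag{4.29}$$

then $(\hat{\psi}_L, \hat{\psi}_U)$ is a 95% confidence interval for $\psi$ (Woolf, 1955). In the esophageal cancer and alcohol analysis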
$$\mathrm{se}_{\log(\hat{\psi})} = \sqrt{\frac{1}{109} + \frac{1}{666} + \frac{1}{96} + \frac{1}{104}} = 0.1752.$$
Therefore, Woolf's estimate of the 95% confidence interval for the odds ratio is $(\hat{\psi}_L, \hat{\psi}_U) = (5.64\exp[-1.96 \times 0.1752],\ 5.64\exp[+1.96 \times 0.1752]) = (4.00, 7.95)$.
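Equations (4.27) through (4.29) translate directly into code. This is a minimal Python sketch using only the standard library; the function name `woolf_ci` and its `z` parameter are hypothetical choices for illustration, with the default `z = 1.96` giving a 95% interval.

```python
import math


def woolf_ci(d1: int, c1: int, d0: int, c0: int, z: float = 1.96):
    """Woolf's confidence interval for the odds ratio, eqs (4.27)-(4.29).

    Hypothetical helper. Returns (odds_ratio, lower, upper).
    """
    psi_hat = (d1 / c1) / (d0 / c0)
    # Standard error of log(psi_hat), equation (4.27)
    se_log = math.sqrt(1 / d0 + 1 / c0 + 1 / d1 + 1 / c1)
    # Symmetric interval on the log scale, mapped back by exp, (4.28)-(4.29)
    lower = psi_hat * math.exp(-z * se_log)
    upper = psi_hat * math.exp(z * se_log)
    return psi_hat, lower, upper


psi_hat, lower, upper = woolf_ci(96, 104, 109, 666)
print(f"OR = {psi_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
# OR = 5.64, 95% CI = (4.00, 7.95)
```

Because the interval is symmetric on the log scale, it is asymmetric around $\hat{\psi}$ itself, which is why the upper limit lies farther from 5.64 than the lower limit does.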
4.19.4. Test of the Null Hypothesis that the Odds Ratio Equals One
If there is no association between exposure and disease, then the odds ratio $\psi$ will equal one. Let $n_j$ be the number of study subjects who are
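($j = 1$) or are not ($j = 0$) heavy drinkers, and let $N = n_0 + n_1 = m_0 + m_1$ be the total number of cases and controls. Under the null hypothesis that $\psi = 1$, the expected value and variance of $d_1$ are

$$E[d_1 \mid \psi = 1] = n_1 m_1/N \quad \text{and} \quad \mathrm{var}[d_1 \mid \psi = 1] = m_0 m_1 n_0 n_1/N^3.$$

Hence,

$$\chi^2_1 = \left(\left|d_1 - E[d_1 \mid \psi = 1]\right| - 0.5\right)^2 \big/ \mathrm{var}[d_1 \mid \psi = 1] \tag{4.30}$$

has an approximately $\chi^2$ distribution with one degree of freedom. In the Ille-et-Vilaine study $E[d_1 \mid \psi = 1] = 205 \times 200/975 = 42.051$ and

$$\mathrm{var}[d_1 \mid \psi = 1] = 775 \times 200 \times 770 \times 205/975^3 = 26.397.$$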
Therefore, $\chi^2_1 = (|96 - 42.051| - 0.5)^2/26.397 = 108.22$. The P value associated with this statistic is $< 10^{-24}$, providing overwhelming evidence that the observed association between heavy drinking and esophageal cancer is not due to chance.
In equation (4.30), the constant 0.5 that is subtracted from the absolute difference in the numerator is known as Yates' continuity correction (Yates, 1934). It adjusts for the fact that we are approximating a discrete distribution with a continuous normal distribution. There is a long-standing controversy among statisticians as to whether such corrections are appropriate (Dupont and Plummer, 1999).
Mantel and Greenhouse (1968), Fleiss (1981), Breslow and Day (1980) and many others use this correction in calculating this statistic. However, Grizzle (1967) and others, including the statisticians at Stata, do not. This leads to a minor discrepancy between output from Stata and other statistical software.
Without the continuity correction the $\chi^2$ statistic equals 110.26.
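The test in equation (4.30), with and without Yates' correction, can be sketched in Python as follows. The function name `chi2_test_or_equals_one` and its `yates` flag are hypothetical; the sketch follows the text in using $m_0 m_1 n_0 n_1/N^3$ for the variance, and it obtains the upper-tail probability of a $\chi^2$ variable with one degree of freedom from the standard library's complementary error function.

```python
import math


def chi2_test_or_equals_one(d1, c1, d0, c0, yates=True):
    """Chi-squared test of H0: psi = 1 for a 2x2 table, equation (4.30).

    Hypothetical helper. Returns (statistic, P value), one degree of freedom.
    """
    m1, m0 = d1 + c1, d0 + c0   # number of cases and controls
    n1, n0 = d1 + d0, c1 + c0   # number of heavy and moderate drinkers
    N = m0 + m1
    expected = n1 * m1 / N                   # E[d1 | psi = 1]
    variance = m0 * m1 * n0 * n1 / N**3      # var[d1 | psi = 1]
    correction = 0.5 if yates else 0.0       # Yates' continuity correction
    stat = (abs(d1 - expected) - correction) ** 2 / variance
    # Upper tail of chi-squared with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


for yates in (True, False):
    stat, p = chi2_test_or_equals_one(96, 104, 109, 666, yates=yates)
    print(f"Yates={yates}: chi2 = {stat:.2f}, P = {p:.1e}")
# chi2 = 108.22 with the correction and 110.26 without it
```

Toggling `yates` reproduces the minor discrepancy between programs that apply the correction and those, such as Stata, that do not.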
4.19.5. Test of the Null Hypothesis that Two Proportions are Equal
We also need to be able to test the null hypothesis that two proportions are equal. For example, we might wish to test the hypothesis that the proportions of heavy drinkers among cases and controls are the same. It is important to realize that this hypothesis, $H_0: \pi_0 = \pi_1$, is true if and only if $\psi = 1$, because the odds $\pi/(1-\pi)$ is a strictly increasing function of $\pi$. Hence equation (4.30) may also be used to test this null hypothesis.
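As a numerical check of this equivalence, the sketch below (our own illustration, not from the original text) computes the familiar pooled two-sample z statistic for comparing the proportions of heavy drinkers among cases and controls. Its square matches the uncorrected form of (4.30); algebraically, both reduce to $N(d_1 c_0 - d_0 c_1)^2/(m_0 m_1 n_0 n_1)$. The variable names are illustrative.

```python
import math

# Table 4.2 counts
d1, c1 = 96, 104    # heavy and moderate drinkers among cases
d0, c0 = 109, 666   # heavy and moderate drinkers among controls
m1, m0 = d1 + c1, d0 + c0
n1, n0 = d1 + d0, c1 + c0
N = m0 + m1

p1, p0 = d1 / m1, d0 / m0            # observed prevalence of heavy drinking
p_pooled = n1 / N                    # common proportion under H0: pi0 = pi1
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / m1 + 1 / m0))
z = (p1 - p0) / se                   # pooled two-sample z statistic

# Uncorrected statistic from (4.30), written in closed form
chi2_uncorrected = N * (d1 * c0 - d0 * c1) ** 2 / (m0 * m1 * n0 * n1)

print(f"z = {z:.2f}, z^2 = {z**2:.2f}, chi2 (no correction) = {chi2_uncorrected:.2f}")
# z^2 and the uncorrected chi2 both equal 110.26
```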