Let us return to the Framingham didactic data set introduced in Sections 2.19.1 and 3.10. This data set contains long-term follow-up and cardiovascular outcome data on a large cohort of men and women. We will investigate the effects of gender and baseline diastolic blood pressure (DBP) on coronary heart disease (CHD) adjusted for other risk factors. Analyzing a complex real data set involves a considerable degree of judgment, and there is no single correct way to proceed. The following, however, includes the typical components of such an analysis.
7.5.1. Univariate Analyses
The first step is to perform a univariate analysis on the effects of DBP on CHD. Figure 7.1 shows a histogram of baseline DBP in this cohort. The range of blood pressures is very wide. Ninety-five per cent of the observations lie between 60 and 110 mm Hg. However, the data set is large enough that there are still 150 subjects with DBP ≤ 60 and 105 patients with pressures greater than 110. We subdivide the study subjects into seven groups based on their DBPs. Group 1 consists of patients with DBP ≤ 60, Group 2 has DBPs between 61 and 70, Group 3 has DBPs between 71 and 80, et cetera. The last
231 7.5. An example: The Framingham Heart Study
Figure 7.1 Histogram of baseline diastolic blood pressure among subjects from the Framingham Heart Study (Levy et al., 1999). These pressures were collected prior to the era of effective medical control of hypertension. (Axes: Baseline Diastolic Blood Pressure, 40 to 140; Number of Study Subjects, 0 to 600.)
Figure 7.2 Effect of baseline diastolic blood pressure (DBP) on the risk of subsequent coronary heart disease (CHD). The proportion of subjects who subsequently develop CHD increases steadily with increasing DBP. This elevation in risk persists for over 30 years (Framingham Heart Study, 1997). (Kaplan–Meier curves, one per baseline DBP group: ≤60, 61–70, 71–80, 81–90, 91–100, 101–110, >110. Axes: Years of Follow-up, 0 to 30; Proportion Without CHD, 0.0 to 1.0.)
group (Group 7) has DBPs greater than 110. Figure 7.2 shows the Kaplan–Meier CHD-free survival curves for these groups. The risk of CHD increases markedly with increasing DBP. The logrank $\chi^2$ statistic equals 260 with six degrees of freedom ($P < 10^{-52}$). Hence, we can reject the hypothesis that the survival curves for these groups are all equal with overwhelming
232 7. Hazard regression analysis
statistical significance. Moreover, the logrank tests of each adjacent pair of survival curves are also statistically significant. Hence, the data provide clear evidence that even modest increases in baseline DBP are predictive of increased risk of CHD.
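Estimating the relative risks associated with different baseline blood pressures proceeds in exactly the same way as for estimating odds ratios in logistic regression. Let
$$\mathit{dbp}_{ij} = \begin{cases} 1: & \text{if the } i\text{th patient is in DBP Group } j \\ 0: & \text{otherwise.} \end{cases}$$
Then a simple proportional hazards model for estimating the relative risks associated with these blood pressures is
$$\lambda_i[t] = \lambda_0[t] \exp[\beta_2 \times \mathit{dbp}_{i2} + \beta_3 \times \mathit{dbp}_{i3} + \beta_4 \times \mathit{dbp}_{i4} + \beta_5 \times \mathit{dbp}_{i5} + \beta_6 \times \mathit{dbp}_{i6} + \beta_7 \times \mathit{dbp}_{i7}]. \tag{7.4}$$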
For a patient in DBP Group 1, the hazard equals $\lambda_0[t] \exp[\beta_2 \times 0 + \beta_3 \times 0 + \cdots + \beta_7 \times 0] = \lambda_0[t]$. For a patient in Group $j$, the hazard is $\lambda_0[t] \exp[\beta_j \times 1]$ for $2 \le j \le 7$. Dividing $\lambda_0[t] \exp[\beta_j]\,\Delta t$ by $\lambda_0[t]\,\Delta t$ gives the relative risk for patients in DBP Group $j$ relative to Group 1, which is $\exp[\beta_j]$. The log relative risk for Group $j$ compared to Group 1 is $\beta_j$. Let $\hat\beta_j$ denote the maximum likelihood estimate of $\beta_j$ and let $\mathrm{se}[\hat\beta_j] = \sqrt{s_{jj}}$ denote the estimated standard error of $\hat\beta_j$. Then the estimated relative risk for subjects in Group $j$ relative to those in Group 1 is $\exp[\hat\beta_j]$. The 95% confidence interval for this risk is
$$(\exp[\hat\beta_j - 1.96\,\mathrm{se}[\hat\beta_j]],\ \exp[\hat\beta_j + 1.96\,\mathrm{se}[\hat\beta_j]]). \tag{7.5}$$
Table 7.1 shows the estimates of $\beta_j$ together with the corresponding relative risk estimates and 95% confidence intervals that result from model (7.4). These estimates are consistent with Figure 7.2 and confirm the importance of DBP as a predictor of subsequent risk of coronary heart disease.
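As a rough illustration of how the indicator coding in model (7.4) turns fitted coefficients into a relative risk, the following Python sketch uses the coefficient estimates reported in Table 7.1. The function and variable names (dbp_group, indicators, relative_risk) are hypothetical and are not part of the original analysis.

```python
import math

# Upper DBP bounds (mm Hg) for Groups 1 through 6; Group 7 is DBP > 110.
UPPER_BOUNDS = [60, 70, 80, 90, 100, 110]

# Coefficient estimates from Table 7.1 (Group 1 is the reference group).
BETA_HAT = {2: 0.677, 3: 0.939, 4: 1.117, 5: 1.512, 6: 1.839, 7: 2.247}


def dbp_group(dbp):
    """Return the DBP group (1 to 7) for a baseline DBP in mm Hg."""
    for group, upper in enumerate(UPPER_BOUNDS, start=1):
        if dbp <= upper:
            return group
    return 7


def indicators(dbp):
    """Return the indicator covariates {j: dbp_ij} for j = 2, ..., 7."""
    group = dbp_group(dbp)
    return {j: int(group == j) for j in BETA_HAT}


def relative_risk(dbp):
    """Relative risk versus Group 1 under model (7.4)."""
    x = indicators(dbp)
    # The baseline hazard cancels in the ratio, leaving exp of the linear predictor.
    return math.exp(sum(BETA_HAT[j] * x[j] for j in BETA_HAT))


# A patient with a DBP of 95 mm Hg falls in Group 5.
print(dbp_group(95), round(relative_risk(95), 2))
```

Because exactly one indicator is non-zero for any patient outside Group 1, the sum in the exponent collapses to a single coefficient, which is why each relative risk in Table 7.1 is simply the exponential of the corresponding estimate.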
Figure 7.3 shows the Kaplan–Meier CHD morbidity curves for men and women from the Framingham Heart Study. The logrank statistic for comparing these curves has one degree of freedom and equals 155. This statistic is also highly significant ($P < 10^{-34}$). Let
$$\mathit{male}_i = \begin{cases} 1: & \text{if the } i\text{th subject is a man} \\ 0: & \text{if the } i\text{th subject is a woman.} \end{cases}$$
Then a simple hazard regression model for estimating the effects of gender
Table 7.1. Effect of baseline diastolic blood pressure on coronary heart disease. The Framingham Heart Study data were analyzed using model (7.4).

Baseline diastolic    Number of    Cases of coronary    Coefficient    Relative    95% confidence
blood pressure        subjects     heart disease        estimate       risk        interval
≤60 mm Hg                  150                   18                    1.0*
61–70 mm Hg                774                  182     0.677          1.97        (1.2–3.2)
71–80 mm Hg               1467                  419     0.939          2.56        (1.6–4.1)
81–90 mm Hg               1267                  404     1.117          3.06        (1.9–4.9)
91–100 mm Hg               701                  284     1.512          4.54        (2.8–7.3)
101–110 mm Hg              235                  110     1.839          6.29        (3.8–10)
>110 mm Hg                 105                   56     2.247          9.46        (5.6–16)
Total                     4699                 1473

*Denominator of relative risk
Figure 7.3 Cumulative incidence of coronary heart disease (CHD) in men and women from the Framingham Heart Study (Levy et al., 1999). (Curves: Men, Women. Axes: Years of Follow-up, 0 to 30; Cumulative CHD Morbidity, 0.0 to 0.5.)
on CHD is
$$\lambda_i[t] = \lambda_0[t] \exp[\beta \times \mathit{male}_i]. \tag{7.6}$$
This model gives $\hat\beta = 0.642$ with standard error $\mathrm{se}[\hat\beta] = 0.0525$. Therefore, the estimated relative risk of CHD in men relative to women is $\exp[0.642] = 1.90$. We calculate the 95% confidence interval for this risk to be (1.71–2.11) using equation (7.5).
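The arithmetic behind this interval can be checked with a few lines of Python. This is only a sketch of equation (7.5) applied to the estimate and standard error quoted above; the function name relative_risk_ci is hypothetical.

```python
import math


def relative_risk_ci(beta_hat, se, z=1.96):
    """Return (relative risk, lower limit, upper limit) from equation (7.5)."""
    rr = math.exp(beta_hat)
    lower = math.exp(beta_hat - z * se)
    upper = math.exp(beta_hat + z * se)
    return rr, lower, upper


# Estimate and standard error for the gender coefficient in model (7.6).
rr, lower, upper = relative_risk_ci(0.642, 0.0525)
print(f"{rr:.2f} ({lower:.2f}, {upper:.2f})")
```

Note that the interval is symmetric about the estimate on the log scale but not on the relative risk scale, because the limits are exponentiated after adding and subtracting 1.96 standard errors.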
7.5.2. Multiplicative Model of DBP and Gender on Risk of CHD
The next step is to fit a multiplicative model of DBP and gender on CHD (see Section 5.18). Consider the model
$$\lambda_i[t] = \lambda_0[t] \exp\left[\sum_{h=2}^{7} \beta_h \times \mathit{dbp}_{ih} + \gamma \times \mathit{male}_i\right]. \tag{7.7}$$
The interpretation of this model is precisely analogous to that for model (5.38) in Section 5.19. To derive any relative risk under this model, write down the hazards for patients in the numerator and denominator of the desired relative risk. Then, divide the numerator hazard by the denominator hazard. You should be able to convince yourself that
• $\beta_h$ is the log relative risk of women in DBP Group $h$ relative to women in DBP Group 1,
• $\gamma$ is the log relative risk of men in DBP Group 1 relative to women in DBP Group 1, and
• $\beta_h + \gamma$ is the log relative risk of men in DBP Group $h$ relative to women in DBP Group 1.
• If $R_h$ is the relative risk of being in Group $h$ vs. Group 1 among women, and $R_m$ is the relative risk of men vs. women among people in Group 1, then the relative risk of men in Group $h$ relative to women in Group 1 equals $R_h \times R_m$. In other words, the effects of gender and blood pressure in model (7.7) are multiplicative.
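The multiplicative structure listed above can be verified numerically against the published estimates. The sketch below takes the relative risks for women and the Group 1 relative risk for men from Table 7.2, multiplies them, and compares the products with the tabulated values for men. The variable names are hypothetical, and small discrepancies are expected because the table entries are rounded.

```python
# Relative risks for women from Table 7.2 (R_h; reference: women with DBP <= 60).
WOMEN_RR = {
    "<=60": 1.0, "61-70": 1.91, "71-80": 2.43, "81-90": 2.78,
    "91-100": 4.06, "101-110": 5.96, ">110": 9.18,
}

# Relative risks for men from Table 7.2, also relative to women with DBP <= 60.
MEN_RR = {
    "<=60": 1.83, "61-70": 3.51, "71-80": 4.46, "81-90": 5.09,
    "91-100": 7.45, "101-110": 10.9, ">110": 16.8,
}

# R_m: relative risk of men vs. women within DBP Group 1.
R_M = MEN_RR["<=60"]

for group, r_h in WOMEN_RR.items():
    product = r_h * R_M  # multiplicative model: R_h x R_m
    print(f"{group:>8}: R_h x R_m = {product:6.2f}, tabulated = {MEN_RR[group]:6.2f}")
```

Under model (7.7) the ratio of the male to the female relative risk is the same in every DBP group, which is exactly the assumption that the interaction model relaxes.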
Model (7.7) was used to produce the relative risk estimates in Table 7.2. Note that model (7.4) is nested within model (7.7). That is, if $\gamma = 0$ then model (7.7) reduces to model (7.4). This allows us to use the change in model deviance to test whether adding gender improves the fit of the model to the data. This change in deviance is $D = 133$, which has an approximately chi-squared distribution with one degree of freedom under the null hypothesis that $\gamma = 0$. Hence, we can reject this null hypothesis with overwhelming statistical significance ($P < 10^{-30}$).
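The size of this P value can be checked without statistical software. A chi-squared variable with one degree of freedom is the square of a standard normal variable, so its upper tail probability can be written with the complementary error function. The following is a minimal sketch using only the Python standard library; the function name chi2_sf_1df is hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper tail probability P(X > x) for chi-squared with 1 degree of freedom.

    If X = Z**2 with Z standard normal, then
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Change in deviance between models (7.4) and (7.7).
p_value = chi2_sf_1df(133.0)
print(f"P = {p_value:.1e}")
```

The same function applies to any test statistic with one degree of freedom, for example the logrank statistic of 155 that compares men and women.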
7.5.3. Using Interaction Terms to Model the Effects of Gender and DBP on CHD
We next add interaction terms to weaken the multiplicative assumption in model (7.7) (see Sections 5.18 and 5.19). Let
$$\lambda_i[t] = \lambda_0[t] \exp\left[\sum_{h=2}^{7} \beta_h \times \mathit{dbp}_{ih} + \gamma \times \mathit{male}_i + \sum_{h=2}^{7} \delta_h \times \mathit{dbp}_{ih} \times \mathit{male}_i\right]. \tag{7.8}$$
This model is analogous to model (5.40) for esophageal cancer. For men in DBP Group $h$, the hazard is $\lambda_0[t] \exp[\beta_h + \gamma + \delta_h]$. For women in
Table 7.2. Effect of gender and baseline diastolic blood pressure on coronary heart disease. The Framingham Heart Study data are analyzed using the multiplicative model (7.7).

                      Women                          Men
Baseline diastolic    Relative    95% confidence     Relative    95% confidence
blood pressure        risk        interval           risk        interval
≤60 mm Hg             1.0*                           1.83        (1.7–2.0)
61–70 mm Hg           1.91        (1.2–3.1)          3.51        (2.1–5.7)
71–80 mm Hg           2.43        (1.5–3.9)          4.46        (2.8–7.2)
81–90 mm Hg           2.78        (1.7–4.5)          5.09        (3.2–8.2)
91–100 mm Hg          4.06        (2.5–6.5)          7.45        (4.6–12)
101–110 mm Hg         5.96        (3.6–9.8)          10.9        (6.6–18)
>110 mm Hg            9.18        (5.4–15)           16.8        (9.8–29)

*Denominator of relative risk
Group 1, the hazard is $\lambda_0[t]$. Hence, the relative risk for men in DBP Group $h$ relative to women in DBP Group 1 is $(\lambda_0[t] \exp[\beta_h + \gamma + \delta_h]) / \lambda_0[t] = \exp[\beta_h + \gamma + \delta_h]$. This model was used to generate the relative risks in Table 7.3. Note the marked differences between the estimates in Tables 7.2 and 7.3. Model (7.8) indicates that the effect of gender on the risk of CHD is greatest for people with low or moderate blood pressure and diminishes as blood pressure rises. Gender has no effect on CHD for people with a DBP above 110 mm Hg.
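One way to see how the gender effect changes with blood pressure is to divide each male relative risk in Table 7.3 by the female relative risk in the same DBP group. Since both are expressed relative to women in Group 1, the common denominator cancels and the ratio estimates the relative risk of men vs. women within that group. The sketch below does this with the rounded table values; the variable names are hypothetical.

```python
GROUPS = ["<=60", "61-70", "71-80", "81-90", "91-100", "101-110", ">110"]

# Relative risks from Table 7.3, all relative to women with DBP <= 60.
WOMEN_RR = [1.0, 1.83, 2.43, 3.52, 4.69, 7.64, 13.6]
MEN_RR = [2.37, 4.59, 5.55, 5.28, 8.28, 10.9, 13.0]

# Within-group relative risk of men vs. women.
gender_effect = [men / women for men, women in zip(MEN_RR, WOMEN_RR)]

for group, ratio in zip(GROUPS, gender_effect):
    print(f"{group:>8}: men vs. women = {ratio:.2f}")
```

In the three lowest DBP groups the ratio exceeds 2, while in the highest group it is close to 1. Under the multiplicative model (7.7) these ratios would all be equal.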
Model (7.7) is nested within model (7.8). Hence, we can use the change in the model deviance to test the null hypothesis that the multiplicative model is correct. This change in deviance is $D = 21.23$. Model (7.8) has six more parameters than model (7.7). Therefore, under the null hypothesis $D$ has an approximately chi-squared distribution with six degrees of freedom. The probability that this statistic exceeds 21.23 is $P = 0.002$. Thus, the evidence of interaction between DBP and gender on CHD risk is statistically significant.
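This tail probability can be reproduced by hand because the chi-squared distribution has a closed-form upper tail when the degrees of freedom are even: with $2m$ degrees of freedom, $P(X > x) = e^{-x/2} \sum_{k=0}^{m-1} (x/2)^k / k!$. The following Python sketch implements that formula with the standard library only; the function name chi2_sf_even_df is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) for chi-squared with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term = 1.0   # (x/2)**k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Change in deviance between models (7.7) and (7.8), six degrees of freedom.
p_value = chi2_sf_even_df(21.23, 6)
print(f"P = {p_value:.4f}")
```

The closed form is restricted to even degrees of freedom. For the general case a statistical library would normally be used instead.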
7.5.4. Adjusting for Confounding Variables
So far we have not adjusted our results for other confounding variables. Of particular importance is age at baseline exam. Figure 7.4 shows that this age varied widely among study subjects. As both DBP and risk of CHD increase
Table 7.3. Effect of gender and baseline diastolic blood pressure on coronary heart disease. The Framingham Heart Study data are analyzed using model (7.8), which includes interaction terms for the joint effects of gender and blood pressure.

                      Women                          Men
Baseline diastolic    Relative    95% confidence     Relative    95% confidence
blood pressure        risk        interval           risk        interval
≤60 mm Hg             1.0*                           2.37        (0.94–6.0)
61–70 mm Hg           1.83        (0.92–3.6)         4.59        (2.3–9.1)
71–80 mm Hg           2.43        (1.2–4.7)          5.55        (2.9–11)
81–90 mm Hg           3.52        (1.8–6.9)          5.28        (2.7–10)
91–100 mm Hg          4.69        (2.4–9.3)          8.28        (4.2–16)
101–110 mm Hg         7.64        (3.8–15)           10.9        (5.4–22)
>110 mm Hg            13.6        (6.6–28)           13.0        (5.9–29)

*Denominator of relative risk
Figure 7.4 Histogram showing the age at baseline exam of subjects in the Framingham Heart Study (Levy et al., 1999). (Axes: Age in Years, 30 to 65; Percent of Study Subjects, 0 to 4.)
markedly with age, we would expect age to strongly confound the effect of DBP on CHD. Other potential confounding variables that we may wish to consider include body mass index and serum cholesterol. Let $\mathit{age}_i$, $\mathit{bmi}_i$, and $\mathit{scl}_i$ denote the age, body mass index, and serum cholesterol of the $i$th patient. We add these variables one at a time giving models
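$$\lambda_i[t] = \lambda_0[t] \exp\left[\sum_{h=2}^{7} \beta_h \times \mathit{dbp}_{ih} + \gamma \times \mathit{male}_i + \sum_{h=2}^{7} \delta_h \times \mathit{dbp}_{ih} \times \mathit{male}_i + \theta_1 \times \mathit{age}_i\right], \tag{7.9}$$
$$\lambda_i[t] = \lambda_0[t] \exp\left[\sum_{h=2}^{7} \beta_h \times \mathit{dbp}_{ih} + \gamma \times \mathit{male}_i + \sum_{h=2}^{7} \delta_h \times \mathit{dbp}_{ih} \times \mathit{male}_i + \theta_1 \times \mathit{age}_i + \theta_2 \times \mathit{bmi}_i\right], \tag{7.10}$$
and
$$\lambda_i[t] = \lambda_0[t] \exp\left[\sum_{h=2}^{7} \beta_h \times \mathit{dbp}_{ih} + \gamma \times \mathit{male}_i + \sum_{h=2}^{7} \delta_h \times \mathit{dbp}_{ih} \times \mathit{male}_i + \theta_1 \times \mathit{age}_i + \theta_2 \times \mathit{bmi}_i + \theta_3 \times \mathit{scl}_i\right]. \tag{7.11}$$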
Note that model (7.8) is nested within model (7.9), model (7.9) is nested within model (7.10), and model (7.10) is nested within model (7.11). Hence, we can derive the change in model deviance with each successive model to test whether each new term significantly improves the model fit. These tests show that age, body mass index, and serum cholesterol all substantially improve the model fit, and that the null hypotheses $\theta_1 = 0$, $\theta_2 = 0$, and $\theta_3 = 0$ may all be rejected with overwhelming statistical significance. These tests also show that these variables are important independent predictors of CHD. Table 7.4 shows the estimated relative risks of CHD associated with DBP and gender that are obtained from model (7.11).
7.5.5. Interpretation
Tables 7.2, 7.3, and 7.4 are all estimating similar relative risks from the same data set. It is therefore sobering to see how different these estimates are. It is very important to understand the implications of the models used to derive these tables. Men in DBP Group 1 have a lower risk in Table 7.2 than in Table 7.3 while the converse is true for men in DBP Group 7. This is because model (7.7) forces the relative risks in Table 7.2 to obey the multiplicative assumption while model (7.8) permits the effect of gender on CHD to diminish with increasing DBP.
Table 7.4. Effect of gender and baseline diastolic blood pressure (DBP) on coronary heart disease. The Framingham Heart Study data are analyzed using model (7.11). This model includes gender–DBP interaction terms and adjusts for age, body mass index and serum cholesterol.

                      Women                          Men
Baseline diastolic    Relative    95% confidence     Relative    95% confidence
blood pressure        risk        interval           risk        interval
≤60 mm Hg             1.0*                           1.98        (0.79–5.0)
61–70 mm Hg           1.51        (0.76–3.0)         3.53        (1.8–7.0)
71–80 mm Hg           1.65        (0.85–3.2)         3.88        (2.0–7.6)
81–90 mm Hg           1.91        (0.98–3.7)         3.33        (1.7–6.5)
91–100 mm Hg          1.94        (0.97–3.9)         4.86        (2.5–9.5)
101–110 mm Hg         3.10        (1.5–6.3)          6.29        (3.1–13)
>110 mm Hg            5.27        (2.5–11)           6.40        (2.9–14)

*Denominator of relative risk
†Adjusted for age, body mass index, and serum cholesterol.
The relative risks in Table 7.4 are substantially smaller than those in Table 7.3. It is important to realize that the relative risks in Table 7.4 compare people of the same age, body mass index, and serum cholesterol while those of Table 7.3 compare people without regard to these three risk factors. Our analyses show that age, body mass index, and serum cholesterol are risk factors for CHD in their own right. Also, these risk factors are positively correlated with DBP. Hence, it is not surprising that the unadjusted risks of DBP and gender in Table 7.3 are inflated by confounding due to these other variables. In general, the decision as to which variables to include as confounding variables is affected by how the results are to be used and our knowledge of the etiology of the disease being studied. For example, since it is easier to measure blood pressure than serum cholesterol, it might be more clinically useful to know the effect of DBP on CHD without adjustment for serum cholesterol. If we are trying to establish a causal link between a risk factor and a disease, then it is important to avoid adjusting for any condition that may be an intermediate outcome between the initial cause and the final outcome.
239 7.6. Cox—Snell generalized residuals and proportional hazards models
7.5.6. Alternative Models
In model (7.11) we treat age, body mass index, and serum cholesterol as continuous variables while DBP is recoded into seven dichotomous variables involving six parameters. We could have treated these confounding variables in the same way as DBP. In our analysis of esophageal cancer in Chapter 5 we treated age in precisely this way. In general, it is best to recode those continuous variables that are of primary interest into several dichotomous variables in order to avoid assuming a linear relationship between these risk factors and the log relative risk. It may, however, be reasonable to treat confounding variables as continuous. The advantage of putting a continuous variable directly into the model is that it requires only one parameter. The cost of making the linear assumption may not be too important for a variable that is included in the model only because it is a confounder. A method for testing the adequacy of the linearity assumption using orthogonal polynomials is discussed in Hosmer and Lemeshow (1989).