In Section 8.9 we created a person-year data set from the Framingham Heart Study for Poisson regression analysis. Patients were divided into strata based on age, body mass index, serum cholesterol, and baseline diastolic blood pressure. Age was classified into nine strata. The first and last consisted of people ≤45 years of age and people older than 80 years, respectively. The inner strata consisted of people 46–50, 51–55, ..., and 76–80 years of age. The values of the other variables were divided into strata defined by quartiles. Each record in this data set consists of a number of person-years of follow-up of people of the same gender who are in the same strata for age, body mass index, serum cholesterol, and diastolic blood pressure. Let $n_k$ be the number of person-years of follow-up in the $k$th record of this file and let $d_k$ be the number of cases of coronary heart disease observed during these $n_k$ person-years of follow-up. Let
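$$
\mathit{male}_k =
\begin{cases}
1: & \text{if record } k \text{ describes men} \\
0: & \text{if it describes women,}
\end{cases}
$$

$$
\mathit{age}_{jk} =
\begin{cases}
1: & \text{if record } k \text{ describes people from the } j\text{th age stratum} \\
0: & \text{otherwise,}
\end{cases}
$$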
299 9.2. An example: The Framingham Heart Study
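$$
\mathit{bmi}_{jk} =
\begin{cases}
1: & \text{if record } k \text{ describes people from the } j\text{th BMI quartile} \\
0: & \text{otherwise,}
\end{cases}
$$

$$
\mathit{scl}_{jk} =
\begin{cases}
1: & \text{if record } k \text{ describes people from the } j\text{th SCL quartile} \\
0: & \text{otherwise, and}
\end{cases}
$$

$$
\mathit{dbp}_{jk} =
\begin{cases}
1: & \text{if record } k \text{ describes people from the } j\text{th DBP quartile} \\
0: & \text{otherwise.}
\end{cases}
$$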
For any model that we will consider let $\mathbf{x}_k = (x_{k1}, x_{k2}, \ldots, x_{kq})$ denote the values of all of the covariates of people in the $k$th record that are included in the model, and let $\lambda_k = E[d_k/n_k \mid \mathbf{x}_k]$ be the expected CHD incidence rate for people in the $k$th record given their covariates $\mathbf{x}_k$. We will now build several models with these data.
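The indicator covariates defined above amount to dummy coding of each record's stratum labels, with the first stratum as the reference level. The following Python sketch shows one way this coding could be done for a model that contains age and gender terms. The record layout (a dict with keys `male` and `age_stratum`) and the helper names `indicators` and `covariates` are assumptions made for illustration; they are not taken from the Framingham data file.

```python
def indicators(stratum, n_strata):
    """Return the 0/1 indicators for strata 2..n_strata.

    Stratum 1 is the reference level, so it is coded as all zeros.
    """
    return [1 if stratum == j else 0 for j in range(2, n_strata + 1)]


def covariates(record):
    """Build the covariate vector for a model with age and gender terms.

    `record` is a hypothetical dict with keys 'male' (0 or 1) and
    'age_stratum' (1..9); these names are assumptions for illustration.
    """
    age = indicators(record["age_stratum"], 9)   # age_2k, ..., age_9k
    return age + [record["male"]]                # followed by male_k


# A record describing men in the third age stratum (51-55 years).
x_k = covariates({"male": 1, "age_stratum": 3})
print(x_k)  # [0, 1, 0, 0, 0, 0, 0, 0, 1]
```

A record for women in the first age stratum is coded as nine zeros, which is why its log incidence rate reduces to the intercept alone.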
9.2.1. A Multiplicative Model of Gender, Age and Coronary Heart Disease
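Consider the model
$$
\log[E[d_k \mid \mathbf{x}_k]] = \log[n_k] + \alpha + \sum_{j=2}^{9} \beta_j \times \mathit{age}_{jk} + \gamma \times \mathit{male}_k, \tag{9.10}
$$
where $\alpha, \beta_2, \beta_3, \ldots, \beta_9$ and $\gamma$ are parameters in the model. Subtracting $\log[n_k]$ from both sides of equation (9.10) gives that
$$
\log[\lambda_k] = \alpha + \sum_{j=2}^{9} \beta_j \times \mathit{age}_{jk} + \gamma \times \mathit{male}_k. \tag{9.11}
$$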
If record $f$ describes women from the first age stratum then equation (9.11) reduces to
$$
\log[\lambda_f] = \alpha. \tag{9.12}
$$
If record $g$ describes men from the first stratum then equation (9.11) reduces to
$$
\log[\lambda_g] = \alpha + \gamma. \tag{9.13}
$$
Subtracting equation (9.12) from equation (9.13) gives that
$$
\log[\lambda_g] - \log[\lambda_f] = \log[\lambda_g/\lambda_f] = (\alpha + \gamma) - \alpha = \gamma.
$$
300 9. Multiple Poisson regression
In other words, $\gamma$ is the log relative risk of CHD for men compared to women within the first age stratum. Similarly, if records $f$ and $g$ now describe women and men, respectively, from the $j$th age stratum with $j > 1$ then
$$
\log[\lambda_f] = \alpha + \beta_j \quad \text{and} \tag{9.14}
$$
$$
\log[\lambda_g] = \alpha + \beta_j + \gamma. \tag{9.15}
$$
Subtracting equation (9.14) from (9.15) again yields that
$$
\log[\lambda_g/\lambda_f] = \gamma.
$$
Hence, $\gamma$ is the within-stratum (i.e., age-adjusted) log relative risk of CHD for men compared to women for all age strata. We have gone through virtually identical arguments many times in previous chapters. By subtracting appropriate pairs of log incidence rates you should also be able to show that $\beta_j$ is the sex-adjusted log relative risk of CHD for people from the $j$th age stratum compared to the first, and that $\beta_j + \gamma$ is the log relative risk of CHD for men from the $j$th stratum compared to women from the first. Hence, the age-adjusted risk for men relative to women is $\exp[\gamma]$, the sex-adjusted risk of people from the $j$th age stratum relative to the first is $\exp[\beta_j]$, and the risk for men from the $j$th age stratum relative to women from the first is $\exp[\gamma] \times \exp[\beta_j]$. Model (9.10) is called a multiplicative model because this latter relative risk equals the risk for men relative to women times the risk for people from the $j$th stratum relative to people from the first.
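The multiplicative property can be checked numerically. The sketch below evaluates equation (9.11) for a few groups and confirms that the joint relative risk is the product of the two separate relative risks. The parameter values are hypothetical, chosen only to illustrate the algebra; they are not estimates from the Framingham data, and the function names are likewise illustrative.

```python
import math

# Hypothetical parameter values, chosen only to illustrate the algebra;
# they are not estimates from the Framingham data.
alpha = -7.0
beta = {2: 0.9, 3: 1.5}   # beta_j for age strata j = 2, 3
gamma = 0.7


def log_rate(j, male):
    """Log incidence rate from equation (9.11) for age stratum j and gender."""
    return alpha + beta.get(j, 0.0) + gamma * male


def relative_risk(j1, male1, j0, male0):
    """Risk of group (j1, male1) relative to group (j0, male0)."""
    return math.exp(log_rate(j1, male1) - log_rate(j0, male0))


rr_sex = relative_risk(1, 1, 1, 0)    # men vs. women, same stratum: exp(gamma)
rr_age = relative_risk(3, 0, 1, 0)    # stratum 3 vs. stratum 1, same sex: exp(beta_3)
rr_both = relative_risk(3, 1, 1, 0)   # men in stratum 3 vs. women in stratum 1

# The joint relative risk is the product of the two separate relative risks.
print(math.isclose(rr_both, rr_sex * rr_age))  # True
```

The intercept cancels in every comparison, so the choice of `alpha` has no effect on any of the relative risks.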
The maximum likelihood estimate of $\gamma$ in (9.10) is $\hat{\gamma} = 0.6912$. Hence, the age-adjusted estimate of the relative risk of CHD in men compared to women from this model is $\exp[0.6912] = 2.00$. The standard error of $\hat{\gamma}$ is 0.0527. Therefore, from equation (9.9), the 95% confidence interval for this relative risk is $(\exp[0.6912 - 1.96 \times 0.0527],\ \exp[0.6912 + 1.96 \times 0.0527]) = (1.8, 2.2)$. This risk estimate is virtually identical to the estimate we obtained from model (7.16), which was a proportional hazards model with ragged entry.
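The arithmetic behind this estimate and its confidence interval can be reproduced in a few lines of Python. The estimate 0.6912 and standard error 0.0527 are the values quoted above; the function name `rr_with_ci` is an illustrative choice.

```python
import math


def rr_with_ci(log_rr, se, z=1.96):
    """Exponentiate a log relative risk and its Wald confidence limits."""
    estimate = math.exp(log_rr)
    lower = math.exp(log_rr - z * se)
    upper = math.exp(log_rr + z * se)
    return estimate, lower, upper


# gamma-hat and its standard error as quoted in the text for model (9.10).
rr, lower, upper = rr_with_ci(0.6912, 0.0527)
print(f"{rr:.2f} ({lower:.1f}, {upper:.1f})")  # 2.00 (1.8, 2.2)
```

The interval is computed on the log scale and then exponentiated, which is why it is not symmetric about 2.00.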
Model (9.10) is not of great practical interest because we know from Chapter 7 that the risk of CHD in men relative to women is greater for premenopausal ages than for postmenopausal ages. The incidence of CHD in women in the $j$th stratum is the sum of all CHD events in women from this stratum divided by the total number of women-years of follow-up in this stratum. That is, this incidence is
$$
\hat{I}_{0j} = \frac{\sum_{\{k:\ \mathit{male}_k = 0,\ \mathit{age}_{jk} = 1\}} d_k}{\sum_{\{k:\ \mathit{male}_k = 0,\ \mathit{age}_{jk} = 1\}} n_k}. \tag{9.16}
$$
[Figure: CHD morbidity rate per 1000 (vertical axis, 0 to 40) plotted against age group (horizontal axis), with separate curves for men and women.]
Figure 9.1 Age–sex specific incidence of coronary heart disease (CHD) in people from the Framingham Heart Study (Levy, 1999).
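Similarly, the incidence of CHD in men from the $j$th age stratum can be estimated by
$$
\hat{I}_{1j} = \frac{\sum_{\{k:\ \mathit{male}_k = 1,\ \mathit{age}_{jk} = 1\}} d_k}{\sum_{\{k:\ \mathit{male}_k = 1,\ \mathit{age}_{jk} = 1\}} n_k}. \tag{9.17}
$$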
Equations (9.16) and (9.17) are used in Figure 9.1 to plot the age-specific incidence of CHD in men and women from the Framingham Heart Study.
This figure shows dramatic differences in CHD rates between men and women; the ratio of these rates at each age diminishes as people grow older. To model these rates effectively we need to add interaction terms into our model.
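Equations (9.16) and (9.17) simply total the events and the person-years over all records that share a gender and an age stratum, and then divide. A minimal Python sketch follows. The records are invented for illustration and do not come from the Framingham data set; the tuple layout and the function name are also assumptions.

```python
from collections import defaultdict

# Hypothetical person-year records (values invented for illustration only).
# Each tuple is (male, age_stratum, person_years n_k, CHD events d_k).
records = [
    (0, 1, 1200.0, 1),
    (0, 1, 800.0, 1),
    (1, 1, 1000.0, 6),
    (1, 2, 500.0, 5),
]


def incidence_by_sex_and_age(records):
    """Return {(male, age_stratum): total events / total person-years}."""
    events = defaultdict(int)
    person_years = defaultdict(float)
    for male, stratum, n_k, d_k in records:
        events[(male, stratum)] += d_k
        person_years[(male, stratum)] += n_k
    return {key: events[key] / person_years[key] for key in events}


rates = incidence_by_sex_and_age(records)
for (male, stratum), rate in sorted(rates.items()):
    # Report the rate per 1000 person-years, as on the axis of Figure 9.1.
    print(male, stratum, round(1000 * rate, 2))
```

Note that the rate for a stratum is the ratio of the two totals, not the average of the record-level rates, so records with more follow-up carry more weight.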
9.2.2. A Model of Age, Gender and CHD with Interaction Terms
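Let us expand model (9.10) as follows:
$$
\begin{aligned}
\log[E[d_k \mid \mathbf{x}_k]] = {} & \log[n_k] + \alpha + \sum_{j=2}^{9} \beta_j \times \mathit{age}_{jk} + \gamma \times \mathit{male}_k \\
& + \sum_{j=2}^{9} \delta_j \times \mathit{age}_{jk} \times \mathit{male}_k.
\end{aligned}
\tag{9.18}
$$
If record $f$ describes women from the $j$th age stratum with $j > 1$ then model (9.18) reduces to
$$
\log[\lambda_f] = \alpha + \beta_j. \tag{9.19}
$$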
Table 9.2. Age-specific relative risks of coronary heart disease (CHD) in men compared to women from the Framingham Heart Study (Levy, 1999). These relative risk estimates were obtained from model (9.18). Five-year age intervals are used. Similar relative risks from contiguous age strata have been highlighted.
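| Age | Patient-years of follow-up, men | Patient-years of follow-up, women | CHD events, men | CHD events, women | Relative risk | 95% confidence interval |
|---|---|---|---|---|---|---|
| ≤45 | 7370 | 9205 | 43 | 9 | 5.97 | 2.9–12 |
| 46–50 | 5835 | 7595 | 53 | 25 | 2.76 | 1.7–4.4 |
| 51–55 | 6814 | 9113 | 110 | 46 | 3.20 | 2.3–4.5 |
| 56–60 | 7184 | 10 139 | 155 | 105 | 2.08 | 1.6–2.7 |
| 61–65 | 6678 | 9946 | 178 | 148 | 1.79 | 1.4–2.2 |
| 66–70 | 4557 | 7385 | 121 | 120 | 1.63 | 1.3–2.1 |
| 71–75 | 2575 | 4579 | 94 | 88 | 1.90 | 1.4–2.5 |
| 76–80 | 1205 | 2428 | 50 | 59 | 1.71 | 1.2–2.5 |
| ≥81 | 470 | 1383 | 19 | 50 | 1.12 | 0.66–1.9 |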
If record $g$ describes men from the same age stratum then model (9.18) reduces to
$$
\log[\lambda_g] = \alpha + \beta_j + \gamma + \delta_j. \tag{9.20}
$$
Subtracting equation (9.19) from equation (9.20) gives the log relative risk of CHD for men versus women in the $j$th age stratum to be
$$
\log[\lambda_g/\lambda_f] = \gamma + \delta_j. \tag{9.21}
$$
Hence, we estimate this relative risk by
$$
\exp[\hat{\gamma} + \hat{\delta}_j]. \tag{9.22}
$$
A similar argument gives that the estimated relative risk of men compared to women in the first age stratum is
$$
\exp[\hat{\gamma}]. \tag{9.23}
$$
Equations (9.22) and (9.23) are used in Table 9.2 to estimate the age-specific relative risks of CHD in men versus women. Ninety-five percent confidence intervals are calculated for these estimates using equation (9.9).
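Because model (9.18) has a separate parameter for every combination of age stratum and gender, its fitted rates should coincide with the observed rates. The relative risks in Table 9.2 can therefore be checked directly from the tabulated counts. The Python sketch below does this. The standard error $\sqrt{1/d_{\text{men}} + 1/d_{\text{women}}}$ used for the log rate ratio is the usual large-sample Poisson formula and is an assumption here, since the text obtains its intervals from equation (9.9); the function and variable names are illustrative.

```python
import math

# (age, person-years men, person-years women, CHD events men, CHD events women)
# Counts copied from Table 9.2.
table_9_2 = [
    ("<=45", 7370, 9205, 43, 9),
    ("46-50", 5835, 7595, 53, 25),
    ("51-55", 6814, 9113, 110, 46),
    ("56-60", 7184, 10139, 155, 105),
    ("61-65", 6678, 9946, 178, 148),
    ("66-70", 4557, 7385, 121, 120),
    ("71-75", 2575, 4579, 94, 88),
    ("76-80", 1205, 2428, 50, 59),
    (">=81", 470, 1383, 19, 50),
]


def rate_ratio(n_men, n_women, d_men, d_women, z=1.96):
    """Crude rate ratio (men vs. women) with an approximate 95% interval."""
    rr = (d_men / n_men) / (d_women / n_women)
    se = math.sqrt(1 / d_men + 1 / d_women)   # assumed SE of the log rate ratio
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


results = {}
for age, n_m, n_w, d_m, d_w in table_9_2:
    results[age] = rate_ratio(n_m, n_w, d_m, d_w)
    rr, lower, upper = results[age]
    print(f"{age:>6}  {rr:.2f}  ({lower:.2g}, {upper:.2g})")
```

Rounded to two decimals, the rate ratios computed this way agree with the relative risk column of Table 9.2.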
Table 9.3. Age-specific relative risks of coronary heart disease (CHD) in men compared to women from the Framingham Heart Study (Levy, 1999). Age intervals from Table 9.2 that had similar relative risks have been combined in this table, giving age intervals with variable widths.

| Age | Patient-years of follow-up, men | Patient-years of follow-up, women | CHD events, men | CHD events, women | Relative risk | 95% confidence interval |
|---|---|---|---|---|---|---|
| ≤45 | 7370 | 9205 | 43 | 9 | 5.97 | 2.9–12 |
| 46–55 | 12 649 | 16 708 | 163 | 71 | 3.03 | 2.3–4.0 |
| 56–60 | 7184 | 10 139 | 155 | 105 | 2.08 | 1.6–2.7 |
| 61–80 | 15 015 | 24 338 | 443 | 415 | 1.73 | 1.5–2.0 |
| ≥81 | 470 | 1383 | 19 | 50 | 1.12 | 0.66–1.9 |

When models (9.10) and (9.18) are fitted to the Framingham Heart Study data they produce model deviances of 1391.3 and 1361.6, respectively. Note that model (9.10) is nested within model (9.18) (see Section 5.24). Hence, under the null hypothesis that the multiplicative model (9.10) is correct, the change in deviance will have an approximately chi-squared distribution with as many degrees of freedom as there are extra parameters in model (9.18). As there are eight more parameters in model (9.18) than in model (9.10), this chi-squared statistic will have eight degrees of freedom. The probability that this statistic will exceed $1391.3 - 1361.6 = 29.7$ is $P = 0.0002$. Hence, these data allow us to reject the multiplicative model with a high level of statistical significance.
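The P value for the change in deviance can be reproduced without a statistics library, because the upper tail of a chi-squared distribution with an even number of degrees of freedom has a closed form. The Python sketch below uses that closed form. The function name `chi2_sf_even_df` is an illustrative choice, and the approach applies only to even degrees of freedom, which happens to cover the eight degrees of freedom needed here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability P(X > x) for a chi-squared variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


deviance_multiplicative = 1391.3   # model (9.10), as quoted in the text
deviance_interaction = 1361.6      # model (9.18), as quoted in the text
change = deviance_multiplicative - deviance_interaction
p_value = chi2_sf_even_df(change, df=8)
print(f"change in deviance = {change:.1f}, P = {p_value:.4f}")
```

For odd degrees of freedom a general routine for the incomplete gamma function would be needed instead.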
Table 9.2 shows a marked drop in the risk of CHD in men relative to women with increasing age. Note, however, that the relative risks for ages 46–50 and ages 51–55 are similar, as are the relative risks for ages 61–65 through 76–80. Hence, we can reduce the number of age strata from nine to five with little loss in explanatory power by lumping ages 46–55 into one stratum and ages 61–80 into another. This reduces the number of parameters in model (9.18) by eight (four age parameters plus four interaction parameters). Refitting model (9.18) with only these five condensed age strata rather than the original nine gives the results presented in Table 9.3. Note that the age-specific relative risk of men versus women in this table diminishes with age but remains significantly different from one for all ages up to 80. Gender does not have a significant influence on the risk of CHD in people older than 80. These data are consistent with the hypothesis that endogenous sex hormones play a cardioprotective role in premenopausal women.
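Collapsing age strata amounts to summing the person-years and events of the strata being merged. The Python sketch below does this for the two merged intervals using the counts from Table 9.2, and then forms the crude rate ratio for men versus women. It assumes, as is the case for a model with a full set of age-by-gender interaction terms, that the fitted relative risks equal these crude ratios; the helper names are illustrative.

```python
# Counts copied from Table 9.2 for the strata that are merged:
# (person-years men, person-years women, CHD events men, CHD events women)
table_9_2 = {
    "46-50": (5835, 7595, 53, 25),
    "51-55": (6814, 9113, 110, 46),
    "61-65": (6678, 9946, 178, 148),
    "66-70": (4557, 7385, 121, 120),
    "71-75": (2575, 4579, 94, 88),
    "76-80": (1205, 2428, 50, 59),
}


def combine(strata):
    """Sum person-years and events, column by column, over the given strata."""
    rows = [table_9_2[s] for s in strata]
    return tuple(sum(column) for column in zip(*rows))


def rate_ratio(n_men, n_women, d_men, d_women):
    """Crude incidence rate in men divided by that in women."""
    return (d_men / n_men) / (d_women / n_women)


merged = {
    "46-55": combine(["46-50", "51-55"]),
    "61-80": combine(["61-65", "66-70", "71-75", "76-80"]),
}
for age, counts in merged.items():
    print(age, counts, round(rate_ratio(*counts), 2))
```

The summed counts and the resulting ratios (3.03 and 1.73) match the corresponding rows of Table 9.3.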
9.2.3. Adding Confounding Variables to the Model
Let us now consider the effect of possibly confounding variables on our estimates in Table 9.3. The variables that we will consider are body mass index, serum cholesterol, and diastolic blood pressure. We will add these variables one at a time in order to gauge their influence on the model deviance. As we will see in Section 9.3, all of these variables have an overwhelmingly significant effect on the change in model deviance. The final model that we will consider is
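$$
\begin{aligned}
\log[E[d_k \mid \mathbf{x}_k]] = {} & \log[n_k] + \alpha + \sum_{j=2}^{5} \beta_j \times \mathit{age}_{jk} + \gamma \times \mathit{male}_k \\
& + \sum_{j=2}^{5} \delta_j \times \mathit{age}_{jk} \times \mathit{male}_k + \sum_{f=2}^{4} \theta_f \times \mathit{bmi}_{fk} \\
& + \sum_{g=2}^{4} \phi_g \times \mathit{scl}_{gk} + \sum_{h=2}^{4} \psi_h \times \mathit{dbp}_{hk},
\end{aligned}
\tag{9.24}
$$
where the age strata are those given in Table 9.3 rather than those given at the beginning of Section 9.2. Recall that $\mathit{bmi}_{fk}$, $\mathit{scl}_{gk}$, and $\mathit{dbp}_{hk}$ are indicator covariates corresponding to the four quartiles of body mass index, serum cholesterol, and diastolic blood pressure, respectively. By the usual argument, the age-specific CHD risk for men relative to women adjusted for body mass index, serum cholesterol, and diastolic blood pressure is either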
$$
\exp[\gamma] \tag{9.25}
$$
for the first age stratum or
$$
\exp[\gamma + \delta_j] \tag{9.26}
$$
for the other age strata. Substituting the maximum likelihood estimates of $\gamma$ and $\delta_j$ into equations (9.25) and (9.26) gives the adjusted relative risk estimates presented in Table 9.4.

Table 9.4. Age-specific relative risks of coronary heart disease (CHD) in men compared to women adjusted for body mass index, serum cholesterol, and baseline diastolic blood pressure (Levy, 1999). These risks were derived using model (9.24). They should be compared with those from Table 9.3.

| Age | Adjusted relative risk | 95% confidence interval |
|---|---|---|
| ≤45 | 4.64 | 2.3–9.5 |
| 46–55 | 2.60 | 2.0–3.4 |
| 56–60 | 1.96 | 1.5–2.5 |
| 61–80 | 1.79 | 1.6–2.0 |
| ≥81 | 1.25 | 0.73–2.1 |

305 9.3. Using Stata to perform Poisson regression

Comparing these results with those of Table 9.3 indicates that adjusting for body mass index, serum cholesterol, and diastolic blood pressure does reduce the age-specific relative risk of CHD in men versus women who are less than 56 years of age.
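The size of the adjustment can be summarized by the percent change in each relative risk between Table 9.3 and Table 9.4. The Python sketch below computes this from the tabulated values. Percent change is only one simple way to make the comparison, chosen here for illustration; it ignores the width of the confidence intervals, and the function name is an assumption.

```python
# Relative risks of CHD in men versus women, copied from Tables 9.3 and 9.4.
unadjusted = {"<=45": 5.97, "46-55": 3.03, "56-60": 2.08, "61-80": 1.73, ">=81": 1.12}
adjusted = {"<=45": 4.64, "46-55": 2.60, "56-60": 1.96, "61-80": 1.79, ">=81": 1.25}


def percent_change(before, after):
    """Percent change from the unadjusted to the adjusted relative risk."""
    return 100.0 * (after - before) / before


changes = {age: percent_change(unadjusted[age], adjusted[age]) for age in unadjusted}
for age, change in changes.items():
    print(f"{age:>6}: {unadjusted[age]:.2f} -> {adjusted[age]:.2f} ({change:+.0f}%)")
```

The two youngest strata show the largest reductions, in line with the observation above about people less than 56 years of age.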