Suppose that data on patient-years of follow-up can be logically grouped into $J$ strata based on age or other factors, and that there are $K$ exposure categories that affect morbidity or mortality in the population. For $j = 1, \ldots, J$ and $k = 1, \ldots, K$ let
$n_{jk}$ be the number of person-years of follow-up observed among patients in the $j$th stratum who are in the $k$th exposure category,
$d_{jk}$ be the number of morbid or mortal events observed in these $n_{jk}$ person-years of follow-up,
$x_{jk1}, x_{jk2}, \ldots, x_{jkq}$ be explanatory variables that describe the $k$th exposure group of patients in stratum $j$, and
$\mathbf{x}_{jk} = (x_{jk1}, x_{jk2}, \ldots, x_{jkq})$ denote the values of all of the covariates for patients in the $j$th stratum and $k$th exposure category.
Then the multiple Poisson regression model assumes that
\[
\log[E[d_{jk} \mid \mathbf{x}_{jk}]] = \log[n_{jk}] + \alpha_j + \beta_1 x_{jk1} + \beta_2 x_{jk2} + \cdots + \beta_q x_{jkq}, \tag{9.1}
\]
where $\alpha_1, \ldots, \alpha_J$ are unknown nuisance parameters, and $\beta_1, \beta_2, \ldots, \beta_q$ are unknown parameters of interest.
For example, suppose that there are $J = 5$ age strata, and that patients are classified as light or heavy drinkers and light or heavy smokers in each stratum. Then there are $K = 4$ exposure categories (two drinking categories times two smoking categories). We might choose $q = 2$ and let
\[
x_{jk1} = \begin{cases} 1: & \text{for patients who are heavy drinkers} \\ 0: & \text{for patients who are light drinkers,} \end{cases}
\qquad
x_{jk2} = \begin{cases} 1: & \text{for patients who are heavy smokers} \\ 0: & \text{for patients who are light smokers.} \end{cases}
\]
Then model (9.1) reduces to
\[
\log[E[d_{jk} \mid \mathbf{x}_{jk}]] = \log[n_{jk}] + \alpha_j + \beta_1 x_{jk1} + \beta_2 x_{jk2}. \tag{9.2}
\]
The relationship between the age strata, exposure categories, and covariates of this model is clarified in Table 9.1.
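As a quick check on how the two indicator covariates enter model (9.2), the model can be written out for each of the four exposure categories in stratum $j$. Nothing new is assumed here; the lines below simply substitute the 0/1 values of $x_{jk1}$ and $x_{jk2}$ into (9.2), with $k$ numbered as in Table 9.1:

\[
\begin{aligned}
k = 1 \text{ (light drinker, light smoker)}: \quad & \log[E[d_{j1} \mid \mathbf{x}_{j1}]] = \log[n_{j1}] + \alpha_j \\
k = 2 \text{ (light drinker, heavy smoker)}: \quad & \log[E[d_{j2} \mid \mathbf{x}_{j2}]] = \log[n_{j2}] + \alpha_j + \beta_2 \\
k = 3 \text{ (heavy drinker, light smoker)}: \quad & \log[E[d_{j3} \mid \mathbf{x}_{j3}]] = \log[n_{j3}] + \alpha_j + \beta_1 \\
k = 4 \text{ (heavy drinker, heavy smoker)}: \quad & \log[E[d_{j4} \mid \mathbf{x}_{j4}]] = \log[n_{j4}] + \alpha_j + \beta_1 + \beta_2
\end{aligned}
\]

Light drinkers who are light smokers therefore act as the reference group within each stratum.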
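Let $\lambda_{jk} = E[d_{jk}/n_{jk} \mid \mathbf{x}_{jk}]$ be the expected morbidity incidence rate for people from stratum $j$ who are in exposure category $k$. If we subtract $\log[n_{jk}]$ from both sides of model (9.1) we get
\[
\log[E[d_{jk} \mid \mathbf{x}_{jk}]/n_{jk}] = \log[E[d_{jk}/n_{jk} \mid \mathbf{x}_{jk}]] = \log[\lambda_{jk}] = \alpha_j + \beta_1 x_{jk1} + \beta_2 x_{jk2} + \cdots + \beta_q x_{jkq}. \tag{9.3}
\]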
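Table 9.1. This table shows the relationships between the age strata, the exposure categories, and the covariates of model (9.2). Here $K = 4$, $J = 5$, and $q = 2$.

| Age stratum | $k=1$: light drinker, light smoker | $k=2$: light drinker, heavy smoker | $k=3$: heavy drinker, light smoker | $k=4$: heavy drinker, heavy smoker |
|---|---|---|---|---|
| General $j$ | $x_{j11}=0,\ x_{j12}=0$ | $x_{j21}=0,\ x_{j22}=1$ | $x_{j31}=1,\ x_{j32}=0$ | $x_{j41}=1,\ x_{j42}=1$ |
| $j=1$ | $x_{111}=0,\ x_{112}=0$ | $x_{121}=0,\ x_{122}=1$ | $x_{131}=1,\ x_{132}=0$ | $x_{141}=1,\ x_{142}=1$ |
| $j=2$ | $x_{211}=0,\ x_{212}=0$ | $x_{221}=0,\ x_{222}=1$ | $x_{231}=1,\ x_{232}=0$ | $x_{241}=1,\ x_{242}=1$ |
| $j=3$ | $x_{311}=0,\ x_{312}=0$ | $x_{321}=0,\ x_{322}=1$ | $x_{331}=1,\ x_{332}=0$ | $x_{341}=1,\ x_{342}=1$ |
| $j=4$ | $x_{411}=0,\ x_{412}=0$ | $x_{421}=0,\ x_{422}=1$ | $x_{431}=1,\ x_{432}=0$ | $x_{441}=1,\ x_{442}=1$ |
| $j=5$ | $x_{511}=0,\ x_{512}=0$ | $x_{521}=0,\ x_{522}=1$ | $x_{531}=1,\ x_{532}=0$ | $x_{541}=1,\ x_{542}=1$ |
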
In other words, model (9.1) imposes a log linear relationship between the expected morbidity rates and the model covariates. Note that this model permits people in the same exposure category to have different morbidity rates in different strata. This is one of the more powerful features of Poisson regression in that it makes it easy to model incidence rates that vary with time.
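The log-linear relationship can be illustrated numerically. The Python sketch below evaluates $\lambda_{jk} = \exp[\alpha_j + \beta_1 x_{jk1} + \beta_2 x_{jk2}]$ for the five age strata and four exposure categories of the drinking and smoking example. The values in `alpha` and `beta` are made up purely for illustration and do not come from any data set, and the function names `incidence_rate` and `expected_events` are hypothetical.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
alpha = [-7.0, -6.0, -5.2, -4.6, -4.1]   # one nuisance parameter per age stratum j = 1..5
beta = [0.5, 0.9]                        # beta_1 (heavy drinking), beta_2 (heavy smoking)

# Covariates (x_jk1, x_jk2) for exposure categories k = 1..4, as in Table 9.1.
exposure = [(0, 0), (0, 1), (1, 0), (1, 1)]


def incidence_rate(j, k):
    """Expected incidence rate lambda_jk under model (9.3); j and k are 1-based."""
    x = exposure[k - 1]
    log_rate = alpha[j - 1] + sum(b * xi for b, xi in zip(beta, x))
    return math.exp(log_rate)


def expected_events(j, k, person_years):
    """E[d_jk | x_jk] = n_jk * lambda_jk, i.e. model (9.1) on the natural scale."""
    return person_years * incidence_rate(j, k)


# Print the rate for every stratum (rows) and exposure category (columns).
for j in range(1, 6):
    rates = [incidence_rate(j, k) for k in range(1, 5)]
    print(j, ["%.5f" % r for r in rates])
```

With these made-up values, the rate for a given exposure category differs from stratum to stratum because $\alpha_j$ differs, which is the flexibility described above.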
Suppose that two groups of patients from the $j$th stratum have been subject to exposure categories $f$ and $g$. Then the relative risk of an event for patients in category $f$ compared to category $g$ is $\lambda_{jf}/\lambda_{jg}$. Equation (9.3) gives us that
\[
\log[\lambda_{jf}] = \alpha_j + x_{jf1}\beta_1 + x_{jf2}\beta_2 + \cdots + x_{jfq}\beta_q, \tag{9.4}
\]
and
\[
\log[\lambda_{jg}] = \alpha_j + x_{jg1}\beta_1 + x_{jg2}\beta_2 + \cdots + x_{jgq}\beta_q. \tag{9.5}
\]
Subtracting equation (9.5) from equation (9.4) gives that the within-stratum log relative risk of group $f$ subjects relative to group $g$ subjects is
\[
\log[\lambda_{jf}/\lambda_{jg}] = (x_{jf1} - x_{jg1})\beta_1 + (x_{jf2} - x_{jg2})\beta_2 + \cdots + (x_{jfq} - x_{jgq})\beta_q. \tag{9.6}
\]
Thus, we can estimate log relative risks in Poisson regression models in precisely the same way that we estimated log odds ratios in logistic regression. Indeed, the only difference is that in logistic regression weighted sums of model coefficients are interpreted as log odds ratios while in Poisson regression they are interpreted as log relative risks. An important feature of equation (9.6) is that the relative risk $\lambda_{jf}/\lambda_{jg}$ may vary between the different strata.
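Equation (9.6) is a weighted sum of the coefficients, with weights equal to the covariate differences between the two groups. A minimal Python sketch of that calculation follows; the coefficient values are hypothetical and the helper name `log_relative_risk` is introduced only for this illustration.

```python
import math


def log_relative_risk(x_f, x_g, beta):
    """Within-stratum log relative risk of group f versus group g, equation (9.6)."""
    return sum((xf - xg) * b for xf, xg, b in zip(x_f, x_g, beta))


# Hypothetical coefficients for model (9.2): beta_1 (heavy drinking), beta_2 (heavy smoking).
beta = [0.5, 0.9]

# Heavy drinkers who are heavy smokers (1, 1) versus light drinkers who are light smokers (0, 0).
log_rr = log_relative_risk((1, 1), (0, 0), beta)
print(math.exp(log_rr))   # relative risk exp[beta_1 + beta_2]
```

The nuisance parameter $\alpha_j$ does not appear in the function because it cancels when (9.5) is subtracted from (9.4).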
The nuisance parameters $\alpha_1, \alpha_2, \ldots, \alpha_J$ are handled in the same way that we handle any parameters associated with a categorical variable. That is, for any two values of $j$ and $h$ between 1 and $J$ we let
\[
\mathit{strata}_{jh} = \begin{cases} 1: & \text{if } j = h \\ 0: & \text{otherwise.} \end{cases}
\]
Then model (9.1) can be rewritten as
\[
\log[E[d_{jk} \mid \mathbf{x}_{jk}]] = \log[n_{jk}] + \sum_{h=1}^{J} \alpha_h \times \mathit{strata}_{jh} + \beta_1 x_{jk1} + \beta_2 x_{jk2} + \cdots + \beta_q x_{jkq}. \tag{9.7}
\]
Models (9.1) and (9.7) are algebraically identical. We usually write the simpler form (9.1) when the strata are defined by confounding variables that are
not of primary interest. However, in Stata these models are always specified in a way that is analogous to equation (9.7).
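The equivalence of forms (9.1) and (9.7) can be checked numerically. The sketch below is written in Python rather than Stata and is not meant to reproduce Stata's syntax; it builds the indicator covariates for one stratum and evaluates the right-hand side of the model both ways. All parameter values and helper names are hypothetical.

```python
import math

J = 5
alpha = [-7.0, -6.0, -5.2, -4.6, -4.1]   # hypothetical alpha_1..alpha_J
beta = [0.5, 0.9]                        # hypothetical beta_1, beta_2


def strata_indicators(j):
    """Return [strata_j1, ..., strata_jJ]: 1 where h equals j, 0 elsewhere (j is 1-based)."""
    return [1 if h == j else 0 for h in range(1, J + 1)]


def log_expected_9_1(j, x, person_years):
    """Right-hand side of model (9.1)."""
    return math.log(person_years) + alpha[j - 1] + sum(b * xi for b, xi in zip(beta, x))


def log_expected_9_7(j, x, person_years):
    """Right-hand side of model (9.7), with the strata written as indicator covariates."""
    strata_term = sum(a * s for a, s in zip(alpha, strata_indicators(j)))
    return math.log(person_years) + strata_term + sum(b * xi for b, xi in zip(beta, x))


print(strata_indicators(3))                    # [0, 0, 1, 0, 0]
print(log_expected_9_1(3, (1, 0), 2500.0))
print(log_expected_9_7(3, (1, 0), 2500.0))
```

Only the indicator for the patient's own stratum is non-zero, so the sum in (9.7) picks out the single term $\alpha_j$ and the two printed values agree.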
We derive maximum likelihood estimates $\hat{\alpha}_j$ for $\alpha_j$ and $\hat{\beta}_1, \ldots, \hat{\beta}_q$ for $\beta_1, \ldots, \beta_q$. Inferences about these parameter estimates are made in the same way as in Sections 5.13 through 5.15. Again the only difference is that in Section 5.14 weighted sums of parameter estimates were interpreted as estimates of log odds ratios, while here they are interpreted as log relative risks. Suppose that $f$ is a weighted sum of parameters that corresponds to a log relative risk of interest, $\hat{f}$ is the corresponding weighted sum of parameter estimates from a large study, and $s_f$ is the estimated standard error of $\hat{f}$. Then under the null hypothesis that the relative risk $\exp[f] = 1$, the test statistic
\[
z = \hat{f}/s_f \tag{9.8}
\]
will have an approximately standard normal distribution if the sample size is large. A 95% confidence interval for this relative risk is given by
\[
(\exp[\hat{f} - 1.96 s_f],\ \exp[\hat{f} + 1.96 s_f]). \tag{9.9}
\]
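Equations (9.8) and (9.9) are simple to evaluate once $\hat{f}$ and $s_f$ are available. The Python sketch below does so for a made-up estimate and standard error. The function name `wald_test_and_ci` is hypothetical, and the two-sided P value is an addition that follows from the standard normal approximation stated above rather than a formula given in the text.

```python
import math


def wald_test_and_ci(f_hat, s_f):
    """z statistic (9.8), two-sided P value, and 95% confidence interval (9.9) for exp[f]."""
    z = f_hat / s_f
    # Two-sided P value under the standard normal approximation: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    lower = math.exp(f_hat - 1.96 * s_f)
    upper = math.exp(f_hat + 1.96 * s_f)
    return z, p_value, (lower, upper)


# Hypothetical estimate of a log relative risk and its standard error.
f_hat, s_f = 1.4, 0.35
z, p_value, (lower, upper) = wald_test_and_ci(f_hat, s_f)
print("relative risk = %.2f" % math.exp(f_hat))
print("z = %.2f, P = %.5f" % (z, p_value))
print("95%% CI = (%.2f, %.2f)" % (lower, upper))
```

The interval is computed on the log scale and then exponentiated, so it is not symmetric about the estimated relative risk.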