Example: The Ibuprofen in Sepsis Trial

Excerpt from Statistical modeling for medical researcher (pages 143 - 146)

The Ibuprofen in Sepsis Trial contained 454 patients with known baseline APACHE scores. The 30-day mortality data for these patients are summarized in Table 4.1.

Table 4.1. Survival data from the Ibuprofen in Sepsis Study. The number of patients enrolled with the indicated baseline APACHE score is given together with the number of subjects who died within 30 days of entry into the study (Bernard et al., 1997).

Baseline                            Baseline
APACHE    Number of   Number of     APACHE    Number of   Number of
score     patients    deaths        score     patients    deaths

 0         1           0            20        13           6
 2         1           0            21        17           9
 3         4           1            22        14          12
 4        11           0            23        13           7
 5         9           3            24        11           8
 6        14           3            25        12           8
 7        12           4            26         6           2
 8        22           5            27         7           5
 9        33           3            28         3           1
10        19           6            29         7           4
11        31           5            30         5           4
12        17           5            31         3           3
13        32          13            32         3           3
14        25           7            33         1           1
15        18           7            34         1           1
16        24           8            35         1           1
17        27           8            36         1           1
18        19          13            37         1           1
19        15           7            41         1           0

Let $x_i$ denote the distinct APACHE scores observed in this study. Let $m_i$ be the number of patients with baseline score $x_i$ and let $d_i$ be the number of patients with this score who died within 30 days. Applying the logistic regression model (4.19) to these data yields parameter estimates $\hat{\alpha} = -2.290327$ and $\hat{\beta} = 0.115627$. The estimated variances of $\hat{\alpha}$ and $\hat{\beta}$ are $s^2_{\hat{\alpha}} = 0.076468$ and $s^2_{\hat{\beta}} = 0.000256$, respectively. The estimated covariance between $\hat{\alpha}$ and $\hat{\beta}$ is $s_{\hat{\alpha}\hat{\beta}} = -0.004103$. Substituting these values into equations (4.20) through (4.25) provides estimates of the probability of death given a baseline score of $x$, 95% confidence intervals for these estimates, and 95% confidence intervals for the observed proportion of deaths at any given score. For example, patients with a baseline score of 20 will have a linear predictor of $\hat{\alpha} + \hat{\beta} \times 20 = 0.0222$. Substituting this value into equation (4.20) gives $\hat{\pi}[20] = \exp[0.0222]/(1 + \exp[0.0222]) = 0.506$. Equation (4.21) gives us that

$$\mathrm{se}[\hat{\alpha} + \hat{\beta} \times 20] = \sqrt{0.076468 - 2 \times 20 \times 0.004103 + 20^2 \times 0.000256} = 0.1214.$$


Substituting into equations (4.22) and (4.23) gives

$$\hat{\pi}_L[20] = \frac{\exp[0.0222 - 1.96 \times 0.1214]}{1 + \exp[0.0222 - 1.96 \times 0.1214]} = 0.446$$

and

$$\hat{\pi}_U[20] = \frac{\exp[0.0222 + 1.96 \times 0.1214]}{1 + \exp[0.0222 + 1.96 \times 0.1214]} = 0.565.$$

Hence a 95% confidence interval for $\pi[20]$ is (0.446, 0.565).
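The arithmetic for a baseline score of 20 can be checked with a few lines of Python. This is a minimal sketch that takes the estimates, variances and covariance quoted in the text as given; the function names `logistic` and `predicted_mortality` are illustrative choices, not part of the original text.

```python
import math

# Estimates and (co)variances as quoted in the text.
alpha_hat = -2.290327
beta_hat = 0.115627
var_alpha = 0.076468
var_beta = 0.000256
cov_alpha_beta = -0.004103


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return math.exp(z) / (1.0 + math.exp(z))


def predicted_mortality(x, z_crit=1.96):
    """Return (pi_hat, se, lower, upper) for baseline score x.

    se is the standard error of the linear predictor alpha + beta * x;
    the interval is built on the logit scale and then back-transformed.
    """
    lp = alpha_hat + beta_hat * x
    se = math.sqrt(var_alpha + 2.0 * x * cov_alpha_beta + x ** 2 * var_beta)
    return (logistic(lp), se,
            logistic(lp - z_crit * se), logistic(lp + z_crit * se))


pi_hat, se, lower, upper = predicted_mortality(20)
print(f"pi_hat[20] = {pi_hat:.3f}, se = {se:.4f}, "
      f"95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the covariance is negative, the cross term reduces the variance of the linear predictor, which is why it enters the hand calculation with a minus sign.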

There are $m = 13$ patients with an APACHE score of 20; $d = 6$ of these subjects died. Hence the observed proportions of dying and surviving patients with this score are $p = 6/13$ and $q = 7/13$, respectively. Substituting these values into equations (4.24) and (4.25) gives

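$$P_L = \frac{2 \times 13 \times \frac{6}{13} + 1.96^2 - 1 - 1.96\sqrt{1.96^2 - \left(2 + \frac{1}{13}\right) + 4 \times \frac{6}{13}\left(13 \times \frac{7}{13} + 1\right)}}{2\,(13 + 1.96^2)} = 0.204$$

and

$$P_U = \frac{2 \times 13 \times \frac{6}{13} + 1.96^2 + 1 + 1.96\sqrt{1.96^2 + \left(2 - \frac{1}{13}\right) + 4 \times \frac{6}{13}\left(13 \times \frac{7}{13} - 1\right)}}{2\,(13 + 1.96^2)} = 0.739$$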

as the bounds of the 95% confidence interval for the true proportion of deaths among people with an APACHE score of 20. Note that the difference between $(\hat{\pi}_L[20], \hat{\pi}_U[20])$ and $(P_L, P_U)$ is that the former is based on all 454 patients and the logistic regression model, while the latter is based solely on the 13 patients with APACHE scores of 20.
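The interval for the observed proportion can be reproduced in the same way. The sketch below codes the expressions as they appear in the worked calculation for equations (4.24) and (4.25), which have the form of a Wilson score interval with continuity correction; the function name `observed_proportion_interval` is a hypothetical label chosen for this example.

```python
import math


def observed_proportion_interval(d, m, z=1.96):
    """Interval for a true proportion given d events among m subjects.

    Follows the pattern of the worked example for a score of 20.
    The text does not report intervals when d == 0 or d == m.
    """
    p = d / m          # observed proportion of deaths
    q = 1.0 - p        # observed proportion of survivors
    denom = 2.0 * (m + z ** 2)
    lower = (2 * m * p + z ** 2 - 1
             - z * math.sqrt(z ** 2 - (2 + 1 / m) + 4 * p * (m * q + 1))) / denom
    upper = (2 * m * p + z ** 2 + 1
             + z * math.sqrt(z ** 2 + (2 - 1 / m) + 4 * p * (m * q - 1))) / denom
    return lower, upper


p_lower, p_upper = observed_proportion_interval(d=6, m=13)
print(f"95% CI for the observed proportion: ({p_lower:.3f}, {p_upper:.3f})")
```

With only 13 patients the interval is much wider than the model-based interval (0.446, 0.565), which draws on all 454 patients.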

We can perform calculations similar to those given above to generate Figure 4.6. The black dots in this figure give the observed mortality for each APACHE score. The error bars give 95% confidence intervals for the observed mortality at each score using equations (4.24) and (4.25). Confidence intervals for scores associated with 100% survival or 100% mortality are not given. The solid gray line gives the logistic regression curve using equation (4.20). This curve depicts the expected 30-day mortality as a function of the baseline APACHE score. The dashed lines give the 95% confidence intervals for the regression curve using equations (4.22) and (4.23). These intervals are analogous to the confidence intervals for the linear regression line described in Section 2.10. Note that the expected mortality curve lies within all of the 95% confidence intervals for the observed proportion of deaths at each score. This indicates that the logistic regression model fits the data quite well.

Figure 4.6 Observed and expected 30-day mortality by APACHE score in patients from the Ibuprofen in Sepsis Study (Bernard et al., 1997). The solid gray line gives the expected mortality based on a logistic regression model. The dashed lines give the 95% confidence region for this regression line. The black dots give the observed mortality for each APACHE score. The error bars give 95% confidence intervals for the observed mortality at each score. (Horizontal axis: APACHE score at baseline, 0 to 40. Vertical axes: observed and expected mortality rate, 0 to 1.0.)

The width of the 95% confidence intervals for the regression curve depends on the number of patients studied, on the distance along the x-axis from the central mass of observations, and on the proximity of the regression curve to either zero or one. Figure 4.7 shows a histogram of the number of study subjects by baseline APACHE score. This figure shows that the distribution of scores is skewed. The interquartile range is 10–20.
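The quoted interquartile range can be checked directly from the patient counts in Table 4.1. The following sketch expands the grouped counts into one score per patient and uses the standard library's `statistics.quantiles` with its default method, which is an assumption here since the text does not say which quartile convention was used. The dictionary name `patients_by_score` is illustrative.

```python
import statistics

# Number of patients at each baseline APACHE score, from Table 4.1.
patients_by_score = {
    0: 1, 2: 1, 3: 4, 4: 11, 5: 9, 6: 14, 7: 12, 8: 22, 9: 33, 10: 19,
    11: 31, 12: 17, 13: 32, 14: 25, 15: 18, 16: 24, 17: 27, 18: 19, 19: 15,
    20: 13, 21: 17, 22: 14, 23: 13, 24: 11, 25: 12, 26: 6, 27: 7, 28: 3,
    29: 7, 30: 5, 31: 3, 32: 3, 33: 1, 34: 1, 35: 1, 36: 1, 37: 1, 41: 1,
}

# One entry per patient, in increasing order of score.
scores = [s for s, m in sorted(patients_by_score.items()) for _ in range(m)]

q1, median, q3 = statistics.quantiles(scores, n=4)
print(f"n = {len(scores)}, Q1 = {q1}, median = {median}, Q3 = {q3}, "
      f"range = {min(scores)} to {max(scores)}")
```

The upper tail extends much farther above the median than the lower tail extends below it, which is consistent with the skewness noted for Figure 4.7.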

Note that the width of the confidence interval at a score of 30 in Figure 4.6 is considerably greater than it is at a score of 15. This reflects the fact that 30 is further from the central mass of observations than is 15, and that as a consequence the accuracy of our estimate of $\pi[30]$ is less than that of $\pi[15]$. For very large values of the $x$ variable, however, we know that $\pi[x]$ converges to one. Hence, $\hat{\pi}_L[x]$ and $\hat{\pi}_U[x]$ must also converge to one. In Figure 4.6 the confidence intervals have started to narrow for this reason for scores of 40 or more. We can think of the 95% confidence intervals for the regression line as defining the region that most likely contains the true regression line given that the logistic regression model is, in fact, correct.
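The behavior of the confidence band can be explored numerically. The sketch below refits the model to the grouped counts in Table 4.1 and then evaluates the width of the 95% interval for the regression curve at a few scores. It uses a plain Newton-Raphson iteration written out by hand, which is an assumption made for this illustration and not necessarily the algorithm used to obtain the estimates quoted in the text; all function names are hypothetical.

```python
import math

# (score, patients, deaths) triples transcribed from Table 4.1.
TABLE_4_1 = [
    (0, 1, 0), (2, 1, 0), (3, 4, 1), (4, 11, 0), (5, 9, 3), (6, 14, 3),
    (7, 12, 4), (8, 22, 5), (9, 33, 3), (10, 19, 6), (11, 31, 5), (12, 17, 5),
    (13, 32, 13), (14, 25, 7), (15, 18, 7), (16, 24, 8), (17, 27, 8),
    (18, 19, 13), (19, 15, 7), (20, 13, 6), (21, 17, 9), (22, 14, 12),
    (23, 13, 7), (24, 11, 8), (25, 12, 8), (26, 6, 2), (27, 7, 5), (28, 3, 1),
    (29, 7, 4), (30, 5, 4), (31, 3, 3), (32, 3, 3), (33, 1, 1), (34, 1, 1),
    (35, 1, 1), (36, 1, 1), (37, 1, 1), (41, 1, 0),
]


def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def score_and_information(alpha, beta, rows):
    """Gradient of the log likelihood and the 2x2 information matrix."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for x, m, d in rows:
        pi = logistic(alpha + beta * x)
        resid = d - m * pi           # observed minus expected deaths
        w = m * pi * (1.0 - pi)      # binomial variance weight
        g0 += resid
        g1 += x * resid
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (g0, g1), (i00, i01, i11)


def invert_2x2(i00, i01, i11):
    """Inverse of a symmetric 2x2 matrix, returned as (v00, v01, v11)."""
    det = i00 * i11 - i01 * i01
    return i11 / det, -i01 / det, i00 / det


def fit_grouped_logistic(rows, iterations=25):
    """Return (alpha, beta, var_alpha, var_beta, cov_alpha_beta)."""
    alpha = beta = 0.0
    for _ in range(iterations):
        (g0, g1), info = score_and_information(alpha, beta, rows)
        v00, v01, v11 = invert_2x2(*info)
        alpha += v00 * g0 + v01 * g1
        beta += v01 * g0 + v11 * g1
    # Covariance matrix: inverse information at the final estimates.
    _, info = score_and_information(alpha, beta, rows)
    var_a, cov_ab, var_b = invert_2x2(*info)
    return alpha, beta, var_a, var_b, cov_ab


def ci_width(x, est, z=1.96):
    """Width of the 95% interval for pi[x] on the probability scale."""
    alpha, beta, var_a, var_b, cov_ab = est
    lp = alpha + beta * x
    se = math.sqrt(var_a + 2.0 * x * cov_ab + x * x * var_b)
    return logistic(lp + z * se) - logistic(lp - z * se)


est = fit_grouped_logistic(TABLE_4_1)
widths = {x: ci_width(x, est) for x in (15, 30, 41)}
print(f"alpha = {est[0]:.4f}, beta = {est[1]:.4f}")
for x, w in widths.items():
    print(f"score {x}: interval width = {w:.3f}")
```

Comparing the widths at scores of 15, 30 and 41 shows the two effects described above: the band widens away from the bulk of the data and narrows again as the fitted curve approaches one.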
