The following log file and comments illustrate how to use Stata to perform the calculations from the preceding section.
128 4. Simple logistic regression
Figure 4.7 Histogram of baseline APACHE scores for patients from the Ibuprofen in Sepsis Study (x-axis: APACHE Score at Baseline; y-axis: Number of Patients). This distribution is skewed to the right, with few patients having scores greater than 30. The estimate of the expected mortality rate in Figure 4.6 is most accurate over the range of scores that were most common in this study.
. * 4.18.Sepsis.log
. *
. * Simple logistic regression of mortality against APACHE score
. * in the Ibuprofen in Sepsis Study. Each record of 4.18.Sepsis.dta
. * gives the number of patients and number of deaths among subjects
. * with a specified APACHE score. These variables are named patients
. * deaths and apache, respectively.
. *
. use C:\WDDtext\4.18.Sepsis.dta, clear
. *
. * Calculate 95% confidence intervals for observed mortality rates
. *
. generate p = deaths/patients
. generate m = patients
. generate q = 1-p
. generate c = 1.96
. generate ci95lb = (2*m*p+c^2-1-c*sqrt(c^2-(2+1/m)+4*p*(m*q+1)))/(2*(m+c^2))   {1}
> if d ~= 0 & d ~= m
(11 missing values generated)
. generate ci95ub = (2*m*p+c^2+1+c*sqrt(c^2+(2-1/m)+4*p*(m*q-1)))/(2*(m+c^2))   {2}
> if d ~= 0 & d ~= m
129 4.18. Logistic regression with grouped data using Stata
(11 missing values generated)
. *
. * Regress deaths against APACHE score
. *
. glm deaths apache, family(binomial patients) link(logit)   {3}
{output omitted}
Generalized linear models                     No. of obs = 38
{output omitted}
Variance function: V(u) = u*(1-u/patients)    [Binomial]
Link function    : g(u) = ln(u/(patients-u))  [Logit]
{output omitted}
------------------------------------------------------------------------
  deaths |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
  apache |  .1156272   .0159997    7.227   0.000   .0842684    .146986   {4}
   _cons | -2.290327   .2765283   -8.282   0.000  -2.832313  -1.748342
------------------------------------------------------------------------
. vce   {5}
         |    apache      _cons
---------+---------------------
  apache |   .000256
   _cons |  -.004103    .076468
. predict logodds, xb   {6}
. generate e_prob = exp(logodds)/(1+exp(logodds))   {7}
. label variable e_prob "Expected Mortality Rate"
. *
. * Calculate 95% confidence region for e_prob
. *
. predict stderr, stdp   {8}
. generate lodds_lb = logodds - 1.96*stderr
. generate lodds_ub = logodds + 1.96*stderr
. generate prob_lb = exp(lodds_lb)/(1+exp(lodds_lb))   {9}
. generate prob_ub = exp(lodds_ub)/(1+exp(lodds_ub))
. label variable p "Observed Mortality Rate"
. set textsize 120
. list p e_prob prob_lb prob_ub ci95lb ci95ub apache if apache == 20   {10}
             p     e_prob    prob_lb    prob_ub     ci95lb    ci95ub   apache
 20.  .4615385    .505554   .4462291    .564723   .2040146     .7388       20
. graph p e_prob prob_lb prob_ub ci95lb ci95ub apache, symbol(Oiiiii)   {11}
> connect(.ll[-#]l[-#]II) gap(3) xlabel(0 5 to 40) ylabel(0 .2 to 1.0)
> ytick(.1 .2 to .9) r1title(Expected Mortality Rate) rlabel(0,.2,.4,.6,.8,1.0)
{Graph omitted. See Figure 4.6}
. graph apache [freq=patients], bin(42) freq gap(3) l1title(Number of Patients)
> xlabel(0, 5 to 40) ylabel(0, 5 to 30)   {12}
{Graph omitted. See Figure 4.7}
Comments
1 Calculate the lower bound for a 95% confidence interval for the observed proportion of deaths using equation (4.24). This bound is not calculated if all or none of the patients survive.
2 Calculate the corresponding upper bound using equation (4.25).
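The two generate commands for ci95lb and ci95ub can be checked by hand for a single record. The following is a minimal Python sketch (not part of the Stata log) that transcribes those two expressions. The function name ci95_bounds is hypothetical, and the counts of 6 deaths among 13 patients are an inference from the listed value p = .4615385 for apache == 20, not figures printed in the log.

```python
import math

def ci95_bounds(deaths, patients, c=1.96):
    """Transcribe the ci95lb and ci95ub expressions from the log.

    Returns None when deaths is 0 or equals patients, mirroring the
    `if d ~= 0 & d ~= m` qualifier on the two generate commands.
    """
    if deaths == 0 or deaths == patients:
        return None
    m = patients
    p = deaths / m
    q = 1 - p
    denom = 2 * (m + c**2)
    lower = (2*m*p + c**2 - 1
             - c * math.sqrt(c**2 - (2 + 1/m) + 4*p*(m*q + 1))) / denom
    upper = (2*m*p + c**2 + 1
             + c * math.sqrt(c**2 + (2 - 1/m) + 4*p*(m*q - 1))) / denom
    return lower, upper

# Assumed counts for the apache == 20 record: 6 deaths among 13 patients.
lb, ub = ci95_bounds(6, 13)
print(lb, ub)
```

Under that assumption the two values agree with the ci95lb and ci95ub columns listed for apache == 20 to about four decimal places.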
3 The family and link options of this glm command specify a binomial random component and a logit link function. The family(binomial patients) option indicates that each observation describes the outcomes of multiple patients with the same value of apache; patients records the number of subjects with each apache value; deaths records the number of deaths observed among these subjects. In other words, we are fitting a logistic regression model using equation (4.20) with $d_i$ = deaths, $m_i$ = patients and $x_i$ = apache.
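To make the grouped-data model concrete, here is a minimal Python sketch that fits a two-parameter logistic regression to grouped counts by Newton-Raphson. It assumes the usual grouped form, in which $d_i$ is binomial with $m_i$ trials and success probability $\pi_i$ satisfying $\mathrm{logit}(\pi_i) = \alpha + \beta x_i$. It is not a description of how glm works internally. The function name and the toy data are hypothetical; the data are not the Ibuprofen in Sepsis Study records.

```python
import math

def fit_grouped_logistic(x, d, m, iterations=25):
    """Maximize the grouped binomial log likelihood by Newton-Raphson.

    x[i] is the covariate, d[i] the deaths and m[i] the patients in
    group i. Returns the estimates (alpha, beta).
    """
    alpha, beta = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g_a, g_b) and information matrix entries.
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for xi, di, mi in zip(x, d, m):
            pi = 1 / (1 + math.exp(-(alpha + beta * xi)))
            w = mi * pi * (1 - pi)
            g_a += di - mi * pi
            g_b += (di - mi * pi) * xi
            i_aa += w
            i_ab += w * xi
            i_bb += w * xi * xi
        # Newton step: solve the 2x2 system information * step = score.
        det = i_aa * i_bb - i_ab * i_ab
        alpha += (i_bb * g_a - i_ab * g_b) / det
        beta += (i_aa * g_b - i_ab * g_a) / det
    return alpha, beta

# Hypothetical toy data, NOT the study data.
toy_apache = [5, 10, 15, 20, 25, 30]
toy_patients = [20, 30, 25, 15, 10, 5]
toy_deaths = [2, 6, 9, 8, 7, 4]
alpha_hat, beta_hat = fit_grouped_logistic(toy_apache, toy_deaths, toy_patients)
print(alpha_hat, beta_hat)
```

Each group contributes to the fit in proportion to its number of patients, which is why the glm command needs patients as well as deaths.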
4 The estimated regression coefficients for apache and the constant term _cons are $\hat{\beta} = 0.115\,627$ and $\hat{\alpha} = -2.290\,327$, respectively.
5 The vce command prints the variance–covariance matrix for the estimated regression coefficients. This is a triangular array of numbers that gives estimates of the variance of each coefficient estimate and the covariance of each pair of coefficient estimates. In this example there are only two coefficients. The estimated variances of the apache and _cons coefficient estimates are $s_{\hat{\beta}}^2 = 0.000\,256$ and $s_{\hat{\alpha}}^2 = 0.076\,468$, respectively. The covariance between these estimates is $s_{\hat{\alpha}\hat{\beta}} = -0.004\,103$. Note that the square roots of $s_{\hat{\beta}}^2$ and $s_{\hat{\alpha}}^2$ equal 0.0160 and 0.2765, which are the standard errors of $\hat{\beta}$ and $\hat{\alpha}$ given in the output from the glm command.
We do not usually need to output the variance–covariance matrix in Stata because the predict and lincom post-estimation commands can usually derive all of the statistics that are of interest. We output these terms here in order to corroborate the hand calculations performed in Section 4.17. We will introduce the lincom command in Section 5.20.
The variance–covariance matrix is further discussed in Section 5.17.
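The link between the vce output and the glm coefficient table can be verified with a few lines of arithmetic. This Python sketch (an independent check, not Stata code) takes the two variances printed by vce and compares their square roots with the Std. Err. and z columns; the variable names are illustrative.

```python
import math

# Diagonal entries of the vce output
var_beta = 0.000256    # apache
var_alpha = 0.076468   # _cons

# Coefficients from the glm output
beta_hat = 0.1156272
alpha_hat = -2.290327

se_beta = math.sqrt(var_beta)
se_alpha = math.sqrt(var_alpha)

# z statistics are the coefficients divided by their standard errors
z_beta = beta_hat / se_beta
z_alpha = alpha_hat / se_alpha

print(se_beta, se_alpha)
print(z_beta, z_alpha)
```

Because vce prints the variances to six decimal places, the results match the glm table only up to rounding.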
6 This command sets logodds equal to the linear predictor $\hat{\alpha} + \hat{\beta} \times$ apache.
7 The variable e_prob equals $\hat{\pi}$[apache], the estimated probability of death for patients with the indicated APACHE score.
8 The stdp option of this predict command sets stderr equal to the standard error of the linear predictor.
9 This generate command defines prob_lb to equal $\hat{\pi}_L[x]$ as defined by equation (4.22). The next command sets prob_ub to equal $\hat{\pi}_U[x]$.
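Comments 6 through 9 can be traced numerically for a single APACHE score. The Python sketch below (an independent check, not Stata code) recomputes logodds, e_prob, stderr, prob_lb and prob_ub at apache = 20 from the printed coefficients and vce entries. It assumes the standard variance of a linear combination, $s_{\hat{\alpha}}^2 + 2x\,s_{\hat{\alpha}\hat{\beta}} + x^2 s_{\hat{\beta}}^2$, for the linear predictor; the helper name inv_logit is hypothetical.

```python
import math

def inv_logit(t):
    """Convert a log odds value to a probability."""
    return math.exp(t) / (1 + math.exp(t))

# Coefficients from the glm output and entries from the vce output
alpha_hat, beta_hat = -2.290327, 0.1156272
var_alpha, var_beta, cov_ab = 0.076468, 0.000256, -0.004103

x = 20
logodds = alpha_hat + beta_hat * x          # predict logodds, xb
e_prob = inv_logit(logodds)                 # generate e_prob

# Standard error of the linear predictor (what stdp reports),
# using the variance of a linear combination of alpha and beta.
stderr = math.sqrt(var_alpha + 2 * x * cov_ab + x**2 * var_beta)

prob_lb = inv_logit(logodds - 1.96 * stderr)
prob_ub = inv_logit(logodds + 1.96 * stderr)
print(e_prob, prob_lb, prob_ub)
```

The results agree with the e_prob, prob_lb and prob_ub values listed for apache == 20, up to the rounding of the printed vce entries.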
10 This list command outputs the values of variables calculated above for the record with an APACHE score of 20. Note that these values agree with the estimates that we calculated by hand in Section 4.17.
11 The graph produced by this command is similar to Figure 4.6. The two “I”s in the connect option specify that the last two y-variables, ci95lb and ci95ub, are to be connected by error bars. The 95% confidence region for e_prob = $\hat{\pi}$[apache] is drawn with dashed lines that are specified by the “l[-#]” code in the connect option.
12 This histogram is similar to Figure 4.7. The freq option specifies that the y-axis will be the number of subjects; l1title specifies the title of the y-axis.