Table 14.15 Asymptotic Partial Likelihood Inference from the Regression Models with Different Link Functions for the Data in Example 14.9

                                                                                          95% Confidence Interval
                                  Regression    Standard   Chi-Square             Odds    for Odds Ratio
Variable                          Coefficient   Error      Statistic    p         Ratio   Lower     Upper

Model with Logit Link Function
INTERCPT                          -8.419        1.792      22.061       0.0001
Hosmer-Lemeshow test statistic                             18.9460      0.0152

Model with Inverse Normal Link Function
INTERCPT                          -4.532        0.953      22.597       0.0001
Hosmer-Lemeshow test statistic                             7.386        0.4956

Model with Log-Log Link Function
INTERCPT                          -7.740        1.530      25.589       0.0001
Hosmer-Lemeshow test statistic                             17.415       0.0261
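As a quick check on how the columns of Table 14.15 relate to one another, the chi-square statistic reported for a coefficient appears to be the Wald statistic, that is, the squared ratio of the coefficient to its standard error. The following Python sketch is not part of the SAS, SPSS, or BMDP codes; the function names are illustrative, the intercepts are read as -8.419, -4.532, and -7.740, and the p value assumes a chi-square reference distribution with 1 degree of freedom.

```python
import math

def wald_chi_square(coefficient, standard_error):
    """Wald statistic: squared ratio of an estimate to its standard error."""
    return (coefficient / standard_error) ** 2

def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Intercept estimates and standard errors as read from Table 14.15
# (the negative signs are an assumption about the printed values).
intercepts = {
    "logit": (-8.419, 1.792),
    "inverse normal": (-4.532, 0.953),
    "log-log": (-7.740, 1.530),
}

for link, (coefficient, standard_error) in intercepts.items():
    statistic = wald_chi_square(coefficient, standard_error)
    p_value = chi_square_1df_p_value(statistic)
    print(f"{link}: chi-square = {statistic:.2f}, p = {p_value:.6f}")
```

The recomputed statistics agree with the tabulated values up to the rounding of the printed coefficients and standard errors.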
Example 14.10 Consider the data in Example 14.9 as nonstratified data,
(14.2.28), and (14.2.30) by using the stepwise selection method. Based on
Hosmer-Lemeshow test statistics, the regression model with the inverse
DBP, LACR, and LINSUL as significant covariates for the development of diabetes.
The following SAS, SPSS, and BMDP codes may be used to generate the results in Table 14.15.
SAS code:
data w1;
  infile 'c:\ex14d2d6.dat' missover;
  input age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn;
run;
title "Regression model with the logit link function-generalized logistic regression";
proc logistic data = w1 descending;
  model dm = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = logit;
run;
title "Regression model with the inverse normal link function";
proc logistic data = w1 descending;
  model dm = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = probit;
run;
title "Regression model with the log-log link function";
proc logistic data = w1 descending;
  model dm = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = cloglog;
run;
SPSS code:
data list file = 'c:\ex14d2d6.dat' free
  / age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
logistic regression dm with age sex sbp dbp lacr hdl linsul smoke htn
  /method = fstep
  /print = all.
BMDP code:
/input file = 'c:\ex14d2d6.dat'
The regression models in Section 14.2 can be extended to handle outcomes that have more than two categories. These categories may be nominal, for example, different types of heart disease or psychological conditions; or ordinal, for example, different levels of glucose intolerance or different severity of communication disorders. An outcome variable with more than two possibilities is called
polychotomous or polytomous. In this section we discuss first the model for
nominal polychotomous outcomes (generalized logistic regression model), then
(1990), Collett (1991), and Ananth and Kleinbaum (1997).
Generalized Logistic Regression Models
Let $Y_i$ denote the outcome for individual $i$. The outcome can be one of $m$ nominal categories, such as different cell types of lung cancer. Let $Y_i = k$ denote that $Y_i$ belongs to the $k$th category, $k = 1, 2, \dots, m$. Suppose that for each of $n$ subjects, $p$ independent variables $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ are measured. These variables can be either qualitative or quantitative. Let $P(Y_i = k \mid x_i)$ be the probability that $Y_i = k$ given the $p$ measured covariates $x_i$; then $\sum_{k=1}^{m} P(Y_i = k \mid x_i) = 1$. Without loss of generality, using the last category as the reference, the generalized logistic regression model

$$\log \frac{P(Y_i = k \mid x_i)}{P(Y_i = m \mid x_i)} = a_k + \sum_{j=1}^{p} b_{kj} x_{ij}, \qquad k = 1, 2, \dots, m-1 \tag{14.3.1}$$

can be used to study the association of the covariates $x$ with the outcome. To simplify the notation, let $u_{ki} = a_k + \sum_{j=1}^{p} b_{kj} x_{ij}$. Similar to (14.2.1) and (14.2.2), the probability of being in the $k$th category is

$$P(Y_i = k \mid x_i) = \frac{\exp(u_{ki})}{1 + \sum_{l=1}^{m-1} \exp(u_{li})}, \quad k = 1, 2, \dots, m-1, \qquad P(Y_i = m \mid x_i) = \frac{1}{1 + \sum_{l=1}^{m-1} \exp(u_{li})} \tag{14.3.2}$$

The log-likelihood function based on the observed outcomes $k_1, \dots, k_n$ is

$$l(a_1, a_2, \dots, a_{m-1}, b_1, b_2, \dots, b_{m-1}) = \log L = \log \prod_{i=1}^{n} P(Y_i = k_i \mid x_i) \tag{14.3.3}$$

where $P(Y_i = k_i \mid x_i)$ is given in (14.3.2) and $b_k = (b_{k1}, \dots, b_{kp})$, $k = 1, 2, \dots, m-1$. The maximum likelihood estimation and hypothesis testing procedures for the coefficients are similar to those in the logistic regression model for dichotomous outcomes. Strictly
Therefore, the interpretation of the coefficients in these models needs to be clarified. Let us consider modeling the relationship between gender and
$$\log \frac{P(Y_i = 1 \mid SEX_i)}{P(Y_i = 3 \mid SEX_i)} = a_1 + b_1 \cdot SEX_i$$

$$\log \frac{P(Y_i = 2 \mid SEX_i)}{P(Y_i = 3 \mid SEX_i)} = a_2 + b_2 \cdot SEX_i$$

It is clear that neither of them is a logistic regression model. In the following, we show how to interpret the coefficients $b_1$ and $b_2$ in these models. From the first model,
( f /n)/(b/n)
f a be
However, if only the data from the normal and CHD participants are used,
f a
Table 14.16 Nominal Cross-Classification of Cardiovascular (CVD) Status by Gender
SEX CVD
or the ratio of the odds of a male having CHD to the odds of a female having
as an estimate of the ratio of the odds of a male having CHD to the odds of
a female having CHD if only the data from the normal and CHD participants
as an estimate of the ratio of the odds of a male having STROKE to the odds
of a female having STROKE if only the data from the normal and STROKE participants are used. The same interpretation also holds for coefficients of
coefficient for a continuous covariate is the odds ratio of a 1-unit increase in the covariate assuming that other covariates are the same.
Example 14.11 We use the data in Example 14.9 and assume that DM
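referent category be NFG. For simplicity, only two covariates, systolic blood

$$\log \frac{P(i\text{th participant is DM})}{P(i\text{th participant is NFG})} = \log \frac{P(Y_i = 1 \mid x_i)}{P(Y_i = 3 \mid x_i)} = -7.648 + 0.026\,SBP_i + 1.047\,LINSUL_i$$

$$\log \frac{P(i\text{th participant is IFG})}{P(i\text{th participant is NFG})} = \log \frac{P(Y_i = 2 \mid x_i)}{P(Y_i = 3 \mid x_i)} = -4.949 + 0.011\,SBP_i + 0.876\,LINSUL_i$$

Subtracting the second equation from the first gives

$$\log \frac{P(i\text{th participant is DM})}{P(i\text{th participant is IFG})} = \log \frac{P(Y_i = 1 \mid x_i)}{P(Y_i = 2 \mid x_i)} = -2.699 + 0.015\,SBP_i + 0.171\,LINSUL_i$$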
Thus, the odds are 1.03 [exp(0.026)] times as high (or 3% higher) for a 1-unit increase in SBP, and 2.85 [exp(1.047)] times as high (or 185% higher) for a 1-unit increase in LINSUL from the model for DM versus NFG. The odds are 2.40 [exp(0.876)] times as high (or 140% higher) for a 1-unit increase in LINSUL from the model for IFG versus NFG. SBP is not significant in the model for IFG versus
the examples in Chapters 7, 9, 11, and 12 to perform additional statistical inferences. For instance, we can test whether the coefficients for SBP in the first
the model for DM versus NFG is equal to that in the model for IFG versus
$X_W = (b_{1j} - b_{2j})^2 / (v_{11} + v_{22} - 2v_{12})$, has an asymptotic chi-square distribution with 1 degree of freedom, where $v_{11}$ and $v_{22}$ are the estimated variances of $b_{1j}$ and $b_{2j}$, respectively, and $v_{12}$ is the estimated covariance of $b_{1j}$ and $b_{2j}$. From
hypothesis $H_0: b_{12} - b_{22} = 0$ is not rejected (p = 0.6066); that is, there is insufficient
evidence to say that the change in odds ratio for a 1-unit increase in LINSUL
in the model for DM versus NFG is not equal to that in the model for IFG versus NFG.
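To make the fitted generalized logistic model concrete, the following Python sketch converts the two estimated log-odds equations of Example 14.11 into the three category probabilities, using the form of (14.3.2) with NFG as the reference category, and recomputes the odds ratios quoted above. It is only an illustration: the function names are not from the text, the covariate values of the participant (SBP = 130, LINSUL = 2.5) are hypothetical, and the intercepts are read as -7.648 and -4.949.

```python
import math

# Fitted coefficients from Example 14.11 (reference category: NFG).
# Each tuple is (intercept, SBP coefficient, LINSUL coefficient);
# the negative intercepts are an assumption about the printed values.
COEFFICIENTS = {
    "DM": (-7.648, 0.026, 1.047),
    "IFG": (-4.949, 0.011, 0.876),
}

def category_probabilities(sbp, linsul):
    """Return P(DM), P(IFG), and P(NFG) for the given covariate values."""
    exp_u = {
        name: math.exp(a + b_sbp * sbp + b_linsul * linsul)
        for name, (a, b_sbp, b_linsul) in COEFFICIENTS.items()
    }
    denominator = 1.0 + sum(exp_u.values())
    probabilities = {name: value / denominator for name, value in exp_u.items()}
    probabilities["NFG"] = 1.0 / denominator  # reference category
    return probabilities

def odds_ratio(coefficient, increase=1.0):
    """Odds ratio for a given increase in a covariate."""
    return math.exp(coefficient * increase)

# Hypothetical participant, used only to show the calculation.
probs = category_probabilities(sbp=130.0, linsul=2.5)
for name, value in probs.items():
    print(f"P({name}) = {value:.4f}")

print(f"OR for SBP, DM vs NFG: {odds_ratio(0.026):.2f}")
print(f"OR for LINSUL, DM vs NFG: {odds_ratio(1.047):.2f}")
print(f"OR for LINSUL, IFG vs NFG: {odds_ratio(0.876):.2f}")
```

Because each probability shares the same denominator, the log of P(DM)/P(NFG) returned by the function equals the first fitted linear predictor, which is a convenient way to check the implementation.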
The following SAS, SPSS, and BMDP codes can be used to obtain the results in Table 14.17.
SAS code:
data w1;
  infile 'c:\ex14d2d6.dat' missover;
  input age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn;
  y = 4-dms;
run;
title "Generalized logistic regression model";
proc catmod data = w1;
  direct sbp linsul;
  model y = sbp linsul
        / ml covb;
  contrast 'Equal coefficients for SBP' all_parms 0 0 1 -1 0 0;
  contrast 'Equal coefficients for LINSUL' all_parms 0 0 0 0 1 -1;
run;
SPSS code:
data list file = 'c:\ex14d2d6.dat' free
  / age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
compute y = 4-dms.
nomreg y with sbp linsul
  /print = fit history parameter lrt.
Ordinal Regression Models
If the outcomes involve a rank ordering, that is, the outcome variable is ordinal, several multivalued regression models are available. Readers interested
following discussion, we introduce the most frequently used model, the proportional odds model. In this model, the probability of an outcome below or equal
to a given ordinal level, $P(Y \le k)$,
Let $Y_i$ be the outcome of the $i$th subject. Assume that $Y_i$ can be classified into
$m$ ordinal levels. Let $Y_i = k$ if $Y_i$ is classified into the $k$th level and
$k = 1, 2, \dots, m$. Suppose that for each of $n$ subjects, $p$ independent variables
$x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ are measured. These variables can be either qualitative
or quantitative. If the logit link function defined in Section 14.2.3 is used, similar
as having only two outcomes [(Y
logistic regression models. Thus, interpretation of the coefficients, $b_j$, such as
is similar to that in a logistic regression model.
Let $k_1, \dots, k_n$ be the observed outcomes from $n$ subjects. Then the log-likelihood function based on the $n$ outcomes observed is the logarithm of the product of all $P(Y_i = k_i \mid x_i)$'s from the $n$ subjects, that is,

$$l(a_1, a_2, \dots, a_{m-1}, b_1, b_2, \dots, b_p) = \log L = \log \prod_{i=1}^{n} P(Y_i = k_i \mid x_i) \tag{14.3.8}$$

where $P(Y_i = k_i \mid x_i)$ is as given in (14.3.7). The maximum likelihood estimation
and hypothesis-testing procedures for the coefficients are similar to those
and formula corresponding to (14.3.5)–(14.3.7) are
The log-likelihood function based on these two models can be obtained by
replacing $P(Y_i = k_i \mid x_i)$ in (14.3.8) with the respective expressions above.
Example 14.12 Now consider the NFG, IFG, and DM categories in Example 14.9 that represent three levels of severity in glucose intolerance. DM
(impaired fasting glucose) as FPG between 110 and 125 mg/dL, and NFG
procedure LOGISTIC with all the covariates. The SAS program allows users
case, we use the stepwise selection method, and the results are given in the first part of Table 14.18. The stepwise method identifies SBP and LINSUL as
nondiabetes (NFG + IFG)] the estimated model in (14.3.5) is
or remaining NFG. For example, the probability of developing IFG is
$P(Y_i = 2 \mid x_i) = P(\text{participant } i \text{ is IFG})$
As noted earlier, the coefficients in these models can be interpreted as those
in the ordinary logistic regression model for binary outcomes. In this example, the higher SBP and LINSUL are, the higher the odds of having DM than of not having DM, or the higher the odds of having either DM or IFG than of being NFG. The odds are 1.02 [exp(0.019)] times as high (or 2% higher) for a
152% higher) for a 1-unit increase in LINSUL assuming that SBP is the same. From the table, SBP and LINSUL are related significantly to the diabetic status in all models with different link functions.
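The following Python sketch illustrates how category probabilities can be obtained from a proportional odds model with the logit link. It assumes the common parameterization in which the log odds of $Y \le k$ equal $a_k$ plus a linear combination of the covariates with coefficients shared across levels; the equations in the text may use a different sign or ordering convention. The intercepts, the LINSUL coefficient, and the covariate values are hypothetical, chosen only to show the calculation; only the SBP coefficient 0.019 is taken from the example.

```python
import math

def cumulative_probabilities(intercepts, coefficients, covariates):
    """P(Y <= k | x) for k = 1, ..., m under a logit link; the last value is 1."""
    linear_part = sum(b * x for b, x in zip(coefficients, covariates))
    cumulative = [1.0 / (1.0 + math.exp(-(a + linear_part))) for a in intercepts]
    return cumulative + [1.0]

def category_probabilities(intercepts, coefficients, covariates):
    """P(Y = k | x), obtained by differencing adjacent cumulative probabilities."""
    cumulative = cumulative_probabilities(intercepts, coefficients, covariates)
    previous = [0.0] + cumulative[:-1]
    return [upper - lower for upper, lower in zip(cumulative, previous)]

def cumulative_odds(intercepts, coefficients, covariates):
    """Odds of Y <= k versus Y > k for k = 1, ..., m-1."""
    cumulative = cumulative_probabilities(intercepts, coefficients, covariates)[:-1]
    return [p / (1.0 - p) for p in cumulative]

# Hypothetical three-level model: the intercepts must be nondecreasing in k.
intercepts = [-6.0, -4.5]
coefficients = [0.019, 0.9]   # SBP (from the example), LINSUL (hypothetical)
covariates = [130.0, 2.5]     # hypothetical participant

print(category_probabilities(intercepts, coefficients, covariates))
```

Under this parameterization a 1-unit increase in a covariate multiplies every cumulative odds by the same factor, the exponential of its coefficient, which is the property that gives the proportional odds model its name.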
SAS and SPSS can also be used for the other two link functions: the inverse
of the cumulative standard normal distribution and the complementary log-log
link functions introduced in Section 14.2.3. Table 14.18 includes the results from models with these two link functions. The results are very similar to those obtained using the logit link function.
The following SAS, SPSS, and BMDP codes can be used to obtain the results in Table 14.18.
SAS code:
data w1;
  infile 'c:\ex14d2d6.dat' missover;
  input age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn;
run;
title "Ordinal regression model with logit link function";
proc logistic data = w1 descending;
  model dms = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = logit;
run;
title "Ordinal regression model with inverse normal link function";
proc logistic data = w1 descending;
  model dms = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = probit;
run;
title "Ordinal regression model with complementary log-log link function";
proc logistic data = w1 descending;
  model dms = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = cloglog;
run;
SPSS code:
data list file = 'c:\ex14d2d6.dat' free
  / age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
compute y = 4-dms.
plum y with sbp linsul
  /link = logit
  /print = fit history parameter.
plum y with sbp linsul
  /link = probit
  /print = fit history parameter.
plum y with sbp linsul
  /link = cloglog
  /print = fit history parameter.
BMDP PR code for the logit link function only:
/input file = 'c:\ex14d2d6.dat'
  variables = 12.
  format = free.
/variable names = age, ageg, sex, sbp, dbp, lacr, hdl, linsul, smoke,
Hosmer and Lemeshow discuss broad application of the method, including model-building strategies and interpretation and presentation of analysis results. In addition to the papers and books cited in this chapter, other works
Applications of the logistic regression model can easily be found in various biomedical journals.
EXERCISES
patients in Table 3.10
(a) Construct a summary table similar to Table 3.11
(b) Construct a table similar to Table 3.12
(c) Use the chi-square test to detect any differences in retinopathy rates
(d) On the basis of these 40 patients, identify the most important risk factors using a linear logistic regression method.
Table 3.1 Let "response" be defined as stable, partial response, or complete response.
(a) Compare each of the five skin test results of the responders with those of the nonresponders.
(b) Use a linear logistic regression method to identify the most important risk factors related to response.
(i) Consider the five skin tests only.
(ii) Consider age, gender, and the five skin tests.
melanoma, and six skin tests) in Exercise 3.3 and Exercise Table 3.3. Identify the most important prognostic factors that are related to remission. Use both univariate and multivariate methods.
complete response, partial response, or stable disease). Include gender, age, nephrectomy treatment, lung metastasis, and bone metastasis as independent variables.
(a) Identify the most significant independent variables
(b) Obtain estimates of odds ratios and confidence intervals when applicable.
X Show that the log odds ratio for $X = x + m$ versus $X = x$ is $mb$, where $b$ is the logistic regression coefficient.
regression model for CVD by using the stepwise selection method to select risk factors among the same factors as those noted at the bottom
of Table 12.7 Compare the results obtained with those in Table 12.7
that is, the sampling probability is independent of the risk factors x,
14.10 Consider the data in Table 12.4. Fit the generalized logistic regression
by using the SAS CATMOD, SPSS NOMREG, or BMDP PR procedure. Select risk factors among those noted at the bottom of Table 12.7 using the stepwise selection method in the BMDP PR procedure. Compare the results with those given in Table 13.5.
DM versus NFG, with SEX as the covariate, by using the data from
binary outcome IFG versus NFG, with SEX as the covariate, by using
covariate, and discuss your findings
APPENDIX A
Newton-Raphson Method
The Newton-Raphson method is a numerical iterative procedure that can be used to solve nonlinear equations. An iterative procedure is a technique of successive approximations,
and each approximation is called an iteration. If the successive approximations approach the solution very closely, we say that the iterations converge. The
maximum likelihood estimates of various parameters and coefficients discussed
in Chapters 7, 9, and 11 to 14 can be obtained by using the Newton-Raphson
method. In this appendix we discuss and illustrate the use of this method, first considering a single nonlinear equation and then a set of nonlinear equations.
preferably, and then the first approximate iteration is given by
Example A.1 Consider the function
$$f(x) = x^3 - x + 2$$
Figure A.1 Graphical presentation of the Newton-Raphson method for Example A.1.
method. The first derivative of $f(x)$ is
$f(x_0) = 2$ and $f'(x_0) = 2$. Thus, the first iteration, following (A.1), gives
Figure A.1 gives the graphical presentation of $f(x)$ and the iteration.
It should be noted that the Newton-Raphson method can only find the real
shown in Figure A.1; the other two are complex roots.
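The single-equation iteration can be sketched in a few lines of Python. The sketch uses the standard Newton-Raphson update, in which the next approximation is the current one minus the ratio of the function to its first derivative. The cubic $f(x) = x^3 - x + 2$ and the initial value $x_0 = -1$ are assumptions: they are consistent with the statements in Example A.1 that $f(x_0) = 2$, $f'(x_0) = 2$, and that only one root is real, but they may differ from the printed example. The function names and the tolerance are illustrative.

```python
def newton_raphson(f, f_prime, x0, tolerance=1e-10, max_iterations=100):
    """Solve f(x) = 0 by Newton-Raphson; return the root and the iteration count."""
    x = x0
    for iteration in range(1, max_iterations + 1):
        derivative = f_prime(x)
        if derivative == 0:
            raise ZeroDivisionError("derivative is zero; choose another initial value")
        x_next = x - f(x) / derivative  # Newton-Raphson update
        if abs(x_next - x) < tolerance:  # successive approximations agree
            return x_next, iteration
        x = x_next
    raise RuntimeError("iterations did not converge")

# Assumed form of the function in Example A.1 and its first derivative.
def f(x):
    return x ** 3 - x + 2

def f_prime(x):
    return 3 * x ** 2 - 1

root, iterations = newton_raphson(f, f_prime, x0=-1.0)
print(f"root = {root:.4f} after {iterations} iterations")
```

With these assumptions the first iteration moves from $-1$ to $-2$, and the successive approximations then settle on the single real root near $-1.52$.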
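The Newton-Raphson method can be extended to solve a system of
equations with more than one unknown. Suppose that we wish to find values
of $x_1, x_2, \dots, x_p$ such that

$$\begin{aligned} f_1(x_1, \dots, x_p) &= 0 \\ f_2(x_1, \dots, x_p) &= 0 \\ &\;\;\vdots \\ f_p(x_1, \dots, x_p) &= 0 \end{aligned}$$

Let $a_{ij}$ be the partial derivative of $f_i$ with respect to $x_j$; that is, $a_{ij} = \partial f_i / \partial x_j$.
The matrix

$$J = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ a_{21} & \cdots & a_{2p} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pp} \end{pmatrix}$$

$b_{p1} \;\cdots\; b_{pp}$
be the corresponding values of the functions $f_1, \dots, f_p$, that is,
are close enough to zero or when differences in the $x$ values at two consecutive
iterations are negligible.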
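For two unknowns, the extension can be sketched as follows. At each iteration the step is obtained by solving the linear system defined by the matrix of partial derivatives, here with the explicit inverse of a 2 x 2 matrix, and the iteration stops when the change in the $x$ values is negligible. The system used below is hypothetical and is not claimed to be the one in Example A.2; the function names, starting values, and tolerance are likewise illustrative.

```python
def newton_raphson_system(functions, jacobian, initial, tolerance=1e-10, max_iterations=100):
    """Newton-Raphson for two equations in two unknowns.

    functions(x1, x2) returns (f1, f2); jacobian(x1, x2) returns the rows
    ((a11, a12), (a21, a22)) of partial derivatives a_ij = d f_i / d x_j.
    """
    x1, x2 = initial
    for iteration in range(1, max_iterations + 1):
        f1, f2 = functions(x1, x2)
        (a11, a12), (a21, a22) = jacobian(x1, x2)
        determinant = a11 * a22 - a12 * a21
        if determinant == 0:
            raise ZeroDivisionError("singular matrix; choose other initial values")
        # step = -J^{-1} f, using the explicit inverse of a 2 x 2 matrix
        step1 = -(a22 * f1 - a12 * f2) / determinant
        step2 = -(-a21 * f1 + a11 * f2) / determinant
        x1, x2 = x1 + step1, x2 + step2
        if max(abs(step1), abs(step2)) < tolerance:
            return (x1, x2), iteration
    raise RuntimeError("iterations did not converge")

# Hypothetical system: x1^2 + x2^2 - 5 = 0 and x1 + x2 - 1 = 0.
def functions(x1, x2):
    return x1 ** 2 + x2 ** 2 - 5.0, x1 + x2 - 1.0

def jacobian(x1, x2):
    return (2.0 * x1, 2.0 * x2), (1.0, 1.0)

solution, iterations = newton_raphson_system(functions, jacobian, initial=(-2.0, 3.0))
print(solution, iterations)
```

This hypothetical system has two solutions, so the one that is reached depends on the initial values; starting from $(-2, 3)$ the iterations approach $x_1 = -1$, $x_2 = 2$.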
Example A.2 Suppose that we wish to find the values of $x_1$ and $x_2$ such that
x
terminates and the solution of the two simultaneous equations is $x_1 = -1$, $x_2 = 2$.
The number of iterations required depends strongly on the initial values
chosen. In Example A.2, if we use x
to find the solution. Interested readers may try it as an exercise.
APPENDIX B
Statistical Tables
Table B-1 Normal Curve Areas
Source: Abridged from Table 1 of Statistical Tables and Formulas, by A. Hald, John Wiley & Sons, 1952. Reproduced by permission of John Wiley & Sons.
Table B-2 Percentage Points of the $\chi^2$-Distribution
Source: "Tables of the Percentage Points of the $\chi^2$-Distribution," by Catherine M. Thompson, Biometrika, Vol. 32, pp. 188–189 (1941). Reproduced by permission of the editor of Biometrika.