model y = sbp linsul
/ ml covb;
contrast 'Equal coefficients for SBP' all_parms 0 0 1 -1 0 0;
contrast 'Equal coefficients for LINSUL' all_parms 0 0 0 0 1 -1;
run;
SPSS code:
data list file = 'c:\ex14d2d6.dat' free
/ age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
Compute y = 4-dms.
nomreg y with sbp linsul
/print = fit history parameter lrt.
14.3.2 Model for Ordinal Polychotomous Outcomes:
Ordinal Regression Models
If the outcomes involve a rank ordering, that is, the outcome variable is ordinal, several multivalued regression models are available. Readers interested in these models are referred to McCullagh and Nelder (1989), Agresti (1990), Ananth and Kleinbaum (1997), and Hosmer and Lemeshow (2000). In the following discussion, we introduce the most frequently used model, the proportional odds model. In this model, the probability of an outcome below or equal to a given ordinal level, $P(Y \le k)$, is compared with the probability that it is higher than the level given, $P(Y > k)$.
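Let $Y_i$ be the outcome of the $i$th subject. Assume that $Y_i$ can be classified into $m$ ordinal levels. Let $Y_i = k$ if $Y_i$ is classified into the $k$th level, $k = 1, 2, \dots, m$. Suppose that for each of $n$ subjects, $p$ independent variables $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ are measured. These variables can be either qualitative or quantitative. If the logit link function defined in Section 14.2.3 is used, similar to the logistic regression model (14.2.3), we consider the following models:

$$\log\frac{P(Y_i \le k \mid x_i)}{1 - P(Y_i \le k \mid x_i)} = a_k + \sum_{j=1}^{p} b_j x_{ij}, \qquad k = 1, 2, \dots, m-1 \qquad (14.3.5)$$

or, equivalently,

$$P(Y_i \le k \mid x_i) = \frac{\exp\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right)}{1 + \exp\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right)} \qquad (14.3.6)$$

so that

$$P(Y_i = k \mid x_i) = P(Y_i \le k \mid x_i) - P(Y_i \le k-1 \mid x_i) \qquad (14.3.7)$$

with $P(Y_i \le 0 \mid x_i) = 0$ and $P(Y_i \le m \mid x_i) = 1$. For each $k$, the model in (14.3.5) treats the outcome as having only two outcomes [$(Y \le k)$ and $(Y > k)$], as in the logistic regression models. Thus, interpretation of the coefficients $b_j$, such as the exponentiated coefficient [$\exp(b_j)$] for a discrete or a continuous covariate, is similar to that in a logistic regression model.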
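The step from cumulative probabilities to the probability of each level can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cumulative_logit_probs`, the symbols `intercepts` (one per cut point, $m - 1$ in all) and `coefs` (shared by all levels), and the numeric inputs are hypothetical and are not taken from the text.

```python
import math

def cumulative_logit_probs(intercepts, coefs, x):
    """Return [P(Y=1), ..., P(Y=m)] for one subject under a proportional odds model.

    intercepts: the m-1 level-specific intercepts, in increasing order (assumed)
    coefs:      the p slopes, shared by all levels
    x:          the p covariate values of the subject
    """
    eta = sum(b * v for b, v in zip(coefs, x))
    # P(Y <= k) for k = 1, ..., m-1 from the logit form of the model
    cum = [1.0 / (1.0 + math.exp(-(a + eta))) for a in intercepts]
    # Pad with P(Y <= 0) = 0 and P(Y <= m) = 1, then take successive differences
    cum = [0.0] + cum + [1.0]
    return [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical three-level outcome with one covariate
probs = cumulative_logit_probs(intercepts=[-1.0, 0.5], coefs=[0.3], x=[2.0])
print([round(p, 4) for p in probs])
```

Because the slopes are shared and the intercepts increase with the level, the cumulative probabilities are ordered and the differences are nonnegative and sum to 1.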
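Let $k_1, \dots, k_n$ be the observed outcomes from the $n$ subjects. Then the log-likelihood function based on the $n$ outcomes observed is the logarithm of the product of all $P(Y_i = k_i \mid x_i)$'s from the $n$ subjects, that is,

$$l = \log\prod_{i=1}^{n} P(Y_i = k_i \mid x_i) = \sum_{i=1}^{n} \log P(Y_i = k_i \mid x_i) \qquad (14.3.8)$$

where $P(Y_i = k_i \mid x_i)$ is as given in (14.3.7). The maximum likelihood estimation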
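and hypothesis-testing procedures for the coefficients are similar to those discussed previously. If the probit link function in (14.2.27) is used, the models and formula corresponding to (14.3.5) to (14.3.7) are

$$\Phi^{-1}[P(Y_i \le k \mid x_i)] = a_k + \sum_{j=1}^{p} b_j x_{ij}, \qquad k = 1, 2, \dots, m-1$$

$$P(Y_i \le k \mid x_i) = \Phi\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right)$$

$$P(Y_i = k \mid x_i) = \Phi\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right) - \Phi\left(a_{k-1} + \sum_{j=1}^{p} b_j x_{ij}\right)$$

where $a_k$ is the intercept for level $k$, $b_j$ is the coefficient of the $j$th covariate, and $\Phi$ is the cumulative standard normal distribution function. With the complementary log-log link function introduced in Section 14.2.3, they are

$$\log\{-\log[1 - P(Y_i \le k \mid x_i)]\} = a_k + \sum_{j=1}^{p} b_j x_{ij}, \qquad k = 1, 2, \dots, m-1$$

$$P(Y_i \le k \mid x_i) = 1 - \exp\left[-\exp\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right)\right]$$

$$P(Y_i = k \mid x_i) = P(Y_i \le k \mid x_i) - P(Y_i \le k-1 \mid x_i)$$

The log-likelihood function based on these two models can be obtained by replacing $P(Y_i = k_i \mid x_i)$ in (14.3.8) with the respective expressions above.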
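The way the link function enters the log-likelihood can be sketched as follows. This is a minimal illustration, not the fitting routine used by SAS or SPSS: the function names `inv_link` and `log_likelihood`, the argument names, and the toy data are all hypothetical. Only the inverse link changes between the three models; the sum of $\log P(Y_i = k_i \mid x_i)$ is the same in each case.

```python
import math

def inv_link(eta, link):
    """Map a linear predictor to a cumulative probability P(Y <= k)."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "probit":
        # Standard normal CDF written with the error function
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "cloglog":
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError("unknown link: " + link)

def log_likelihood(intercepts, coefs, xs, ks, link="logit"):
    """Sum of log P(Y_i = k_i | x_i) over subjects; levels are coded 1, ..., m."""
    m = len(intercepts) + 1
    total = 0.0
    for x, k in zip(xs, ks):
        eta = sum(b * v for b, v in zip(coefs, x))
        upper = 1.0 if k == m else inv_link(intercepts[k - 1] + eta, link)
        lower = 0.0 if k == 1 else inv_link(intercepts[k - 2] + eta, link)
        total += math.log(upper - lower)
    return total

# Hypothetical toy data: three subjects, one covariate, three ordered levels
xs = [[0.0], [1.0], [2.0]]
ks = [1, 2, 3]
for link in ("logit", "probit", "cloglog"):
    print(link, round(log_likelihood([-0.5, 0.5], [-0.4], xs, ks, link), 4))
```

A maximum likelihood fit would maximize this function over the intercepts and coefficients, for example with the Newton-Raphson method of Appendix A.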
Example 14.9 Consider an outcome with three categories that represent three levels of severity in glucose intolerance. DM (diabetes) is defined as fasting plasma glucose (FPG) $\ge$ 126 mg/dL, IFG (impaired fasting glucose) as FPG between 110 and 125 mg/dL, and NFG (normal fasting glucose) as FPG $<$ 110 mg/dL. Thus, it is reasonable to consider the outcome variable as ordinal. Let the outcome variable $Y = 1$ if DM, 2 if IFG, and 3 if NFG. We fit the models in (14.3.5) using the SAS procedure LOGISTIC with all the covariates. The SAS program allows users to use a variable selection method (forward, backward, and stepwise). In this case, we use the stepwise selection method, and the results are given in the first part of Table 14.18. The stepwise method identifies SBP and LINSUL as significant independent variables. For $k = 1$ [i.e., we compare diabetes with nondiabetes (NFG + IFG)], the estimated model in (14.3.5) is

$$\log\frac{P(Y_i \le 1 \mid x_i)}{1 - P(Y_i \le 1 \mid x_i)} = \log\frac{P(\text{participant } i \text{ is diabetic})}{P(\text{participant } i \text{ is not diabetic})} = -6.753 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i$$

For $k = 2$, the estimated model in (14.3.5) is

$$\log\frac{P(Y_i \le 2 \mid x_i)}{1 - P(Y_i \le 2 \mid x_i)} = \log\frac{P(\text{participant } i \text{ is either DM or IFG})}{P(\text{participant } i \text{ is NFG})} = -5.485 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i$$
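According to (14.3.7), we can estimate the probability of developing DM, IFG, or remaining NFG. For example, the probability of developing IFG is

$$P(Y_i = 2 \mid x_i) = P(\text{participant } i \text{ is IFG}) = P(Y_i \le 2 \mid x_i) - P(Y_i \le 1 \mid x_i)$$

For a participant whose covariate values give $\exp(-5.485 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i) = 0.951$ and $\exp(-6.753 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i) = 0.268$,

$$P(\text{participant is IFG}) = \frac{0.951}{1 + 0.951} - \frac{0.268}{1 + 0.268} = 0.276$$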
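The arithmetic above can be checked with a short Python sketch that uses the fitted coefficients quoted in the example. The covariate values SBP = 140 and LINSUL = 3 are an assumption made here for illustration; they are not stated in the text and were chosen only because they reproduce the quantities 0.951 and 0.268. The function name `glucose_status_probs` is likewise hypothetical.

```python
import math

A1, A2 = -6.753, -5.485          # fitted intercepts for k = 1 and k = 2
B_SBP, B_LINSUL = 0.019, 0.925   # fitted slopes, shared by both models

def glucose_status_probs(sbp, linsul):
    """Estimated P(DM), P(IFG), P(NFG) from the two fitted cumulative logit models."""
    eta = B_SBP * sbp + B_LINSUL * linsul
    odds_le1 = math.exp(A1 + eta)        # odds of DM versus not DM
    odds_le2 = math.exp(A2 + eta)        # odds of DM or IFG versus NFG
    p_le1 = odds_le1 / (1.0 + odds_le1)  # P(Y <= 1)
    p_le2 = odds_le2 / (1.0 + odds_le2)  # P(Y <= 2)
    return {"DM": p_le1, "IFG": p_le2 - p_le1, "NFG": 1.0 - p_le2}

# Assumed covariate values (hypothetical, see the note above)
probs = glucose_status_probs(sbp=140, linsul=3)
print({status: round(p, 3) for status, p in probs.items()})
```

Under these assumed values the IFG probability rounds to 0.276, and the three probabilities sum to 1.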
As noted earlier, the coefficients in these models can be interpreted as those in the ordinary logistic regression model for binary outcomes. In this example, the higher SBP and LINSUL are, the higher the odds of having DM than of not having DM, or the higher the odds of having either DM or IFG than of being NFG. The odds are 1.02 [exp(0.019)] times as high (or 2% higher) for a 1-unit increase in SBP assuming that LINSUL is the same, and 2.52 times as high (or 152% higher) for a 1-unit increase in LINSUL assuming that SBP is the same. From the table, SBP and LINSUL are related significantly to the diabetic status in all models with different link functions.
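The odds ratios quoted above follow directly from exponentiating the coefficients, as this small Python sketch shows. The helper name `odds_ratio` and the 10-unit increase in SBP are illustrative assumptions, not part of the example.

```python
import math

def odds_ratio(coef, increase=1.0):
    """Odds ratio for an `increase`-unit rise in a covariate, other covariates held fixed."""
    return math.exp(coef * increase)

or_sbp = odds_ratio(0.019)          # 1-unit increase in SBP
or_linsul = odds_ratio(0.925)       # 1-unit increase in LINSUL
or_sbp_10 = odds_ratio(0.019, 10)   # hypothetical 10-unit increase in SBP

print(round(or_sbp, 2), round(or_linsul, 2), round(or_sbp_10, 2))
```

Because the log odds are linear in the covariate, an $m$-unit increase multiplies the odds by $\exp(mb)$, not by $m\exp(b)$.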
SAS and SPSS can also be used for the other two link functions: the inverse of the cumulative standard normal distribution and the complementary log-log link functions introduced in Section 14.2.3. Table 14.18 includes the results from models with these two link functions. The results are very similar to those obtained using the logit link function.
The following SAS, SPSS, and BMDP codes can be used to obtain the results in Table 14.18.
SAS code:
data w1;
infile 'c:\ex14d2d6.dat' missover;
input age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn;
run;
/* selection = s requests stepwise selection */
title "Ordinal regression model with logit link function";
proc logistic data = w1 descending;
model dms = age sex sbp dbp lacr hdl linsul smoke
/ selection = s lackfit link = logit;
run;
title "Ordinal regression model with inverse normal link function";
proc logistic data = w1 descending;
model dms = age sex sbp dbp lacr hdl linsul smoke
/ selection = s lackfit link = probit;
run;
title "Ordinal regression model with complementary log-log link function";
proc logistic data = w1 descending;
model dms = age sex sbp dbp lacr hdl linsul smoke
/ selection = s lackfit link = cloglog;
run;
SPSS code:
data list file = 'c:\ex14d2d6.dat' free
/ age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
Compute y = 4-dms.
plum y with sbp linsul
/link = logit
/print = fit history parameter.
plum y with sbp linsul
/link = probit
/print = fit history parameter.
plum y with sbp linsul
/link = cloglog
/print = fit history parameter.
BMDP PR code for the logit link function only:
/input file = 'c:\ex14d2d6.dat'
variables = 12.
format = free.
/variable names = age, ageg, sex, sbp, dbp, lacr, hdl, linsul, smoke,
on the subject include Anderson (1972), Mantel (1973), Prentice (1976), Prentice and Pyke (1979), Holford et al. (1978), and Breslow and Day (1980). Applications of the logistic regression model can easily be found in various biomedical journals.
EXERCISES
14.1 Consider the study presented in Example 3.5 and the data for the 40 patients in Table 3.10.
(a) Construct a summary table similar to Table 3.11.
(b) Construct a table similar to Table 3.12.
(c) Use the chi-square test to detect any differences in retinopathy rates among the subgroups obtained in part (b).
(d) On the basis of these 40 patients, identify the most important risk factors using a linear logistic regression method.
14.2 Consider the data for the 33 hypernephroma patients given in Exercise Table 3.1. Let "response" be defined as stable, partial response, or complete response.
(a) Compare each of the five skin test results of the responders with those of the nonresponders.
(b) Use a linear logistic regression method to identify the most important risk factors related to response.
(i) Consider the five skin tests only.
(ii) Consider age, gender, and the five skin tests.
14.3 Consider all nine risk variables (age, gender, family history of melanoma, and six skin tests) in Exercise 3.3 and Exercise Table 3.3. Identify the most important prognostic factors that are related to remission. Use both univariate and multivariate methods.
14.4 Consider the data of 58 hypernephroma patients given in Exercise Table 3.2. Apply the logistic regression method to response (defined as complete response, partial response, or stable disease). Include gender, age, nephrectomy treatment, lung metastasis, and bone metastasis as independent variables.
(a) Identify the most significant independent variables.
(b) Obtain estimates of odds ratios and confidence intervals when applicable.
14.5 Consider the case where there is one continuous independent variable $X$. Show that the log odds ratio for $X = x + m$ versus $X = x$ is $mb$, where $b$ is the logistic regression coefficient.
14.6 Using the data in Table 12.4, define the index function CVD as CVD = 1 if dg 1, and CVD = 0 otherwise, and fit a logistic regression model for CVD by using the stepwise selection method to select risk factors among the same factors as those noted at the bottom of Table 12.7. Compare the results obtained with those in Table 12.7.
14.7 Assuming that P(a person is sampled | y, x) = P(a person is sampled | y), that is, the sampling probability is independent of the risk factors x, derive (14.2.15).
14.8 By using (14.2.14) and (14.2.1), show that (14.2.20) reduces to (14.2.21).
14.9 Derive (14.3.2).
14.10 Consider the data in Table 12.4. Fit the generalized logistic regression model in (14.3.1) for DG with covariates AGE, SEX, LACR, and LTG by using the SAS CATMOD, SPSS NOMREG, or BMDP PR procedure. Select risk factors among those noted at the bottom of Table 12.7 using the stepwise selection method in the BMDP PR procedure. Compare the results with those given in Table 13.5.
14.11 Using the same notation and data as in Table 14.11, (1) fit the outcome variable Y with the generalized logistic regression model in (14.3.1) with SEX as the covariate; (2) fit a logistic regression for the binary outcome DM versus NFG, with SEX as the covariate, by using the data from DM and NFG participants only; (3) fit a logistic regression for the binary outcome IFG versus NFG, with SEX as the covariate, by using the data from IFG and NFG participants only; (4) compare the coefficients obtained from (2) and (3) with the coefficients obtained from (1); and (5) report what you have found.
14.12 Perform the same analyses as in Exercise 14.11 but use SBP as the covariate, and discuss your findings.
APPENDIX A
Newton-Raphson Method
The Newton-Raphson method (Ralston and Wilf, 1967; Carnahan et al., 1969) is a numerical iterative procedure that can be used to solve nonlinear equations. An iterative procedure is a technique of successive approximations, and each approximation is called an iteration. If the successive approximations approach the solution very closely, we say that the iterations converge. The maximum likelihood estimates of various parameters and coefficients discussed in Chapters 7, 9, and 11 to 14 can be obtained by using the Newton-Raphson method. In this appendix we discuss and illustrate the use of this method, first considering a single nonlinear equation and then a set of nonlinear equations.
Let $f(x) = 0$ be the equation to be solved for $x$. The Newton-Raphson method requires an initial estimate of $x$, say $x_0$, such that $f(x_0)$ is close to zero preferably, and then the first approximate iteration is given by

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \qquad (A.1)$$

where $f'(x_0)$ is the first derivative of $f(x)$ evaluated at $x = x_0$. In general, the $(k+1)$th iteration or approximation is given by

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \qquad (A.2)$$

where $f'(x_k)$ is the first derivative of $f(x)$ evaluated at $x = x_k$. The iteration terminates at the $k$th iteration if $f(x_k)$ is close enough to zero or the difference between $x_k$ and $x_{k-1}$ is negligible. The stopping rule is rather subjective. Acceptable rules are that $f(x_k)$ or $d = x_k - x_{k-1}$ is in the neighborhood of a small negative power of 10.
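The iteration in (A.2) together with the two stopping rules can be sketched in Python as follows. This is a minimal illustration: the function name `newton_raphson`, the tolerance `tol = 1e-8`, the cap `max_iter`, and the test equation $x^2 - 2 = 0$ are assumptions made here, not values from the text.

```python
def newton_raphson(f, fprime, x0, tol=1e-8, max_iter=50):
    """Solve f(x) = 0 by repeating x_new = x - f(x) / f'(x).

    Stops when |f(x_new)| or |x_new - x| falls below tol (an assumed tolerance).
    Returns the approximate root and the number of iterations used.
    """
    x = x0
    for k in range(1, max_iter + 1):
        slope = fprime(x)
        if slope == 0:
            raise ZeroDivisionError("derivative is zero; choose another initial value")
        x_new = x - f(x) / slope
        if abs(f(x_new)) < tol or abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical test equation: x^2 - 2 = 0, starting from x0 = 1
root, iterations = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root, iterations)
```

The cap on the number of iterations guards against starting values from which the successive approximations do not converge.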
Figure A.1 Graphical presentation of the Newton-Raphson method for Example A.1.
Example A.1 Consider the function $f(x) = x^3 - x + 2$. We wish to find the value of $x$ such that $f(x) = 0$ by the Newton-Raphson method. The first derivative of $f(x)$ is

$$f'(x) = 3x^2 - 1$$

Since $f(-1) = 2$ and $f(-2) = -4$, graphically (Figure A.1), we see that the curve cuts through the $x$ axis [$f(x) = 0$] between $-1$ and $-2$. This gives us a good hint of an initial value of $x$. Suppose that we begin with $x_0 = -1$; $f(x_0) = 2$ and $f'(x_0) = 2$. Thus, the first iteration, following (A.1), gives

$$x_1 = -1 - \frac{2}{2} = -2$$

Continuing in this way, the iterations converge to the one real root, shown in Figure A.1; the other two are complex roots.
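The successive approximations for this example can be traced with a short Python sketch. The cubic $f(x) = x^3 - x + 2$ used here is inferred from the derivative $3x^2 - 1$ and the values $f(-1) = 2$ and $f(-2) = -4$ quoted above; the tolerance of $10^{-6}$ and the name `newton_trace` are assumptions for illustration.

```python
def f(x):
    # Cubic inferred from f'(x) = 3x^2 - 1, f(-1) = 2, and f(-2) = -4
    return x ** 3 - x + 2

def fprime(x):
    return 3 * x ** 2 - 1

def newton_trace(x0, tol=1e-6, max_iter=50):
    """Return the list of successive approximations x_0, x_1, ... from (A.2)."""
    xs = [x0]
    for _ in range(max_iter):
        x_next = xs[-1] - f(xs[-1]) / fprime(xs[-1])
        xs.append(x_next)
        # Stop when f is near zero or two consecutive values nearly agree
        if abs(f(x_next)) < tol or abs(xs[-1] - xs[-2]) < tol:
            break
    return xs

trace = newton_trace(-1.0)
for k, x in enumerate(trace):
    print(k, round(x, 6), round(f(x), 6))
```

The first step reproduces $x_1 = -2$, and the later steps move back toward a value between $-2$ and $-1$, where the curve crosses the $x$ axis.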
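The Newton-Raphson method can be extended to solve a system of equations with more than one unknown. Suppose that we wish to find values of $x_1, x_2, \dots, x_p$ such that

$$\begin{aligned} f_1(x_1, \dots, x_p) &= 0 \\ f_2(x_1, \dots, x_p) &= 0 \\ &\;\;\vdots \\ f_p(x_1, \dots, x_p) &= 0 \end{aligned}$$

Let $a_{ij}$ be the partial derivative of $f_i$ with respect to $x_j$; that is, $a_{ij} = \partial f_i / \partial x_j$. The matrix

$$J = \begin{pmatrix} a_{11} & \cdots & a_{1p} \\ a_{21} & \cdots & a_{2p} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pp} \end{pmatrix}$$

is the Jacobian matrix. Let its inverse be

$$J^{-1} = \begin{pmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{p1} & \cdots & b_{pp} \end{pmatrix}$$

Let $x_1^k, x_2^k, \dots, x_p^k$ be the approximate root at the $k$th iteration; let $f_1^k, \dots, f_p^k$ be the corresponding values of the functions $f_1, \dots, f_p$, that is, $f_i^k = f_i(x_1^k, \dots, x_p^k)$, $i = 1, \dots, p$, and let $b_{ij}^k$ be the $ij$th element of $J^{-1}$ evaluated at $x_1^k, \dots, x_p^k$. Then the next approximation is given by

$$x_i^{k+1} = x_i^k - \sum_{j=1}^{p} b_{ij}^k f_j^k, \qquad i = 1, \dots, p \qquad (A.3)$$

The iterative procedure begins with a preselected initial approximate $x_1^0, x_2^0, \dots, x_p^0$, proceeds following (A.3), and terminates either when $f_1, f_2, \dots, f_p$ are close enough to zero or when differences in the $x$ values at two consecutive iterations are negligible.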
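For two unknowns, the procedure can be sketched in Python with the inverse of the Jacobian matrix written out explicitly. Everything specific in this sketch is an assumption: the system $x_1^2 + x_2^2 - 5 = 0$, $x_1 + x_2 - 1 = 0$ is a hypothetical illustration (it is not the system of Example A.2), and the function name `newton_system_2d`, the starting values, and the tolerance are likewise chosen here.

```python
def newton_system_2d(f1, f2, jacobian, x1, x2, tol=1e-10, max_iter=50):
    """Newton-Raphson for two equations in two unknowns.

    jacobian(x1, x2) returns ((a11, a12), (a21, a22)), with a_ij = d f_i / d x_j.
    Returns the approximate root and the number of updates performed.
    """
    for k in range(max_iter + 1):
        v1, v2 = f1(x1, x2), f2(x1, x2)
        if abs(v1) < tol and abs(v2) < tol:
            return x1, x2, k
        (a11, a12), (a21, a22) = jacobian(x1, x2)
        det = a11 * a22 - a12 * a21
        if det == 0:
            raise ZeroDivisionError("Jacobian is singular at the current point")
        # Elements b_ij of the inverse of the 2 x 2 Jacobian matrix
        b11, b12 = a22 / det, -a12 / det
        b21, b22 = -a21 / det, a11 / det
        # Update each unknown by subtracting sum_j b_ij * f_j
        x1, x2 = x1 - (b11 * v1 + b12 * v2), x2 - (b21 * v1 + b22 * v2)
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical system with a root at x1 = -1, x2 = 2
sol1, sol2, steps = newton_system_2d(
    lambda x1, x2: x1 ** 2 + x2 ** 2 - 5.0,
    lambda x1, x2: x1 + x2 - 1.0,
    lambda x1, x2: ((2.0 * x1, 2.0 * x2), (1.0, 1.0)),
    x1=-2.0, x2=3.0,
)
print(sol1, sol2, steps)
```

For more than two unknowns, one would normally solve the linear system $J\,\delta = f$ at each step rather than form the inverse explicitly; the explicit inverse is used here only to mirror the $b_{ij}$ notation.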
With these values, $f_1 = 0$ and $f_2 = 0$. Therefore, the iteration procedure terminates and the solution of the two simultaneous equations is $x_1 = -1$, $x_2 = 2$.
The number of iterations required depends strongly on the initial values chosen. In Example A.2, if we use $x_1^0 = 0$, $x_2^0 = 0$, it requires about 11 iterations to find the solution. Interested readers may try it as an exercise.
APPENDIX B
Statistical Tables
Table B-1 Normal Curve Areas
Source: Abridged from Table 1 of Statistical Tables and Formulas, by A. Hald, John Wiley & Sons, 1952. Reproduced by permission of John Wiley & Sons.
Table B-2 Percentage Points of the $\chi^2$-Distribution
Source: "Tables of the Percentage Points of the $\chi^2$-Distribution," by Catherine M. Thompson, Biometrika, Vol. 32, pp. 188-189 (1941). Reproduced by permission of the editor of Biometrika.
Table B-4 Upper Tail Probabilities for the Null Distribution of the Kruskal-Wallis H Statistic: $k = 3$, $n_1 = 1(1)5$, $n_2 = n_1(1)5$, $n_3 = n_2(1)5$
Source: Table F of A Nonparametric Introduction to Statistics, by C. H. Kraft and C. van Eeden, Macmillan, New York, 1968. Reproduced by permission of the Macmillan Publishing Company.
Table B-5 Selected Critical Values for All Treatments: Multiple Comparisons Based on Kruskal-Wallis Rank Sums
Source: "Rank Sum Multiple Comparisons in One- and Two-Way Classification," by B. J. McDonald and W. A. Thompson, Biometrika, Vol. 54, pp. 487-497 (1967). Reproduced by permission of the editor of Biometrika. The starred values are from "Distribution-Free Multiple Comparisons," Ph.D. thesis (1963), P. Nemenyi, Princeton University, with permission of the
Table B-6 Selected Critical Values for the Range of k Independent N(0, 1) Variables: $k = 2(1)20(2)40(10)100$
For a given $k$ and $\alpha$, the tabled entry is $q(\alpha, k, \infty)$.
Source: "Table of Range and Studentized Range," by H. L. Harter, Ann. Math. Statist., Vol. 31, pp. 1122-1147 (1960). Reproduced by permission of the editor of the Annals of Mathematical Statistics.