Statistical Methods for Survival Data Analysis, Third Edition, Part 9


model y = sbp linsul / ml covb;
contrast 'Equal coefficients for SBP' all_parms 0 0 1 -1 0 0;
contrast 'Equal coefficients for LINSUL' all_parms 0 0 0 0 1 -1;
run;

SPSS code:

data list file = 'c:ex14d2d6.dat' free
  / age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
compute y = 4-dms.
nomreg y with sbp linsul
  /print = fit history parameter lrt.

14.3.2 Model for Ordinal Polychotomous Outcomes: Ordinal Regression Models

If the outcomes involve a rank ordering, that is, the outcome variable is ordinal, several multivalued regression models are available. Readers interested in these models are referred to McCullagh and Nelder (1989), Agresti (1990), Ananth and Kleinbaum (1997), and Hosmer and Lemeshow (2000). In the following discussion, we introduce the most frequently used model, the proportional odds model. In this model, the probability of an outcome below or equal to a given ordinal level, $P(Y \le k)$, is compared with the probability that it is higher than the level given, $P(Y > k)$.

Let $Y_i$ be the outcome of the $i$th subject. Assume that $Y_i$ can be classified into $m$ ordinal levels. Let $Y_i = k$ if $Y_i$ is classified into the $k$th level, $k = 1, 2, \ldots, m$. Suppose that for each of $n$ subjects, $p$ independent variables $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ are measured. These variables can be either qualitative or quantitative. If the logit link function defined in Section 14.2.3 is used, similar to the logistic regression model (14.2.3), we consider the following models:

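$$\log\frac{P(Y_i \le k \mid x_i)}{1 - P(Y_i \le k \mid x_i)} = \log\frac{P(Y_i \le k \mid x_i)}{P(Y_i > k \mid x_i)} = a_k + \sum_{j=1}^{p} b_j x_{ij}, \qquad k = 1, 2, \ldots, m - 1 \tag{14.3.5}$$

or equivalently,

$$P(Y_i \le k \mid x_i) = \frac{\exp\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right)}{1 + \exp\left(a_k + \sum_{j=1}^{p} b_j x_{ij}\right)} \tag{14.3.6}$$

so that

$$P(Y_i = k \mid x_i) = P(Y_i \le k \mid x_i) - P(Y_i \le k - 1 \mid x_i), \qquad k = 1, 2, \ldots, m \tag{14.3.7}$$

with $P(Y_i \le 0 \mid x_i) = 0$ and $P(Y_i \le m \mid x_i) = 1$. For each $k$, the model in (14.3.5) can be viewed as having only two outcomes [$(Y \le k)$ and $(Y > k)$], so these models have the form of ordinary logistic regression models. Thus, interpretation of the coefficients $b_j$, such as the exponentiated coefficient $\exp(b_j)$ for a discrete or a continuous covariate, is similar to that in a logistic regression model.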

Let $k_1, \ldots, k_n$ be the observed outcomes from the $n$ subjects. Then the log-likelihood function based on the $n$ observed outcomes is the logarithm of the product of all the $P(Y_i = k_i \mid x_i)$ from the $n$ subjects, that is,

$$\log \prod_{i=1}^{n} P(Y_i = k_i \mid x_i) = \sum_{i=1}^{n} \log P(Y_i = k_i \mid x_i) \tag{14.3.8}$$

where $P(Y_i = k_i \mid x_i)$ is as given in (14.3.7). The maximum likelihood estimation and hypothesis-testing procedures for the coefficients are similar to those discussed previously.

If the probit link function in (14.2.27) or the complementary log-log link function is used, the models and formulas corresponding to (14.3.5) to (14.3.7) take the same form, with the logit replaced by the respective link function. The log-likelihood function based on these two models can be obtained by replacing $P(Y_i = k_i \mid x_i)$ in (14.3.8) with the respective expressions.
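The role of the link function in the log-likelihood can be illustrated with a short sketch. It computes each $P(Y = k \mid x)$ as a difference of cumulative probabilities and sums the logarithms over subjects. The probit and complementary log-log inverse links below are the standard textbook forms and are an assumption here, since the sign and parameterization conventions of the text may differ; the function names, intercepts, coefficient, and toy data are all hypothetical.

```python
import math

def inverse_link(eta, link="logit"):
    """Map a linear predictor eta to a cumulative probability P(Y <= k | x)."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "probit":
        # Standard normal distribution function written with the error function.
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "cloglog":
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError(f"unknown link: {link}")

def category_probs(x, intercepts, coefs, link="logit"):
    """Return [P(Y=1|x), ..., P(Y=m|x)], where m = len(intercepts) + 1."""
    xb = sum(b * v for b, v in zip(coefs, x))
    # Cumulative probabilities for k = 1..m-1, then P(Y <= m) = 1.
    cumulative = [inverse_link(a + xb, link) for a in intercepts] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of successive cumulative probabilities
        previous = c
    return probs

def log_likelihood(outcomes, covariates, intercepts, coefs, link="logit"):
    """Sum of log P(Y_i = k_i | x_i) over subjects; outcomes are coded 1..m."""
    total = 0.0
    for k, x in zip(outcomes, covariates):
        total += math.log(category_probs(x, intercepts, coefs, link)[k - 1])
    return total

# Hypothetical toy data: four subjects, one covariate, three ordinal levels.
toy_y = [1, 2, 3, 3]
toy_x = [[1.2], [0.4], [-0.3], [-1.0]]
toy_intercepts = [-1.0, 0.5]  # must be increasing so every probability is positive
toy_coefs = [0.8]

loglik_by_link = {
    link: log_likelihood(toy_y, toy_x, toy_intercepts, toy_coefs, link)
    for link in ("logit", "probit", "cloglog")
}
```

In practice the intercepts and coefficients would be chosen to maximize this quantity, which is what the SAS and SPSS procedures do internally.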

Example 14.9 The outcome variable has three categories, DM, IFG, and NFG, that represent three levels of severity in glucose intolerance. DM (diabetes) is defined as fasting plasma glucose (FPG) $\ge$ 126 mg/dL, IFG (impaired fasting glucose) as FPG between 110 and 125 mg/dL, and NFG (normal fasting glucose) as FPG < 110 mg/dL. Thus, it is reasonable to consider the outcome variable as ordinal. Let the outcome variable $Y = 1$ if DM, 2 if IFG, and 3 if NFG. We fit the models in (14.3.5) using the SAS procedure LOGISTIC with all the covariates. The SAS program allows users to use a variable selection method (forward, backward, or stepwise). In this case, we use the stepwise selection method, and the results are given in the first part of Table 14.18. The stepwise method identifies SBP and LINSUL as significant independent variables. For $k = 1$ [i.e., we compare diabetes with nondiabetes (NFG + IFG)], the estimated model in (14.3.5) is

$$\log\frac{P(Y_i \le 1 \mid x_i)}{1 - P(Y_i \le 1 \mid x_i)} = \log\frac{P(\text{participant } i \text{ is diabetic})}{P(\text{participant } i \text{ is not diabetic})} = -6.753 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i$$

For $k = 2$, the estimated model in (14.3.5) is

$$\log\frac{P(Y_i \le 2 \mid x_i)}{1 - P(Y_i \le 2 \mid x_i)} = \log\frac{P(\text{participant } i \text{ is either DM or IFG})}{P(\text{participant } i \text{ is NFG})} = -5.485 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i$$

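According to (14.3.7), we can estimate the probability of developing DM, IFG, or remaining NFG. For example, the probability of developing IFG is

$$P(Y_i = 2 \mid x_i) = P(\text{participant } i \text{ is IFG}) = \frac{e^{-5.485 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i}}{1 + e^{-5.485 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i}} - \frac{e^{-6.753 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i}}{1 + e^{-6.753 + 0.019\,\mathrm{SBP}_i + 0.925\,\mathrm{LINSUL}_i}}$$

For a participant whose exponentiated linear predictors are 0.951 for $k = 2$ and 0.268 for $k = 1$,

$$P(\text{participant is IFG}) = \frac{0.951}{1 + 0.951} - \frac{0.268}{1 + 0.268} = 0.276$$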
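The calculation above can be sketched in a few lines. The intercepts and coefficients are the estimates quoted in the text; the function names and the covariate values passed to `glucose_probs` are hypothetical and serve only to show the mechanics.

```python
import math

# Estimates quoted in the text for the logit link.
A1, A2 = -6.753, -5.485            # intercepts for k = 1 and k = 2
B_SBP, B_LINSUL = 0.019, 0.925     # common slopes

def probs_from_odds(odds_k1, odds_k2):
    """Category probabilities from the odds of (Y <= 1) and of (Y <= 2)."""
    p_le1 = odds_k1 / (1.0 + odds_k1)
    p_le2 = odds_k2 / (1.0 + odds_k2)
    return {"DM": p_le1, "IFG": p_le2 - p_le1, "NFG": 1.0 - p_le2}

def glucose_probs(sbp, linsul):
    """Estimated P(DM), P(IFG), P(NFG) for one participant."""
    xb = B_SBP * sbp + B_LINSUL * linsul
    return probs_from_odds(math.exp(A1 + xb), math.exp(A2 + xb))

# The worked numbers in the text: odds 0.268 for k = 1 and 0.951 for k = 2.
worked = probs_from_odds(0.268, 0.951)

# Hypothetical covariate values, chosen only for illustration.
illustration = glucose_probs(sbp=130.0, linsul=3.0)
```

Here `worked["IFG"]` reproduces the difference 0.951/1.951 - 0.268/1.268 from the example, and the three probabilities for any participant sum to 1.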

As noted earlier, the coefficients in these models can be interpreted as those in the ordinary logistic regression model for binary outcomes. In this example, the higher SBP and LINSUL are, the higher the odds of having DM than of not having DM, or the higher the odds of having either DM or IFG than of being NFG. The odds are 1.02 [exp(0.019)] times as high (2% higher) for a 1-unit increase in SBP, assuming that LINSUL is the same, and 2.52 [exp(0.925)] times as high (152% higher) for a 1-unit increase in LINSUL, assuming that SBP is the same. From Table 14.18, SBP and LINSUL are related significantly to diabetic status in all models with different link functions.
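The odds ratios quoted above follow directly from exponentiating the coefficients. A minimal sketch, in which the helper name `odds_ratio` and the 10-unit increase are illustrative assumptions:

```python
import math

B_SBP, B_LINSUL = 0.019, 0.925  # estimated coefficients quoted in the text

def odds_ratio(coef, delta=1.0):
    """Odds ratio for a `delta`-unit increase in a covariate, other covariates fixed."""
    return math.exp(coef * delta)

or_sbp = odds_ratio(B_SBP)          # 1-unit increase in SBP
or_linsul = odds_ratio(B_LINSUL)    # 1-unit increase in LINSUL
or_sbp_10 = odds_ratio(B_SBP, 10)   # hypothetical 10-unit increase in SBP
```

Because the log odds are linear in the covariates, the odds ratio for a 10-unit increase equals the 1-unit odds ratio raised to the 10th power.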

SAS and SPSS can also be used for the other two link functions: the inverse of the cumulative standard normal distribution and the complementary log-log link functions introduced in Section 14.2.3. Table 14.18 includes the results from models with these two link functions. The results are very similar to those obtained using the logit link function.

The following SAS, SPSS, and BMDP codes can be used to obtain the results in Table 14.18.

SAS code:

/* Read the 12 variables recorded for each participant */
data w1;
  infile 'c: ex14d2d6.dat' missover;
  input age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn;
run;

/* Stepwise selection (selection = s) with the logit link */
title "Ordinal regression model with logit link function";
proc logistic data = w1 descending;
  model dms = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = logit;
run;

/* Same model with the inverse normal (probit) link */
title "Ordinal regression model with inverse normal link function";
proc logistic data = w1 descending;
  model dms = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = probit;
run;

/* Same model with the complementary log-log link */
title "Ordinal regression model with complementary log-log link function";
proc logistic data = w1 descending;
  model dms = age sex sbp dbp lacr hdl linsul smoke
        / selection = s lackfit link = cloglog;
run;

SPSS code:

data list file = 'c:ex14d2d6.dat' free
  / age ageg sex sbp dbp lacr hdl linsul smoke dms dm sn.
compute y = 4-dms.
plum y with sbp linsul
  /link = logit
  /print = fit history parameter.
plum y with sbp linsul
  /link = probit
  /print = fit history parameter.
plum y with sbp linsul
  /link = cloglog
  /print = fit history parameter.

BMDP PR code for the logit link function only:

/input file = 'c:ex14d2d6.dat'
  variables = 12.
  format = free.
/variable names = age, ageg, sex, sbp, dbp, lacr, hdl, linsul, smoke,

on the subject include Anderson (1972), Mantel (1973), Prentice (1976), Prentice and Pyke (1979), Holford et al. (1978), and Breslow and Day (1980). Applications of the logistic regression model can easily be found in various biomedical journals.

EXERCISES

14.1 Consider the study presented in Example 3.5 and the data for the 40 patients in Table 3.10.
(a) Construct a summary table similar to Table 3.11.
(b) Construct a table similar to Table 3.12.
(c) Use the chi-square test to detect any differences in retinopathy rates among the subgroups obtained in part (b).
(d) On the basis of these 40 patients, identify the most important risk factors using a linear logistic regression method.

14.2 Consider the data for the 33 hypernephroma patients given in Exercise Table 3.1. Let "response" be defined as stable, partial response, or complete response.
(a) Compare each of the five skin test results of the responders with those of the nonresponders.
(b) Use a linear logistic regression method to identify the most important risk factors related to response.
(i) Consider the five skin tests only.
(ii) Consider age, gender, and the five skin tests.

14.3 Consider all nine risk variables (age, gender, family history of melanoma, and six skin tests) in Exercise 3.3 and Exercise Table 3.3. Identify the most important prognostic factors that are related to remission. Use both univariate and multivariate methods.

14.4 Consider the data of 58 hypernephroma patients given in Exercise Table 3.2. Apply the logistic regression method to response (defined as complete response, partial response, or stable disease). Include gender, age, nephrectomy treatment, lung metastasis, and bone metastasis as independent variables.
(a) Identify the most significant independent variables.
(b) Obtain estimates of odds ratios and confidence intervals when applicable.

14.5 Consider the case where there is one continuous independent variable $X$. Show that the log odds ratio for $X = x + m$ versus $X = x$ is $mb$, where $b$ is the logistic regression coefficient.

14.6 Using the data in Table 12.4, define the index function CVD as CVD = 1 if dg 1, and CVD = 0 otherwise, and fit a logistic regression model for CVD by using the stepwise selection method to select risk factors among the same factors as those noted at the bottom of Table 12.7. Compare the results obtained with those in Table 12.7.

14.7 Assuming that P(a person is sampled | y, x) = P(a person is sampled | y), that is, the sampling probability is independent of the risk factors x, derive (14.2.15).

14.8 By using (14.2.14) and (14.2.1), show that (14.2.20) reduces to (14.2.21).

14.9 Derive (14.3.2).


14.10 Consider the data in Table 12.4. Fit the generalized logistic regression model in (14.3.1) for DG with covariates AGE, SEX, LACR, and LTG by using the SAS CATMOD, SPSS NOMREG, or BMDP PR procedure. Select risk factors among those noted at the bottom of Table 12.7 using the stepwise selection method in the BMDP PR procedure. Compare the results with those given in Table 13.5.

14.11 Using the same notation and data as in Table 14.11, (1) fit the outcome variable Y with the generalized logistic regression model in (14.3.1) with SEX as the covariate; (2) fit a logistic regression for the binary outcome DM versus NFG, with SEX as the covariate, by using the data from DM and NFG participants only; (3) fit a logistic regression for the binary outcome IFG versus NFG, with SEX as the covariate, by using the data from IFG and NFG participants only; (4) compare the coefficients obtained from (2) and (3) with the coefficients obtained from (1); and (5) report what you have found.

14.12 Perform the same analyses as in Exercise 14.11 but use SBP as the covariate, and discuss your findings.


APPENDIX A

Newton-Raphson Method

The Newton-Raphson method (Ralston and Wilf, 1967; Carnahan et al., 1969) is a numerical iterative procedure that can be used to solve nonlinear equations. An iterative procedure is a technique of successive approximations, and each approximation is called an iteration. If the successive approximations approach the solution very closely, we say that the iterations converge. The maximum likelihood estimates of various parameters and coefficients discussed in Chapters 7, 9, and 11 to 14 can be obtained by using the Newton-Raphson method. In this appendix we discuss and illustrate the use of this method, first considering a single nonlinear equation and then a set of nonlinear equations.

Let $f(x) = 0$ be the equation to be solved for $x$. The Newton-Raphson method requires an initial estimate of $x$, say $x_0$, preferably such that $f(x_0)$ is close to zero, and then the first approximate iteration is given by

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} \tag{A.1}$$

where $f'(x_0)$ is the first derivative of $f(x)$ evaluated at $x = x_0$. In general, the $(k + 1)$th iteration or approximation is given by

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} \tag{A.2}$$

where $f'(x_k)$ is the first derivative of $f(x)$ evaluated at $x = x_k$. The iteration terminates at the $k$th iteration if $f(x_k)$ is close enough to zero or the difference between $x_k$ and $x_{k-1}$ is negligible. The stopping rule is rather subjective. Acceptable rules are that $f(x_k)$ or $d = x_k - x_{k-1}$ is in the neighborhood of a small negative power of 10.
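The iteration (A.2) and its stopping rule can be sketched as follows. This is a minimal illustration, not a production root finder: the function name `newton_raphson`, the tolerance of 1e-6, the iteration cap, and the test equation $x^2 - 2 = 0$ are all assumptions made for the example.

```python
def newton_raphson(f, f_prime, x0, tol=1e-6, max_iter=100):
    """Solve f(x) = 0 by the iteration x_{k+1} = x_k - f(x_k) / f'(x_k).

    Stops when |f(x_k)| or |x_k - x_{k-1}| falls below `tol` (an assumed value).
    Returns the approximate root and the number of iterations used.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("derivative is zero; choose another starting value")
        x_new = x - f(x) / slope
        if abs(f(x_new)) < tol or abs(x_new - x) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical test equation: x^2 - 2 = 0, starting from x0 = 1.
root, n_iter = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

A poor starting value can make the derivative vanish or the iterates wander, which is why the sketch guards against a zero slope and caps the number of iterations.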


Figure A.1 Graphical presentation of the Newton-Raphson method for Example A.1.

We wish to find the value of $x$ such that $f(x) = 0$ by the Newton-Raphson method. The first derivative of $f(x)$ is

$$f'(x) = 3x^2 - 1$$

Since $f(-1) = 2$ and $f(-2) = -4$, graphically (Figure A.1), we see that the curve cuts through the $x$ axis [$f(x) = 0$] between $-1$ and $-2$. This gives us a good hint of an initial value of $x$. Suppose that we begin with $x_0 = -1$; then $f(x_0) = 2$ and $f'(x_0) = 2$. Thus, the first iteration, following (A.1), gives

$$x_1 = -1 - \frac{2}{2} = -2$$


shown in Figure A.1; the other two are complex roots.
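The iterations of Example A.1 can be traced numerically. The derivative $f'(x) = 3x^2 - 1$ together with $f(-1) = 2$ and $f(-2) = -4$ implies $f(x) = x^3 - x + 2$, which is the function assumed in this sketch; the tolerance and the helper name `iterate` are illustrative choices.

```python
def f(x):
    """Function implied by f'(x) = 3x^2 - 1, f(-1) = 2, and f(-2) = -4."""
    return x ** 3 - x + 2.0

def f_prime(x):
    return 3.0 * x ** 2 - 1.0

def iterate(x0, tol=1e-6, max_iter=50):
    """Return the list of Newton-Raphson iterates x_0, x_1, ..., starting at x0."""
    xs = [x0]
    for _ in range(max_iter):
        x_next = xs[-1] - f(xs[-1]) / f_prime(xs[-1])
        xs.append(x_next)
        # Stop when f is near zero or two consecutive iterates nearly agree.
        if abs(f(x_next)) < tol or abs(xs[-1] - xs[-2]) < tol:
            break
    return xs

iterates = iterate(-1.0)
for k, x_k in enumerate(iterates):
    print(f"k = {k}: x = {x_k:.6f}, f(x) = {f(x_k):.6f}")
```

The first step moves from $-1$ to $-2$, as in (A.1), and the later iterates settle on the single real root between $-2$ and $-1$.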

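The Newton-Raphson method can be extended to solve a system of equations with more than one unknown. Suppose that we wish to find values of $x_1, x_2, \ldots, x_p$ such that

$$\begin{aligned} f_1(x_1, \ldots, x_p) &= 0 \\ f_2(x_1, \ldots, x_p) &= 0 \\ &\;\;\vdots \\ f_p(x_1, \ldots, x_p) &= 0 \end{aligned}$$

Let $a_{ij}$ be the partial derivative of $f_i$ with respect to $x_j$; that is, $a_{ij} = \partial f_i / \partial x_j$.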


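The matrix

$$\begin{pmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{p1} & \cdots & a_{pp} \end{pmatrix}$$

is the Jacobian matrix of the system. Let its inverse be

$$\begin{pmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{p1} & \cdots & b_{pp} \end{pmatrix}$$

Let $x_1^k, x_2^k, \ldots, x_p^k$ be the approximate root at the $k$th iteration; let $f_1^k, \ldots, f_p^k$ be the corresponding values of the functions $f_1, \ldots, f_p$, that is, $f_i^k = f_i(x_1^k, \ldots, x_p^k)$ for $i = 1, \ldots, p$. The next approximation is then given by

$$x_i^{k+1} = x_i^k - \sum_{j=1}^{p} b_{ij}^k f_j^k, \qquad i = 1, \ldots, p \tag{A.3}$$

where $b_{ij}^k$ is $b_{ij}$ evaluated at $x_1^k, \ldots, x_p^k$.

The iterative procedure begins with a preselected initial approximation $x_1^0, x_2^0, \ldots, x_p^0$, proceeds following (A.3), and terminates either when $f_1, f_2, \ldots, f_p$ are close enough to zero or when differences in the $x$ values at two consecutive iterations are negligible.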
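The multivariate procedure can be sketched in plain Python. Instead of forming the inverse of the matrix of partial derivatives explicitly, the sketch solves the equivalent linear system for the step at each iteration. The two-equation system used here is hypothetical and is not the system of Example A.2; the function names, tolerance, and starting values are likewise assumptions.

```python
def solve_linear(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ZeroDivisionError("matrix of partial derivatives is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def newton_system(funcs, jacobian, x0, tol=1e-10, max_iter=100):
    """Newton-Raphson for a system: x_new = x - J(x)^{-1} f(x)."""
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        f_vals = funcs(x)
        if max(abs(v) for v in f_vals) < tol:
            return x, iteration - 1
        step = solve_linear(jacobian(x), f_vals)  # step = J^{-1} f
        x_new = [xi - si for xi, si in zip(x, step)]
        if max(abs(new - old) for new, old in zip(x_new, x)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical system: x1^2 + x2^2 - 5 = 0 and x1 * x2 + 2 = 0.
def toy_funcs(x):
    return [x[0] ** 2 + x[1] ** 2 - 5.0, x[0] * x[1] + 2.0]

def toy_jacobian(x):
    return [[2.0 * x[0], 2.0 * x[1]], [x[1], x[0]]]

solution, n_iter = newton_system(toy_funcs, toy_jacobian, x0=[-0.5, 1.5])
```

This toy system has several roots, and which one is reached depends on the starting values; from (-0.5, 1.5) the iterates approach (-1, 2).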


With these values, $f_1 = 0$ and $f_2 = 0$. Therefore, the iteration procedure terminates and the solution of the two simultaneous equations is $x_1 = -1$, $x_2 = 2$.

The number of iterations required depends strongly on the initial values chosen. In Example A.2, if we use $x_1^0 = 0$, $x_2^0 = 0$, it requires about 11 iterations to find the solution. Interested readers may try it as an exercise.


APPENDIX B

Statistical Tables


Table B-1 Normal Curve Areas

Source: Abridged from Table 1 of Statistical Tables and Formulas, by A. Hald, John Wiley & Sons, 1952. Reproduced by permission of John Wiley & Sons.


Table B-2 Percentage Points of the $\chi^2$-Distribution

Source: "Tables of the Percentage Points of the $\chi^2$-Distribution," by Catherine M. Thompson, Biometrika, Vol. 32, pp. 188-189 (1941). Reproduced by permission of the editor of Biometrika.


Table B-4 Upper Tail Probabilities for the Null Distribution of the Kruskal-Wallis H Statistic: $k = 3$, $n_1 = 1(1)5$, $n_2 = n_1(1)5$, $n_3 = n_2(1)5$


Source: Table F of A Nonparametric Introduction to Statistics, by C. H. Kraft and C. van Eeden, Macmillan, New York, 1968. Reproduced by permission of the Macmillan Publishing Company.


Table B-5 Selected Critical Values for All Treatments: Multiple Comparisons Based on Kruskal-Wallis Rank Sums

Source: "Rank Sum Multiple Comparisons in One- and Two-Way Classification," by B. J. McDonald and W. A. Thompson, Biometrika, Vol. 54, pp. 487-497 (1967). Reproduced by permission of the editor of Biometrika. The starred values are from "Distribution-Free Multiple Comparisons," Ph.D. thesis (1963), P. Nemenyi, Princeton University, with permission of the


Table B-6 Selected Critical Values for the Range of k Independent N(0, 1) Variables: $k = 2(1)20(2)40(10)100$

For a given $k$ and $\alpha$, the tabled entry is $q(\alpha, k, \infty)$.

Source: "Table of Range and Studentized Range," by H. L. Harter, Ann. Math. Statist., Vol. 31, pp. 1122-1147 (1960). Reproduced by permission of the editor of the Annals of Mathematical Statistics.


Date posted: 14/08/2014
