9.5 Cox models with survey data
9.5.3 Some caveats of analyzing survival data from complex survey designs
Here we outline some common issues you may run into when svysetting your survival data.
1. We described how to fit Cox models to survey data using svy: stcox in this section. You can also fit parametric survival models to survey data using svy: streg, an extension of Stata's streg command for fitting these models to nonsurvey data; see chapters 12 and 13. Although not explicitly discussed, what you learn here about svy: stcox can also be applied to svy: streg.
2. Probability weights, pweights, may be specified in either stset or svyset. When specified in both, however, they must be in agreement; see the technical note in [SVY] svy estimation for details.
3. For multiple-record survival data when option id() is specified with stset, the ID variable must be nested within the final-stage sampling units.
Consider a simple case of a stratified survey design with stratum identifiers defined by the strataid variable. Suppose we have multiple records per subject identified by the id variable and variables time and fail recording analysis time and failure, respectively. Using the following syntax of svyset will produce an error when fitting the Cox (or any other survival) model.
. svyset, strata(strataid)
. stset time, id(id) failure(fail)
. svy: stcox ...
the stset ID variable is not nested within the final-stage sampling units
r(459);
In the above example, the final-stage sampling unit is _n, that is, the records, because
. svyset, strata(strataid)
is equivalent to
. svyset _n, strata(strataid)
The error message tells us that the id variable is not nested within _n; individuals are certainly not nested within records. The correct way of svysetting these data is to use the id variable as a primary sampling unit instead of the default _n:
. svyset id, strata(strataid)
The above error message will also appear if, for a multistage cluster design, the
ID variable is not nested within the lowest-level cluster.
4. Survey-adjusted estimates are not available with frailty models or methods other than Breslow for handling ties; see [ST] stcox for other restrictions on options when the svy: prefix is used.
5. Do not confuse svyset's option strata() with stcox's option strata(). The former adjusts standard errors of the estimates for the stratification within the sampling design. The latter allows separate baseline hazard estimates for each stratum.
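The points about multiple-record data and about svy: streg can be put together in a short do-file fragment. This is only a sketch: the variables id, strataid, time, and fail are the ones from the example in item 3, whereas the covariates x1 and x2 and the choice of a Weibull distribution are hypothetical and serve purely as illustration.

```stata
* Sketch only; x1, x2, and the Weibull choice are hypothetical.
* Use the subject identifier as the primary sampling unit so that the
* stset ID variable is nested within the final-stage sampling units.
svyset id, strata(strataid)

* Declare multiple-record survival data.
stset time, id(id) failure(fail)

* Survey-adjusted Cox model.
svy: stcox x1 x2

* Survey-adjusted parametric model fit to the same data.
svy: streg x1 x2, distribution(weibull)
```

Because the design is declared once with svyset, the same svy: prefix carries the stratification to both models.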
Cox model with missing data-multiple imputation
In the presence of missing data, stcox, like other Stata estimation commands, discards all observations that have a missing value in any of the specified covariates. This method of handling missing data is known as listwise deletion, casewise deletion, or complete case analysis. The major drawbacks of this method are potential loss of power to detect an association between an outcome and exposures due to reduced sample size and potential bias when the remaining observations are not representative of the population of interest.
Another popular way of handling missing data is multiple imputation (MI; Rubin 1987). One of the appealing features of MI is that, similar to listwise deletion, it can be used with a wide variety of standard complete data methods. Unlike listwise deletion, however, MI does not discard data, and thus it preserves all available information in the analysis.
Multiple imputation is a simulation-based procedure that consists of three steps: imputation, during which missing values are replaced with multiple sets of plausible values from a chosen imputation model to form M completed datasets; completed-data analysis, during which standard complete data analysis is performed on each completed dataset; and pooling, during which results from step 2 are consolidated into one multiple-imputation inference using Rubin's combination rules.
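For reference, the pooling step can be written out for a single coefficient. The notation here is ours and is introduced only for illustration: \hat{q}_i is the estimate from the i-th completed dataset and U_i is its estimated variance, i = 1, ..., M.

```latex
\begin{align*}
\bar{q}_M &= \frac{1}{M}\sum_{i=1}^{M}\hat{q}_i
  && \text{(pooled point estimate)}\\
\bar{U}   &= \frac{1}{M}\sum_{i=1}^{M}U_i
  && \text{(within-imputation variance)}\\
B         &= \frac{1}{M-1}\sum_{i=1}^{M}\bigl(\hat{q}_i-\bar{q}_M\bigr)^2
  && \text{(between-imputation variance)}\\
T         &= \bar{U}+\Bigl(1+\frac{1}{M}\Bigr)B
  && \text{(total variance of } \bar{q}_M\text{)}
\end{align*}
```

The between-imputation term B is what carries the extra uncertainty due to the missing values; if all M completed datasets gave the same estimate, T would reduce to the average complete-data variance.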
In Stata, you can perform MI using the mi command. The imputation step is performed by using mi impute under the assumption of data missing at random (MAR) or missing completely at random (MCAR), which is when the reason for missing data does not depend on unobserved values. The completed-data analysis and pooling steps are combined in one step by the mi estimate prefix.
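As a rough sketch of how these commands fit together for a Cox model, consider a dataset in memory with survival time studytime, failure indicator died, a complete covariate drug, and a partially missing covariate age. The imputation model (a linear regression of age on drug, died, and studytime), the number of imputations, and the seed are assumptions made for illustration, not recommendations.

```stata
* Sketch only; imputation model, add(20), and rseed() are assumptions.
* Declare the mi storage style and the variable to be imputed.
mi set mlong
mi register imputed age

* Declare the survival-time structure for mi data.
mi stset studytime, failure(died)

* Imputation step: create 20 completed datasets.
mi impute regress age drug died studytime, add(20) rseed(12345)

* Completed-data analysis and pooling in one step.
mi estimate: stcox drug age
```

How the outcome should enter the imputation model in a survival setting is a modeling decision in its own right; the specification above is only one simple possibility.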
In this section, we concentrate on using mi for missing-data analysis of survival data with MI. For a general overview of multiple imputation, see [MI] intro substantive and the references given therein.
Consider fictional data on 48 participants in a cancer drug trial. Twenty-eight participants received treatment (drug=1) and 20 received a placebo (drug=0). Time until death, measured in months, or the last month that a subject was known to be alive, is recorded in variable studytime. We want to use the Cox model to study the association between subjects' survival and treatment while adjusting for subjects' age. To illustrate MI, we randomly deleted roughly 17% of the values for age.
Here is an overview of our data:
. use http://www.stata-press.com/data/cggm3/cancer
(Patient Survival in Drug Trial)
. stset
-> stset studytime, failure(died)

     failure event:  died != 0 & died < .
obs. time interval:  (0, studytime]
 exit on or before:  failure

         48  total obs.
          0  exclusions

         48  obs. remaining, representing
         31  failures in single record/single failure data
        744  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =        39