2.3 Interpreting the cumulative hazard and hazard rate


As previously mentioned, learning to think in terms of the hazard and cumulative hazard functions, rather than in terms of the traditional density and cumulative distribution functions, has several advantages. Hazard functions give a more natural way to interpret the process that generates failures, and regression models for survival data are more easily grasped by observing how covariates affect the hazard.

2.3.1 Interpreting the cumulative hazard

The cumulative hazard function, H(t), has much more to offer than merely serving as an intermediate calculation for deriving a survivor function from a hazard function. Hazards are rates, and in that respect they are not unlike the RPM (revolutions per minute) of an automobile engine.

Cumulative hazards are the integral from zero to t of the hazard rates. Because an integral is really just a sum, a cumulative hazard is like the total number of revolutions an automobile's engine makes over a given period. We could form the cumulative-revolution function by integrating RPM over time. If we let a car engine run at a constant 2,000 RPM for 2 minutes, then the cumulative-revolution function at time 2 minutes would be 4,000, meaning the engine would have revolved 4,000 times over that period. Similarly, if a person faced a constant hazard rate of 2,000/minute (a big risk) for 2 minutes, he would face a total hazard of 4,000. Going back to the car engine, if we raced the engine at 3,000 RPM for 1 minute and then let it idle at 1,000 RPM for another, the total number of revolutions would still be 4,000. If our fictional risk taker faced a hazard of 3,000/minute for 1 minute and then a hazard of 1,000/minute for another, the total risk would still be 4,000.
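To make the arithmetic explicit: H(t) is the integral of h(u) from u = 0 to u = t, so for the second risk profile above, H(2) = 3,000 × 1 + 1,000 × 1 = 4,000, exactly the total produced by the constant 2,000/minute profile.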

Now let's stick with our fictional friend. Whatever the profile of risk, if the cumulative hazard is the same over a 2-minute period, then the probability of the event (presumably death) occurring during that 2-minute period is the same.

Let's understand the units of this measurement of risk; in this respect, cumulative hazards are easier to interpret than the hazard rates themselves. Remember that S(t) = exp{-H(t)}, so our fictional friend has a probability of surviving the 2-minute interval of exp(-4,000): our friend is going to die. One may similarly calculate the probability of survival given other values of the cumulative hazard.
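If you want to verify such conversions yourself, display does the calculation directly in Stata; the cumulative hazard of 0.5 in the second line is an illustrative value of our own, not one from the example above.

. display exp(-4000)    // survival probability for a cumulative hazard of 4,000
0

. display exp(-.5)      // survival probability for a hypothetical cumulative hazard of 0.5
.60653066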

Probabilities, however, are not the best way to think about cumulative hazards.

Another interpretation of the cumulative hazard is that it records the number of times we would expect (mathematically) to observe failures over a given period, if only the failure event were repeatable. With our fictional friend, the cumulative hazard of 4,000 over the 2-minute period means that we would expect him to die 4,000 times if, as in a video game, each time he died we could instantly resurrect him and let him continue on his risky path.

This approach is called the count-data interpretation of the cumulative hazard, and learning to think this way has its advantages.

Example

To see the count-data interpretation in action in Stata, let's consider an example using, as before, the Weibull distribution with shape parameter p = 3, which has cumulative hazard H(t) = t^3. For the time interval (0,4), because H(4) = 64, we can interpret this to mean that if failure were repeatable, we would expect 64 failures over this period.

This fact may be verified via simulation. We proceed, as we did in section 2.2, by generating times to failure in a repeated-failure setting, where each failure time is conditional on being greater than the previous one. This time, however, we repeat the process 1,000 times; for each replication, we observe the random quantity N, the number of failure times that fall in the interval (0,4), and record this count. In Stata, this may be done via the simulate command; see [R] simulate.

. clear all
. set seed 12345
. program genfail
  1.     drop _all
  2.     set obs 200
  3.     gen u = runiform()
  4.     gen t = (-ln(1-u))^(1/3) in 1
  5.     replace t = ((t[_n-1])^3 - ln(1-u))^(1/3) in 2/L
  6.     count if t < 4
  7. end

. simulate nfail=r(N), reps(1000) nodots: genfail

      command:  genfail
        nfail:  r(N)

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       nfail |      1000      63.788    7.792932         38         95

This simulation thus helps to verify that E(N) = H(4) = 64; if we replicated this experiment infinitely often, the mean of our simulated N values would equal 64.

Technical note

In the above simulation, the line set obs 200 sets the size of the dataset to 200 observations for each replication of the counting-failures experiment. Theoretically, any number of failures is possible in the interval (0,4), yet the probability of observing more than 200 failures is small enough to make this an acceptable upper limit.
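For a rough check of that claim, note that under the repeated-failure construction the count of failures in (0,4) is Poisson with mean 64 (this Poisson argument is ours, not stated in the text), so the chance of needing more than 200 observations can be computed directly:

. display poissontail(64, 201)    // P(N >= 201) for N ~ Poisson(64): vanishingly small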


There is no contradiction between the probability-of-survival interpretation and the repeated-failure count interpretation, but be careful. Consider a cumulative hazard equal to 1. One interpretation is that we expect to observe one failure over the interval.

The other interpretation is that the probability we will observe no failures over the interval is exp(-1) = 0.368, so the probability that we will observe the failure event is 1 - 0.368 = 0.632. Are you surprised that the chances are not 100%?

More correctly, we should have said that the chance we will observe one or more failures per subject is 0.632, except that if this failure event is absorbing (that is, deathlike in that it can occur only once), we will never observe the second and later failures because the first failure prohibits those observations. Regardless of whether repetition is observable in the actual process being described, cumulative hazards must be interpreted in a context that allows repeated failures.

The probability that we will observe zero failures is 0.368, and the probability that we will observe one or more failures is 0.632. Moreover, we can decompose the latter into the probability of exactly one failure, the probability of exactly two failures, and so on, and in doing so we can construct a probability mass function for a random variable whose expected value is 1. Because this expectation of 1 contains contributions from two and more failures, it must be counteracted by a nonzero probability (0.368) of having no failures.
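To see the decomposition concretely, assume the repeated-failure count is Poisson with mean 1, which is what the conditional-failure construction used in the simulation above produces; the Poisson form is our assumption here, but the numbers match the text:

. display poissonp(1, 0)        // probability of zero failures: exp(-1) = .368
. display 1 - poissonp(1, 0)    // probability of one or more failures: .632
. display poissonp(1, 1)        // probability of exactly one failure
. display poissonp(1, 2)        // probability of exactly two failures

Weighting each possible count by its probability returns an expected value of 1, the cumulative hazard.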

2.3.2 Interpreting the hazard rate

Hazard rates are rates; that is, they have units 1/t. You can interpret hazard rates just as you interpret cumulative hazards if you multiply them by t. Then you are saying, "The hazard rate is such that, were that rate to continue for 1 time unit, we would expect that ..."

For instance, if the hazard rate is 2/day, then it is such that, were that rate to continue for an entire day, you would expect two failures, or, if you prefer, the chances of observing at least one failure would be 1 - exp(-2) = 0.8647.
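That last number is easy to confirm in Stata:

. display 1 - exp(-2)
.86466472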

There is a subtle distinction here. If the cumulative hazard over a period is 2 (that is, if the instantaneous hazard rate integrated over the period is 2), then over that period you would expect two failures, regardless of the time profile of the hazard rates themselves. During that period, the hazard rate might be constant, increasing, decreasing, or any combination of these. If, on the other hand, the hazard rate is 2/day at some instant, then failures are happening at the rate of 2/day at that instant; you would expect two failures over a period of a day only if that hazard rate stayed constant over the period or varied in such a way as to integrate to 2 over that day.

Hazard rates, were they to stay constant, have a third interpretation. Hazard rates have units 1/t; hence, the reciprocal of the hazard has units t and represents how long you would expect to wait for a failure if the hazard rate stayed at that level. If the hazard rate is 2/day, then were the hazard rate to remain at that level, we would expect to wait half a day for a failure. In fact, a constant hazard rate is what characterizes the classic Poisson counting process. In this process, it can be shown that if the expected time between failures is half a day with constant hazard, then the number of failures that occur in any given day (if failures were repeatable) is a Poisson random variable with expected value (and variance) equal to 2.
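A short simulation sketch of the constant-hazard case ties these interpretations together; the seed and the 10,000 draws are arbitrary choices of ours, and the waiting times are generated by the usual inverse-transform method:

. clear
. set seed 54321
. set obs 10000
. gen wait = -ln(runiform())/2     // waiting times under a constant hazard of 2/day
. summarize wait                   // the mean should be close to 1/2 day
. display poissonp(2, 0)           // chance of a failure-free day when daily counts are Poisson(2)
. display poissonp(2, 2)           // chance of exactly two failures in a day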
