Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival
Load dependencies
We start by listing the first few observations to get an idea about the data. We then define two 1/0 variables for the events that we are interested in.
## sex age stage mmdx yydx surv_mm surv_yy status subsite
## 1 Female 81 Localised 2 1981 26.5 2.5 Dead: other Head and Neck
## 2 Female 75 Localised 9 1975 55.5 4.5 Dead: other Head and Neck
## 3 Female 78 Localised 2 1978 177.5 14.5 Dead: other Limbs
## 4 Female 75 Unknown 8 1975 29.5 2.5 Dead: cancer Multiple and NOS
## 5 Female 81 Unknown 7 1981 57.5 4.5 Dead: other Head and Neck
## 6 Female 75 Localised 9 1975 19.5 1.5 Dead: cancer Trunk
## year8594 dx exit agegrp id ydx yexit
## 1 Diagnosed 75-84 1981-02-02 1983-04-20 75+ 1 1981.088 1983.298
## 2 Diagnosed 75-84 1975-09-21 1980-05-07 75+ 2 1975.720 1980.348
## 3 Diagnosed 75-84 1978-02-21 1992-12-07 75+ 3 1978.140 1992.934
## 4 Diagnosed 75-84 1975-08-25 1978-02-08 75+ 4 1975.646 1978.104
## 5 Diagnosed 75-84 1981-07-09 1986-04-25 75+ 5 1981.517 1986.312
## 6 Diagnosed 75-84 1975-09-03 1977-04-19 75+ 6 1975.671 1977.296
## Create 0/1 outcome variables
melanoma <-
    transform(melanoma,
              death_cancer = ifelse(status == "Dead: cancer", 1, 0),
              death_all = ifelse(status == "Dead: cancer" |
                                 status == "Dead: other", 1, 0))
We convert the logical variables to 0/1 because some R functions treat logical variables (with values TRUE and FALSE) as factors. A shorthand for converting TRUE to 1 and FALSE to 0 is to add 0:
## [1] 1 0
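The call behind this output is not shown. A minimal base R sketch of the shorthand follows; the `status` values in the second part are illustrative only and are not taken from the melanoma data.

```r
# Adding 0 coerces a logical vector to numeric: TRUE -> 1, FALSE -> 0
flags <- c(TRUE, FALSE)
flags_num <- flags + 0
print(flags_num)

# The same idea applied to a comparison, as an alternative to ifelse()
status <- c("Dead: cancer", "Alive", "Dead: other")   # illustrative values
death_cancer <- (status == "Dead: cancer") + 0
print(death_cancer)
```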
(a) Plot estimates of the survivor function and hazard function by stage
We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.
## Freq Prop
## Unknown 1631 0.20977492
## Localised 5318 0.68398714
## Regional 350 0.04501608
## Distant 476 0.06122186
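The code for this tabulation is not shown. One way to build a frequency and proportion table of this shape in base R is sketched below. To keep the sketch self-contained, the `stage` factor is rebuilt from the counts printed above instead of being read from the `melanoma` data frame, so it is a stand-in for `melanoma$stage`.

```r
# Stage counts copied from the table above
counts <- c(Unknown = 1631, Localised = 5318, Regional = 350, Distant = 476)

# One entry per patient (stand-in for melanoma$stage)
stage <- factor(rep(names(counts), times = counts), levels = names(counts))

# Frequencies and proportions side by side
Freq <- table(stage)
stage_tab <- cbind(Freq, Prop = prop.table(Freq))
print(stage_tab)
```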
We then plot the survival and hazard functions by stage.
par(mfrow=c(1, 2))
mfit <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = melanoma)
plot(mfit, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival")
## legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
hazards <- muhaz2(Surv(surv_mm, death_cancer)~stage, melanoma)
plot(hazards,
     col=1:4, lty=1, xlim=c(0,250), ylim=c(0,0.08),
     legend.args=list(bty="n"))
Survival depends heavily on stage. Interestingly, patients with stage 0 (unknown) appear to have survival similar to that of patients with stage 1 (localised).
As an extension, we can use the bshazard function to calculate the hazards with confidence intervals (see below). Note, however, that bshazard adjusts for covariates rather than stratifying by them. This means that we need to divide the dataset into strata and calculate the smoothed hazards separately for each stratum. I have shown one approach using dplyr for dividing the data; the plots use ggplot, which allows for overlapping confidence intervals (using the alpha transparency argument).
library(bshazard)
library(ggplot2)
library(dplyr)   # provides group_by(), do(), ungroup() and %>%
## Extract the smoothed hazard and its confidence limits as a data frame
as.data.frame.bshazard <- function(x, ...) {
    with(x, data.frame(time,hazard,lower.ci,upper.ci))
}
## Fit bshazard separately within each stage and stack the results
hazards <- group_by(melanoma, stage) %>%
    do(as.data.frame(bshazard(Surv(surv_mm, death_cancer)~1, data=., verbose=FALSE))) %>%
    ungroup()
ggplot(hazards,aes(x=time,y=hazard,group=stage)) + geom_line(aes(col=stage)) +
    geom_ribbon(aes(ymin=lower.ci, ymax=upper.ci, fill=stage), alpha=0.3) + ylim(0,0.1) +
    xlab('Follow-up Time') + ylab('Hazard')
(b) Estimate the mortality rates for each stage using, for example, the survRate command
## stage tstop event rate lower upper
## stage=Unknown Unknown 123205.5 274 0.002223927 0.001968371 0.002503450
## stage=Localised Localised 463519.0 1013 0.002185455 0.002052929 0.002324292
## stage=Regional Regional 18003.0 218 0.012109093 0.010554905 0.013827698
## stage=Distant Distant 10509.0 408 0.038823865 0.035147603 0.042780122
The time unit is months (since we specified surv_mm as the analysis time). Therefore, the units of the rates shown above are events/person-month.
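The call that produced the table is not shown. As a rough cross-check, the rate column can be recomputed by hand as events divided by person-time. The sketch below copies the person-months and event counts from the table. The confidence limits use the exact Poisson interval from `poisson.test`; that choice is an assumption about how survRate computes its limits, so small differences from the table are possible.

```r
# Person-months and cancer deaths by stage, copied from the table above
rates <- data.frame(
  stage = c("Unknown", "Localised", "Regional", "Distant"),
  tstop = c(123205.5, 463519.0, 18003.0, 10509.0),
  event = c(274, 1013, 218, 408)
)

# Crude rate = events / person-time (here: events per person-month)
rates$rate <- rates$event / rates$tstop

# Exact Poisson limits for each stage (assumed method, see the text above)
ci <- t(mapply(function(x, pt) poisson.test(x, T = pt)$conf.int,
               rates$event, rates$tstop))
rates$lower <- ci[, 1]
rates$upper <- ci[, 2]
print(rates)
```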
(c)
Here we tabulate crude rates per person-year and per 1000 person-years. Dividing the follow-up times by 12 converts person-months to person-years, so that the rates are expressed as events per person-year. For example,
## stage tstop event rate lower upper
## stage=Unknown Unknown 10267.12 274 0.02668712 0.02362045 0.03004141
## stage=Localised Localised 38626.58 1013 0.02622546 0.02463514 0.02789150
## stage=Regional Regional 1500.25 218 0.14530912 0.12665886 0.16593238
## stage=Distant Distant 875.75 408 0.46588638 0.42177124 0.51336147
To obtain mortality rates per 1000 person-years:
## stage tstop event rate lower upper
## stage=Unknown Unknown 10.26713 274 26.68712 23.62045 30.04141
## stage=Localised Localised 38.62658 1013 26.22546 24.63514 27.89150
## stage=Regional Regional 1.50025 218 145.30912 126.65886 165.93238
## stage=Distant Distant 0.87575 408 465.88638 421.77124 513.36147
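The unit conversions can be checked by hand. The sketch below starts from the person-months and event counts reported in (b) and rescales the person-time; the variable names are illustrative.

```r
# Person-months and events by stage, copied from the person-month table in (b)
tstop_months <- c(Unknown = 123205.5, Localised = 463519.0,
                  Regional = 18003.0, Distant = 10509.0)
event <- c(Unknown = 274, Localised = 1013, Regional = 218, Distant = 408)

rate_per_month <- event / tstop_months
rate_per_year <- event / (tstop_months / 12)                 # person-years
rate_per_1000_years <- event / (tstop_months / 12 / 1000)    # 1000 person-years

print(round(cbind(rate_per_month, rate_per_year, rate_per_1000_years), 5))
```

Rescaling the person-time leaves the event counts unchanged, so the yearly rate is 12 times the monthly rate and the rate per 1000 person-years is 1000 times the yearly rate.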
(d)
Below we see that the crude mortality rate (per 1000 person-years) is higher for males than for females.
## sex tstop event rate lower upper
## sex=Male Male 21.96892 1074 48.88725 46.00684 51.90076
## sex=Female Female 29.30079 839 28.63404 26.72903 30.63898
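To put a number on the difference, the two rates can be compared as a rate ratio. This is a sketch that goes slightly beyond the tabulated output: it copies the event counts and person-time from the table and uses `poisson.test` for an exact comparison of two Poisson rates, which is one possible method among several.

```r
# Cancer deaths and person-time (in 1000 person-years) by sex, from the table above
event <- c(Male = 1074, Female = 839)
pyears_1000 <- c(Male = 21.96892, Female = 29.30079)

rate <- event / pyears_1000                        # deaths per 1000 person-years
rate_ratio <- unname(rate["Male"] / rate["Female"])

# Exact comparison of two Poisson rates (male relative to female)
cmp <- poisson.test(unname(event), T = unname(pyears_1000))

print(rate)
print(rate_ratio)
print(cmp$conf.int)
```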
The difference in crude mortality rates between males and females is also reflected in the survival and hazard curves.
(e)
The majority of patients are alive at the end of the study: 1,913 died from cancer, while 1,134 died from another cause. The cause of death depends strongly on age, as younger patients are less likely to die from other causes. To see this, we tabulate the events by age group.
## agegrp
## status 0-44 45-59 60-74 75+
## Alive 1615 1568 1178 359
## Dead: cancer 386 522 640 365
## Dead: other 39 147 461 487
## Lost to follow-up 6 1 1 0
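The age pattern is easier to see as the share of deaths attributed to other causes within each age group. The sketch below re-enters the counts from the table above as a matrix (an alternative to tabulating the `melanoma` data directly) and computes that share.

```r
# Counts copied from the status-by-age-group table above
tab <- matrix(c(1615, 1568, 1178, 359,
                 386,  522,  640, 365,
                  39,  147,  461, 487,
                   6,    1,    1,   0),
              nrow = 4, byrow = TRUE,
              dimnames = list(status = c("Alive", "Dead: cancer",
                                         "Dead: other", "Lost to follow-up"),
                              agegrp = c("0-44", "45-59", "60-74", "75+")))

# Total deaths by cause, and the share of deaths due to other causes by age
deaths_by_cause <- rowSums(tab[c("Dead: cancer", "Dead: other"), ])
share_other <- tab["Dead: other", ] /
  (tab["Dead: cancer", ] + tab["Dead: other", ])

print(deaths_by_cause)
print(round(share_other, 3))
```

The share of deaths from other causes rises with each successive age group, which is the pattern described in the text.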
(f)
All-cause survival is worse than cause-specific survival, since deaths from other causes now also count as events in the Kaplan-Meier estimates. The “other cause” mortality is particularly noticeable in patients with localised and unknown stage.
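Why the all-cause curve can never lie above the cause-specific curve can be seen on a small example. The sketch below uses a made-up data set of eight patients (hypothetical values, not from the melanoma data) and the survival package. In the cause-specific analysis, deaths from other causes are treated as censored; in the all-cause analysis, they count as events.

```r
library(survival)

# Hypothetical toy data: follow-up time in months and final status
toy <- data.frame(
  time   = c(2, 4, 5, 7, 9, 12, 15, 20),
  status = c("cancer", "other", "cancer", "alive",
             "other", "cancer", "alive", "other")
)
toy$death_cancer <- as.numeric(toy$status == "cancer")
toy$death_all    <- as.numeric(toy$status %in% c("cancer", "other"))

# Cause-specific: other-cause deaths are censored; all-cause: any death is an event
fit_cs  <- survfit(Surv(time, death_cancer) ~ 1, data = toy)
fit_all <- survfit(Surv(time, death_all) ~ 1, data = toy)

# Kaplan-Meier estimates at a few common time points
times <- c(5, 10, 20)
s_cs  <- summary(fit_cs, times = times)$surv
s_all <- summary(fit_all, times = times)$surv
print(data.frame(times, cause_specific = s_cs, all_cause = s_all))
```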
(g)
By comparing Kaplan-Meier estimates for cancer deaths with all-cause mortality, restricted to patients aged 75 years and over, we see that “other cause” mortality is particularly influential in patients with localised and unknown stage. Patients with localised disease have a better prognosis (i.e. the cancer does not kill them) and are thus more likely to die from another cause. For regional and distant stage, the cancer is more aggressive and is the cause of death for most of these patients (i.e. the cancer kills these patients before they have “the chance” to die from something else).
par(mfrow=c(1, 2))
mfit75 <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(mfit75, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nCancer | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
afit75 <- survfit(Surv(surv_mm, death_all) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(afit75, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nAll-cause | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.
par(mfrow=c(1, 2))
mfitage <- survfit(Surv(surv_mm, death_cancer) ~ agegrp, data = melanoma)
plot(mfitage, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier estimates of\ncancer survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)
afitage <- survfit(Surv(surv_mm, death_all) ~ agegrp, data = melanoma)
plot(afitage, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier estimates of\nall-cause survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)