Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival
Load dependencies
library(biostat3) # loads the survival and muhaz packages
library(dplyr) # for data manipulation
library(bshazard) # for smoothed hazards
library(knitr) # for html tables
library(tinyplot) # plt() for nicer base graphics
## utility function
as.data.frame.bshazard <- function(x, ...)
with(x, data.frame(time,hazard,lower.ci,upper.ci))
We define two 1/0 variables for the events. We then list the first few observations to get an idea about the data.
## Create 0/1 outcome variables and then show first six rows of the dataset
melanoma <-
transform(biostat3::melanoma,
death_cancer = ifelse(status == "Dead: cancer", 1, 0),
death_all = ifelse(status %in% c("Dead: cancer", "Dead: other"), 1, 0))
head(melanoma) |> knitr::kable("html")
sex | age | stage | mmdx | yydx | surv_mm | surv_yy | status | subsite | year8594 | dx | exit | agegrp | id | ydx | yexit | death_cancer | death_all |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Female | 81 | Localised | 2 | 1981 | 26.5 | 2.5 | Dead: other | Head and Neck | Diagnosed 75-84 | 1981-02-02 | 1983-04-20 | 75+ | 1 | 1981.088 | 1983.298 | 0 | 1 |
Female | 75 | Localised | 9 | 1975 | 55.5 | 4.5 | Dead: other | Head and Neck | Diagnosed 75-84 | 1975-09-21 | 1980-05-07 | 75+ | 2 | 1975.720 | 1980.348 | 0 | 1 |
Female | 78 | Localised | 2 | 1978 | 177.5 | 14.5 | Dead: other | Limbs | Diagnosed 75-84 | 1978-02-21 | 1992-12-07 | 75+ | 3 | 1978.140 | 1992.934 | 0 | 1 |
Female | 75 | Unknown | 8 | 1975 | 29.5 | 2.5 | Dead: cancer | Multiple and NOS | Diagnosed 75-84 | 1975-08-25 | 1978-02-08 | 75+ | 4 | 1975.646 | 1978.104 | 1 | 1 |
Female | 81 | Unknown | 7 | 1981 | 57.5 | 4.5 | Dead: other | Head and Neck | Diagnosed 75-84 | 1981-07-09 | 1986-04-25 | 75+ | 5 | 1981.517 | 1986.312 | 0 | 1 |
Female | 75 | Localised | 9 | 1975 | 19.5 | 1.5 | Dead: cancer | Trunk | Diagnosed 75-84 | 1975-09-03 | 1977-04-19 | 75+ | 6 | 1975.671 | 1977.296 | 1 | 1 |
(a) Plot estimates of the survivor function and hazard function by stage
We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.
## Tabulate by stage
Freq <- xtabs(~stage, data=melanoma)
cbind(Freq, Prop=proportions(Freq)) |> kable("html")
Freq | Prop | |
---|---|---|
Unknown | 1631 | 0.2097749 |
Localised | 5318 | 0.6839871 |
Regional | 350 | 0.0450161 |
Distant | 476 | 0.0612219 |
We then plot the survival and survival by stage.
survfit(Surv(surv_mm/12, death_cancer) ~ stage, data = melanoma) |>
plot(col=1:4,
xlab = "Time since diagnosis (years)",
ylab = "Survival")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
Survival depends heavily on stage. It is interesting to note that patients with stage 0 (unknown) appear to have a similar survival to patients with stage 1 (localized).
library(dplyr)
hazards <- group_by(melanoma, stage) |>
do(as.data.frame(bshazard(Surv(surv_mm/12, death_cancer)~1, data=., verbose=FALSE))) |>
ungroup()
with(hazards, plt(hazard~time|stage, ymin=lower.ci, ymax=upper.ci,
type="ribbon",
xlab="Time since diagnosis (years)",
ylab="Hazard",
col=1:4, lty=1, xlim=c(0,20), ylim=0:1))
ggplot(hazards,aes(x=time,y=hazard,group=stage)) + geom_line(aes(col=stage)) +
geom_ribbon(aes(ymin=lower.ci, ymax=upper.ci, fill=stage), alpha=0.3) +
xlim(0,20) + ylim(0,1) +
xlab('Time since diagnosis (years)') + ylab('Hazard')
## Warning: Removed 23 rows containing missing values or values outside the scale range
## (`geom_line()`).
As an extension, we can use thebshazard
to calculate the
hazards with confidence intervals (see below). Note, however, that the
bshazard function will adjust for covariates rather than
stratify by covariates. This means that we need to divide the dataset
into strata and calculate the smoothed hazards separately. I have shown
one approach using dplyr
for dividing the data, with the
plots use tinyplot
and ggplot
, which allows
for over-lapping confidence intervals (using the alpha
transparency argument).
(b) Estimate the mortality rates for each stage using, for example,
the survRate
command
stage | tstop | event | rate | lower | upper | |
---|---|---|---|---|---|---|
stage=Unknown | Unknown | 10267.12 | 274 | 0.0266871 | 0.0236204 | 0.0300414 |
stage=Localised | Localised | 38626.58 | 1013 | 0.0262255 | 0.0246351 | 0.0278915 |
stage=Regional | Regional | 1500.25 | 218 | 0.1453091 | 0.1266589 | 0.1659324 |
stage=Distant | Distant | 875.75 | 408 | 0.4658864 | 0.4217712 | 0.5133615 |
The time unit is years (since we specified surv_mm/12
as
the analysis time). Therefore, the units of the rates shown above are
events/person-years.
We can also do this using more general tools:
library(dplyr)
melanoma |>
group_by(stage) |>
summarise(D = sum(death_cancer), M = sum(surv_mm/12), Rate = D/M,
CI_low = stats::poisson.test(D,M)$conf.int[1],
CI_high = stats::poisson.test(D,M)$conf.int[2]) |>
kable("html")
stage | D | M | Rate | CI_low | CI_high |
---|---|---|---|---|---|
Unknown | 274 | 10267.12 | 0.0266871 | 0.0236204 | 0.0300414 |
Localised | 1013 | 38626.58 | 0.0262255 | 0.0246351 | 0.0278915 |
Regional | 218 | 1500.25 | 0.1453091 | 0.1266589 | 0.1659324 |
Distant | 408 | 875.75 | 0.4658864 | 0.4217712 | 0.5133615 |
(c)
Here we tabulate crude rates per 1000 person-years. For example,
stage | tstop | event | rate | lower | upper | |
---|---|---|---|---|---|---|
stage=Unknown | Unknown | 10.26713 | 274 | 26.68712 | 23.62045 | 30.04141 |
stage=Localised | Localised | 38.62658 | 1013 | 26.22546 | 24.63514 | 27.89150 |
stage=Regional | Regional | 1.50025 | 218 | 145.30912 | 126.65886 | 165.93238 |
stage=Distant | Distant | 0.87575 | 408 | 465.88638 | 421.77124 | 513.36147 |
(d)
Below we see that the crude mortality rate is higher for males than for females.
sex | tstop | event | rate | lower | upper | |
---|---|---|---|---|---|---|
sex=Male | Male | 21.96892 | 1074 | 48.88725 | 46.00684 | 51.90076 |
sex=Female | Female | 29.30079 | 839 | 28.63404 | 26.72903 | 30.63898 |
We see that the crude mortality rate is higher for males than females, a difference which is also reflected in the survival and hazard curves:
survfit(Surv(surv_mm/12, death_cancer) ~ sex, data = melanoma) |>
plot(col=1:2,
xlab = "Time since diagnosis (years)",
ylab = "Survival")
legend("topright", legend=levels(melanoma$sex), lty=1, col=1:2)
library(dplyr)
hazards <- group_by(melanoma, sex) |>
do(as.data.frame(bshazard(Surv(surv_mm/12, death_cancer)~1, data=., verbose=FALSE))) |>
ungroup()
with(hazards, plt(hazard~time|sex, ymin=lower.ci, ymax=upper.ci,
type="ribbon",
xlab="Time since diagnosis (years)",
ylab="Hazard",
col=1:2, lty=1, xlim=c(0,20)))
(e)
The majority of patients are alive at end of study. 1,913 died from cancer while 1,134 died from another cause. The cause of death is highly depending of age, as young people die less from other causes. To observe this we tabulate the events by age group.
## agegrp
## status 0-44 45-59 60-74 75+
## Alive 1615 1568 1178 359
## Dead: cancer 386 522 640 365
## Dead: other 39 147 461 487
## Lost to follow-up 6 1 1 0
## agegrp
## status 0-44 45-59 60-74 75+
## Alive 78.9 70.1 51.7 29.6
## Dead: cancer 18.9 23.3 28.1 30.1
## Dead: other 1.9 6.6 20.2 40.2
## Lost to follow-up 0.3 0.0 0.0 0.0
##
## Pearson's Chi-squared test
##
## data: Freq[-4, ]
## X-squared = 1341.4, df = 6, p-value < 2.2e-16
(f)
The survival is worse for all-cause survival than for cause-specific, since you now can die from other causes, and these deaths are incorporated in the Kaplan-Meier estimates. The ”other cause” mortality is particularly present in patients with localised and unknown stage.
(g)
By comparing Kaplan-Meier estimates for cancer deaths with all-cause mortality conditioned on age over 75 years, we see that the “other” cause mortality is particularly influential in patients with localised and unknown stage. Patients with localised disease, have a better prognosis (i.e. the cancer does not kill them), and are thus more likely to experience death from another cause. For regional and distant stage, the cancer is more aggressive and is the cause of death for most of these patients (i.e. it is the cancer that kills these patients before they have “the chance” to die from something else).
par(mfrow=c(1, 2))
survfit(Surv(surv_mm/12, death_cancer) ~ stage, data = subset(melanoma,agegrp=="75+")) |>
plot(col=1:4,
xlab = "Time since diagnosis (years)",
ylab = "Survival",
main = "Kaplan-Meier survival estimates\nCancer | Age 75+")
survfit(Surv(surv_mm/12, death_all) ~ stage, data = subset(melanoma,agegrp=="75+")) |>
plot(col=1:4,
xlab = "Time since diagnosis (years)",
ylab = "Survival",
main = "Kaplan-Meier survival estimates\nAll-cause | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.
par(mfrow=c(1, 2))
survfit(Surv(surv_mm/12, death_cancer) ~ agegrp, data = melanoma) |>
plot(col=1:4,
xlab = "Time since diagnosis (years)",
ylab = "Survival",
main = "Kaplan-Meier estimates of\ncancer survival by age group")
survfit(Surv(surv_mm/12, death_all) ~ agegrp, data = melanoma) |>
plot(col=1:4,
xlab = "Time since diagnois",
ylab = "Survival",
main = "Kaplan-Meier estimates of\nall-cause survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)