Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival

Load dependencies

library(biostat3) # loads the survival and muhaz packages
library(dplyr)    # for data manipulation
library(bshazard) # for smoothed hazards
library(knitr)    # for html tables
library(tinyplot) # plt() for nicer base graphics
## utility function
as.data.frame.bshazard <- function(x, ...)
    with(x, data.frame(time,hazard,lower.ci,upper.ci))

We define two 1/0 variables for the events. We then list the first few observations to get an idea about the data.

## Create 0/1 outcome variables and then show first six rows of the dataset
melanoma <- 
    transform(biostat3::melanoma,
              death_cancer = ifelse(status == "Dead: cancer", 1, 0),
              death_all = ifelse(status %in% c("Dead: cancer", "Dead: other"), 1, 0))
head(melanoma) |> knitr::kable("html")

sex	age	stage	mmdx	yydx	surv_mm	surv_yy	status	subsite	year8594	dx	exit	agegrp	id	ydx	yexit	death_cancer	death_all
Female	81	Localised	2	1981	26.5	2.5	Dead: other	Head and Neck	Diagnosed 75-84	1981-02-02	1983-04-20	75+	1	1981.088	1983.298	0	1
Female	75	Localised	9	1975	55.5	4.5	Dead: other	Head and Neck	Diagnosed 75-84	1975-09-21	1980-05-07	75+	2	1975.720	1980.348	0	1
Female	78	Localised	2	1978	177.5	14.5	Dead: other	Limbs	Diagnosed 75-84	1978-02-21	1992-12-07	75+	3	1978.140	1992.934	0	1
Female	75	Unknown	8	1975	29.5	2.5	Dead: cancer	Multiple and NOS	Diagnosed 75-84	1975-08-25	1978-02-08	75+	4	1975.646	1978.104	1	1
Female	81	Unknown	7	1981	57.5	4.5	Dead: other	Head and Neck	Diagnosed 75-84	1981-07-09	1986-04-25	75+	5	1981.517	1986.312	0	1
Female	75	Localised	9	1975	19.5	1.5	Dead: cancer	Trunk	Diagnosed 75-84	1975-09-03	1977-04-19	75+	6	1975.671	1977.296	1	1

(a) Plot estimates of the survivor function and hazard function by stage

We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.

## Tabulate by stage
Freq <- xtabs(~stage, data=melanoma)
cbind(Freq, Prop=proportions(Freq)) |> kable("html")

	Freq	Prop
Unknown	1631	0.2097749
Localised	5318	0.6839871
Regional	350	0.0450161
Distant	476	0.0612219

We then plot the survival and survival by stage.

survfit(Surv(surv_mm/12, death_cancer) ~ stage, data = melanoma) |>
    plot(col=1:4,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

Survival depends heavily on stage. It is interesting to note that patients with stage 0 (unknown) appear to have a similar survival to patients with stage 1 (localized).

library(dplyr)
hazards <- group_by(melanoma, stage) |> 
    do(as.data.frame(bshazard(Surv(surv_mm/12, death_cancer)~1, data=., verbose=FALSE))) |>
    ungroup()
with(hazards, plt(hazard~time|stage, ymin=lower.ci, ymax=upper.ci,
                  type="ribbon",
                  xlab="Time since diagnosis (years)",
                  ylab="Hazard",
                  col=1:4, lty=1, xlim=c(0,20), ylim=0:1))

ggplot(hazards,aes(x=time,y=hazard,group=stage)) + geom_line(aes(col=stage)) +
    geom_ribbon(aes(ymin=lower.ci, ymax=upper.ci, fill=stage), alpha=0.3) +
    xlim(0,20) + ylim(0,1) + 
    xlab('Time since diagnosis (years)') + ylab('Hazard')

## Warning: Removed 23 rows containing missing values or values outside the scale range
## (`geom_line()`).

As an extension, we can use thebshazard to calculate the hazards with confidence intervals (see below). Note, however, that the bshazard function will adjust for covariates rather than stratify by covariates. This means that we need to divide the dataset into strata and calculate the smoothed hazards separately. I have shown one approach using dplyr for dividing the data, with the plots use tinyplot and ggplot, which allows for over-lapping confidence intervals (using the alpha transparency argument).

(b) Estimate the mortality rates for each stage using, for example, the `survRate` command

survRate(Surv(surv_mm/12, death_cancer) ~ stage, data=melanoma) |> kable("html")

	stage	tstop	event	rate	lower	upper
stage=Unknown	Unknown	10267.12	274	0.0266871	0.0236204	0.0300414
stage=Localised	Localised	38626.58	1013	0.0262255	0.0246351	0.0278915
stage=Regional	Regional	1500.25	218	0.1453091	0.1266589	0.1659324
stage=Distant	Distant	875.75	408	0.4658864	0.4217712	0.5133615

The time unit is years (since we specified surv_mm/12 as the analysis time). Therefore, the units of the rates shown above are events/person-years.

We can also do this using more general tools:

library(dplyr)
melanoma |>
    group_by(stage) |>
    summarise(D = sum(death_cancer), M = sum(surv_mm/12), Rate = D/M,
              CI_low = stats::poisson.test(D,M)$conf.int[1],
              CI_high = stats::poisson.test(D,M)$conf.int[2]) |>
    kable("html")

stage	D	M	Rate	CI_low	CI_high
Unknown	274	10267.12	0.0266871	0.0236204	0.0300414
Localised	1013	38626.58	0.0262255	0.0246351	0.0278915
Regional	218	1500.25	0.1453091	0.1266589	0.1659324
Distant	408	875.75	0.4658864	0.4217712	0.5133615

(c)

Here we tabulate crude rates per 1000 person-years. For example,

survRate(Surv(surv_mm/12/1000, death_cancer) ~ stage, data=melanoma) |>
    kable("html")

	stage	tstop	event	rate	lower	upper
stage=Unknown	Unknown	10.26713	274	26.68712	23.62045	30.04141
stage=Localised	Localised	38.62658	1013	26.22546	24.63514	27.89150
stage=Regional	Regional	1.50025	218	145.30912	126.65886	165.93238
stage=Distant	Distant	0.87575	408	465.88638	421.77124	513.36147

(d)

Below we see that the crude mortality rate is higher for males than for females.

survRate(Surv(surv_mm/12/1000, death_cancer) ~ sex, data=melanoma) |>
    kable("html")

	sex	tstop	event	rate	lower	upper
sex=Male	Male	21.96892	1074	48.88725	46.00684	51.90076
sex=Female	Female	29.30079	839	28.63404	26.72903	30.63898

We see that the crude mortality rate is higher for males than females, a difference which is also reflected in the survival and hazard curves:

survfit(Surv(surv_mm/12, death_cancer) ~ sex, data = melanoma) |>
    plot(col=1:2,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival")
legend("topright", legend=levels(melanoma$sex), lty=1, col=1:2)

library(dplyr)
hazards <- group_by(melanoma, sex) |> 
    do(as.data.frame(bshazard(Surv(surv_mm/12, death_cancer)~1, data=., verbose=FALSE))) |>
    ungroup()
with(hazards, plt(hazard~time|sex, ymin=lower.ci, ymax=upper.ci,
                  type="ribbon",
                  xlab="Time since diagnosis (years)",
                  ylab="Hazard",
                  col=1:2, lty=1, xlim=c(0,20)))

(e)

The majority of patients are alive at end of study. 1,913 died from cancer while 1,134 died from another cause. The cause of death is highly depending of age, as young people die less from other causes. To observe this we tabulate the events by age group.

Freq <- xtabs(~status+agegrp, melanoma)
Freq

##                    agegrp
## status              0-44 45-59 60-74  75+
##   Alive             1615  1568  1178  359
##   Dead: cancer       386   522   640  365
##   Dead: other         39   147   461  487
##   Lost to follow-up    6     1     1    0

round(proportions(Freq,"agegrp")*100,1) # proportions for status by agegrp

##                    agegrp
## status              0-44 45-59 60-74  75+
##   Alive             78.9  70.1  51.7 29.6
##   Dead: cancer      18.9  23.3  28.1 30.1
##   Dead: other        1.9   6.6  20.2 40.2
##   Lost to follow-up  0.3   0.0   0.0  0.0

chisq.test(Freq[-4,])

## 
##  Pearson's Chi-squared test
## 
## data:  Freq[-4, ]
## X-squared = 1341.4, df = 6, p-value < 2.2e-16

(f)

The survival is worse for all-cause survival than for cause-specific, since you now can die from other causes, and these deaths are incorporated in the Kaplan-Meier estimates. The ”other cause” mortality is particularly present in patients with localised and unknown stage.

par(mfrow=c(1, 1))
survfit(Surv(surv_mm/12, death_all) ~ stage, data = melanoma) |>
    plot(col=1:4,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates\nAll-cause")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

(g)

By comparing Kaplan-Meier estimates for cancer deaths with all-cause mortality conditioned on age over 75 years, we see that the “other” cause mortality is particularly influential in patients with localised and unknown stage. Patients with localised disease, have a better prognosis (i.e. the cancer does not kill them), and are thus more likely to experience death from another cause. For regional and distant stage, the cancer is more aggressive and is the cause of death for most of these patients (i.e. it is the cancer that kills these patients before they have “the chance” to die from something else).

par(mfrow=c(1, 2))
survfit(Surv(surv_mm/12, death_cancer) ~ stage, data = subset(melanoma,agegrp=="75+")) |>
    plot(col=1:4,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates\nCancer | Age 75+")
survfit(Surv(surv_mm/12, death_all) ~ stage, data = subset(melanoma,agegrp=="75+")) |>
    plot(col=1:4,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival",
         main = "Kaplan-Meier survival estimates\nAll-cause | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.

par(mfrow=c(1, 2))
survfit(Surv(surv_mm/12, death_cancer) ~ agegrp, data = melanoma) |>
    plot(col=1:4,
         xlab = "Time since diagnosis (years)",
         ylab = "Survival",
         main = "Kaplan-Meier estimates of\ncancer survival by age group")
survfit(Surv(surv_mm/12, death_all) ~ agegrp, data = melanoma) |>
    plot(col=1:4,
         xlab = "Time since diagnois",
         ylab = "Survival",
         main = "Kaplan-Meier estimates of\nall-cause survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)

Biostatistics III in R

Code by Annika Tillander, Andreas Karlsson and Mark Clements

Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival

(a) Plot estimates of the survivor function and hazard function by stage

(b) Estimate the mortality rates for each stage using, for example, the `survRate` command

(c)

(d)

(e)

(f)

(g)

(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.

Biostatistics III in R

Code by Annika Tillander, Andreas Karlsson and Mark Clements

Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival

(a) Plot estimates of the survivor function and hazard function by stage

(b) Estimate the mortality rates for each stage using, for example, the survRate command

(c)

(d)

(e)

(f)

(g)

(h) Compare Kaplan-Meier estimates for cancer deaths with all-cause mortality by age group.

(b) Estimate the mortality rates for each stage using, for example, the `survRate` command