Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival
The purpose of this exercise is to study patient survival using two alternative measures: survival proportions and mortality rates. A second purpose is to compare cause-specific and all-cause survival.
You may have to install the required packages the first time you use them. You can install a package with install.packages("package_of_interest"), once for each package you require.
Load dependencies
We start by listing the first few observations to get an idea about the data. We then define two 1/0 variables for the events that we are interested in.
##      sex age     stage mmdx yydx surv_mm surv_yy       status          subsite
## 1 Female  81 Localised    2 1981    26.5     2.5  Dead: other    Head and Neck
## 2 Female  75 Localised    9 1975    55.5     4.5  Dead: other    Head and Neck
## 3 Female  78 Localised    2 1978   177.5    14.5  Dead: other            Limbs
## 4 Female  75   Unknown    8 1975    29.5     2.5 Dead: cancer Multiple and NOS
## 5 Female  81   Unknown    7 1981    57.5     4.5  Dead: other    Head and Neck
## 6 Female  75 Localised    9 1975    19.5     1.5 Dead: cancer            Trunk
##          year8594         dx       exit agegrp id      ydx    yexit
## 1 Diagnosed 75-84 1981-02-02 1983-04-20    75+  1 1981.088 1983.298
## 2 Diagnosed 75-84 1975-09-21 1980-05-07    75+  2 1975.720 1980.348
## 3 Diagnosed 75-84 1978-02-21 1992-12-07    75+  3 1978.140 1992.934
## 4 Diagnosed 75-84 1975-08-25 1978-02-08    75+  4 1975.646 1978.104
## 5 Diagnosed 75-84 1981-07-09 1986-04-25    75+  5 1981.517 1986.312
## 6 Diagnosed 75-84 1975-09-03 1977-04-19    75+  6 1975.671 1977.296
## Create 0/1 outcome variables
melanoma <-
    transform(melanoma,
              death_cancer = ifelse(status == "Dead: cancer", 1, 0),
              death_all = ifelse(status == "Dead: cancer" |
                                 status == "Dead: other", 1, 0))
(a) Plot estimates of the survivor function and hazard function by stage
We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.
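One way to produce such a tabulation is with base R's table() and prop.table(). The sketch below uses a small made-up factor standing in for melanoma$stage, so that it runs without the course data; with the real data the same calls would presumably be applied to melanoma$stage.

```r
# Toy stand-in for melanoma$stage (made-up values, NOT the course data)
stage <- factor(c("Localised", "Localised", "Unknown", "Regional",
                  "Localised", "Distant", "Unknown", "Localised"),
                levels = c("Unknown", "Localised", "Regional", "Distant"))

# Number of patients in each stage
freq <- table(stage)

# Percentage of patients in each stage
pct <- round(100 * prop.table(freq), 1)

# Show counts and percentages side by side
print(cbind(Freq = freq, Percent = pct))
```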
We then plot the survivor and hazard functions by stage. Does it appear that stage is associated with patient survival?
par(mfrow=c(1, 2))
mfit <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = melanoma)
plot(mfit, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival")
## legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
hazards <- muhaz2(Surv(surv_mm, death_cancer)~stage, melanoma)
plot(hazards,
     col=1:4, lty=1, xlim=c(0,250), ylim=c(0,0.08),
     legend.args=list(bty="n"))
As an extension, the hazards could be estimated using the bshazard package and plotted using either the ggplot2 or the lattice package.
(b) Estimate the mortality rates for each stage using, for example, the survRate command
What are the units of the estimated rates? The survRate function, as the name suggests, is used to estimate rates. Look at the help pages if you are not familiar with the function (e.g. ?survRate or help(survRate)).
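To see what a mortality rate is and where its units come from, it can help to compute one by hand: the number of events divided by the total person-time at risk. The sketch below does this in base R on a made-up data frame (toy) that borrows the column names of melanoma; it is not a substitute for survRate, whose call in this exercise would presumably look something like survRate(Surv(surv_mm, death_cancer) ~ stage, data = melanoma).

```r
# Toy stand-in for the melanoma data (made-up values, NOT the course data)
toy <- data.frame(
  stage        = c("Localised", "Localised", "Localised", "Distant", "Distant"),
  surv_mm      = c(120, 60, 180, 6, 18),  # follow-up time in months
  death_cancer = c(0, 1, 0, 1, 1)         # 1 = died of cancer, 0 = censored
)

# Number of events (D) and total person-time (Y) in each stage
D <- tapply(toy$death_cancer, toy$stage, sum)
Y <- tapply(toy$surv_mm, toy$stage, sum)

# Rate = D / Y. Follow-up is measured in months here,
# so the rate is expressed per person-month.
rate_per_month <- D / Y
print(rate_per_month)
```

The units of the rate follow directly from the units of the time variable in the denominator.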
(c)
If you haven’t already done so, estimate the mortality rates for each stage per person-year and per 1000 person-years of follow-up.
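Changing the units of a rate only requires rescaling the person-time. The sketch below illustrates the arithmetic with made-up event counts and person-months (the names D, Y_months and so on are purely illustrative). With survRate, the same effect could presumably be achieved by rescaling the time variable inside Surv(), for example Surv(surv_mm/12, death_cancer) for rates per person-year.

```r
# Made-up event counts and person-months by stage (NOT the course data)
D        <- c(Localised = 1, Distant = 2)
Y_months <- c(Localised = 360, Distant = 24)

# Convert person-months to person-years
Y_years <- Y_months / 12

# Rate per person-year
rate_py <- D / Y_years

# Rate per 1000 person-years: express person-time in units of 1000 years
rate_1000py <- D / (Y_years / 1000)

print(rbind(per_py = rate_py, per_1000py = rate_1000py))
```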
(d)
Study whether survival is different for males and females (both by plotting the survivor function and by tabulating mortality rates). Is there a difference in survival between males and females? If yes, is the difference present throughout follow-up?
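A possible pattern for this comparison is sketched below: Kaplan-Meier curves by sex from the survival package, followed by hand-computed rates per 1000 person-years. The data frame toy is made up and only mimics the column names of melanoma, so the numbers say nothing about the real data.

```r
library(survival)

# Toy stand-in for the melanoma data (made-up values, NOT the course data)
toy <- data.frame(
  sex          = c("Male", "Male", "Male", "Male",
                   "Female", "Female", "Female", "Female"),
  surv_mm      = c(10, 24, 36, 60, 30, 72, 96, 120),  # months
  death_cancer = c(1, 1, 0, 1, 0, 1, 0, 0)
)

# Kaplan-Meier estimates of the survivor function by sex
sfit <- survfit(Surv(surv_mm, death_cancer) ~ sex, data = toy)
plot(sfit, col = 1:2,
     xlab = "Months since diagnosis",
     ylab = "Survival")
legend("bottomleft", legend = names(sfit$strata),
       col = 1:2, lty = 1, bty = "n")

# Mortality rates per 1000 person-years by sex
D <- tapply(toy$death_cancer, toy$sex, sum)
Y <- tapply(toy$surv_mm / 12, toy$sex, sum)
rates <- 1000 * D / Y
print(round(rates, 1))
```

Whether a difference persists throughout follow-up is easier to judge from the curves (or from rates computed within intervals of follow-up) than from a single overall rate.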
(e)
The plots you made above were based on cause-specific survival (i.e., only deaths due to cancer are counted as events, deaths due to other causes are censored). In the next part of this question we will estimate all-cause survival (i.e., any death is counted as an event). First, however, study the coding of vital status and tabulate vital status by age group.
How many patients die of each cause? Does the distribution of cause of death depend on age?
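A cross-tabulation with column percentages is one way to approach these questions. The sketch below uses a made-up data frame (toy) with the same variable names as melanoma; with the course data the same calls would presumably be applied to melanoma$status and melanoma$agegrp.

```r
# Toy stand-in for the melanoma data (made-up values, NOT the course data)
toy <- data.frame(
  status = c("Alive", "Dead: cancer", "Dead: other", "Alive",
             "Dead: cancer", "Dead: other", "Dead: other", "Alive"),
  agegrp = c("0-44", "0-44", "75+", "45-59",
             "45-59", "75+", "75+", "75+")
)

# Counts of vital status by age group, with row and column totals
tab <- table(toy$status, toy$agegrp, dnn = c("status", "agegrp"))
print(addmargins(tab))

# Column percentages: distribution of vital status within each age group
col_pct <- 100 * prop.table(tab, margin = 2)
print(round(col_pct, 1))
```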
(f)
To get all-cause survival, specify all deaths (both cancer and other) as events.
Now plot the survivor proportion for all-cause survival by stage. We name the graphs so that they can be told apart in the graph window. Is the survivor proportion different compared to the cause-specific survival you estimated above? Why?
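The only change needed to move from cause-specific to all-cause survival is the event indicator passed to Surv(). The sketch below fits both on a made-up data frame (toy), without stratifying by stage for brevity, and compares the Kaplan-Meier estimates at 60 months; for the exercise the formula would presumably use ~ stage and data = melanoma.

```r
library(survival)

# Toy stand-in for the melanoma data (made-up values, NOT the course data)
toy <- data.frame(
  surv_mm = c(12, 24, 36, 48, 60, 72),  # months
  status  = c("Dead: cancer", "Dead: other", "Alive",
              "Dead: cancer", "Dead: other", "Alive")
)
toy$death_cancer <- as.numeric(toy$status == "Dead: cancer")
toy$death_all    <- as.numeric(toy$status %in% c("Dead: cancer", "Dead: other"))

# Cause-specific: deaths from other causes are treated as censored
fit_cs <- survfit(Surv(surv_mm, death_cancer) ~ 1, data = toy)

# All-cause: any death is an event
fit_ac <- survfit(Surv(surv_mm, death_all) ~ 1, data = toy)

# Kaplan-Meier estimates at 60 months
s_cs <- summary(fit_cs, times = 60)$surv
s_ac <- summary(fit_ac, times = 60)$surv
print(c(cause_specific = s_cs, all_cause = s_ac))
```

Because every cancer death is also a death from any cause, the all-cause estimate cannot lie above the cause-specific one when both are computed on the same patients.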
(g)
It is more common to die from a cause other than cancer at older ages. How does this impact the survivor proportion for different stages? Compare cause-specific and all-cause survival by plotting the survivor proportion by stage for the oldest age group (75+ years) for both cause-specific and all-cause survival.
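One way to set up this comparison is to restrict the data to the oldest age group and draw the two sets of curves side by side. The sketch below does so on a made-up data frame (toy) with only two stages; the variable names follow melanoma, but the values are invented.

```r
library(survival)

# Toy stand-in for the melanoma data (made-up values, NOT the course data)
toy <- data.frame(
  agegrp  = c("75+", "75+", "75+", "75+", "75+", "75+", "0-44", "0-44"),
  stage   = c("Localised", "Localised", "Localised",
              "Distant", "Distant", "Distant", "Localised", "Distant"),
  surv_mm = c(20, 40, 60, 5, 10, 15, 100, 30),  # months
  status  = c("Dead: other", "Dead: cancer", "Alive", "Dead: cancer",
              "Dead: cancer", "Dead: other", "Alive", "Dead: cancer")
)
toy$death_cancer <- as.numeric(toy$status == "Dead: cancer")
toy$death_all    <- as.numeric(toy$status %in% c("Dead: cancer", "Dead: other"))

# Keep only the oldest age group
old <- subset(toy, agegrp == "75+")

fit_cs <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = old)
fit_ac <- survfit(Surv(surv_mm, death_all) ~ stage, data = old)

# Two panels: cause-specific on the left, all-cause on the right
par(mfrow = c(1, 2))
plot(fit_cs, col = 1:2, xlab = "Months since diagnosis",
     ylab = "Survival", main = "Cause-specific, 75+")
plot(fit_ac, col = 1:2, xlab = "Months since diagnosis",
     ylab = "Survival", main = "All-cause, 75+")
legend("topright", legend = names(fit_ac$strata),
       col = 1:2, lty = 1, bty = "n")
```

The same pattern, with a different subset() condition or with agegrp on the right-hand side of the formula, could be reused for the other age groups.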
(h)
Now estimate both cancer-specific and all-cause survival for each age group.