Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival

The purpose of this exercise is to study patient survival using two alternative measures: survival proportions and mortality rates. A second purpose is to study the difference between cause-specific and all-cause survival.

You may have to install the required packages the first time you use them. You can install a package with install.packages("package_of_interest"), run once for each package you require.

Load dependencies
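A minimal sketch of the loading step. The package names are assumptions: the exercise does not list them here, but survRate (used below) is provided by biostat3, and the Kaplan-Meier functions come from survival.

```r
# Assumed packages: 'survival' for Surv()/survfit(), 'biostat3' for
# survRate() and (we assume) the melanoma data shown below.
# Install once if needed:
# install.packages(c("survival", "biostat3"))
library(survival)
library(biostat3)
```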

We start by listing the first few observations to get an idea about the data. We then define two 1/0 variables for the events that we are interested in.

##      sex age     stage mmdx yydx surv_mm surv_yy       status          subsite
## 1 Female  81 Localised    2 1981    26.5     2.5  Dead: other    Head and Neck
## 2 Female  75 Localised    9 1975    55.5     4.5  Dead: other    Head and Neck
## 3 Female  78 Localised    2 1978   177.5    14.5  Dead: other            Limbs
## 4 Female  75   Unknown    8 1975    29.5     2.5 Dead: cancer Multiple and NOS
## 5 Female  81   Unknown    7 1981    57.5     4.5  Dead: other    Head and Neck
## 6 Female  75 Localised    9 1975    19.5     1.5 Dead: cancer            Trunk
##          year8594         dx       exit agegrp id      ydx    yexit
## 1 Diagnosed 75-84 1981-02-02 1983-04-20    75+  1 1981.088 1983.298
## 2 Diagnosed 75-84 1975-09-21 1980-05-07    75+  2 1975.720 1980.348
## 3 Diagnosed 75-84 1978-02-21 1992-12-07    75+  3 1978.140 1992.934
## 4 Diagnosed 75-84 1975-08-25 1978-02-08    75+  4 1975.646 1978.104
## 5 Diagnosed 75-84 1981-07-09 1986-04-25    75+  5 1981.517 1986.312
## 6 Diagnosed 75-84 1975-09-03 1977-04-19    75+  6 1975.671 1977.296
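The listing and the two event indicators described above could be produced along the following lines. This is a sketch: it assumes the data are the melanoma data frame shipped with biostat3 (the printed columns suggest so), and the names death_cancer and death_all are illustrative choices, not names fixed by the exercise text.

```r
# Assumption: the data are biostat3::melanoma
melanoma <- biostat3::melanoma
head(melanoma)

# Two 1/0 event indicators (names are illustrative):
#   death_cancer = 1 for deaths due to cancer only
#   death_all    = 1 for deaths from any cause
melanoma <- transform(
  melanoma,
  death_cancer = as.numeric(status == "Dead: cancer"),
  death_all    = as.numeric(status %in% c("Dead: cancer", "Dead: other"))
)

# Cross-check the coding against the original status variable
with(melanoma, table(status, death_cancer))
with(melanoma, table(status, death_all))
```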

(a) Plot estimates of the survivor function and hazard function by stage

We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.
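One way to tabulate the stage distribution with base R, again assuming the biostat3 melanoma data:

```r
melanoma <- biostat3::melanoma  # assumed data source

# Counts and proportions of patients in each stage at diagnosis
freq <- table(melanoma$stage)
stage_tab <- data.frame(
  Freq = as.vector(freq),
  Prop = as.vector(freq) / sum(freq),
  row.names = names(freq)
)
print(round(stage_tab, 3))
```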

We then plot the survivor function and the hazard function by stage. Does it appear that stage is associated with patient survival?

As an extension, the hazard could also be estimated using the bshazard package, with plotting done using either the ggplot2 or lattice package.
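A possible sketch of both plots. The survivor function is the Kaplan-Meier estimate from survfit. For the hazard we use the kernel smoother muhaz() from the muhaz package, applied separately to each stage; this is one possible choice of smoother and an assumption on our part, as is the indicator name death_cancer.

```r
library(survival)

# Assumed data source and an illustrative event indicator
melanoma <- transform(biostat3::melanoma,
                      death_cancer = as.numeric(status == "Dead: cancer"))

# Kaplan-Meier estimates of cause-specific survival, time in years
fit_stage <- survfit(Surv(surv_mm / 12, death_cancer) ~ stage, data = melanoma)

# Kernel-smoothed hazard per stage (one smoother among several possible)
haz_stage <- lapply(split(melanoma, melanoma$stage), function(d) {
  muhaz::muhaz(d$surv_mm / 12, d$death_cancer)
})
stages <- names(haz_stage)

par(mfrow = c(1, 2))
plot(fit_stage, col = seq_along(stages),
     xlab = "Years since diagnosis", ylab = "Survival proportion",
     main = "Kaplan-Meier estimates")
legend("bottomleft", legend = stages, col = seq_along(stages),
       lty = 1, bty = "n")

# Empty frame sized to the estimated hazards, then one line per stage
xmax <- max(sapply(haz_stage, function(h) max(h$est.grid)))
ymax <- max(sapply(haz_stage, function(h) max(h$haz.est)))
plot(0, 0, type = "n", xlim = c(0, xmax), ylim = c(0, ymax),
     xlab = "Years since diagnosis", ylab = "Hazard (per person-year)",
     main = "Smoothed hazards")
for (i in seq_along(stages)) {
  lines(haz_stage[[i]]$est.grid, haz_stage[[i]]$haz.est, col = i)
}
legend("topright", legend = stages, col = seq_along(stages),
       lty = 1, bty = "n")
```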

(b) Estimate the mortality rates for each stage using, for example, the survRate command

What are the units of the estimated rates? The survRate function, as the name suggests, is used to estimate rates. Look at the help pages if you are not familiar with the function (e.g. ?survRate or help(survRate)).
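A minimal call might look as follows, assuming survRate accepts a Surv formula and a data argument (check ?survRate for the exact interface in your installed version). Because surv_mm is recorded in months, the person-time is in person-months and the rates are events per person-month. The indicator death_cancer is an illustrative name.

```r
library(survival)
library(biostat3)

melanoma <- transform(biostat3::melanoma,
                      death_cancer = as.numeric(status == "Dead: cancer"))

# Time in months, so the estimated rates are per person-month
rates_month <- survRate(Surv(surv_mm, death_cancer) ~ stage, data = melanoma)
print(rates_month)
```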

(c)

If you haven’t already done so, estimate the mortality rates for each stage per person-year and per 1000 person-years of follow-up.
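A mortality rate is the number of events divided by the person-time at risk, so changing the unit only rescales the denominator. As a sketch, the rates can be computed by hand with base R; the same figures should result from survRate with the time variable rescaled to surv_mm / 12 or surv_mm / 12 / 1000. The name death_cancer is illustrative.

```r
melanoma <- transform(biostat3::melanoma,
                      death_cancer = as.numeric(status == "Dead: cancer"))

# Events and person-years of follow-up per stage
events <- tapply(melanoma$death_cancer, melanoma$stage, sum)
pyears <- tapply(melanoma$surv_mm / 12, melanoma$stage, sum)

rate_py     <- events / pyears   # per person-year
rate_1000py <- 1000 * rate_py    # per 1000 person-years

print(round(cbind(events, pyears, rate_py, rate_1000py), 4))
```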

(d)

Study whether survival differs between males and females, both by plotting the survivor function and by tabulating mortality rates. Is there a difference in survival between males and females? If so, is the difference present throughout follow-up?
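The comparison by sex could be sketched as below, reusing the same tools as for stage. The rescaling surv_mm / 12 / 1000 expresses the rates per 1000 person-years; death_cancer is an illustrative name and the survRate interface is assumed as before.

```r
library(survival)
library(biostat3)

melanoma <- transform(biostat3::melanoma,
                      death_cancer = as.numeric(status == "Dead: cancer"))

# Kaplan-Meier curves by sex, time in years
fit_sex <- survfit(Surv(surv_mm / 12, death_cancer) ~ sex, data = melanoma)
plot(fit_sex, col = 1:2,
     xlab = "Years since diagnosis", ylab = "Survival proportion",
     main = "Cause-specific survival by sex")
legend("bottomleft", legend = names(fit_sex$strata), col = 1:2,
       lty = 1, bty = "n")

# Cancer mortality rates by sex, per 1000 person-years
rates_sex <- survRate(Surv(surv_mm / 12 / 1000, death_cancer) ~ sex,
                      data = melanoma)
print(rates_sex)
```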

(e)

The plots you made above were based on cause-specific survival (i.e., only deaths due to cancer are counted as events, deaths due to other causes are censored). In the next part of this question we will estimate all-cause survival (i.e., any death is counted as an event). First, however, study the coding of vital status and tabulate vital status by age group.

How many patients die of each cause? Does the distribution of cause of death depend on age?
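A cross-tabulation with column proportions is one way to answer both questions; a sketch, assuming the biostat3 melanoma data:

```r
melanoma <- biostat3::melanoma  # assumed data source

# Counts of vital status within each age group
status_by_age <- with(melanoma, table(status, agegrp))
print(status_by_age)

# Column proportions: distribution of vital status within each age group
status_prop <- prop.table(status_by_age, margin = 2)
print(round(status_prop, 3))
```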

(f)

To get all-cause survival, specify all deaths (both cancer and other) as events.

Now plot the survivor proportion for all-cause survival by stage. We give the graph a title so that it can be told apart from the cause-specific plot. Is the survivor proportion different compared to the cause-specific survival you estimated above? Why?
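A sketch of the all-cause plot. The indicator death_all (an illustrative name) counts both cancer and non-cancer deaths as events, and the main argument supplies the title.

```r
library(survival)

melanoma <- transform(
  biostat3::melanoma,
  death_cancer = as.numeric(status == "Dead: cancer"),
  death_all    = as.numeric(status %in% c("Dead: cancer", "Dead: other"))
)

# All-cause survival by stage, time in years
fit_all <- survfit(Surv(surv_mm / 12, death_all) ~ stage, data = melanoma)
plot(fit_all, col = seq_along(fit_all$strata),
     xlab = "Years since diagnosis", ylab = "Survival proportion",
     main = "Kaplan-Meier estimates of all-cause survival")
legend("topright", legend = names(fit_all$strata),
       col = seq_along(fit_all$strata), lty = 1, bty = "n")

# For comparison: cause-specific fit on the same time scale
fit_cancer <- survfit(Surv(surv_mm / 12, death_cancer) ~ stage, data = melanoma)
s_all    <- summary(fit_all,    times = 5, extend = TRUE)$surv
s_cancer <- summary(fit_cancer, times = 5, extend = TRUE)$surv
print(round(rbind(all_cause = s_all, cause_specific = s_cancer), 3))
```

Because every cancer death is also counted as an all-cause death and the risk sets are identical, the all-cause Kaplan-Meier estimate can never lie above the cause-specific one.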

(g)

It is more common to die from a cause other than cancer in older ages. How does this impact the survivor proportion for different stages? Compare cause-specific and all-cause survival by plotting the survivor proportion by stage for the oldest age group (75+ years) for both cause-specific and all-cause survival.
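The comparison for the oldest age group could be sketched by restricting the data to agegrp == "75+" and drawing the two sets of curves side by side. Indicator names are illustrative.

```r
library(survival)

melanoma <- transform(
  biostat3::melanoma,
  death_cancer = as.numeric(status == "Dead: cancer"),
  death_all    = as.numeric(status %in% c("Dead: cancer", "Dead: other"))
)
oldest <- subset(melanoma, agegrp == "75+")

fit_cs_old <- survfit(Surv(surv_mm / 12, death_cancer) ~ stage, data = oldest)
fit_ac_old <- survfit(Surv(surv_mm / 12, death_all) ~ stage, data = oldest)

par(mfrow = c(1, 2))
plot(fit_cs_old, col = seq_along(fit_cs_old$strata),
     xlab = "Years since diagnosis", ylab = "Survival proportion",
     main = "Cause-specific, age 75+")
legend("topright", legend = names(fit_cs_old$strata),
       col = seq_along(fit_cs_old$strata), lty = 1, bty = "n")
plot(fit_ac_old, col = seq_along(fit_ac_old$strata),
     xlab = "Years since diagnosis", ylab = "Survival proportion",
     main = "All-cause, age 75+")
legend("topright", legend = names(fit_ac_old$strata),
       col = seq_along(fit_ac_old$strata), lty = 1, bty = "n")
```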

(h)

Now estimate both cancer-specific and all-cause survival for each age group.
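A sketch of the estimates by age group. The small helper plot_km is hypothetical, introduced only to avoid repeating the plotting code, and the indicator names are illustrative.

```r
library(survival)

melanoma <- transform(
  biostat3::melanoma,
  death_cancer = as.numeric(status == "Dead: cancer"),
  death_all    = as.numeric(status %in% c("Dead: cancer", "Dead: other"))
)

# Hypothetical helper: Kaplan-Meier plot with one coloured curve per stratum
plot_km <- function(fit, title) {
  k <- length(fit$strata)
  plot(fit, col = seq_len(k), main = title,
       xlab = "Years since diagnosis", ylab = "Survival proportion")
  legend("bottomleft", legend = names(fit$strata), col = seq_len(k),
         lty = 1, bty = "n")
}

fit_cs_age <- survfit(Surv(surv_mm / 12, death_cancer) ~ agegrp, data = melanoma)
fit_ac_age <- survfit(Surv(surv_mm / 12, death_all) ~ agegrp, data = melanoma)

par(mfrow = c(1, 2))
plot_km(fit_cs_age, "Cause-specific survival by age group")
plot_km(fit_ac_age, "All-cause survival by age group")

# Estimated survival proportions at 10 years in each age group
s10_cs <- summary(fit_cs_age, times = 10, extend = TRUE)$surv
s10_ac <- summary(fit_ac_age, times = 10, extend = TRUE)$surv
print(round(rbind(cause_specific = s10_cs, all_cause = s10_ac), 3))
```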