Exercise 1: Life tables and Kaplan-Meier estimates of survival
(a) Hand calculation: Life table and Kaplan-Meier estimates of survival
The results are contained in an Excel file and are also shown in the R output below.
(b) Using R to validate the hand calculations done in part 1 (a)
First, load the biostat3
library for the dataset ,
survminer
for plots and knitr
for nice html
output:
Following are the life table estimates. Note that in the lectures, when we estimated all-cause survival, there were 8 deaths in the first interval. One of these died of a cause other than cancer so in the cause-specific survival analysis we see that there are 7 ‘deaths’ and 1 censoring (Stata uses the term ‘lost’ for lost to follow-up) in the first interval.
lifetab2(Surv(floor(surv_yy), status == "Dead: cancer")~1,
colon_sample, breaks=0:10) |>
kable("html", digits=2)
tstart | tstop | nsubs | nlost | nrisk | nevent | surv | hazard | se.surv | se.pdf | se.hazard | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0-1 | 0 | 1 | 35 | 1 | 34.5 | 7 | 1.00 | 0.20 | 0.23 | 0.00 | 0.07 | 0.08 |
1-2 | 1 | 2 | 27 | 3 | 25.5 | 1 | 0.80 | 0.03 | 0.04 | 0.07 | 0.03 | 0.04 |
2-3 | 2 | 3 | 23 | 4 | 21.0 | 5 | 0.77 | 0.18 | 0.27 | 0.07 | 0.07 | 0.12 |
3-4 | 3 | 4 | 14 | 1 | 13.5 | 2 | 0.58 | 0.09 | 0.16 | 0.09 | 0.06 | 0.11 |
4-5 | 4 | 5 | 11 | 1 | 10.5 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN |
5-6 | 5 | 6 | 10 | 0 | 10.0 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN |
6-7 | 6 | 7 | 10 | 3 | 8.5 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN |
7-8 | 7 | 8 | 7 | 1 | 6.5 | 0 | 0.50 | 0.00 | 0.00 | 0.10 | NaN | NaN |
8-9 | 8 | 9 | 6 | 4 | 4.0 | 1 | 0.50 | 0.12 | 0.29 | 0.10 | 0.11 | 0.28 |
9-10 | 9 | 10 | 1 | 1 | 0.5 | 0 | 0.37 | 0.00 | 0.00 | 0.13 | NaN | NaN |
10-Inf | 10 | Inf | 0 | 0 | 0.0 | 0 | 0.37 | NA | NA | 0.13 | NA | NA |
Following is a table of Kaplan-Meier estimates. Although it’s not clear from the table, the person censored (lost) at time 2 was at risk when the other person dies at time 2. On the following is a graph of the survival function.
mfit <- survfit(Surv(surv_mm/12, status == "Dead: cancer") ~ 1,
data = colon_sample) # make Kaplan-Meier estimates
summary(mfit) # print Kaplan-Meier table
## Call: survfit(formula = Surv(surv_mm/12, status == "Dead: cancer") ~
## 1, data = colon_sample)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.167 35 1 0.971 0.0282 0.918 1.000
## 0.250 33 1 0.942 0.0398 0.867 1.000
## 0.417 32 1 0.913 0.0482 0.823 1.000
## 0.583 31 1 0.883 0.0549 0.782 0.998
## 0.667 30 1 0.854 0.0605 0.743 0.981
## 0.750 29 1 0.824 0.0652 0.706 0.962
## 0.917 28 1 0.795 0.0692 0.670 0.943
## 1.833 24 1 0.762 0.0738 0.630 0.921
## 2.250 22 1 0.727 0.0781 0.589 0.898
## 2.333 20 1 0.691 0.0823 0.547 0.872
## 2.667 19 2 0.618 0.0882 0.467 0.818
## 2.750 16 1 0.579 0.0908 0.426 0.788
## 3.583 13 1 0.535 0.0941 0.379 0.755
## 3.833 12 1 0.490 0.0962 0.334 0.720
## 8.500 4 1 0.368 0.1284 0.185 0.729
plot(mfit, # plot Kaplan-Meier curve
ylab="S(t)",
xlab="Time since diagnosis (years)",
main = "Kaplan−Meier estimates of cause−specific survival")
ggsurvplot(mfit, # plot Kaplan-Meier curve
ylab="S(t)",
xlab="Time since diagnosis (years)",
main = "Kaplan−Meier estimates of cause−specific survival",
risk.table = TRUE,
conf.int = TRUE,
ggtheme = theme_minimal())