Exercise 1: Life tables and Kaplan-Meier estimates of survival

(a) Hand calculation: Life table and Kaplan-Meier estimates of survival

The results are contained in an Excel file and are also shown in the R output below.

(b) Using R to validate the hand calculations done in part 1 (a)

First, load the biostat3 library for the dataset , survminer for plots and knitr for nice html output:

library(biostat3)
library(survminer)
library(knitr)

Following are the life table estimates. Note that in the lectures, when we estimated all-cause survival, there were 8 deaths in the first interval. One of these died of a cause other than cancer so in the cause-specific survival analysis we see that there are 7 ‘deaths’ and 1 censoring (Stata uses the term ‘lost’ for lost to follow-up) in the first interval.

lifetab2(Surv(floor(surv_yy), status == "Dead: cancer")~1,
         colon_sample, breaks=0:10) |>
    kable("html", digits=2)
tstart tstop nsubs nlost nrisk nevent surv pdf hazard se.surv se.pdf se.hazard
0-1 0 1 35 1 34.5 7 1.00 0.20 0.23 0.00 0.07 0.08
1-2 1 2 27 3 25.5 1 0.80 0.03 0.04 0.07 0.03 0.04
2-3 2 3 23 4 21.0 5 0.77 0.18 0.27 0.07 0.07 0.12
3-4 3 4 14 1 13.5 2 0.58 0.09 0.16 0.09 0.06 0.11
4-5 4 5 11 1 10.5 0 0.50 0.00 0.00 0.10 NaN NaN
5-6 5 6 10 0 10.0 0 0.50 0.00 0.00 0.10 NaN NaN
6-7 6 7 10 3 8.5 0 0.50 0.00 0.00 0.10 NaN NaN
7-8 7 8 7 1 6.5 0 0.50 0.00 0.00 0.10 NaN NaN
8-9 8 9 6 4 4.0 1 0.50 0.12 0.29 0.10 0.11 0.28
9-10 9 10 1 1 0.5 0 0.37 0.00 0.00 0.13 NaN NaN
10-Inf 10 Inf 0 0 0.0 0 0.37 NA NA 0.13 NA NA

Following is a table of Kaplan-Meier estimates. Although it’s not clear from the table, the person censored (lost) at time 2 was at risk when the other person dies at time 2. On the following is a graph of the survival function.

mfit <- survfit(Surv(surv_mm/12, status == "Dead: cancer") ~ 1,
                data = colon_sample) # make Kaplan-Meier estimates
summary(mfit) # print Kaplan-Meier table
## Call: survfit(formula = Surv(surv_mm/12, status == "Dead: cancer") ~ 
##     1, data = colon_sample)
## 
##   time n.risk n.event survival std.err lower 95% CI upper 95% CI
##  0.167     35       1    0.971  0.0282        0.918        1.000
##  0.250     33       1    0.942  0.0398        0.867        1.000
##  0.417     32       1    0.913  0.0482        0.823        1.000
##  0.583     31       1    0.883  0.0549        0.782        0.998
##  0.667     30       1    0.854  0.0605        0.743        0.981
##  0.750     29       1    0.824  0.0652        0.706        0.962
##  0.917     28       1    0.795  0.0692        0.670        0.943
##  1.833     24       1    0.762  0.0738        0.630        0.921
##  2.250     22       1    0.727  0.0781        0.589        0.898
##  2.333     20       1    0.691  0.0823        0.547        0.872
##  2.667     19       2    0.618  0.0882        0.467        0.818
##  2.750     16       1    0.579  0.0908        0.426        0.788
##  3.583     13       1    0.535  0.0941        0.379        0.755
##  3.833     12       1    0.490  0.0962        0.334        0.720
##  8.500      4       1    0.368  0.1284        0.185        0.729
plot(mfit,    # plot Kaplan-Meier curve
     ylab="S(t)",
     xlab="Time since diagnosis (years)",
     main = "Kaplan−Meier estimates of cause−specific survival")

ggsurvplot(mfit, # plot Kaplan-Meier curve
     ylab="S(t)",
     xlab="Time since diagnosis (years)",
     main = "Kaplan−Meier estimates of cause−specific survival",
     risk.table = TRUE,
     conf.int = TRUE,
     ggtheme = theme_minimal())