Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data

We load the dependencies:

library(biostat3) 
library(knitr) # kable

Then load the data and define an indicator:

localised <- subset(biostat3::melanoma, stage=="Localised") |>
    transform(death_cancer = ifelse(status == "Dead: cancer", 1, 0))

Then we show the results using the actuarial estimator with years:

lifetab2(Surv(floor(surv_yy),death_cancer)~1, data = localised)[,1:7] |>
    kable("html")
tstart tstop nsubs nlost nrisk nevent surv
0-1 0 1 5318 81 5277.5 71 1.0000000
1-2 1 2 5166 400 4966.0 228 0.9865467
2-3 2 3 4538 381 4347.5 202 0.9412521
3-4 3 4 3955 344 3783.0 138 0.8975183
4-5 4 5 3473 312 3317.0 100 0.8647777
5-6 5 6 3061 298 2912.0 80 0.8387066
6-7 6 7 2683 267 2549.5 56 0.8156652
7-8 7 8 2360 293 2213.5 35 0.7977491
8-9 8 9 2032 275 1894.5 34 0.7851350
9-10 9 10 1723 243 1601.5 16 0.7710445
10-11 10 11 1464 197 1365.5 18 0.7633412
11-12 11 12 1249 189 1154.5 17 0.7532789
12-13 12 13 1043 161 962.5 2 0.7421869
13-14 13 14 880 186 787.0 4 0.7406447
14-15 14 15 690 153 613.5 3 0.7368803
15-16 15 16 534 110 479.0 2 0.7332769
16-17 16 17 422 111 366.5 5 0.7302152
17-18 17 18 306 97 257.5 1 0.7202532
18-19 18 19 208 81 167.5 1 0.7174561
19-20 19 20 126 65 93.5 0 0.7131728
20-Inf 20 Inf 61 61 30.5 0 0.7131728

Similarly, we use the actuarial estimator using months:

lifetab2(Surv(floor(surv_mm),death_cancer)~1, data = localised)[110:130,1:7] |>
    kable("html")
tstart tstop nsubs nlost nrisk nevent surv
109-110 109 110 1699 27 1685.5 1 0.7701209
110-111 110 111 1671 16 1663.0 1 0.7696640
111-112 111 112 1654 26 1641.0 1 0.7692012
112-113 112 113 1627 27 1613.5 1 0.7687325
113-114 113 114 1599 19 1589.5 0 0.7682560
114-115 114 115 1580 21 1569.5 0 0.7682560
115-116 115 116 1559 26 1546.0 1 0.7682560
116-117 116 117 1532 20 1522.0 2 0.7677591
117-118 117 118 1510 14 1503.0 1 0.7667502
118-119 118 119 1495 14 1488.0 4 0.7662401
119-120 119 120 1477 12 1471.0 1 0.7641803
120-121 120 121 1464 11 1458.5 1 0.7636608
121-122 121 122 1452 9 1447.5 4 0.7631372
122-123 122 123 1439 13 1432.5 2 0.7610284
123-124 123 124 1424 15 1416.5 4 0.7599658
124-125 124 125 1405 25 1392.5 0 0.7578198
125-126 125 126 1380 15 1372.5 0 0.7578198
126-127 126 127 1365 16 1357.0 2 0.7578198
127-128 127 128 1347 25 1334.5 2 0.7567029
128-129 128 129 1320 15 1312.5 0 0.7555688
129-130 129 130 1305 16 1297.0 1 0.7555688

Then the code using the Kaplan-Meier estimator with years:

mfit_years <- survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised)
summary(mfit_years)
## Call: survfit(formula = Surv(surv_yy, death_cancer) ~ 1, data = localised)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.5   5318      71    0.987 0.00157        0.984        0.990
##   1.5   5166     228    0.943 0.00320        0.937        0.949
##   2.5   4538     202    0.901 0.00420        0.893        0.909
##   3.5   3955     138    0.870 0.00483        0.860        0.879
##   4.5   3473     100    0.845 0.00530        0.834        0.855
##   5.5   3061      80    0.823 0.00571        0.811        0.834
##   6.5   2683      56    0.805 0.00603        0.794        0.817
##   7.5   2360      35    0.793 0.00627        0.781        0.806
##   8.5   2032      34    0.780 0.00657        0.767        0.793
##   9.5   1723      16    0.773 0.00675        0.760        0.786
##  10.5   1464      18    0.763 0.00703        0.750        0.777
##  11.5   1249      17    0.753 0.00737        0.739        0.768
##  12.5   1043       2    0.752 0.00743        0.737        0.766
##  13.5    880       4    0.748 0.00759        0.733        0.763
##  14.5    690       3    0.745 0.00779        0.730        0.760
##  15.5    534       2    0.742 0.00800        0.727        0.758
##  16.5    422       5    0.733 0.00882        0.716        0.751
##  17.5    306       1    0.731 0.00911        0.713        0.749
##  18.5    208       1    0.727 0.00972        0.709        0.747

And the Kaplan-Meier estimator with data in months:

mfit_months <- survfit(Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
summary(mfit_months,times=(110:130)/12)
## Call: survfit(formula = Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
## 
##   time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   9.17   1671     948    0.770 0.00684        0.757        0.784
##   9.25   1654       1    0.770 0.00685        0.757        0.784
##   9.33   1627       1    0.770 0.00686        0.756        0.783
##   9.42   1599       1    0.769 0.00687        0.756        0.783
##   9.50   1580       0    0.769 0.00687        0.756        0.783
##   9.58   1559       0    0.769 0.00687        0.756        0.783
##   9.67   1532       1    0.769 0.00688        0.755        0.782
##   9.75   1510       2    0.768 0.00691        0.754        0.781
##   9.83   1495       1    0.767 0.00693        0.754        0.781
##   9.92   1477       4    0.765 0.00698        0.751        0.779
##  10.00   1464       1    0.764 0.00700        0.751        0.778
##  10.08   1452       1    0.764 0.00701        0.750        0.778
##  10.17   1439       4    0.762 0.00707        0.748        0.776
##  10.25   1424       2    0.761 0.00710        0.747        0.775
##  10.33   1405       4    0.759 0.00716        0.745        0.773
##  10.42   1380       0    0.759 0.00716        0.745        0.773
##  10.50   1365       0    0.759 0.00716        0.745        0.773
##  10.58   1347       2    0.758 0.00719        0.744        0.772
##  10.67   1320       2    0.756 0.00723        0.742        0.771
##  10.75   1305       0    0.756 0.00723        0.742        0.771
##  10.83   1288       1    0.756 0.00724        0.742        0.770

(a)

The actuarial method is most appropriate because it deals with ties (events and censorings at the same time) in a more appropriate manner. The fact that there are a reasonably large number of ties in these data means that there is a difference between the estimates.

(b)

The K-M estimate changes more. Because the actuarial method deals with ties in an appropriate manner it is not biased when data are heavily tied so is not heavily affected when we reduce the number of ties.

(c)

The plot clearly shows that the Kaplan-Meier estimator with the aggregated data is upwardly biased compared with the other curves.

plot(mfit_months, conf.int=FALSE,
     ylim=c(0.7,1), cex=5,
     xlab="Time from cancer diagnosis (years)",
     ylab="Survival")
survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised) |>
    lines(col="red", conf.int=FALSE)
lifetab2(Surv(surv_mm/12,death_cancer)~1, data = localised) |>
    lines(col="orange")
lifetab2(Surv(floor(surv_yy),death_cancer)~1, data = localised) |>
    lines(col="green")
legend("topright",
       legend=c("KM (months)", "KM (years)", "Actuarial (months)", "Actuarial (years)"),
       lty=1,
       col=c("black", "red", "orange", "green"),
       bty="n")