Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data
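We load the dependencies:
library(biostat3)  # provides the melanoma data and lifetab2()
library(survival)  # provides Surv() and survfit()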
Then load the data and define an indicator:
localised <- subset(biostat3::melanoma, stage == "Localised") |>
    transform(death_cancer = ifelse(status == "Dead: cancer", 1, 0))
Then we show the results using the actuarial estimator with years:
interval | tstart | tstop | nsubs | nlost | nrisk | nevent | surv |
0-1 | 0 | 1 | 5318 | 81 | 5277.5 | 71 | 1.0000000 |
1-2 | 1 | 2 | 5166 | 400 | 4966.0 | 228 | 0.9865467 |
2-3 | 2 | 3 | 4538 | 381 | 4347.5 | 202 | 0.9412521 |
3-4 | 3 | 4 | 3955 | 344 | 3783.0 | 138 | 0.8975183 |
4-5 | 4 | 5 | 3473 | 312 | 3317.0 | 100 | 0.8647777 |
5-6 | 5 | 6 | 3061 | 298 | 2912.0 | 80 | 0.8387066 |
6-7 | 6 | 7 | 2683 | 267 | 2549.5 | 56 | 0.8156652 |
7-8 | 7 | 8 | 2360 | 293 | 2213.5 | 35 | 0.7977491 |
8-9 | 8 | 9 | 2032 | 275 | 1894.5 | 34 | 0.7851350 |
9-10 | 9 | 10 | 1723 | 243 | 1601.5 | 16 | 0.7710445 |
10-11 | 10 | 11 | 1464 | 197 | 1365.5 | 18 | 0.7633412 |
11-12 | 11 | 12 | 1249 | 189 | 1154.5 | 17 | 0.7532789 |
12-13 | 12 | 13 | 1043 | 161 | 962.5 | 2 | 0.7421869 |
13-14 | 13 | 14 | 880 | 186 | 787.0 | 4 | 0.7406447 |
14-15 | 14 | 15 | 690 | 153 | 613.5 | 3 | 0.7368803 |
15-16 | 15 | 16 | 534 | 110 | 479.0 | 2 | 0.7332769 |
16-17 | 16 | 17 | 422 | 111 | 366.5 | 5 | 0.7302152 |
17-18 | 17 | 18 | 306 | 97 | 257.5 | 1 | 0.7202532 |
18-19 | 18 | 19 | 208 | 81 | 167.5 | 1 | 0.7174561 |
19-20 | 19 | 20 | 126 | 65 | 93.5 | 0 | 0.7131728 |
20-Inf | 20 | Inf | 61 | 61 | 30.5 | 0 | 0.7131728 |
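The nrisk and surv columns can be reproduced by hand from the counts. The following is a minimal sketch in base R, assuming the usual actuarial convention that subjects censored during an interval are at risk for half of it; the vector names simply mirror the column headers, and the counts are copied from the first three rows of the table.

```r
# Counts copied from the first three rows of the yearly table.
nsubs  <- c(5318, 5166, 4538)   # alive at the start of each interval
nlost  <- c(81, 400, 381)       # censored during the interval
nevent <- c(71, 228, 202)       # cancer deaths during the interval

# Effective number at risk: censored subjects count for half the interval.
nrisk <- nsubs - nlost / 2

# Conditional probability of surviving each interval.
p <- 1 - nevent / nrisk

# Survival at the START of each interval (1 at time zero), as in the surv column.
surv <- cumprod(c(1, p))[seq_along(p)]

data.frame(nsubs, nlost, nrisk, nevent, surv = round(surv, 7))
```

Note that surv is reported at tstart, which is why the first row is exactly 1 even though 71 deaths occur in that interval.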
Similarly, the actuarial estimator with months (only the intervals 109-110 to 129-130 are shown):
interval | tstart | tstop | nsubs | nlost | nrisk | nevent | surv |
109-110 | 109 | 110 | 1699 | 27 | 1685.5 | 1 | 0.7701209 |
110-111 | 110 | 111 | 1671 | 16 | 1663.0 | 1 | 0.7696640 |
111-112 | 111 | 112 | 1654 | 26 | 1641.0 | 1 | 0.7692012 |
112-113 | 112 | 113 | 1627 | 27 | 1613.5 | 1 | 0.7687325 |
113-114 | 113 | 114 | 1599 | 19 | 1589.5 | 0 | 0.7682560 |
114-115 | 114 | 115 | 1580 | 21 | 1569.5 | 0 | 0.7682560 |
115-116 | 115 | 116 | 1559 | 26 | 1546.0 | 1 | 0.7682560 |
116-117 | 116 | 117 | 1532 | 20 | 1522.0 | 2 | 0.7677591 |
117-118 | 117 | 118 | 1510 | 14 | 1503.0 | 1 | 0.7667502 |
118-119 | 118 | 119 | 1495 | 14 | 1488.0 | 4 | 0.7662401 |
119-120 | 119 | 120 | 1477 | 12 | 1471.0 | 1 | 0.7641803 |
120-121 | 120 | 121 | 1464 | 11 | 1458.5 | 1 | 0.7636608 |
121-122 | 121 | 122 | 1452 | 9 | 1447.5 | 4 | 0.7631372 |
122-123 | 122 | 123 | 1439 | 13 | 1432.5 | 2 | 0.7610284 |
123-124 | 123 | 124 | 1424 | 15 | 1416.5 | 4 | 0.7599658 |
124-125 | 124 | 125 | 1405 | 25 | 1392.5 | 0 | 0.7578198 |
125-126 | 125 | 126 | 1380 | 15 | 1372.5 | 0 | 0.7578198 |
126-127 | 126 | 127 | 1365 | 16 | 1357.0 | 2 | 0.7578198 |
127-128 | 127 | 128 | 1347 | 25 | 1334.5 | 2 | 0.7567029 |
128-129 | 128 | 129 | 1320 | 15 | 1312.5 | 0 | 0.7555688 |
129-130 | 129 | 130 | 1305 | 16 | 1297.0 | 1 | 0.7555688 |
Then the Kaplan-Meier estimator with years:
survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised) |>
    summary()
## Call: survfit(formula = Surv(surv_yy, death_cancer) ~ 1, data = localised)
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.5 5318 71 0.987 0.00157 0.984 0.990
## 1.5 5166 228 0.943 0.00320 0.937 0.949
## 2.5 4538 202 0.901 0.00420 0.893 0.909
## 3.5 3955 138 0.870 0.00483 0.860 0.879
## 4.5 3473 100 0.845 0.00530 0.834 0.855
## 5.5 3061 80 0.823 0.00571 0.811 0.834
## 6.5 2683 56 0.805 0.00603 0.794 0.817
## 7.5 2360 35 0.793 0.00627 0.781 0.806
## 8.5 2032 34 0.780 0.00657 0.767 0.793
## 9.5 1723 16 0.773 0.00675 0.760 0.786
## 10.5 1464 18 0.763 0.00703 0.750 0.777
## 11.5 1249 17 0.753 0.00737 0.739 0.768
## 12.5 1043 2 0.752 0.00743 0.737 0.766
## 13.5 880 4 0.748 0.00759 0.733 0.763
## 14.5 690 3 0.745 0.00779 0.730 0.760
## 15.5 534 2 0.742 0.00800 0.727 0.758
## 16.5 422 5 0.733 0.00882 0.716 0.751
## 17.5 306 1 0.731 0.00911 0.713 0.749
## 18.5 208 1 0.727 0.00972 0.709 0.747
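For comparison with the actuarial calculation, the survival column of this output can be rebuilt from n.risk and n.event alone. This is a small sketch in base R using the first three rows; the variable names are illustrative. Unlike the actuarial method, the denominator is the full number at risk, with no adjustment for those censored during the year.

```r
# n.risk and n.event copied from the first three rows of the yearly K-M output.
n_risk  <- c(5318, 5166, 4538)
n_event <- c(71, 228, 202)

# Kaplan-Meier product-limit estimate: everyone still under observation at the
# start of the year is counted as at risk for all events recorded in that year.
km <- cumprod(1 - n_event / n_risk)

round(km, 3)
```

The rounded values agree with the survival column above (0.987, 0.943, 0.901).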
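And the Kaplan-Meier estimator with data in months, with time rescaled to years. Only months 110 to 130 are shown, so n.event in the first row accumulates all events before that time: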
mfit_months <- survfit(Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
## Call: survfit(formula = Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 9.17 1671 948 0.770 0.00684 0.757 0.784
## 9.25 1654 1 0.770 0.00685 0.757 0.784
## 9.33 1627 1 0.770 0.00686 0.756 0.783
## 9.42 1599 1 0.769 0.00687 0.756 0.783
## 9.50 1580 0 0.769 0.00687 0.756 0.783
## 9.58 1559 0 0.769 0.00687 0.756 0.783
## 9.67 1532 1 0.769 0.00688 0.755 0.782
## 9.75 1510 2 0.768 0.00691 0.754 0.781
## 9.83 1495 1 0.767 0.00693 0.754 0.781
## 9.92 1477 4 0.765 0.00698 0.751 0.779
## 10.00 1464 1 0.764 0.00700 0.751 0.778
## 10.08 1452 1 0.764 0.00701 0.750 0.778
## 10.17 1439 4 0.762 0.00707 0.748 0.776
## 10.25 1424 2 0.761 0.00710 0.747 0.775
## 10.33 1405 4 0.759 0.00716 0.745 0.773
## 10.42 1380 0 0.759 0.00716 0.745 0.773
## 10.50 1365 0 0.759 0.00716 0.745 0.773
## 10.58 1347 2 0.758 0.00719 0.744 0.772
## 10.67 1320 2 0.756 0.00723 0.742 0.771
## 10.75 1305 0 0.756 0.00723 0.742 0.771
## 10.83 1288 1 0.756 0.00724 0.742 0.770
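The actuarial method is the more appropriate here because it handles ties (events and censorings recorded at the same time) more sensibly. Kaplan-Meier treats a censoring tied with an event as occurring just after the event, so everyone censored during an interval counts as at risk for all events in that interval. The actuarial method instead assumes censoring is spread evenly over the interval and counts only half of those censored as at risk. Because these data contain a reasonably large number of ties, the two estimates differ.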
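The difference between the two estimators can be written out explicitly. The notation below is introduced here for illustration and is not from the exercise: for the actuarial method, n_i, c_i and d_i are the numbers alive at the start of interval i, censored during it, and dying during it (nsubs, nlost and nevent in the tables); for Kaplan-Meier, n_j and d_j are the number at risk and the number of events at the j-th distinct event time t_j.

```latex
\begin{align*}
n_i' &= n_i - \tfrac{1}{2} c_i
  && \text{(effective number at risk, actuarial)} \\
\hat{S}_{\mathrm{act}}(t_i) &= \prod_{j < i} \left( 1 - \frac{d_j}{n_j'} \right)
  && \text{(survival at the start of interval } i\text{)} \\
\hat{S}_{\mathrm{KM}}(t) &= \prod_{j \,:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right)
  && \text{(Kaplan-Meier)} \\[1ex]
\text{First year:}\quad
1 - \frac{71}{5277.5} &\approx 0.98655
\quad \text{versus} \quad
1 - \frac{71}{5318} \approx 0.98665
\end{align*}
```

With yearly data every censoring in a year is tied with that year's events, so the Kaplan-Meier denominator is too large in every interval and the discrepancy accumulates over time.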
The K-M estimate changes more when we move from years to months. Because the actuarial method already allows for events and censorings tied within an interval, it is not biased when the data are heavily tied, and so it is little affected when the number of ties is reduced.
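The size of the change can be read off the outputs above at about ten years. The sketch below assumes that the yearly K-M value at time 9.5 is the estimate after all events recorded in the tenth year, and that the actuarial surv values at the start of intervals 10-11 and 120-121 are the corresponding ten-year estimates; the variable names are illustrative.

```r
# Ten-year survival estimates copied from the four outputs.
km_years   <- 0.773      # K-M, yearly data, time 9.5
km_months  <- 0.764      # K-M, monthly data, time 10.00
act_years  <- 0.7633412  # actuarial, yearly data, start of interval 10-11
act_months <- 0.7636608  # actuarial, monthly data, start of interval 120-121

# Absolute change in each estimator when moving from yearly to monthly data.
km_change  <- abs(km_years - km_months)
act_change <- abs(act_years - act_months)

c(km = km_change, actuarial = act_change)
```

Under these assumptions the K-M estimate moves by about 0.009, while the actuarial estimate moves by about 0.0003.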
The plot clearly shows that the Kaplan-Meier estimator with the aggregated (yearly) data is upwardly biased compared with the other three curves.
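plot(mfit_months, conf.int=FALSE,
     ylim=c(0.7,1), cex=5,
     xlab="Time from cancer diagnosis (years)")
survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised) |>
    lines(col="red", conf.int=FALSE)
## the lines() calls for the lifetab2 objects are reconstructed from the legend colours
lifetab2(Surv(surv_mm/12,death_cancer)~1, data = localised) |>
    lines(col="orange")
lifetab2(Surv(floor(surv_yy),death_cancer)~1, data = localised) |>
    lines(col="green")
legend("topright",  # position is an assumption
       legend=c("KM (months)", "KM (years)", "Actuarial (months)", "Actuarial (years)"),
       col=c("black", "red", "orange", "green"),
       lty=1)       # line type is an assumption; needed for the legend to draw lines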