Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data
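We load the dependencies (`Surv` and `survfit` are from the survival package; `lifetab2` and the `melanoma` data are assumed here to come from the biostat3 package):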
Then we load the data, restrict it to localised melanoma, and define an indicator for death due to cancer:
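data(melanoma)
# Keep only patients with localised disease
localised <- subset(melanoma, stage == "Localised")
# Event indicator: 1 if the patient died from cancer, 0 otherwise
# (all other outcomes are treated as censored in the analyses below)
localised <- transform(localised,
                       death_cancer = ifelse(status == "Dead: cancer", 1, 0))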
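Then we show the results from the actuarial (life table) estimator with survival time grouped into years. Note that the `surv` column gives the estimated survival at the start of each interval (`tstart`), which is why the first row is 1: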
## tstart tstop nsubs nlost nrisk nevent surv
## 0-1 0 1 5318 81 5277.5 71 1.0000000
## 1-2 1 2 5166 400 4966.0 228 0.9865467
## 2-3 2 3 4538 381 4347.5 202 0.9412521
## 3-4 3 4 3955 344 3783.0 138 0.8975183
## 4-5 4 5 3473 312 3317.0 100 0.8647777
## 5-6 5 6 3061 298 2912.0 80 0.8387066
## 6-7 6 7 2683 267 2549.5 56 0.8156652
## 7-8 7 8 2360 293 2213.5 35 0.7977491
## 8-9 8 9 2032 275 1894.5 34 0.7851350
## 9-10 9 10 1723 243 1601.5 16 0.7710445
## 10-11 10 11 1464 197 1365.5 18 0.7633412
## 11-12 11 12 1249 189 1154.5 17 0.7532789
## 12-13 12 13 1043 161 962.5 2 0.7421869
## 13-14 13 14 880 186 787.0 4 0.7406447
## 14-15 14 15 690 153 613.5 3 0.7368803
## 15-16 15 16 534 110 479.0 2 0.7332769
## 16-17 16 17 422 111 366.5 5 0.7302152
## 17-18 17 18 306 97 257.5 1 0.7202532
## 18-19 18 19 208 81 167.5 1 0.7174561
## 19-20 19 20 126 65 93.5 0 0.7131728
## 20-Inf 20 Inf 61 61 30.5 0 0.7131728
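The `nrisk` and `surv` columns above can be reproduced by hand, which makes the actuarial adjustment for censoring explicit. The sketch below copies the first three rows of the table and assumes the usual life table rule that half of the subjects lost during an interval are removed from the risk set. The variable names are illustrative only.

```r
# First three intervals, copied from the life table above
nsubs  <- c(5318, 5166, 4538)  # under observation at the start of each interval
nlost  <- c(81, 400, 381)      # censored during the interval
nevent <- c(71, 228, 202)      # cancer deaths during the interval

# Effective number at risk: half of the censored subjects are removed
nrisk <- nsubs - nlost / 2

# Survival at the end of each interval is the running product of
# the interval-specific survival proportions
surv_end <- cumprod(1 - nevent / nrisk)

data.frame(nrisk, surv_end = round(surv_end, 7))
```

The values of `surv_end` match the `surv` column one row further down, because the table reports survival at the start of each interval.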
Similarly, we use the actuarial estimator with survival time grouped into months (only the intervals from 109-110 to 129-130 are shown):
## tstart tstop nsubs nlost nrisk nevent surv
## 109-110 109 110 1699 27 1685.5 1 0.7701209
## 110-111 110 111 1671 16 1663.0 1 0.7696640
## 111-112 111 112 1654 26 1641.0 1 0.7692012
## 112-113 112 113 1627 27 1613.5 1 0.7687325
## 113-114 113 114 1599 19 1589.5 0 0.7682560
## 114-115 114 115 1580 21 1569.5 0 0.7682560
## 115-116 115 116 1559 26 1546.0 1 0.7682560
## 116-117 116 117 1532 20 1522.0 2 0.7677591
## 117-118 117 118 1510 14 1503.0 1 0.7667502
## 118-119 118 119 1495 14 1488.0 4 0.7662401
## 119-120 119 120 1477 12 1471.0 1 0.7641803
## 120-121 120 121 1464 11 1458.5 1 0.7636608
## 121-122 121 122 1452 9 1447.5 4 0.7631372
## 122-123 122 123 1439 13 1432.5 2 0.7610284
## 123-124 123 124 1424 15 1416.5 4 0.7599658
## 124-125 124 125 1405 25 1392.5 0 0.7578198
## 125-126 125 126 1380 15 1372.5 0 0.7578198
## 126-127 126 127 1365 16 1357.0 2 0.7578198
## 127-128 127 128 1347 25 1334.5 2 0.7567029
## 128-129 128 129 1320 15 1312.5 0 0.7555688
## 129-130 129 130 1305 16 1297.0 1 0.7555688
Then we use the Kaplan-Meier estimator with survival time in years:
## Call: survfit(formula = Surv(surv_yy, death_cancer) ~ 1, data = localised)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.5 5318 71 0.987 0.00157 0.984 0.990
## 1.5 5166 228 0.943 0.00320 0.937 0.949
## 2.5 4538 202 0.901 0.00420 0.893 0.909
## 3.5 3955 138 0.870 0.00483 0.860 0.879
## 4.5 3473 100 0.845 0.00530 0.834 0.855
## 5.5 3061 80 0.823 0.00571 0.811 0.834
## 6.5 2683 56 0.805 0.00603 0.794 0.817
## 7.5 2360 35 0.793 0.00627 0.781 0.806
## 8.5 2032 34 0.780 0.00657 0.767 0.793
## 9.5 1723 16 0.773 0.00675 0.760 0.786
## 10.5 1464 18 0.763 0.00703 0.750 0.777
## 11.5 1249 17 0.753 0.00737 0.739 0.768
## 12.5 1043 2 0.752 0.00743 0.737 0.766
## 13.5 880 4 0.748 0.00759 0.733 0.763
## 14.5 690 3 0.745 0.00779 0.730 0.760
## 15.5 534 2 0.742 0.00800 0.727 0.758
## 16.5 422 5 0.733 0.00882 0.716 0.751
## 17.5 306 1 0.731 0.00911 0.713 0.749
## 18.5 208 1 0.727 0.00972 0.709 0.747
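The first few Kaplan-Meier estimates above can be reproduced from the `n.risk` and `n.event` columns. This is a minimal sketch assuming the standard product-limit formula, with illustrative variable names. The denominator is everyone at risk at the start of the year, including subjects whose censoring is recorded at the same time as the deaths.

```r
# First three rows of the Kaplan-Meier output above
n_risk  <- c(5318, 5166, 4538)  # at risk just before each event time
n_event <- c(71, 228, 202)      # cancer deaths at each event time

# Product-limit estimate: running product of (1 - deaths / at risk)
km <- cumprod(1 - n_event / n_risk)

round(km, 3)
```

Rounded to three decimals, these agree with the `survival` column at times 0.5, 1.5 and 2.5.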
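And the Kaplan-Meier estimator with survival time in months. Because `summary()` is asked to report only months 110 to 130, `n.event` in the first row counts all events up to month 110: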
mfit_months <- survfit(Surv(surv_mm, death_cancer) ~ 1, data = localised)
summary(mfit_months,times=110:130)
## Call: survfit(formula = Surv(surv_mm, death_cancer) ~ 1, data = localised)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 110 1671 948 0.770 0.00684 0.757 0.784
## 111 1654 1 0.770 0.00685 0.757 0.784
## 112 1627 1 0.770 0.00686 0.756 0.783
## 113 1599 1 0.769 0.00687 0.756 0.783
## 114 1580 0 0.769 0.00687 0.756 0.783
## 115 1559 0 0.769 0.00687 0.756 0.783
## 116 1532 1 0.769 0.00688 0.755 0.782
## 117 1510 2 0.768 0.00691 0.754 0.781
## 118 1495 1 0.767 0.00693 0.754 0.781
## 119 1477 4 0.765 0.00698 0.751 0.779
## 120 1464 1 0.764 0.00700 0.751 0.778
## 121 1452 1 0.764 0.00701 0.750 0.778
## 122 1439 4 0.762 0.00707 0.748 0.776
## 123 1424 2 0.761 0.00710 0.747 0.775
## 124 1405 4 0.759 0.00716 0.745 0.773
## 125 1380 0 0.759 0.00716 0.745 0.773
## 126 1365 0 0.759 0.00716 0.745 0.773
## 127 1347 2 0.758 0.00719 0.744 0.772
## 128 1320 2 0.756 0.00723 0.742 0.771
## 129 1305 0 0.756 0.00723 0.742 0.771
## 130 1288 1 0.756 0.00724 0.742 0.770
(a)
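The actuarial method is more appropriate here because it handles ties (events and censorings recorded at the same time) more sensibly. It assumes that censoring occurs uniformly within each interval, so only half of the subjects censored in an interval are counted as at risk (the `nrisk` column), whereas the Kaplan-Meier estimator counts every subject censored at a tied time as at risk when the events occur. There are a reasonably large number of ties in the annual data, so the two estimates differ.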
(b)
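The Kaplan-Meier estimate changes more: the 10-year (120-month) estimate falls from 0.773 with annual data to 0.764 with monthly data, whereas the actuarial estimate is essentially unchanged (0.7633 versus 0.7637). Because the actuarial method handles ties appropriately, it is not biased when the data are heavily tied, so it is barely affected when we reduce the number of ties.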
(c)
The plot clearly shows that the Kaplan-Meier estimator with the aggregated (annual) data is upwardly biased compared with the other three curves, which lie close to one another.
# Kaplan-Meier curve from the monthly data (black)
plot(mfit_months, conf.int=FALSE,
     ylim=c(0.7,1), cex=5,
     xlab="Time from cancer diagnosis (months)",
     ylab="Survival")
# Actuarial estimates from annual data (rescaled to months) and from monthly data
lty <- lifetab2(Surv(floor(surv_yy)*12,death_cancer)~1, data = localised)
ltm <- lifetab2(Surv(floor(surv_mm),death_cancer)~1, data = localised)
# Kaplan-Meier estimate from annual data, rescaled to months
mfit_years <- survfit(Surv(surv_yy*12, death_cancer) ~ 1, data = localised)
lines(lty, col="green")
lines(ltm, col="orange")
lines(mfit_years, col="red", conf.int=FALSE)
legend("topright",
       legend=c("KM (months)", "KM (years)", "Actuarial (months)", "Actuarial (years)"),
       lty=1,
       col=c("black", "red", "orange", "green"),
       bty="n")
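As a rough check of the upward bias described in (c), the following sketch repeats the comparison on simulated data where the true survival is known. Everything here is an assumption made for illustration: exponential event times with rate 0.05 per year, uniform censoring over 20 years, and the variable names. It does not use the melanoma data.

```r
library(survival)

set.seed(1)
n <- 5000
event_time  <- rexp(n, rate = 0.05)         # hypothetical true event times (years)
censor_time <- runif(n, min = 0, max = 20)  # hypothetical censoring times (years)
time  <- pmin(event_time, censor_time)      # observed follow-up time
event <- as.numeric(event_time <= censor_time)

# Kaplan-Meier on the exact times (essentially no ties)
km_exact <- survfit(Surv(time, event) ~ 1)

# Kaplan-Meier on times grouped to the middle of each year (heavily tied)
km_grouped <- survfit(Surv(floor(time) + 0.5, event) ~ 1)

# Estimated 10-year survival from each fit, next to the true value
s_exact   <- summary(km_exact, times = 10)$surv
s_grouped <- summary(km_grouped, times = 10)$surv
c(exact = s_exact, grouped = s_grouped, true = exp(-0.05 * 10))
```

With grouped times, subjects censored during a year are still counted in the risk set for that year's deaths, so the grouped estimate would be expected to sit above the estimate from the exact times.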