Biostatistics III in R

Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data

We load the dependencies:

library(biostat3) 
library(knitr) # kable

Then load the data and define an indicator:

localised <- subset(biostat3::melanoma, stage=="Localised") |>
    transform(death_cancer = ifelse(status == "Dead: cancer", 1, 0))

Then we show the results using the actuarial estimator with years:

lifetab2(Surv(floor(surv_yy),death_cancer)~1, data = localised)[,1:7] |>
    kable("html")

	tstart	tstop	nsubs	nlost	nrisk	nevent	surv
0-1	0	1	5318	81	5277.5	71	1.0000000
1-2	1	2	5166	400	4966.0	228	0.9865467
2-3	2	3	4538	381	4347.5	202	0.9412521
3-4	3	4	3955	344	3783.0	138	0.8975183
4-5	4	5	3473	312	3317.0	100	0.8647777
5-6	5	6	3061	298	2912.0	80	0.8387066
6-7	6	7	2683	267	2549.5	56	0.8156652
7-8	7	8	2360	293	2213.5	35	0.7977491
8-9	8	9	2032	275	1894.5	34	0.7851350
9-10	9	10	1723	243	1601.5	16	0.7710445
10-11	10	11	1464	197	1365.5	18	0.7633412
11-12	11	12	1249	189	1154.5	17	0.7532789
12-13	12	13	1043	161	962.5	2	0.7421869
13-14	13	14	880	186	787.0	4	0.7406447
14-15	14	15	690	153	613.5	3	0.7368803
15-16	15	16	534	110	479.0	2	0.7332769
16-17	16	17	422	111	366.5	5	0.7302152
17-18	17	18	306	97	257.5	1	0.7202532
18-19	18	19	208	81	167.5	1	0.7174561
19-20	19	20	126	65	93.5	0	0.7131728
20-Inf	20	Inf	61	61	30.5	0	0.7131728

Similarly, we use the actuarial estimator using months:

lifetab2(Surv(floor(surv_mm),death_cancer)~1, data = localised)[110:130,1:7] |>
    kable("html")

	tstart	tstop	nsubs	nlost	nrisk	nevent	surv
109-110	109	110	1699	27	1685.5	1	0.7701209
110-111	110	111	1671	16	1663.0	1	0.7696640
111-112	111	112	1654	26	1641.0	1	0.7692012
112-113	112	113	1627	27	1613.5	1	0.7687325
113-114	113	114	1599	19	1589.5	0	0.7682560
114-115	114	115	1580	21	1569.5	0	0.7682560
115-116	115	116	1559	26	1546.0	1	0.7682560
116-117	116	117	1532	20	1522.0	2	0.7677591
117-118	117	118	1510	14	1503.0	1	0.7667502
118-119	118	119	1495	14	1488.0	4	0.7662401
119-120	119	120	1477	12	1471.0	1	0.7641803
120-121	120	121	1464	11	1458.5	1	0.7636608
121-122	121	122	1452	9	1447.5	4	0.7631372
122-123	122	123	1439	13	1432.5	2	0.7610284
123-124	123	124	1424	15	1416.5	4	0.7599658
124-125	124	125	1405	25	1392.5	0	0.7578198
125-126	125	126	1380	15	1372.5	0	0.7578198
126-127	126	127	1365	16	1357.0	2	0.7578198
127-128	127	128	1347	25	1334.5	2	0.7567029
128-129	128	129	1320	15	1312.5	0	0.7555688
129-130	129	130	1305	16	1297.0	1	0.7555688

Then the code using the Kaplan-Meier estimator with years:

mfit_years <- survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised)
summary(mfit_years)

## Call: survfit(formula = Surv(surv_yy, death_cancer) ~ 1, data = localised)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.5   5318      71    0.987 0.00157        0.984        0.990
##   1.5   5166     228    0.943 0.00320        0.937        0.949
##   2.5   4538     202    0.901 0.00420        0.893        0.909
##   3.5   3955     138    0.870 0.00483        0.860        0.879
##   4.5   3473     100    0.845 0.00530        0.834        0.855
##   5.5   3061      80    0.823 0.00571        0.811        0.834
##   6.5   2683      56    0.805 0.00603        0.794        0.817
##   7.5   2360      35    0.793 0.00627        0.781        0.806
##   8.5   2032      34    0.780 0.00657        0.767        0.793
##   9.5   1723      16    0.773 0.00675        0.760        0.786
##  10.5   1464      18    0.763 0.00703        0.750        0.777
##  11.5   1249      17    0.753 0.00737        0.739        0.768
##  12.5   1043       2    0.752 0.00743        0.737        0.766
##  13.5    880       4    0.748 0.00759        0.733        0.763
##  14.5    690       3    0.745 0.00779        0.730        0.760
##  15.5    534       2    0.742 0.00800        0.727        0.758
##  16.5    422       5    0.733 0.00882        0.716        0.751
##  17.5    306       1    0.731 0.00911        0.713        0.749
##  18.5    208       1    0.727 0.00972        0.709        0.747

And the Kaplan-Meier estimator with data in months:

mfit_months <- survfit(Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
summary(mfit_months,times=(110:130)/12)

## Call: survfit(formula = Surv(surv_mm/12, death_cancer) ~ 1, data = localised)
## 
##   time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   9.17   1671     948    0.770 0.00684        0.757        0.784
##   9.25   1654       1    0.770 0.00685        0.757        0.784
##   9.33   1627       1    0.770 0.00686        0.756        0.783
##   9.42   1599       1    0.769 0.00687        0.756        0.783
##   9.50   1580       0    0.769 0.00687        0.756        0.783
##   9.58   1559       0    0.769 0.00687        0.756        0.783
##   9.67   1532       1    0.769 0.00688        0.755        0.782
##   9.75   1510       2    0.768 0.00691        0.754        0.781
##   9.83   1495       1    0.767 0.00693        0.754        0.781
##   9.92   1477       4    0.765 0.00698        0.751        0.779
##  10.00   1464       1    0.764 0.00700        0.751        0.778
##  10.08   1452       1    0.764 0.00701        0.750        0.778
##  10.17   1439       4    0.762 0.00707        0.748        0.776
##  10.25   1424       2    0.761 0.00710        0.747        0.775
##  10.33   1405       4    0.759 0.00716        0.745        0.773
##  10.42   1380       0    0.759 0.00716        0.745        0.773
##  10.50   1365       0    0.759 0.00716        0.745        0.773
##  10.58   1347       2    0.758 0.00719        0.744        0.772
##  10.67   1320       2    0.756 0.00723        0.742        0.771
##  10.75   1305       0    0.756 0.00723        0.742        0.771
##  10.83   1288       1    0.756 0.00724        0.742        0.770

(a)

The actuarial method is most appropriate because it deals with ties (events and censorings at the same time) in a more appropriate manner. The fact that there are a reasonably large number of ties in these data means that there is a difference between the estimates.

(b)

The K-M estimate changes more. Because the actuarial method deals with ties in an appropriate manner it is not biased when data are heavily tied so is not heavily affected when we reduce the number of ties.

(c)

The plot clearly shows that the Kaplan-Meier estimator with the aggregated data is upwardly biased compared with the other curves.

plot(mfit_months, conf.int=FALSE,
     ylim=c(0.7,1), cex=5,
     xlab="Time from cancer diagnosis (years)",
     ylab="Survival")
survfit(Surv(surv_yy, death_cancer) ~ 1, data = localised) |>
    lines(col="red", conf.int=FALSE)
lifetab2(Surv(surv_mm/12,death_cancer)~1, data = localised) |>
    lines(col="orange")
lifetab2(Surv(floor(surv_yy),death_cancer)~1, data = localised) |>
    lines(col="green")
legend("topright",
       legend=c("KM (months)", "KM (years)", "Actuarial (months)", "Actuarial (years)"),
       lty=1,
       col=c("black", "red", "orange", "green"),
       bty="n")

Biostatistics III in R

Code by Annika Tillander, Andreas Karlsson and Mark Clements

Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data

(a)

(b)

(c)