Exercise 4. Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data

We load the dependencies:

Then load the data and define an indicator:

Then we show the results using the actuarial estimator with years:

##        tstart tstop nsubs nlost  nrisk nevent      surv
## 0-1         0     1  5318    81 5277.5     71 1.0000000
## 1-2         1     2  5166   400 4966.0    228 0.9865467
## 2-3         2     3  4538   381 4347.5    202 0.9412521
## 3-4         3     4  3955   344 3783.0    138 0.8975183
## 4-5         4     5  3473   312 3317.0    100 0.8647777
## 5-6         5     6  3061   298 2912.0     80 0.8387066
## 6-7         6     7  2683   267 2549.5     56 0.8156652
## 7-8         7     8  2360   293 2213.5     35 0.7977491
## 8-9         8     9  2032   275 1894.5     34 0.7851350
## 9-10        9    10  1723   243 1601.5     16 0.7710445
## 10-11      10    11  1464   197 1365.5     18 0.7633412
## 11-12      11    12  1249   189 1154.5     17 0.7532789
## 12-13      12    13  1043   161  962.5      2 0.7421869
## 13-14      13    14   880   186  787.0      4 0.7406447
## 14-15      14    15   690   153  613.5      3 0.7368803
## 15-16      15    16   534   110  479.0      2 0.7332769
## 16-17      16    17   422   111  366.5      5 0.7302152
## 17-18      17    18   306    97  257.5      1 0.7202532
## 18-19      18    19   208    81  167.5      1 0.7174561
## 19-20      19    20   126    65   93.5      0 0.7131728
## 20-Inf     20   Inf    61    61   30.5      0 0.7131728
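
As a check on how the life table is built, the first four rows can be reproduced by hand in base R. This is a minimal sketch rather than the code that produced the table: the counts are copied from the output above, and the variable names simply mirror the column headers.

```r
# First four intervals copied from the annual life table above
nsubs  <- c(5318, 5166, 4538, 3955)  # alive at the start of each interval
nlost  <- c(81, 400, 381, 344)       # censored during the interval
nevent <- c(71, 228, 202, 138)       # cancer deaths during the interval

# Actuarial adjustment: censored subjects count as at risk for half the interval
nrisk <- nsubs - nlost / 2

# Conditional probability of surviving each interval
p <- 1 - nevent / nrisk

# Survival at the start of each interval (1 for the first), as in the surv column
surv <- c(1, cumprod(p)[-length(p)])

print(data.frame(nsubs, nlost, nrisk, nevent, surv = round(surv, 7)))
```

The computed `nrisk` and `surv` columns agree with the printed table, which confirms that `surv` is reported at the start of each interval.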

Similarly, we apply the actuarial estimator with survival time in months (only the intervals from 109 to 130 months are shown):

##         tstart tstop nsubs nlost  nrisk nevent      surv
## 109-110    109   110  1699    27 1685.5      1 0.7701209
## 110-111    110   111  1671    16 1663.0      1 0.7696640
## 111-112    111   112  1654    26 1641.0      1 0.7692012
## 112-113    112   113  1627    27 1613.5      1 0.7687325
## 113-114    113   114  1599    19 1589.5      0 0.7682560
## 114-115    114   115  1580    21 1569.5      0 0.7682560
## 115-116    115   116  1559    26 1546.0      1 0.7682560
## 116-117    116   117  1532    20 1522.0      2 0.7677591
## 117-118    117   118  1510    14 1503.0      1 0.7667502
## 118-119    118   119  1495    14 1488.0      4 0.7662401
## 119-120    119   120  1477    12 1471.0      1 0.7641803
## 120-121    120   121  1464    11 1458.5      1 0.7636608
## 121-122    121   122  1452     9 1447.5      4 0.7631372
## 122-123    122   123  1439    13 1432.5      2 0.7610284
## 123-124    123   124  1424    15 1416.5      4 0.7599658
## 124-125    124   125  1405    25 1392.5      0 0.7578198
## 125-126    125   126  1380    15 1372.5      0 0.7578198
## 126-127    126   127  1365    16 1357.0      2 0.7578198
## 127-128    127   128  1347    25 1334.5      2 0.7567029
## 128-129    128   129  1320    15 1312.5      0 0.7555688
## 129-130    129   130  1305    16 1297.0      1 0.7555688

Then the Kaplan-Meier estimator with survival time in years:

## Call: survfit(formula = Surv(surv_yy, death_cancer) ~ 1, data = localised)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   0.5   5318      71    0.987 0.00157        0.984        0.990
##   1.5   5166     228    0.943 0.00320        0.937        0.949
##   2.5   4538     202    0.901 0.00420        0.893        0.909
##   3.5   3955     138    0.870 0.00483        0.860        0.879
##   4.5   3473     100    0.845 0.00530        0.834        0.855
##   5.5   3061      80    0.823 0.00571        0.811        0.834
##   6.5   2683      56    0.805 0.00603        0.794        0.817
##   7.5   2360      35    0.793 0.00627        0.781        0.806
##   8.5   2032      34    0.780 0.00657        0.767        0.793
##   9.5   1723      16    0.773 0.00675        0.760        0.786
##  10.5   1464      18    0.763 0.00703        0.750        0.777
##  11.5   1249      17    0.753 0.00737        0.739        0.768
##  12.5   1043       2    0.752 0.00743        0.737        0.766
##  13.5    880       4    0.748 0.00759        0.733        0.763
##  14.5    690       3    0.745 0.00779        0.730        0.760
##  15.5    534       2    0.742 0.00800        0.727        0.758
##  16.5    422       5    0.733 0.00882        0.716        0.751
##  17.5    306       1    0.731 0.00911        0.713        0.749
##  18.5    208       1    0.727 0.00972        0.709        0.747
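
The Kaplan-Meier survival column can be checked in the same way. The sketch below is a hand calculation in base R, not the `survfit` call itself; the counts are copied from the first four rows of the output above, and the names `n_risk` and `n_event` are chosen here to mirror the column headers.

```r
# First four rows copied from the Kaplan-Meier output above (time in years)
n_risk  <- c(5318, 5166, 4538, 3955)  # at risk just before each event time
n_event <- c(71, 228, 202, 138)       # cancer deaths at each event time

# Product-limit estimate: the full risk set is used, with no adjustment
# for subjects censored at the same recorded time
km <- cumprod(1 - n_event / n_risk)

print(round(km, 3))
```

The only difference from the actuarial calculation is the denominator: the whole risk set rather than the risk set minus half of the censored subjects.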

And the Kaplan-Meier estimator with data in months:

## Call: survfit(formula = Surv(surv_mm, death_cancer) ~ 1, data = localised)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##   110   1671     948    0.770 0.00684        0.757        0.784
##   111   1654       1    0.770 0.00685        0.757        0.784
##   112   1627       1    0.770 0.00686        0.756        0.783
##   113   1599       1    0.769 0.00687        0.756        0.783
##   114   1580       0    0.769 0.00687        0.756        0.783
##   115   1559       0    0.769 0.00687        0.756        0.783
##   116   1532       1    0.769 0.00688        0.755        0.782
##   117   1510       2    0.768 0.00691        0.754        0.781
##   118   1495       1    0.767 0.00693        0.754        0.781
##   119   1477       4    0.765 0.00698        0.751        0.779
##   120   1464       1    0.764 0.00700        0.751        0.778
##   121   1452       1    0.764 0.00701        0.750        0.778
##   122   1439       4    0.762 0.00707        0.748        0.776
##   123   1424       2    0.761 0.00710        0.747        0.775
##   124   1405       4    0.759 0.00716        0.745        0.773
##   125   1380       0    0.759 0.00716        0.745        0.773
##   126   1365       0    0.759 0.00716        0.745        0.773
##   127   1347       2    0.758 0.00719        0.744        0.772
##   128   1320       2    0.756 0.00723        0.742        0.771
##   129   1305       0    0.756 0.00723        0.742        0.771
##   130   1288       1    0.756 0.00724        0.742        0.770

(a)

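The actuarial method is the more appropriate here because it handles ties (events and censorings recorded at the same time) better. It assumes that censorings are spread evenly across each interval, so each censored subject contributes half an interval at risk (in the tables above, nrisk = nsubs - nlost/2), whereas Kaplan-Meier treats every subject censored at a tied time as still at risk when the events occur. These data contain a reasonably large number of ties, which is why the two sets of estimates differ.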

(b)

The Kaplan-Meier estimate changes more when we move from years to months. The actuarial method already handles ties appropriately, so it is not biased when the data are heavily tied and is little affected when the number of ties is reduced.
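
To put numbers on this, we can compare the estimates at ten years taken from the four tables above. This is a rough sketch: which row represents ten years in each table is an assumption made here (the start of interval 10-11 and of interval 120-121 for the actuarial tables, time 9.5 and time 120 for the Kaplan-Meier tables), and the Kaplan-Meier values are only available to three decimals.

```r
# Ten-year survival read from the four tables above (row choice is an assumption)
est <- data.frame(
  method = c("actuarial", "Kaplan-Meier"),
  years  = c(0.7633412, 0.773),  # actuarial: start of 10-11; K-M: time 9.5
  months = c(0.7636608, 0.764)   # actuarial: start of 120-121; K-M: time 120
)

# Change in the estimate when moving from annual to monthly data
est$change <- est$months - est$years

print(est)
```

Under these assumptions the actuarial estimate moves by about 0.0003, while the Kaplan-Meier estimate drops by about 0.009.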

(c)

The plot clearly shows that the Kaplan-Meier estimator with the aggregated (annual) data is biased upwards compared with the other curves.