Cox model for cause-specific mortality for melanoma (all stages)
Use Cox regression to model the cause-specific survival of patients with skin melanoma (including all stages).
You may have to install the required packages the first time you use
them. You can install a package by
install.packages("package_of_interest")
for each package
you require.
Load the melanoma data and explore it.
(a)
Without adjusting for potential confounders, we see that females have a 37% lower mortality rate than males.
# Cox regression with time-since-entry as the timescale
# Note that R uses the Efron method for approximating the likelihood in the presence of ties,
# whereas Stata (and some other software) uses the Breslow method
coxph(Surv(surv_mm, death_cancer) ~ sex, data=melanoma.l) |>
summary()
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex, data = melanoma.l)
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.46741 0.62662 0.04612 -10.13 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.6266 1.596 0.5725 0.6859
##
## Concordance= 0.562 (se = 0.006 )
## Likelihood ratio test= 103.7 on 1 df, p=<2e-16
## Wald test = 102.7 on 1 df, p=<2e-16
## Score (logrank) test = 104.6 on 1 df, p=<2e-16
(b)
After adjusting for a range of potential confounders, we see that the estimated difference in cancer-specific mortality between males and females has decreased slightly but there is still quite a large difference.
coxph(Surv(surv_mm, death_cancer) ~ sex + agegrp + stage + subsite +
year8594, data=melanoma.l) |>
summary()
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + agegrp +
## stage + subsite + year8594, data = melanoma.l)
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.28919 0.74887 0.04865 -5.944 2.78e-09 ***
## agegrp45-59 0.23835 1.26915 0.06745 3.534 0.00041 ***
## agegrp60-74 0.54980 1.73291 0.06510 8.445 < 2e-16 ***
## agegrp75+ 1.02773 2.79472 0.07641 13.451 < 2e-16 ***
## stageLocalised 0.03773 1.03845 0.06869 0.549 0.58287
## stageRegional 1.56895 4.80158 0.09144 17.158 < 2e-16 ***
## stageDistant 2.62087 13.74770 0.08137 32.209 < 2e-16 ***
## subsiteTrunk 0.33488 1.39777 0.07066 4.740 2.14e-06 ***
## subsiteLimbs 0.03379 1.03437 0.07436 0.454 0.64951
## subsiteMultiple and NOS 0.27000 1.30997 0.10230 2.639 0.00831 **
## year8594Diagnosed 85-94 -0.24037 0.78633 0.04791 -5.017 5.24e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.7489 1.33535 0.6808 0.8238
## agegrp45-59 1.2692 0.78793 1.1120 1.4485
## agegrp60-74 1.7329 0.57706 1.5253 1.9688
## agegrp75+ 2.7947 0.35782 2.4060 3.2462
## stageLocalised 1.0384 0.96298 0.9076 1.1881
## stageRegional 4.8016 0.20826 4.0137 5.7441
## stageDistant 13.7477 0.07274 11.7211 16.1248
## subsiteTrunk 1.3978 0.71543 1.2170 1.6054
## subsiteLimbs 1.0344 0.96677 0.8941 1.1967
## subsiteMultiple and NOS 1.3100 0.76338 1.0720 1.6008
## year8594Diagnosed 85-94 0.7863 1.27172 0.7159 0.8637
##
## Concordance= 0.755 (se = 0.006 )
## Likelihood ratio test= 1854 on 11 df, p=<2e-16
## Wald test = 2579 on 11 df, p=<2e-16
## Score (logrank) test = 4112 on 11 df, p=<2e-16
(c)
Let’s first estimate the effect of gender for each age group without adjusting for confounders.
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ agegrp + agegrp:sex,
## data = melanoma.l)
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## agegrp45-59 0.18031 1.19759 0.08501 2.121 0.033927 *
## agegrp60-74 0.40466 1.49879 0.08462 4.782 1.74e-06 ***
## agegrp75+ 0.84551 2.32916 0.10341 8.176 2.92e-16 ***
## agegrp0-44:sexFemale -0.78242 0.45730 0.10444 -7.491 6.81e-14 ***
## agegrp45-59:sexFemale -0.59421 0.55200 0.09133 -6.506 7.72e-11 ***
## agegrp60-74:sexFemale -0.33880 0.71263 0.07935 -4.270 1.96e-05 ***
## agegrp75+:sexFemale -0.39470 0.67388 0.10569 -3.734 0.000188 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## agegrp45-59 1.1976 0.8350 1.0138 1.4147
## agegrp60-74 1.4988 0.6672 1.2697 1.7692
## agegrp75+ 2.3292 0.4293 1.9019 2.8525
## agegrp0-44:sexFemale 0.4573 2.1868 0.3726 0.5612
## agegrp45-59:sexFemale 0.5520 1.8116 0.4615 0.6602
## agegrp60-74:sexFemale 0.7126 1.4033 0.6100 0.8325
## agegrp75+:sexFemale 0.6739 1.4839 0.5478 0.8290
##
## Concordance= 0.622 (se = 0.006 )
## Likelihood ratio test= 332.6 on 7 df, p=<2e-16
## Wald test = 307.7 on 7 df, p=<2e-16
## Score (logrank) test = 333.9 on 7 df, p=<2e-16
## Analysis of Deviance Table
## Cox model: response is Surv(surv_mm, death_cancer)
## Terms added sequentially (first to last)
##
## loglik Chisq Df Pr(>|Chi|)
## NULL -16390
## agegrp -16291 198.762 3 < 2.2e-16 ***
## sex -16231 120.248 1 < 2.2e-16 ***
## agegrp:sex -16224 13.607 3 0.003492 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coxph(Surv(surv_mm, death_cancer) ~ stage + subsite + year8594 + agegrp * sex,
data=melanoma.l) |> anova()
## Analysis of Deviance Table
## Cox model: response is Surv(surv_mm, death_cancer)
## Terms added sequentially (first to last)
##
## loglik Chisq Df Pr(>|Chi|)
## NULL -16390
## stage -15602 1576.8515 3 < 2.2e-16 ***
## subsite -15578 47.9085 3 2.227e-10 ***
## year8594 -15572 11.7253 1 0.0006166 ***
## agegrp -15481 182.4024 3 < 2.2e-16 ***
## sex -15463 35.6084 1 2.412e-09 ***
## agegrp:sex -15461 4.5821 3 0.2050873
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that there is some evidence that the survival advantage experienced by females depends on age. The hazard ratio for males/females in the youngest age group is 0.46, while in the highest age group the hazard ratio is 0.68. There is evidence that the hazard ratios for gender differ across the age groups (p=0.0035). However, after adjusting for stage, subsite, and period there is no longer evidence of an interaction.
There is not strong evidence in support of the hypothesis (although some may consider that there is weak evidence).
(d)
After having fitted a main effects model we can check the proportional hazards assumption by fitting a regression line through the model-based Schoenfeld residulas and check if the slope is statistically different from zero.
There is strong evidence that the proportional hazard assumption is not satisfied for the effect of stage. Unless our primary interest is in the stage effect we can fit a stratified Cox model where we stratify on stage (i.e. estimate a separate baseline hazard function for each stage group).
cox4 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite +
stage, data=melanoma.l)
summary(cox4)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 +
## agegrp + subsite + stage, data = melanoma.l)
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.28919 0.74887 0.04865 -5.944 2.78e-09 ***
## year8594Diagnosed 85-94 -0.24037 0.78633 0.04791 -5.017 5.24e-07 ***
## agegrp45-59 0.23835 1.26915 0.06745 3.534 0.00041 ***
## agegrp60-74 0.54980 1.73291 0.06510 8.445 < 2e-16 ***
## agegrp75+ 1.02773 2.79472 0.07641 13.451 < 2e-16 ***
## subsiteTrunk 0.33488 1.39777 0.07066 4.740 2.14e-06 ***
## subsiteLimbs 0.03379 1.03437 0.07436 0.454 0.64951
## subsiteMultiple and NOS 0.27000 1.30997 0.10230 2.639 0.00831 **
## stageLocalised 0.03773 1.03845 0.06869 0.549 0.58287
## stageRegional 1.56895 4.80158 0.09144 17.158 < 2e-16 ***
## stageDistant 2.62087 13.74770 0.08137 32.209 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.7489 1.33535 0.6808 0.8238
## year8594Diagnosed 85-94 0.7863 1.27172 0.7159 0.8637
## agegrp45-59 1.2692 0.78793 1.1120 1.4485
## agegrp60-74 1.7329 0.57706 1.5253 1.9688
## agegrp75+ 2.7947 0.35782 2.4060 3.2462
## subsiteTrunk 1.3978 0.71543 1.2170 1.6054
## subsiteLimbs 1.0344 0.96677 0.8941 1.1967
## subsiteMultiple and NOS 1.3100 0.76338 1.0720 1.6008
## stageLocalised 1.0384 0.96298 0.9076 1.1881
## stageRegional 4.8016 0.20826 4.0137 5.7441
## stageDistant 13.7477 0.07274 11.7211 16.1248
##
## Concordance= 0.755 (se = 0.006 )
## Likelihood ratio test= 1854 on 11 df, p=<2e-16
## Wald test = 2579 on 11 df, p=<2e-16
## Score (logrank) test = 4112 on 11 df, p=<2e-16
## chisq df p
## sex 2.194 1 0.14
## year8594 0.423 1 0.52
## agegrp 3.077 3 0.38
## subsite 41.775 3 4.5e-09
## stage 161.040 3 < 2e-16
## GLOBAL 169.371 11 < 2e-16
## chisq df p
## sex 0.526 1 0.47
## year8594 1.185 1 0.28
## agegrp 2.043 3 0.56
## subsite 64.648 3 6e-14
## stage 185.921 3 <2e-16
## GLOBAL 201.516 11 <2e-16
## chisq df p
## sex 4.480 1 0.0343
## year8594 0.208 1 0.6483
## agegrp 3.951 3 0.2668
## subsite 19.681 3 0.0002
## stage 97.393 3 <2e-16
## GLOBAL 104.673 11 <2e-16
If we re-do a test for non-proportional hazards we find that there is no longer evidence that any of the remaining covariates effects seem to depend on time since diagnosis.
Having accounted for the time-dependent effect of stage, there is still no evidence that the effect of sex is modified by age at diagnosis.
## Stratified Cox model; separate baseline hazard functions are fit for each strata.
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite +
strata(stage), data=melanoma.l)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 +
## agegrp + subsite + strata(stage), data = melanoma.l)
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.299867 0.740917 0.048742 -6.152 7.65e-10 ***
## year8594Diagnosed 85-94 -0.238915 0.787482 0.047838 -4.994 5.90e-07 ***
## agegrp45-59 0.233856 1.263462 0.067463 3.466 0.000527 ***
## agegrp60-74 0.552148 1.736981 0.065125 8.478 < 2e-16 ***
## agegrp75+ 1.017338 2.765823 0.076435 13.310 < 2e-16 ***
## subsiteTrunk 0.293094 1.340569 0.070578 4.153 3.28e-05 ***
## subsiteLimbs -0.003252 0.996753 0.074215 -0.044 0.965045
## subsiteMultiple and NOS 0.228109 1.256222 0.102565 2.224 0.026146 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.7409 1.3497 0.6734 0.8152
## year8594Diagnosed 85-94 0.7875 1.2699 0.7170 0.8649
## agegrp45-59 1.2635 0.7915 1.1070 1.4421
## agegrp60-74 1.7370 0.5757 1.5288 1.9735
## agegrp75+ 2.7658 0.3616 2.3810 3.2128
## subsiteTrunk 1.3406 0.7460 1.1674 1.5394
## subsiteLimbs 0.9968 1.0033 0.8618 1.1528
## subsiteMultiple and NOS 1.2562 0.7960 1.0275 1.5359
##
## Concordance= 0.645 (se = 0.008 )
## Likelihood ratio test= 276 on 8 df, p=<2e-16
## Wald test = 279.6 on 8 df, p=<2e-16
## Score (logrank) test = 286.5 on 8 df, p=<2e-16
## chisq df p
## sex 0.0587 1 0.81
## year8594 0.8514 1 0.36
## agegrp 1.4552 3 0.69
## subsite 6.2138 3 0.10
## GLOBAL 8.0234 8 0.43
## chisq df p
## sex 0.351 1 0.5537
## year8594 1.920 1 0.1659
## agegrp 0.484 3 0.9224
## subsite 11.966 3 0.0075
## GLOBAL 13.820 8 0.0866
## chisq df p
## sex 1.456 1 0.23
## year8594 0.424 1 0.51
## agegrp 2.646 3 0.45
## subsite 2.216 3 0.53
## GLOBAL 6.918 8 0.55
If you have time make sure you check for additional interaction terms between the remaining covariates, i.e. between age at diagnosis and stage.
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex * agegrp + year8594 + agegrp +
subsite + strata(stage), data=melanoma.l)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex * agegrp +
## year8594 + agegrp + subsite + strata(stage), data = melanoma.l)
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.492146 0.611313 0.105915 -4.647 3.37e-06 ***
## agegrp45-59 0.158472 1.171719 0.085336 1.857 0.0633 .
## agegrp60-74 0.439206 1.551475 0.084962 5.169 2.35e-07 ***
## agegrp75+ 0.898995 2.457132 0.104912 8.569 < 2e-16 ***
## year8594Diagnosed 85-94 -0.238206 0.788040 0.047827 -4.981 6.34e-07 ***
## subsiteTrunk 0.299616 1.349341 0.070684 4.239 2.25e-05 ***
## subsiteLimbs 0.003931 1.003938 0.074275 0.053 0.9578
## subsiteMultiple and NOS 0.236333 1.266596 0.102775 2.300 0.0215 *
## sexFemale:agegrp45-59 0.182131 1.199772 0.138904 1.311 0.1898
## sexFemale:agegrp60-74 0.269005 1.308662 0.131729 2.042 0.0411 *
## sexFemale:agegrp75+ 0.265538 1.304133 0.149245 1.779 0.0752 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.6113 1.6358 0.4967 0.7523
## agegrp45-59 1.1717 0.8534 0.9913 1.3850
## agegrp60-74 1.5515 0.6445 1.3135 1.8326
## agegrp75+ 2.4571 0.4070 2.0004 3.0181
## year8594Diagnosed 85-94 0.7880 1.2690 0.7175 0.8655
## subsiteTrunk 1.3493 0.7411 1.1748 1.5498
## subsiteLimbs 1.0039 0.9961 0.8679 1.1613
## subsiteMultiple and NOS 1.2666 0.7895 1.0355 1.5492
## sexFemale:agegrp45-59 1.1998 0.8335 0.9138 1.5752
## sexFemale:agegrp60-74 1.3087 0.7641 1.0109 1.6942
## sexFemale:agegrp75+ 1.3041 0.7668 0.9734 1.7473
##
## Concordance= 0.645 (se = 0.008 )
## Likelihood ratio test= 280.8 on 11 df, p=<2e-16
## Wald test = 275.6 on 11 df, p=<2e-16
## Score (logrank) test = 287.5 on 11 df, p=<2e-16
## Analysis of Deviance Table
## Cox model: response is Surv(surv_mm, death_cancer)
## Terms added sequentially (first to last)
##
## loglik Chisq Df Pr(>|Chi|)
## NULL -13520
## sex -13501 36.687 1 1.387e-09 ***
## agegrp -13411 180.387 3 < 2.2e-16 ***
## year8594 -13400 22.864 1 1.739e-06 ***
## subsite -13382 36.051 3 7.306e-08 ***
## sex:agegrp -13379 4.813 3 0.186
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1