Cox model for cause-specific mortality for melanoma (all stages)

Use Cox regression to model the cause-specific survival of patients with skin melanoma (including all stages).


You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

library(biostat3) 

Load the melanoma data and explore it.

melanoma.l <- transform(biostat3::melanoma,
                        death_cancer = ifelse( status == "Dead: cancer", 1, 0))

(a)

Without adjusting for potential confounders, we see that females have a 37% lower mortality rate than males.

# Cox regression with time-since-entry as the timescale
# Note that R uses the Efron method for approximating the likelihood in the presence of ties,
# whereas Stata (and some other software) uses the Breslow method
coxph(Surv(surv_mm, death_cancer) ~ sex, data=melanoma.l) |>
    summary()
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex, data = melanoma.l)
## 
##   n= 7775, number of events= 1913 
## 
##               coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale -0.46741   0.62662  0.04612 -10.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## sexFemale    0.6266      1.596    0.5725    0.6859
## 
## Concordance= 0.562  (se = 0.006 )
## Likelihood ratio test= 103.7  on 1 df,   p=<2e-16
## Wald test            = 102.7  on 1 df,   p=<2e-16
## Score (logrank) test = 104.6  on 1 df,   p=<2e-16

(b)

After adjusting for a range of potential confounders, we see that the estimated difference in cancer-specific mortality between males and females has decreased slightly but there is still quite a large difference.

coxph(Surv(surv_mm, death_cancer) ~ sex + agegrp + stage + subsite +
          year8594, data=melanoma.l) |>
    summary()
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + agegrp + 
##     stage + subsite + year8594, data = melanoma.l)
## 
##   n= 7775, number of events= 1913 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.28919   0.74887  0.04865 -5.944 2.78e-09 ***
## agegrp45-59              0.23835   1.26915  0.06745  3.534  0.00041 ***
## agegrp60-74              0.54980   1.73291  0.06510  8.445  < 2e-16 ***
## agegrp75+                1.02773   2.79472  0.07641 13.451  < 2e-16 ***
## stageLocalised           0.03773   1.03845  0.06869  0.549  0.58287    
## stageRegional            1.56895   4.80158  0.09144 17.158  < 2e-16 ***
## stageDistant             2.62087  13.74770  0.08137 32.209  < 2e-16 ***
## subsiteTrunk             0.33488   1.39777  0.07066  4.740 2.14e-06 ***
## subsiteLimbs             0.03379   1.03437  0.07436  0.454  0.64951    
## subsiteMultiple and NOS  0.27000   1.30997  0.10230  2.639  0.00831 ** 
## year8594Diagnosed 85-94 -0.24037   0.78633  0.04791 -5.017 5.24e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.7489    1.33535    0.6808    0.8238
## agegrp45-59                1.2692    0.78793    1.1120    1.4485
## agegrp60-74                1.7329    0.57706    1.5253    1.9688
## agegrp75+                  2.7947    0.35782    2.4060    3.2462
## stageLocalised             1.0384    0.96298    0.9076    1.1881
## stageRegional              4.8016    0.20826    4.0137    5.7441
## stageDistant              13.7477    0.07274   11.7211   16.1248
## subsiteTrunk               1.3978    0.71543    1.2170    1.6054
## subsiteLimbs               1.0344    0.96677    0.8941    1.1967
## subsiteMultiple and NOS    1.3100    0.76338    1.0720    1.6008
## year8594Diagnosed 85-94    0.7863    1.27172    0.7159    0.8637
## 
## Concordance= 0.755  (se = 0.006 )
## Likelihood ratio test= 1854  on 11 df,   p=<2e-16
## Wald test            = 2579  on 11 df,   p=<2e-16
## Score (logrank) test = 4112  on 11 df,   p=<2e-16

(c)

Let’s first estimate the effect of gender for each age group without adjusting for confounders.

coxph(Surv(surv_mm, death_cancer) ~ agegrp + agegrp : sex, data=melanoma.l) |> summary()
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ agegrp + agegrp:sex, 
##     data = melanoma.l)
## 
##   n= 7775, number of events= 1913 
## 
##                           coef exp(coef) se(coef)      z Pr(>|z|)    
## agegrp45-59            0.18031   1.19759  0.08501  2.121 0.033927 *  
## agegrp60-74            0.40466   1.49879  0.08462  4.782 1.74e-06 ***
## agegrp75+              0.84551   2.32916  0.10341  8.176 2.92e-16 ***
## agegrp0-44:sexFemale  -0.78242   0.45730  0.10444 -7.491 6.81e-14 ***
## agegrp45-59:sexFemale -0.59421   0.55200  0.09133 -6.506 7.72e-11 ***
## agegrp60-74:sexFemale -0.33880   0.71263  0.07935 -4.270 1.96e-05 ***
## agegrp75+:sexFemale   -0.39470   0.67388  0.10569 -3.734 0.000188 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                       exp(coef) exp(-coef) lower .95 upper .95
## agegrp45-59              1.1976     0.8350    1.0138    1.4147
## agegrp60-74              1.4988     0.6672    1.2697    1.7692
## agegrp75+                2.3292     0.4293    1.9019    2.8525
## agegrp0-44:sexFemale     0.4573     2.1868    0.3726    0.5612
## agegrp45-59:sexFemale    0.5520     1.8116    0.4615    0.6602
## agegrp60-74:sexFemale    0.7126     1.4033    0.6100    0.8325
## agegrp75+:sexFemale      0.6739     1.4839    0.5478    0.8290
## 
## Concordance= 0.622  (se = 0.006 )
## Likelihood ratio test= 332.6  on 7 df,   p=<2e-16
## Wald test            = 307.7  on 7 df,   p=<2e-16
## Score (logrank) test = 333.9  on 7 df,   p=<2e-16
coxph(Surv(surv_mm, death_cancer) ~ agegrp * sex, data=melanoma.l) |> anova()
## Analysis of Deviance Table
##  Cox model: response is Surv(surv_mm, death_cancer)
## Terms added sequentially (first to last)
## 
##            loglik   Chisq Df Pr(>|Chi|)    
## NULL       -16390                          
## agegrp     -16291 198.762  3  < 2.2e-16 ***
## sex        -16231 120.248  1  < 2.2e-16 ***
## agegrp:sex -16224  13.607  3   0.003492 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coxph(Surv(surv_mm, death_cancer) ~ stage + subsite + year8594 + agegrp * sex,
      data=melanoma.l) |> anova()
## Analysis of Deviance Table
##  Cox model: response is Surv(surv_mm, death_cancer)
## Terms added sequentially (first to last)
## 
##            loglik     Chisq Df Pr(>|Chi|)    
## NULL       -16390                            
## stage      -15602 1576.8515  3  < 2.2e-16 ***
## subsite    -15578   47.9085  3  2.227e-10 ***
## year8594   -15572   11.7253  1  0.0006166 ***
## agegrp     -15481  182.4024  3  < 2.2e-16 ***
## sex        -15463   35.6084  1  2.412e-09 ***
## agegrp:sex -15461    4.5821  3  0.2050873    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that there is some evidence that the survival advantage experienced by females depends on age. The hazard ratio for males/females in the youngest age group is 0.46, while in the highest age group the hazard ratio is 0.68. There is evidence that the hazard ratios for gender differ across the age groups (p=0.0035). However, after adjusting for stage, subsite, and period there is no longer evidence of an interaction.

There is not strong evidence in support of the hypothesis (although some may consider that there is weak evidence).

(d)

After having fitted a main effects model we can check the proportional hazards assumption by fitting a regression line through the model-based Schoenfeld residulas and check if the slope is statistically different from zero.

There is strong evidence that the proportional hazard assumption is not satisfied for the effect of stage. Unless our primary interest is in the stage effect we can fit a stratified Cox model where we stratify on stage (i.e. estimate a separate baseline hazard function for each stage group).

cox4 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite +
                  stage, data=melanoma.l)
summary(cox4)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 + 
##     agegrp + subsite + stage, data = melanoma.l)
## 
##   n= 7775, number of events= 1913 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.28919   0.74887  0.04865 -5.944 2.78e-09 ***
## year8594Diagnosed 85-94 -0.24037   0.78633  0.04791 -5.017 5.24e-07 ***
## agegrp45-59              0.23835   1.26915  0.06745  3.534  0.00041 ***
## agegrp60-74              0.54980   1.73291  0.06510  8.445  < 2e-16 ***
## agegrp75+                1.02773   2.79472  0.07641 13.451  < 2e-16 ***
## subsiteTrunk             0.33488   1.39777  0.07066  4.740 2.14e-06 ***
## subsiteLimbs             0.03379   1.03437  0.07436  0.454  0.64951    
## subsiteMultiple and NOS  0.27000   1.30997  0.10230  2.639  0.00831 ** 
## stageLocalised           0.03773   1.03845  0.06869  0.549  0.58287    
## stageRegional            1.56895   4.80158  0.09144 17.158  < 2e-16 ***
## stageDistant             2.62087  13.74770  0.08137 32.209  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.7489    1.33535    0.6808    0.8238
## year8594Diagnosed 85-94    0.7863    1.27172    0.7159    0.8637
## agegrp45-59                1.2692    0.78793    1.1120    1.4485
## agegrp60-74                1.7329    0.57706    1.5253    1.9688
## agegrp75+                  2.7947    0.35782    2.4060    3.2462
## subsiteTrunk               1.3978    0.71543    1.2170    1.6054
## subsiteLimbs               1.0344    0.96677    0.8941    1.1967
## subsiteMultiple and NOS    1.3100    0.76338    1.0720    1.6008
## stageLocalised             1.0384    0.96298    0.9076    1.1881
## stageRegional              4.8016    0.20826    4.0137    5.7441
## stageDistant              13.7477    0.07274   11.7211   16.1248
## 
## Concordance= 0.755  (se = 0.006 )
## Likelihood ratio test= 1854  on 11 df,   p=<2e-16
## Wald test            = 2579  on 11 df,   p=<2e-16
## Score (logrank) test = 4112  on 11 df,   p=<2e-16
## Test proportional hazards assumption
cox.zph(cox4) 
##            chisq df       p
## sex        2.194  1    0.14
## year8594   0.423  1    0.52
## agegrp     3.077  3    0.38
## subsite   41.775  3 4.5e-09
## stage    161.040  3 < 2e-16
## GLOBAL   169.371 11 < 2e-16
cox.zph(cox4, transform="log") 
##            chisq df      p
## sex        0.526  1   0.47
## year8594   1.185  1   0.28
## agegrp     2.043  3   0.56
## subsite   64.648  3  6e-14
## stage    185.921  3 <2e-16
## GLOBAL   201.516 11 <2e-16
cox.zph(cox4, transform="identity") # Stata default
##            chisq df      p
## sex        4.480  1 0.0343
## year8594   0.208  1 0.6483
## agegrp     3.951  3 0.2668
## subsite   19.681  3 0.0002
## stage     97.393  3 <2e-16
## GLOBAL   104.673 11 <2e-16

If we re-do a test for non-proportional hazards we find that there is no longer evidence that any of the remaining covariates effects seem to depend on time since diagnosis.

Having accounted for the time-dependent effect of stage, there is still no evidence that the effect of sex is modified by age at diagnosis.

## Stratified Cox model; separate baseline hazard functions are fit for each strata.
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite +
                  strata(stage), data=melanoma.l)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 + 
##     agegrp + subsite + strata(stage), data = melanoma.l)
## 
##   n= 7775, number of events= 1913 
## 
##                              coef exp(coef)  se(coef)      z Pr(>|z|)    
## sexFemale               -0.299867  0.740917  0.048742 -6.152 7.65e-10 ***
## year8594Diagnosed 85-94 -0.238915  0.787482  0.047838 -4.994 5.90e-07 ***
## agegrp45-59              0.233856  1.263462  0.067463  3.466 0.000527 ***
## agegrp60-74              0.552148  1.736981  0.065125  8.478  < 2e-16 ***
## agegrp75+                1.017338  2.765823  0.076435 13.310  < 2e-16 ***
## subsiteTrunk             0.293094  1.340569  0.070578  4.153 3.28e-05 ***
## subsiteLimbs            -0.003252  0.996753  0.074215 -0.044 0.965045    
## subsiteMultiple and NOS  0.228109  1.256222  0.102565  2.224 0.026146 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.7409     1.3497    0.6734    0.8152
## year8594Diagnosed 85-94    0.7875     1.2699    0.7170    0.8649
## agegrp45-59                1.2635     0.7915    1.1070    1.4421
## agegrp60-74                1.7370     0.5757    1.5288    1.9735
## agegrp75+                  2.7658     0.3616    2.3810    3.2128
## subsiteTrunk               1.3406     0.7460    1.1674    1.5394
## subsiteLimbs               0.9968     1.0033    0.8618    1.1528
## subsiteMultiple and NOS    1.2562     0.7960    1.0275    1.5359
## 
## Concordance= 0.645  (se = 0.008 )
## Likelihood ratio test= 276  on 8 df,   p=<2e-16
## Wald test            = 279.6  on 8 df,   p=<2e-16
## Score (logrank) test = 286.5  on 8 df,   p=<2e-16
## Test proportional hazards assumption
cox.zph(cox5) 
##           chisq df    p
## sex      0.0587  1 0.81
## year8594 0.8514  1 0.36
## agegrp   1.4552  3 0.69
## subsite  6.2138  3 0.10
## GLOBAL   8.0234  8 0.43
cox.zph(cox5, transform="log") 
##           chisq df      p
## sex       0.351  1 0.5537
## year8594  1.920  1 0.1659
## agegrp    0.484  3 0.9224
## subsite  11.966  3 0.0075
## GLOBAL   13.820  8 0.0866
cox.zph(cox5, transform="identity") 
##          chisq df    p
## sex      1.456  1 0.23
## year8594 0.424  1 0.51
## agegrp   2.646  3 0.45
## subsite  2.216  3 0.53
## GLOBAL   6.918  8 0.55

If you have time make sure you check for additional interaction terms between the remaining covariates, i.e. between age at diagnosis and stage.

cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex * agegrp + year8594 + agegrp +
                  subsite + strata(stage), data=melanoma.l)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex * agegrp + 
##     year8594 + agegrp + subsite + strata(stage), data = melanoma.l)
## 
##   n= 7775, number of events= 1913 
## 
##                              coef exp(coef)  se(coef)      z Pr(>|z|)    
## sexFemale               -0.492146  0.611313  0.105915 -4.647 3.37e-06 ***
## agegrp45-59              0.158472  1.171719  0.085336  1.857   0.0633 .  
## agegrp60-74              0.439206  1.551475  0.084962  5.169 2.35e-07 ***
## agegrp75+                0.898995  2.457132  0.104912  8.569  < 2e-16 ***
## year8594Diagnosed 85-94 -0.238206  0.788040  0.047827 -4.981 6.34e-07 ***
## subsiteTrunk             0.299616  1.349341  0.070684  4.239 2.25e-05 ***
## subsiteLimbs             0.003931  1.003938  0.074275  0.053   0.9578    
## subsiteMultiple and NOS  0.236333  1.266596  0.102775  2.300   0.0215 *  
## sexFemale:agegrp45-59    0.182131  1.199772  0.138904  1.311   0.1898    
## sexFemale:agegrp60-74    0.269005  1.308662  0.131729  2.042   0.0411 *  
## sexFemale:agegrp75+      0.265538  1.304133  0.149245  1.779   0.0752 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.6113     1.6358    0.4967    0.7523
## agegrp45-59                1.1717     0.8534    0.9913    1.3850
## agegrp60-74                1.5515     0.6445    1.3135    1.8326
## agegrp75+                  2.4571     0.4070    2.0004    3.0181
## year8594Diagnosed 85-94    0.7880     1.2690    0.7175    0.8655
## subsiteTrunk               1.3493     0.7411    1.1748    1.5498
## subsiteLimbs               1.0039     0.9961    0.8679    1.1613
## subsiteMultiple and NOS    1.2666     0.7895    1.0355    1.5492
## sexFemale:agegrp45-59      1.1998     0.8335    0.9138    1.5752
## sexFemale:agegrp60-74      1.3087     0.7641    1.0109    1.6942
## sexFemale:agegrp75+        1.3041     0.7668    0.9734    1.7473
## 
## Concordance= 0.645  (se = 0.008 )
## Likelihood ratio test= 280.8  on 11 df,   p=<2e-16
## Wald test            = 275.6  on 11 df,   p=<2e-16
## Score (logrank) test = 287.5  on 11 df,   p=<2e-16
anova(cox5)
## Analysis of Deviance Table
##  Cox model: response is Surv(surv_mm, death_cancer)
## Terms added sequentially (first to last)
## 
##            loglik   Chisq Df Pr(>|Chi|)    
## NULL       -13520                          
## sex        -13501  36.687  1  1.387e-09 ***
## agegrp     -13411 180.387  3  < 2.2e-16 ***
## year8594   -13400  22.864  1  1.739e-06 ***
## subsite    -13382  36.051  3  7.306e-08 ***
## sex:agegrp -13379   4.813  3      0.186    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1