Exercise 11. Cox regression with all-cause mortality as the outcome

Now fit a model to the localised melanoma data where the outcome is observed survival (i.e. all deaths are considered to be events).

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

library(biostat3) # 
library(dplyr)    # for data manipulation

Load the melanoma data and explore it.

## Read melanoma data
## and select subcohorts
melanoma.l <- biostat3::melanoma |>
    subset(stage=="Localised") |>
    transform(
        ## Create a death indicator
        death_cancer = as.numeric(status=="Dead: cancer"),
        death_any = as.numeric(status=="Dead: cancer" | status=="Dead: other") )

## Truncate follow-up time

melanoma.l2 <-
  transform(melanoma.l,
            ## Create new death indicators (only count deaths within 120 months)
            death_cancer = death_cancer * as.numeric( surv_mm <= 120),
            death_any = death_any * as.numeric( surv_mm <= 120),
            ## Create a new time variable
            surv_mm = pmin(surv_mm, 120))

(a)

For patients of the same sex diagnosed in the same period, those aged 60–74 at diagnosis have a 2.94 times higher risk of death due to any causes than those aged 0-–44 at diagnosis. The 95% confidence interval is (2.52, 3.44). This difference is statistically significant.

The regression equation is \[\begin{align*} h(t|\text{year8594},\text{sex},\text{agegrp}) &= h_0(t) \exp(\beta_1 I(\text{sex}=\text{"Female"})+\beta_2 I(\text{year8594}=\text{"Diagnosed 85-94"})+\\ &\qquad\beta_3 I(\text{agegrp}=\text{"45-59"})+\beta_4 I(\text{agegrp}=\text{"60-64"}) + \beta_5 I(\text{agegrp}=\text{"75+"}) \end{align*}\] where \(h(t|\text{year8594},\text{sex},\text{agegrp},\text{agegrp})\) is the hazard at time \(t\) given covariates \(\text{year8594}\), \(\text{sex}\) and \(\text{agegrp}\) and \(\text{agegrp}\), with baseline hazard \(h_0(t)\) and regression coefficients representing log hazard ratios for \(\beta_1\) for the calendar period 1985–1994, \(\beta_2\) for females, with log hazard ratios at time 0 for those aged 45–59 years, 60–74 years and 75 years and over are \(\beta_3\), \(\beta_4\) and \(\beta_5\), respectively.

From the regression equation, a formula for the hazard ratio comparing a male aged 60–74 years with a female aged 75 years and over is \[\begin{align*} \frac{h(t|\text{year8594},\text{sex}=\text{"Male"},\text{agegrp}=\text{"60-74"})}{h(t|\text{year8594},\text{sex}=\text{"Female"},\text{agegrp}=\text{"75+"})} &= \frac{h_0(t) \exp(\beta_2 I(\text{year8594}=\text{"Diagnosed 85-94"})+ \beta_4)}{h_0(t)\exp(\beta_1}+\beta_2 I(\text{year8594}=\text{"Diagnosed 85-94"}+\beta_5))} \\ &= \exp(\beta_4 - \beta_1 - \beta_5) \end{align*}\]

summary(coxfit11a <- coxph(Surv(surv_mm, death_any) ~ sex + year8594 + agegrp,
                           data = melanoma.l2))
## Call:
## coxph(formula = Surv(surv_mm, death_any) ~ sex + year8594 + agegrp, 
##     data = melanoma.l2)
## 
##   n= 5318, number of events= 1580 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.49550   0.60927  0.05099 -9.718  < 2e-16 ***
## year8594Diagnosed 85-94 -0.28430   0.75254  0.05189 -5.479 4.29e-08 ***
## agegrp45-59              0.40796   1.50375  0.08700  4.689 2.74e-06 ***
## agegrp60-74              1.07955   2.94335  0.07991 13.510  < 2e-16 ***
## agegrp75+                2.13825   8.48456  0.08266 25.867  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.6093     1.6413    0.5513    0.6733
## year8594Diagnosed 85-94    0.7525     1.3288    0.6798    0.8331
## agegrp45-59                1.5037     0.6650    1.2680    1.7833
## agegrp60-74                2.9434     0.3397    2.5167    3.4424
## agegrp75+                  8.4846     0.1179    7.2155    9.9769
## 
## Concordance= 0.709  (se = 0.006 )
## Likelihood ratio test= 895.5  on 5 df,   p=<2e-16
## Wald test            = 943.1  on 5 df,   p=<2e-16
## Score (logrank) test = 1130  on 5 df,   p=<2e-16

(b)

Note that the previous model estimated cause-specific hazard ratios whereas the current model estimates all-cause hazard ratios. The estimated hazard ratios for sex and period are similar, whereas the estimated hazard ratios for age are markedly different. This is because non-cancer mortality is heavily dependent on age, but only lightly dependent on sex and calendar period.

summary( coxfit11b <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp,
                           data = melanoma.l2) )
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 + 
##     agegrp, data = melanoma.l2)
## 
##   n= 5318, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.53061   0.58825  0.06545 -8.107 5.19e-16 ***
## year8594Diagnosed 85-94 -0.33339   0.71649  0.06618 -5.037 4.72e-07 ***
## agegrp45-59              0.28283   1.32688  0.09417  3.003  0.00267 ** 
## agegrp60-74              0.62006   1.85904  0.09088  6.823 8.90e-12 ***
## agegrp75+                1.21801   3.38045  0.10443 11.663  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.5882     1.7000    0.5174    0.6688
## year8594Diagnosed 85-94    0.7165     1.3957    0.6293    0.8157
## agegrp45-59                1.3269     0.7536    1.1032    1.5959
## agegrp60-74                1.8590     0.5379    1.5557    2.2215
## agegrp75+                  3.3804     0.2958    2.7547    4.1483
## 
## Concordance= 0.646  (se = 0.009 )
## Likelihood ratio test= 212.7  on 5 df,   p=<2e-16
## Wald test            = 217.9  on 5 df,   p=<2e-16
## Score (logrank) test = 226.8  on 5 df,   p=<2e-16