Biostat III exercises in R

Laboratory exercise 11

Suggested solutions by

Author: Johan Zetterqvist, 2015-03-04

Cox regression with all-cause mortality as the outcome


Now fit a model to the localised melanoma data where the outcome is observed survival (i.e. all deaths are considered to be events).

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

require(survival) # for Surv and survfit
require(dplyr)    # for data manipulation
require(foreign)  # for reading data set from Stata

Load the melanoma data and explore it.

## Read melanoma data
## and select subcohorts
melanoma.l <-

  tbl_df( read.dta("http://biostat3.net/download/melanoma.dta") ) %>%

  filter(stage=="Localised") %>%

  mutate(
    ## Create a death indicator
    death_cancer = as.numeric(status=="Dead: cancer"),
    death_any = as.numeric(status=="Dead: cancer" | status=="Dead: other") )


## Truncate follow-up time

melanoma.l2 <-

  mutate(melanoma.l,
         ## Create new death indicators (only count deaths within 120 months)
         death_cancer = death_cancer * as.numeric( surv_mm <= 120),
         death_any = death_any * as.numeric( surv_mm <= 120),
         ## Create a new time variable
         surv_mm = pmin(surv_mm, 120))

(a) Interpret the estimated hazard ratio for the parameter labelled 2.agegrp, including a comment on statistical significance.

summary( coxfit11a <- coxph(Surv(surv_mm, death_any) ~ sex + year8594 + agegrp,
                           data = melanoma.l2,
                           ties = "breslow") )
## Call:
## coxph(formula = Surv(surv_mm, death_any) ~ sex + year8594 + agegrp, 
##     data = melanoma.l2, ties = "breslow")
## 
##   n= 5318, number of events= 1580 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.49401   0.61017  0.05098 -9.690  < 2e-16 ***
## year8594Diagnosed 85-94 -0.28368   0.75301  0.05189 -5.467 4.59e-08 ***
## agegrp45-59              0.40742   1.50294  0.08700  4.683 2.82e-06 ***
## agegrp60-74              1.07766   2.93781  0.07991 13.486  < 2e-16 ***
## agegrp75+                2.13148   8.42736  0.08266 25.785  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.6102     1.6389    0.5521    0.6743
## year8594Diagnosed 85-94    0.7530     1.3280    0.6802    0.8336
## agegrp45-59                1.5029     0.6654    1.2673    1.7823
## agegrp60-74                2.9378     0.3404    2.5119    3.4359
## agegrp75+                  8.4274     0.1187    7.1669    9.9096
## 
## Concordance= 0.709  (se = 0.008 )
## Rsquare= 0.154   (max possible= 0.992 )
## Likelihood ratio test= 890.4  on 5 df,   p=0
## Wald test            = 937  on 5 df,   p=0
## Score (logrank) test = 1121  on 5 df,   p=0

(b) On comparing the estimates between the observed and cause-specific survival models it appears that only the parameters for age have changed substantially. Can you explain why the estimates for the effect of age would be expected to change more than the estimates of the effect of sex and period?

summary( coxfit11b <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp,
                           data = melanoma.l2,
                           ties = "breslow") )
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 + 
##     agegrp, data = melanoma.l2, ties = "breslow")
## 
##   n= 5318, number of events= 960 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.52964   0.58881  0.06545 -8.092 5.55e-16 ***
## year8594Diagnosed 85-94 -0.33284   0.71688  0.06618 -5.029 4.93e-07 ***
## agegrp45-59              0.28247   1.32640  0.09417  2.999   0.0027 ** 
## agegrp60-74              0.61914   1.85732  0.09088  6.813 9.56e-12 ***
## agegrp75+                1.21570   3.37265  0.10444 11.641  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.5888     1.6983    0.5179    0.6694
## year8594Diagnosed 85-94    0.7169     1.3949    0.6297    0.8162
## agegrp45-59                1.3264     0.7539    1.1028    1.5953
## agegrp60-74                1.8573     0.5384    1.5543    2.2194
## agegrp75+                  3.3727     0.2965    2.7484    4.1387
## 
## Concordance= 0.646  (se = 0.01 )
## Rsquare= 0.039   (max possible= 0.949 )
## Likelihood ratio test= 211.9  on 5 df,   p=0
## Wald test            = 217.1  on 5 df,   p=0
## Score (logrank) test = 225.9  on 5 df,   p=0