Exercise 9. Localised melanoma: modelling cause-specific mortality using Cox regression

In exercise 7 we modelled the cause-specific mortality of patients diagnosed with localised melanoma using Poisson regression. We will now model cause-specific mortality using Cox regression and compare the results to those we obtained using the Poisson regression model.

To fit a Cox proportional hazards model (for cause-specific survival) with calendar period as the only explanatory variable, the following commands can be used. Note that we are censoring all survival times at 120 months (10 years) in order to facilitate comparisons with the Poisson regression model in exercise 7.


You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

(a)

Interpret the estimated hazard ratio, including a comment on statistical significance. Write the regression equation. Draw a graph showing the shape of the fitted hazard rates for the two calendar periods, also indicating the distance between the curves (HINT: draw on the log hazard scale if you find it easier). Compare this to how you drew the graph in exercise 7h.

(b)

(This part is more theoretical and is not required in order to understand the remaining parts.)

R reports a Wald test of the null hypothesis that survival is independent of calendar period. The test statistic (and associated P-value), is reported in the table of parameter estimates. Under the null hypothesis, the test statistic has a standard normal (Z) distribution, so the square of the test statistic will have a chi square distribution with one degree of freedom.

R also reports a likelihood ratio test statistic of the null hypothesis that none of the parameters in the model are associated with survival. In general, this test statistic will have a chi-square distribution with degrees of freedom equal to the number of parameters in the model. For the current model, with only one parameter, the test statistic has a chi square distribution with one degree of freedom.

Compare these two test statistics with each other and with the log rank test statistic (which also has a chi square distribution) calculated in question 3c (you should, however, recalculate the log rank test since we have restricted follow-up to the first 10 years in this exercise). Would you expect these test statistics to be similar? Consider the null and alternative hypotheses of each test and the assumptions involved with each test.

(c)

Now include sex and age (in categories) in the model. Write out the regression equation.

  1. Interpret the estimated hazard ratio for the parameter labelled agegrp45-59, including a comment on its statistical significance.
  2. Is the effect of calendar period strongly confounded by age and sex? That is, does the inclusion of sex and age in the model change the estimate for the effect of calendar period?
  3. Perform a Wald test of the overall effect of age and interpret the results.

(d)

Perform a likelihood ratio test of the overall effect of age and interpret the results.

Compare your findings to those obtained using the Wald test. Are the findings similar? Would you expect them to be similar?

(e)

The model estimated in question 9c is similar to the model estimated in question 7i.

  1. Both models adjust for sex, year8594, and i.agegrp but the Poisson regression model in question 7i appears to adjust for an additional variable (fu). Is the Poisson regression model adjusting for an additional factor? Explain.
  2. Would you expect the parameter estimate for sex, period, and age to be similar for the two models? Are they similar?
  3. Do both models assume proportional hazards? Explain.

(g)

ADVANCED: Split the data finely (e.g., 3-month intervals) and model the effect of time using a restricted cubic spline with Poisson regression. Also plot the rate and survival.

## split and collapse
cuts.splines <- seq(0,max(sfit9f$time),by=3)
mid.splines <- cuts.splines + 1.5
melanoma.spl4 <-
    survSplit(Surv(surv_mm,death_cancer)~., data=melanoma.l2, cut=cuts.splines,
              start="tstart", end="tstop") %>%
    mutate(cut=cut(tstop,cuts.splines),
           mid=mid.splines[unclass(cut)]) %>%
    group_by(mid,year8594,sex,agegrp) %>%
    summarise(pt=sum(tstop-tstart), death_cancer=sum(death_cancer))

poisson9g <- glm( death_cancer ~ ns(mid,df=3) + year8594 + sex + agegrp + offset( log(pt) ),
                 family = poisson,
                 data = melanoma.spl4 )
summary(poisson9g)
eform(poisson9g)

## quick approach: use the effects package
library(effects)
plot(Effect("mid", poisson9g))

## utility function to draw a confidence interval
polygon.ci <- function(time, interval, col="lightgrey") 
    polygon(c(time,rev(time)), c(interval[,1],rev(interval[,2])), col=col, border=col)

## define exposures
times <- seq(0,max(cuts.splines),length=1000)
delta <- diff(times)[1]
newdata <- data.frame(mid=times+delta/2, year8594="Diagnosed 85-94",
                      sex="Male", agegrp="45-59",
                      pt=1)
## predict rates and 95% CI
pred <- predict(poisson9g, newdata=newdata, se.fit=TRUE)
predrate <- exp(pred$fit)
ci <- with(pred, exp(cbind(fit-1.96*se.fit, fit+1.96*se.fit)))
## plot
matplot(newdata$mid, ci, type="n", xlab="Time since diagnosis (months)",
        ylab="Rate", main="Males aged 45-59 years diagnosed 1985-94")
polygon.ci(newdata$mid, ci) 
lines(newdata$mid, predrate)

## predict survival and 95% CI
library(rstpm2)
logcumhaz <- rstpm2::predictnl(poisson9g,
          fun=function(fit,newdata) log(cumsum(delta*predict(fit, newdata, type="response"))),
          newdata=newdata)
surv <- exp(-exp(logcumhaz$Estimate))
ci <- with(logcumhaz, exp(-exp(cbind(Estimate-1.96*SE, Estimate+1.96*SE))))

## plot
matplot(newdata$mid, ci, type="n", xlab="Time since diagnosis (months)",
        ylab="Survival", main="Males aged 45-59 years diagnosed 1985-94")
polygon.ci(newdata$mid, ci) 
lines(newdata$mid, surv)