Biostatistics III in R

Code by Annika Tillander, Andreas Karlsson and Mark Clements

Exercise 10. Examining the proportional hazards hypothesis (localised melanoma)


Load the diet data using time-on-study as the timescale with a maximum of 10 years follow-up.

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

Load melanoma data and explore it.

(a)

For the localised melanoma data with 10 years follow-up, plot the instantaneous cause-specific hazard for each calendar period.

(b)

Now plot the instantaneous cause-specific hazard for each calendar period using a log scale for the y axis (use the option yscale(log)). What would you expect to see if a proportional hazards assumption was appropriate? Do you see it?

(c)

Another graphical way of checking the proportional hazards assumption is to plot the log cumulative cause specific hazard function for each calendar period. These plots were not given extensive coverage in the lectures, so attempt this if you like or continue to part (d).

(d)

Compare your estimated hazard ratio from part (a) with the one from a fitted Cox model with calendar period as the only explanatory variable. Are they similar?

(e)

Now fit a more complex model and use graphical methods to explore the assumption of proportional hazards by calendar period.

What do you conclude?

(f)

Do plot the Schoenfeld residual plots for the variable agegrp. What are your conclusions regarding the assumption of proportional hazards by age group?

(g)

Now formally test the assumption of proportional hazards.

Are your conclusions from the test coherent with your conclusions from the graphical assessments?

(h)

Estimate separate age effects for the first two years of follow-up (and separate estimates for the remainder of the follow-up) while controlling for sex and period. Do the estimates for the effect of age differ between the two periods of follow-up? Write out the regression equation.

We see effects of age (i.e., the hazard ratios) for the period 0–2 years subsequent to diagnosis along with the interaction effects. An advantage of the default parameterisation is that one can easily test the statistical significance of the interaction effects. Before going further, test whether the age*follow-up interaction is statistically significant (using a Wald and/or LR test).

(i)

Often we wish to see the effects of exposure (age) for each level of the modifier (time since diagnosis). That is, we would like to complete the table below with relevant hazard ratios. To get the effects of age for the period 2+ years after diagnosis, using the default parametrization, we must multiply the hazard ratios for 0–2 years by the appropriate interaction effect. Now let’s reparameterise the model to directly estimate the effects of age for each level of time since diagnosis.

0–2 years 2+ years
Agegrp0-44 1.00 1.00
Agegrp45-59 – –
Agegrp60-74 – –
Agegrp75+ – –

We can also use the tt argument in coxph for modelling for time-varying effects:

Write out the regression equations for models cox2p8Split2 and cox2p8tvct. Based on model cox2p8tvct, write out a formula for the hazard ratio for those aged 75 years and over compared with those aged less than 45 years as a function of time.

(j)

ADVANCED: Fit an analogous Poisson regression model. Are the parameter estimates similar? HINT: You will need to split the data by time since diagnosis. Calculate and plot the rates.