Biostatistics III in R

Code by Johan Zetterqvist and Mark Clements

Exercise 7. Model cause-specific mortality with Poisson regression

In this exercise we model, using Poisson regression, cause-specific mortality of patients diagnosed with localised (stage==1) melanoma.

In exercise 9 we model cause-specific mortality using Cox regression and in exercise 28 we use flexible parametric models. The aim is to illustrate that these three methods are very similar.

The aim of these exercises is to explore the similarities and differences between these three approaches to modelling. We will be comparing the results (and their interpretation) as we proceed through the exercises.

Load the melanoma dataset, restrict to localised cancer, and explore it.

Rates can be modelled on different timescales, e.g., attained age, time-since-entry, calendar time. Plot the CHD incidence rates both by attained age and by time-since-entry. Is there a difference? Do the same for CHD hazard by different energy intakes (hieng).

(a)

i.

Plot Kaplan-Meier estimates of cause-specific survival as a function of calendar period of diagnosis.

ii.

Now plot the estimated hazard function (cause-specific mortality rate) as a function of calendar period of diagnosis.

During which calendar period (the earlier or the later) is mortality lower?

iii.

Is the interpretation (with respect to how prognosis depends on period) based on the hazard consistent with the interpretation of the survival plot?

(b)

Estimate the cause-specific mortality rate for each calendar period.

During which calendar period (the earlier or the later) is mortality lower? Is this consistent with what you found earlier? If not, why the inconsistency?

(c)

The reason for the inconsistency between parts 7a and 7b was confounding by time since diagnosis. The comparison in part 7a was adjusted for time since diagnosis (since we compare the differences between the curves at each point in time) whereas the comparison in part 7b was not. Understanding this concept is central to the remainder of the exercise so please ask for help if you don’t follow.

Two approaches for controlling for confounding are ‘restriction’ and ‘statistical adjustment’. We will first use restriction to control for confounding. We will restrict the potential follow-up time to a maximum of 120 months. Individuals who survive more than 120 months are censored at 120 months.

i.

Estimate the cause-specific mortality rate for each calendar period.

During which calendar period (the earlier or the later) is mortality lower? Is this consistent with what you found in part 7b?

ii.

Calculate by hand the ratio (85–94/75–84) of the two mortality rates (i.e., a mortality rate ratio) and interpret the estimate (i.e., during which period is mortality higher/lower and by how much).

iii.

Now use Poisson regression to estimate the same mortality rate ratio. Write the linear predictor using pen and paper and draw a graph of the fitted hazard rates.
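As a minimal sketch of how the hand calculation and the Poisson regression relate, the following uses made-up event counts and person-years (the object `toy` and its numbers are hypothetical, not the melanoma results). With a single binary covariate, the exponentiated coefficient should reproduce the ratio of the two crude rates.

```r
# Made-up counts for illustration only; these are NOT the melanoma results.
toy <- data.frame(
  period = factor(c("1975-84", "1985-94")),
  deaths = c(120, 90),
  pyears = c(10000, 9000)
)

# By hand: rate = deaths / person-years, then the ratio of the two rates
toy$rate <- toy$deaths / toy$pyears
ratio_hand <- toy$rate[2] / toy$rate[1]

# Poisson regression:
# log(E[deaths]) = log(pyears) + b0 + b1 * I(period == "1985-94")
fit <- glm(deaths ~ period + offset(log(pyears)), family = poisson, data = toy)
ratio_glm <- unname(exp(coef(fit)[2]))
ci_glm <- exp(confint.default(fit)[2, ])   # Wald 95% confidence interval

print(c(hand = ratio_hand, glm = ratio_glm))
print(ci_glm)
```

The offset `log(pyears)` is what turns a model for counts into a model for rates; the same idea applies when the data are individual records rather than an aggregated table.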

(d)

In order to adjust for time since diagnosis (i.e., adjust for the fact that we expect mortality to depend on time since diagnosis) we need to split the data by this timescale. We will restrict our analysis to mortality up to 10 years following diagnosis.
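One way to do the restriction and the split is sketched below with `survival::survSplit` on a tiny made-up data set. The column names (`surv_mm`, `dead`, `fu`, `pt`) and the four records are assumptions for illustration; the course data set may use different names.

```r
library(survival)

# Hypothetical toy data: follow-up in months and a death indicator
toy <- data.frame(id = 1:4,
                  surv_mm = c(8, 30, 130, 61),
                  dead = c(1, 0, 1, 1))

# Restrict to 120 months: anyone followed beyond 120 months is censored there
toy$event <- ifelse(toy$surv_mm > 120, 0, toy$dead)
toy$time  <- pmin(toy$surv_mm, 120)

# Split each subject's follow-up at yearly cut points (12, 24, ..., 108 months)
toy_split <- survSplit(Surv(time, event) ~ ., data = toy,
                       cut = seq(12, 108, by = 12), episode = "fu")
toy_split$pt <- toy_split$time - toy_split$tstart   # person-months in each band

# Events and person-time by year of follow-up
print(aggregate(cbind(event, pt) ~ fu, data = toy_split, FUN = sum))
```

Splitting changes neither the total person-time nor the number of events; it only allocates them to follow-up bands so that the rate can differ between bands.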

(e)

Now tabulate (and produce a graph of) the rates by follow-up time.

Mortality appears to be quite low during the first year of follow-up. Does this seem reasonable considering the disease with which these patients have been diagnosed?

(f)

Compare the plot of the estimated rates to a plot of the hazard rate as a function of continuous time.

Is the interpretation similar? Do you think it is sufficient to classify follow-up time into annual intervals or might it be preferable to use, for example, narrower intervals?

(g)

Use Poisson regression to estimate incidence rate ratios as a function of follow-up time. Write the linear predictor using pen and paper.

Does the pattern of estimated incidence rate ratios mirror the pattern you observed in the plots? Draw a graph of the fitted hazard rate using pen and paper.

Write out the regression equation.
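One possible way to write the regression equation is given below, assuming ten annual follow-up bands with the first year as the reference. The symbols are our own notation: $d_i$ is the number of deaths and $y_i$ the person-time for record $i$, and $\text{fu}_i$ is the follow-up band.

```latex
\begin{aligned}
\log\{\mathrm{E}(d_i)\} &= \log(y_i) + \beta_0 + \sum_{k=2}^{10} \beta_k \, I(\text{fu}_i = k), \\
\lambda_1 &= \exp(\beta_0), \qquad
\lambda_k = \exp(\beta_0 + \beta_k), \qquad
\frac{\lambda_k}{\lambda_1} = \exp(\beta_k) \quad (k = 2, \dots, 10).
\end{aligned}
```

Under this parameterisation the fitted hazard is a step function that is constant within each year, and each $\exp(\beta_k)$ is the incidence rate ratio for year $k$ relative to the first year.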

(h)

Now estimate the effect of calendar period of diagnosis while adjusting for time since diagnosis. Before fitting this model, predict what you expect the estimated effect to be (i.e., will it be higher, lower, or similar to the value we obtained in part 7c). Write the linear predictor using pen and paper and draw a graph of the fitted hazard rates.

Is the estimated effect of calendar period of diagnosis consistent with what you expected? Add an interaction between follow-up and calendar period of diagnosis and interpret the results.

(i)

Now control for age, sex, and calendar period. Write the linear predictor using pen and paper.

i.

Interpret the estimated hazard ratio for the parameter labelled agegrp 2, including a comment on statistical significance.

ii.

Is the effect of calendar period strongly confounded by age and sex? That is, does the inclusion of sex and age in the model change the estimate for the effect of calendar period?

iii.

Perform a Wald test of the overall effect of age and interpret the results.
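A Wald test of the overall effect of a categorical covariate tests all of its coefficients jointly. The sketch below computes the statistic directly from the coefficient vector and its covariance matrix, using simulated data (the variable names and true rates are made up for illustration).

```r
set.seed(1)
n <- 400
# Simulated records: an age group and some person-time at risk
toy <- data.frame(
  agegrp = factor(sample(c("0-44", "45-59", "60-74", "75+"), n, replace = TRUE)),
  pt = runif(n, 0.5, 5)
)
true_rate <- 0.05 * c(1, 1.3, 1.8, 3)[as.integer(toy$agegrp)]
toy$d <- rpois(n, true_rate * toy$pt)

fit <- glm(d ~ agegrp + offset(log(pt)), family = poisson, data = toy)

# Joint Wald test: b' V^{-1} b is compared with a chi-squared distribution
# whose degrees of freedom equal the number of age coefficients
idx  <- grep("^agegrp", names(coef(fit)))
b    <- coef(fit)[idx]
V    <- vcov(fit)[idx, idx]
wald <- as.numeric(t(b) %*% solve(V) %*% b)
p    <- pchisq(wald, df = length(b), lower.tail = FALSE)

print(c(chisq = wald, df = length(b), p = p))
```

With four age groups there are three age coefficients, so the reference distribution has three degrees of freedom. A small p-value is evidence that mortality differs between at least two of the age groups.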

(j)

Is the effect of sex modified by calendar period (whilst adjusting for age and follow-up)? Fit an appropriate interaction term to test this hypothesis. Write the linear predictor using pen and paper.

(k)

Based on the interaction model you fitted in exercise 7j, estimate the hazard ratio for the effect of sex (with 95% confidence interval) for each calendar period.

ADVANCED: Do this with each of the following methods and confirm that the results are the same:

i.

Using hand-calculation on the estimates from exercise 7j.

ii.

Using software to compute the appropriate linear combination of the estimates from exercise 7j.

iii.

Creating appropriate dummy variables that represent the effects of sex for each calendar period.

Write the linear predictor using pen and paper.

iv.

Using the formula to specify the interactions, repeat the previous model.
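The hand calculation in part i can be sketched as follows. The data are simulated and the names `female` and `late` are hypothetical 0/1 indicators for sex and for the later calendar period. The log hazard ratio for sex in the later period is the sum of the main effect and the interaction, and its variance needs the covariance between the two estimates.

```r
set.seed(2)
n <- 2000
toy <- data.frame(
  female = rbinom(n, 1, 0.5),
  late   = rbinom(n, 1, 0.5),
  pt     = runif(n, 1, 10)
)
lin <- -0.4 * toy$female - 0.3 * toy$late + 0.1 * toy$female * toy$late
toy$d <- rpois(n, 0.03 * exp(lin) * toy$pt)

fit <- glm(d ~ female * late + offset(log(pt)), family = poisson, data = toy)
b <- coef(fit)
V <- vcov(fit)

# Earlier period: the effect of sex is the main effect alone
est_early <- unname(b["female"])
se_early  <- sqrt(V["female", "female"])

# Later period: main effect + interaction; var(a + b) = var(a) + var(b) + 2 cov(a, b)
est_late <- unname(b["female"] + b["female:late"])
se_late  <- sqrt(V["female", "female"] + V["female:late", "female:late"] +
                 2 * V["female", "female:late"])

hr <- function(est, se) exp(est + c(HR = 0, lower = -1, upper = 1) * qnorm(0.975) * se)
print(rbind(early = hr(est_early, se_early), late = hr(est_late, se_late)))
```

Ignoring the covariance term is a common mistake and usually gives an interval that is too wide, because the main effect and the interaction are typically negatively correlated.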

(l)

Now fit a separate model for each calendar period in order to estimate the hazard ratio for the effect of sex (with 95% confidence interval) for each calendar period.

Why do the estimates differ from those you obtained in the previous part?

Can you fit a single model that reproduces the estimates you obtained from the stratified models?

(m)

Split by month and fit a model to smooth for time using natural splines, adjusting for age group and calendar period. Plot the baseline hazard.
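A minimal sketch of this kind of model on simulated data is shown below. The Weibull event times, the 0/1 covariate `late` and the choice of 4 degrees of freedom are all assumptions made for illustration, and age group is omitted to keep the example short. Time enters the model through the midpoint of each monthly interval.

```r
library(survival)
library(splines)

set.seed(3)
n <- 300
toy <- data.frame(id = seq_len(n), late = rbinom(n, 1, 0.5))
# Hypothetical Weibull event times in months, censored at 120 months
t_event   <- rweibull(n, shape = 1.3, scale = 150 * exp(0.3 * toy$late))
toy$time  <- pmin(t_event, 120)
toy$event <- as.integer(t_event <= 120)

# Split follow-up into monthly intervals
long <- survSplit(Surv(time, event) ~ ., data = toy, cut = 1:119, episode = "month")
long$pt  <- long$time - long$tstart          # person-months in the interval
long$mid <- (long$tstart + long$time) / 2    # interval midpoint

# Poisson regression with a natural spline for time since diagnosis
fit <- glm(event ~ ns(mid, df = 4) + late + offset(log(pt)),
           family = poisson, data = long)

# Baseline hazard (late = 0) per person-month; pt = 1 makes the offset zero
newd <- data.frame(mid = seq(0.5, 119.5, by = 1), late = 0, pt = 1)
newd$haz <- predict(fit, newdata = newd, type = "response")
plot(newd$mid, newd$haz * 12 * 1000, type = "l",
     xlab = "Months since diagnosis", ylab = "Rate per 1000 person-years")
```

Setting the person-time to 1 in the prediction data is a convenient way to obtain a rate rather than an expected count. The number of spline degrees of freedom is a modelling choice and can be compared across fits using, for example, the AIC.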

(n)

Split by month and fit a model to smooth for time using natural splines, adjusting for age group and calendar period, with a time-varying hazard ratio for calendar period. Plot the time-varying hazard ratio.

(o)

We can estimate survival from a Poisson regression model by integration. This can be done elegantly using ordinary differential equations: with the initial condition S(0) = 1, survival satisfies dS(t)/dt = −S(t)h(t), where h(t) is the hazard.
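To illustrate the idea without any extra packages, the sketch below solves this differential equation with a hand-written fourth-order Runge-Kutta step in base R. This is a swapped-in technique chosen for illustration, not one of the package-based approaches used in this exercise. The hazard is a hypothetical Weibull hazard (shape 1.3, scale 150 months) chosen because its survival function is known in closed form; with a fitted Poisson model, h(t) would instead come from the model's predictions.

```r
# Hypothetical hazard: Weibull with shape 1.3 and scale 150 months
h  <- function(t) (1.3 / 150) * (t / 150)^0.3
dS <- function(t, S) -S * h(t)               # dS(t)/dt = -S(t) h(t)

# Fixed-step classical Runge-Kutta (RK4) solver for a scalar ODE
rk4 <- function(f, y0, times) {
  y <- numeric(length(times))
  y[1] <- y0
  for (i in seq_along(times)[-1]) {
    t0 <- times[i - 1]
    dt <- times[i] - t0
    yi <- y[i - 1]
    k1 <- f(t0, yi)
    k2 <- f(t0 + dt / 2, yi + dt * k1 / 2)
    k3 <- f(t0 + dt / 2, yi + dt * k2 / 2)
    k4 <- f(t0 + dt, yi + dt * k3)
    y[i] <- yi + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
  }
  y
}

times   <- seq(0, 120, by = 0.5)
S_ode   <- rk4(dS, 1, times)                 # start from S(0) = 1
S_exact <- exp(-(times / 150)^1.3)           # closed-form Weibull survival

print(max(abs(S_ode - S_exact)))             # numerical error of the ODE solution
plot(times, S_ode, type = "l", xlab = "Months since diagnosis", ylab = "Survival")
```

Comparing against a known closed form is a useful check before applying the same solver to a hazard that is only available numerically.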

If the deSolve package is available (which is not the case with WebR), then we can use the rstpm2::markov_msm function for a two-state Markov multi-state model.

What pattern do you observe?

This approach will generalise quite readily to competing risks and other multi-state models. Alternatively, we can use the pracma package (which is available on WebR), which requires more work.

Again, what pattern do you observe?