Biostatistics III in R

Code by Mark Clements

Exercise 14. Non-collapsibility of proportional hazards models

We simulate time-to-event data assuming constant hazards and then investigate whether we can recover the underlying parameters. Note that the binary variable X is essentially a coin toss and that the normally distributed U has a large variance.
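A minimal sketch of such a simulation follows. Only the coin toss for X, the constant hazards and the standard deviation of 3 for U come from the text. The seed, sample size, intercept of -5, unit log hazard ratios and uniform censoring on (0, 10) are illustrative assumptions.

```r
# Illustrative simulation; the seed, n, intercept, log hazard ratios and
# censoring range are assumptions, not values taken from the exercise.
set.seed(12345)
n <- 1e4
x <- rbinom(n, 1, 0.5)            # binary exposure: a fair coin toss
u <- rnorm(n, mean = 0, sd = 3)   # normally distributed U with SD 3

# Constant (exponential) hazard for each individual: exp(-5 + x + u)
t_event <- rexp(n, rate = exp(-5 + x + u))
t_cens  <- runif(n, 0, 10)        # assumed uniform censoring on (0, 10)

d <- data.frame(
  y     = pmin(t_event, t_cens),          # observed follow-up time
  delta = as.integer(t_event < t_cens),   # 1 = event, 0 = censored
  x     = x,
  u     = u
)

mean(d$delta)   # proportion of individuals with an observed event
cor(d$x, d$u)   # close to zero: X and U are generated independently
```

Under these assumptions the conditional log hazard ratio for X (given U) is 1, which gives a known target for the models fitted later.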

You may have to install the required packages the first time you use them. You can install each package you need with install.packages("package_of_interest").

The assumed causal diagram: X and U are independent and each affects the event time T (X → T ← U).

(a) Fitting models with both X and U

For constant hazards, we can fit (i) Poisson regression, (ii) Cox regression and (iii) flexible parametric survival models.
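The three model types could be fitted along the following lines. This is a sketch under assumed simulation settings (intercept -5, unit log hazard ratios, uniform censoring on (0, 10)). The flexible parametric model is only fitted if the rstpm2 package is installed, and the choice of df = 4 is an assumption.

```r
library(survival)

# Assumed simulation settings; only SD = 3 for U comes from the text.
set.seed(12345)
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t_event <- rexp(n, exp(-5 + x + u))
t_cens  <- runif(n, 0, 10)
d <- data.frame(y = pmin(t_event, t_cens),
                delta = as.integer(t_event < t_cens),
                x = x, u = u)

# (i) Poisson regression: event indicator with log person-time as offset
fit_pois <- glm(delta ~ x + u + offset(log(y)), family = poisson, data = d)

# (ii) Cox regression
fit_cox <- coxph(Surv(y, delta) ~ x + u, data = d)

# Compare the estimated log hazard ratios for x and u
print(rbind(poisson = coef(fit_pois)[c("x", "u")],
            cox     = coef(fit_cox)[c("x", "u")]))

# (iii) Flexible parametric survival model (needs rstpm2; df = 4 is assumed)
if (requireNamespace("rstpm2", quietly = TRUE)) {
  library(rstpm2)
  fit_fpm <- stpm2(Surv(y, delta) ~ x + u, data = d, df = 4)
  print(summary(fit_fpm))
}
```

With both X and U in the model, each approach targets the same conditional log hazard ratios, so the estimates can be compared directly with the values used in the simulation.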

It may be useful to investigate whether the hazard ratio for X is time-varying and to examine the form of the survival function.
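One way to look at this, sketched below under the same assumed simulation settings, is to test proportional hazards with survival::cox.zph and to inspect Kaplan-Meier estimates by level of X. This is one possible approach rather than the only one; a flexible parametric model with a time-varying effect would be an alternative.

```r
library(survival)

# Assumed simulation settings; only SD = 3 for U comes from the text.
set.seed(12345)
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t_event <- rexp(n, exp(-5 + x + u))
t_cens  <- runif(n, 0, 10)
d <- data.frame(y = pmin(t_event, t_cens),
                delta = as.integer(t_event < t_cens),
                x = x, u = u)

# Cox model adjusting for both X and U
fit_cox <- coxph(Surv(y, delta) ~ x + u, data = d)

# Test of proportional hazards based on scaled Schoenfeld residuals
zph <- cox.zph(fit_cox)
print(zph)

# Kaplan-Meier survival by level of X, summarised at two assumed time points
km <- survfit(Surv(y, delta) ~ x, data = d)
print(summary(km, times = c(1, 5)))
```

Calling plot(zph) afterwards displays a smoothed estimate of each coefficient as a function of time.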

(b) Fitting models with only X

We now fit models that exclude the variable U. This variable could be excluded when it is not measured, or perhaps when it is not considered to be a confounding variable: from the causal diagram, the two variables X and U are not correlated and are only connected through the time variable T.

Again, we suggest investigating whether the hazard ratio for X is time-varying.
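A sketch of this comparison follows, again under assumed simulation settings. The model with X only is fitted alongside the model with both X and U, and the follow-up time is then split at the arbitrary cut points 1 and 3 (an assumption) so that a separate hazard ratio for X can be estimated in each period.

```r
library(survival)

# Assumed simulation settings; only SD = 3 for U comes from the text.
set.seed(12345)
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t_event <- rexp(n, exp(-5 + x + u))
t_cens  <- runif(n, 0, 10)
d <- data.frame(y = pmin(t_event, t_cens),
                delta = as.integer(t_event < t_cens),
                x = x, u = u)

# Model with X only versus model with X and U
fit_marg <- coxph(Surv(y, delta) ~ x, data = d)
fit_cond <- coxph(Surv(y, delta) ~ x + u, data = d)
print(c(x_only = unname(coef(fit_marg)["x"]),
        x_and_u = unname(coef(fit_cond)["x"])))

# Split follow-up at the assumed cut points to get period-specific effects
d_split <- survSplit(Surv(y, delta) ~ x, data = d,
                     cut = c(1, 3), episode = "period")
fit_tv <- coxph(Surv(tstart, y, delta) ~ x:strata(period), data = d_split)
print(exp(coef(fit_tv)))   # hazard ratio for X within each time period
```

The period-specific hazard ratios give a coarse picture of how the estimated effect of X changes over follow-up when U is left out.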

What do you see from the time-varying hazard ratio? Is U a potential confounder for X?

(c) Rarer outcomes

We now simulate rarer outcomes by changing the censoring distribution:
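As a sketch, the simulation can be wrapped in a hypothetical helper, simulate_data, whose cens_max argument sets the upper limit of a uniform censoring distribution. The value 0.1 used for the rarer outcome is an assumption chosen only to shorten follow-up sharply, as are the intercept and log hazard ratios.

```r
library(survival)

# Hypothetical helper; intercept, log hazard ratios and the uniform
# censoring distribution are assumptions.
simulate_data <- function(n = 1e4, sd_u = 3, cens_max = 10) {
  x <- rbinom(n, 1, 0.5)
  u <- rnorm(n, 0, sd_u)
  t_event <- rexp(n, exp(-5 + x + u))
  t_cens  <- runif(n, 0, cens_max)
  data.frame(y = pmin(t_event, t_cens),
             delta = as.integer(t_event < t_cens),
             x = x, u = u)
}

set.seed(12345)
d_base <- simulate_data()                 # baseline censoring on (0, 10)
d_rare <- simulate_data(cens_max = 0.1)   # much shorter follow-up (assumed)

# Proportion of individuals with an event under each censoring scheme
print(c(baseline = mean(d_base$delta), rare = mean(d_rare$delta)))

# Refit the models with and without U on the rarer-outcome data
fit_rare_cond <- coxph(Surv(y, delta) ~ x + u, data = d_rare)
fit_rare_marg <- coxph(Surv(y, delta) ~ x, data = d_rare)
print(c(x_and_u = unname(coef(fit_rare_cond)["x"]),
        x_only  = unname(coef(fit_rare_marg)["x"])))
```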

What do you observe?

(d) Less heterogeneity

We now simulate less heterogeneity by reducing the standard deviation of the random effect U from 3 to 1.
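The sketch below compares the estimate for X from a model without U under the two standard deviations. The helper simulate_data is hypothetical, and the intercept, log hazard ratios and censoring range are assumptions; only the change from SD 3 to SD 1 comes from the text.

```r
library(survival)

# Hypothetical helper; intercept, log hazard ratios and the uniform
# censoring distribution are assumptions.
simulate_data <- function(n = 1e4, sd_u = 3, cens_max = 10) {
  x <- rbinom(n, 1, 0.5)
  u <- rnorm(n, 0, sd_u)
  t_event <- rexp(n, exp(-5 + x + u))
  t_cens  <- runif(n, 0, cens_max)
  data.frame(y = pmin(t_event, t_cens),
             delta = as.integer(t_event < t_cens),
             x = x, u = u)
}

set.seed(12345)
d_sd3 <- simulate_data(sd_u = 3)   # baseline heterogeneity
d_sd1 <- simulate_data(sd_u = 1)   # less heterogeneity

# Cox models that omit U, fitted to each data set
fit_sd3 <- coxph(Surv(y, delta) ~ x, data = d_sd3)
fit_sd1 <- coxph(Surv(y, delta) ~ x, data = d_sd1)

# Estimated log hazard ratios for X; the simulated conditional value is 1
print(c(sd_3 = unname(coef(fit_sd3)["x"]),
        sd_1 = unname(coef(fit_sd1)["x"])))
```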

What do you observe?

(e) Accelerated failure time models

As an alternative model class, we can fit accelerated failure time models with a smooth baseline survival function. We can use the rstpm2::aft function, which uses splines to model baseline survival. Using the baseline simulation, fit and interpret smooth accelerated failure time models:
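A possible starting point is sketched below under the same assumed simulation settings. As a reference we first fit a Weibull accelerated failure time model with survival::survreg, which is a swapped-in parametric model and not the smooth model the exercise asks for. The rstpm2::aft fit is only run if rstpm2 is installed, and df = 4 is an assumption. Sign conventions for the coefficients may differ between the two functions, so check the package documentation before comparing them.

```r
library(survival)

# Assumed simulation settings; only SD = 3 for U comes from the text.
set.seed(12345)
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t_event <- rexp(n, exp(-5 + x + u))
t_cens  <- runif(n, 0, 10)
d <- data.frame(y = pmin(t_event, t_cens),
                delta = as.integer(t_event < t_cens),
                x = x, u = u)

# Reference: Weibull AFT model. For exponential data the scale is about 1
# and the survreg coefficients are minus the log hazard ratios.
fit_weib_cond <- survreg(Surv(y, delta) ~ x + u, data = d, dist = "weibull")
fit_weib_marg <- survreg(Surv(y, delta) ~ x, data = d, dist = "weibull")
print(c(x_and_u = unname(coef(fit_weib_cond)["x"]),
        x_only  = unname(coef(fit_weib_marg)["x"])))
print(c(scale_x_and_u = fit_weib_cond$scale,
        scale_x_only  = fit_weib_marg$scale))

# Smooth AFT models with a spline-based baseline (needs rstpm2)
if (requireNamespace("rstpm2", quietly = TRUE)) {
  library(rstpm2)
  fit_aft_cond <- aft(Surv(y, delta) ~ x + u, data = d, df = 4)
  fit_aft_marg <- aft(Surv(y, delta) ~ x, data = d, df = 4)
  print(summary(fit_aft_cond))
  print(summary(fit_aft_marg))
}
```

Comparing the estimates for X with and without U in the accelerated failure time models, alongside the corresponding hazard ratio estimates, shows how each model class responds to omitting U.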