Exercise 14. Non-collapsibility of proportional hazards models
We simulate for time-to-event data assuming constant hazards and then investigate whether we can estimate the underlying parameters. Note that the binary variable \(X\) is essentially a coin toss and we have used a large variance for the normally distributed \(U\).
You may have to install the required packages the first time you use
them. You can install a package by
install.packages("package_of_interest")
for each package
you require.
The assumed causal diagram is reproduced below:
set.seed(12345)
d <- local({
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t <- rexp(n, exp(-5+x+u))
c <- runif(n, 0, 10)
y <- pmin(t, c)
delta <- (t < c)
data.frame(y,x,u,delta)
})
head(d)
## y x u delta
## 1 1.809348 1 -2.0774167 FALSE
## 2 1.251765 1 3.3740868 TRUE
## 3 2.078184 1 0.5236054 FALSE
## 4 3.528797 1 -5.9034201 FALSE
## 5 6.659236 0 -4.9328152 FALSE
## 6 5.366579 0 -0.3293542 FALSE
(a) Fitting models with both \(X\) and \(U\)
For constant hazards, we can fit (i) Poisson regression, (ii) Cox regression and (iii) flexible parametric survival models. With both covariates, these models are expected to estimate the parameters for \(X\) and \(U\), with values close to 1:
##
## Call:
## glm(formula = delta ~ x + u + offset(log(y)), family = poisson,
## data = d)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.97595 0.05344 -93.11 <2e-16 ***
## x 0.96027 0.04372 21.96 <2e-16 ***
## u 0.99087 0.01091 90.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 14687.0 on 9999 degrees of freedom
## Residual deviance: 5811.2 on 9997 degrees of freedom
## AIC: 10257
##
## Number of Fisher Scoring iterations: 7
## Call:
## coxph(formula = Surv(y, delta) ~ x + u, data = d)
##
## n= 10000, number of events= 2220
##
## coef exp(coef) se(coef) z Pr(>|z|)
## x 0.96227 2.61764 0.04490 21.43 <2e-16 ***
## u 0.99034 2.69215 0.01461 67.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## x 2.618 0.3820 2.397 2.858
## u 2.692 0.3714 2.616 2.770
##
## Concordance= 0.922 (se = 0.002 )
## Likelihood ratio test= 7448 on 2 df, p=<2e-16
## Wald test = 4629 on 2 df, p=<2e-16
## Score (logrank) test = 5611 on 2 df, p=<2e-16
## Maximum likelihood estimation
##
## Call:
## stpm2(formula = Surv(y, delta) ~ x + u, data = d, df = 4)
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## (Intercept) -14.565979 0.516079 -28.224 < 2.2e-16 ***
## x 0.960750 0.044878 21.408 < 2.2e-16 ***
## u 0.990848 0.014600 67.865 < 2.2e-16 ***
## nsx(log(y), df = 4)1 9.424531 0.494145 19.072 < 2.2e-16 ***
## nsx(log(y), df = 4)2 8.786272 0.328904 26.714 < 2.2e-16 ***
## nsx(log(y), df = 4)3 16.534497 1.013387 16.316 < 2.2e-16 ***
## nsx(log(y), df = 4)4 8.562140 0.206188 41.526 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 8520.997
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
## Estimate Std. Error
## Poisson 0.9602659 0.04372345
## Cox 0.9622728 0.04490412
## Stpm2 0.9607498 0.04487831
If we fit a time-varying hazard ratio for \(X\), we see that the hazard ratio looks reasonably constant.
fit <- stpm2(Surv(y,delta)~x+u,data=d,df=4, tvc=list(x=2)) # slow
plot(fit, type="hr", newdata=data.frame(x=0,u=0), var="x", ylim=c(1,4))
predict(fit, type="surv", newdata=data.frame(x=0:1,u=3), grid=TRUE, full=TRUE,
se.fit=TRUE) |>
ggplot(aes(x=y,y=Estimate,fill=factor(x),ymin=lower,ymax=upper)) +
ylab("Survival") +
geom_ribbon(alpha=0.6) +
geom_line(aes(colour=factor(x)))
(b) Fitting models with only \(X\)
We now model by excluding the variable \(U\). This variable could be excluded when it is not measured or perhaps when the variable is not considered to be a confounding variable – from the causal diagram, the two variables \(X\) and \(U\) are not correlated and are only connected through the time variable \(T\).
##
## Call:
## glm(formula = delta ~ x + offset(log(y)), family = poisson, data = d)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.19636 0.03377 -94.66 <2e-16 ***
## x 0.51289 0.04341 11.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 14687 on 9999 degrees of freedom
## Residual deviance: 14544 on 9998 degrees of freedom
## AIC: 18988
##
## Number of Fisher Scoring iterations: 7
## Call:
## coxph(formula = Surv(y, delta) ~ x, data = d)
##
## n= 10000, number of events= 2220
##
## coef exp(coef) se(coef) z Pr(>|z|)
## x 0.48511 1.62436 0.04342 11.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## x 1.624 0.6156 1.492 1.769
##
## Concordance= 0.565 (se = 0.005 )
## Likelihood ratio test= 127.9 on 1 df, p=<2e-16
## Wald test = 124.8 on 1 df, p=<2e-16
## Score (logrank) test = 127.3 on 1 df, p=<2e-16
## Maximum likelihood estimation
##
## Call:
## stpm2(formula = Surv(y, delta) ~ x, data = d, df = 4)
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## (Intercept) -9.883577 0.473288 -20.883 < 2.2e-16 ***
## x 0.485236 0.043423 11.175 < 2.2e-16 ***
## nsx(log(y), df = 4)1 7.458585 0.467806 15.944 < 2.2e-16 ***
## nsx(log(y), df = 4)2 6.323162 0.310308 20.377 < 2.2e-16 ***
## nsx(log(y), df = 4)3 13.212493 0.930874 14.194 < 2.2e-16 ***
## nsx(log(y), df = 4)4 5.541993 0.192044 28.858 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 15844.98
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
## Estimate Std. Error
## Poisson 0.5128869 0.04341441
## Cox 0.4851147 0.04342341
## Stpm2 0.4852363 0.04342328
We clearly see that the estimate for \(X\) is different in the models without \(U\). If we now allow the the hazard ratio to vary by time:
fit <- stpm2(Surv(y,delta)~x,data=d,df=4, tvc=list(x=2))
plot(fit, type="hr", newdata=data.frame(x=0), var="x", ylim=c(1,4))
We see that the hazard ratio is clearly time-varying, starting at an initial value at time = 0 of close to exp(1), but then declining rapidly. As discussed in class, this is an example of unmeasured heterogeneity, which is a normal random effect or, in this context, a log-normal frailty. Initially, there is no selection in the at-risk population, and the estimated marginal hazard ratio from the model without \(U\) is similar to the conditional hazrd ratio from the model that adjusts for \(U\). However, at later times, the at-risk population has been selected for those with a smaller frailty (because they were less likely to have had the event), and for a log-normal frailty the marginal hazard ratio is attenuated towards 1.
Let us stress that \(U\) is not a confounder. The issue is that fitting proportional hazards models with unmodelled heterogeneity for variables that are not confounders can lead to time-varying hazard ratios – and marginal estimates that do not have a causal interpretation.
(c) Rarer outcomes
We now simulate for rarer outcomes by changing the censoring distribution.
set.seed(12345)
d <- local({
n <- 1e4*10 # CHANGED N
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t <- rexp(n, exp(-5+x+u))
c <- runif(n, 0, 10/1000) # CHANGED FROM 10 TO 0.01
y <- pmin(t, c)
delta <- (t < c)
data.frame(y,x,u,delta)
})
summary(fit1 <- glm(delta~x+offset(log(y)),data=d,family=poisson))
summary(fit2 <- coxph(Surv(y,delta)~x,data=d))
summary(fit3 <- stpm2(Surv(y,delta)~x,data=d,df=4))
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
## Estimate Std. Error
## Poisson 1.081150 0.1212037
## Cox 1.081513 0.1212136
## Stpm2 1.081399 0.1212136
For rarer outcomes, the marginal estimates that do not model for \(U\) are closer to the conditional estimates that have modelled for \(U\).
(d) Less heterogeneity
We now simulate for less heterogeneity by changing the reducing the standard deviation for the random effect \(U\) from 3 to 1.
set.seed(12345)
d <- local( {
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 1) # CHANGED SD FROM 3 TO 1
t <- rexp(n, exp(-5+x+u))
c <- runif(n, 0, 10)
y <- pmin(t, c)
delta <- (t < c)
data.frame(y,x,u,delta)
})
summary(fit1 <- glm(delta~x+offset(log(y)),data=d,family=poisson))
summary(fit2 <- coxph(Surv(y,delta)~x,data=d))
summary(fit3 <- stpm2(Surv(y,delta)~x,data=d,df=4))
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
## Estimate Std. Error
## Poisson 0.9498227 0.07583196
## Cox 0.9444216 0.07584759
## Stpm2 0.9441747 0.07584672
The marginal estimates are now much closer to the conditional estimates. For little or no heterogeneity, the marginal and conditional estimates would be very similar.
(e) Accelerated failure time models
We re-run the baseline simulations:
set.seed(12345)
d <- local({
n <- 1e4
x <- rbinom(n, 1, 0.5)
u <- rnorm(n, 0, 3)
t <- rexp(n, exp(-5+x+u))
c <- runif(n, 0, 10)
y <- pmin(t, c)
delta <- (t < c)
data.frame(y,x,u,delta)
})
head(d)
The accelerated failure time models have the form \(S(t|x,u)=S_0(t/\exp( \beta_1 x + \beta_2 u))\), where the function \(S_0(t)\) is expressed in terms of splines. For an exponential distribution, the hazard ratio is equal to the inverse of the ratio of the means — so we expect values of \(-1\) for \(x\) and \(u\).
## Maximum likelihood estimation
##
## Call:
## bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE,
## vecpar = TRUE, gr = gradient, control = control)
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## x -0.962300 0.044098 -21.8218 < 2.2e-16
## u -0.991064 0.012670 -78.2240 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)1 -3.541448 1.262247 -2.8057 0.005021
## nsx(logtstar, df, intercept = TRUE)2 1.847778 1.122168 1.6466 0.099637
## nsx(logtstar, df, intercept = TRUE)3 -20.886221 3.220093 -6.4862 8.802e-11
## nsx(logtstar, df, intercept = TRUE)4 8.702543 2.087292 4.1693 3.055e-05
##
## x ***
## u ***
## nsx(logtstar, df, intercept = TRUE)1 **
## nsx(logtstar, df, intercept = TRUE)2 .
## nsx(logtstar, df, intercept = TRUE)3 ***
## nsx(logtstar, df, intercept = TRUE)4 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 8520.726
## Maximum likelihood estimation
##
## Call:
## bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE,
## vecpar = TRUE, gr = gradient, control = control)
##
## Coefficients:
## Estimate Std. Error z value Pr(z)
## x -1.036624 0.091058 -11.3842 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)1 -1.712803 0.202523 -8.4573 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)2 1.744380 0.158298 11.0196 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)3 -12.163961 0.465543 -26.1286 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)4 5.648685 0.296200 19.0705 < 2.2e-16
##
## x ***
## nsx(logtstar, df, intercept = TRUE)1 ***
## nsx(logtstar, df, intercept = TRUE)2 ***
## nsx(logtstar, df, intercept = TRUE)3 ***
## nsx(logtstar, df, intercept = TRUE)4 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## -2 log L: 15835.68
We see that the model with both \(x\) and \(u\) estimates values are very close to \(-1\) — which is as expected. Importantly, the model with only \(x\) also estimates a value close to \(-1\), which shows that the acclerated failure time models are robust to omitted covariates. The standard error for \(x\) was smaller in the model with both \(x\) and \(u\) compared with that for the model with only \(x\).