Exercise 14. Non-collapsibility of proportional hazards models


We simulate for time-to-event data assuming constant hazards and then investigate whether we can estimate the underlying parameters. Note that the binary variable \(X\) is essentially a coin toss and we have used a large variance for the normally distributed \(U\).

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

library(biostat3) 
library(rstpm2)
library(ggplot2)

The assumed causal diagram is reproduced below:

set.seed(12345)
d <- local({
    n <- 1e4
    x <- rbinom(n, 1, 0.5)
    u <- rnorm(n, 0, 3)
    t <- rexp(n, exp(-5+x+u))
    c <- runif(n, 0, 10)
    y <- pmin(t, c)
    delta <- (t < c)
    data.frame(y,x,u,delta)
})
head(d)
##          y x          u delta
## 1 1.809348 1 -2.0774167 FALSE
## 2 1.251765 1  3.3740868  TRUE
## 3 2.078184 1  0.5236054 FALSE
## 4 3.528797 1 -5.9034201 FALSE
## 5 6.659236 0 -4.9328152 FALSE
## 6 5.366579 0 -0.3293542 FALSE

(a) Fitting models with both \(X\) and \(U\)

For constant hazards, we can fit (i) Poisson regression, (ii) Cox regression and (iii) flexible parametric survival models. With both covariates, these models are expected to estimate the parameters for \(X\) and \(U\), with values close to 1:

summary(fit1 <- glm(delta~x+u+offset(log(y)),data=d,family=poisson))
## 
## Call:
## glm(formula = delta ~ x + u + offset(log(y)), family = poisson, 
##     data = d)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.97595    0.05344  -93.11   <2e-16 ***
## x            0.96027    0.04372   21.96   <2e-16 ***
## u            0.99087    0.01091   90.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 14687.0  on 9999  degrees of freedom
## Residual deviance:  5811.2  on 9997  degrees of freedom
## AIC: 10257
## 
## Number of Fisher Scoring iterations: 7
summary(fit2 <- coxph(Surv(y,delta)~x+u,data=d))
## Call:
## coxph(formula = Surv(y, delta) ~ x + u, data = d)
## 
##   n= 10000, number of events= 2220 
## 
##      coef exp(coef) se(coef)     z Pr(>|z|)    
## x 0.96227   2.61764  0.04490 21.43   <2e-16 ***
## u 0.99034   2.69215  0.01461 67.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   exp(coef) exp(-coef) lower .95 upper .95
## x     2.618     0.3820     2.397     2.858
## u     2.692     0.3714     2.616     2.770
## 
## Concordance= 0.922  (se = 0.002 )
## Likelihood ratio test= 7448  on 2 df,   p=<2e-16
## Wald test            = 4629  on 2 df,   p=<2e-16
## Score (logrank) test = 5611  on 2 df,   p=<2e-16
summary(fit3 <- stpm2(Surv(y,delta)~x+u,data=d,df=4))
## Maximum likelihood estimation
## 
## Call:
## stpm2(formula = Surv(y, delta) ~ x + u, data = d, df = 4)
## 
## Coefficients:
##                        Estimate Std. Error z value     Pr(z)    
## (Intercept)          -14.565979   0.516079 -28.224 < 2.2e-16 ***
## x                      0.960750   0.044878  21.408 < 2.2e-16 ***
## u                      0.990848   0.014600  67.865 < 2.2e-16 ***
## nsx(log(y), df = 4)1   9.424531   0.494145  19.072 < 2.2e-16 ***
## nsx(log(y), df = 4)2   8.786272   0.328904  26.714 < 2.2e-16 ***
## nsx(log(y), df = 4)3  16.534497   1.013387  16.316 < 2.2e-16 ***
## nsx(log(y), df = 4)4   8.562140   0.206188  41.526 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -2 log L: 8520.997
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
      Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
      Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
##          Estimate Std. Error
## Poisson 0.9602659 0.04372345
## Cox     0.9622728 0.04490412
## Stpm2   0.9607498 0.04487831

If we fit a time-varying hazard ratio for \(X\), we see that the hazard ratio looks reasonably constant.

fit <- stpm2(Surv(y,delta)~x+u,data=d,df=4, tvc=list(x=2)) # slow
plot(fit, type="hr", newdata=data.frame(x=0,u=0), var="x", ylim=c(1,4))

predict(fit, type="surv", newdata=data.frame(x=0:1,u=3), grid=TRUE, full=TRUE,
        se.fit=TRUE) |>
    ggplot(aes(x=y,y=Estimate,fill=factor(x),ymin=lower,ymax=upper)) +
    ylab("Survival") +
    geom_ribbon(alpha=0.6) +
    geom_line(aes(colour=factor(x)))

## standardisation: slow:(
plot(fit, type="meanhr", newdata=transform(d,x=0), var="x", seqLength=31, ylim=c(1,4))

plot(fit, type="meansurvdiff", newdata=transform(d,x=0), var="x", seqLength=31)

plot(fit, type="meansurv", newdata=transform(d,x=0), seqLength=101)

(b) Fitting models with only \(X\)

We now model by excluding the variable \(U\). This variable could be excluded when it is not measured or perhaps when the variable is not considered to be a confounding variable – from the causal diagram, the two variables \(X\) and \(U\) are not correlated and are only connected through the time variable \(T\).

summary(fit1 <- glm(delta~x+offset(log(y)),data=d,family=poisson))
## 
## Call:
## glm(formula = delta ~ x + offset(log(y)), family = poisson, data = d)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.19636    0.03377  -94.66   <2e-16 ***
## x            0.51289    0.04341   11.81   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 14687  on 9999  degrees of freedom
## Residual deviance: 14544  on 9998  degrees of freedom
## AIC: 18988
## 
## Number of Fisher Scoring iterations: 7
summary(fit2 <- coxph(Surv(y,delta)~x,data=d))
## Call:
## coxph(formula = Surv(y, delta) ~ x, data = d)
## 
##   n= 10000, number of events= 2220 
## 
##      coef exp(coef) se(coef)     z Pr(>|z|)    
## x 0.48511   1.62436  0.04342 11.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##   exp(coef) exp(-coef) lower .95 upper .95
## x     1.624     0.6156     1.492     1.769
## 
## Concordance= 0.565  (se = 0.005 )
## Likelihood ratio test= 127.9  on 1 df,   p=<2e-16
## Wald test            = 124.8  on 1 df,   p=<2e-16
## Score (logrank) test = 127.3  on 1 df,   p=<2e-16
summary(fit3 <- stpm2(Surv(y,delta)~x,data=d,df=4))
## Maximum likelihood estimation
## 
## Call:
## stpm2(formula = Surv(y, delta) ~ x, data = d, df = 4)
## 
## Coefficients:
##                       Estimate Std. Error z value     Pr(z)    
## (Intercept)          -9.883577   0.473288 -20.883 < 2.2e-16 ***
## x                     0.485236   0.043423  11.175 < 2.2e-16 ***
## nsx(log(y), df = 4)1  7.458585   0.467806  15.944 < 2.2e-16 ***
## nsx(log(y), df = 4)2  6.323162   0.310308  20.377 < 2.2e-16 ***
## nsx(log(y), df = 4)3 13.212493   0.930874  14.194 < 2.2e-16 ***
## nsx(log(y), df = 4)4  5.541993   0.192044  28.858 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -2 log L: 15844.98
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
      Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
      Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
##          Estimate Std. Error
## Poisson 0.5128869 0.04341441
## Cox     0.4851147 0.04342341
## Stpm2   0.4852363 0.04342328

We clearly see that the estimate for \(X\) is different in the models without \(U\). If we now allow the the hazard ratio to vary by time:

fit <- stpm2(Surv(y,delta)~x,data=d,df=4, tvc=list(x=2))
plot(fit, type="hr", newdata=data.frame(x=0), var="x", ylim=c(1,4))

We see that the hazard ratio is clearly time-varying, starting at an initial value at time = 0 of close to exp(1), but then declining rapidly. As discussed in class, this is an example of unmeasured heterogeneity, which is a normal random effect or, in this context, a log-normal frailty. Initially, there is no selection in the at-risk population, and the estimated marginal hazard ratio from the model without \(U\) is similar to the conditional hazrd ratio from the model that adjusts for \(U\). However, at later times, the at-risk population has been selected for those with a smaller frailty (because they were less likely to have had the event), and for a log-normal frailty the marginal hazard ratio is attenuated towards 1.

Let us stress that \(U\) is not a confounder. The issue is that fitting proportional hazards models with unmodelled heterogeneity for variables that are not confounders can lead to time-varying hazard ratios – and marginal estimates that do not have a causal interpretation.

(c) Rarer outcomes

We now simulate for rarer outcomes by changing the censoring distribution.

set.seed(12345)
d <- local({
    n <- 1e4*10 # CHANGED N
    x <- rbinom(n, 1, 0.5)
    u <- rnorm(n, 0, 3)
    t <- rexp(n, exp(-5+x+u))
    c <- runif(n, 0, 10/1000) # CHANGED FROM 10 TO 0.01
    y <- pmin(t, c)
    delta <- (t < c)
    data.frame(y,x,u,delta)
})
summary(fit1 <- glm(delta~x+offset(log(y)),data=d,family=poisson))

summary(fit2 <- coxph(Surv(y,delta)~x,data=d))

summary(fit3 <- stpm2(Surv(y,delta)~x,data=d,df=4))
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
      Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
      Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
##         Estimate Std. Error
## Poisson 1.081150  0.1212037
## Cox     1.081513  0.1212136
## Stpm2   1.081399  0.1212136

For rarer outcomes, the marginal estimates that do not model for \(U\) are closer to the conditional estimates that have modelled for \(U\).

(d) Less heterogeneity

We now simulate for less heterogeneity by changing the reducing the standard deviation for the random effect \(U\) from 3 to 1.

set.seed(12345)
d <- local( {
    n <- 1e4
    x <- rbinom(n, 1, 0.5)
    u <- rnorm(n, 0, 1) # CHANGED SD FROM 3 TO 1
    t <- rexp(n, exp(-5+x+u))
    c <- runif(n, 0, 10) 
    y <- pmin(t, c)
    delta <- (t < c)
    data.frame(y,x,u,delta)
})
summary(fit1 <- glm(delta~x+offset(log(y)),data=d,family=poisson))

summary(fit2 <- coxph(Surv(y,delta)~x,data=d))

summary(fit3 <- stpm2(Surv(y,delta)~x,data=d,df=4))
## summary table for the coefficients for X
rbind(Poisson=coef(summary(fit1))["x",c("Estimate","Std. Error")],
      Cox=coef(summary(fit2))["x",c("coef","se(coef)")],
      Stpm2=coef(summary(fit3))["x",c("Estimate","Std. Error")])
##          Estimate Std. Error
## Poisson 0.9498227 0.07583196
## Cox     0.9444216 0.07584759
## Stpm2   0.9441747 0.07584672

The marginal estimates are now much closer to the conditional estimates. For little or no heterogeneity, the marginal and conditional estimates would be very similar.

(e) Accelerated failure time models

We re-run the baseline simulations:

set.seed(12345)
d <- local({
    n <- 1e4
    x <- rbinom(n, 1, 0.5)
    u <- rnorm(n, 0, 3)
    t <- rexp(n, exp(-5+x+u))
    c <- runif(n, 0, 10)
    y <- pmin(t, c)
    delta <- (t < c)
    data.frame(y,x,u,delta)
})
head(d)

The accelerated failure time models have the form \(S(t|x,u)=S_0(t/\exp( \beta_1 x + \beta_2 u))\), where the function \(S_0(t)\) is expressed in terms of splines. For an exponential distribution, the hazard ratio is equal to the inverse of the ratio of the means — so we expect values of \(-1\) for \(x\) and \(u\).

fit <- aft(Surv(y,delta)~x+u,data=d,df=4)
summary(fit)
## Maximum likelihood estimation
## 
## Call:
## bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE, 
##     vecpar = TRUE, gr = gradient, control = control)
## 
## Coefficients:
##                                        Estimate Std. Error  z value     Pr(z)
## x                                     -0.962300   0.044098 -21.8218 < 2.2e-16
## u                                     -0.991064   0.012670 -78.2240 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)1  -3.541448   1.262247  -2.8057  0.005021
## nsx(logtstar, df, intercept = TRUE)2   1.847778   1.122168   1.6466  0.099637
## nsx(logtstar, df, intercept = TRUE)3 -20.886221   3.220093  -6.4862 8.802e-11
## nsx(logtstar, df, intercept = TRUE)4   8.702543   2.087292   4.1693 3.055e-05
##                                         
## x                                    ***
## u                                    ***
## nsx(logtstar, df, intercept = TRUE)1 ** 
## nsx(logtstar, df, intercept = TRUE)2 .  
## nsx(logtstar, df, intercept = TRUE)3 ***
## nsx(logtstar, df, intercept = TRUE)4 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -2 log L: 8520.726
fit <- aft(Surv(y,delta)~x,data=d,df=4)
summary(fit)
## Maximum likelihood estimation
## 
## Call:
## bbmle::mle2(minuslogl = negll, start = coef, eval.only = TRUE, 
##     vecpar = TRUE, gr = gradient, control = control)
## 
## Coefficients:
##                                        Estimate Std. Error  z value     Pr(z)
## x                                     -1.036624   0.091058 -11.3842 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)1  -1.712803   0.202523  -8.4573 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)2   1.744380   0.158298  11.0196 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)3 -12.163961   0.465543 -26.1286 < 2.2e-16
## nsx(logtstar, df, intercept = TRUE)4   5.648685   0.296200  19.0705 < 2.2e-16
##                                         
## x                                    ***
## nsx(logtstar, df, intercept = TRUE)1 ***
## nsx(logtstar, df, intercept = TRUE)2 ***
## nsx(logtstar, df, intercept = TRUE)3 ***
## nsx(logtstar, df, intercept = TRUE)4 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## -2 log L: 15835.68

We see that the model with both \(x\) and \(u\) estimates values are very close to \(-1\) — which is as expected. Importantly, the model with only \(x\) also estimates a value close to \(-1\), which shows that the acclerated failure time models are robust to omitted covariates. The standard error for \(x\) was smaller in the model with both \(x\) and \(u\) compared with that for the model with only \(x\).