Exercise 8. Diet data: Using Poisson regression to study the effect of energy intake adjusting for confounders on two different timescales

Use Poisson regression to study the association between energy intake (hieng) and CHD adjusted for potential confounders (job, BMI). We know that people who expend a lot of energy (i.e., are physically active) require a higher energy intake. We do not have data on physical activity but we are hoping that occupation (job) will serve as a surrogate measure of work-time physical activity (conductors on London double-decker busses expend energy walking up and down the stairs all day).

Fit models both without adjusting for ‘time’ and by adjusting for attained age (you will need to split the data) and time-since-entry and compare the results.


Load the diet data using time-on-study as the timescale.

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

library(biostat3) 
library(bshazard)  # for bshazard
library(knitr)     # kable
library(broom)     # tidy

Load diet data and explore it.

diet <- transform(biostat3::diet, bmi = weight/((height/100)^2))
head(diet) |> kable("html")
id chd y hieng energy job month height weight doe dox dob yoe yox yob bmi
127 0 16.791239 low 2023.25 conductor 2 173.9900 61.46280 1960-02-16 1976-12-01 1910-09-27 1960.126 1976.917 1910.737 20.30316
200 0 19.958933 low 2448.68 bank 12 177.8000 73.48320 1956-12-16 1976-12-01 1909-06-18 1956.958 1976.917 1909.460 23.24473
198 0 19.958933 low 2281.38 bank 12 NA NA 1956-12-16 1976-12-01 1910-06-30 1956.958 1976.917 1910.493 NA
222 0 15.394935 low 2467.95 bank 2 158.7500 58.24224 1957-02-16 1972-07-10 1902-07-11 1957.126 1972.523 1902.523 23.11057
305 1 1.494867 low 2362.93 bank 1 NA NA 1960-01-16 1961-07-15 1913-06-30 1960.041 1961.534 1913.493 NA
173 0 15.958932 low 2380.11 conductor 12 164.4904 79.01712 1960-12-16 1976-12-01 1915-06-28 1960.958 1976.917 1915.487 29.20385

The rates plotted on timescale attained age show a clear increasing trend as age increases, which is to be expected (older persons are more likely to suffer from CHD). The rates plotted on timescale time-since-entry are almost constant (if you have some imagination you can see that the rates are flat).

(a)

scale <- 365.24
plot(bshazard(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ 1,
              verbose=FALSE,
              data=subset(diet,hieng=="low")),
     ylim=c(0,0.03), conf.int=FALSE, xlab="Attained age (years)")
lines(bshazard(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ 1,
               verbose=FALSE,
               data=subset(diet,hieng=="high")),
      col="red", conf.int=FALSE)
legend("topleft", legend=c('hieng=="low"','hieng=="high"'), col=1:2, lty=1, bty="n")

par(mfrow=1:2)
for (level in levels(diet$hieng))
    plot(bshazard(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ 1,
                  verbose=FALSE,
                  data=subset(diet,hieng==level)),
         ylim=c(0,0.03), xlab="Attained age (years)",
         main=sprintf('hieng == "%s"', level))

if (requireNamespace("muhaz")) {
    scale <- 365.24
    plot(muhaz2(Surv((dox - doe)/scale, chd) ~ hieng, data=diet), lty=2,
         xlab="Time since study entry (years)", ylim=c(0,0.025),
         legend=FALSE)
    lines(bshazard(Surv((dox - doe)/scale, chd) ~ 1, v=F, data=subset(diet,hieng=="low")),
          conf.int=FALSE)
    lines(bshazard(Surv((dox - doe)/scale, chd) ~ 1, v=F, data=subset(diet,hieng=="high")),
          col="red", conf.int=FALSE)
    legend("topleft", legend=c('hieng=="low" (bshazard)','hieng=="high" (bshazard)',
                               'hieng=="low" (muhaz)','hieng=="high" (muhaz)'),
           col=1:2, lty=c(1,1,2,2), bty="n")
} 

scale <- 365.24
par(mfrow=1:2)
for (level in levels(diet$hieng))
    plot(bshazard(Surv((dox - doe)/scale, chd) ~ 1,
                  verbose=FALSE,
                  data=subset(diet,hieng==level)),
         ylim=c(0,0.03), xlab="Time since study entry (years)",
         main=sprintf('hieng == "%s"', level))

(b)

Patients with high energy intake have 48% less CHD rate. The underlying shape of the rates is assumed to be constant (i.e. the baseline is flat) over time.

poisson8b <- glm(chd ~ hieng + offset(log(y/1000)), family=poisson, data=diet)
summary(poisson8b)
## 
## Call:
## glm(formula = chd ~ hieng + offset(log(y/1000)), family = poisson, 
##     data = diet)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.6098     0.1890  13.811   <2e-16 ***
## hienghigh    -0.6532     0.3021  -2.162   0.0306 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 262.82  on 336  degrees of freedom
## Residual deviance: 258.00  on 335  degrees of freedom
## AIC: 354
## 
## Number of Fisher Scoring iterations: 6
broom::tidy(poisson8b, conf.int=TRUE, exponentiate=TRUE) # tidyverse (profile-based CIs)
## # A tibble: 2 × 7
##   term        estimate std.error statistic  p.value conf.low conf.high
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)   13.6       0.189     13.8  2.20e-43    9.16     19.3  
## 2 hienghigh      0.520     0.302     -2.16 3.06e- 2    0.283     0.933
biostat3::eform(poisson8b) # Wald-based CIs
##              exp(beta)     2.5 %     97.5 %
## (Intercept) 13.5959916 9.3877130 19.6907371
## hienghigh    0.5203599 0.2878432  0.9407012

(c)

The effect of high energy intake is slightly confounded by bmi and job, since the point estimate changes a little.

## Create BMI variable
diet <- transform(diet, bmi=weight/((height/100)^2))

## Create orderly varable instead of categorical, start at zero
levels(diet$job)
## [1] "driver"    "conductor" "bank"
diet <- transform(diet, jobNumber = unclass(job) - 1)

poisson8c <- glm(chd ~ hieng + jobNumber + bmi + offset(log(y/1000)),
                 family=poisson, data=diet)
summary(poisson8c)
## 
## Call:
## glm(formula = chd ~ hieng + jobNumber + bmi + offset(log(y/1000)), 
##     family = poisson, data = diet)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  1.49616    1.18209   1.266   0.2056  
## hienghigh   -0.69995    0.30986  -2.259   0.0239 *
## jobNumber   -0.08706    0.17170  -0.507   0.6121  
## bmi          0.05091    0.04757   1.070   0.2845  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 255.01  on 331  degrees of freedom
## Residual deviance: 249.03  on 328  degrees of freedom
##   (5 observations deleted due to missingness)
## AIC: 347.03
## 
## Number of Fisher Scoring iterations: 6
eform(poisson8c)
##             exp(beta)     2.5 %     97.5 %
## (Intercept) 4.4644913 0.4401212 45.2868008
## hienghigh   0.4966098 0.2705588  0.9115256
## jobNumber   0.9166234 0.6546985  1.2833364
## bmi         1.0522316 0.9585556  1.1550622
##########################################################

(d)

The y variable is not correct since it is kept for all splitted records, and contains the complete follow-up rather than the risktime in that specific timeband. The risktime variable contains the correct amount of risktime for each timeband. The event variable chd is not correct since it is kept constant for all splitted records, while it should only be 1 for the last record (if the person has the event). For all other records (timebands) for that person it should be 0.

## Split time at 30,50,60 and 72 with time scale age at entry to attained age
scale <- 365.24
age.cuts <- c(30,50,60,72)
diet.spl.dob <- survSplit(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ .,
                          data=diet, cut=age.cuts,start="tstart",end="tstop")


## Tabulate ageband
diet.spl.dob |> subset(id<=3, select=c(id, tstart, tstop, y))
##     id   tstart    tstop        y
## 60   2 50.53937 60.00000 11.95893
## 61   2 60.00000 62.49863 11.95893
## 316  3 58.78600 60.00000 11.04175
## 317  3 60.00000 69.82806 11.04175
## 460  1 49.61669 50.00000 12.29295
## 461  1 50.00000 60.00000 12.29295
## 462  1 60.00000 61.90998 12.29295
## Create an agespan variable
diet.spl.dob <- transform(diet.spl.dob,
                          agespan = cut(tstop, age.cuts))

## Make the numeric variables factors since we want to model them with dummie variables and calculate time at risk
diet.spl.dob <- transform(diet.spl.dob,
                          jobNumber = as.factor(jobNumber),
                          risk_time = (tstop-tstart))

## Tabulate ageband including risk_time
diet.spl.dob |> subset(id<=3, select=c(id,  tstart, tstop, y,risk_time))
##     id   tstart    tstop        y  risk_time
## 60   2 50.53937 60.00000 11.95893  9.4606286
## 61   2 60.00000 62.49863 11.95893  2.4986310
## 316  3 58.78600 60.00000 11.04175  1.2139963
## 317  3 60.00000 69.82806 11.04175  9.8280583
## 460  1 49.61669 50.00000 12.29295  0.3833096
## 461  1 50.00000 60.00000 12.29295 10.0000000
## 462  1 60.00000 61.90998 12.29295  1.9099770
## Tabulate number of events per agespan
xtabs(~agespan+chd,diet.spl.dob)
##          chd
## agespan     0   1
##   (30,50] 190   6
##   (50,60] 275  18
##   (60,72] 218  22

The effect of high energy intake is somewhat confounded by age, but also confounded by job and bmi.

poisson8d <- glm( chd ~ hieng + agespan + offset( log( risk_time) ),
                 family=poisson,
                 data=diet.spl.dob)

summary(poisson8d)
## 
## Call:
## glm(formula = chd ~ hieng + agespan + offset(log(risk_time)), 
##     family = poisson, data = diet.spl.dob)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -4.7798     0.4320 -11.065   <2e-16 ***
## hienghigh       -0.6233     0.3026  -2.060   0.0394 *  
## agespan(50,60]   0.3025     0.4721   0.641   0.5217    
## agespan(60,72]   0.8451     0.4613   1.832   0.0670 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 321.05  on 728  degrees of freedom
## Residual deviance: 311.40  on 725  degrees of freedom
## AIC: 411.4
## 
## Number of Fisher Scoring iterations: 6
eform(poisson8d)
##                  exp(beta)       2.5 %     97.5 %
## (Intercept)    0.008397598 0.003601145 0.01958257
## hienghigh      0.536168916 0.296269765 0.97032212
## agespan(50,60] 1.353254649 0.536460711 3.41366685
## agespan(60,72] 2.328213936 0.942643672 5.75040208

Fitting the model for CHD, with adjustment for job and bmi.

poisson8d <- glm( chd ~ hieng + agespan + jobNumber + bmi + offset( log( risk_time) ),
                 family=poisson,
                 data=diet.spl.dob)

summary(poisson8d)
## 
## Call:
## glm(formula = chd ~ hieng + agespan + jobNumber + bmi + offset(log(risk_time)), 
##     family = poisson, data = diet.spl.dob)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -6.79183    1.31326  -5.172 2.32e-07 ***
## hienghigh      -0.71303    0.31387  -2.272   0.0231 *  
## agespan(50,60]  0.53692    0.50868   1.056   0.2912    
## agespan(60,72]  1.07421    0.49668   2.163   0.0306 *  
## jobNumber1      0.43510    0.40670   1.070   0.2847    
## jobNumber2     -0.13791    0.37184  -0.371   0.7107    
## bmi             0.07388    0.04852   1.523   0.1278    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 313.24  on 718  degrees of freedom
## Residual deviance: 298.77  on 712  degrees of freedom
##   (10 observations deleted due to missingness)
## AIC: 402.77
## 
## Number of Fisher Scoring iterations: 6
eform(poisson8d)
##                  exp(beta)        2.5 %     97.5 %
## (Intercept)    0.001122909 8.560401e-05 0.01472974
## hienghigh      0.490157708 2.649506e-01 0.90678996
## agespan(50,60] 1.710737260 6.312339e-01 4.63635138
## agespan(60,72] 2.927691811 1.105992e+00 7.74994872
## jobNumber1     1.545112410 6.962690e-01 3.42880771
## jobNumber2     0.871175509 4.203362e-01 1.80557060
## bmi            1.076677746 9.790156e-01 1.18408219

(e)

time.cuts <- c(0, 5, 10, 15, 22)
diet.spl.t_entry <- survSplit(Surv((dox-doe)/365.24, chd) ~ ., data=diet,
                              cut=time.cuts, end="tstop", start="tstart", event="chd")

##Tabulate ageband
diet.spl.t_entry |> subset(id<=3, select=c(id, tstart, tstop, y))
##     id tstart    tstop        y
## 93   2      0  5.00000 11.95893
## 94   2      5 10.00000 11.95893
## 95   2     10 11.95926 11.95893
## 468  3      0  5.00000 11.04175
## 469  3      5 10.00000 11.04175
## 470  3     10 11.04205 11.04175
## 687  1      0  5.00000 12.29295
## 688  1      5 10.00000 12.29295
## 689  1     10 12.29329 12.29295
diet.spl.t_entry <-
    transform(diet.spl.t_entry,
              fu = cut(tstop, time.cuts),
              risk_time = (tstop-tstart))

##Tabulate ageband including risk_time
diet.spl.t_entry |> subset(id<=3, select=c(id, fu, tstart, y, risk_time))
##     id      fu tstart        y risk_time
## 93   2   (0,5]      0 11.95893  5.000000
## 94   2  (5,10]      5 11.95893  5.000000
## 95   2 (10,15]     10 11.95893  1.959260
## 468  3   (0,5]      0 11.04175  5.000000
## 469  3  (5,10]      5 11.04175  5.000000
## 470  3 (10,15]     10 11.04175  1.042055
## 687  1   (0,5]      0 12.29295  5.000000
## 688  1  (5,10]      5 12.29295  5.000000
## 689  1 (10,15]     10 12.29295  2.293287
poisson8e1 <- glm(chd ~ fu + hieng + offset(log(risk_time)),
                 family=poisson,
                 data=diet.spl.t_entry)
summary(poisson8e1)
## 
## Call:
## glm(formula = chd ~ fu + hieng + offset(log(risk_time)), family = poisson, 
##     data = diet.spl.t_entry)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.25958    0.26933 -15.815   <2e-16 ***
## fu(5,10]    -0.23369    0.37705  -0.620   0.5354    
## fu(10,15]    0.12151    0.36841   0.330   0.7415    
## fu(15,22]   -0.05012    0.55562  -0.090   0.9281    
## hienghigh   -0.64923    0.30212  -2.149   0.0316 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 391.17  on 1099  degrees of freedom
## Residual deviance: 385.52  on 1095  degrees of freedom
## AIC: 487.52
## 
## Number of Fisher Scoring iterations: 6
eform(poisson8e1)
##              exp(beta)       2.5 %     97.5 %
## (Intercept) 0.01412826 0.008333566 0.02395225
## fu(5,10]    0.79160514 0.378061629 1.65750408
## fu(10,15]   1.12919965 0.548506063 2.32466319
## fu(15,22]   0.95111381 0.320099619 2.82604985
## hienghigh   0.52244901 0.288986787 0.94451712

There seems to be no confounding by time-since-entry, but there is confounding by BMI and job.

poisson8e2 <- glm(chd ~ fu + hieng + job + bmi + offset(log(risk_time)),
                 family=poisson,
                 data=diet.spl.t_entry)
summary(poisson8e2)
## 
## Call:
## glm(formula = chd ~ fu + hieng + job + bmi + offset(log(risk_time)), 
##     family = poisson, data = diet.spl.t_entry)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -6.02331    1.26284  -4.770 1.85e-06 ***
## fu(5,10]     -0.16826    0.38194  -0.441   0.6595    
## fu(10,15]     0.21932    0.37484   0.585   0.5585    
## fu(15,22]     0.13312    0.56448   0.236   0.8136    
## hienghigh    -0.71425    0.31171  -2.291   0.0219 *  
## jobconductor  0.46008    0.40646   1.132   0.2577    
## jobbank      -0.13790    0.37261  -0.370   0.7113    
## bmi           0.06876    0.04872   1.411   0.1581    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 383.36  on 1083  degrees of freedom
## Residual deviance: 374.22  on 1076  degrees of freedom
##   (16 observations deleted due to missingness)
## AIC: 480.22
## 
## Number of Fisher Scoring iterations: 6
eform(poisson8e2)
##                exp(beta)        2.5 %     97.5 %
## (Intercept)  0.002421633 0.0002037867 0.02877669
## fu(5,10]     0.845132733 0.3997784786 1.78661277
## fu(10,15]    1.245226372 0.5972864235 2.59605552
## fu(15,22]    1.142385874 0.3778543787 3.45383184
## hienghigh    0.489559573 0.2657536762 0.90184482
## jobconductor 1.584205119 0.7142218469 3.51390240
## jobbank      0.871181912 0.4197020639 1.80832545
## bmi          1.071175374 0.9736261715 1.17849819