Exercise 8. Diet data: Using Poisson regression to study the effect of energy intake adjusting for confounders on two different timescales
Use Poisson regression to study the association between energy intake (hieng) and CHD adjusted for potential confounders (job, BMI). We know that people who expend a lot of energy (i.e., are physically active) require a higher energy intake. We do not have data on physical activity but we are hoping that occupation (job) will serve as a surrogate measure of work-time physical activity (conductors on London double-decker busses expend energy walking up and down the stairs all day).
Fit models both without adjusting for ‘time’ and by adjusting for attained age (you will need to split the data) and time-since-entry and compare the results.
Load the diet data using time-on-study as the timescale.
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
Load diet data and explore it.
The rates plotted on timescale attained age show a clear increasing trend as age increases, which is to be expected (older persons are more likely to suffer from CHD). The rates plotted on timescale time-since-entry are almost constant (if you have some imagination you can see that the rates are flat).
(a)
scale <- 365.24
plot(bshazard(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ 1, data=subset(diet,hieng=="low")),
ylim=c(0,0.03), conf.int=FALSE, xlab="Attained age (years)")
## Iterations: relative error in phi-hat = 1e-04
## phi= 1.779334 sv2= 0.03760323 df= 6.726742 lambda= 47.31865
## phi= 1.925449 sv2= 0.01259691 df= 4.929914 lambda= 152.8508
## phi= 1.961624 sv2= 0.005470211 df= 3.958756 lambda= 358.6012
## phi= 1.947998 sv2= 0.003330537 df= 3.410114 lambda= 584.8903
## phi= 1.926782 sv2= 0.002698311 df= 3.147468 lambda= 714.0697
## phi= 1.916927 sv2= 0.002494673 df= 3.04987 lambda= 768.4084
## phi= 1.913296 sv2= 0.002424179 df= 3.015274 lambda= 789.2554
## phi= 1.911978 sv2= 0.002398892 df= 3.002812 lambda= 797.0257
## phi= 1.911498 sv2= 0.00238969 df= 2.998273 lambda= 799.8934
## phi= 1.911322 sv2= 0.002386324 df= 2.996612 lambda= 800.9481
## phi= 1.911257 sv2= 0.00238509 df= 2.996003 lambda= 801.3354
## phi= 1.911233 sv2= 0.002384637 df= 2.995779 lambda= 801.4776
## phi= 1.911225 sv2= 0.002384471 df= 2.995697 lambda= 801.5298
lines(bshazard(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ 1, data=subset(diet,hieng=="high")),
col="red", conf.int=FALSE)
## Iterations: relative error in phi-hat = 1e-04
## phi= 2.153541 sv2= 0.02593808 df= 6.269186 lambda= 83.02622
## phi= 2.409863 sv2= 0.006238357 df= 4.163755 lambda= 386.2978
## phi= 2.698454 sv2= 0.00171257 df= 3.170399 lambda= 1575.675
## phi= 2.91647 sv2= 0.0005349335 df= 2.553115 lambda= 5452.023
## phi= 3.040772 sv2= 0.0001886201 df= 2.220458 lambda= 16121.14
scale <- 365.24
plot(muhaz2(Surv((dox - doe)/scale, chd) ~ hieng, data=diet), lty=2,
xlab="Time since study entry (years)", ylim=c(0,0.025),
legend=FALSE)
lines(bshazard(Surv((dox - doe)/scale, chd) ~ 1, data=subset(diet,hieng=="low")),
conf.int=FALSE)
## Iterations: relative error in phi-hat = 1e-04
## phi= 2.235959 sv2= 0.07486988 df= 6.59953 lambda= 29.86459
## phi= 2.417459 sv2= 0.02283781 df= 5.288999 lambda= 105.8534
## phi= 2.562882 sv2= 0.004970585 df= 4.113921 lambda= 515.6097
## phi= 2.775805 sv2= 0.0009967037 df= 3.051224 lambda= 2784.986
## phi= 3.104534 sv2= 0.0002592808 df= 2.34643 lambda= 11973.64
lines(bshazard(Surv((dox - doe)/scale, chd) ~ 1, data=subset(diet,hieng=="high")),
col="red", conf.int=FALSE)
## Iterations: relative error in phi-hat = 1e-04
## phi= 0.9665759 sv2= 0.02663575 df= 5.684564 lambda= 36.28866
## phi= 1.016078 sv2= 0.006372641 df= 4.404778 lambda= 159.4438
## phi= 1.053943 sv2= 0.002188221 df= 3.358888 lambda= 481.644
## phi= 1.09428 sv2= 0.001338729 df= 2.80305 lambda= 817.4017
## phi= 1.119151 sv2= 0.001015902 df= 2.592819 lambda= 1101.633
## phi= 1.133828 sv2= 0.0008474499 df= 2.49013 lambda= 1337.929
## phi= 1.143282 sv2= 0.00074534 df= 2.429672 lambda= 1533.906
## phi= 1.149776 sv2= 0.0006777671 df= 2.390227 lambda= 1696.418
## phi= 1.154442 sv2= 0.0006304254 df= 2.362804 lambda= 1831.212
## phi= 1.157905 sv2= 0.0005959215 df= 2.342908 lambda= 1943.05
## phi= 1.160537 sv2= 0.0005700457 df= 2.328029 lambda= 2035.867
## phi= 1.162574 sv2= 0.0005502227 df= 2.316652 lambda= 2112.915
## phi= 1.164172 sv2= 0.0005347879 df= 2.307805 lambda= 2176.886
## phi= 1.16544 sv2= 0.0005226174 df= 2.300835 lambda= 2230.007
## phi= 1.166455 sv2= 0.0005129252 df= 2.295288 lambda= 2274.123
## phi= 1.167273 sv2= 0.0005051456 df= 2.290838 lambda= 2310.765
## phi= 1.167936 sv2= 0.0004988616 df= 2.287245 lambda= 2341.202
## phi= 1.168475 sv2= 0.0004937596 df= 2.284328 lambda= 2366.485
## phi= 1.168916 sv2= 0.0004896003 df= 2.28195 lambda= 2387.49
## phi= 1.169277 sv2= 0.0004861979 df= 2.280006 lambda= 2404.94
## phi= 1.169573 sv2= 0.0004834071 df= 2.278411 lambda= 2419.437
## phi= 1.169817 sv2= 0.0004811128 df= 2.277101 lambda= 2431.482
## phi= 1.170018 sv2= 0.0004792231 df= 2.276021 lambda= 2441.49
## phi= 1.170185 sv2= 0.0004776643 df= 2.27513 lambda= 2449.805
## phi= 1.170322 sv2= 0.0004763768 df= 2.274395 lambda= 2456.715
## phi= 1.170435 sv2= 0.0004753124 df= 2.273787 lambda= 2462.455
## phi= 1.170529 sv2= 0.0004744315 df= 2.273284 lambda= 2467.225
## phi= 1.170607 sv2= 0.0004737021 df= 2.272867 lambda= 2471.189
## phi= 1.170672 sv2= 0.0004730977 df= 2.272522 lambda= 2474.482
## phi= 1.170725 sv2= 0.0004725967 df= 2.272236 lambda= 2477.219
## phi= 1.17077 sv2= 0.0004721812 df= 2.271999 lambda= 2479.493
## phi= 1.170807 sv2= 0.0004718365 df= 2.271802 lambda= 2481.382
## phi= 1.170837 sv2= 0.0004715505 df= 2.271638 lambda= 2482.952
## phi= 1.170863 sv2= 0.0004713131 df= 2.271503 lambda= 2484.257
## phi= 1.170884 sv2= 0.000471116 df= 2.27139 lambda= 2485.341
## phi= 1.170901 sv2= 0.0004709523 df= 2.271297 lambda= 2486.242
## phi= 1.170916 sv2= 0.0004708164 df= 2.271219 lambda= 2486.99
## phi= 1.170928 sv2= 0.0004707035 df= 2.271155 lambda= 2487.612
## phi= 1.170938 sv2= 0.0004706098 df= 2.271101 lambda= 2488.129
## phi= 1.170946 sv2= 0.0004705319 df= 2.271057 lambda= 2488.559
## phi= 1.170953 sv2= 0.0004704672 df= 2.27102 lambda= 2488.915
## phi= 1.170959 sv2= 0.0004704135 df= 2.270989 lambda= 2489.212
## phi= 1.170964 sv2= 0.0004703688 df= 2.270963 lambda= 2489.458
(b)
Patients with high energy intake have 48% less CHD rate. The underlying shape of the rates is assumed to be constant (i.e. the baseline is flat) over time.
diet <- mutate(diet, y1k = y / 1000)
poisson8b <- glm( chd ~ hieng + offset( log( y1k ) ), family=poisson, data=diet)
summary(poisson8b)
##
## Call:
## glm(formula = chd ~ hieng + offset(log(y1k)), family = poisson,
## data = diet)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7382 -0.6337 -0.4899 -0.3891 3.0161
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.6098 0.1890 13.811 <2e-16 ***
## hienghigh -0.6532 0.3021 -2.162 0.0306 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 262.82 on 336 degrees of freedom
## Residual deviance: 258.00 on 335 degrees of freedom
## AIC: 354
##
## Number of Fisher Scoring iterations: 6
## exp(beta) 2.5 % 97.5 %
## (Intercept) 13.5959916 9.3877130 19.6907371
## hienghigh 0.5203599 0.2878432 0.9407012
(c)
The effect of high energy intake is slightly confounded by bmi and job, since the point estimate changes a little.
## Create BMI variable
diet$bmi <- diet$weight/((diet$height/100)^2)
## Create orderly varable instead of categorical, start at zero
levels(diet$job)
## [1] "driver" "conductor" "bank"
diet <- mutate(diet, jobNumber = unclass(job) - 1)
poisson8c <- glm( chd ~ hieng + jobNumber + bmi + offset( log( y1k ) ), family=poisson, data=diet)
summary(poisson8c)
##
## Call:
## glm(formula = chd ~ hieng + jobNumber + bmi + offset(log(y1k)),
## family = poisson, data = diet)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.8376 -0.6007 -0.4815 -0.3919 3.0614
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.49616 1.18209 1.266 0.2056
## hienghigh -0.69995 0.30986 -2.259 0.0239 *
## jobNumber -0.08706 0.17170 -0.507 0.6121
## bmi 0.05091 0.04757 1.070 0.2845
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 255.01 on 331 degrees of freedom
## Residual deviance: 249.03 on 328 degrees of freedom
## (5 observations deleted due to missingness)
## AIC: 347.03
##
## Number of Fisher Scoring iterations: 6
## exp(beta) 2.5 % 97.5 %
## (Intercept) 4.4644913 0.4401212 45.2868008
## hienghigh 0.4966098 0.2705588 0.9115256
## jobNumber 0.9166234 0.6546985 1.2833364
## bmi 1.0522316 0.9585556 1.1550622
(d)
The y variable is not correct since it is kept for all splitted records, and contains the complete follow-up rather than the risktime in that specific timeband. The risktime variable contains the correct amount of risktime for each timeband. The event variable chd is not correct since it is kept constant for all splitted records, while it should only be 1 for the last record (if the person has the event). For all other records (timebands) for that person it should be 0.
## Split time at 30,50,60 and 72 with time scale age at entry to attained age
scale <- 365.24
age.cuts <- c(30,50,60,72)
diet.spl.dob <- survSplit(Surv((doe - dob)/scale, (dox - dob)/scale, chd) ~ ., data=diet, cut=age.cuts,start="tstart",end="tstop")
## Tabulate ageband
diet.spl.dob %>% select(id, tstart, tstop, y) %>% filter(id<=3) %>% arrange(id, tstart)
## id tstart tstop y
## 1 1 49.61669 50.00000 12.29295
## 2 1 50.00000 60.00000 12.29295
## 3 1 60.00000 61.90998 12.29295
## 4 2 50.53937 60.00000 11.95893
## 5 2 60.00000 62.49863 11.95893
## 6 3 58.78600 60.00000 11.04175
## 7 3 60.00000 69.82806 11.04175
## Create an agespan variable
diet.spl.dob <- mutate(diet.spl.dob,
agespan = cut(tstop, age.cuts))
## Make the numeric variables factors since we want to model them with dummie variables and calculate time at risk
diet.spl.dob <- mutate(diet.spl.dob,
jobNumber = as.factor(jobNumber),
risk_time = (tstop-tstart))
## Tabulate ageband including risk_time
diet.spl.dob %>% select(id, tstart, tstop, y,risk_time) %>% filter(id<=3) %>% arrange(id, tstart)
## id tstart tstop y risk_time
## 1 1 49.61669 50.00000 12.29295 0.3833096
## 2 1 50.00000 60.00000 12.29295 10.0000000
## 3 1 60.00000 61.90998 12.29295 1.9099770
## 4 2 50.53937 60.00000 11.95893 9.4606286
## 5 2 60.00000 62.49863 11.95893 2.4986310
## 6 3 58.78600 60.00000 11.04175 1.2139963
## 7 3 60.00000 69.82806 11.04175 9.8280583
## chd
## agespan 0 1
## (30,50] 190 6
## (50,60] 275 18
## (60,72] 218 22
The effect of high energy intake is somewhat confounded by age, but also confounded by job and bmi.
poisson8d <- glm( chd ~ hieng + agespan + offset( log( risk_time) ),
family=poisson,
data=diet.spl.dob)
summary(poisson8d)
##
## Call:
## glm(formula = chd ~ hieng + agespan + offset(log(risk_time)),
## family = poisson, data = diet.spl.dob)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6253 -0.4312 -0.3246 -0.2050 2.8938
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.7798 0.4320 -11.065 <2e-16 ***
## hienghigh -0.6233 0.3026 -2.060 0.0394 *
## agespan(50,60] 0.3025 0.4721 0.641 0.5217
## agespan(60,72] 0.8451 0.4613 1.832 0.0670 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 321.05 on 728 degrees of freedom
## Residual deviance: 311.40 on 725 degrees of freedom
## AIC: 411.4
##
## Number of Fisher Scoring iterations: 6
## exp(beta) 2.5 % 97.5 %
## (Intercept) 0.008397598 0.003601145 0.01958257
## hienghigh 0.536168916 0.296269765 0.97032212
## agespan(50,60] 1.353254649 0.536460711 3.41366685
## agespan(60,72] 2.328213936 0.942643672 5.75040208
Fitting the model for CHD, with adjustment for job and bmi.
poisson8d <- glm( chd ~ hieng + agespan + jobNumber + bmi + offset( log( risk_time) ),
family=poisson,
data=diet.spl.dob)
summary(poisson8d)
##
## Call:
## glm(formula = chd ~ hieng + agespan + jobNumber + bmi + offset(log(risk_time)),
## family = poisson, data = diet.spl.dob)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.7714 -0.4027 -0.3083 -0.1940 3.0174
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.79183 1.31326 -5.172 2.32e-07 ***
## hienghigh -0.71303 0.31387 -2.272 0.0231 *
## agespan(50,60] 0.53692 0.50868 1.056 0.2912
## agespan(60,72] 1.07421 0.49668 2.163 0.0306 *
## jobNumber1 0.43510 0.40670 1.070 0.2847
## jobNumber2 -0.13791 0.37184 -0.371 0.7107
## bmi 0.07388 0.04852 1.523 0.1278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 313.24 on 718 degrees of freedom
## Residual deviance: 298.77 on 712 degrees of freedom
## (10 observations deleted due to missingness)
## AIC: 402.77
##
## Number of Fisher Scoring iterations: 6
## exp(beta) 2.5 % 97.5 %
## (Intercept) 0.001122909 8.560401e-05 0.01472974
## hienghigh 0.490157708 2.649506e-01 0.90678996
## agespan(50,60] 1.710737260 6.312339e-01 4.63635138
## agespan(60,72] 2.927691811 1.105992e+00 7.74994872
## jobNumber1 1.545112410 6.962690e-01 3.42880771
## jobNumber2 0.871175509 4.203362e-01 1.80557060
## bmi 1.076677746 9.790156e-01 1.18408219
(e)
time.cuts <- c(0, 5, 10, 15, 22)
diet.spl.t_entry <- survSplit(Surv((dox-doe)/365.24, chd) ~ ., data=diet, cut=time.cuts, end="tstop", start="tstart", event="chd")
##Tabulate ageband
diet.spl.t_entry %>% select(id, tstart, tstop, y) %>% filter(id<=3) %>% arrange(id, tstart)
## id tstart tstop y
## 1 1 0 5.00000 12.29295
## 2 1 5 10.00000 12.29295
## 3 1 10 12.29329 12.29295
## 4 2 0 5.00000 11.95893
## 5 2 5 10.00000 11.95893
## 6 2 10 11.95926 11.95893
## 7 3 0 5.00000 11.04175
## 8 3 5 10.00000 11.04175
## 9 3 10 11.04205 11.04175
diet.spl.t_entry <- mutate(diet.spl.t_entry,
fu = cut(tstop, time.cuts),
risk_time = (tstop-tstart))
##Tabulate ageband including risk_time
diet.spl.t_entry %>% select(id, fu, tstart, y, risk_time) %>% filter(id<=3) %>% arrange(id, tstart)
## id fu tstart y risk_time
## 1 1 (0,5] 0 12.29295 5.000000
## 2 1 (5,10] 5 12.29295 5.000000
## 3 1 (10,15] 10 12.29295 2.293287
## 4 2 (0,5] 0 11.95893 5.000000
## 5 2 (5,10] 5 11.95893 5.000000
## 6 2 (10,15] 10 11.95893 1.959260
## 7 3 (0,5] 0 11.04175 5.000000
## 8 3 (5,10] 5 11.04175 5.000000
## 9 3 (10,15] 10 11.04175 1.042055
poisson8e1 <- glm( chd ~ fu + hieng + offset( log( risk_time) ),
family=poisson,
data=diet.spl.t_entry )
summary(poisson8e1)
##
## Call:
## glm(formula = chd ~ fu + hieng + offset(log(risk_time)), family = poisson,
## data = diet.spl.t_entry)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3994 -0.3344 -0.2717 -0.2417 4.2937
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.25958 0.26933 -15.815 <2e-16 ***
## fu(5,10] -0.23369 0.37705 -0.620 0.5354
## fu(10,15] 0.12151 0.36841 0.330 0.7415
## fu(15,22] -0.05012 0.55562 -0.090 0.9281
## hienghigh -0.64923 0.30212 -2.149 0.0316 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 391.17 on 1099 degrees of freedom
## Residual deviance: 385.52 on 1095 degrees of freedom
## AIC: 487.52
##
## Number of Fisher Scoring iterations: 6
## exp(beta) 2.5 % 97.5 %
## (Intercept) 0.01412826 0.008333566 0.02395225
## fu(5,10] 0.79160514 0.378061629 1.65750408
## fu(10,15] 1.12919965 0.548506063 2.32466319
## fu(15,22] 0.95111381 0.320099619 2.82604985
## hienghigh 0.52244901 0.288986787 0.94451712
There seems to be no confounding by time-since-entry, but there is confounding by bmi and job.