Exercise 8. Diet data: Using Poisson regression to study the effect of energy intake adjusting for confounders on two different timescales

Use Poisson regression to study the association between energy intake (hieng) and CHD adjusted for potential confounders (job, BMI). We know that people who expend a lot of energy (i.e., are physically active) require a higher energy intake. We do not have data on physical activity but we are hoping that occupation (job) will serve as a surrogate measure of work-time physical activity (conductors on London double-decker busses expend energy walking up and down the stairs all day).

Fit models both without adjusting for ‘time’ and by adjusting for attained age (you will need to split the data) and time-since-entry and compare the results.


Load the diet data using time-on-study as the timescale.

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

Load diet data and explore it.

The rates plotted on timescale attained age show a clear increasing trend as age increases, which is to be expected (older persons are more likely to suffer from CHD). The rates plotted on timescale time-since-entry are almost constant (if you have some imagination you can see that the rates are flat).

(a)

## Iterations: relative error in phi-hat = 1e-04 
## phi= 1.779334   sv2= 0.03760323   df= 6.726742   lambda= 47.31865 
## phi= 1.925449   sv2= 0.01259691   df= 4.929914   lambda= 152.8508 
## phi= 1.961624   sv2= 0.005470211   df= 3.958756   lambda= 358.6012 
## phi= 1.947998   sv2= 0.003330537   df= 3.410114   lambda= 584.8903 
## phi= 1.926782   sv2= 0.002698311   df= 3.147468   lambda= 714.0697 
## phi= 1.916927   sv2= 0.002494673   df= 3.04987   lambda= 768.4084 
## phi= 1.913296   sv2= 0.002424179   df= 3.015274   lambda= 789.2554 
## phi= 1.911978   sv2= 0.002398892   df= 3.002812   lambda= 797.0257 
## phi= 1.911498   sv2= 0.00238969   df= 2.998273   lambda= 799.8934 
## phi= 1.911322   sv2= 0.002386324   df= 2.996612   lambda= 800.9481 
## phi= 1.911257   sv2= 0.00238509   df= 2.996003   lambda= 801.3354 
## phi= 1.911233   sv2= 0.002384637   df= 2.995779   lambda= 801.4776 
## phi= 1.911225   sv2= 0.002384471   df= 2.995697   lambda= 801.5298
## Iterations: relative error in phi-hat = 1e-04 
## phi= 2.153541   sv2= 0.02593808   df= 6.269186   lambda= 83.02622 
## phi= 2.409863   sv2= 0.006238357   df= 4.163755   lambda= 386.2978 
## phi= 2.698454   sv2= 0.00171257   df= 3.170399   lambda= 1575.675 
## phi= 2.91647   sv2= 0.0005349335   df= 2.553115   lambda= 5452.023 
## phi= 3.040772   sv2= 0.0001886201   df= 2.220458   lambda= 16121.14

## Iterations: relative error in phi-hat = 1e-04 
## phi= 2.235959   sv2= 0.07486988   df= 6.59953   lambda= 29.86459 
## phi= 2.417459   sv2= 0.02283781   df= 5.288999   lambda= 105.8534 
## phi= 2.562882   sv2= 0.004970585   df= 4.113921   lambda= 515.6097 
## phi= 2.775805   sv2= 0.0009967037   df= 3.051224   lambda= 2784.986 
## phi= 3.104534   sv2= 0.0002592808   df= 2.34643   lambda= 11973.64
## Iterations: relative error in phi-hat = 1e-04 
## phi= 0.9665759   sv2= 0.02663575   df= 5.684564   lambda= 36.28866 
## phi= 1.016078   sv2= 0.006372641   df= 4.404778   lambda= 159.4438 
## phi= 1.053943   sv2= 0.002188221   df= 3.358888   lambda= 481.644 
## phi= 1.09428   sv2= 0.001338729   df= 2.80305   lambda= 817.4017 
## phi= 1.119151   sv2= 0.001015902   df= 2.592819   lambda= 1101.633 
## phi= 1.133828   sv2= 0.0008474499   df= 2.49013   lambda= 1337.929 
## phi= 1.143282   sv2= 0.00074534   df= 2.429672   lambda= 1533.906 
## phi= 1.149776   sv2= 0.0006777671   df= 2.390227   lambda= 1696.418 
## phi= 1.154442   sv2= 0.0006304254   df= 2.362804   lambda= 1831.212 
## phi= 1.157905   sv2= 0.0005959215   df= 2.342908   lambda= 1943.05 
## phi= 1.160537   sv2= 0.0005700457   df= 2.328029   lambda= 2035.867 
## phi= 1.162574   sv2= 0.0005502227   df= 2.316652   lambda= 2112.915 
## phi= 1.164172   sv2= 0.0005347879   df= 2.307805   lambda= 2176.886 
## phi= 1.16544   sv2= 0.0005226174   df= 2.300835   lambda= 2230.007 
## phi= 1.166455   sv2= 0.0005129252   df= 2.295288   lambda= 2274.123 
## phi= 1.167273   sv2= 0.0005051456   df= 2.290838   lambda= 2310.765 
## phi= 1.167936   sv2= 0.0004988616   df= 2.287245   lambda= 2341.202 
## phi= 1.168475   sv2= 0.0004937596   df= 2.284328   lambda= 2366.485 
## phi= 1.168916   sv2= 0.0004896003   df= 2.28195   lambda= 2387.49 
## phi= 1.169277   sv2= 0.0004861979   df= 2.280006   lambda= 2404.94 
## phi= 1.169573   sv2= 0.0004834071   df= 2.278411   lambda= 2419.437 
## phi= 1.169817   sv2= 0.0004811128   df= 2.277101   lambda= 2431.482 
## phi= 1.170018   sv2= 0.0004792231   df= 2.276021   lambda= 2441.49 
## phi= 1.170185   sv2= 0.0004776643   df= 2.27513   lambda= 2449.805 
## phi= 1.170322   sv2= 0.0004763768   df= 2.274395   lambda= 2456.715 
## phi= 1.170435   sv2= 0.0004753124   df= 2.273787   lambda= 2462.455 
## phi= 1.170529   sv2= 0.0004744315   df= 2.273284   lambda= 2467.225 
## phi= 1.170607   sv2= 0.0004737021   df= 2.272867   lambda= 2471.189 
## phi= 1.170672   sv2= 0.0004730977   df= 2.272522   lambda= 2474.482 
## phi= 1.170725   sv2= 0.0004725967   df= 2.272236   lambda= 2477.219 
## phi= 1.17077   sv2= 0.0004721812   df= 2.271999   lambda= 2479.493 
## phi= 1.170807   sv2= 0.0004718365   df= 2.271802   lambda= 2481.382 
## phi= 1.170837   sv2= 0.0004715505   df= 2.271638   lambda= 2482.952 
## phi= 1.170863   sv2= 0.0004713131   df= 2.271503   lambda= 2484.257 
## phi= 1.170884   sv2= 0.000471116   df= 2.27139   lambda= 2485.341 
## phi= 1.170901   sv2= 0.0004709523   df= 2.271297   lambda= 2486.242 
## phi= 1.170916   sv2= 0.0004708164   df= 2.271219   lambda= 2486.99 
## phi= 1.170928   sv2= 0.0004707035   df= 2.271155   lambda= 2487.612 
## phi= 1.170938   sv2= 0.0004706098   df= 2.271101   lambda= 2488.129 
## phi= 1.170946   sv2= 0.0004705319   df= 2.271057   lambda= 2488.559 
## phi= 1.170953   sv2= 0.0004704672   df= 2.27102   lambda= 2488.915 
## phi= 1.170959   sv2= 0.0004704135   df= 2.270989   lambda= 2489.212 
## phi= 1.170964   sv2= 0.0004703688   df= 2.270963   lambda= 2489.458

(b)

Patients with high energy intake have 48% less CHD rate. The underlying shape of the rates is assumed to be constant (i.e. the baseline is flat) over time.

## 
## Call:
## glm(formula = chd ~ hieng + offset(log(y1k)), family = poisson, 
##     data = diet)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7382  -0.6337  -0.4899  -0.3891   3.0161  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.6098     0.1890  13.811   <2e-16 ***
## hienghigh    -0.6532     0.3021  -2.162   0.0306 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 262.82  on 336  degrees of freedom
## Residual deviance: 258.00  on 335  degrees of freedom
## AIC: 354
## 
## Number of Fisher Scoring iterations: 6
##              exp(beta)     2.5 %     97.5 %
## (Intercept) 13.5959916 9.3877130 19.6907371
## hienghigh    0.5203599 0.2878432  0.9407012

(c)

The effect of high energy intake is slightly confounded by bmi and job, since the point estimate changes a little.

## [1] "driver"    "conductor" "bank"
## 
## Call:
## glm(formula = chd ~ hieng + jobNumber + bmi + offset(log(y1k)), 
##     family = poisson, data = diet)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.8376  -0.6007  -0.4815  -0.3919   3.0614  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  1.49616    1.18209   1.266   0.2056  
## hienghigh   -0.69995    0.30986  -2.259   0.0239 *
## jobNumber   -0.08706    0.17170  -0.507   0.6121  
## bmi          0.05091    0.04757   1.070   0.2845  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 255.01  on 331  degrees of freedom
## Residual deviance: 249.03  on 328  degrees of freedom
##   (5 observations deleted due to missingness)
## AIC: 347.03
## 
## Number of Fisher Scoring iterations: 6
##             exp(beta)     2.5 %     97.5 %
## (Intercept) 4.4644913 0.4401212 45.2868008
## hienghigh   0.4966098 0.2705588  0.9115256
## jobNumber   0.9166234 0.6546985  1.2833364
## bmi         1.0522316 0.9585556  1.1550622

(d)

The y variable is not correct since it is kept for all splitted records, and contains the complete follow-up rather than the risktime in that specific timeband. The risktime variable contains the correct amount of risktime for each timeband. The event variable chd is not correct since it is kept constant for all splitted records, while it should only be 1 for the last record (if the person has the event). For all other records (timebands) for that person it should be 0.

##   id   tstart    tstop        y
## 1  1 49.61669 50.00000 12.29295
## 2  1 50.00000 60.00000 12.29295
## 3  1 60.00000 61.90998 12.29295
## 4  2 50.53937 60.00000 11.95893
## 5  2 60.00000 62.49863 11.95893
## 6  3 58.78600 60.00000 11.04175
## 7  3 60.00000 69.82806 11.04175
##   id   tstart    tstop        y  risk_time
## 1  1 49.61669 50.00000 12.29295  0.3833096
## 2  1 50.00000 60.00000 12.29295 10.0000000
## 3  1 60.00000 61.90998 12.29295  1.9099770
## 4  2 50.53937 60.00000 11.95893  9.4606286
## 5  2 60.00000 62.49863 11.95893  2.4986310
## 6  3 58.78600 60.00000 11.04175  1.2139963
## 7  3 60.00000 69.82806 11.04175  9.8280583
##          chd
## agespan     0   1
##   (30,50] 190   6
##   (50,60] 275  18
##   (60,72] 218  22

The effect of high energy intake is somewhat confounded by age, but also confounded by job and bmi.

## 
## Call:
## glm(formula = chd ~ hieng + agespan + offset(log(risk_time)), 
##     family = poisson, data = diet.spl.dob)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6253  -0.4312  -0.3246  -0.2050   2.8938  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -4.7798     0.4320 -11.065   <2e-16 ***
## hienghigh       -0.6233     0.3026  -2.060   0.0394 *  
## agespan(50,60]   0.3025     0.4721   0.641   0.5217    
## agespan(60,72]   0.8451     0.4613   1.832   0.0670 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 321.05  on 728  degrees of freedom
## Residual deviance: 311.40  on 725  degrees of freedom
## AIC: 411.4
## 
## Number of Fisher Scoring iterations: 6
##                  exp(beta)       2.5 %     97.5 %
## (Intercept)    0.008397598 0.003601145 0.01958257
## hienghigh      0.536168916 0.296269765 0.97032212
## agespan(50,60] 1.353254649 0.536460711 3.41366685
## agespan(60,72] 2.328213936 0.942643672 5.75040208

Fitting the model for CHD, with adjustment for job and bmi.

## 
## Call:
## glm(formula = chd ~ hieng + agespan + jobNumber + bmi + offset(log(risk_time)), 
##     family = poisson, data = diet.spl.dob)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7714  -0.4027  -0.3083  -0.1940   3.0174  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -6.79183    1.31326  -5.172 2.32e-07 ***
## hienghigh      -0.71303    0.31387  -2.272   0.0231 *  
## agespan(50,60]  0.53692    0.50868   1.056   0.2912    
## agespan(60,72]  1.07421    0.49668   2.163   0.0306 *  
## jobNumber1      0.43510    0.40670   1.070   0.2847    
## jobNumber2     -0.13791    0.37184  -0.371   0.7107    
## bmi             0.07388    0.04852   1.523   0.1278    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 313.24  on 718  degrees of freedom
## Residual deviance: 298.77  on 712  degrees of freedom
##   (10 observations deleted due to missingness)
## AIC: 402.77
## 
## Number of Fisher Scoring iterations: 6
##                  exp(beta)        2.5 %     97.5 %
## (Intercept)    0.001122909 8.560401e-05 0.01472974
## hienghigh      0.490157708 2.649506e-01 0.90678996
## agespan(50,60] 1.710737260 6.312339e-01 4.63635138
## agespan(60,72] 2.927691811 1.105992e+00 7.74994872
## jobNumber1     1.545112410 6.962690e-01 3.42880771
## jobNumber2     0.871175509 4.203362e-01 1.80557060
## bmi            1.076677746 9.790156e-01 1.18408219

(e)

##   id tstart    tstop        y
## 1  1      0  5.00000 12.29295
## 2  1      5 10.00000 12.29295
## 3  1     10 12.29329 12.29295
## 4  2      0  5.00000 11.95893
## 5  2      5 10.00000 11.95893
## 6  2     10 11.95926 11.95893
## 7  3      0  5.00000 11.04175
## 8  3      5 10.00000 11.04175
## 9  3     10 11.04205 11.04175
##   id      fu tstart        y risk_time
## 1  1   (0,5]      0 12.29295  5.000000
## 2  1  (5,10]      5 12.29295  5.000000
## 3  1 (10,15]     10 12.29295  2.293287
## 4  2   (0,5]      0 11.95893  5.000000
## 5  2  (5,10]      5 11.95893  5.000000
## 6  2 (10,15]     10 11.95893  1.959260
## 7  3   (0,5]      0 11.04175  5.000000
## 8  3  (5,10]      5 11.04175  5.000000
## 9  3 (10,15]     10 11.04175  1.042055
## 
## Call:
## glm(formula = chd ~ fu + hieng + offset(log(risk_time)), family = poisson, 
##     data = diet.spl.t_entry)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.3994  -0.3344  -0.2717  -0.2417   4.2937  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.25958    0.26933 -15.815   <2e-16 ***
## fu(5,10]    -0.23369    0.37705  -0.620   0.5354    
## fu(10,15]    0.12151    0.36841   0.330   0.7415    
## fu(15,22]   -0.05012    0.55562  -0.090   0.9281    
## hienghigh   -0.64923    0.30212  -2.149   0.0316 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 391.17  on 1099  degrees of freedom
## Residual deviance: 385.52  on 1095  degrees of freedom
## AIC: 487.52
## 
## Number of Fisher Scoring iterations: 6
##              exp(beta)       2.5 %     97.5 %
## (Intercept) 0.01412826 0.008333566 0.02395225
## fu(5,10]    0.79160514 0.378061629 1.65750408
## fu(10,15]   1.12919965 0.548506063 2.32466319
## fu(15,22]   0.95111381 0.320099619 2.82604985
## hienghigh   0.52244901 0.288986787 0.94451712

There seems to be no confounding by time-since-entry, but there is confounding by bmi and job.