Biostat III exercises in R

Laboratory exercise 13

Suggested solutions by

Author: Johan Zetterqvist, 2015-03-04

Modelling the diet data using Cox regression


Now fit a model to the localised melanoma data where the outcome is observed survival (i.e. all deaths are considered to be events).

You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

require(foreign)  # for reading data set from Stata
require(survival) # for Surv and survfit

Load the melanoma data and explore it.

diet <- data.frame(read.dta("http://biostat3.net/download/diet.dta"))

(a) Fit the following Poisson regression model to the diet data (we fitted this same model in question 6). Now fit the following Cox model.

  1. On what scale are we measuring ‘time’ ? That is, what is the timescale?
  2. Is it correct to say that both of these models estimate the effect of high energy on CHD without controlling for any potential confounders? If not, how are these models conceptually different?
  3. Would you expect the parameter estimates for these two models to be very different? Is there a large difference?
## y is the observed time
## so it already measures time since entry
poisson13a <- glm( chd ~ hieng + offset( log( y ) ), family=poisson, data=diet)
summary(poisson13a)
## 
## Call:
## glm(formula = chd ~ hieng + offset(log(y)), family = poisson, 
##     data = diet)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7382  -0.6337  -0.4899  -0.3891   3.0161  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -4.2980     0.1890 -22.744   <2e-16 ***
## hienghigh    -0.6532     0.3021  -2.162   0.0306 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 262.82  on 336  degrees of freedom
## Residual deviance: 258.00  on 335  degrees of freedom
## AIC: 354
## 
## Number of Fisher Scoring iterations: 6
exp(cbind(coef(poisson13a), confint(poisson13a)))
##                              2.5 %     97.5 %
## (Intercept) 0.01359599 0.009161455 0.01927158
## hienghigh   0.52035993 0.282917136 0.93283916
cox13a <- coxph(Surv(y, chd) ~ hieng, data=diet, ties="breslow")
summary(cox13a)
## Call:
## coxph(formula = Surv(y, chd) ~ hieng, data = diet, ties = "breslow")
## 
##   n= 337, number of events= 46 
## 
##              coef exp(coef) se(coef)      z Pr(>|z|)  
## hienghigh -0.6475    0.5234   0.3022 -2.143   0.0321 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## hienghigh    0.5234      1.911    0.2895    0.9462
## 
## Concordance= 0.581  (se = 0.038 )
## Rsquare= 0.014   (max possible= 0.781 )
## Likelihood ratio test= 4.73  on 1 df,   p=0.02961
## Wald test            = 4.59  on 1 df,   p=0.03213
## Score (logrank) test = 4.75  on 1 df,   p=0.02923

(b) Use the attained age as the timescale and refit the Cox model. Is the estimate of the effect of high energy different? Would we expect it to be different?

poisson13a <- glm( chd ~ hieng + offset( log( y ) ), family=poisson, data=diet)
summary(poisson13a)
## 
## Call:
## glm(formula = chd ~ hieng + offset(log(y)), family = poisson, 
##     data = diet)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7382  -0.6337  -0.4899  -0.3891   3.0161  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -4.2980     0.1890 -22.744   <2e-16 ***
## hienghigh    -0.6532     0.3021  -2.162   0.0306 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 262.82  on 336  degrees of freedom
## Residual deviance: 258.00  on 335  degrees of freedom
## AIC: 354
## 
## Number of Fisher Scoring iterations: 6
exp(cbind(coef(poisson13a), confint(poisson13a)))
##                              2.5 %     97.5 %
## (Intercept) 0.01359599 0.009161455 0.01927158
## hienghigh   0.52035993 0.282917136 0.93283916
## Create two new variables for age
## age at study entry
diet$entry_age <-  as.numeric(diet$doe - diet$dob) / 365.24
## age at study exit
diet$exit_age <- as.numeric(diet$dox - diet$dob) / 365.24

## Use the new age variables to provide
## counting process data to coxph
cox13b <- coxph(Surv(entry_age, exit_age, chd) ~ hieng, data=diet, ties="breslow")
summary(cox13b)
## Call:
## coxph(formula = Surv(entry_age, exit_age, chd) ~ hieng, data = diet, 
##     ties = "breslow")
## 
##   n= 337, number of events= 46 
## 
##              coef exp(coef) se(coef)      z Pr(>|z|)  
## hienghigh -0.6113    0.5426   0.3028 -2.019   0.0435 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## hienghigh    0.5426      1.843    0.2998    0.9823
## 
## Concordance= 0.584  (se = 0.038 )
## Rsquare= 0.012   (max possible= 0.755 )
## Likelihood ratio test= 4.2  on 1 df,   p=0.04049
## Wald test            = 4.08  on 1 df,   p=0.04349
## Score (logrank) test = 4.2  on 1 df,   p=0.04035