BDS: Biostatistics 1

Code by Thorgerdur Palsdottir (modified by Mark Clements)

Exercise: Prediction

Begin by loading all the packages you need at the top. Note that WebR does not currently support the nlpred and glmtoolbox packages. We have also removed the dependency on tidyverse.

Read in the original data and define all derived variables here. Note that we have used base::within() to calculate the variables.
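
As a minimal sketch of how base::within() can define derived variables, the example below uses a small made-up data frame. The column names (age, weight, height, bmi, agegroup) and the cut points are assumptions for illustration only, not the variables of the course data.

```r
# Hypothetical toy data; names and values are illustrative only.
dat <- data.frame(
  age    = c(41, 52, 47, 58, 39),
  weight = c(70, 82, 95, 64, 77),           # kg
  height = c(1.75, 1.80, 1.68, 1.60, 1.82)  # m
)

# within() evaluates the expressions inside the data frame and returns a
# modified copy, so derived variables can refer to existing columns by name.
dat <- within(dat, {
  bmi      <- weight / height^2
  agegroup <- cut(age, breaks = c(35, 45, 55, 65), right = FALSE)
})

str(dat)
```

Keeping all derived variables in one within() call makes it easy to see at a glance how each one was constructed.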

Create the analysis set only including the variables that you will use.

Create a complete-case version.
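
The two steps above (restricting to the analysis variables, then dropping incomplete rows) might look as follows in base R. The data frame and the variable names are hypothetical.

```r
# Hypothetical toy data with some missing cholesterol values.
dat <- data.frame(
  id    = 1:6,
  chd   = c(0, 1, 0, 0, 1, 0),
  age   = c(41, 52, 47, 58, 39, 50),
  chol  = c(210, NA, 250, 190, NA, 230),
  notes = letters[1:6]
)

# Analysis set: keep only the variables used in the models.
vars     <- c("chd", "age", "chol")
analysis <- dat[, vars]

# Complete-case version: drop every row with at least one missing value.
cc <- analysis[complete.cases(analysis), ]

colSums(is.na(analysis))  # number of missing values per variable
nrow(cc)                  # rows remaining after complete-case restriction
```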

Extra exercise: Impute the missing data.

Since only a small fraction of the data is missing here, we will use a method called predictive mean matching and create a single imputed dataset (m=1). This method is not covered in the lectures. This step is slow :(

Here you can see how the missing values for cholesterol were replaced.
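
To make the idea of predictive mean matching concrete, here is a deliberately simplified base R sketch on simulated data. It is not the imputation routine used in this exercise: a full implementation also draws the regression coefficients from their posterior distribution, which this sketch omits. The variable names and the choice of five donors are assumptions.

```r
set.seed(1)
n    <- 200
age  <- round(runif(n, 40, 60))
chol <- round(150 + 1.5 * age + rnorm(n, sd = 30))
chol[sample(n, 10)] <- NA                  # make 10 values missing

obs  <- !is.na(chol)
fit  <- lm(chol ~ age)                     # fitted on the observed rows only
pred <- predict(fit, newdata = data.frame(age = age))

# For each missing value, find the 5 observed rows whose predicted mean is
# closest, then copy the observed cholesterol of one randomly chosen donor.
chol_imp <- chol
for (i in which(!obs)) {
  d      <- abs(pred[obs] - pred[i])
  donors <- order(d)[1:5]
  chol_imp[i] <- chol[obs][donors[sample.int(5, 1)]]
}

# Compare the imputed values with the observed distribution.
chol_imp[!obs]
summary(chol[obs])
```

Because every imputed value is copied from a real observation, the imputations always stay within the range of the observed data.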

1. Table 1

Create table 1. Describe all available variables.

Show both the original data and the imputed data.

2. Overall risk or overall rate

a. What is the outcome we are interested in?

b. What are the known risk factors for our outcome of interest?

c. Total number of persons

d. What is the overall risk or rate, and what is the number of events?
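
A minimal sketch of these summary quantities, using a made-up binary outcome vector. The counts and the total follow-up time are hypothetical numbers chosen for illustration.

```r
# Hypothetical outcome: 26 events among 300 persons.
chd <- c(rep(1, 26), rep(0, 274))

n      <- length(chd)        # total number of persons
events <- sum(chd)           # number of events
risk   <- events / n         # overall risk (proportion with the event)

# Exact (Clopper-Pearson) 95% confidence interval for the risk.
ci <- binom.test(events, n)$conf.int

# If follow-up time is available, a rate can be reported instead.
pyears        <- 2500                    # hypothetical total person-years
rate_per_1000 <- events / pyears * 1000  # events per 1000 person-years

c(n = n, events = events, risk = risk, lower = ci[1], upper = ci[2])
rate_per_1000
```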

3. Build the prediction model: choose the model and predictors for the optimal model

a. the optimal model should be:

b. the same model, with age entered as the agegroup variable and an interaction term with personality type

We will use model fit3.

c. create a risk prediction for all the persons in our data
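
The sketch below illustrates the three steps on simulated data: a logistic model with continuous age, a variant with an age group variable interacting with personality type, and predicted risks for everyone. The names fit_a and fit_b, the predictors, and the simulated coefficients are all hypothetical and do not correspond to the models fitted in this exercise.

```r
set.seed(2)
n   <- 500
dat <- data.frame(
  age  = runif(n, 40, 60),
  chol = rnorm(n, 225, 40),
  type = factor(sample(c("A", "B"), n, replace = TRUE))
)
lp           <- -6 + 0.06 * dat$age + 0.008 * dat$chol + 0.5 * (dat$type == "A")
dat$chd      <- rbinom(n, 1, plogis(lp))
dat$agegroup <- cut(dat$age, breaks = c(40, 50, 60), include.lowest = TRUE)

# (a) model with continuous age
fit_a <- glm(chd ~ age + chol + type, family = binomial, data = dat)

# (b) age as a grouped variable, interacting with personality type
fit_b <- glm(chd ~ agegroup * type + chol, family = binomial, data = dat)

# (c) predicted risk (probability scale) for every person in the data
dat$risk <- predict(fit_a, type = "response")
summary(dat$risk)
```

Using type = "response" returns probabilities; the default for predict() on a glm object is the linear predictor on the log-odds scale.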

4. Discrimination

a. ROC curve and AUC with 95% CI

b. plot the ROC curve, find the best threshold, and report the sensitivity and specificity at that threshold
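
For readers who want to see what happens under the hood, the following base R sketch computes the AUC, a bootstrap confidence interval, and a "best" threshold on simulated data. Two choices here are assumptions rather than the exercise's method: the interval is a percentile bootstrap (other methods exist), and "best" is taken to mean the threshold that maximises Youden's index (sensitivity + specificity - 1).

```r
set.seed(3)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + x))
p <- unname(fitted(glm(y ~ x, family = binomial)))

# AUC from the rank-sum (Mann-Whitney) formula.
auc_fun <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc <- auc_fun(p, y)

# Percentile bootstrap 95% CI for the AUC.
boot <- replicate(500, {
  i <- sample(n, replace = TRUE)
  auc_fun(p[i], y[i])
})
ci <- quantile(boot, c(0.025, 0.975))

# Sensitivity and specificity when "positive" means p >= threshold.
thr  <- sort(unique(p))
sens <- sapply(thr, function(t) mean(p[y == 1] >= t))
spec <- sapply(thr, function(t) mean(p[y == 0] < t))
best <- which.max(sens + spec - 1)   # Youden's index

plot(1 - spec, sens, type = "l",
     xlab = "1 - specificity", ylab = "Sensitivity")
abline(0, 1, lty = 2)
points(1 - spec[best], sens[best], pch = 19)

c(AUC = auc, ci)
c(threshold = thr[best], sensitivity = sens[best], specificity = spec[best])
```

The Youden threshold weights false positives and false negatives equally, which is rarely the right trade-off clinically, so it is best read as a descriptive summary.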

or

c. AUC adjusted for optimism

Here we use the validate function to estimate the optimism-adjusted AUC via the bootstrap.

Original and adjusted value
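
The bootstrap optimism correction can also be written out by hand, which may help in interpreting the original and adjusted values. The sketch below uses simulated data and hypothetical predictors x1 and x2; it illustrates the general procedure and is not the internals of the validate function. If that function is rms::validate (an assumption, since the package is not named here), note that it reports Somers' Dxy, which converts to the AUC as AUC = 0.5 + Dxy/2.

```r
set.seed(4)
n     <- 300
dat   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1))

auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Apparent (original) AUC: model evaluated on the data used to fit it.
fit      <- glm(y ~ x1 + x2, family = binomial, data = dat)
apparent <- auc(fitted(fit), dat$y)

# Optimism: for each bootstrap sample, refit the model and compare its AUC
# in the bootstrap sample with its AUC in the original data.
optimism <- replicate(200, {
  b  <- dat[sample(n, replace = TRUE), ]
  fb <- glm(y ~ x1 + x2, family = binomial, data = b)
  auc(fitted(fb), b$y) - auc(predict(fb, newdata = dat), dat$y)
})

adjusted <- apparent - mean(optimism)
c(original = apparent, optimism = mean(optimism), adjusted = adjusted)
```

Any model selection step should be repeated inside each bootstrap sample; this sketch refits a fixed model only.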

d. cross-validation

We are unable to run the commented code on WebR.
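
As an alternative that needs no extra packages, k-fold cross-validation of the AUC can be sketched in base R. This is not a translation of the commented code; the data are simulated, and the choice of 5 folds and of pooling the out-of-fold predictions are assumptions.

```r
set.seed(5)
n     <- 300
k     <- 5
dat   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1))

auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Randomly assign each person to one of k folds of equal size.
fold    <- sample(rep(1:k, length.out = n))
cv_pred <- numeric(n)

# Fit on k-1 folds, predict the held-out fold.
for (j in 1:k) {
  test <- fold == j
  fj   <- glm(y ~ x1 + x2, family = binomial, data = dat[!test, ])
  cv_pred[test] <- predict(fj, newdata = dat[test, ], type = "response")
}

cv_auc <- auc(cv_pred, dat$y)
cv_auc
```

Pooling the held-out predictions gives one AUC; averaging the per-fold AUCs is a common alternative and usually gives a similar answer.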

5. Calibration

a. Plot the calibration curve, estimate the intercept and the slope, and use the Hosmer-Lemeshow goodness-of-fit test to assess calibration
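
A minimal base R sketch of the calibration intercept, the calibration slope, and a grouped calibration plot, using simulated data. The grouping into deciles of predicted risk is an assumption; smoothed calibration curves are a common alternative.

```r
set.seed(6)
n   <- 1000
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(-1 + x))
fit <- glm(y ~ x, family = binomial)
p   <- fitted(fit)        # predicted risks
lp  <- predict(fit)       # linear predictor (log-odds)

# Calibration slope: coefficient of the linear predictor.
slope <- unname(coef(glm(y ~ lp, family = binomial))[2])

# Calibration intercept (calibration-in-the-large): intercept with the
# linear predictor entered as an offset, i.e. with its slope fixed at 1.
intercept <- unname(coef(glm(y ~ offset(lp), family = binomial))[1])

# Grouped calibration plot: mean predicted vs observed risk by decile.
grp      <- cut(p, quantile(p, 0:10 / 10), include.lowest = TRUE)
pred_grp <- as.numeric(tapply(p, grp, mean))
obs_grp  <- as.numeric(tapply(y, grp, mean))
plot(pred_grp, obs_grp, xlab = "Mean predicted risk", ylab = "Observed risk")
abline(0, 1, lty = 2)

c(intercept = intercept, slope = slope)
```

When a model is evaluated on the same data it was fitted to, the intercept is 0 and the slope is 1 by construction, so these measures become informative only in validation data or after an optimism correction.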

b. Hosmer-Lemeshow goodness of fit

Please estimate the goodness of fit by the method of Hosmer and Lemeshow. The glmtoolbox package is not available on WebR.
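
Since the package is unavailable, the Hosmer-Lemeshow statistic can be computed directly. The sketch below uses simulated data and assumes the usual choice of g = 10 groups defined by deciles of predicted risk, with g - 2 degrees of freedom.

```r
set.seed(7)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + x))
p <- fitted(glm(y ~ x, family = binomial))

g   <- 10
grp <- cut(p, quantile(p, 0:g / g), include.lowest = TRUE)

obs  <- tapply(y, grp, sum)      # observed events per group
expd <- tapply(p, grp, sum)      # expected events per group
ng   <- tapply(y, grp, length)   # group sizes

# Hosmer-Lemeshow statistic: sum of (O - E)^2 / (n * pbar * (1 - pbar)),
# where pbar = E / n is the mean predicted risk in the group.
stat <- sum((obs - expd)^2 / (expd * (1 - expd / ng)))
pval <- pchisq(stat, df = g - 2, lower.tail = FALSE)

cbind(n = ng, observed = obs, expected = round(expd, 1))
c(statistic = stat, df = g - 2, p.value = pval)
```

A small p-value suggests poor calibration, but the test is sensitive to the number of groups and to sample size, so it is best read alongside the calibration plot.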

c. Improvement in discrimination: difference in AUC

d. compare discrimination

e. plot both ROC curves in one plot
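
The comparison of two models can be sketched in base R as follows, on simulated data with hypothetical predictors x1 and x2. The confidence interval for the AUC difference comes from a paired percentile bootstrap of the fitted predictions; that is an assumption made to keep the example package-free, and other tests for comparing correlated ROC curves exist.

```r
set.seed(8)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-1 + 0.8 * x1 + 0.8 * x2))
p_small <- unname(fitted(glm(y ~ x1, family = binomial)))
p_full  <- unname(fitted(glm(y ~ x1 + x2, family = binomial)))

auc <- function(p, y) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# ROC coordinates, from the strictest threshold to the most lenient.
roc_xy <- function(p, y) {
  thr <- c(Inf, sort(unique(p), decreasing = TRUE))
  data.frame(fpr = sapply(thr, function(t) mean(p[y == 0] >= t)),
             tpr = sapply(thr, function(t) mean(p[y == 1] >= t)))
}

# Difference in AUC with a paired bootstrap 95% CI.
diff_auc <- auc(p_full, y) - auc(p_small, y)
boot <- replicate(500, {
  i <- sample(n, replace = TRUE)
  auc(p_full[i], y[i]) - auc(p_small[i], y[i])
})
ci <- quantile(boot, c(0.025, 0.975))

# Both ROC curves in one plot.
r1 <- roc_xy(p_small, y)
r2 <- roc_xy(p_full, y)
plot(r1$fpr, r1$tpr, type = "l", lty = 2,
     xlab = "1 - specificity", ylab = "Sensitivity")
lines(r2$fpr, r2$tpr)
abline(0, 1, col = "grey")
legend("bottomright", legend = c("x1 only", "x1 + x2"), lty = c(2, 1))

c(difference = diff_auc, ci)
```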

6. Decision Curve Analysis

a. plot the decision curve and estimate net benefits

b. what are the net benefits of this model?
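
Net benefit at a threshold probability pt is the proportion of true positives minus the proportion of false positives weighted by the odds pt / (1 - pt). The base R sketch below computes it on simulated data and plots a simple decision curve; the function name net_benefit and the range of thresholds are hypothetical choices for illustration.

```r
set.seed(9)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + x))
p <- fitted(glm(y ~ x, family = binomial))

# Net benefit of treating everyone whose predicted risk is at least pt.
net_benefit <- function(p, y, pt) {
  pos <- p >= pt
  tp  <- sum(pos & y == 1)
  fp  <- sum(pos & y == 0)
  tp / length(y) - fp / length(y) * pt / (1 - pt)
}

thr      <- seq(0.05, 0.50, by = 0.05)
nb_model <- sapply(thr, net_benefit, p = p, y = y)

# Reference strategies: treat everyone, and treat no one (net benefit 0).
nb_all <- mean(y) - (1 - mean(y)) * thr / (1 - thr)

plot(thr, nb_model, type = "l",
     ylim = range(c(nb_model, nb_all, 0)),
     xlab = "Threshold probability", ylab = "Net benefit")
lines(thr, nb_all, lty = 2)
abline(h = 0, lty = 3)
legend("topright", legend = c("Model", "Treat all", "Treat none"),
       lty = c(1, 2, 3))

round(cbind(threshold = thr, model = nb_model, treat_all = nb_all), 3)
```

A model is clinically useful at a given threshold only if its net benefit exceeds that of both reference strategies there.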

c. add the model with agegroups to the plot and discuss

clinical usefulness

Is there clinical value?