We will now analyse the full data set of patients diagnosed with localised skin melanoma. We start by reading the data, selecting the patients with localised stage, and then defining a 1/0 variable for the events we are interested in.
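The preparation step can be sketched as follows. This is only an illustration on a tiny made-up data frame: the column names `stage`, `status` and `surv_mm`, the level labels, and the values are all assumptions, so substitute the names used in the course data.

```r
# Hypothetical stand-in for the course data; names, labels and values are assumptions.
melanoma_raw <- data.frame(
  stage   = c("Localised", "Regional", "Localised", "Distant", "Localised"),
  status  = c("Dead: cancer", "Alive", "Alive", "Dead: cancer", "Dead: other"),
  surv_mm = c(26.5, 55.5, 177.5, 29.5, 57.5)
)

# Keep only patients with localised disease.
melanoma <- subset(melanoma_raw, stage == "Localised")

# 1/0 event indicators: death due to cancer, and death from any cause.
melanoma$death_cancer <- as.numeric(melanoma$status == "Dead: cancer")
melanoma$death_all    <- as.numeric(melanoma$status %in% c("Dead: cancer", "Dead: other"))

print(melanoma)
```

For cause-specific survival, deaths from other causes are treated as censored, which is why `death_cancer` is 0 for the patient who died of another cause.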
Estimate the cause-specific survivor function, using the Kaplan-Meier method with survival time in months, separately for each of the two calendar periods 1975–1984 and 1985–1994. The variable year8594 indicates whether a patient was diagnosed 1985–1994 or 1975–1984. Without making reference to any formal statistical tests, does it appear that patient survival is superior during the most recent period?
There seems to be a clear difference in survival between the two periods. Patients diagnosed during 1985–94 have superior survival to those diagnosed 1975–84.
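A comparison of this kind could be produced along the following lines with `survfit()` from the survival package. The sketch runs on simulated data, not the melanoma data: the hazards, the censoring scheme, and the column names `surv_mm` and `death_cancer` are assumptions made for illustration, so its output will not reproduce the result described above.

```r
library(survival)

# Simulated stand-in data (NOT the melanoma data); the hazards are assumed values.
set.seed(42)
n <- 400
period  <- rep(c("Diagnosed 75-84", "Diagnosed 85-94"), each = n / 2)
t_event <- rexp(n, rate = ifelse(period == "Diagnosed 75-84", 0.004, 0.0025))
t_cens  <- runif(n, min = 12, max = 240)   # censoring times in months
sim <- data.frame(
  year8594     = factor(period),
  surv_mm      = pmin(t_event, t_cens),
  death_cancer = as.numeric(t_event <= t_cens)
)

# Kaplan-Meier estimates of the cause-specific survivor function by period.
km_period <- survfit(Surv(surv_mm, death_cancer) ~ year8594, data = sim)

# Estimated survival at 5 and 10 years (60 and 120 months) in each period.
print(summary(km_period, times = c(60, 120)))

plot(km_period, col = c("blue", "red"), mark.time = FALSE,
     xlab = "Months since diagnosis", ylab = "Cause-specific survival")
legend("bottomleft", legend = levels(sim$year8594),
       col = c("blue", "red"), lty = 1, bty = "n")
```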
Plot the hazard function (instantaneous mortality rate):
Using the bshazard package and base graphics:
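A minimal sketch of one way to do this, assuming the bshazard package is installed and that the fitted object exposes `time` and `hazard` components. The data are simulated and the column names `surv_mm` and `death_cancer` are assumptions; this is not necessarily the code originally used in this exercise.

```r
library(survival)
library(bshazard)

# Simulated stand-in data (NOT the melanoma data); the hazards are assumed values.
set.seed(42)
n <- 400
period  <- rep(c("Diagnosed 75-84", "Diagnosed 85-94"), each = n / 2)
t_event <- rexp(n, rate = ifelse(period == "Diagnosed 75-84", 0.004, 0.0025))
t_cens  <- runif(n, min = 12, max = 240)
sim <- data.frame(
  year8594     = factor(period),
  surv_mm      = pmin(t_event, t_cens),
  death_cancer = as.numeric(t_event <= t_cens)
)

# Fit a smoothed hazard separately within each calendar period.
fits <- lapply(split(sim, sim$year8594), function(d)
  bshazard(Surv(surv_mm, death_cancer) ~ 1, data = d, verbose = FALSE))

# Overlay the two smoothed hazard curves with base graphics.
ymax <- 1.1 * max(sapply(fits, function(f) max(f$hazard)))
plot(fits[[1]]$time, fits[[1]]$hazard, type = "l", col = "blue",
     ylim = c(0, ymax),
     xlab = "Months since diagnosis", ylab = "Hazard (per person-month)")
lines(fits[[2]]$time, fits[[2]]$hazard, col = "red")
legend("topright", legend = names(fits), col = c("blue", "red"), lty = 1, bty = "n")
```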
Using ggplot2:
Use the log rank and the Wilcoxon test to determine whether there is a statistically significant difference in patient survival between the two periods.
Haven't heard of the log rank test? It's possible you may reach this exercise before we cover the details of this test during lectures. You should nevertheless do the exercise and try and interpret the results. Both of these tests (the log rank and the generalised Wilcoxon) are used to test for differences between the survivor functions. The null hypothesis is that the survivor functions are equivalent for the two calendar periods (i.e., patient survival does not depend on calendar period of diagnosis).
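Both tests can be sketched with `survdiff()` from the survival package: `rho = 0` (the default) gives the log rank test, and `rho = 1` gives the Peto and Peto modification of the Gehan-Wilcoxon test, which is one common form of the generalised Wilcoxon test. The data below are simulated and the column names are assumptions, so the numbers have no bearing on the melanoma results.

```r
library(survival)

# Simulated stand-in data (NOT the melanoma data); the hazards are assumed values.
set.seed(42)
n <- 400
period  <- rep(c("Diagnosed 75-84", "Diagnosed 85-94"), each = n / 2)
t_event <- rexp(n, rate = ifelse(period == "Diagnosed 75-84", 0.004, 0.0025))
t_cens  <- runif(n, min = 12, max = 240)
sim <- data.frame(
  year8594     = factor(period),
  surv_mm      = pmin(t_event, t_cens),
  death_cancer = as.numeric(t_event <= t_cens)
)

# Log rank test: equal weight at every event time.
lr <- survdiff(Surv(surv_mm, death_cancer) ~ year8594, data = sim)

# Wilcoxon-type test: more weight on early event times.
wx <- survdiff(Surv(surv_mm, death_cancer) ~ year8594, data = sim, rho = 1)

print(lr)
print(wx)

# P-values from the chi-square statistics (two groups, so 1 degree of freedom).
p_lr <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)
p_wx <- pchisq(wx$chisq, df = 1, lower.tail = FALSE)
```

Because the Wilcoxon-type test weights early differences more heavily, the two tests can disagree when the survivor functions differ mainly early or mainly late in follow-up.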
Estimate cause-specific mortality rates for each age group, and graph Kaplan-Meier estimates of the cause-specific survivor function for each age group. Are there differences between the age groups? Is the interpretation consistent between the mortality rates and the survival proportions?
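As a rough sketch, a mortality rate for each group is the number of events divided by the total person-time at risk in that group. The example uses simulated data in which the age-group labels, the hazards, and the column names (`agegrp`, `surv_mm`, `death_cancer`) are assumptions for illustration.

```r
library(survival)

# Simulated stand-in data (NOT the melanoma data); labels and hazards are assumed.
set.seed(7)
n <- 400
agegrp  <- factor(sample(c("0-44", "45-59", "60-74", "75+"), n, replace = TRUE))
haz     <- c("0-44" = 0.002, "45-59" = 0.003, "60-74" = 0.004, "75+" = 0.006)
t_event <- rexp(n, rate = unname(haz[as.character(agegrp)]))
t_cens  <- runif(n, min = 12, max = 240)
sim <- data.frame(
  agegrp       = agegrp,
  surv_mm      = pmin(t_event, t_cens),
  death_cancer = as.numeric(t_event <= t_cens)
)

# Rate = events / person-time, expressed in the time units of surv_mm.
events <- tapply(sim$death_cancer, sim$agegrp, sum)
ptime  <- tapply(sim$surv_mm, sim$agegrp, sum)
rates  <- data.frame(
  agegrp      = names(events),
  events      = as.vector(events),
  person_time = as.vector(ptime),
  rate        = as.vector(events / ptime)
)
print(rates)

# Kaplan-Meier estimates of the cause-specific survivor function by age group.
km_age <- survfit(Surv(surv_mm, death_cancer) ~ agegrp, data = sim)
plot(km_age, col = 1:4, mark.time = FALSE,
     xlab = "Time since diagnosis", ylab = "Cause-specific survival")
legend("bottomleft", legend = levels(sim$agegrp), col = 1:4, lty = 1, bty = "n")
```

A group with a higher rate should show a lower Kaplan-Meier curve, which is the consistency the question asks about.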
What are the units of the estimated hazard rates? HINT: look at how you defined time.
Repeat some of the previous analyses using years instead of months. This is equivalent to dividing the time variable by 12 so all analyses will be the same except the units of time will be different (e.g., the graphs will have different labels).
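The effect of changing the time scale can be checked on simulated data, as a sketch. The column `surv_years` is a hypothetical name introduced here, and the data are not the melanoma data. An event rate per person-year is 12 times the rate per person-month, while the log rank statistic depends only on the ordering of the times and is therefore unchanged.

```r
library(survival)

# Simulated stand-in data (NOT the melanoma data); the hazards are assumed values.
set.seed(42)
n <- 400
period  <- rep(c("Diagnosed 75-84", "Diagnosed 85-94"), each = n / 2)
t_event <- rexp(n, rate = ifelse(period == "Diagnosed 75-84", 0.004, 0.0025))
t_cens  <- runif(n, min = 12, max = 240)
sim <- data.frame(
  year8594     = factor(period),
  surv_mm      = pmin(t_event, t_cens),
  death_cancer = as.numeric(t_event <= t_cens)
)

# Survival time in years (hypothetical column name).
sim$surv_years <- sim$surv_mm / 12

# Overall event rate on each time scale.
rate_month <- sum(sim$death_cancer) / sum(sim$surv_mm)     # per person-month
rate_year  <- sum(sim$death_cancer) / sum(sim$surv_years)  # per person-year
print(c(per_month = rate_month, per_year = rate_year))

# The log rank test is rank-based, so rescaling time leaves it unchanged.
lr_mm <- survdiff(Surv(surv_mm, death_cancer) ~ year8594, data = sim)
lr_yy <- survdiff(Surv(surv_years, death_cancer) ~ year8594, data = sim)
print(c(months = lr_mm$chisq, years = lr_yy$chisq))
```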
Study whether there is evidence of a difference in patient survival between males and females. Estimate both the hazard and survival function and use the log rank test to test for a difference.