Author: Annika Tillander, 2014-01-30

Edited: Andreas Karlsson, 2015-02-28, 2016-03-08

Localised melanoma: Comparing estimates of cause-specific survival between periods; first graphically and then using the log rank test

You may have to install the required packages the first time you use them. You can install a package by `install.packages("package_of_interest")`

for each package you require.

```
require(foreign) # needed to read data set from Stata
require(survival) # for Surv and survfit
require(muhaz) # for hazard estimates
require(dplyr) # for data manipulation
```

We will now analyse the full data set of patients diagnosed with localised skin melanoma. We start by reading the data selecting those with a localised stage and then define a 1/0 varible for the events that we are interested in.

```
melanoma_raw <- read.dta("http://biostat3.net/download/melanoma.dta")
melanoma <- melanoma_raw %>%
filter(stage=="Localised") %>% # subset those with localised cancer
mutate(death_cancer = ifelse( status == "Dead: cancer", 1, 0),
death_all = ifelse( status == "Dead: cancer" |
status == "Dead: other", 1, 0))
```

**(a)** Estimate the cause-specific survivor function, using the Kaplan-Meier method with survival time in months, separately for each of the two calendar periods 1975â€“1984 and 1985â€“1994. The variable year8594 indicates whether a patient was diagnosed 1985â€“1994 or 1975â€“1984. Without making reference to any formal statistical tests, does it appear that patient survival is superior during the most recent period?

```
mfityear8594 <- survfit(Surv(surv_mm, death_cancer==1) ~ year8594, data = melanoma)
plot(mfityear8594, col = 1:2,
xlab = "Follow-up Time",
ylab = "Survival",
main = "Kaplan-Meier survival estimates")
legend("topright", c("Diagnosed 75-84", "Diagnosed 85-94"), col=1:2, lty = 1)
```

**(b)** The following commands can be used to plot the hazard function (instantaneous mortality rate): i. At what point in the follow-up is mortality highest? ii. Does this pattern seem reasonable from a clinicial/biological perspective? [HINT:Consider the disease with which these patients were classified as being diagnosed along with the expected fatality of the disease as a function of time since diagnosis.]

```
hazByGroup <- function(group){
with(subset(melanoma, group),
muhaz(times=surv_mm,
delta=death_cancer,
min.time = min(surv_mm),
max.time = max(surv_mm)))
}
plot(hazByGroup(melanoma$year8594 == "Diagnosed 75-84"), col=1, main="Smoothed hazard estimates")
lines(hazByGroup(melanoma$year8594 == "Diagnosed 85-94"), col=2)
legend("topright", c("Diagnosed 75-84", "Diagnosed 85-94"), col=1:2, lty = 1)
```

**(c)** Use the log rank and the Wilcoxon test to determine whether there is a statistically significant difference in patient survival between the two periods.

```
## Log-rank test for equality of survivor functions
survdiff(Surv(surv_mm, death_cancer==1) ~ year8594, data=melanoma)
```

```
## Call:
## survdiff(formula = Surv(surv_mm, death_cancer == 1) ~ year8594,
## data = melanoma)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## year8594=Diagnosed 75-84 2145 572 512 7.03 15.5
## year8594=Diagnosed 85-94 3173 441 501 7.18 15.5
##
## Chisq= 15.5 on 1 degrees of freedom, p= 8.23e-05
```

```
## Equivalent to the Peto & Peto modfication of the Gehan-Wilcoxon test
survdiff(Surv(surv_mm, death_cancer==1) ~ year8594, data=melanoma, rho=1)
```

```
## Call:
## survdiff(formula = Surv(surv_mm, death_cancer == 1) ~ year8594,
## data = melanoma, rho = 1)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## year8594=Diagnosed 75-84 2145 502 447 6.82 16.2
## year8594=Diagnosed 85-94 3173 400 455 6.70 16.2
##
## Chisq= 16.2 on 1 degrees of freedom, p= 5.82e-05
```

Havenâ€™t heard of the log rank (or Wilcoxon) test? Itâ€™s possible you may reach this exercise before we cover the details of these tests during lectures. You should nevertheless do the exercise and try and interpret the results. Both of these tests (the log rank and the generalised Wilcoxon) are used to test for differences between the survivor functions. The null hypothesis is that the survivor functions are equivalent for the two calendar periods (i.e., patient survival does not depend on calendar period of diagnosis).

**(d)** Estimate cause-specific mortality rates for each age group, and graph Kaplan-Meier estimates of the cause-specific survivor function for each age group. Are there differences between the age groups? Is the interpretation consistent between the mortality rates and the survival proportions?

```
melanoma %>%
select(death_cancer, surv_mm, agegrp) %>%
group_by(agegrp) %>%
summarise(D = sum(death_cancer), Y = sum(surv_mm)/1000, Rate = D/Y) %>%
mutate(CI_low = Rate + qnorm(0.025) * Rate / sqrt(D),
CI_high = Rate + qnorm(0.975) * Rate / sqrt(D))
```

```
## Source: local data frame [4 x 6]
##
## agegrp D Y Rate CI_low CI_high
## 1 0-44 217 157.1215 1.381097 1.197340 1.564853
## 2 45-59 282 148.8215 1.894887 1.673727 2.116048
## 3 60-74 333 121.3380 2.744400 2.449637 3.039163
## 4 75+ 181 36.2380 4.994757 4.267106 5.722408
```

```
mfit_agegrp <- survfit(Surv(surv_mm, death_cancer==1) ~ agegrp, data = melanoma)
plot(mfit_agegrp, col = 1:4,
xlab = "Months since diagnosis",
ylab = "Survival",
main = "Kaplan-Meier survival estimates")
legend("bottomleft", c("0-44", "45-59", "60-74", "75+"), col=1:4, lty = 1)
```