library(devtools)
devtools::install_github("AlexanderLyNL/Safestats", ref = "logrank")
library(safestats)
library(survival)
library(knitr)
An example data set in the requested format for the ALL-IN-META-BCG-CORONA is shown below.
Please find the details to generate this toy data set in the complete R markdown file here.
intervention | dateRand | hospital | COV19 | dateCOV19 | COV19hosp | dateCOV19hosp | dateLastFup |
---|---|---|---|---|---|---|---|
control | 2020-05-07 | A | yes | 2020-05-11 | yes | 2020-05-15 | 2020-06-23 |
control | 2020-05-04 | B | yes | 2020-05-08 | yes | 2020-05-12 | 2020-06-23 |
BCG | 2020-05-08 | A | yes | 2020-05-21 | yes | 2020-06-01 | 2020-06-23 |
control | 2020-05-07 | B | yes | 2020-05-25 | no | NA | 2020-06-23 |
BCG | 2020-05-05 | A | yes | 2020-05-24 | no | NA | 2020-06-23 |
BCG | 2020-05-10 | B | yes | 2020-06-03 | no | NA | 2020-06-23 |
control | 2020-05-14 | A | yes | 2020-06-23 | no | NA | 2020-06-23 |
control | 2020-05-10 | B | no | NA | no | NA | 2020-06-23 |
BCG | 2020-05-08 | A | no | NA | no | NA | 2020-06-23 |
BCG | 2020-05-04 | B | no | NA | no | NA | 2020-06-23 |
We need to be sure that all date variables are actual dates, such that we can substract them from each other to obtain event and censoring times.
data$dateRand <- as.Date(data$dateRand, format = "%Y-%m-%d")
data$dateLastFup <- as.Date(data$dateLastFup, format = "%Y-%m-%d")
data$dateCOV19 <- as.Date(data$dateCOV19, format = "%Y-%m-%d")
data$dateCOV19hosp <- as.Date(data$dateCOV19hosp, format = "%Y-%m-%d")
If your dates are formatted in a different way (e.g. "%d-%m-%y"
or "%y-%m-%d"
), make sure to process them correctly.
Also, make sure that the intervention variable is coded the right way around, with the control group as the first and the treatment group as the second level of the factor. That way, the hazard ratio will be defined as:
\[\begin{equation} hr = \frac{\lambda_{BCG}}{\lambda_{control}} \end{equation}\]
with a hazard ratio less than 1 showing evidence for a benefit of BCG and a hazard ratio greater than 1 showing evidence for harm.
data$intervention <- factor(data$intervention, levels = c("control", "BCG"))
Next we assign a time
and status
variable as is conventional in survival analysis, coding an event time as status = 2
and a censoring time as status = 1
, such that we can use Surv()
to assign a survival object.
startDate <- as.Date("2020-03-25")
data$timeCOV19 <- ifelse(data$COV19 %in% c("Yes", "yes"),
data$dateCOV19 - startDate,
data$dateLastFup - startDate)
data$statusCOV19 <- ifelse(data$COV19 %in% c("Yes", "yes"),
2, 1)
data$timeCOV19hosp <- ifelse(data$COV19hosp %in% c("Yes", "yes"),
data$dateCOV19hosp - startDate,
data$dateLastFup - startDate)
data$statusCOV19hosp <- ifelse(data$COV19hosp %in% c("Yes", "yes"),
2, 1)
data$survObjCOV19 <- Surv(time = data$dateRand - startDate,
time2 = data$timeCOV19,
event = data$statusCOV19,
type = "counting")
## Warning: please do not proceed by creating a Surv() object based on status = 1,2
## when your data set does not contain any events. In case of no events,
## safeLogRankTest based on a Surv() object will misinterpret all censoring times
## coded as 1 to be events. In case of no events, your e-value is a neutral 1,
## and you do not need to perform any further analysis.
data$survObjCOV19hosp <- Surv(time = data$dateRand - startDate,
time2 = data$timeCOV19hosp,
event = data$statusCOV19hosp,
type = "counting")
The start date (time = 0
) of our event and censoring times is the day the first participant of the first study (the Dutch study) was enrolled, which is startDate = 2020-03-25
. This is the start date for all studies, see our Statistical Analysis Plan: we use time since the first randomized participant instead of time since a participants’ own randomization. This makes the event-times left-truncated, which is visible in the creation of the survival object of the type counting
that includes the time of randomization to define when participants enter the risk set. For more details on this formulation, please refer to our Tutorial on left-truncation.
intervention | dateRand | hospital | COV19 | dateCOV19 | COV19hosp | dateCOV19hosp | dateLastFup | survObjCOV19 | survObjCOV19hosp |
---|---|---|---|---|---|---|---|---|---|
control | 2020-05-07 | A | yes | 2020-05-11 | yes | 2020-05-15 | 2020-06-23 | (43,47] | (43,51] |
control | 2020-05-04 | B | yes | 2020-05-08 | yes | 2020-05-12 | 2020-06-23 | (40,44] | (40,48] |
BCG | 2020-05-08 | A | yes | 2020-05-21 | yes | 2020-06-01 | 2020-06-23 | (44,57] | (44,68] |
control | 2020-05-07 | B | yes | 2020-05-25 | no | NA | 2020-06-23 | (43,61] | (43,90+] |
BCG | 2020-05-05 | A | yes | 2020-05-24 | no | NA | 2020-06-23 | (41,60] | (41,90+] |
BCG | 2020-05-10 | B | yes | 2020-06-03 | no | NA | 2020-06-23 | (46,70] | (46,90+] |
control | 2020-05-14 | A | yes | 2020-06-23 | no | NA | 2020-06-23 | (50,90] | (50,90+] |
control | 2020-05-10 | B | no | NA | no | NA | 2020-06-23 | (46,90+] | (46,90+] |
BCG | 2020-05-08 | A | no | NA | no | NA | 2020-06-23 | (44,90+] | (44,90+] |
BCG | 2020-05-04 | B | no | NA | no | NA | 2020-06-23 | (40,90+] | (40,90+] |
To agree with the design shown on the left hand side of the ALL-IN-META-BCG-CORONA dashboard, specify your design objects as follows:
designObjCOV19L <- designSafeLogrank(hrMin = 0.8,
alpha = 0.1*0.025,
alternative = "less", # one-sided test hr < 1
ratio = 1)
designObjCOV19L
##
## Safe Logrank Test Design
##
## minimal hazard ratio = 0.8
## alternative = less
## parameter: log(thetaS) = -0.2231436
## alpha = 0.0025
## decision rule: e-value > 1/alpha = 400
##
## Timestamp: 2021-06-28 17:41:11 CEST
designObjCOV19G <- designSafeLogrank(hrMin = 1/0.8,
alpha = 0.1*0.025,
alternative = "greater", # one-sided test hr > 1
ratio = 1)
designObjCOV19G
##
## Safe Logrank Test Design
##
## minimal hazard ratio = 1.25
## alternative = greater
## parameter: log(thetaS) = 0.2231436
## alpha = 0.0025
## decision rule: e-value > 1/alpha = 400
##
## Timestamp: 2021-06-28 17:41:11 CEST
designObjCOV19hospL <- designSafeLogrank(hrMin = 0.7,
alpha = 0.9*0.025,
alternative = "less", # one-sided test hr < 1
ratio = 1)
designObjCOV19hospL
##
## Safe Logrank Test Design
##
## minimal hazard ratio = 0.7
## alternative = less
## parameter: log(thetaS) = -0.3566749
## alpha = 0.0225
## decision rule: e-value > 1/alpha = 44.44444
##
## Timestamp: 2021-06-28 17:41:11 CEST
designObjCOV19hospG <- designSafeLogrank(hrMin = 1/0.7,
alpha = 0.9*0.025,
alternative = "greater", # one-sided test hr > 1
ratio = 1)
designObjCOV19hospG
##
## Safe Logrank Test Design
##
## minimal hazard ratio = 1.428571
## alternative = greater
## parameter: log(thetaS) = 0.3566749
## alpha = 0.0225
## decision rule: e-value > 1/alpha = 44.44444
##
## Timestamp: 2021-06-28 17:41:11 CEST
safeLogrankTest(data$survObjCOV19 ~ data$intervention,
designObj = designObjCOV19L)
##
## Safe Logrank Test
##
## data: data$survObjCOV19 by data$intervention (control, BCG). nEvents = 7
##
## test: log(thetaS) = -0.22314
## e-value = 1.1513 > 1/alpha = 400 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.0025
## for minimal relevant hazard ratio = 0.8 (less)
safeLogrankTest(data$survObjCOV19 ~ data$intervention,
designObj = designObjCOV19G)
##
## Safe Logrank Test
##
## data: data$survObjCOV19 by data$intervention (control, BCG). nEvents = 7
##
## test: log(thetaS) = 0.22314
## e-value = 0.79843 > 1/alpha = 400 : FALSE
## alternative hypothesis: true hazard ratio is greater than 1
##
## design: the test was designed with alpha = 0.0025
## for minimal relevant hazard ratio = 1.25 (greater)
safeLogrankTest(data$survObjCOV19hosp ~ data$intervention,
designObj = designObjCOV19hospL)
##
## Safe Logrank Test
##
## data: data$survObjCOV19hosp by data$intervention (control, BCG). nEvents = 3
##
## test: log(thetaS) = -0.35667
## e-value = 1.2406 > 1/alpha = 44.444 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.0225
## for minimal relevant hazard ratio = 0.7 (less)
safeLogrankTest(data$survObjCOV19hosp ~ data$intervention,
designObj = designObjCOV19hospG)
##
## Safe Logrank Test
##
## data: data$survObjCOV19hosp by data$intervention (control, BCG). nEvents = 3
##
## test: log(thetaS) = 0.35667
## e-value = 0.73506 > 1/alpha = 44.444 : FALSE
## alternative hypothesis: true hazard ratio is greater than 1
##
## design: the test was designed with alpha = 0.0225
## for minimal relevant hazard ratio = 1.4286 (greater)
We define the function procEseq()
, based on the rationale of processing an e-value sequence per calendar time detailed in the two tutorials on left-truncation and staggered entry. Our analysis uses the left-truncation approach for our data on a calendar time scale:
procEseq <- function(data, eventName, statusVar, timeVar, randDateVar, eventDateVar, interventionVar,
designObjL, designObjG, timeScale = "calendar",
startDate, endDate) {
eValuesL <-
eValuesG <- structure(rep(1,
times = endDate - startDate + 1),
names = as.character(seq.Date(from = startDate,
to = endDate,
by = "day")))
# before you observe any event, your e-value is 1
# return all 1 sequence if there are no events
if (sum(data[[statusVar]] == 2) == 0) {
eValues <- structure(list(eValuesL, eValuesG),
names = c(paste0(eventName, "L"), paste0(eventName, "G")))
return(eValues)
}
interimCalDates <- as.character(sort(unique(data[[eventDateVar]])))
# loop trough the calendar dates
for (calDate in as.character(seq.Date(from = as.Date(interimCalDates[1]),
to = endDate,
by = "day"))) {
# at dates without an event
if (!(calDate %in% interimCalDates)) {
# evidence stays the same as the day before
eValuesL[calDate] <- eValuesL[as.character(as.Date(calDate) - 1)]
eValuesG[calDate] <- eValuesG[as.character(as.Date(calDate) - 1)]
} else {
# all participants that enter after today, are not yet in my data set so far
dataSoFar <- data[data[[randDateVar]] < as.Date(calDate), ]
if (timeScale == "calendar") {
# I don't know how long participants are at risk whose events happen in the future
# only that they are still at risk today:
dataSoFar[[timeVar]] <- pmin(dataSoFar[[timeVar]], as.Date(calDate) - startDate)
# the status of participants with events in the future is censored:
dataSoFar[[statusVar]][dataSoFar[[eventDateVar]] > as.Date(calDate)] <- 1
dataSoFar$survObj <- Surv(time = dataSoFar[[randDateVar]] - startDate,
time2 = dataSoFar[[timeVar]],
event = dataSoFar[[statusVar]],
type = "counting")
}
if (timeScale == "participant") {
# I don't know how long participants are at risk whose events happen in the future
# only that they are still at risk today:
dataSoFar[[timeVar]] <- pmin(dataSoFar[[timeVar]], as.Date(calDate) - dataSoFar[[randDateVar]])
# the status of participants with events in the future is censored:
dataSoFar[[statusVar]][dataSoFar[[eventDateVar]] > as.Date(calDate)] <- 1
dataSoFar$survObj <- Surv(dataSoFar[[timeVar]], dataSoFar[[statusVar]])
}
eValuesL[calDate] <- safeLogrankTest(dataSoFar$survObj ~ dataSoFar[[interventionVar]],
designObj = designObjL
)$eValue
eValuesG[calDate] <- safeLogrankTest(dataSoFar$survObj ~ dataSoFar[[interventionVar]],
designObj = designObjG
)$eValue
}
}
eValues <- structure(list(eValuesL, eValuesG),
names = c(paste0(eventName, "L"), paste0(eventName, "G")))
return(eValues)
}
This function returns a list of two e-value sequences. In the example below: one for each one-sided test based on designObjCOV19hospL
and designObjCOV19hospG
.
eValues <- procEseq(data = data, eventName = "COV19hosp",
statusVar = "statusCOV19hosp", timeVar = "timeCOV19hosp",
randDateVar = "dateRand", eventDateVar = "dateCOV19hosp",
interventionVar = "intervention",
designObjL = designObjCOV19hospL, designObjG = designObjCOV19hospG,
timeScale = "calendar",
startDate = startDate,
endDate = max(data$dateLastFup, na.rm = T))
str(eValues)
## List of 2
## $ COV19hospL: Named num [1:91] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:91] "2020-03-25" "2020-03-26" "2020-03-27" "2020-03-28" ...
## $ COV19hospG: Named num [1:91] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "names")= chr [1:91] "2020-03-25" "2020-03-26" "2020-03-27" "2020-03-28" ...
If we have more than one hospital in our data set, we should stratify our Safe logrank e-values by hospital and multiply the sequences:
hospitals <- unique(data$hospital)
names(hospitals) <- as.character(hospitals)
eValuesCOV19.str <- lapply(hospitals, function(hospital)
procEseq(data = data[data$hospital == hospital, ],
eventName = "COV19",
statusVar = "statusCOV19", timeVar = "timeCOV19",
randDateVar = "dateRand", eventDateVar = "dateCOV19",
interventionVar = "intervention",
designObjL = designObjCOV19L, designObjG = designObjCOV19G,
timeScale = "calendar",
startDate = as.Date("2020-03-25"), # startDate defined above
endDate = max(data$dateLastFup, na.rm = T)))
eValuesCOV19hosp.str <- lapply(hospitals, function(hospital)
procEseq(data = data[data$hospital == hospital, ],
eventName = "COV19hosp",
statusVar = "statusCOV19hosp", timeVar = "timeCOV19hosp",
randDateVar = "dateRand", eventDateVar = "dateCOV19hosp",
interventionVar = "intervention",
designObjL = designObjCOV19hospL, designObjG = designObjCOV19hospG,
timeScale = "calendar",
startDate = as.Date("2020-03-25"), # startDate defined above
endDate = max(data$dateLastFup, na.rm = T)))
The above use of the function lapply()
filters the data set based on hospital and then applies the function procEseq()
that processes the data of participants belonging to each hospital into a sequence of e-values. The result is a list of lists: for each hospital in the list, a list of two e-value sequences for the two one-sided tests. Here the hospitals have names A, B.
str(eValuesCOV19.str)
## List of 2
## $ A:List of 2
## ..$ COV19L: Named num [1:91] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "names")= chr [1:91] "2020-03-25" "2020-03-26" "2020-03-27" "2020-03-28" ...
## ..$ COV19G: Named num [1:91] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "names")= chr [1:91] "2020-03-25" "2020-03-26" "2020-03-27" "2020-03-28" ...
## $ B:List of 2
## ..$ COV19L: Named num [1:91] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "names")= chr [1:91] "2020-03-25" "2020-03-26" "2020-03-27" "2020-03-28" ...
## ..$ COV19G: Named num [1:91] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "names")= chr [1:91] "2020-03-25" "2020-03-26" "2020-03-27" "2020-03-28" ...
We collect all one-sided tests for alternative less (L
) and all one-sided tests for alternative greater (G
) seperately in a matrix using sapply
, shown for the event COV19
and the greater (G
) test in the example below.
eValuesCOV19L.matrix <- sapply(hospitals, function(hospital) eValuesCOV19.str[[hospital]]$COV19L)
kable(head(eValuesCOV19L.matrix))
A | B | |
---|---|---|
2020-03-25 | 1 | 1 |
2020-03-26 | 1 | 1 |
2020-03-27 | 1 | 1 |
2020-03-28 | 1 | 1 |
2020-03-29 | 1 | 1 |
2020-03-30 | 1 | 1 |
kable(head(eValuesCOV19L.matrix[row.names(eValuesCOV19L.matrix) > as.Date("2020-05-18"), ]))
A | B | |
---|---|---|
2020-05-19 | 1.176471 | 1.071429 |
2020-05-20 | 1.176471 | 1.071429 |
2020-05-21 | 1.107266 | 1.071429 |
2020-05-22 | 1.107266 | 1.071429 |
2020-05-23 | 1.107266 | 1.071429 |
2020-05-24 | 1.022092 | 1.071429 |
We do so for each combination of the two one-sided tests and the two events COV19
and COV19hosp
, and then multiply each row of e-values for the same date using apply()
and prod()
, to the rows (MARGIN = 1
) to obtain our overall e-value sequence:
eValues <- list("COV19L" = apply(sapply(hospitals,
function(hospital) eValuesCOV19.str[[hospital]]$COV19L),
MARGIN = 1, FUN = prod, na.rm = T),
"COV19G" = apply(sapply(hospitals,
function(hospital) eValuesCOV19.str[[hospital]]$COV19G),
MARGIN = 1, FUN = prod, na.rm = T),
"COV19hospL" = apply(sapply(hospitals,
function(hospital) eValuesCOV19hosp.str[[hospital]]$COV19hospL),
MARGIN = 1, FUN = prod, na.rm = T),
"COV19hospG" = apply(sapply(hospitals,
function(hospital) eValuesCOV19hosp.str[[hospital]]$COV19hospG),
MARGIN = 1, FUN = prod, na.rm = T))
plot(NULL, xlim = as.Date(c(as.Date("2020-03-23"), as.Date(Sys.Date()))),
ylim = c(-5, 9), xaxs = "i",
ylab = "e-value", xlab = "Calendar days", yaxt = "n", xaxt = "n",
main = "Test: COVID19 hr < 1, benefit")
axis <- mapply(axis, side = 2, at = c(-5:8, log(1/designObjCOV19L$alpha)/log(2), 2^9), cex.axis = 1,
labels = c(paste0("1/", 2^(5:1)), 2^(0:7), "", 1/designObjCOV19L$alpha, 2^9), las = 1,
tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
axis.Date(side = 1, at = seq.Date(from = as.Date("2020-03-23"), to = as.Date(Sys.Date()), by = "week"),
format = "%g/%m/%d", tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
abline(h = log(1/designObjCOV19L$alpha)/log(2), lty = 2)
lines(x = as.Date(names(eValues$COV19L)), y = log(eValues$COV19L)/log(2),
'l', col = "darkgreen", lwd = 2)
plot(NULL, xlim = as.Date(c(as.Date("2020-03-23"), as.Date(Sys.Date()))),
ylim = c(-5, 9), xaxs = "i",
ylab = "e-value", xlab = "Calendar days", yaxt = "n", xaxt = "n",
main = "Test: COVID19 hr > 1, harm")
axis <- mapply(axis, side = 2, at = c(-5:8, log(1/designObjCOV19G$alpha)/log(2), 2^9), cex.axis = 1,
labels = c(paste0("1/", 2^(5:1)), 2^(0:7), "", 1/designObjCOV19G$alpha, 2^9), las = 1,
tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
axis.Date(side = 1, at = seq.Date(from = as.Date("2020-03-23"), to = as.Date(Sys.Date()), by = "week"),
format = "%g/%m/%d", tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
abline(h = log(1/designObjCOV19G$alpha)/log(2), lty = 2)
lines(x = as.Date(names(eValues$COV19G)), y = log(eValues$COV19G)/log(2),
'l', col = "darkgreen", lwd = 2)
plot(NULL, xlim = as.Date(c(as.Date("2020-03-23"), as.Date(Sys.Date()))),
ylim = c(-5, 6), xaxs = "i",
ylab = "e-value", xlab = "Calendar days", yaxt = "n", xaxt = "n",
main = "Test: COVID19 hospitalization hr < 1, benefit")
axis <- mapply(axis, side = 2, at = c(-5:5, log(1/designObjCOV19hospL$alpha)/log(2), 6),
cex.axis = 1, las = 1,
labels = c(paste0("1/", 2^(5:1)), 2^(0:4), "", round(1/designObjCOV19hospL$alpha), ""),
tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
axis.Date(side = 1, at = seq.Date(from = as.Date("2020-03-23"), to = as.Date(Sys.Date()), by = "week"),
format = "%g/%m/%d", tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
abline(h = log(1/designObjCOV19hospL$alpha)/log(2), lty = 2)
lines(x = as.Date(names(eValues$COV19hospL)), y = log(eValues$COV19hospL)/log(2),
'l', col = "darkgreen", lwd = 2)
plot(NULL, xlim = as.Date(c(as.Date("2020-03-23"), as.Date(Sys.Date()))),
ylim = c(-5, 6), xaxs = "i",
ylab = "e-value", xlab = "Calendar days", yaxt = "n", xaxt = "n",
main = "Test: COVID19 hospitalization hr > 1, harm")
axis <- mapply(axis, side = 2, at = c(-5:5, log(1/designObjCOV19hospG$alpha)/log(2), 6),
cex.axis = 1, las = 1,
labels = c(paste0("1/", 2^(5:1)), 2^(0:4), "", round(1/designObjCOV19hospG$alpha), ""),
tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
axis.Date(side = 1, at = seq.Date(from = as.Date("2020-03-23"), to = as.Date(Sys.Date()), by = "week"),
format = "%g/%m/%d", tick = T, tck = 1, col.ticks = "lightgrey", lty = "dotted")
abline(h = log(1/designObjCOV19hospG$alpha)/log(2), lty = 2)
lines(x = as.Date(names(eValues$COV19hospG)), y = log(eValues$COV19hospG)/log(2),
'l', col = "darkgreen", lwd = 2)