Participants in a study rarely all enter on the same day, so their dates of randomization differ. This leads to one of two phenomena in the statistical analysis that we, following the literature, will call ‘left-truncation’ and ‘staggered entry’. Which of the two our analysis has to deal with depends on the time scale that is most relevant to the occurrence of the events.
On the one hand, time-to-event can be measured in calendar time, e.g. time to an infection that occurs in (epidemic) waves. All participants in a risk set share a hazard if they are in follow-up and event-free on the same calendar date, so late entry shows up as left-truncated event times. Left-truncation means that participants only enter the risk set once they enter the study, although they have already ‘survived’ some calendar time during which other participants may have experienced an event. They should not be part of the risk set for events that happened before they entered, since we know that an event before entry is impossible for them, e.g. because being alive, or more generally event-free, is an inclusion criterion for study enrollment.
On the other hand, time-to-event can be measured in participant time, specific to each participant, e.g. time since surgery. All participants in a risk set of an event share a hazard if they have been in follow-up and event-free for the same time since their own date of enrollment/randomization/intervention, so late entry shows up as ‘staggered entry’. Staggered entry means that participants who enter late can still enter the risk set of an event that happened earlier in calendar time, once they have reached the same participant time (counted from their own date of intervention) at which that event occurred.
Hence in a ‘left-truncation’ analysis, participants who enter late can only enter the risk set of events that happen after they enter the study (in calendar time), while in a ‘staggered entry’ analysis, participants who enter late can enter the risk set of events that have already happened. We provide a tutorial on each scenario to illustrate the difference and the appropriate statistical analysis for the sequential Safe logrank test. This is the tutorial on left-truncation. You can find the tutorial on staggered entry here.
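The difference between the two risk-set definitions can be sketched in a few lines of base R. The toy data and the helper names `riskSetCalendar` and `riskSetParticipant` are hypothetical and only serve as an illustration; they are not part of the safestats package.

```r
# Hypothetical toy data: calendar day of study entry and of event/censoring
toy <- data.frame(id    = 1:4,
                  entry = c(0, 0, 5, 8),
                  exit  = c(4, 20, 12, 25))

# Calendar time (left-truncation): at risk on calendar day t
# only if already entered (entry < t) and still in follow-up (t <= exit)
riskSetCalendar <- function(t) toy$id[toy$entry < t & t <= toy$exit]

# Participant time (staggered entry): at risk at participant day s
# if followed for at least s days since one's own entry
riskSetParticipant <- function(s) toy$id[(toy$exit - toy$entry) >= s]

riskSetCalendar(4)     # only the two participants who entered on day 0
riskSetParticipant(4)  # all four participants, including the late entrants
```

Participant 1 has an event on calendar day 4, which is also participant day 4. On the calendar scale the late entrants (3 and 4) are not in that risk set; on the participant scale they are.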
For a sequential analysis under left-truncation we need to supplement the event times with the time of study entry. The key point for an analysis per calendar time is that participants who are not yet included in the study cannot provide any information, not even censored information. This tutorial illustrates how to process a time-to-event data set into a sequence of logrank e-values per calendar date that takes time of entry into account.
Please also find the complete R markdown file here.
library(devtools)
devtools::install_github("AlexanderLyNL/Safestats", ref = "logrank")
library(safestats)
library(survival)
library(knitr)
Consider the following small data set:
enrollment <- 10 # 5 treatment, 5 placebo, so:
ratio <- 1 # ratio = nT/nP = 1
fup <- 40 # follow-up of 40 days
nEventsC <- 8 # 8 events in 40 days among 10
# this defines the baseline hazard
hr1 <- 0.5 # hazard ratio between treatment and placebo group
# (hr1: alternative hypothesis is true)
# 6 events anticipated within 40 days
# if treatment reduces risk with hr1:
anticipNevents <- nEventsC*(1/(1 + ratio)) +
nEventsC*(ratio/(1 + ratio))*hr1
# assuming constant hazard, we simulate the following baseline hazard and data:
lambdaC <- 1 - (1 - (nEventsC/enrollment))^(1/fup)
data <- generateSurvData(nP = enrollment*(1/(1 + ratio)),
nT = enrollment*(ratio/(1 + ratio)),
lambdaP = lambdaC,
lambdaT = hr1*lambdaC,
endTime = fup,
seed = 2006) # set seed to make the result reproducible
time | status | group |
---|---|---|
4 | 2 | P |
4 | 2 | P |
13 | 2 | T |
18 | 2 | P |
19 | 2 | T |
24 | 2 | T |
40 | 2 | P |
40 | 1 | P |
40 | 1 | T |
40 | 1 | T |
We see events occurring in either the treatment (`T`) or the placebo control group (`P`) at days after randomization (participant time) `d = 4, 13, 18, 19, 24, 40,...` (events have `status = 2`, censoring has `status = 1`). Our event times do not yet have a calendar date, nor do they have a date of study entry. The Kaplan-Meier plot above gives the survival by event time assuming that everyone entered the study at the same time, which we shall change below.
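The simulation settings above can be checked by hand. The sketch below assumes that `lambdaC` is read as a constant daily event probability; how `generateSurvData` uses this rate internally is not shown in this tutorial, so treat that reading as an assumption. The name `pEventControl` is introduced here for illustration only.

```r
enrollment <- 10
ratio <- 1
fup <- 40
nEventsC <- 8
hr1 <- 0.5

# Baseline hazard as defined in the tutorial
lambdaC <- 1 - (1 - (nEventsC / enrollment))^(1 / fup)

# Under a constant daily event probability lambdaC, the probability of
# an event within fup days recovers the targeted fraction nEventsC/enrollment
pEventControl <- 1 - (1 - lambdaC)^fup

# Anticipated number of events when treatment reduces the risk with hr1
anticipNevents <- nEventsC * (1 / (1 + ratio)) +
  nEventsC * (ratio / (1 + ratio)) * hr1

pEventControl
anticipNevents
```

Here `pEventControl` equals 0.8 (8 events among 10 in 40 days) and `anticipNevents` equals 6, matching the comments in the simulation code.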
Now suppose that it took us two weeks to enroll and randomize everyone. We started on `dateRand.start = 2020-05-04` and finished on `dateRand.end = 2020-05-15`. So we assign our participants a random start date and order them by date of enrollment/randomization:
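# Start and end date of randomization, as stated in the text above
dateRand.start <- as.Date("2020-05-04")
dateRand.end <- as.Date("2020-05-15")
set.seed(2005)
data$"dateRand" <- sample(seq.Date(from = dateRand.start, to = dateRand.end, by = "day"),
                          size = enrollment, replace = TRUE)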
data$"dateEvent/LastFup" <- as.Date(data$"dateRand" + data$"time")
data$"dateLastFup" <- as.Date("2020-06-15")
# We must align all risks in calendar time, so we set all t = 0 to the start date of randomization:
data$"time" <- data$"dateEvent/LastFup" - dateRand.start
# Order the participants by date of enrollment/randomization:
data$"participantID" <- 1:nrow(data)
data$"participantID"[order(data$"dateRand")] <- 1:nrow(data)
data <- data[order(data$"dateRand"), ]
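As a small illustration of this change of time scale, consider the participant who is randomized on 2020-05-07 and has an event after 4 days of participant time (participant 4 in the tables in this tutorial). The variable names `participantTime`, `dateEvent` and `calendarTime` are chosen here for illustration.

```r
dateRand.start <- as.Date("2020-05-04")  # t = 0 on the calendar time scale
dateRand <- as.Date("2020-05-07")        # this participant's own entry date
participantTime <- 4                     # days since own randomization

# Calendar date of the event, and the event time on the calendar scale
dateEvent <- dateRand + participantTime
calendarTime <- as.numeric(dateEvent - dateRand.start)

dateEvent
calendarTime
```

The event lands on 2020-05-11, that is, calendar time 7, even though the participant was only followed for 4 days.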
This change of time scale changes the Kaplan-Meier plot, since we now see events occurring in either the treatment (`T`) or the placebo control group (`P`) at days after the start of the study (calendar time) `d = 4, 7, 17, 21, 30, 40, 44, 46, 50,...`.
The interpretation of this Kaplan-Meier plot has also changed: the survival at d days is now not the probability of surviving d days, but the probability of surviving d days given that you entered the study (it is conditional on surviving to the smallest of entry times).
However, in tests such as the sequential Safe logrank test we describe in this tutorial, we are not concerned with estimating the survival itself, or its interpretation, but with whether there is a difference in hazards between two groups. For that, only the ranks of the events matter, as we demonstrate below. It can be shown that if the date of study entry is independent of the event, then the hazard conditional on study entry and the unconditional hazard are equivalent. In performing these tests, our main concern is to specify correctly who is part of the risk set of the events observed, based on study entry.
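The equivalence of the conditional and unconditional hazard can be sketched in one line. The notation is an assumption introduced here: \(f\) and \(S\) denote the density and survival function of the event time \(T\), \(h = f/S\) the hazard, and \(a < t\) an entry time that is independent of \(T\). Conditioning on being event-free at entry divides both the density and the survival function by \(S(a)\), which cancels:
\[\begin{equation} h(t \mid T > a) = \frac{f(t)/S(a)}{S(t)/S(a)} = \frac{f(t)}{S(t)} = h(t) \end{equation}\]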
calDate <- sort(data$"dateEvent/LastFup")[1]
d <- data$"time"[data$"dateEvent/LastFup" == calDate]
We do our first interim analysis at `calDate = 2020-05-08`. We assume that everyone who is randomized on exactly 2020-05-08 is only at risk starting from the next day, so on this exact calendar day we have 5 participants at risk of the event, 3 in the placebo control group (grey) and 2 in the treatment group (green).
Because the time scale is calendar time, we also have 5 participants at risk for this first event (\(N_1 = 5\)), 3 in the placebo control group and 2 in the treatment group (\(N_{T, 1} = 2\)).
Our first logrank statistic on 2020-05-08 contains 1 event (\(O_1 = 1\)), which is an event in the placebo control group, so the number of events in the treatment group is 0 (\(O_{T, 1} = 0\)). The expected number of events in the treatment group is \(E_{T, 1} = N_{T, 1} \cdot \frac{O_1}{N_1} = 2 \cdot \frac{1}{5} = \frac{2}{5}\).
So our logrank \(Z\)-score at this first event is:
\[\begin{equation} Z^{(1)} = \frac{O_{T, 1} - E_{T, 1}}{\sqrt{V_{T, 1}}} = \frac{0 - \frac{2}{5}}{\sqrt{\frac{2}{5} \cdot (1 - \frac{2}{5})}} = -0.8164966 \end{equation}\]
with \(V_{T, 1}\) the variance of the Bernoulli distribution with probability \(E_{T, 1}\):
\[\begin{equation} V_{T, 1} = E_{T, 1} \cdot (1 - E_{T, 1}) = \frac{2}{5} \cdot (1 - \frac{2}{5}) \end{equation}\]
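The same arithmetic in base R, as a quick check of the formula above. The variable names mirror the symbols in the text and are otherwise arbitrary.

```r
N1  <- 5  # participants at risk at the first event
NT1 <- 2  # of whom in the treatment group
O1  <- 1  # events at this calendar time
OT1 <- 0  # events in the treatment group

ET1 <- NT1 * O1 / N1        # expected events in the treatment group: 2/5
VT1 <- ET1 * (1 - ET1)      # Bernoulli variance
z1  <- (OT1 - ET1) / sqrt(VT1)
z1
```

This reproduces the value -0.8164966 given above.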
Our Safe logrank test should consider this exact logrank statistic for the first event at calendar time `d = 4` and calendar date `calDate = 2020-05-08`, so we need to tell the function that we have not yet observed participants randomized, or events occurring, at a calendar date later than 2020-05-08. We define `dataSoFar` as follows:
dataSoFar <- data[data$"dateRand" < calDate, ]
dataSoFar$"dateLastFup" <- calDate
# We do not yet know the event times in the future,
# only that these participants are in follow-up until this date
dataSoFar$"time" <- pmin(dataSoFar$"time", calDate - dateRand.start)
# The status of future event times is censored for now:
dataSoFar$"status"[dataSoFar$"dateEvent/LastFup" > calDate] <- 1
# And we can assign the following survival object
dataSoFar$"survObj" <- Surv(dataSoFar$"time", dataSoFar$"status")
kable(dataSoFar, row.names = F)
time | status | group | dateRand | dateEvent/LastFup | dateLastFup | participantID | survObj |
---|---|---|---|---|---|---|---|
4 days | 2 | P | 2020-05-04 | 2020-05-08 | 2020-05-08 | 1 | 4 |
4 days | 1 | T | 2020-05-04 | 2020-06-13 | 2020-05-08 | 2 | 4+ |
4 days | 1 | T | 2020-05-04 | 2020-05-25 | 2020-05-08 | 3 | 4+ |
4 days | 1 | P | 2020-05-07 | 2020-05-11 | 2020-05-08 | 4 | 4+ |
4 days | 1 | P | 2020-05-07 | 2020-05-25 | 2020-05-08 | 5 | 4+ |
The assignments to `time` and `status` in the above code redefine the event times and statuses as we know them on `calDate = 2020-05-08`. So we can calculate our logrank test based on this `calDate = 2020-05-08` version of our temporary data set `dataSoFar`:
safeLogrankTest(exact = FALSE, # to get the approximate safe logrank test
designObj = designObjL, # (based on the logrank Z-statistic)
dataSoFar$"survObj" ~ dataSoFar$"group")
##
## Safe Logrank Test
##
## data: dataSoFar$survObj by dataSoFar$group (P, T). nEvents = 1
## estimates: hazard ratio = 0.19534
##
## test: z = -0.8165, log(thetaS) = -0.35667
## e-value = 1.1385 > 1/alpha = 20 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.05
## for minimal relevant hazard ratio = 0.7 (less)
(The object `designObjL` is defined below.)
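The printed e-value can be related to the printed z-score. The sketch below is a reconstruction under assumptions, not code taken from the safestats source: it assumes a Gaussian likelihood ratio with mean `log(thetaS)`, applied to the z-score with an effective sample size of `nEvents/4` (for `ratio = 1`). The helper name `gaussianEValue` is hypothetical. Under these assumptions the formula reproduces the e-value in the output above.

```r
# Hypothetical helper: Gaussian likelihood-ratio e-value for a logrank z-score
# (a reconstruction consistent with the printed output, not the package code)
gaussianEValue <- function(z, nEvents, logThetaS, ratio = 1) {
  nEff <- nEvents * ratio / (1 + ratio)^2  # equals nEvents/4 when ratio = 1
  exp(sqrt(nEff) * z * logThetaS - nEff * logThetaS^2 / 2)
}

# First interim analysis: z = -0.8165 with 1 event, hrMin = 0.7
eFirst <- gaussianEValue(z = -0.8164966, nEvents = 1, logThetaS = log(0.7))
round(eFirst, 4)
```

Because both `z` and `log(thetaS)` are negative here, their product is positive and the e-value lies above 1: the data so far are (weakly) more compatible with a hazard ratio of 0.7 than with a hazard ratio of 1.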
We have to take a different approach for the second event:
calDate <- sort(data$"dateEvent/LastFup")[2]
d <- data$"time"[data$"dateEvent/LastFup" == calDate]
At our second date of interim analysis, `calDate = 2020-05-11`, we observe 1 event in the placebo control group with event calendar time `d = 7`. New participants have entered our study, so we need to be aware that they are in the risk set of our current event, but not in the risk set of our previous event. The code below shows that if we do not explicitly state the date of randomization, the newly entered participants will be considered part of the risk set of the first event.
dataSoFar <- data[data$"dateRand" < calDate, ]
dataSoFar$"dateLastFup" <- calDate
dataSoFar$"time" <- pmin(dataSoFar$"time", calDate - dateRand.start)
dataSoFar$"status"[dataSoFar$"dateEvent/LastFup" > calDate] <- 1
dataSoFar$"survObj" <- Surv(dataSoFar$"time", dataSoFar$"status")
kable(dataSoFar, row.names = F)
time | status | group | dateRand | dateEvent/LastFup | dateLastFup | participantID | survObj |
---|---|---|---|---|---|---|---|
4 days | 2 | P | 2020-05-04 | 2020-05-08 | 2020-05-11 | 1 | 4 |
7 days | 1 | T | 2020-05-04 | 2020-06-13 | 2020-05-11 | 2 | 7+ |
7 days | 1 | T | 2020-05-04 | 2020-05-25 | 2020-05-11 | 3 | 7+ |
7 days | 2 | P | 2020-05-07 | 2020-05-11 | 2020-05-11 | 4 | 7 |
7 days | 1 | P | 2020-05-07 | 2020-05-25 | 2020-05-11 | 5 | 7+ |
7 days | 1 | T | 2020-05-08 | 2020-05-21 | 2020-05-11 | 6 | 7+ |
7 days | 1 | T | 2020-05-08 | 2020-06-17 | 2020-05-11 | 7 | 7+ |
7 days | 1 | T | 2020-05-10 | 2020-06-03 | 2020-05-11 | 8 | 7+ |
7 days | 1 | P | 2020-05-10 | 2020-06-19 | 2020-05-11 | 9 | 7+ |
safeLogrankTest(exact = FALSE, # to get the approximate safe logrank test
designObj = designObjL, # (based on the logrank Z-statistic)
dataSoFar$"survObj" ~ dataSoFar$"group")
##
## Safe Logrank Test
##
## data: dataSoFar$survObj by dataSoFar$group (P, T). nEvents = 2
## estimates: hazard ratio = 0.090124
##
## test: z = -1.7017, log(thetaS) = -0.35667
## e-value = 1.4879 > 1/alpha = 20 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.05
## for minimal relevant hazard ratio = 0.7 (less)
For the second event, our logrank statistic on 2020-05-11 contains 1 additional event in the placebo control group, which means that \(O_{T, 2} = 0\) in the treatment group. We have 8 participants at risk (\(N_2 = 8\)) (all enrolled participants except the first that experienced the event; the tenth participant is not yet in the study), 5 in the treatment group (\(N_{T, 2} = 5\)) and 3 in the placebo control group.
So the expected number of events in the treatment group at this second event time is \(E_{T, 2} = N_{T, 2} \cdot \frac{O_2}{N_2} = 5 \cdot \frac{1}{8} = \frac{5}{8}\).
Our logrank \(Z\)-score of the first and the second event combined is:
\[\begin{equation} Z^{(2)} = \frac{O_{T, 1} - E_{T, 1} + O_{T, 2} - E_{T, 2}}{\sqrt{V_{T, 1} + V_{T, 2}}} = \frac{0 - \frac{2}{5} + 0 - \frac{5}{8}}{\sqrt{\frac{2}{5} \cdot (1 - \frac{2}{5}) + \frac{5}{8} \cdot (1 - \frac{5}{8})}} = -1.4882057 \end{equation}\]
with \(V_{T, 2}\) the variance of the Bernoulli distribution with probability \(E_{T, 2}\):
\[\begin{equation} V_{T, 2} = E_{T, 2} \cdot (1 - E_{T, 2}) = \frac{5}{8} \cdot (1 - \frac{5}{8}) \end{equation}\]
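The hand calculation gives \(Z^{(2)} = -1.4882\), whereas the output above reports `z = -1.7017`. The sketch below shows where the gap comes from: without entry times, all 9 participants (5 of them in the treatment group) are counted in the risk set of the first event, instead of the 5 (2 in the treatment group) who had actually entered. The helper name `zFromRiskSets` is hypothetical, and the risk-set sizes are read from the tables in this tutorial.

```r
# Hypothetical helper: logrank Z-score for untied events (one event per time)
zFromRiskSets <- function(N, NT, OT) {
  ET <- NT / N            # expected treatment events at each event time
  VT <- ET * (1 - ET)     # Bernoulli variance at each event time
  sum(OT - ET) / sqrt(sum(VT))
}

# Ignoring study entry: everyone is at risk for the first event
zNaive <- zFromRiskSets(N = c(9, 8), NT = c(5, 5), OT = c(0, 0))

# Respecting study entry: only 5 participants at risk for the first event
zTruncated <- zFromRiskSets(N = c(5, 8), NT = c(2, 5), OT = c(0, 0))

round(c(naive = zNaive, truncated = zTruncated), 4)
```

Only the first risk set differs between the two calls; the second event has the same risk set in both analyses.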
So we should explicitly state that the data is left-truncated:
dataSoFar <- data[data$"dateRand" < calDate, ]
dataSoFar$"dateLastFup" <- calDate
dataSoFar$"time" <- pmin(dataSoFar$"time", calDate - dateRand.start)
dataSoFar$"status"[dataSoFar$"dateEvent/LastFup" > calDate] <- 1
dataSoFar$"survObj" <- Surv(time = dataSoFar$"dateRand" - dateRand.start,
time2 = dataSoFar$"time",
event = dataSoFar$"status",
type = "counting")
kable(dataSoFar, row.names = F)
time | status | group | dateRand | dateEvent/LastFup | dateLastFup | participantID | survObj |
---|---|---|---|---|---|---|---|
4 days | 2 | P | 2020-05-04 | 2020-05-08 | 2020-05-11 | 1 | (0,4] |
7 days | 1 | T | 2020-05-04 | 2020-06-13 | 2020-05-11 | 2 | (0,7+] |
7 days | 1 | T | 2020-05-04 | 2020-05-25 | 2020-05-11 | 3 | (0,7+] |
7 days | 2 | P | 2020-05-07 | 2020-05-11 | 2020-05-11 | 4 | (3,7] |
7 days | 1 | P | 2020-05-07 | 2020-05-25 | 2020-05-11 | 5 | (3,7+] |
7 days | 1 | T | 2020-05-08 | 2020-05-21 | 2020-05-11 | 6 | (4,7+] |
7 days | 1 | T | 2020-05-08 | 2020-06-17 | 2020-05-11 | 7 | (4,7+] |
7 days | 1 | T | 2020-05-10 | 2020-06-03 | 2020-05-11 | 8 | (6,7+] |
7 days | 1 | P | 2020-05-10 | 2020-06-19 | 2020-05-11 | 9 | (6,7+] |
safeLogrankTest(exact = FALSE, # to get the approximate safe logrank test
designObj = designObjL, # (based on the logrank Z-statistic)
dataSoFar$"survObj" ~ dataSoFar$"group")
##
## Safe Logrank Test
##
## data: dataSoFar$survObj by dataSoFar$group (P, T). nEvents = 2
## estimates: hazard ratio = 0.12189
##
## test: z = -1.4882, log(thetaS) = -0.35667
## e-value = 1.4099 > 1/alpha = 20 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.05
## for minimal relevant hazard ratio = 0.7 (less)
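The effect of the counting-process `Surv` object on the risk sets can be inspected independently of the Safe logrank test. The sketch below copies the entry and exit times from the table above and uses `survfit` from the survival package to list the number at risk at each event time. The variable names `tStart`, `tStop` and `riskSets` are chosen here for illustration.

```r
library(survival)

# (entry, exit] intervals in calendar days, copied from the table above
tStart <- c(0, 0, 0, 3, 3, 4, 4, 6, 6)
tStop  <- c(4, 7, 7, 7, 7, 7, 7, 7, 7)
status <- c(2, 1, 1, 2, 1, 1, 1, 1, 1)  # 2 = event, 1 = censored

# Counting-process survival object: at risk at time t if tStart < t <= tStop
fit <- survfit(Surv(tStart, tStop, status == 2) ~ 1)
riskSets <- summary(fit)

data.frame(time    = riskSets$time,
           n.risk  = riskSets$n.risk,
           n.event = riskSets$n.event)
```

The number at risk should be 5 at calendar time 4 and 8 at calendar time 7, in line with \(N_1 = 5\) and \(N_2 = 8\) in the hand calculation.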
We take the same approach for the third event:
calDate <- sort(data$"dateEvent/LastFup")[3]
d <- data$"time"[data$"dateEvent/LastFup" == calDate]
dataSoFar <- data[data$"dateRand" < calDate, ]
dataSoFar$"dateLastFup" <- calDate
dataSoFar$"time" <- pmin(dataSoFar$"time", calDate - dateRand.start)
dataSoFar$"status"[dataSoFar$"dateEvent/LastFup" > calDate] <- 1
dataSoFar$"survObj" <- Surv(time = dataSoFar$"dateRand" - dateRand.start,
time2 = dataSoFar$"time",
event = dataSoFar$"status",
type = "counting")
kable(dataSoFar, row.names = F)
time | status | group | dateRand | dateEvent/LastFup | dateLastFup | participantID | survObj |
---|---|---|---|---|---|---|---|
4 days | 2 | P | 2020-05-04 | 2020-05-08 | 2020-05-21 | 1 | ( 0, 4] |
17 days | 1 | T | 2020-05-04 | 2020-06-13 | 2020-05-21 | 2 | ( 0,17+] |
17 days | 1 | T | 2020-05-04 | 2020-05-25 | 2020-05-21 | 3 | ( 0,17+] |
7 days | 2 | P | 2020-05-07 | 2020-05-11 | 2020-05-21 | 4 | ( 3, 7] |
17 days | 1 | P | 2020-05-07 | 2020-05-25 | 2020-05-21 | 5 | ( 3,17+] |
17 days | 2 | T | 2020-05-08 | 2020-05-21 | 2020-05-21 | 6 | ( 4,17] |
17 days | 1 | T | 2020-05-08 | 2020-06-17 | 2020-05-21 | 7 | ( 4,17+] |
17 days | 1 | T | 2020-05-10 | 2020-06-03 | 2020-05-21 | 8 | ( 6,17+] |
17 days | 1 | P | 2020-05-10 | 2020-06-19 | 2020-05-21 | 9 | ( 6,17+] |
17 days | 1 | P | 2020-05-14 | 2020-06-23 | 2020-05-21 | 10 | (10,17+] |
For the third event, our logrank statistic on 2020-05-21 contains 1 additional event in the treatment group, which means that \(O_{T, 3} = 1\). We have 8 participants at risk (\(N_3 = 8\)) (all enrolled participants except the first two that experienced the event), 5 in the treatment group (\(N_{T, 3} = 5\)) and 3 in the placebo control group.
So the expected number of events in the treatment group at this third event time is \(E_{T, 3} = N_{T, 3} \cdot \frac{O_3}{N_3} = 5 \cdot \frac{1}{8} = \frac{5}{8}\).
Our logrank \(Z\)-score of the first, second and third event combined is:
\[\begin{equation} Z^{(3)} = \frac{O_{T, 1} - E_{T, 1} + O_{T, 2} - E_{T, 2} + O_{T, 3} - E_{T, 3}}{\sqrt{V_{T, 1} + V_{T, 2} + V_{T, 3}}} = \frac{0 - \frac{2}{5} + 0 - \frac{5}{8} + 1 - \frac{5}{8}}{\sqrt{\frac{2}{5} \cdot (1 - \frac{2}{5}) + \frac{5}{8} \cdot (1 - \frac{5}{8}) + \frac{5}{8} \cdot (1 - \frac{5}{8})}} = -0.772088 \end{equation}\]
with \(V_{T, 3}\) the variance of the Bernoulli distribution with probability \(E_{T, 3}\):
\[\begin{equation} V_{T, 3} = E_{T, 3} \cdot (1 - E_{T, 3}) = \frac{5}{8} \cdot (1 - \frac{5}{8}) \end{equation}\]
safeLogrankTest(exact = FALSE, # to get the approximate safe logrank test
designObj = designObjL, # (based on the logrank Z-statistic)
dataSoFar$"survObj" ~ dataSoFar$"group")
##
## Safe Logrank Test
##
## data: dataSoFar$survObj by dataSoFar$group (P, T). nEvents = 3
## estimates: hazard ratio = 0.41003
##
## test: z = -0.77209, log(thetaS) = -0.35667
## e-value = 1.2102 > 1/alpha = 20 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.05
## for minimal relevant hazard ratio = 0.7 (less)
We take a slightly different approach to the fourth and fifth events, because they are tied:
calDate <- sort(data$"dateEvent/LastFup")[4]
d <- data$"time"[data$"dateEvent/LastFup" == calDate]
dataSoFar <- data[data$"dateRand" < calDate, ]
dataSoFar$"dateLastFup" <- calDate
dataSoFar$"time" <- pmin(dataSoFar$"time", calDate - dateRand.start)
dataSoFar$"status"[dataSoFar$"dateEvent/LastFup" > calDate] <- 1
dataSoFar$"survObj" <- Surv(time = dataSoFar$"dateRand" - dateRand.start,
time2 = dataSoFar$"time",
event = dataSoFar$"status",
type = "counting")
kable(dataSoFar, row.names = F)
time | status | group | dateRand | dateEvent/LastFup | dateLastFup | participantID | survObj |
---|---|---|---|---|---|---|---|
4 days | 2 | P | 2020-05-04 | 2020-05-08 | 2020-05-25 | 1 | ( 0, 4] |
21 days | 1 | T | 2020-05-04 | 2020-06-13 | 2020-05-25 | 2 | ( 0,21+] |
21 days | 2 | T | 2020-05-04 | 2020-05-25 | 2020-05-25 | 3 | ( 0,21] |
7 days | 2 | P | 2020-05-07 | 2020-05-11 | 2020-05-25 | 4 | ( 3, 7] |
21 days | 2 | P | 2020-05-07 | 2020-05-25 | 2020-05-25 | 5 | ( 3,21] |
17 days | 2 | T | 2020-05-08 | 2020-05-21 | 2020-05-25 | 6 | ( 4,17] |
21 days | 1 | T | 2020-05-08 | 2020-06-17 | 2020-05-25 | 7 | ( 4,21+] |
21 days | 1 | T | 2020-05-10 | 2020-06-03 | 2020-05-25 | 8 | ( 6,21+] |
21 days | 1 | P | 2020-05-10 | 2020-06-19 | 2020-05-25 | 9 | ( 6,21+] |
21 days | 1 | P | 2020-05-14 | 2020-06-23 | 2020-05-25 | 10 | (10,21+] |
Now we have two events with the same calendar event times `d = 21, 21`.
Our fourth logrank statistic on 2020-05-25 contains these 2 additional events, with only one of them in the treatment group (\(O_{T, 4} = 1\)). We have 7 participants at risk (\(N_4 = 7\)) (all enrolled participants except the first three that experienced the event), 4 in the treatment group (\(N_{T, 4} = 4\)) and 3 in the placebo control group.
So the expected number of events in the treatment group at this fourth calendar event time is \(E_{T, 4} = N_{T, 4} \cdot \frac{O_4}{N_4} = 4 \cdot \frac{2}{7} = \frac{8}{7}\).
Our logrank \(Z\)-score of the first, second, third and fourth calendar event times combined is:
\[\begin{equation} Z^{(4)} = \frac{O_{T, 1} - E_{T, 1} + O_{T, 2} - E_{T, 2} + O_{T, 3} - E_{T, 3} + O_{T, 4} - E_{T, 4}}{\sqrt{V_{T, 1} + V_{T, 2} + V_{T, 3} + V_{T, 4}}} = \frac{0 - \frac{2}{5} + 0 - \frac{5}{8} + 1 - \frac{5}{8} + 1 - \frac{8}{7}}{\sqrt{\frac{2}{5} \cdot (1 - \frac{2}{5}) + \frac{5}{8} \cdot (1 - \frac{5}{8}) + \frac{5}{8} \cdot (1 - \frac{5}{8}) + \frac{8}{7} \cdot \frac{5}{7} \cdot \frac{3}{6}}} = -0.7502141 \end{equation}\]
with \(V_{T, 4}\) the variance of the hypergeometric distribution with mean \(E_{T, 4}\):
\[\begin{equation} V_{T, 4} = E_{T, 4} \cdot \left(\frac{N_4 - O_4}{N_4}\right) \cdot \left(\frac{N_4 - N_{T, 4}}{N_4 - 1}\right) = \frac{8}{7} \cdot \frac{5}{7} \cdot \frac{3}{6} \end{equation}\]
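The hypergeometric variance also covers the untied case: with a single event (\(O = 1\)) it reduces to the Bernoulli variance \(E_T \cdot (1 - E_T)\) used earlier. The sketch below therefore applies one formula to all four calendar event times and returns the running Z-scores. The helper name `cumulativeLogrankZ` is hypothetical; the counts are those stated in the text.

```r
# Hypothetical helper: running logrank Z-score over successive event times,
# using the hypergeometric variance so that tied events are handled
cumulativeLogrankZ <- function(N, NT, O, OT) {
  ET <- NT * O / N                             # expected treatment events
  VT <- ET * (N - O) / N * (N - NT) / (N - 1)  # hypergeometric variance
  cumsum(OT - ET) / sqrt(cumsum(VT))
}

z <- cumulativeLogrankZ(N  = c(5, 8, 8, 7),  # at risk per calendar event time
                        NT = c(2, 5, 5, 4),  # at risk in the treatment group
                        O  = c(1, 1, 1, 2),  # events (last one is a tie)
                        OT = c(0, 0, 1, 1))  # events in the treatment group
round(z, 4)
```

The four values agree with \(Z^{(1)}\) to \(Z^{(4)}\) as computed by hand in this tutorial.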
safeLogrankTest(exact = FALSE, # to get the approximate safe logrank test
designObj = designObjL, # (based on the logrank Z-statistic)
dataSoFar$"survObj" ~ dataSoFar$"group")
##
## Safe Logrank Test
##
## data: dataSoFar$survObj by dataSoFar$group (P, T). nEvents = 5
## estimates: hazard ratio = 0.51119
##
## test: z = -0.75021, log(thetaS) = -0.35667
## e-value = 1.2456 > 1/alpha = 20 : FALSE
## alternative hypothesis: true hazard ratio is less than 1
##
## design: the test was designed with alpha = 0.05
## for minimal relevant hazard ratio = 0.7 (less)
The above shows the rationale of retrospectively obtaining a sequence of e-values for various calendar dates in the past. To process an entire data set at once, the following code obtains two sequences of e-values for two one-sided tests:
eValuesL <-
  eValuesG <- structure(rep(NA, times = max(data$"dateLastFup") -
                              dateRand.start + 1),
                        names = as.character(seq.Date(from = dateRand.start,
                                                      to = max(data$"dateLastFup"),
                                                      by = "day")))
interimCalDates <- as.character(sort(unique(data$"dateEvent/LastFup"[data$"status" == 2])))
# before you observe any event, your e-value is 1
eValuesL[1:(which(names(eValuesL) == interimCalDates[1]) - 1)] <- 1
eValuesG[1:(which(names(eValuesG) == interimCalDates[1]) - 1)] <- 1
for (calDate in as.character(seq.Date(from = as.Date(interimCalDates[1]),
                                      to = max(data$"dateLastFup"),
                                      by = "day"))) {
  # at days on which you do not observe an event
  if (!(calDate %in% interimCalDates)) {
    # evidence stays the same as the day before
    eValuesL[calDate] <- eValuesL[as.character(as.Date(calDate) - 1)]
    eValuesG[calDate] <- eValuesG[as.character(as.Date(calDate) - 1)]
  } else {
    # only participants randomized before this calendar date are in the study
    dataSoFar <- data[data$"dateRand" < as.Date(calDate), ]
    dataSoFar$"time" <- pmin(dataSoFar$"time", as.Date(calDate) - dateRand.start)
    # use the full column name rather than relying on partial matching of `$`
    dataSoFar$"status"[dataSoFar$"dateEvent/LastFup" > as.Date(calDate)] <- 1
    # left-truncated (counting process) survival object: (entry, exit]
    dataSoFar$"survObj" <- Surv(time = dataSoFar$"dateRand" - dateRand.start,
                                time2 = dataSoFar$"time",
                                event = dataSoFar$"status",
                                type = "counting")
    eValuesL[calDate] <- safeLogrankTest(dataSoFar$"survObj" ~ dataSoFar$"group",
                                         designObj = designObjL, exact = FALSE
    )$"eValue"
    eValuesG[calDate] <- safeLogrankTest(dataSoFar$"survObj" ~ dataSoFar$"group",
                                         designObj = designObjG, exact = FALSE
    )$"eValue"
  }
}
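The bookkeeping in this loop (start at 1, update on event dates, carry the last value forward on all other days) can be isolated in a minimal base-R sketch. It uses the first two e-values computed earlier in this tutorial; the names `eAtEvents`, `eDaily` and `previous` are hypothetical.

```r
# Daily calendar grid for the first nine days of the study
days <- as.character(seq.Date(from = as.Date("2020-05-04"),
                              to = as.Date("2020-05-12"),
                              by = "day"))

# e-values at the first two interim analyses (taken from the outputs above)
eAtEvents <- c("2020-05-08" = 1.138498, "2020-05-11" = 1.409919)

eDaily <- structure(rep(NA_real_, length(days)), names = days)
previous <- 1  # before any event is observed, the e-value is 1
for (day in days) {
  if (day %in% names(eAtEvents)) {
    previous <- eAtEvents[[day]]  # new evidence on an event date
  }
  eDaily[day] <- previous         # otherwise evidence stays the same
}
eDaily
```

Carrying the value forward reflects that a day without events adds no information to the logrank statistic.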
designObjL <- designSafeLogrank(hrMin = 0.7,
alpha = 0.025,
alternative = "less", # one-sided test hr < 1
ratio = 1)
designObjL
##
## Safe Logrank Test Design
##
## minimal hazard ratio = 0.7
## alternative = less
## parameter: log(thetaS) = -0.3566749
## alpha = 0.025
## decision rule: e-value > 1/alpha = 40
##
## Timestamp: 2021-06-30 10:11:15 CEST
designObjG <- designSafeLogrank(hrMin = 1/0.7,
alpha = 0.025,
alternative = "greater", # one-sided test hr > 1
ratio = 1)
designObjG
##
## Safe Logrank Test Design
##
## minimal hazard ratio = 1.428571
## alternative = greater
## parameter: log(thetaS) = 0.3566749
## alpha = 0.025
## decision rule: e-value > 1/alpha = 40
##
## Timestamp: 2021-06-30 10:11:15 CEST
We have obtained the following e-values:
eValuesL
## 2020-05-04 2020-05-05 2020-05-06 2020-05-07 2020-05-08 2020-05-09 2020-05-10
## 1.000000 1.000000 1.000000 1.000000 1.138498 1.138498 1.138498
## 2020-05-11 2020-05-12 2020-05-13 2020-05-14 2020-05-15 2020-05-16 2020-05-17
## 1.409919 1.409919 1.409919 1.409919 1.409919 1.409919 1.409919
## 2020-05-18 2020-05-19 2020-05-20 2020-05-21 2020-05-22 2020-05-23 2020-05-24
## 1.409919 1.409919 1.409919 1.210197 1.210197 1.210197 1.210197
## 2020-05-25 2020-05-26 2020-05-27 2020-05-28 2020-05-29 2020-05-30 2020-05-31
## 1.245648 1.245648 1.245648 1.245648 1.245648 1.245648 1.245648
## 2020-06-01 2020-06-02 2020-06-03 2020-06-04 2020-06-05 2020-06-06 2020-06-07
## 1.245648 1.245648 1.053283 1.053283 1.053283 1.053283 1.053283
## 2020-06-08 2020-06-09 2020-06-10 2020-06-11 2020-06-12 2020-06-13 2020-06-14
## 1.053283 1.053283 1.053283 1.053283 1.053283 1.053283 1.053283
## 2020-06-15
## 1.053283
eValuesG
## 2020-05-04 2020-05-05 2020-05-06 2020-05-07 2020-05-08 2020-05-09 2020-05-10
## 1.0000000 1.0000000 1.0000000 1.0000000 0.8508546 0.8508546 0.8508546
## 2020-05-11 2020-05-12 2020-05-13 2020-05-14 2020-05-15 2020-05-16 2020-05-17
## 0.6655506 0.6655506 0.6655506 0.6655506 0.6655506 0.6655506 0.6655506
## 2020-05-18 2020-05-19 2020-05-20 2020-05-21 2020-05-22 2020-05-23 2020-05-24
## 0.6655506 0.6655506 0.6655506 0.7511151 0.7511151 0.7511151 0.7511151
## 2020-05-25 2020-05-26 2020-05-27 2020-05-28 2020-05-29 2020-05-30 2020-05-31
## 0.6847667 0.6847667 0.6847667 0.6847667 0.6847667 0.6847667 0.6847667
## 2020-06-01 2020-06-02 2020-06-03 2020-06-04 2020-06-05 2020-06-06 2020-06-07
## 0.6847667 0.6847667 0.7844771 0.7844771 0.7844771 0.7844771 0.7844771
## 2020-06-08 2020-06-09 2020-06-10 2020-06-11 2020-06-12 2020-06-13 2020-06-14
## 0.7844771 0.7844771 0.7844771 0.7844771 0.7844771 0.7844771 0.7844771
## 2020-06-15
## 0.7844771
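Both sequences can be compared against the decision rule `e-value > 1/alpha` from the design objects. The sketch below uses the distinct e-values printed above, one per interim analysis, rather than the full daily vectors; the names `eL`, `eG`, `rejectLess` and `rejectGreater` are chosen here for illustration.

```r
alpha <- 0.025          # as specified in designObjL and designObjG
threshold <- 1 / alpha  # decision rule: reject once the e-value exceeds 40

# Distinct e-values at the interim analyses, copied from the printed output
eL <- c(1.138498, 1.409919, 1.210197, 1.245648, 1.053283)
eG <- c(0.8508546, 0.6655506, 0.7511151, 0.6847667, 0.7844771)

# Has the threshold been crossed at any interim analysis so far?
rejectLess    <- any(eL > threshold)
rejectGreater <- any(eG > threshold)
c(less = rejectLess, greater = rejectGreater)
```

Neither sequence comes close to 40, so with these 10 participants neither one-sided null hypothesis is rejected.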
Plot these e-values by their calendar date: