This document gives a short introduction to the package BayesAdaptive, which can be used to design and conduct Bayesian response adaptive (BAR) trials. In the design phase, the package is used to compute operating characteristics of a BAR design, where the package supports either a subpopulation finding design (SFD) or a subpopulation stratified design (SSD). In the experimental phase, while conducting the trial, the package enables (a) randomizing patients to treatment arms and (b) checking early stopping rules for futility/efficacy.
Here, we focus on the design phase and show how to set up a set of simulations. Typically we choose the sample size based on a set of alternative treatment scenarios.
To simulate trials you will have to proceed in three steps:
1. Simulate arrival and potential outcome data.
2. Specify design options and prior parameters.
3. Run simulations using the data generated in step 1 under the options specified in step 2.
In the first step you have to generate patient arrival and potential outcome data. The functions arrival.process() and get.outcome() generate such datasets. Throughout the document we follow a short example. We generate 20 datasets and start by defining a variable that specifies the number of datasets, and a list of labels for disease types, treatment arms and subpopulations.
library(BayesAdaptive)
nr.datasets = 20                                          # number of datasets
NAME = list(NAME.D = paste0("disease_", 1:2),             # labels for disease types
            NAME.A = c("control", paste0("agent", 1:3)),  # labels for arms
            NAME.M = paste0("module", 1:3))               # labels for subpopulations
Patient arrival datasets are generated with arrival.process() based on a homogeneous Poisson process. The function requires a matrix of arrival rates, rates.by.disease, and a matrix of total sample sizes, mutants.by.disease. Element \((m,d)\) of rates.by.disease is the mean accrual per week, and element \((m,d)\) of mutants.by.disease is the total accrual, of patients with disease \(d\) in subpopulation \(m\). Instead of mutants.by.disease you can also specify mutants.by.module, in which case the total number of patients by subpopulation \(N_m =\sum_d N_{d,m}\) is fixed, but the number of patients with disease \(d\), \(N_{d,m}\), is random. We specify a design with 3 subpopulations and 2 disease types, where for each combination of disease and subpopulation we specify an overall sample size of \(N_{d,m}=100\) patients.
(mutants.by.disease = matrix(100, 3,2, dimnames = list(NAME$NAME.M, NAME$NAME.D)) )
## disease_1 disease_2
## module1 100 100
## module2 100 100
## module3 100 100
And we specify the accrual rates as
(rates.by.disease = matrix( c(1.2, 0.7, 1.0, 0.5, 2.2, 0.9), 3,2,
dimnames = list(NAME$NAME.M, NAME$NAME.D)))
## disease_1 disease_2
## module1 1.2 0.5
## module2 0.7 2.2
## module3 1.0 0.9
Lastly we generate nr.datasets=20 patient arrival data-sets with arrival.process() where we fix the seed at 123.
Arrival = arrival.process(
nr.datasets = nr.datasets,
seed = 123,
rates.by.disease = rates.by.disease,
mutants.by.disease = mutants.by.disease)
The output Arrival is a list of arrival datasets, where each element Arrival[[n]] is a matrix with sum(mutants.by.disease) \(=2 \times 3 \times 100=600\) rows and 3 columns. The i-th row of each matrix, Arrival[[n]][i,], gives the arrival time, the disease and the subpopulation of the i-th patient. For instance, the 10th patient in the first dataset enters the trial after 1.549 weeks, has disease 1, and belongs to subpopulation 3.
Arrival[[1]][10:12,]
## Arrival Disease Module
## 10-th patient 1.549 1 3
## 11-th patient 1.625 1 2
## 12-th patient 1.744 1 2
We can quickly check the total number of patients in the dataset for each combination \((d,m)\) of disease and subpopulation
## number of patient by subpopulation and disease
xtabs(~ Disease + Module, Arrival[[1]] )
## Module
## Disease 1 2 3
## 1 100 100 100
## 2 100 100 100
which is 100 as desired.
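To make the arrival model concrete, the following base-R sketch generates a comparable dataset by hand. It is not the implementation used by arrival.process(); it only assumes that, within each stratum, inter-arrival times of a homogeneous Poisson process are independent exponential variables and that accrual stops once the stratum total is reached. The helper one.stratum and the object arrivals are hypothetical names.

```r
set.seed(123)
# rows: subpopulation m, columns: disease d (same values as in the example above)
rates  <- matrix(c(1.2, 0.7, 1.0, 0.5, 2.2, 0.9), 3, 2)
totals <- matrix(100, 3, 2)

# Arrival times in one (m, d) stratum: cumulative sums of exponential waiting times
one.stratum <- function(m, d) {
  data.frame(Arrival = cumsum(rexp(totals[m, d], rate = rates[m, d])),
             Disease = d, Module = m)
}

strata   <- expand.grid(m = 1:3, d = 1:2)
arrivals <- do.call(rbind, Map(one.stratum, strata$m, strata$d))
arrivals <- arrivals[order(arrivals$Arrival), ]   # sort patients by arrival time
rownames(arrivals) <- NULL

table(arrivals$Disease, arrivals$Module)          # 100 patients per stratum
```

Because each stratum is simulated until its fixed total is reached, strata with low accrual rates determine the overall trial duration.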
We also generate potential outcome data before simulating any trial. This speeds up computations when we simulate trials. Moreover, using the same outcome data in simulations under different prior/randomization parameters reduces the Monte Carlo error when we compare trial results across prior/randomization parameters.
To fix a response scenario, we specify the probability of response to treatment \(a\) for each disease and subpopulation \((d,m)\). Let’s consider 3 experimental arms and a standard of care. In subpopulation 1 the response rate equals 0.3 for the standard of care and for the second and third experimental arms; in subpopulations 2 and 3 it equals 0.4 for the same arms. Agent 1 has a positive treatment effect of 0.25 in subpopulations 1 and 2 and no effect in subpopulation 3.
rate = array(dim = c(2,4,3), dimnames = NAME );
rate[,,1] = .3
rate[,,-1] = .4
rate[,2,1] = .55
rate[,2,2] = .65
rate
## , , NAME.M = module1
##
## NAME.A
## NAME.D control agent1 agent2 agent3
## disease_1 0.3 0.55 0.3 0.3
## disease_2 0.3 0.55 0.3 0.3
##
## , , NAME.M = module2
##
## NAME.A
## NAME.D control agent1 agent2 agent3
## disease_1 0.4 0.65 0.4 0.4
## disease_2 0.4 0.65 0.4 0.4
##
## , , NAME.M = module3
##
## NAME.A
## NAME.D control agent1 agent2 agent3
## disease_1 0.4 0.4 0.4 0.4
## disease_2 0.4 0.4 0.4 0.4
We then generate nr.datasets=20 potential outcome datasets with get.outcome(). The function requires the arrival data Arrival as an input
Outcome = get.outcome(
seed = 123,
rate = rate,
Arrival = Arrival)
The output Outcome is a list of matrices, where each matrix Outcome[[i]] has one row for each patient and one column for each agent. The n-th row and a-th column corresponds to a potential outcome of the n-th patient when treated with agent a, conditional on the patient’s subpopulation and disease. For example the first two patients in the 1st data-set would respond to the 4th agent, but fail to respond to agent 3.
Outcome[[1]][1:2,]
## agent_1 agent_2 agent_3 agent_4
## 1-th_patient 0 0 0 1
## 2-th_patient 1 1 0 1
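The idea of potential outcomes can be illustrated with a few lines of base R. This is a sketch under the assumption that each patient receives one independent Bernoulli draw per arm, with the success probability of the patient's disease/subpopulation stratum; it is not the code behind get.outcome(). The data frame patients and the matrix potential are hypothetical.

```r
set.seed(123)
# Response rates indexed as [disease, arm, subpopulation], as in the scenario above
rate <- array(0.4, dim = c(2, 4, 3))
rate[, , 1]  <- 0.3
rate[, 2, 1] <- 0.55
rate[, 2, 2] <- 0.65

# Hypothetical patients: one per disease/subpopulation stratum
patients <- data.frame(Disease = c(1, 2, 1, 2, 1, 2),
                       Module  = c(1, 1, 2, 2, 3, 3))

# One Bernoulli draw per patient and arm, using the rates of the patient's stratum
potential <- t(sapply(seq_len(nrow(patients)), function(i) {
  rbinom(4, size = 1, prob = rate[patients$Disease[i], , patients$Module[i]])
}))
colnames(potential) <- c("control", paste0("agent", 1:3))
potential
```

Only the column that matches the arm a patient is eventually randomized to would be observed in a simulated trial; the remaining columns stay counterfactual.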
You can specify several design parameters according to your design needs. We elaborate on the most important ones: restricting treatment arms to a subset of subpopulations and diseases, choosing hyper-parameters for the Bayesian hierarchical model, and specifying the parameters of the randomization function.
You may explore treatment arms only for a subset of \((d,m)\) combinations of diseases and subpopulations. This can be done by creating an eligibility array eligibility.array, where eligibility.array[d,a,m] equals one if agent a is a treatment option for disease/subpopulation (d,m) and zero otherwise. For a design without an active control the user has to set the first column of the array equal to zero, i.e. eligibility.array[,1,] =0. As an example, let’s say we want to test 3 experimental arms against a disease-specific control arm. The second experimental agent will be tested only in the first subpopulation and only for the first disease. All other arms are tested for each \((d,m)\) combination.
eligibility.array = array(1, dim = c(2,4,3), dimnames=NAME)
eligibility.array[,3,-1] = 0
eligibility.array[-1,3,1] = 0
eligibility.array
## , , NAME.M = module1
##
## NAME.A
## NAME.D control agent1 agent2 agent3
## disease_1 1 1 1 1
## disease_2 1 1 0 1
##
## , , NAME.M = module2
##
## NAME.A
## NAME.D control agent1 agent2 agent3
## disease_1 1 1 0 1
## disease_2 1 1 0 1
##
## , , NAME.M = module3
##
## NAME.A
## NAME.D control agent1 agent2 agent3
## disease_1 1 1 0 1
## disease_2 1 1 0 1
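A quick way to check an eligibility array is to count the eligible arms in each disease/subpopulation stratum. The sketch below rebuilds the array from the example so that it runs on its own; arms.per.stratum is a hypothetical name.

```r
NAME <- list(NAME.D = paste0("disease_", 1:2),
             NAME.A = c("control", paste0("agent", 1:3)),
             NAME.M = paste0("module", 1:3))
eligibility.array <- array(1, dim = c(2, 4, 3), dimnames = NAME)
eligibility.array[, 3, -1]  <- 0   # agent2 is not tested in subpopulations 2 and 3
eligibility.array[-1, 3, 1] <- 0   # agent2 is not tested for disease 2

# Number of eligible arms (control included) per disease/subpopulation pair
arms.per.stratum <- apply(eligibility.array, c(1, 3), sum)
arms.per.stratum
```

Only the stratum (disease_1, module1) has all four arms; every other stratum has three.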
Adaptive randomization is currently based on a binary probit regression model, i.e., a model with normal cdf link function \(g = \Phi\), which relates the linear predictor to the probability of treatment success. The probability model for the treatment outcome equals
\[P(R_i = 1 | D_i=d, M_i=m, A_i = a) = p_{d,m,a} = \Phi(\theta_{d,m,a}).\]
The array \(\theta = \{ \theta_{d,m,a} \}\) follows a Gaussian prior, where each \(\theta_{d,m,a}\) is decomposed into independent Gaussian components \(\theta_{d,m,a}= \theta_{d,m,0} +I(a>0) \zeta_{d,m,a}.\) The probability \(\Phi(\theta_{d,m,0}) = \Phi(\alpha_d + \alpha_{d,m})\) corresponds to the response probability of the standard of care, with \(\alpha_d \sim N(\mu_d, W_1)\) and \(\alpha_{d,m} \sim N(\mu_{d,m}, W_2)\). Moreover, \(\zeta_{d,m,a} = \beta_a + \beta_{m,a} + \beta_{d,m,a}\) represents the treatment effect on the inverse normal scale. Here \(\beta_a \sim N( 0, W_3)\) represents the general treatment effect across subpopulations, \(\beta_{m,a} \sim N( 0, W_4)\) is a subpopulation-specific random effect, and \(\beta_{d,m,a} \sim N( 0, W_5)\) is a disease-specific effect within subpopulation \(m\).
Prior parameters have to be specified with Prior.list.Fct(), which creates the prior covariance matrix of \(\theta\) and internal objects for posterior estimation. The function requires two input parameters: the eligibility array and a vector that contains the five variance parameters \(W_i, i=1, \cdots, 5\). If no control arms are specified, i.e. eligibility.array[,1,]=0, then the function sets the variances \(W_1\) and \(W_2\) equal to 0 and removes both \(\alpha_d\) and \(\alpha_{d,m}\) from the model. As an example, we consider a prior such that \(p_{d,m,a} = \Phi( \theta_{d,m,a})\) is marginally uniformly distributed on \([0,1]\).
Prior = Prior.list.Fct(
eligibility.array=eligibility.array,
Var.vec =c(W1=.5,W2=.1,W3=.3,W4=.09,W5=.01))
names(Prior)
## [1] "eligibility.vec" "Prior_list"
### mean and -1 times the inverse covariance matrix
names(Prior$Prior_list)
## [1] "theta" "m1H.prior"
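The claim that this prior is marginally uniform can be checked by simulation. The sketch below assumes that all prior means are zero and looks at an experimental arm, for which \(\theta_{d,m,a}\) is the sum of all five components; the five variances then add up to one, so \(\theta_{d,m,a} \sim N(0,1)\) and \(\Phi(\theta_{d,m,a})\) is uniform. The code uses base R only and does not call the package.

```r
set.seed(1)
W <- c(W1 = .5, W2 = .1, W3 = .3, W4 = .09, W5 = .01)
total.var <- sum(W)   # prior variance of theta for an experimental arm

# Draw theta as a sum of five independent zero-mean Gaussian components
# (zero prior means are an assumption of this sketch)
n.draw <- 1e5
theta  <- rowSums(sapply(W, function(w) rnorm(n.draw, mean = 0, sd = sqrt(w))))
p      <- pnorm(theta)

total.var
quantile(p, c(.1, .25, .5, .75, .9))   # close to the uniform quantiles
```

For the control arm only \(W_1\) and \(W_2\) contribute, so under the same assumptions its response probability is more concentrated around 0.5 than a uniform variable.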
The function Plot.prior() can be used to select suitable prior parameters. With option plot.figure=1, the function plots a 4-panel figure. Panel (a) shows the marginal prior density of the response probability \(p_{d,m,a}, a>0\); panel (b) shows the joint prior density of \((p_{d,m,a}, p_{d',m,a})\) for \(d\neq d', a>0\); panel (c) shows the density of \((p_{d,m,a}, p_{d,m',a})\) for \(m\neq m'\); and panel (d) shows the density of \((p_{d,m,a}, p_{d',m',a})\) for \(d\neq d', m\neq m'\).
Plot.prior(Var=c(.5,.1,.3,.09,.01), plot.figure=1)
With option plot.figure=2 the function shows the conditional distribution of \((p_{d,m,a}, p_{d',m,a})\) given the control arm probabilities \((p_{d,m,0}, p_{d',m,0})\) for distinct diseases \(d\neq d'\) in the same subpopulation.
Plot.prior(Var=c(.5,.1,.3,.09,.01), plot.figure=2)
Lastly, the option plot.figure=3 shows the conditional distribution of \((p_{d,m,a}, p_{d',m',a})\) for distinct diseases \(d\neq d'\) in different subpopulations \(m \neq m'\), given the control arm probabilities \((p_{d,m,0}, p_{d',m',0})\).
Plot.prior(Var=c(.5,.1,.3,.09,.01), plot.figure=3)
Response-adaptive randomization is implemented through the model \[ P[ C_i = a | D_i=d, M_i=m, \Sigma_i ] \propto S_{d,m,a}(i) \exp\{ 5 \cdot (N^\star - N_{d,m,a}(i) )_+ \} I_{d,m,a}(i), \] where \(x_+ = x I(x>0)\), \(N^\star\) is a design parameter, and \(C_i=a\) is the event of randomizing the \(i\)-th patient to agent \(a\) conditional on the disease and subpopulation \((D_i=d,M_i=m)\) and on the data \(\Sigma_i\) available at the arrival of the \(i\)-th patient. The exponential factor gives additional weight to arms with fewer than \(N^\star\) patients. Here \(I_{d,m,a}(i)=1\) if agent \(a\) is still active for the combination \((d,m)\) at the arrival of the \(i\)-th patient and 0 otherwise.
For trials with a control arm, the statistic \(S_{d,m,a}(i)\) equals \[S_{d,m,a}(i) \propto P[ p_{d,m,a} > p_{d,m,0} | \Sigma_i]^{h(i, d, m )}\] for the experimental arms \(a>0\), and for the control arm \[S_{d,m,0}(i) \propto \exp \{ c [ \max_{a>0} N'_{d,m,a}(i) - N'_{d,m,0}(i) ] \},\] where \(N'_{d,m,a}(i)\) is the number of patients with disease/subpopulation \((d,m)\) randomized to agent \(a\). For trials without a control arm, \(S_{d,m,a}(i)\) equals \[S_{d,m,a}(i) \propto P[ \cap_{a'\neq a} \{ p_{d,m,a} > p_{d,m,a'} \} | \Sigma_i]^{h(i, d, m )}.\]
The package implements the function \(h(i, d,m)=A \cdot N_{d,m}(i)^b\), where the parameters \(A\) and \(b\) can be scalars that are identical for each disease and subpopulation \((d,m)\), or can depend on \((d,m)\).
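The following sketch shows how randomization probabilities of this form could be evaluated for a single stratum \((d,m)\) with a control arm. All inputs are hypothetical. Two points are assumptions of the sketch rather than documented behaviour of the package: the statistics of the experimental arms are normalized to sum to one over the active agents, and the burn-in factor uses the number of patients on each arm, in line with the role of \(N^\star\) as a minimum number of patients per arm.

```r
# Hypothetical state of one stratum (d, m); position 1 is the control arm
post.prob <- c(NA, 0.90, 0.50, 0.20)  # P[p_a > p_0 | data] for agents 1 to 3
n.rand    <- c(12, 14, 9, 11)         # patients randomized so far to each arm
active    <- c(1, 1, 1, 0)            # agent 3 has already been dropped
h      <- 1.5                         # current value of h(i, d, m)
c.par  <- 0.01                        # parameter c of the control statistic
N.star <- 10                          # minimum number of patients per arm

S <- numeric(4)
S[-1] <- post.prob[-1]^h
S[-1] <- S[-1] / sum(S[-1] * active[-1])              # assumption: normalize over active agents
S[1]  <- exp(c.par * (max(n.rand[-1]) - n.rand[1]))   # control statistic

burn.in   <- exp(5 * pmax(N.star - n.rand, 0))        # extra weight below N.star patients
w         <- S * burn.in * active
rand.prob <- w / sum(w)
round(rand.prob, 3)
```

In this example agent 2 has only 9 patients, so the burn-in factor dominates and almost all probability mass goes to that arm until it reaches N.star patients.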
If the planned total sample size for each combination \((d,m)\) is identical, we recommend choosing \(A\) and \(b\) identical across all combinations. If the planned total sample size differs across \((d,m)\), then \(A\) and \(b\) should be specified as arrays, where \(A[d,a,m], b[d,a,m]\), \(a\geq 1\), are the parameters of \(h(i, d,m)=A \cdot N_{d,m}(i)^b\) for the combination \((d,m)\). Since we specified a total sample size of \(N_{d,m}=100\) patients for each disease and subpopulation \((d,m)\), we use identical parameters and specify \(h\) such that \(h(i,d,m) = 1\) after half of the total number of patients in \((d,m)\) are randomized, i.e. \(N_{d,m}(i)= N_{d,m}/2 = 50\), and \(h(i,d,m) = 4\) at the end of the trial when \(N_{d,m}(i)= N_{d,m} = 100\).
N =100
b =-log(4)/log(.5)
A = 4/N^b
The randomization parameters are saved as a list
rand.vec =list(a = A, b = b, c = .01, N.star = 10)
where, in addition to A and b, we set c = 0.01 and require a minimum of N.star=10 patients to be randomized to each \((d,m,a)\) combination.
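The calibration of A and b can be verified directly. The two conditions \(h = 1\) at 50 patients and \(h = 4\) at 100 patients imply \(2^b = 4\), hence \(b = 2\), and then \(A = 4/100^2\). The helper h below is a hypothetical name used only for this check.

```r
N <- 100
b <- -log(4) / log(0.5)   # solves (N / (N/2))^b = 4
A <- 4 / N^b              # solves A * N^b = 4

h <- function(n) A * n^b  # exponent as a function of the number of patients in (d, m)
c(b = b, A = A)
h(c(50, 100))
```

For other targets, say \(h = h_1\) at \(n_1\) patients and \(h = h_2\) at \(n_2\) patients, the same reasoning gives \(b = \log(h_2/h_1)/\log(n_2/n_1)\) and \(A = h_2/n_2^b\).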
Early stopping rules for futility and efficacy are based on stopping boundaries: the combination \((d,m,a)\) is stopped for efficacy if \(V'_{d,m,a} \geq b'_{d,m,a}(i)\) and stopped for futility if \(V''_{d,m,a} \leq b''_{d,m,a}(i)\).
For the SSD design, \(V'_{d,m,a}\) is the z-statistic for binary data, and for the SFD design \(V'_{d,m,a}\) is the maximum of the z-statistics over all diseases \(d\) in \((d,m,a)\). The futility statistic \(V''_{d,m,a}\) is the posterior probability of a positive treatment effect for both designs. The stopping functions equal \[b'_{d,m,a}(i) = \lambda'_{d,m,a} ( 1 + s_1 s_2 ^{\overline{N}_{d,m,a}(i) - N_{\min}} )\] and \[b''_{d,m,a}(i)= \lambda'' (1- s_3 ^{ \overline{N}_{d,m,a}(i) } ).\]
For both functions \(\overline{N}_{d,m,a}(i) = N_{d,m,a}(i)\) for the SSD design, whereas \(\overline{N}_{d,m,a}(i) = \sum_d N_{d,m,a}(i)\) for the SFD design. Here we select the efficacy boundary parameters \(\lambda'_{d,m,a}=2, s_1=2, s_2=0.8\) and the futility boundary parameters \(\lambda''=0.05, s_3=0.9\), and plot both boundaries in the following figure.
The stopping parameters \(s_1,s_2,s_3, \lambda', \lambda'', N_{\min}\) are saved as a list
stopping.rules = list(b.futil = .05, ## lambda''
shape.futi = .9, ## s_3
b.effic = 2, ## lambda'
shape1.effic = 2, ## s_1
shape2.effic = .8, ## s_2
N.min = 15)
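To see what these values imply, the two boundary functions can be evaluated in base R. This is a sketch that simply codes the two formulas for \(b'\) and \(b''\) with the entries of stopping.rules; the function names effic.bound and futil.bound are hypothetical and not part of the package.

```r
stopping.rules <- list(b.futil = .05, shape.futi = .9, b.effic = 2,
                       shape1.effic = 2, shape2.effic = .8, N.min = 15)

# Efficacy boundary b'(n) = lambda' * (1 + s1 * s2^(n - N.min))
effic.bound <- function(n, r = stopping.rules)
  r$b.effic * (1 + r$shape1.effic * r$shape2.effic^(n - r$N.min))

# Futility boundary b''(n) = lambda'' * (1 - s3^n)
futil.bound <- function(n, r = stopping.rules)
  r$b.futil * (1 - r$shape.futi^n)

n <- c(15, 30, 50, 100)
data.frame(n = n, efficacy = effic.bound(n), futility = futil.bound(n))
```

The efficacy boundary starts at \(\lambda'(1+s_1)\) when \(n = N_{\min}\) and decreases towards \(\lambda'\), while the futility boundary increases from 0 towards \(\lambda''\). Passing a finer grid of n to plot() or lines() draws both curves.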
The function Simulate.trial() simulates Bayesian response-adaptive multi-disease, multi-subpopulation trials, and requires
the arrival data Arrival,
the potential outcome data Outcome,
the prior parameters Prior,
the parameters for the response-adaptive randomization algorithm rand.vec, and
the list of parameters for the early stopping boundaries stopping.rules.
We have already generated all five objects. In addition, two more parameters have to be specified, namely Time.Delay and Design. The variable Time.Delay is the time period between the beginning of the treatment and the time when the treatment outcome is evaluated. The variable Design is a vector of two elements, where Design[1] represents the design option, with \(=1\) for the SFD and \(=2\) for the SSD. The second element Design[2] represents the randomization option, with \(=1\) if a standard-of-care control arm exists, \(=2\) if a historical estimate of \(p_{d,m,0}\) for the standard of care exists and should be used for the randomization statistics, and \(=3\) if no estimate of \(p_{d,m,0}\) exists. As explained above, if Design[2]=3 then \(S_{d,m,a}(i) \propto P[ \cap_{a'\neq a} \{ p_{d,m,a} > p_{d,m,a'} \} | \Sigma_i]^{h(i, d, m )}\) will be used.
If you choose to use historical estimates of \(p_{d,m,0}\), i.e. Design[2]=2, then you have to specify these estimates as a matrix p.historical, where the element p.historical[d,m] is the probability of a positive response for the \((d,m)\) disease-subpopulation combination.
We select an SSD with an active control arm:
Time.Delay= 8
Design = c(2,1)
The simulation output is specified through the vector Check, which has 6 elements, c(drop, alloc, resp.pr, stat, stat.all, sim.initial), each of which should be FALSE or TRUE.
The first component, drop, specifies whether stopping rules should be applied. The next three components alloc, resp.pr and stat control whether the patient allocation, the empirical response probabilities and the test statistics at the arrival of each patient should be returned. If Check["stat.all"] equals FALSE, then the futility and efficacy statistics are monitored during the trial only for the combination DAM.check = c(d, a, m). If the last component Check["sim.initial"] is TRUE, then trials are simulated only until the arrival of the \(i^\star\)-th patient, where \(i^\star = \min \{i : N_{d,m,a}>N_{\min}\) for some \((d,m,a) \}\). As an example, we apply early stopping rules and save the patient allocation and the efficacy statistics for all combinations \((d,a,m)\).
Check = c(drop = TRUE,
alloc = TRUE,
resp.pr = FALSE,
stat = TRUE,
stat.all = TRUE,
sim.initial = FALSE)
DAM.check = c(d=1, a=1, m=1)
We then simulate 20 trials with Simulate.trial()
trial.outcome = Simulate.trial(
seed = 111,
ArrivalData = Arrival,
ResponseData = Outcome,
Time.Delay = Time.Delay,
Design = Design,
Prior = Prior,
rand.vec = rand.vec,
stopping.rules = stopping.rules,
Check = Check)
The output is a list of trial results, in this case 20 trials, with the output options specified in Check
## number of trials
length(trial.outcome)
## [1] 20
## output for the 20th trial
names(trial.outcome[[20]])
## [1] "alloc" "Effi.all" "status" "Rand"
## [5] "Accrual" "Resp_RiskPop" "Resp_Events"
The element trial.outcome[[i]]$status gives an overview of the trial. We check the results for the second biomarker subpopulation
trial.outcome[[20]]$status[,,,"module2"]
## , , status
##
## Control agent_1 agent_2 agent_3
## disease_1 1 1 -1 1
## disease_2 1 1 -1 1
##
## , , closing_time
##
## Control agent_1 agent_2 agent_3
## disease_1 NA NA -1 NA
## disease_2 NA NA -1 NA
##
## , , allocation_pr
##
## Control agent_1 agent_2 agent_3
## disease_1 0.3801061 0.3502837 0 0.2696102
## disease_2 0.3801061 0.3562308 0 0.2636631
##
## , , efficacy_stat
##
## Control agent_1 agent_2 agent_3
## disease_1 NA 1.95193 -Inf 0.1750012
## disease_2 NA 1.65090 -Inf 0.1900128
##
## , , futility_stat
##
## Control agent_1 agent_2 agent_3
## disease_1 NA 0.9987314 0.9975423 0.6819988
## disease_2 NA 0.9984742 0.9849826 0.6930017
The first slice of the array status returns the status of each combination of agents and diseases, where \(-1\) denotes that the agent was not eligible for the combination, \(0\) that the agent was dropped early for futility, \(1\) that the agent was stopped for neither efficacy nor futility, and \(2\) that the agent was stopped for efficacy. The second slice closing_time gives the closing time for arms that were stopped early for futility or efficacy. Lastly, the slices efficacy_stat and futility_stat return the efficacy and futility statistics at the end of the trial.
Similarly, the output elements Resp_RiskPop and Resp_Events return the number of observed outcomes and the number of observed positive outcomes at the end of the trial for each combination \((d,m,a)\) of disease, subpopulation and arm.
trial.outcome[[20]]$Resp_RiskPop[,,"module2"]
## Control agent_1 agent_2 agent_3
## disease_1 36 33 0 31
## disease_2 34 34 0 32
trial.outcome[[20]]$Resp_Events[,,"module2"]
## Control agent_1 agent_2 agent_3
## disease_1 12 19 0 11
## disease_2 13 20 0 13
## response probability
trial.outcome[[20]]$Resp_Events[,,"module2"]/
trial.outcome[[20]]$Resp_RiskPop[,,"module2"]
## Control agent_1 agent_2 agent_3
## disease_1 0.3333333 0.5757576 NaN 0.3548387
## disease_2 0.3823529 0.5882353 NaN 0.4062500
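As an illustration of a z-statistic for binary data, the sketch below computes a standard pooled two-sample statistic for agent_1 against the control in disease_1 and module2, using the counts printed above. This is one common textbook form and only an assumption about what such a statistic could look like. The exact statistic used by the package is not specified here, so the value need not coincide with efficacy_stat. The function z.two.prop is a hypothetical helper.

```r
# Pooled two-sample z-statistic for a difference in response proportions
z.two.prop <- function(x1, n1, x0, n0) {
  p.pool <- (x1 + x0) / (n1 + n0)                       # pooled response rate
  se     <- sqrt(p.pool * (1 - p.pool) * (1 / n1 + 1 / n0))
  (x1 / n1 - x0 / n0) / se
}

# agent_1: 19 responses in 33 patients; control: 12 responses in 36 patients
z <- z.two.prop(x1 = 19, n1 = 33, x0 = 12, n0 = 36)
z
```

A value like this would be compared with the efficacy boundary \(b'_{d,m,a}(i)\) evaluated at the current number of patients on the arm.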
Since we specified the patient allocation in Check, we can also monitor patient allocation during the trial.
Col = c("black", "red", "green", "blue")
par(mfrow=c(1,2), mar=c(4,4,2,0))
plot( c(1,1), c(0,0), xlim=c(0,601), xlab="total number of patients i", ylab="Randomized to (1,a,2)",
ylim=c(0,max(trial.outcome[[20]]$alloc[,,,2])), main="disease 1" )
for(a in 1:4) lines(1:601, trial.outcome[[20]]$alloc[1,a,,2], col=Col[a] )
plot( c(1,1), c(0,0), xlim=c(0,601), xlab="total number of patients i", ylab="Randomized to (2,a,2)",
ylim=c(0,max(trial.outcome[[20]]$alloc[,,,2])), main="disease 2" )
for(a in 1:4) lines(1:601, trial.outcome[[20]]$alloc[2,a,,2], col=Col[a] )
legend("topleft", legend = c("Control", paste("Agent", 1:3)), text.col = Col, bty = "n")
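Operating characteristics are obtained by summarizing the status arrays across simulated trials. The sketch below shows one way to compute, for every combination \((d,a,m)\), the proportion of trials that stopped for efficacy. So that it runs without the package, it uses a mock list with randomly filled status codes; mock.outcome, mock.trial and prop.effic are hypothetical names, and the array layout [disease, arm, slice, subpopulation] is assumed from the printed output above.

```r
set.seed(1)
dn <- list(c("disease_1", "disease_2"),
           c("Control", "agent_1", "agent_2", "agent_3"),
           c("status", "closing_time"),
           paste0("module", 1:3))

# Mock trial result with random status codes 0 (futility), 1 (not stopped), 2 (efficacy)
mock.trial <- function() {
  st <- array(NA_real_, dim = c(2, 4, 2, 3), dimnames = dn)
  st[, , "status", ] <- sample(c(0, 1, 2), 2 * 4 * 3, replace = TRUE)
  list(status = st)
}
mock.outcome <- replicate(20, mock.trial(), simplify = FALSE)

# One column per trial, one row per (d, a, m) combination
effic <- sapply(mock.outcome, function(x) x$status[, , "status", ] == 2)
prop.effic <- array(rowMeans(effic), dim = c(2, 4, 3), dimnames = dn[c(1, 2, 4)])
prop.effic[, , "module2"]
```

With real simulation output, replacing mock.outcome by trial.outcome should give the estimated probability of declaring efficacy under the chosen scenario, provided the status array has the layout assumed here.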