We consider applying multi-armed bandits to model-assisted designs for dose-finding clinical trials. Multi-armed bandits are simple and powerful methods for choosing actions that maximize a reward within a limited number of trials. Among multi-armed bandit methods, we first consider Thompson sampling, which selects actions based on random samples from a posterior distribution. With the small sample sizes typical of dose-finding trials, the tails of the posterior distribution are heavy and the random samples are highly variable, so we also consider applying regularized Thompson sampling and a greedy algorithm. The greedy algorithm selects a dose based on the posterior mean. In addition, we propose a method that selects a dose based on the posterior median. We evaluate the performance of our proposed designs across six scenarios via simulation studies.
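The contrast between the three selection rules can be sketched as follows, assuming a beta-binomial toxicity model with a conjugate Beta(1, 1) prior per dose, an assumed target toxicity rate of 0.30, and illustrative (hypothetical) toxicity counts; this is a minimal illustration of the idea, not the paper's actual design.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)

# Hypothetical trial state: toxicity counts per dose (illustrative numbers).
n_tox = np.array([0, 1, 2, 4])   # toxicities observed at each dose
n_pat = np.array([6, 6, 6, 6])   # patients treated at each dose
target = 0.30                    # assumed target toxicity probability

# Beta(1, 1) prior gives a Beta(1 + tox, 1 + non-tox) posterior per dose.
a = 1.0 + n_tox
b = 1.0 + (n_pat - n_tox)

# Thompson sampling: draw one posterior sample per dose and pick the dose
# whose sampled toxicity rate is closest to the target.
samples = rng.beta(a, b)
dose_ts = int(np.argmin(np.abs(samples - target)))

# Greedy algorithm: replace the random draw with the posterior mean.
post_mean = a / (a + b)
dose_greedy = int(np.argmin(np.abs(post_mean - target)))

# Posterior-median variant: use the median of each Beta posterior instead.
post_median = beta_dist.median(a, b)
dose_median = int(np.argmin(np.abs(post_median - target)))

print(dose_ts, dose_greedy, dose_median)
```

Because the greedy and median rules are deterministic functions of the posterior, they avoid the extra variability that random posterior draws introduce when sample sizes are small.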