Designing experiments often requires balancing learning about the true treatment effects against earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain the optimality and computational-efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI-modified design shows operating characteristics that are comparable in learning (e.g., statistical power) but substantially better in earning (e.g., direct benefits). This illustrates the potential of designs that use a GI approach to allocate participants to improve participant benefits, increase efficiency, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
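The abstract does not specify how the modified GI is computed for exponential rewards, so the sketch below only illustrates the general shape of an index-based adaptive allocation in a simulated multi-armed experiment with exponential rewards. It places conjugate Gamma posteriors on each arm's rate and ranks arms by a hypothetical optimistic posterior-quantile index; the names `index_value` and `run_trial`, the prior parameters, and the quantile stand-in are all assumptions, not the authors' GI rule.

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)

def index_value(a, b, q=0.05):
    # Hypothetical stand-in for the GI: the mean reward 1/lambda evaluated
    # at a low posterior quantile of the rate lambda ~ Gamma(a, b),
    # which is optimistic for poorly explored arms.
    return 1.0 / gamma.ppf(q, a, scale=1.0 / b)

def run_trial(true_rates, horizon=500, a0=1.0, b0=1.0):
    k = len(true_rates)
    a = np.full(k, a0)            # Gamma shape parameter per arm
    b = np.full(k, b0)            # Gamma rate parameter per arm
    pulls = np.zeros(k, dtype=int)
    total_reward = 0.0
    for _ in range(horizon):
        # Allocate the next participant to the arm with the highest index.
        arm = int(np.argmax([index_value(a[i], b[i]) for i in range(k)]))
        # Exponential reward with mean 1/rate for the chosen arm.
        reward = rng.exponential(1.0 / true_rates[arm])
        # Conjugate update: shape += number of observations, rate += their sum.
        a[arm] += 1
        b[arm] += reward
        pulls[arm] += 1
        total_reward += reward
    return total_reward, pulls

# Simulated 2-armed experiment: arm rates 1.0 and 0.5 (mean rewards 1.0 and 2.0).
reward, pulls = run_trial(true_rates=[1.0, 0.5], horizon=1000)
print(pulls, reward)  # the adaptive design concentrates pulls on the higher-mean arm
```

The design choice the abstract highlights, adaptively steering allocation toward the apparently superior treatment while still sampling the others, is what drives the earning gains over a fixed, equal-allocation design.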