Multi-armed bandit algorithms have been argued for decades to be useful for adaptively randomized experiments. In such experiments, an algorithm varies which arms (e.g., alternative interventions to help students learn) are assigned to participants, with the goal of assigning higher-reward arms to as many participants as possible. We applied the bandit algorithm Thompson Sampling (TS) to run adaptive experiments in three university classes. Instructors saw great value in rapidly using data to give their students in the experiments better arms (e.g., better explanations of a concept). Our deployment, however, illustrated a major barrier for scientists and practitioners who want to use such adaptive experiments: a lack of quantifiable insight into how much the statistical analysis of specific real-world experiments is affected (Pallmann et al., 2018; FDA, 2019) compared to traditional uniform random assignment. We therefore use our case study of the ubiquitous two-arm, binary-reward setting to empirically investigate the impact of using Thompson Sampling instead of uniform random assignment. In this setting, using common statistical hypothesis tests, we show that collecting data with TS can as much as double the False Positive Rate (FPR; incorrectly reporting differences when none exist) and the False Negative Rate (FNR; failing to report differences when they exist)...
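To make the two-arm, binary-reward comparison concrete, here is a minimal simulation sketch. It assumes Bernoulli rewards, a Beta(1, 1) prior for Thompson Sampling, and a two-sided Wald z-test at alpha = 0.05 standing in for "common statistical hypothesis tests". The function names, sample size, and number of simulations are hypothetical choices for illustration, not the authors' setup. Both arms are given the same success probability, so every rejection of the null hypothesis counts as a false positive.

```python
import math
import random


def run_experiment(p_arms, n, use_ts, rng):
    """Simulate one two-arm Bernoulli experiment with n participants.

    Returns (successes, counts) per arm. Assumes a Beta(1, 1) prior for TS.
    """
    successes = [0, 0]
    counts = [0, 0]
    for _ in range(n):
        if use_ts:
            # Thompson Sampling: draw from each arm's Beta posterior, pick the larger draw.
            draws = [
                rng.betavariate(1 + successes[a], 1 + counts[a] - successes[a])
                for a in (0, 1)
            ]
            arm = 0 if draws[0] >= draws[1] else 1
        else:
            # Traditional uniform random assignment.
            arm = rng.randrange(2)
        reward = 1 if rng.random() < p_arms[arm] else 0
        successes[arm] += reward
        counts[arm] += 1
    return successes, counts


def wald_rejects(successes, counts, z_crit=1.96):
    """Two-sided Wald z-test for a difference in proportions (alpha = 0.05 by default)."""
    if min(counts) == 0:
        return False  # no data on one arm, so no test is possible
    p = [s / c for s, c in zip(successes, counts)]
    se = math.sqrt(sum(pi * (1 - pi) / c for pi, c in zip(p, counts)))
    if se == 0:
        return False  # degenerate case: both observed rates are 0 or 1
    return abs(p[0] - p[1]) / se > z_crit


def false_positive_rate(use_ts, n=200, p=0.5, sims=2000, seed=0):
    """Estimate the FPR: fraction of simulated experiments with equal arms that reject."""
    rng = random.Random(seed)
    rejections = sum(
        wald_rejects(*run_experiment([p, p], n, use_ts, rng)) for _ in range(sims)
    )
    return rejections / sims
```

Calling `false_positive_rate(use_ts=False)` and `false_positive_rate(use_ts=True)` compares uniform random assignment with TS under these assumptions. The FNR can be estimated the same way by giving the two arms different success probabilities and counting the experiments in which the test fails to reject.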