We investigate fixed-budget best arm identification (BAI) for expected simple regret minimization. In each round of an adaptive experiment, a decision maker draws one of multiple treatment arms based on past observations and then observes the outcome of the chosen arm. After the experiment, the decision maker recommends the treatment arm with the highest projected outcome. We evaluate this decision in terms of the expected simple regret, the difference between the expected outcomes of the best and recommended treatment arms. Due to the inherent uncertainty, we evaluate the regret using the minimax criterion. For distributions with fixed variances (location-shift models), such as Gaussian distributions, we derive asymptotic lower bounds for the worst-case expected simple regret. We then show that the Random Sampling (RS)-Augmented Inverse Probability Weighting (AIPW) strategy proposed by Kato et al. (2022) is asymptotically minimax optimal in the sense that the leading factor of its worst-case expected simple regret asymptotically matches our derived worst-case lower bound. Our result indicates that, for location-shift models, the optimal RS-AIPW strategy draws treatment arms with probabilities that depend on their variances. This contrasts with the result of Bubeck et al. (2011), which shows that drawing each treatment arm with equal probability is minimax optimal when outcomes are bounded.
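To illustrate the variance-dependent allocation described above, the following is a minimal sketch, not the authors' implementation: each arm is drawn a few times to initialize, then drawn with probability proportional to its estimated variance, and at the end the arm with the highest sample mean is recommended. The Gaussian arm parameters, the initialization scheme, and the use of plain sample means in place of the AIPW estimator are all illustrative simplifications.

```python
import random
import statistics

def variance_adaptive_bai(arm_samplers, budget, seed=0):
    """Fixed-budget BAI sketch with variance-proportional sampling.

    arm_samplers: list of callables, each taking an RNG and returning
    one outcome draw from that arm. Returns the index of the
    recommended arm (highest sample mean after the budget is spent).
    """
    rng = random.Random(seed)
    k = len(arm_samplers)
    outcomes = [[] for _ in range(k)]
    # Initialization: draw every arm twice so sample variances exist.
    for a in range(k):
        for _ in range(2):
            outcomes[a].append(arm_samplers[a](rng))
    # Adaptive phase: sampling probability proportional to estimated variance.
    for _ in range(budget - 2 * k):
        variances = [statistics.variance(o) for o in outcomes]
        total = sum(variances) or 1.0
        probs = [v / total for v in variances]
        a = rng.choices(range(k), weights=probs)[0]
        outcomes[a].append(arm_samplers[a](rng))
    # Recommendation: arm with the highest sample mean.
    means = [statistics.fmean(o) for o in outcomes]
    return max(range(k), key=lambda a: means[a])

# Hypothetical two-armed Gaussian instance: arm 1 has the higher mean
# and the larger variance, so it receives most of the draws.
arms = [lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(0.5, 2.0)]
best = variance_adaptive_bai(arms, budget=2000)
```

With a budget of 2000 and a mean gap of 0.5, the higher-variance best arm is identified with high probability; the equal-allocation baseline of Bubeck et al. (2011) would instead split the budget evenly regardless of the variances.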