We consider fixed-budget best-arm identification in two-armed Gaussian bandit problems. A longstanding open question is whether there exists an optimal strategy, that is, one whose probability of misidentification matches a lower bound. We show that a strategy following the Neyman allocation rule (Neyman, 1934) is asymptotically optimal when the gap between the expected rewards is small. First, we review a lower bound derived by Kaufmann et al. (2016). Then, we propose the "Neyman Allocation (NA)-Augmented Inverse Probability Weighting (AIPW)" strategy, which consists of a sampling rule that uses the Neyman allocation with estimated standard deviations and a recommendation rule that uses an AIPW estimator. The proposed strategy is asymptotically optimal in the sense that its upper bound on the probability of misidentification matches the lower bound as the budget goes to infinity and the gap goes to zero.
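The two components named above, a sampling rule based on the Neyman allocation with estimated standard deviations and a recommendation rule based on an AIPW estimator, can be illustrated with a minimal simulation sketch. This is not the authors' exact procedure. The function names (`neyman_weights`, `na_aipw`), the uniform sampling used until both arms have two observations, and the clipping floor on the sampling probabilities are all assumptions made for illustration.

```python
import math
import random


def neyman_weights(sd1, sd2, floor=0.05):
    """Neyman allocation: sample each arm in proportion to its standard deviation."""
    total = sd1 + sd2
    w1 = 0.5 if total <= 0.0 else sd1 / total
    # Assumption: clip the probability so the 1/w term in AIPW stays bounded.
    w1 = min(max(w1, floor), 1.0 - floor)
    return w1, 1.0 - w1


class RunningStats:
    """Online mean and sample standard deviation (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sd(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def na_aipw(means, sds, budget, seed=0):
    """Simulate one run on a two-armed Gaussian bandit (hypothetical helper).

    Returns (recommended arm, AIPW estimates, pull counts).
    """
    rng = random.Random(seed)
    stats = [RunningStats(), RunningStats()]
    aipw_sum = [0.0, 0.0]
    pulls = [0, 0]
    for _ in range(budget):
        # Sampling rule: Neyman allocation with estimated standard deviations.
        # Assumption: sample uniformly until both arms have two observations.
        if min(s.n for s in stats) < 2:
            w = (0.5, 0.5)
        else:
            w = neyman_weights(stats[0].sd(), stats[1].sd())
        arm = 0 if rng.random() < w[0] else 1
        reward = rng.gauss(means[arm], sds[arm])
        for a in (0, 1):
            plug_in = stats[a].mean  # built from past observations only
            correction = (reward - plug_in) / w[a] if a == arm else 0.0
            aipw_sum[a] += plug_in + correction
        stats[arm].update(reward)
        pulls[arm] += 1
    # Recommendation rule: pick the arm with the larger AIPW estimate.
    estimates = [total / budget for total in aipw_sum]
    best = 0 if estimates[0] >= estimates[1] else 1
    return best, estimates, pulls
```

For instance, `na_aipw((1.0, 0.0), (1.0, 3.0), 5000)` should allocate most pulls to the noisier second arm while still recommending the first. Because the plug-in mean at each round uses only earlier observations, each AIPW summand is unbiased for the arm's expected reward given the past, which is the property that makes this estimator convenient under adaptive sampling.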