We consider the fixed-budget best arm identification problem with Normal rewards. In this problem, the forecaster is given $K$ arms (treatments) and $T$ time steps, and attempts to find the arm with the largest mean via an adaptive experiment conducted with an algorithm. The performance of the algorithm is measured by the simple regret, i.e., the quality of the estimated best arm. It is known that the frequentist simple regret can be exponentially small in $T$ for any fixed parameters, whereas the Bayesian simple regret is $\Theta(T^{-1})$ over a continuous prior distribution. This paper shows that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not achieve an exponentially small simple regret for some parameters. This finding contrasts with the many results indicating the asymptotic equivalence of Bayesian and frequentist algorithms in fixed-sampling regimes. While the Bayes optimal algorithm is described by a recursive equation that is virtually impossible to compute exactly, we pave the way for its analysis by introducing a key quantity that we call the expected Bellman improvement.