We consider the fixed-budget best-arm identification problem with Normal reward distributions. In this problem, the forecaster is given $K$ arms (or treatments) and $T$ time steps. The forecaster attempts to find the arm with the largest mean, via an adaptive experiment conducted with an algorithm. The algorithm's performance is measured by the simple regret, that is, the quality of the estimated best arm. The frequentist simple regret can be exponentially small in $T$, whereas the Bayesian simple regret is only polynomially small in $T$. This paper demonstrates that the Bayes optimal algorithm, which minimizes the Bayesian simple regret, does not yield an exponentially small simple regret for some parameters, a finding that contrasts with the many results indicating the asymptotic equivalence of Bayesian and frequentist algorithms in fixed sampling regimes. While the Bayes optimal algorithm is described by a recursive equation that is virtually impossible to compute exactly, we establish the foundations for further analysis by introducing a key quantity that we call the expected Bellman improvement.
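For concreteness, the contrast above can be stated as follows (a sketch in our own notation, which may differ from that used in the body of the paper): writing $\mu_1, \dots, \mu_K$ for the arm means and $J_T$ for the arm recommended after $T$ rounds, the simple regret is
\[
r_T = \max_{i \in [K]} \mu_i - \mu_{J_T},
\]
and the statement is that well-designed frequentist algorithms attain $\mathbb{E}[r_T] = O(e^{-cT})$ for an instance-dependent constant $c > 0$, whereas the Bayesian simple regret, obtained by further averaging $r_T$ over a prior on the means, decays only polynomially in $T$.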