Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide the first lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches the lower bound. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
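To make the idea of eliminating arms under a prior concrete, the following is a minimal sketch and not the algorithm as specified in this work. It assumes Gaussian rewards with known noise variance, an independent Gaussian prior per arm, a budget split evenly over roughly $\log_2 K$ rounds, and halving of the active set by posterior mean using only the current round's samples. The names `bayes_elim`, `pull`, `prior_means`, `prior_vars`, and `noise_var` are hypothetical.

```python
import math
import random


def bayes_elim(prior_means, prior_vars, noise_var, budget, pull):
    """Illustrative halving-style elimination on Gaussian posterior means.

    Assumptions (not taken from the source): Gaussian prior per arm,
    known noise variance, equal budget per round, and posteriors
    computed from the current round's samples only.
    """
    k = len(prior_means)
    active = list(range(k))
    rounds = max(1, math.ceil(math.log2(k)))
    per_round = budget // rounds
    while len(active) > 1:
        n = per_round // len(active)  # pulls per active arm this round
        post = {}
        for a in active:
            total = sum(pull(a) for _ in range(n))
            # Conjugate Gaussian update: precision-weighted mix of the
            # prior mean and the observed rewards.
            precision = 1.0 / prior_vars[a] + n / noise_var
            post[a] = (prior_means[a] / prior_vars[a] + total / noise_var) / precision
        # Keep the better half of the arms, ranked by posterior mean.
        active.sort(key=lambda a: post[a], reverse=True)
        active = active[: math.ceil(len(active) / 2)]
    return active[0]


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.1, 0.9, 0.5, 0.3]  # hypothetical bandit instance
    guess = bayes_elim(
        prior_means=[0.0] * 4,
        prior_vars=[1.0] * 4,
        noise_var=1.0,
        budget=200,
        pull=lambda a: rng.gauss(true_means[a], 1.0),
    )
    print("identified arm:", guess)
```

In this sketch the prior enters only through the posterior mean: a small `prior_vars[a]` pulls the estimate toward `prior_means[a]`, so an informative prior can decide eliminations when few samples are available, which is the sense in which a bound can reflect the quality of the prior.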