Fixed-budget best-arm identification (BAI) is a bandit problem where the learning agent maximizes the probability of identifying the optimal arm after a fixed number of observations. In this work, we initiate the study of this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on the probability that it fails to identify the optimal arm. The bound reflects the quality of the prior and is the first such bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the random bandit instance at the end. Our upper bound asymptotically matches a newly established lower bound for $2$ arms. Our experimental results show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.
翻译:固定预算最佳武器识别( BAI) 是一个土匪问题, 学习代理商在固定的观察次数后, 将确定最佳武器的最佳武器的可能性最大化。 在这项工作中, 我们开始在巴伊西亚环境中研究这一问题。 我们建议采用巴伊西亚消除算法, 并根据它未能确定最佳武器的可能性得出一个上限。 捆绑反映了先前武器的质量, 并且是这个环境中第一个这样的约束。 我们用一种经常式的比喻来证明它, 我们先从中走过, 然后在结尾处将随机的土匪比方整合出来。 我们的上层正同时匹配一个新建的低价武器约束。 我们的实验结果显示, 巴伊西亚消除方法优于常态方法, 并且与最先进的巴伊西亚人算法相比, 在我们的环境下没有保障。