We consider the top-k arm identification problem for multi-armed bandits with rewards belonging to a one-parameter canonical exponential family. The objective is to select the set of k arms with the highest mean rewards by sequential allocation of sampling efforts. We propose a unified optimal allocation problem that identifies the complexity measures of this problem under the fixed-confidence, fixed-budget settings, and the posterior convergence rate from the Bayesian perspective. We provide the first characterization of its optimality. We provide the first provably optimal algorithm in the fixed-confidence setting for k>1. We also propose an efficient heuristic algorithm for the top-k arm identification problem. Extensive numerical experiments demonstrate superior performance compare to existing methods in all three settings.
翻译:我们考虑的是多武装强盗的顶尖手臂识别问题,其奖赏属于一个单参数指数型家族。目标是通过连续分配取样工作来选择一组K型武器,其平均奖赏最高。我们提出了一个统一的最佳分配问题,从巴伊西亚的角度,确定这一问题的复杂度,从固定信心、固定预算设置和后方趋同率的角度,确定这一问题的复杂度量度。我们提供了其最佳性的初步特征。我们为K>1提供了在固定信任环境中的第一个可被确认的最佳算法。我们还为顶尖手臂识别问题提出了一个高效的超值算法。广泛的数字实验表明,在所有三种环境中,与现有方法相比,业绩优于现有方法。