We study the best-arm identification problem with fixed confidence in stochastic bandits when contextual (covariate) information is available. Although contextual information can be used in each round, we are interested in the marginalized mean reward over the context distribution. Our goal is to identify the best arm with as few samples as possible while keeping the error rate below a given level. We derive instance-specific lower bounds on the sample complexity of this problem. We then propose a context-aware version of the "Track-and-Stop" strategy, in which the proportion of arm draws tracks the set of optimal allocations, and prove that the expected number of arm draws asymptotically matches the lower bound. We show that contextual information can make identifying the arm with the best marginalized mean reward more efficient than in the setting of Garivier & Kaufmann (2016). Experiments confirm that contextual information leads to faster best-arm identification.
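As an illustration only, the following minimal sketch covers two ingredients named above under simplifying assumptions. The first is the marginalized mean μ_a = Σ_x p(x) μ_a(x), with a known, finite context distribution p. The second is the generalized likelihood ratio (GLR) stopping check used by the non-contextual Track-and-Stop baseline of Garivier & Kaufmann (2016), applied here to Gaussian arms with known variance. This is not the authors' context-aware algorithm. The tracking/allocation step is omitted, the function names are hypothetical, and the threshold log((log t + 1)/δ) is an assumed heuristic rather than the paper's calibrated threshold.

```python
import math
from typing import List, Sequence, Tuple


def marginalized_means(cond_means: Sequence[Sequence[float]],
                       context_probs: Sequence[float]) -> List[float]:
    """Return mu_a = sum_x p(x) * mu_a(x) for every arm a.

    cond_means[a][x] is the (estimated) mean reward of arm a in context x;
    context_probs[x] is the probability of context x (assumed known here).
    """
    if not math.isclose(sum(context_probs), 1.0):
        raise ValueError("context probabilities must sum to 1")
    return [sum(p * m for p, m in zip(context_probs, row)) for row in cond_means]


def gaussian_glr(n_a: int, mu_a: float, n_b: int, mu_b: float,
                 sigma2: float = 1.0) -> float:
    """GLR statistic Z_{a,b} for Gaussian arms with known variance sigma2.

    Equals n_a*n_b/(n_a+n_b) * (mu_a - mu_b)^2 / (2*sigma2) when mu_a >= mu_b,
    and 0 otherwise.
    """
    if mu_a < mu_b:
        return 0.0
    return (n_a * n_b / (n_a + n_b)) * (mu_a - mu_b) ** 2 / (2.0 * sigma2)


def should_stop(counts: Sequence[int], means: Sequence[float], delta: float,
                sigma2: float = 1.0) -> Tuple[bool, int]:
    """Chernoff-type stopping rule (non-contextual baseline, hypothetical helper).

    Stops when the empirical best arm beats every other arm by more than the
    threshold beta(t, delta) = log((log t + 1) / delta), an assumed heuristic.
    Returns (stop, index of the empirical best arm).
    """
    t = sum(counts)
    best = max(range(len(means)), key=lambda a: means[a])
    threshold = math.log((math.log(t) + 1.0) / delta)
    z_min = min(gaussian_glr(counts[best], means[best], counts[b], means[b], sigma2)
                for b in range(len(means)) if b != best)
    return z_min > threshold, best


if __name__ == "__main__":
    # Two arms, two equally likely contexts: arm 0 is better in context 0,
    # arm 1 is better in context 1, and the marginalized means are 0.5 vs 0.4.
    mu = marginalized_means([[1.0, 0.0], [0.2, 0.6]], [0.5, 0.5])
    print(mu)
    print(should_stop([200, 200], mu, delta=0.05))
    print(should_stop([5000, 5000], mu, delta=0.05))
```

In this toy instance the gap between marginalized means is 0.1, so 200 draws per arm are not enough to stop at δ = 0.05 while 5000 draws per arm are. A context-aware strategy would instead build its estimates and allocations from the per-context statistics, which this sketch does not attempt.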