We study pure exploration in multi-armed bandits with graph side-information. In particular, we consider the best arm (and near-best arm) identification problem in the fixed confidence setting under the assumption that the arm rewards are smooth with respect to a given arbitrary graph. This captures a range of real world pure-exploration scenarios where one often has information about the similarity of the options or actions under consideration. We propose a novel algorithm GRUB (GRaph based UcB) for this problem and provide a theoretical characterization of its performance that elicits the benefit of the graph-side information. We complement our theory with experimental results that show that capitalizing on available graph side information yields significant improvements over pure exploration methods that are unable to use this information.
翻译:我们用图表侧信息来研究对多武装强盗的纯探索,特别是,我们考虑固定信心设置中最好的手臂(和近最佳手臂)识别问题,前提是手臂报酬与特定任意图形相比是顺畅的。这反映了一系列真实的世界纯勘探情景,人们常常掌握关于所考虑的选择或行动的相似性的信息。我们为此问题提出了一个新奇的算法GROB(基于GROB(基于GRAph的UCB)),并提供其性能的理论定性,以图面信息为受益。我们用实验结果来补充我们的理论,表明利用现有的图表侧信息可以大大改进无法使用这些信息的纯勘探方法。