In pure-exploration problems, information is gathered sequentially to answer a question about the stochastic environment. While best-arm identification for linear bandits has been extensively studied in recent years, few works have addressed identifying an arm that is $\varepsilon$-close to the best one (rather than exactly the best one). In this problem with several correct answers, an identification algorithm should focus on one candidate among those answers and verify that it is correct. We demonstrate that picking the answer with the highest mean does not allow an algorithm to reach asymptotic optimality in terms of expected sample complexity. Instead, a \textit{furthest answer} should be identified. Using that insight to choose the candidate answer carefully, we develop a simple procedure to adapt best-arm identification algorithms to tackle $\varepsilon$-best-answer identification in transductive linear stochastic bandits. Finally, we propose an asymptotically optimal algorithm for this setting, which is shown to achieve competitive empirical performance against existing modified best-arm identification algorithms.