This paper targets a variant of the stochastic multi-armed bandit problem called good arm identification (GAI). GAI is a pure-exploration bandit problem whose goal is to output as many good arms as possible using as few samples as possible, where a good arm is defined as an arm whose expected reward is greater than a given threshold. In this work, we propose DGAI, a differentiable good arm identification algorithm that improves the sample complexity of the state-of-the-art HDoC algorithm in a data-driven fashion. We also show that DGAI can further boost performance on a general multi-armed bandit (MAB) problem when a threshold is given as prior knowledge about the arm set. Extensive experiments confirm that our algorithm significantly outperforms the baseline algorithms on both synthetic and real-world datasets for both GAI and MAB tasks.
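To make the GAI setting concrete, the following is a minimal sketch of a naive baseline, not DGAI or HDoC: it samples every undecided arm in round-robin order and labels an arm good or bad once a Hoeffding confidence interval clears the threshold. The function name `identify_good_arms`, the Bernoulli reward model, and the parameters `delta` and `max_rounds` are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def identify_good_arms(arm_means, threshold, delta=0.05, max_rounds=20000, seed=0):
    """Naive good arm identification (illustrative baseline only).

    arm_means: true Bernoulli means of a simulated environment (assumption).
    Returns (indices of arms declared good, total number of samples drawn).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    active = set(range(k))  # arms not yet labeled good or bad
    good = []
    total = 0

    for _ in range(max_rounds):
        if not active:
            break
        # sorted() returns a new list, so removing from `active` below is safe.
        for i in sorted(active):
            reward = 1.0 if rng.random() < arm_means[i] else 0.0
            counts[i] += 1
            sums[i] += reward
            total += 1

            n = counts[i]
            mean = sums[i] / n
            # Hoeffding radius with a union bound over arms and sample counts,
            # so all intervals hold simultaneously with probability >= 1 - delta.
            radius = math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

            if mean - radius > threshold:
                good.append(i)  # confidently above the threshold
                active.discard(i)
            elif mean + radius < threshold:
                active.discard(i)  # confidently below the threshold

    return good, total


if __name__ == "__main__":
    good, total = identify_good_arms([0.9, 0.8, 0.2, 0.1], threshold=0.5)
    print("good arms:", sorted(good), "samples used:", total)
```

In this sketch the sample count grows as arm means approach the threshold, which is the cost that more adaptive sampling rules aim to reduce.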