We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not require as input a lower bound on the minimal expected reward of an arm, and its performance does not scale inversely proportionally to the minimal expected reward. We prove a theoretical regret upper bound to justify these claims. We complement our theoretical results with numerical experiments, showing that the proposed algorithm outperforms state-of-the-art algorithms in practice as well.
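To make the setting concrete, the following minimal sketch (our own illustration, not the proposed algorithm) simulates one round of the feedback model: when two or more players pull the same arm they all receive reward 0, and without collision sensing a player cannot tell this 0 apart from a genuine zero reward of the arm. The function `play_round` and the arm means used here are hypothetical names chosen for illustration.

```python
import random

def play_round(means, choices, rng):
    """One round of multi-player bandits without collision sensing.

    means:   expected Bernoulli reward of each arm.
    choices: arm index chosen by each player.
    Returns the reward observed by each player. A player whose arm was
    chosen by more than one player gets reward 0 but receives no flag
    saying whether the 0 came from a collision or from the arm itself.
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for arm in choices:
        if counts[arm] > 1:   # collision: reward forced to 0
            rewards.append(0)
        else:                 # lone player: Bernoulli draw
            rewards.append(1 if rng.random() < means[arm] else 0)
    return rewards

rng = random.Random(0)
# Players 0 and 1 collide on arm 0 and both observe 0, which is
# indistinguishable to them from arm 0 simply paying out 0.
print(play_round([0.9, 0.5, 0.2], [0, 0, 1], rng))
```

This ambiguity is why existing algorithms lean on a lower bound on the minimal expected reward: a long run of zeros must be attributed either to collisions or to a genuinely poor arm.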