The multi-agent multi-armed bandit problem has been studied extensively because it arises in many real-life applications, such as online recommendation systems and wireless networking. We consider the setting in which agents aim to minimize their group regret while collaborating over a given graph via some communication protocol, and in which each agent is given a different set of arms. Previous literature on this problem considered only one of these two features at a time: either agents with the same arm set communicating over a general graph, or agents with different arm sets communicating over a fully connected graph. In this work, we introduce a more general problem setting that encompasses both features. For this novel setting, we first provide a rigorous regret analysis for the standard flooding protocol combined with the UCB policy. Then, to mitigate the high communication cost incurred by flooding, we propose a new protocol called Flooding with Absorption (FWA). We provide a theoretical analysis of its regret bound and intuitions on the advantages of using FWA over flooding. Lastly, we verify empirically that FWA leads to significantly lower communication costs than flooding, with only a minimal loss in regret performance.
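To make the baseline concrete, the following is a minimal sketch of the two ingredients named above: a UCB index and a flooding step over a graph. It is an illustration under stated assumptions, not the paper's implementation. The function names `ucb_index` and `flood`, the UCB1-style bonus constant, and the time-to-live (`ttl`) forwarding rule are all hypothetical choices made for this example. FWA is not sketched because its rules are not specified in this abstract.

```python
import math
from collections import deque


def ucb_index(mean, pulls, t):
    """UCB1-style index (assumed form): empirical mean plus exploration bonus."""
    if pulls == 0:
        return math.inf  # an unpulled arm is always tried first
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def flood(graph, source, ttl):
    """Flood one message from `source`; return (nodes reached, transmissions).

    Assumed rule: a node forwards the first copy it receives to every
    neighbour except the sender, while the remaining time-to-live is positive.
    Duplicate copies are still counted as transmissions but are not re-forwarded.
    """
    seen = {source}
    transmissions = 0
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, remaining = queue.popleft()
        if remaining == 0:
            continue
        for neighbour in graph[node]:
            if neighbour == sender:
                continue
            transmissions += 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, node, remaining - 1))
    return seen, transmissions


# Hypothetical 4-node ring, given as an adjacency list.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
reached, cost = flood(ring, source=0, ttl=3)
```

On this ring, the message reaches all four nodes, but some transmissions deliver copies to nodes that have already seen it. That redundancy is the kind of communication cost the abstract attributes to flooding.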