The cooperative bandit problem is becoming increasingly relevant due to its applications in large-scale decision-making. However, most research on this problem focuses exclusively on the setting with perfect communication, whereas in most real-world distributed settings communication takes place over stochastic networks, with arbitrary corruptions and delays. In this paper, we study cooperative bandit learning under three typical real-world communication scenarios, namely, (a) message-passing over stochastic time-varying networks, (b) instantaneous reward-sharing over a network with random delays, and (c) message-passing with adversarially corrupted rewards, including Byzantine communication. For each of these environments, we propose decentralized algorithms that achieve competitive performance, along with near-optimal guarantees on the incurred group regret. Furthermore, in the setting with perfect communication, we present an improved delayed-update algorithm that outperforms the existing state-of-the-art on various network topologies. Finally, we present tight network-dependent minimax lower bounds on the group regret. Our proposed algorithms are straightforward to implement and achieve competitive empirical performance.