In cooperative bandits, a framework that captures essential features of collective sequential decision making, agents can minimize group regret, and thereby improve performance, by leveraging shared information. However, sharing information can be costly, which motivates developing policies that minimize group regret while also reducing the number of messages communicated by agents. Existing cooperative bandit algorithms obtain optimal performance when agents share information with their neighbors at \textit{every time step}, i.e., full communication. This requires $\Theta(T)$ number of messages, where $T$ is the time horizon of the decision making process. We propose \textit{ComEx}, a novel cost-effective communication protocol in which the group achieves the same order of performance as full communication while communicating only $O(\log T)$ number of messages. Our key step is developing a method to identify and only communicate the information crucial to achieving optimal performance. Further we propose novel algorithms for several benchmark cooperative bandit frameworks and show that our algorithms obtain \textit{state-of-the-art} performance while consistently incurring a significantly smaller communication cost than existing algorithms.
翻译:合作匪徒是一个包含集体顺序决策基本特征的框架,在这种框架内,代理人可以通过利用共享的信息,最大限度地减少集体遗憾,从而改善业绩。但是,分享信息可能费用高昂,这可以激励制定政策,最大限度地减少集体遗憾,同时减少代理人传递的信息数量。现有的合作土匪算法在代理商与邻居共享信息时取得最佳性能,即在\ textit{每时每步},即全面通信。这需要$\Theta(T)$(T) 的电文数量,其中$T$是决策过程的时间范围。我们提议了一个新的具有成本效益的通信协议,在这种协议中,集体实现与完全通信相同的性能顺序,同时只传达$O(g)T) 数量的信息。我们的关键步骤是制定一种方法,确定和仅传递对于实现最佳业绩至关重要的信息。我们进一步建议为几个基准的合作社土匪框架提供新的算法,并表明我们的算法获得\text{st-art}绩效,同时持续带来比现有算法低得多的通信成本。