By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior. Most existing approaches facilitate inter-agent communication by allowing agents to send messages to each other through free communication channels, i.e., cheap talk channels. Current methods require these channels to be constantly accessible and known to the agents a priori. In this work, we lift these requirements such that the agents must discover the cheap talk channels and learn how to use them. Hence, the problem has two main parts: cheap talk discovery (CTD) and cheap talk utilization (CTU). We introduce a novel conceptual framework for both parts and develop a new algorithm based on mutual information maximization that outperforms existing algorithms in CTD/CTU settings. We also release a novel benchmark suite to stimulate future research in CTD/CTU.