Effective communication is an important skill for enabling information exchange in multi-agent settings, and emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. Since, by definition, these settings involve an arbitrary encoding of information, they typically do not allow the learned protocols to generalize beyond training partners. In contrast, in this work we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm, which allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents. Real-world problem settings often contain costly communication channels (e.g., robots have to physically move their limbs) and a non-uniform distribution over intents. We show that these two factors lead to unique optimal ZSC policies in referential games, where agents use the energy cost of the messages to communicate intent. Other-Play was recently introduced for learning optimal ZSC policies, but it requires prior access to the symmetries of the problem. Instead, QED iteratively discovers the symmetries in this setting and converges to the optimal ZSC policy.
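To see why costly messages together with a non-uniform intent distribution can single out one protocol, consider a toy referential game. This is our own simplification, not the QED algorithm. Assume there are as many messages as intents, that every one-to-one intent-to-message mapping is decoded perfectly, and that protocols therefore differ only in expected message cost. The helper names `expected_cost` and `optimal_bijections` are hypothetical. The sketch enumerates all mappings and returns the cost-minimizing ones:

```python
from itertools import permutations


def expected_cost(mapping, prior, cost):
    """Expected energy cost when intent i is sent as message mapping[i]."""
    return sum(p * cost[m] for p, m in zip(prior, mapping))


def optimal_bijections(prior, cost, tol=1e-12):
    """Return every intent->message bijection with minimal expected cost.

    Toy assumption: all bijections are decoded correctly, so they differ
    only in expected cost. Ties are detected up to a float tolerance.
    """
    n = len(prior)
    scored = [(expected_cost(perm, prior, cost), perm)
              for perm in permutations(range(n))]
    best = min(c for c, _ in scored)
    return [perm for c, perm in scored if abs(c - best) <= tol]


# Non-uniform prior and distinct costs: a single optimal protocol
# (the most likely intent gets the cheapest message).
print(optimal_bijections([0.5, 0.3, 0.2], [1.0, 2.0, 3.0]))
# Uniform prior: every mapping costs the same, so all 6 tie.
print(len(optimal_bijections([1 / 3] * 3, [1.0, 2.0, 3.0])))
# Equal (cheap-talk-like) costs: again all 6 mappings tie.
print(len(optimal_bijections([0.5, 0.3, 0.2], [1.0, 1.0, 1.0])))
```

In the toy model, removing either factor leaves many equally good protocols. Independently trained agents may then settle on different ones and fail to coordinate zero-shot. With both factors present, the rearrangement inequality makes the optimum unique.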