In many common-payoff games, achieving good performance requires players to develop protocols for communicating their private information implicitly -- i.e., using actions that have non-communicative effects on the environment. Multi-agent reinforcement learning practitioners typically approach this problem using independent learning methods in the hope that agents will learn implicit communication as a byproduct of expected return maximization. Unfortunately, independent learning methods are incapable of doing this in many settings. In this work, we isolate the implicit communication problem by identifying a class of partially observable common-payoff games, which we call implicit referential games, whose difficulty can be attributed to implicit communication. Next, we introduce a principled method based on minimum entropy coupling that leverages the structure of implicit referential games, yielding a new perspective on implicit communication. Lastly, we show that this method can discover performant implicit communication protocols in settings with very large spaces of messages.
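To make the minimum entropy coupling idea concrete, below is a minimal sketch of the standard greedy heuristic for approximately minimizing the joint entropy of a coupling of two marginal distributions (a common approximation in the literature; the abstract does not specify which coupling algorithm the method uses, so this is illustrative only). The heuristic repeatedly matches the largest remaining probability masses of the two marginals, which concentrates joint mass on few cells and keeps joint entropy low.

```python
import heapq

def greedy_min_entropy_coupling(p, q):
    """Greedy approximate minimum entropy coupling of two discrete
    distributions p and q. Returns a dict mapping (i, j) -> joint mass
    whose row sums recover p and column sums recover q."""
    # Max-heaps (via negated masses) over the remaining mass of each marginal.
    hp = [(-pi, i) for i, pi in enumerate(p) if pi > 0]
    hq = [(-qj, j) for j, qj in enumerate(q) if qj > 0]
    heapq.heapify(hp)
    heapq.heapify(hq)

    joint = {}
    while hp and hq:
        mp, i = heapq.heappop(hp)
        mq, j = heapq.heappop(hq)
        # Assign the smaller of the two largest remaining masses to cell (i, j).
        m = min(-mp, -mq)
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        # Push back whatever mass is left over (up to floating-point noise).
        if -mp - m > 1e-12:
            heapq.heappush(hp, (mp + m, i))
        if -mq - m > 1e-12:
            heapq.heappush(hq, (mq + m, j))
    return joint
```

In the referential-game reading suggested by the abstract, one marginal can be thought of as the distribution over private information and the other as the distribution over observable actions; a low-entropy coupling then acts as a near-deterministic codebook pairing the two, which is what makes coupling-based protocols scale to very large message spaces.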