Neural agents trained in reinforcement learning settings can learn to communicate among themselves via discrete tokens, accomplishing as a team what agents would be unable to do alone. However, the current standard of using one-hot vectors as discrete communication tokens prevents agents from acquiring more desirable aspects of communication such as zero-shot understanding. Inspired by word embedding techniques from natural language processing, we propose neural agent architectures that enable agents to communicate via discrete tokens derived from a learned, continuous space. We show in a decision-theoretic framework that our technique optimizes communication over a wide range of scenarios, whereas one-hot tokens are only optimal under restrictive assumptions. In self-play experiments, we validate that our trained agents learn to cluster tokens in semantically meaningful ways, allowing them to communicate in noisy environments where other techniques fail. Lastly, we demonstrate both that agents using our method can effectively respond to novel human communication and that humans can understand unlabeled emergent agent communication, in both cases outperforming one-hot communication.
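To illustrate the contrast between one-hot tokens and discrete tokens drawn from a continuous space, the following is a minimal sketch and not the architecture proposed here. It assumes that discretization is done by snapping a continuous output to its nearest token vector; the names `discretize` and `pairwise_distances`, the 2-D space, and the hand-picked `prototypes` values are all hypothetical and stand in for learned quantities.

```python
import numpy as np


def one_hot_tokens(vocab_size):
    """One-hot vocabulary: each token is a row of the identity matrix."""
    return np.eye(vocab_size)


def pairwise_distances(tokens):
    """Euclidean distance between every pair of token vectors."""
    diffs = tokens[:, None, :] - tokens[None, :, :]
    return np.linalg.norm(diffs, axis=-1)


def discretize(message, tokens):
    """Snap a continuous vector to the nearest token (assumed scheme)."""
    dists = np.linalg.norm(tokens - message, axis=1)
    idx = int(np.argmin(dists))
    return idx, tokens[idx]


# Hypothetical stand-in for learned token vectors in a 2-D space:
# tokens 0 and 1 lie close together, token 2 lies far away.
prototypes = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0]])

# A continuous output is mapped to a discrete token before being sent.
continuous_output = np.array([0.15, 0.05])
idx, token = discretize(continuous_output, prototypes)

# One-hot tokens are all equally far apart, so distance carries no
# information about how tokens relate to one another.
one_hot_dists = pairwise_distances(one_hot_tokens(3))

# Tokens placed in a continuous space can have varying distances.
proto_dists = pairwise_distances(prototypes)
```

In this sketch, every pair of distinct one-hot tokens is the same distance apart, whereas the tokens in the continuous space can sit nearer to or farther from one another, which is the property that makes clustering of related tokens possible.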