Emergent communication has made strides towards learning communication from scratch, but has focused primarily on protocols that resemble human language. In nature, multi-agent cooperation gives rise to a wide range of communication that varies in structure and complexity. In this work, we recognize the full spectrum of communication that exists in nature and propose studying lower-level communication. Specifically, we study emergent implicit signaling in the context of decentralized multi-agent learning in difficult, sparse-reward environments. Because learning to coordinate in such environments is challenging, we propose a curriculum-driven strategy that combines: (i) velocity-based environment shaping, tailored to the skill level of the multi-agent team; and (ii) a behavioral curriculum that helps agents learn successful single-agent behaviors as a precursor to learning multi-agent behaviors. Pursuit-evasion experiments show that our approach learns effective coordination, significantly outperforming sophisticated analytical and learned policies. Our method completes the pursuit-evasion task even when pursuers move at half of the evader's speed, whereas the highest-performing baseline already fails when pursuers move at 80% of the evader's speed. Moreover, we examine the use of implicit signals in coordination through position-based social influence. We show that pursuers trained with our strategy exchange more than twice as much information (in bits) as pursuers trained with baseline methods, indicating that our method has learned, and relies heavily on, the exchange of implicit signals.
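The abstract reports information exchange between pursuers in bits. As a rough illustration of what such a quantity can look like, the sketch below computes a plug-in estimate of the mutual information between two paired discrete sequences, for example one pursuer's discretized position and a teammate's chosen action. This is an assumption made for illustration only: the function name `mutual_information_bits` and the choice of a count-based estimator are hypothetical, and this is not necessarily the social-influence measure used in the work itself.

```python
import math
from collections import Counter


def mutual_information_bits(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    xs, ys: equal-length sequences of hashable values, e.g. a pursuer's
    discretized position (xs) and a teammate's action (ys). Both the
    pairing and the discretization are assumptions of this sketch.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")

    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    # The teammate's action fully determined by the observed position.
    print(mutual_information_bits([0, 1, 0, 1], [0, 1, 0, 1]))
    # The teammate's action independent of the observed position.
    print(mutual_information_bits([0, 0, 1, 1], [0, 1, 0, 1]))
```

Under this estimator, a higher value means one agent's observed position tells us more about the other agent's behavior, which is the intuition behind reading the exchanged bits as evidence of implicit signaling. Plug-in estimates are biased upward on small samples, so comparisons between methods are only meaningful at matched sample sizes.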