Explicit communication among humans is key to coordination and learning. Social learning, which uses cues from experts, can greatly benefit from explicit communication to align heterogeneous policies, reduce sample complexity, and solve partially observable tasks. Emergent communication, a type of explicit communication, studies the creation of an artificial language that encodes high task-utility messages directly from data. However, in most cases, emergent communication produces insufficiently compressed messages that carry little or no information and may not be understandable to a third-party listener. This paper proposes an unsupervised method based on the information bottleneck that captures both referential complexity and task-specific utility, in order to adequately explore sparse social communication scenarios in multi-agent reinforcement learning (MARL). We show that our model is able to i) develop a natural-language-inspired lexicon of messages that is independently composed of a set of emergent concepts, which span the observations and intents with minimal bits, ii) develop communication that aligns the action policies of heterogeneous agents with dissimilar feature models, and iii) learn a communication policy from watching an expert's action policy, which we term "social shadowing".
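As a rough illustration of the trade-off an information bottleneck imposes on messages, a generic variational information bottleneck (VIB) objective adds a beta-weighted complexity cost, KL(q(m|o) || r(m)), to a task loss; averaged over observations, that KL term upper-bounds the mutual information I(O; M) between observations and messages. The sketch below is a minimal illustration under stated assumptions, not the paper's exact objective: it assumes a diagonal-Gaussian message encoder q(m|o) and a standard-normal prior r(m), and the names `gaussian_kl`, `ib_message_loss`, and `beta` are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), measured in nats.

    Assumption: the message encoder outputs a diagonal Gaussian and the
    prior over messages is a standard normal. Averaged over observations,
    this term upper-bounds I(O; M), the message "complexity" cost.
    """
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same length")
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def nats_to_bits(x: float) -> float:
    """Convert an information quantity from nats to bits."""
    return x / math.log(2.0)


def ib_message_loss(task_loss: float, mu: Sequence[float],
                    logvar: Sequence[float], beta: float) -> float:
    """Hypothetical IB-style loss: task utility term plus beta * complexity.

    Larger beta pushes messages toward the prior (fewer bits);
    smaller beta favors task utility over compression.
    """
    return task_loss + beta * gaussian_kl(mu, logvar)


if __name__ == "__main__":
    # A message whose encoder matches the prior exactly carries zero bits.
    print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
    # Shifting the mean by one unit costs 0.5 nats of complexity.
    kl = gaussian_kl([1.0], [0.0])
    print(kl, nats_to_bits(kl))
    print(ib_message_loss(task_loss=1.0, mu=[1.0], logvar=[0.0], beta=0.1))
```

Sweeping `beta` in such a loss is one common way to trace how much information messages carry against how useful they are for the task; how the paper weighs referential complexity against utility may differ.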