Learning interpretable communication is essential for multi-agent and human-agent teams (HATs). In multi-agent reinforcement learning for partially-observable environments, agents may convey information to others via learned communication, allowing the team to complete its task. Inspired by human languages, recent works study discrete (using only a finite set of tokens) and sparse (communicating only at some time-steps) communication. However, the utility of such communication in human-agent team experiments has not yet been investigated. In this work, we analyze the efficacy of sparse-discrete methods for producing emergent communication that enables high agent-only and human-agent team performance. We develop agent-only teams that communicate sparsely via our scheme of Enforcers that sufficiently constrain communication to any budget. Our results show no loss or minimal loss of performance in benchmark environments and tasks. In human-agent teams tested in benchmark environments, where agents have been modeled using the Enforcers, we find that a prototype-based method produces meaningful discrete tokens that enable human partners to learn agent communication faster and better than a one-hot baseline. Additional HAT experiments show that an appropriate sparsity level lowers the cognitive load of humans when communicating with teams of agents and leads to superior team performance.