Object-centric representations have recently enabled significant progress in tackling relational reasoning tasks. By building a strong object-centric inductive bias into neural architectures, recent efforts have improved the generalization and data efficiency of machine learning algorithms on these problems. One class of problems involving relational reasoning that remains under-explored is multi-agent reinforcement learning (MARL). Here we investigate whether object-centric representations are also beneficial in the fully cooperative MARL setting. Specifically, we study two ways of incorporating an agent-centric inductive bias into our RL algorithm: (1) introducing an agent-centric attention module with explicit connections across agents, and (2) adding an agent-centric unsupervised predictive objective (i.e., not using action labels), used either as an auxiliary loss for MARL or as the basis of a pre-training step. We evaluate these approaches on the Google Research Football environment as well as DeepMind Lab 2D. Empirically, agent-centric representation learning leads to the emergence of more complex cooperation strategies between agents, as well as improved sample efficiency and generalization.
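To make the two agent-centric mechanisms named above more concrete, the following is a minimal, hypothetical numpy sketch (not the paper's implementation, and all names, shapes, and the MSE form of the objective are assumptions): self-attention applied across per-agent embeddings gives every agent an explicit connection to every other agent, and an action-free next-step prediction loss illustrates how such an unsupervised predictive objective could serve as an auxiliary term.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class AgentAttention:
    """Single-head self-attention across per-agent embeddings (hypothetical sketch)."""
    def __init__(self, d_model, d_k=32, seed=0):
        rng = np.random.default_rng(seed)
        scale = d_model ** -0.5
        # Random projections stand in for learned parameters.
        self.W_q = rng.normal(scale=scale, size=(d_model, d_k))
        self.W_k = rng.normal(scale=scale, size=(d_model, d_k))
        self.W_v = rng.normal(scale=scale, size=(d_model, d_model))
        self.d_k = d_k

    def __call__(self, agents):
        # agents: (num_agents, d_model). Every agent attends to every agent,
        # i.e. explicit agent-to-agent connections.
        Q, K, V = agents @ self.W_q, agents @ self.W_k, agents @ self.W_v
        weights = softmax(Q @ K.T / np.sqrt(self.d_k), axis=-1)  # (n, n)
        return weights @ V

def predictive_loss(attn, agents_t, agents_t1):
    """Action-free auxiliary objective: predict each agent's next-step
    embedding from the attention-mixed current embeddings (MSE)."""
    pred = attn(agents_t)
    return float(np.mean((pred - agents_t1) ** 2))

# Toy usage: 4 agents with 16-dimensional embeddings at consecutive timesteps.
rng = np.random.default_rng(1)
x_t, x_t1 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
attn = AgentAttention(d_model=16)
print(predictive_loss(attn, x_t, x_t1))
```

Because the loss uses only consecutive observations and no action labels, it can be minimized alongside the RL objective or used on its own as a pre-training signal, in the spirit described above.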