Intrinsically-Motivated Goal-Conditioned Reinforcement Learning in Multi-Agent Environments

How can a population of reinforcement learning agents autonomously learn a diversity of cooperative tasks in a shared environment? In the single-agent paradigm, goal-conditioned policies have been combined with intrinsic motivation mechanisms to endow agents with the ability to master a wide diversity of autonomously discovered goals. Transferring this idea to cooperative multi-agent systems (MAS) poses a challenge: intrinsically motivated agents that sample goals independently land on a shared cooperative goal only with low probability, which impairs their learning performance. In this work, we propose a new learning paradigm for modeling such settings, the Decentralized Intrinsically Motivated Skill Acquisition Problem (Dec-IMSAP), and employ it to solve cooperative navigation tasks. Agents in a Dec-IMSAP are trained in a fully decentralized way, in contrast to previous contributions in multi-goal MAS that consider a centralized goal-selection mechanism. Our empirical analysis indicates that a sufficient condition for efficiently learning a diversity of cooperative tasks is that the group aligns its goals, i.e., the agents pursue the same cooperative goal and learn to coordinate their actions through specialization. We introduce the Goal-coordination game, a fully decentralized emergent communication algorithm in which goal alignment emerges from the maximization of individual rewards in multi-goal cooperative environments, and show that it matches the performance of a centralized training baseline that guarantees aligned goals. To our knowledge, this is the first contribution addressing the problem of intrinsically motivated multi-agent goal exploration in a decentralized training paradigm.
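To make the low-probability claim concrete, the following is a minimal sketch, not the paper's method. It assumes each agent samples its goal uniformly and independently from a finite set of goals; the uniform distribution, the function names `alignment_probability` and `simulate_alignment`, and the parameter values are all illustrative assumptions that do not appear in the abstract. Under that assumption, the chance that all agents pick the same goal is (1/K)^(n-1) for n agents and K goals.

```python
from fractions import Fraction
import random


def alignment_probability(num_agents: int, num_goals: int) -> Fraction:
    """Exact probability that all agents pick the same goal.

    Assumes (for illustration only) that every agent samples uniformly
    and independently from `num_goals` goals.
    """
    if num_agents < 1 or num_goals < 1:
        raise ValueError("num_agents and num_goals must be positive")
    # The first agent may pick any goal; each other agent matches it
    # with probability 1 / num_goals.
    return Fraction(1, num_goals) ** (num_agents - 1)


def simulate_alignment(num_agents: int, num_goals: int,
                       episodes: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    aligned = 0
    for _ in range(episodes):
        goals = [rng.randrange(num_goals) for _ in range(num_agents)]
        # Goals are aligned when every agent drew the same index.
        aligned += len(set(goals)) == 1
    return aligned / episodes


if __name__ == "__main__":
    for n, k in [(2, 6), (3, 6), (3, 20)]:
        exact = alignment_probability(n, k)
        estimate = simulate_alignment(n, k, episodes=20000)
        print(f"agents={n} goals={k} exact={float(exact):.4f} "
              f"simulated={estimate:.4f}")
```

The exact value shrinks geometrically with the number of agents, which is one way to read why independent goal sampling rarely yields a shared cooperative goal and why an alignment mechanism is needed.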
