Multi-agent reinforcement learning (MARL) is a promising framework for solving complex tasks with many agents. However, a key challenge in MARL is defining private utility functions that ensure coordination when training decentralized agents. This challenge is especially prevalent in unstructured tasks with sparse rewards and many agents. We show that successor features can help address this challenge by disentangling an individual agent's impact on the global value function from that of all other agents. We use this disentanglement to compactly represent private utilities that support stable training of decentralized agents in unstructured tasks. We implement our approach using a centralized-training, decentralized-execution architecture and test it in a variety of multi-agent environments. Our results show improved performance and reduced training time relative to existing methods, and suggest that disentanglement of successor features offers a promising approach to coordination in MARL.
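The core object behind the approach is the successor-feature decomposition of value, in which the value function factors linearly into expected discounted features and reward weights. The following is a minimal illustrative sketch of that identity, not the paper's implementation; the names (`phi`, `psi`, `w`) and the tiny tabular dimensions are assumptions for illustration.

```python
import numpy as np

# Successor features decompose value as Q(s, a) = psi(s, a) . w, where
# phi(s, a) are per-step features, psi accumulates their expected discounted
# sum, and w are reward weights such that r(s, a) = phi(s, a) . w.
# All shapes and names below are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_actions, d = 4, 2, 3          # tiny tabular problem

phi = rng.normal(size=(n_states, n_actions, d))  # per-step feature vectors
w = rng.normal(size=d)                           # reward weights

# The immediate reward decomposes linearly through the features:
r = phi @ w                                      # shape (n_states, n_actions)

# One step of the successor-feature recursion under a fixed policy:
# psi(s, a) = phi(s, a) + gamma * E_{s', a'}[ psi(s', a') ].
gamma = 0.9
psi = phi.copy()                                 # initialize psi with phi
expected_next = psi.mean(axis=(0, 1))            # uniform-policy expectation
psi = phi + gamma * expected_next                # one backup of the recursion

# Because psi is linear in phi, Q = psi . w inherits the same linear
# structure that the paper exploits to disentangle per-agent contributions.
q = psi @ w                                      # shape (n_states, n_actions)
```

The key property the sketch illustrates is linearity: both `r` and `q` are inner products with the same weight vector `w`, which is what lets per-agent feature contributions be separated out of a shared value function.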