Recently, deep multi-agent reinforcement learning (MARL) has shown promise in solving complex cooperative tasks. Its success is partly attributable to parameter sharing among agents. However, such sharing may lead agents to behave similarly and limit their coordination capacity. In this paper, we aim to introduce diversity into both the optimization and the representation of shared multi-agent reinforcement learning. Specifically, we propose an information-theoretic regularizer that maximizes the mutual information between agents' identities and their trajectories, encouraging extensive exploration and diverse individualized behaviors. For representation, we incorporate agent-specific modules into the shared neural network architecture and regularize them with the L1 norm, encouraging agents to share learned knowledge while preserving necessary diversity. Empirical results show that our method achieves state-of-the-art performance on Google Research Football and super-hard StarCraft II micromanagement tasks.
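To make the two ingredients concrete, the following is a minimal sketch, not the paper's actual implementation: a shared network with small per-agent modules whose parameters carry an L1 penalty, and a mutual-information term approximated by the standard Barber-Agakov variational lower bound (a classifier predicting an agent's identity from its trajectory embedding). All names (`SharedAgentNet`, `IdentityClassifier`), layer sizes, and loss coefficients are hypothetical illustrations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAgentNet(nn.Module):
    """Shared trunk plus lightweight agent-specific heads (hypothetical sizes)."""
    def __init__(self, obs_dim, n_actions, n_agents, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.shared_head = nn.Linear(hidden, n_actions)
        # One small agent-specific module per agent; an L1 penalty pushes
        # these weights toward zero, so behavior stays mostly shared and
        # deviates only where agent-specific diversity pays off.
        self.agent_heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_agents)]
        )

    def forward(self, obs, agent_id):
        h = self.shared(obs)
        # Shared output plus a sparse agent-specific correction.
        return self.shared_head(h) + self.agent_heads[agent_id](h)

def l1_penalty(net):
    """L1 norm over the agent-specific parameters only."""
    return sum(p.abs().sum()
               for head in net.agent_heads for p in head.parameters())

class IdentityClassifier(nn.Module):
    """Variational posterior q(id | trajectory embedding) for the
    Barber-Agakov lower bound on I(identity; trajectory)."""
    def __init__(self, traj_dim, n_agents):
        super().__init__()
        self.fc = nn.Linear(traj_dim, n_agents)

    def forward(self, traj_emb):
        return self.fc(traj_emb)

def mi_lower_bound_loss(classifier, traj_emb, agent_ids):
    # Maximizing E[log q(id | tau)] lower-bounds I(id; tau) up to the
    # constant H(id), so minimizing this cross-entropy maximizes the bound.
    return F.cross_entropy(classifier(traj_emb), agent_ids)
```

Under these assumptions, the terms would enter a combined objective such as `loss = td_loss + alpha * mi_loss + beta * l1_penalty(net)`, where `alpha` and `beta` are hypothetical coefficients trading off diversity against sharing.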