In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration. On one hand, agents can hardly accomplish the team task without coordination; on the other hand, without enough individual exploration they are prone to being trapped in a local optimum where only easily reached cooperation is attained. Recent works mainly concentrate on agents' coordinated exploration, which brings about exponential growth of the explored state space. To address this issue, we propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to achieve success in team tasks by adaptively finding a trade-off between self-exploration and team cooperation. In SMMAE, we train an independent exploration policy for each agent that maximizes its own visited state space, and each agent learns an adjustable exploration probability based on the stability of the joint team policy. Experiments on highly cooperative tasks in the StarCraft II micromanagement benchmark (SMAC) demonstrate that SMMAE explores task-related states more efficiently, accomplishes coordinated behaviours, and boosts learning performance.
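The core mechanism described above, mixing a per-agent exploration policy with the joint team policy via an adjustable probability tied to team-policy stability, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the stability threshold, and the direction of adaptation (explore more once the team policy has stabilized) are all assumptions for exposition.

```python
import random


class SelfMotivatedAgent:
    """Hypothetical sketch of SMMAE-style action selection (names assumed)."""

    def __init__(self, explore_prob=0.5, lr=0.1, stability_threshold=0.05):
        self.explore_prob = explore_prob  # adjustable exploration probability
        self.lr = lr                      # adaptation rate for explore_prob
        self.stability_threshold = stability_threshold

    def select_action(self, team_action, explore_action):
        # With probability explore_prob, follow the agent's own exploration
        # policy; otherwise follow the joint team policy.
        if random.random() < self.explore_prob:
            return explore_action
        return team_action

    def update_explore_prob(self, policy_change):
        # policy_change: a scalar measure of how much the joint team policy
        # shifted recently (assumed available from training).
        # Assumed rule: when the team policy is stable (small change), the
        # agent can afford more self-exploration; when it is still shifting,
        # the agent relies on the team policy.
        target = 1.0 if policy_change < self.stability_threshold else 0.0
        self.explore_prob += self.lr * (target - self.explore_prob)
        self.explore_prob = min(max(self.explore_prob, 0.0), 1.0)
```

Under this sketch, each agent's behaviour policy is a probabilistic mixture of its self-exploration policy and the shared team policy, and the mixture weight drifts toward exploration only as the joint policy settles.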