Cooperative multi-agent reinforcement learning (MARL) is making rapid progress on tasks in grid-world and real-world scenarios, in which agents are given different attributes and goals, resulting in different behaviors throughout the multi-agent task. In this study, we quantify the agents' behavioral differences and relate them to policy performance via {\bf Role Diversity}, a metric that measures the characteristics of MARL tasks. We define role diversity from three perspectives: action-based, trajectory-based, and contribution-based, to fully characterize a multi-agent task. Through theoretical analysis, we find that the error bound in MARL can be decomposed into three parts that are strongly related to role diversity. The decomposed factors can significantly impact policy optimization in three popular directions: parameter sharing, the communication mechanism, and credit assignment. Our main experimental platforms are the {\bf Multiagent Particle Environment (MPE)} and {\bf The StarCraft Multi-Agent Challenge (SMAC)}. Extensive experiments clearly show that role diversity can serve as a robust measurement of the characteristics of a multi-agent cooperation task and help diagnose whether the policy fits the current multi-agent system, leading to better policy performance.