In this paper, we propose a novel benchmark called the StarCraft Multi-Agent Challenges+ (SMAC+), in which agents learn to perform multi-stage tasks and to exploit environmental factors without precise reward functions. The previous challenge (SMAC), recognized as a standard benchmark for Multi-Agent Reinforcement Learning (MARL), is mainly concerned with ensuring that all agents cooperatively eliminate approaching adversaries solely through fine-grained micro-control guided by obvious reward functions. This challenge, in contrast, focuses on the exploration capability of MARL algorithms to efficiently learn implicit multi-stage tasks and environmental factors as well as micro-control. This study covers both offensive and defensive scenarios. In the offensive scenarios, agents must learn to first find opponents and then eliminate them. The defensive scenarios require agents to use topographic features; for example, agents need to position themselves behind protective structures to make it harder for enemies to attack them. We investigate MARL algorithms under SMAC+ and observe that recent approaches work well in settings similar to the previous challenges but perform poorly in offensive scenarios. Additionally, we observe that an enhanced exploration approach has a positive effect on performance but cannot completely solve all scenarios. This study proposes new directions for future research.