Multi-agent reinforcement learning (MARL) can model many real-world applications. However, many MARL approaches rely on epsilon-greedy exploration, which may fail to reach advantageous states in hard scenarios. In this paper, we propose a new approach, QMIX(SEG), for tackling MARL. It combines the value function factorization method QMIX, which trains per-agent policies, with a novel Semantic Epsilon Greedy (SEG) exploration strategy. SEG is a simple extension of conventional epsilon-greedy exploration, yet it is shown experimentally to greatly improve the performance of MARL. We first cluster actions into groups of actions with similar effects, and then use these groups in a bi-level epsilon-greedy exploration hierarchy for action selection. We argue that SEG facilitates semantic exploration because it explores in the space of action groups, which carry richer semantic meaning than atomic actions. Experiments show that QMIX(SEG) substantially outperforms QMIX and achieves performance competitive with current state-of-the-art MARL approaches on the StarCraft Multi-Agent Challenge (SMAC) benchmark.
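To make the bi-level hierarchy concrete, the following is a minimal sketch of one plausible reading of SEG action selection for a single agent. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `seg_select`, the group labels, and the choice to reuse the same epsilon at both levels are hypothetical, and the action groups are taken as given rather than learned by clustering.

```python
import random
from typing import Dict, List, Optional, Sequence


def seg_select(
    q_values: Sequence[float],
    groups: Dict[str, List[int]],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index with a two-level epsilon-greedy rule.

    Assumptions (not taken from the paper): `groups` partitions the action
    indices, and the same `epsilon` is used at both levels.
    """
    if rng is None:
        rng = random.Random()

    def greedy(candidates):
        # Action with the highest Q-value among the candidates.
        return max(candidates, key=lambda a: q_values[a])

    # Level 1: explore over groups, or keep the group of the greedy action.
    if rng.random() < epsilon:
        group = rng.choice(sorted(groups))
    else:
        best = greedy(range(len(q_values)))
        group = next(g for g, acts in groups.items() if best in acts)
    members = groups[group]

    # Level 2: explore inside the chosen group, or act greedily within it.
    if rng.random() < epsilon:
        return rng.choice(members)
    return greedy(members)


# Hypothetical Q-values and action groups for one agent.
q = [0.1, 0.9, 0.3, 0.2, 0.5]
groups = {"move": [0, 1], "attack": [2, 3], "stop": [4]}
action = seg_select(q, groups, epsilon=0.0)  # epsilon = 0 is plain greedy
```

In this sketch a group-level exploration step followed by a greedy step inside the group yields the best action of a randomly chosen group, which is the sense in which exploration happens over groups rather than over atomic actions.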