In this paper, we identify the best learning scenario to train a team of agents to compete against multiple possible strategies of opposing teams. We evaluate cooperative value-based methods in a mixed cooperative-competitive environment. We restrict ourselves to the case of a symmetric, partially observable, two-team Markov game. We selected three training methods based on the centralised training and decentralised execution (CTDE) paradigm: QMIX, MAVEN and QVMix. For each method, we considered three learning scenarios differentiated by the variety of team policies encountered during training. For our experiments, we modified the StarCraft Multi-Agent Challenge environment to create competitive environments where both teams could learn and compete simultaneously. Our results suggest that training against multiple evolving strategies achieves the best performance when the trained teams are evaluated against several opposing strategies.