Multi-agent football poses an unsolved challenge in AI research. Existing work has focused on tackling simplified scenarios of the game, or else on leveraging expert demonstrations. In this paper, we develop a multi-agent system to play the full 11 vs. 11 game mode, without demonstrations. This game mode contains aspects that present major challenges to modern reinforcement learning algorithms: multi-agent coordination, long-term planning, and non-transitivity. To address these challenges, we present TiZero, a self-evolving, multi-agent system that learns from scratch. TiZero introduces several innovations, including adaptive curriculum learning, a novel self-play strategy, and an objective that optimizes the policies of multiple agents jointly. Experimentally, it outperforms previous systems by a large margin on the Google Research Football environment, increasing win rates by over 30%. To demonstrate the generality of TiZero's innovations, they are assessed on several environments beyond football: Overcooked, Multi-Agent Particle Environment, Tic-Tac-Toe, and Connect-Four.