Towards Playing Full MOBA Games with Deep Reinforcement Learning

Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

From arXiv, NeurIPS 2020

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems, such as multi-agent settings, enormous state-action spaces, and complex action control. Developing AI for playing MOBA games has accordingly attracted much attention. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool; for example, OpenAI's Dota AI limits play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including off-policy adaption, multi-head value estimation, curriculum self-play learning, policy distillation, and Monte Carlo tree search, for training and playing a large pool of heroes while addressing the scalability issue. Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of MOBA AI agents in the literature.
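
To give a feel for the "explosion of agent combinations" mentioned in the abstract, the following is a minimal sketch that counts possible lineups for a given hero pool. It rests on assumptions not stated in the abstract: matches are 5v5, a hero cannot appear twice in the same match, and the two teams are counted as distinct sides. The function name `count_lineups` and the pool size of 40 are hypothetical and chosen only for illustration.

```python
from math import comb


def count_lineups(pool_size: int, team_size: int = 5) -> int:
    """Count (team A, team B) lineups drawn without repeats from one hero pool.

    Assumption: team A picks `team_size` heroes, then team B picks
    `team_size` heroes from those that remain.
    """
    if pool_size < 2 * team_size:
        return 0  # not enough heroes to fill both teams
    first_team = comb(pool_size, team_size)
    second_team = comb(pool_size - team_size, team_size)
    return first_team * second_team


if __name__ == "__main__":
    # 17 is the pool size cited in the abstract; 40 is an illustrative larger pool.
    for pool_size in (17, 40):
        print(f"{pool_size} heroes: {count_lineups(pool_size):,} lineups")
```

Under these assumptions, a 17-hero pool gives about 4.9 million lineups, while a 40-hero pool gives over 200 billion, which illustrates why a method that works for a small pool may not scale to a full one.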
