MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems, such as multi-agent coordination, an enormous state-action space, and complex action control. Developing AI for playing MOBA games has accordingly raised much attention. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool; for example, OpenAI's Dota AI limits play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including curriculum self-play learning, policy distillation, off-policy adaptation, multi-head value estimation, and Monte-Carlo tree search, to train and play a large pool of heroes while addressing the scalability issue. Testing on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of MOBA AI agents in the literature.
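To make the multi-head value estimation idea concrete, below is a minimal sketch assuming a PyTorch-style setup: a shared state encoder feeds several scalar value heads, one per reward component, and the critic's value is their weighted sum. The module layout, component count, and weighting scheme are our illustrative assumptions, not the paper's actual implementation.

    # A minimal sketch of multi-head value estimation, assuming PyTorch.
    # All names (MultiHeadValue, head_weights, etc.) are hypothetical.
    import torch
    import torch.nn as nn

    class MultiHeadValue(nn.Module):
        def __init__(self, state_dim: int, hidden_dim: int, num_heads: int):
            super().__init__()
            # Shared encoder for the game-state features.
            self.encoder = nn.Sequential(
                nn.Linear(state_dim, hidden_dim),
                nn.ReLU(),
            )
            # One scalar value head per reward component
            # (e.g., gold, kills, towers in a MOBA setting).
            self.heads = nn.ModuleList(
                [nn.Linear(hidden_dim, 1) for _ in range(num_heads)]
            )
            # Learnable weights for combining the per-head estimates.
            self.head_weights = nn.Parameter(torch.ones(num_heads))

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            h = self.encoder(state)
            # Per-head predictions stacked to shape (batch, num_heads).
            values = torch.cat([head(h) for head in self.heads], dim=-1)
            # Final value estimate is a weighted sum of the components.
            return (values * self.head_weights).sum(dim=-1)

    # Usage: value estimates for a batch of 4 states, 8 reward components.
    model = MultiHeadValue(state_dim=128, hidden_dim=256, num_heads=8)
    v = model(torch.randn(4, 128))  # shape: (4,)

The design intuition is that each head can be trained against the return of its own reward component, so individual reward signals are not drowned out when mixed into a single scalar target.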