Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

Deheng Ye, Zhao Liu, Mingfei Sun, Bei Shi, Peilin Zhao, Hao Wu, Hongsheng Yu, Shaojie Yang, Xipeng Wu, Qingwei Guo, Qiaobo Chen, Yinyuting Yin, Hao Zhang, Tengfei Shi, Liang Wang, Qiang Fu, Wei Yang, Lanxiao Huang

From arXiv, AAAI 2020

We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari series, which makes it very difficult to search any policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, our AI agent, called Tencent Solo, can defeat top professional human players in full 1v1 games.
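The abstract names dual-clip PPO without defining it. As a rough sketch, assuming the usual description of the technique: the standard PPO clipped surrogate is kept, and when the advantage is negative the objective is additionally bounded below by `c * advantage` with `c > 1`, so that a very large probability ratio cannot produce an unbounded negative objective. The function name and the defaults `epsilon=0.2` and `c=3.0` are illustrative assumptions, not values taken from this page.

```python
def dual_clip_ppo_objective(ratio, advantage, epsilon=0.2, c=3.0):
    """Per-sample dual-clip PPO surrogate objective (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s), a positive float
    advantage : estimated advantage of the action taken
    epsilon   : standard PPO clip range (assumed default)
    c         : lower-bound constant, must be > 1 (assumed default)
    """
    # Standard PPO: clip the ratio to [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    standard = min(ratio * advantage, clipped_ratio * advantage)

    # Second clip: only for negative advantages, bound the objective
    # from below by c * advantage.
    if advantage < 0:
        return max(standard, c * advantage)
    return standard


if __name__ == "__main__":
    # Large ratio with a negative advantage: plain PPO would give -10.0,
    # the dual clip limits it to c * advantage.
    print(dual_clip_ppo_objective(ratio=10.0, advantage=-1.0))
```

For positive advantages the function reduces to ordinary clipped PPO; the extra bound only changes the result when the ratio is large and the advantage is negative.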


Related content

Deep reinforcement learning


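Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. The main task of traditional reinforcement learning is for an agent to learn, from the rewards it receives from the environment, the behavior that maximizes reward. Traditional model-free reinforcement learning methods, however, rely on function approximation for the agent to learn a value function or a policy. In that setting, the strong function approximation ability of deep learning naturally became the best replacement for hand-specified features, and it made better-performing end-to-end learning possible.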
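To make "function approximation with hand-specified features" concrete, here is a minimal sketch of one semi-gradient TD(0) update for a linear value function. The function name, the feature vectors, and the step size `alpha` and discount `gamma` are hypothetical choices for illustration. In DRL, a neural network would take the place of the fixed feature vector and the linear weights.

```python
def td0_linear_update(weights, features, reward, next_features,
                      alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step for V(s) = sum(w_i * x_i(s)).

    weights       : current weight list
    features      : hand-specified feature vector of the current state
    next_features : feature vector of the next state
    alpha, gamma  : step size and discount factor (assumed values)
    """
    value = sum(w * x for w, x in zip(weights, features))
    next_value = sum(w * x for w, x in zip(weights, next_features))

    # TD error: how much the one-step target differs from the estimate.
    td_error = reward + gamma * next_value - value

    # For a linear model the gradient of V(s) w.r.t. the weights
    # is just the feature vector.
    return [w + alpha * td_error * x for w, x in zip(weights, features)]


if __name__ == "__main__":
    new_weights = td0_linear_update(
        weights=[0.0, 0.0],
        features=[1.0, 0.0],
        reward=1.0,
        next_features=[0.0, 1.0],
    )
    print(new_weights)
```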