DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

Games are abstractions of the real world, where artificial agents learn to compete and cooperate with other agents. While significant achievements have been made in various perfect- and imperfect-information games, DouDizhu (a.k.a. Fighting the Landlord), a three-player card game, is still unsolved. DouDizhu is a very challenging domain with competition, collaboration, imperfect information, a large state space, and in particular a massive set of possible actions, where the legal actions vary significantly from turn to turn. Unfortunately, modern reinforcement learning algorithms mainly focus on simple and small action spaces and, not surprisingly, have been shown to make unsatisfactory progress in DouDizhu. In this work, we propose a conceptually simple yet effective DouDizhu AI system, namely DouZero, which enhances traditional Monte-Carlo methods with deep neural networks, action encoding, and parallel actors. Starting from scratch on a single server with four GPUs, DouZero outperformed all existing DouDizhu AI programs within days of training and was ranked first on the Botzone leaderboard among 344 AI agents. Through building DouZero, we show that classic Monte-Carlo methods can be made to deliver strong results in a hard domain with a complex action space. The code and an online demo are released at https://github.com/kwai/DouZero with the hope that this insight could motivate future work.
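
To make the combination of Monte-Carlo value estimation and action encoding more concrete, the following is a minimal sketch, not the released DouZero code. It assumes a simplified per-rank count encoding of cards and substitutes a linear approximator for the deep network; the abstract specifies neither. All names (`RANKS`, `encode_cards`, `LinearQ`, `choose_action`, `mc_update`) are hypothetical, and parallel actors are omitted.

```python
import random

# Assumed rank labels: 3..A, 2, then black joker (B) and red joker (R).
RANKS = ["3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K", "A", "2", "B", "R"]


def encode_cards(cards):
    """Encode a multiset of cards as a vector of per-rank counts."""
    counts = [0] * len(RANKS)
    for card in cards:
        counts[RANKS.index(card)] += 1
    return counts


class LinearQ:
    """Linear stand-in for the deep Q-network: Q(hand, action) = w . x."""

    def __init__(self):
        self.w = [0.0] * (2 * len(RANKS))

    def features(self, hand, action):
        # Because the action is encoded as an input feature, one model can
        # score any legal move, however the legal set changes between turns.
        return encode_cards(hand) + encode_cards(action)

    def predict(self, hand, action):
        return sum(w * x for w, x in zip(self.w, self.features(hand, action)))

    def update(self, hand, action, target, lr=0.05):
        # One gradient step on the squared error toward the sampled return.
        x = self.features(hand, action)
        err = target - self.predict(hand, action)
        self.w = [w + lr * err * xi for w, xi in zip(self.w, x)]


def choose_action(q, hand, legal_actions, epsilon, rng):
    """Epsilon-greedy selection restricted to the current legal actions."""
    if rng.random() < epsilon:
        return rng.choice(legal_actions)
    return max(legal_actions, key=lambda a: q.predict(hand, a))


def mc_update(q, episode, final_return, lr=0.05):
    """Every-visit Monte Carlo: regress each visited pair toward the return."""
    for hand, action in episode:
        q.update(hand, action, final_return, lr)


# Toy usage: a one-step episode in which playing the pair of 3s wins (+1).
q = LinearQ()
hand = ["3", "3", "A"]
for _ in range(50):
    mc_update(q, [(hand, ["3", "3"])], final_return=1.0)
best = choose_action(q, hand, [["3", "3"], ["A"]], epsilon=0.0, rng=random.Random(0))
```

In this toy run the estimate for the rewarded move converges toward the observed return, and greedy selection then prefers it over the alternative. A realistic setup would replace `LinearQ` with a neural network and generate episodes through self-play.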
