As a challenging multi-player card game, DouDizhu has recently drawn much attention for analyzing competition and collaboration in imperfect-information games. In this paper, we propose PerfectDou, a state-of-the-art DouDizhu AI system that dominates the game, built on an actor-critic framework with a proposed technique named perfect information distillation. Specifically, we adopt a perfect-training-imperfect-execution framework: during training, the agents use global information to guide policy learning as if the game were a perfect-information game, while the trained policies play the imperfect-information game during actual gameplay. To this end, we characterize card and game features for DouDizhu to represent the perfect and imperfect information. To train our system, we adopt proximal policy optimization (PPO) with generalized advantage estimation (GAE) in a parallel training paradigm. In experiments, we show how and why PerfectDou beats all existing AI programs and achieves state-of-the-art performance.
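To make the advantage-estimation step concrete, the following is a minimal sketch of the standard generalized advantage estimation recursion in plain Python. It is not taken from the PerfectDou implementation: the function name `compute_gae`, its arguments, and the default `gamma` and `lam` values are illustrative assumptions. In a perfect-training-imperfect-execution setup, the `values` list would presumably come from a critic that sees the perfect-information features, while the policy being updated sees only the imperfect-information features.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,  # discount factor (assumed default, not from the paper)
    lam: float = 0.95,    # GAE smoothing parameter (assumed default)
) -> List[float]:
    """Return one advantage estimate per time step.

    `values` must hold len(rewards) + 1 entries: the critic's value for
    every visited state plus a bootstrap value for the state after the
    last step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    if len(dones) != len(rewards):
        raise ValueError("dones must have the same length as rewards")

    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # Do not bootstrap or propagate advantage across an episode boundary.
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages
```

Adding each advantage to the matching entry of `values[:-1]` gives the return targets commonly used to fit the critic in PPO-style training.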