Card game AI has long been a hot topic in artificial intelligence research. In recent years, complex card games such as Mahjong, DouDizhu, and Texas Hold'em have been solved, and the corresponding AI programs have reached the level of human experts. In this paper, we are devoted to developing an AI program for a more complex card game, GuanDan, whose rules are similar to those of DouDizhu but much more complicated. Specifically, the large state and action spaces, the long length of a single episode, and the variable number of active players in GuanDan pose great challenges for developing such a program. To address these issues, we propose DanZero, the first AI program for GuanDan, built with reinforcement learning techniques. We utilize a distributed framework to train our AI system: in the actor processes, agents generate samples via self-play using carefully designed state features, and in the learner process, the model is updated with the Deep Monte-Carlo method. After training for 30 days on 160 CPUs and 1 GPU, we obtain our DanZero bot. We compare it with 8 baseline AI programs based on heuristic rules, and the results reveal the outstanding performance of DanZero. We also test DanZero against human players and demonstrate its human-level performance.
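The learner update described above can be illustrated with a minimal sketch of a Deep Monte-Carlo step: the value model regresses its Q(s, a) estimates toward the full episode returns observed in self-play, with no bootstrapping. All names here are illustrative, and a linear model stands in for the deep network to keep the example self-contained; the actual DanZero architecture and features are not reproduced.

```python
import numpy as np

def dmc_update(w, features, returns, lr=0.1):
    """One learner step: gradient descent on the MSE between the
    model's Q(s, a) predictions and the Monte-Carlo episode returns."""
    preds = features @ w                          # Q(s, a) estimates
    grad = features.T @ (preds - returns) / len(returns)
    return w - lr * grad

# Toy self-play batch: random state-action features and stand-in
# episode returns generated from a hidden linear target.
rng = np.random.default_rng(0)
w = np.zeros(8)
feats = rng.normal(size=(64, 8))
true_w = rng.normal(size=8)
rets = feats @ true_w

for _ in range(2000):
    w = dmc_update(w, feats, rets)

err = float(np.abs(w - true_w).max())             # fitted weights approach the target
```

Because the target is a plain Monte-Carlo return rather than a bootstrapped estimate, each update is an ordinary supervised regression step, which is what makes the method easy to scale across many actor processes.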