Mahjong is a popular multi-player imperfect-information game developed in China in the late 19th century, with some very challenging features for AI research. Sanma, a 3-player variant of Japanese Riichi Mahjong, has unique characteristics, including fewer tiles and, consequently, a more aggressive playing style. It is therefore challenging and of great research interest in its own right, yet it has not been explored so far. In this paper, we present Meowjong, an AI for Sanma based on deep reinforcement learning. We define an informative and compact 2-dimensional data structure for encoding the observable information in a Sanma game. We pre-train 5 convolutional neural networks (CNNs) for Sanma's 5 actions (discard, Pon, Kan, Kita and Riichi) and enhance the model for the major action, namely the discard model, via self-play reinforcement learning using the Monte Carlo policy gradient method. Through supervised learning, Meowjong's models achieve test accuracies comparable with those of AIs for 4-player Mahjong, and reinforcement learning yields a significant further improvement. As the first AI for Sanma, Meowjong can, we claim, be regarded as the state of the art in this game.
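The abstract does not spell out the layout of the 2-dimensional data structure, so the following is only a minimal sketch of one common way to encode tiles as a 2D plane, assuming a Suphx-style scheme adapted to Sanma. In Sanma the 2m to 8m tiles are removed, leaving 27 tile kinds (108 tiles). Each multiset of tiles becomes a 4 x 27 binary plane in which row k, column j is 1 if at least k + 1 copies of tile kind j are present. The tile names, the ordering in `SANMA_TILES`, and the helpers `encode_tiles` and `encode_state` are hypothetical illustrations, not Meowjong's actual encoding.

```python
# Hypothetical Sanma tile ordering: 2m-8m do not exist, leaving 27 tile kinds.
SANMA_TILES = (
    ["1m", "9m"]
    + [f"{i}p" for i in range(1, 10)]
    + [f"{i}s" for i in range(1, 10)]
    + ["E", "S", "W", "N", "haku", "hatsu", "chun"]
)
TILE_INDEX = {t: j for j, t in enumerate(SANMA_TILES)}


def encode_tiles(tiles):
    """Encode a multiset of tiles as a 4 x 27 binary plane (list of rows).

    Row k, column j is 1 if the multiset holds at least k + 1 copies of
    tile kind j. This is an assumed scheme, not the paper's exact layout.
    """
    counts = [0] * len(SANMA_TILES)
    for t in tiles:
        j = TILE_INDEX[t]  # KeyError for tiles absent in Sanma, e.g. "5m"
        counts[j] += 1
        if counts[j] > 4:
            raise ValueError(f"more than 4 copies of {t}")
    plane = [[0] * len(SANMA_TILES) for _ in range(4)]
    for j, c in enumerate(counts):
        for k in range(c):
            plane[k][j] = 1
    return plane


def encode_state(hand, discards_per_player):
    """Stack planes vertically: own hand, then one plane per player's discards.

    With 3 players this yields a 16 x 27 matrix. A full encoder would add
    further planes (melds, Kita tiles, dora indicators, ...), omitted here.
    """
    rows = encode_tiles(hand)
    for pile in discards_per_player:
        rows.extend(encode_tiles(pile))
    return rows
```

For example, `encode_tiles(["1m", "1m", "5p"])` sets rows 0 and 1 of the `1m` column and row 0 of the `5p` column. The thermometer-style rows let a CNN's convolutions see both which tiles are present and how many copies, while keeping the input compact.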
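The Monte Carlo policy gradient method named in the abstract (REINFORCE) can be illustrated with a small sketch. The abstract gives neither the reward nor the network details, so this example makes several assumptions. A hypothetical linear softmax policy stands in for the discard CNN. Illegal discards are masked out. Every discard decision in a self-play game is credited with that game's final reward, minus an optional baseline, without discounting. The names `LinearDiscardPolicy`, `softmax_masked` and `reinforce_update` are illustrative only.

```python
import math
import random


def softmax_masked(logits, legal):
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    m = max(l for l, ok in zip(logits, legal) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, legal)]
    z = sum(exps)
    return [e / z for e in exps]


class LinearDiscardPolicy:
    """Hypothetical stand-in for the discard CNN: logits = W x, one row per tile kind."""

    def __init__(self, n_features, n_actions=27, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(n_features)]
                  for _ in range(n_actions)]

    def probs(self, x, legal):
        logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]
        return softmax_masked(logits, legal)

    def sample(self, x, legal, rng):
        p = self.probs(x, legal)
        return rng.choices(range(len(p)), weights=p, k=1)[0]


def reinforce_update(policy, episode, final_reward, lr=0.01, baseline=0.0):
    """One REINFORCE step: W += lr * (G - b) * grad log pi(a | x) per decision.

    episode: list of (features, legal_mask, action) tuples from one game.
    """
    advantage = final_reward - baseline
    for x, legal, a in episode:
        p = policy.probs(x, legal)
        for b, row in enumerate(policy.w):
            # d log pi(a) / d logit_b = 1[a == b] - p_b; zero for masked actions
            g = (1.0 if b == a else 0.0) - p[b]
            if g == 0.0:
                continue
            # d logit_b / d W[b][i] = x_i
            for i, xi in enumerate(x):
                row[i] += lr * advantage * g * xi
```

In a self-play loop, one would record `(features, legal_mask, action)` for each discard sampled with `policy.sample`, compute the game's outcome as a reward, and call `reinforce_update` once per game. Subtracting a baseline (for instance, a running mean of rewards) leaves the gradient estimate unbiased while reducing its variance.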