Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning

AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.
