Games have a long history of serving as a benchmark for progress in artificial intelligence. Recently, approaches using search and learning have shown strong performance across a set of perfect information games, and approaches using game-theoretic reasoning and learning have shown strong performance for specific imperfect information poker variants. We introduce Player of Games, a general-purpose algorithm that unifies previous approaches, combining guided search, self-play learning, and game-theoretic reasoning. Player of Games is the first algorithm to achieve strong empirical performance in both large perfect information games and large imperfect information games, an important step towards truly general algorithms for arbitrary environments. We prove that Player of Games is sound, converging to perfect play as available computation time and approximation capacity increase. Player of Games reaches strong performance in chess and Go, beats the strongest openly available agent in heads-up no-limit Texas hold'em poker (Slumbot), and defeats the state-of-the-art agent in Scotland Yard, an imperfect information game that illustrates the value of guided search, learning, and game-theoretic reasoning.