In imperfect-information games, subgame solving is significantly more challenging than in perfect-information games, but techniques for it have been developed in the last few years. They were the key ingredient in the milestone of superhuman play in no-limit Texas hold'em poker. Current subgame-solving techniques analyze the entire common-knowledge closure of the player's current information set (infoset), that is, the smallest set of nodes within which it is common knowledge that the current node lies. However, this set is too large to handle in many games. We introduce an approach that overcomes this obstacle by instead working with only low-order knowledge. Our approach allows an agent, upon arriving at an infoset, to prune any node that is no longer reachable, thereby massively reducing the game tree size relative to the common-knowledge subgame. We prove that, applied as is, our approach can increase exploitability relative to the blueprint strategy. However, we develop three avenues by which safety can be guaranteed. First, safety is guaranteed if the results of subgame solves are incorporated back into the blueprint. Second, we provide a method in which safety is achieved by limiting the infosets at which subgame solving is performed. Third, we prove that our approach, when applied at every infoset reached during play, achieves a weaker notion of equilibrium, which we term affine equilibrium and which may be of independent interest. We show that affine equilibria cannot be exploited by any Nash strategy of the opponent, so an opponent who wishes to exploit must open herself to counter-exploitation. Even without the safety-guaranteeing additions, experiments on medium-sized games show that our approach always reduced exploitability, even when applied at every infoset, and a depth-limited version of it led to what is, to our knowledge, the first strong AI for the massive challenge problem of dark chess.
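The gap between the common-knowledge closure and a low-order knowledge set can be illustrated on a toy example. The sketch below is not the paper's algorithm: it assumes a game reduced to a flat collection of nodes and a list of infosets (for either player), and it assumes an indexing in which order 1 is the current infoset itself and each further order adds every node that shares an infoset with a node already included. The names `knowledge_set` and `INFOSETS` are hypothetical. Iterating to a fixed point gives the common-knowledge closure under these assumptions.

```python
# Toy illustration only: nodes are labels, and each infoset is a set of
# nodes that one of the players cannot tell apart.
INFOSETS = [
    {"a", "b"}, {"c", "d"}, {"e"},   # assumed infosets of player 1
    {"a"}, {"b", "c"}, {"d", "e"},   # assumed infosets of player 2
]


def knowledge_set(start, infosets, order=None):
    """Grow `start` by repeatedly adding every infoset that touches it.

    order=1 returns `start` unchanged; each extra order performs one more
    expansion step. order=None expands until nothing changes, which is the
    common-knowledge closure in this toy model.
    """
    known = set(start)
    level = 1
    while order is None or level < order:
        grown = set(known)
        for info in infosets:
            if info & known:          # infoset overlaps what is already included
                grown |= info
        if grown == known:            # fixed point reached
            break
        known = grown
        level += 1
    return known


if __name__ == "__main__":
    current = {"a", "b"}  # player 1's current infoset
    for k in (1, 2, None):
        print(k, sorted(knowledge_set(current, INFOSETS, order=k)))
```

In this toy chain, the low-order sets stay close to the current infoset, while the closure pulls in every node linked through alternating infosets of the two players. That is the growth the abstract describes as too large to handle in many games.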