Value Functions for Depth-Limited Solving in Zero-Sum Imperfect-Information Games

We provide a formal definition of depth-limited games together with an accessible and rigorous explanation of the underlying concepts, both of which were previously missing in imperfect-information games. The definition works for an arbitrary extensive-form game and is not tied to any specific game-solving algorithm. Moreover, this framework unifies and significantly extends three approaches to depth-limited solving that previously existed in extensive-form games and multiagent reinforcement learning but were not known to be compatible. Value functions are a key ingredient of these depth-limited games. Focusing on two-player zero-sum imperfect-information games, we show how to obtain optimal value functions and prove that public information provides both necessary and sufficient context for computing them. We provide a domain-independent encoding of the domain that allows value functions to be approximated even by simple feed-forward neural networks, which are then able to generalize to unseen parts of the game. We use the resulting value network to implement a depth-limited version of counterfactual regret minimization (CFR). In three distinct domains, we show that the algorithm's exploitability depends roughly linearly on the value network's quality, and that it is not difficult to train a value network with which depth-limited CFR performs as well as CFR with access to the full game.
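To make the terms "regret minimization" and "exploitability" concrete, the following is a minimal sketch and not the method described above: it runs plain regret matching in self-play on a small zero-sum matrix game, which is the single-decision special case of what CFR does at every information set. There is no depth limit and no value network here. The payoff matrix and all function names are hypothetical, chosen only for illustration, and exploitability is taken as the sum of both players' best-response values (some authors divide this by two).

```python
# Hypothetical 2x2 zero-sum game: entries are the row player's payoff,
# the column player receives the negative. Its equilibrium is (0.4, 0.6)
# for both players.
PAYOFF = [
    [2.0, -1.0],
    [-1.0, 1.0],
]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def action_values(payoff, opponent_strategy):
    """Expected payoff of each own action against a mixed opponent strategy."""
    return [sum(u * q for u, q in zip(row, opponent_strategy)) for row in payoff]


def column_view(payoff):
    """The same game from the column player's perspective (transposed, negated)."""
    rows, cols = len(payoff), len(payoff[0])
    return [[-payoff[i][j] for i in range(rows)] for j in range(cols)]


def solve(payoff, iterations=10000):
    """Regret-matching self-play; returns both players' average strategies."""
    views = [payoff, column_view(payoff)]
    regrets = [[0.0] * len(v) for v in views]
    strategy_sums = [[0.0] * len(v) for v in views]
    for _ in range(iterations):
        current = [regret_matching(r) for r in regrets]
        for player in (0, 1):
            values = action_values(views[player], current[1 - player])
            expected = sum(p * v for p, v in zip(current[player], values))
            for a, v in enumerate(values):
                regrets[player][a] += v - expected
                strategy_sums[player][a] += current[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response values; zero at an equilibrium."""
    best_row = max(action_values(payoff, col_strategy))
    best_col = max(action_values(column_view(payoff), row_strategy))
    return best_row + best_col


if __name__ == "__main__":
    avg_row, avg_col = solve(PAYOFF)
    print("average strategies:", avg_row, avg_col)
    print("exploitability:", exploitability(PAYOFF, avg_row, avg_col))
```

The average strategies, rather than the final ones, are what approach an equilibrium, which is why the sketch accumulates `strategy_sums`. A depth-limited variant would replace the exact payoffs at the depth limit with estimates from a learned value function.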
