Often, for example in war games, strategy video games, and financial simulations, a game is given to us only as a black-box simulator through which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work \cite{Zhang20:Small} introduced a notion of certificate for extensive-form games that allows exploitability guarantees without expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) could be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both assumptions. We show that high-probability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibrium-finding algorithm with an $\tilde O(1/\sqrt{T})$ convergence rate in the extensive-form game setting that does not rely on a sampling strategy with lower-bounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree.