In real-world applications, game-theoretic algorithms often interact with imperfect opponents, and incorporating opponent models into the algorithms can significantly improve performance. Opponent-exploitation approaches typically use a best response or a robust response, either to compute a counter-strategy to an opponent model updated during gameplay or to build a portfolio of exploitative strategies beforehand. However, in massive imperfect-information games, computing exact responses is intractable. Existing approaches for approximating the best response are either domain-specific or require extensive computation for every opponent model, and no existing approach can compute robust responses in massive games. We propose using depth-limited solving with an optimal value function to approximate both the best response and the restricted Nash response. Both approaches require computing the value function beforehand, but then allow responses to be computed quickly, even against previously unseen opponents. Furthermore, we provide a lower bound on the utility of both responses and a safety guarantee for the robust response. Our best-response approach can also be used to evaluate the quality of strategies computed by novel algorithms by approximating their exploitability. We empirically evaluate both approaches in terms of gain and exploitability, compare the depth-limited responses with the poker-specific local best response, and show that the robust response indeed has an upper bound on its exploitability.
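For context, the following standard definitions underlie the terms above (the notation is ours, not taken from the abstract). In a two-player zero-sum game, a best response to an opponent strategy $\sigma_2$ and the exploitability of a strategy $\sigma_1$ are
\[
\mathrm{BR}(\sigma_2) \in \operatorname*{arg\,max}_{\sigma_1} u_1(\sigma_1, \sigma_2),
\qquad
\operatorname{expl}(\sigma_1) = v^{*} - \min_{\sigma_2} u_1(\sigma_1, \sigma_2),
\]
where $v^{*}$ is the game value. The restricted Nash response to an opponent model $\sigma_2^{\mathrm{fix}}$ with confidence $p \in [0,1]$ is player 1's strategy in a Nash equilibrium of the modified game in which the opponent is forced to play $\sigma_2^{\mathrm{fix}}$ with probability $p$ and plays freely with probability $1-p$; the parameter $p$ trades off exploitation of the model against worst-case safety.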
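To make the core idea concrete, here is a minimal sketch (ours, not the paper's implementation) of how depth-limited solving can approximate a best response against a fixed opponent model: the game tree is truncated at a fixed depth, and a pre-computed value function stands in for the truncated subtrees. Against a fixed model the computation reduces to expectimax; in the paper's imperfect-information setting the value function additionally takes the players' belief distributions as input. The Node, opponent_model, and value_fn names below are illustrative assumptions.

    from dataclasses import dataclass, field
    from typing import Callable, Dict

    @dataclass
    class Node:
        player: int                       # 0 = responder, 1 = opponent model, -1 = terminal
        utility: float = 0.0              # responder's payoff at terminal nodes
        children: Dict[str, "Node"] = field(default_factory=dict)

    def depth_limited_br(node: Node,
                         opponent_model: Callable[[Node], Dict[str, float]],
                         value_fn: Callable[[Node], float],
                         depth: int) -> float:
        """Approximate best-response value: maximize at responder nodes,
        take the model-weighted expectation at opponent nodes, and let the
        value function stand in for subtrees below the depth limit."""
        if node.player == -1:
            return node.utility
        if depth == 0:
            # Truncation: the pre-computed value function replaces the subtree.
            return value_fn(node)
        if node.player == 0:
            return max(depth_limited_br(c, opponent_model, value_fn, depth - 1)
                       for c in node.children.values())
        # Opponent node: expectation under the fixed opponent model.
        probs = opponent_model(node)
        return sum(p * depth_limited_br(node.children[a], opponent_model, value_fn, depth - 1)
                   for a, p in probs.items())

Because the value function is computed once offline, responding to a new opponent model requires only this cheap truncated search, which is what allows the responses to be computed quickly even against previously unseen opponents.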