Function approximation (FA) has been a critical component in solving large zero-sum games. Yet, little attention has been given to FA in solving \textit{general-sum} extensive-form games, despite their being widely regarded as computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, there is no simple analogue to the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF) -- a generalization of the state value function to general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method to apply FA to the Stackelberg setting, allowing us to scale to much larger games while still enjoying performance guarantees based on the FA error. Additionally, our method guarantees incentive compatibility and is easy to evaluate without relying on self-play or approximate best-response oracles.
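For concreteness, one way to formalize the EPF (the notation below is assumed for illustration, not taken from the abstract): at each state $s$, the EPF records, for every follower payoff that can be promised by an incentive-compatible continuation strategy, the best leader payoff compatible with that promise,
\[
\mathrm{EPF}_s(\mu) \;=\; \max_{\sigma \in \Sigma^{\mathrm{IC}}(s)} \; u_{\mathrm{leader}}(\sigma)
\quad \text{s.t.} \quad u_{\mathrm{follower}}(\sigma) = \mu,
\]
where $\Sigma^{\mathrm{IC}}(s)$ denotes the incentive-compatible continuation strategies at $s$ and $u_{\mathrm{leader}}, u_{\mathrm{follower}}$ are expected payoffs from $s$ onward. In a zero-sum game this frontier collapses to a single scalar, recovering the usual state value.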