Function approximation (FA) has been a critical component in solving large zero-sum games. Yet, little attention has been given to FA for solving \textit{general-sum} extensive-form games, despite their being widely regarded as computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, no simple analogue exists to the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF) -- a generalization of the state value function to general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method to apply FA to the Stackelberg setting, allowing us to scale to much larger games while retaining performance guarantees based on the FA error. Additionally, our method guarantees incentive compatibility and is easy to evaluate without relying on self-play or approximate best-response oracles.