Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers' best response as constraints in the leader's optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low-quality solutions when followers' objectives are non-linear or non-quadratic. Moreover, these approaches assume a unique equilibrium or a specific equilibrium concept, e.g., optimistic or pessimistic, which is a limiting assumption in many situations. To overcome these limitations, we propose a stochastic gradient descent-based approach, where the leader's strategy is updated by differentiating through the followers' best responses. We frame the leader's optimization as a learning problem against the followers' equilibrium, which allows us to decouple the followers' equilibrium constraints from the leader's problem. This approach also addresses cases with multiple equilibria and arbitrary equilibrium selection procedures by back-propagating through a sampled Nash equilibrium. To this end, this paper introduces a novel concept called equilibrium flow to formally characterize the set of equilibrium selection processes where the gradient with respect to a sampled equilibrium is an unbiased estimate of the true gradient. We evaluate our approach experimentally against existing baselines in three Stackelberg problems with multiple followers and find that in each case, our approach achieves higher utility for the leader.
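The idea of updating the leader's strategy by differentiating through the followers' equilibrium can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a hypothetical game with one scalar leader parameter `theta` and two followers whose best responses are `x_i = theta + B * x_j`, a leader who wants both followers at a level `TARGET`, and the helper names themselves. The toy game has a unique equilibrium, so the sketch uses plain deterministic gradient ascent with an implicit-function-theorem derivative. It does not implement stochastic gradients, equilibrium sampling, or the equilibrium flow concept.

```python
# Hypothetical toy game (not from the paper): one leader, two followers.
# Follower i minimizes 0.5 * x_i**2 - x_i * (theta + B * x_j),
# so its best response is x_i = theta + B * x_j.
# The leader maximizes -(x1 - TARGET)**2 - (x2 - TARGET)**2.

B = 0.5       # follower coupling (assumed); |B| < 1 gives a unique equilibrium
TARGET = 4.0  # action level the leader wants both followers to reach (assumed)


def follower_equilibrium(theta, iters=200):
    """Approximate the followers' Nash equilibrium by best-response iteration."""
    x1 = x2 = 0.0
    for _ in range(iters):
        # Simultaneous update: each follower responds to the other's last action.
        x1, x2 = theta + B * x2, theta + B * x1
    return x1, x2


def leader_utility(x1, x2):
    """Leader's payoff, evaluated at the followers' actions."""
    return -((x1 - TARGET) ** 2) - (x2 - TARGET) ** 2


def equilibrium_sensitivity():
    """dx*/dtheta via the implicit function theorem.

    Equilibrium conditions F(x, theta) = 0:
        F1 = x1 - theta - B * x2,   F2 = x2 - theta - B * x1
    dF/dx = [[1, -B], [-B, 1]],  dF/dtheta = [-1, -1]
    dx*/dtheta = -(dF/dx)^{-1} dF/dtheta
    """
    det = 1.0 - B * B
    # (dF/dx)^{-1} = (1 / det) * [[1, B], [B, 1]]
    dx1 = (1.0 + B) / det
    dx2 = (B + 1.0) / det
    return dx1, dx2


def leader_gradient(theta):
    """Chain rule through the equilibrium: dU/dtheta = dU/dx . dx*/dtheta."""
    x1, x2 = follower_equilibrium(theta)
    dx1, dx2 = equilibrium_sensitivity()
    du_dx1 = -2.0 * (x1 - TARGET)
    du_dx2 = -2.0 * (x2 - TARGET)
    return du_dx1 * dx1 + du_dx2 * dx2


def optimize_leader(theta=0.0, lr=0.05, steps=100):
    """Gradient ascent on the leader's utility."""
    for _ in range(steps):
        theta += lr * leader_gradient(theta)
    return theta


if __name__ == "__main__":
    theta = optimize_leader()
    print(theta, follower_equilibrium(theta))
```

In this toy game the equilibrium is `x1 = x2 = theta / (1 - B)`, so the leader's optimum is `theta = TARGET * (1 - B)`. The point of the sketch is only that the leader never encodes the equilibrium conditions as constraints: it solves for the followers' equilibrium in a forward pass and then differentiates through it.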
