Solving Structured Hierarchical Games Using Differential Backward Induction

Many real-world systems possess a hierarchical structure in which a strategic plan is forwarded and implemented in a top-down manner. Examples include business activities in large companies and policy making for reducing the spread of disease during pandemics. We introduce a novel class of games, which we call structured hierarchical games (SHGs), to capture these strategic interactions. In an SHG, each player is represented as a vertex in a multi-layer decision tree and controls a real-valued action vector, reacting to orders from its predecessors and strategically influencing its descendants' behaviors based on its own subjective utility. SHGs generalize extensive-form games as well as Stackelberg games. For general SHGs with (possibly) nonconvex payoffs and high-dimensional action spaces, we propose a new solution concept, which we call local subgame perfect equilibrium. By exploiting the hierarchical structure and strategic dependencies in payoffs, we derive a backpropagation-style gradient-based algorithm, which we call Differential Backward Induction (DBI), to compute an equilibrium. We theoretically characterize the convergence properties of DBI and empirically demonstrate a large overlap between the stable points reached by DBI and equilibrium solutions. Finally, we demonstrate the effectiveness of our algorithm in finding globally stable solutions, as well as its scalability, for a recently introduced class of SHGs for pandemic policy making.
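To make the idea of a backpropagation-style gradient update in a hierarchy concrete, the following is a minimal sketch for the simplest possible case: one leader and one follower with scalar actions. It is not the algorithm as specified in the paper. The quadratic utilities, the function names (`follower_grad`, `leader_total_grad`, `run`), the step size, and the use of the implicit function theorem on the follower's first-order condition are all assumptions made for illustration.

```python
# Hypothetical two-level game (all utilities are illustrative assumptions):
#   follower utility: u_f(x, y) = -(y - x)**2
#   leader utility:   u_l(x, y) = -(x - 1)**2 - (y - 2)**2
# The follower's best response is y = x, so the leader effectively
# maximizes -(x - 1)**2 - (x - 2)**2, which peaks at x = 1.5.


def follower_grad(x, y):
    """Partial derivative of u_f with respect to the follower's action y."""
    return -2.0 * (y - x)


def leader_total_grad(x, y):
    """Total derivative of u_l with respect to x, accounting for how the
    follower's action responds to x (assumed implicit-function-theorem step)."""
    du_dx = -2.0 * (x - 1.0)  # direct effect of x on u_l
    du_dy = -2.0 * (y - 2.0)  # effect of the follower's action on u_l
    d2f_dyy = -2.0            # second derivative of u_f in y
    d2f_dyx = 2.0             # mixed derivative of u_f in y and x
    # Differentiating the first-order condition d u_f / d y = 0 in x gives
    # dy/dx = -(d2 u_f / dy2)^(-1) * (d2 u_f / dy dx).
    dy_dx = -d2f_dyx / d2f_dyy
    return du_dx + dy_dx * du_dy


def run(x=0.0, y=0.0, lr=0.1, steps=200):
    """Simultaneous gradient ascent: the follower uses its partial gradient,
    the leader uses the total gradient propagated back from the follower."""
    for _ in range(steps):
        gx = leader_total_grad(x, y)
        gy = follower_grad(x, y)
        x, y = x + lr * gx, y + lr * gy
    return x, y


if __name__ == "__main__":
    print(run())
```

In this toy game the iteration settles where the follower plays its best response and the leader's total gradient vanishes. In a deeper hierarchy the same chain-rule step would presumably be repeated layer by layer from the leaves upward, which is what gives the approach its backward-induction flavor.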
