In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on inferring costs in dynamic games from partial observations assumes an open-loop information pattern. In this work, we demonstrate that the feedback Nash equilibrium concept is more expressive than its open-loop counterpart and encodes more complex behavior, which motivates dedicated tools for inferring players' objectives in feedback games. We therefore consider the dynamic game cost inference problem under the feedback information pattern, using only partial state observations and incomplete trajectory data. To this end, we first propose an inverse feedback game loss function, whose minimizer yields the feedback Nash equilibrium state trajectory closest to the observation data. We characterize the landscape and differentiability of the loss function. Given the difficulty of obtaining the exact gradient, our main contribution is an efficient gradient approximator, which enables a novel inverse feedback game solver that minimizes the loss using first-order optimization. In thorough empirical evaluations, we demonstrate that our algorithm converges reliably and has better robustness and generalization performance than the open-loop baseline method when the observation data reflects a group of players acting in a feedback Nash game.
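To make the inverse feedback game loss concrete, the following is a minimal sketch on a toy problem and not the solver described above. It assumes a scalar two-player linear-quadratic game with dynamics x_{t+1} = a x_t + b_1 u_{1,t} + b_2 u_{2,t} and stage costs q_i x_t^2 + r_i u_{i,t}^2, in which only the state weights q = (q_1, q_2) are unknown. The gradient is approximated by plain central finite differences, which stands in for the gradient approximator mentioned above. All function names, parameter values, and the choice of game are hypothetical and chosen only for illustration.

```python
def fbne_trajectory(q, x0=1.0, T=10, a=1.0, b=(0.5, 0.3), r=(1.0, 1.0)):
    """States x_0..x_T under the feedback Nash equilibrium of the toy LQ game."""
    z1, z2 = q  # terminal value-function coefficients: V_i(x) = z_i * x**2
    gains = []
    for _ in range(T):  # backward pass over the coupled Riccati recursion
        # Stationarity of both players gives a 2x2 linear system M p = c
        # for the feedback gains p_i in u_i = -p_i * x.
        m11, m12 = r[0] + z1 * b[0] ** 2, z1 * b[0] * b[1]
        m21, m22 = z2 * b[1] * b[0], r[1] + z2 * b[1] ** 2
        c1, c2 = z1 * b[0] * a, z2 * b[1] * a
        det = m11 * m22 - m12 * m21
        p1 = (c1 * m22 - m12 * c2) / det
        p2 = (m11 * c2 - m21 * c1) / det
        f = a - b[0] * p1 - b[1] * p2  # closed-loop dynamics coefficient
        z1 = q[0] + r[0] * p1 ** 2 + z1 * f ** 2
        z2 = q[1] + r[1] * p2 ** 2 + z2 * f ** 2
        gains.append((p1, p2))
    gains.reverse()  # gains were computed from the last stage to the first
    xs = [x0]
    for p1, p2 in gains:  # forward rollout under the equilibrium policies
        xs.append((a - b[0] * p1 - b[1] * p2) * xs[-1])
    return xs


def loss(q, obs_idx, y):
    """Squared error between the equilibrium states and the observed time steps."""
    xs = fbne_trajectory(q)
    return sum((xs[t] - yt) ** 2 for t, yt in zip(obs_idx, y))


def fd_grad(q, obs_idx, y, eps=1e-6):
    """Central finite-difference gradient (a stand-in, not the paper's approximator)."""
    grad = []
    for i in range(len(q)):
        hi, lo = list(q), list(q)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi, obs_idx, y) - loss(lo, obs_idx, y)) / (2 * eps))
    return grad


def fit(q_init, obs_idx, y, iters=200):
    """First-order descent with backtracking; weights are kept positive."""
    q = list(q_init)
    for _ in range(iters):
        g = fd_grad(q, obs_idx, y)
        current = loss(q, obs_idx, y)
        step = 1.0
        while step > 1e-8:
            cand = [max(qi - step * gi, 1e-3) for qi, gi in zip(q, g)]
            if loss(cand, obs_idx, y) < current:
                q = cand
                break
            step *= 0.5  # shrink the step until the loss decreases
    return q


# Usage: observe only a subset of time steps of a trajectory generated by q_true.
q_true = (1.0, 2.0)
obs_idx = [1, 3, 4, 8]
y = [fbne_trajectory(q_true)[t] for t in obs_idx]
q_hat = fit((0.2, 0.2), obs_idx, y)
print(loss((0.2, 0.2), obs_idx, y), loss(q_hat, obs_idx, y))
```

In this toy setting the state weights may not be uniquely recoverable from state observations alone, so the sketch only illustrates that first-order descent on the loss reduces the mismatch with the observed data.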