In this paper, we study the problem of multiple stochastic agents interacting in a dynamic game scenario with continuous state and action spaces. We define a new notion of stochastic Nash equilibrium for boundedly rational agents, which we call the Entropic Cost Equilibrium (ECE). We show that ECE is a natural extension to multiple agents of Maximum Entropy optimality for single agents. We solve both the "forward" and "inverse" problems for the multi-agent ECE game. For the forward problem, we provide a Riccati algorithm to compute closed-form ECE feedback policies for the agents, which are exact in the Linear-Quadratic-Gaussian case. We give an iterative variant to find locally ECE feedback policies for the nonlinear case. For the inverse problem, we present an algorithm to infer the cost functions of the multiple interacting agents given noisy, boundedly rational input and state trajectory examples from agents acting in an ECE. The effectiveness of our algorithms is demonstrated in a simulated multi-agent collision avoidance scenario, and with data from the INTERACTION traffic dataset. In both cases, we show that, by taking into account the agents' game-theoretic interactions using our algorithm, a more accurate model of agents' costs can be learned, compared with standard inverse optimal control methods.
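For intuition about the single-agent Maximum Entropy optimality that ECE extends, the following is a minimal sketch of a backward Riccati recursion for a single-agent, finite-horizon linear-quadratic problem. It is not the multi-agent ECE algorithm of the paper. It assumes dynamics x_{t+1} = A x_t + B u_t (plus optional additive Gaussian noise, which changes only a constant in the value function), a stage cost 0.5 x'Qx + 0.5 u'Ru, a terminal cost 0.5 x'Qx, and a policy proportional to exp(-Q-function) with unit temperature. The function name `maxent_lqr` and all variable names are hypothetical.

```python
import numpy as np

def maxent_lqr(A, B, Q, R, horizon):
    """Single-agent maximum-entropy LQR via a backward Riccati recursion.

    Returns lists (ordered from the first time step to the last) of feedback
    gains K_t and covariances Sigma_t, so that the stochastic policy is
    u_t ~ N(-K_t x_t, Sigma_t).
    """
    P = Q.copy()  # Hessian of the terminal cost-to-go (assumed equal to Q)
    gains, covs = [], []
    for _ in range(horizon):
        Quu = R + B.T @ P @ B          # Hessian of the Q-function in u
        Qux = B.T @ P @ A              # cross term between u and x
        K = np.linalg.solve(Quu, Qux)  # mean feedback gain (same as standard LQR)
        Sigma = np.linalg.inv(Quu)     # covariance of a policy proportional to exp(-Q)
        P = Q + A.T @ P @ A - Qux.T @ K  # Riccati update of the cost-to-go Hessian
        gains.append(K)
        covs.append(Sigma)
    # The loop runs backward in time, so reverse to get time-ordered lists.
    return gains[::-1], covs[::-1]

if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 1.0]])  # illustrative double integrator
    B = np.array([[0.0], [0.1]])
    gains, covs = maxent_lqr(A, B, np.eye(2), np.eye(1), horizon=20)
    print("first-step gain:", gains[0])
    print("first-step policy covariance:", covs[0])
```

In this sketch the mean of the policy coincides with the deterministic LQR feedback law, and the covariance is the inverse of the Q-function's Hessian in the control, so the policy is more random in directions where the cost is flatter.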