Robots and autonomous systems must interact with one another and their environment to provide high-quality services to their users. Dynamic game theory provides an expressive theoretical framework for modeling scenarios involving multiple agents with differing objectives interacting over time. A core challenge when formulating a dynamic game is designing objectives for each agent that capture desired behavior. In this paper, we propose a method for inferring parametric objective models of multiple agents based on observed interactions. Our inverse game solver jointly optimizes player objectives and continuous-state estimates by coupling them through Nash equilibrium constraints. Hence, our method is able to directly maximize the observation likelihood rather than other non-probabilistic surrogate criteria. Our method does not require full observations of game states or player strategies to identify player objectives. Instead, it robustly recovers this information from noisy, partial state observations. As a byproduct of estimating player objectives, our method computes a Nash equilibrium trajectory corresponding to those objectives. Thus, it is suitable for downstream trajectory forecasting tasks. We demonstrate our method in several simulated traffic scenarios. Results show that it reliably estimates player objectives from a short sequence of noise-corrupted partial state observations. Furthermore, using the estimated objectives, our method makes accurate predictions of each player's trajectory.
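The joint optimization described above can be sketched as a constrained maximum-likelihood problem. The notation is an assumption made here for illustration and does not appear in the abstract: $\theta$ denotes the objective parameters of all players, $\mathbf{x}$ and $\mathbf{u}$ the state and control trajectories, $\mathbf{y}$ the noisy partial state observations, and $\Gamma(\theta)$ the dynamic game induced by the parameterized objectives.

```latex
% Sketch only: symbols are illustrative assumptions, not the paper's stated notation.
\begin{aligned}
\max_{\theta,\, \mathbf{x},\, \mathbf{u}} \quad
  & p(\mathbf{y} \mid \mathbf{x}, \mathbf{u}) \\
\text{subject to} \quad
  & (\mathbf{x}, \mathbf{u}) \text{ is a Nash equilibrium of } \Gamma(\theta)
\end{aligned}
```

Read this way, the equilibrium constraint is what couples the objective parameters to the state estimates: a trajectory is feasible only if no player could lower its own cost under $\theta$ by unilaterally changing its strategy. The maximizing trajectory is then the equilibrium trajectory that the abstract proposes to use for forecasting.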