Robots deployed to the real world must be able to interact with other agents in their environment. Dynamic game theory provides a powerful mathematical framework for modeling scenarios in which agents have individual objectives and interactions evolve over time. However, a key limitation of such techniques is that they require a priori knowledge of all players' objectives. In this work, we address this issue by proposing a novel method for learning players' objectives in continuous dynamic games from noise-corrupted, partial state observations. Our approach learns objectives by coupling the estimation of each player's unknown cost parameters with the inference of unobserved states and inputs through Nash equilibrium constraints. By coupling past state estimates with future state predictions, our approach is amenable to simultaneous online learning and prediction in a receding-horizon fashion. We demonstrate our method in several simulated traffic scenarios in which we recover players' preferences such as desired travel speed and collision-avoidance behavior. Results show that, from noise-corrupted data, our method reliably estimates game-theoretic models that closely match ground-truth objectives, consistently outperforming state-of-the-art approaches.
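At its core, the coupling described above can be read as a single constrained estimation problem. The following display is a minimal sketch under assumed notation (not the paper's own): a known observation model $h$, shared dynamics $f$, observations $y_t$, and an equilibrium characterized by each player's first-order necessary conditions $G_i$:

\[
\begin{aligned}
\max_{\theta,\, x,\, u} \quad & \sum_{t=1}^{T} \log p\big(y_t \mid h(x_t)\big) \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t), \quad t = 1, \dots, T-1, \\
& G_i(x, u; \theta_i) = 0, \quad i = 1, \dots, N.
\end{aligned}
\]

Here $x$ and $u$ are the jointly inferred states and inputs, and $\theta_i$ parameterizes player $i$'s cost; the constraints $G_i = 0$ encode that the recovered trajectory constitutes a Nash equilibrium of the game induced by $\theta$.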
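Because past state estimates and future predictions enter one coupled problem, the method lends itself to online use over a sliding window of observations. Below is a minimal, hypothetical sketch of such a receding-horizon loop; fit_inverse_game is a stub standing in for an inverse dynamic-game solver of the kind described above, and all names, shapes, and signatures are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def fit_inverse_game(window_obs, horizon):
    """Stub: jointly estimate cost parameters theta and a state trajectory
    covering the observation window plus `horizon` future steps.
    A real solver would enforce Nash equilibrium constraints here."""
    theta = np.zeros(3)                                     # placeholder cost parameters
    trajectory = np.zeros((len(window_obs) + horizon, 4))   # placeholder states
    return theta, trajectory

def run_online(observations, window=10, horizon=5):
    """Receding-horizon loop: refit the inverse game on each new window,
    returning the running objective estimates and state predictions."""
    estimates, predictions = [], []
    for t in range(window, len(observations)):
        window_obs = observations[t - window:t]   # most recent measurements
        theta, traj = fit_inverse_game(window_obs, horizon)
        estimates.append(theta)                   # current objective estimate
        predictions.append(traj[-horizon:])       # predicted future states
    return estimates, predictions

if __name__ == "__main__":
    obs = np.random.randn(50, 2)                  # toy noisy partial observations
    est, pred = run_online(obs)
    print(len(est), pred[0].shape)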