Learning to Play Trajectory Games Against Opponents with Unknown Objectives

Many autonomous agents, such as intelligent vehicles, are inherently required to interact with one another. Game theory provides a natural mathematical tool for robot motion planning in such interactive settings. However, tractable algorithms for these problems usually rely on a strong assumption: that the objectives of all players in the scene are known. To make such tools applicable to ego-centric planning with only local information, we propose an adaptive model-predictive game solver that jointly infers other players' objectives online and computes a corresponding generalized Nash equilibrium (GNE) strategy. The adaptivity of our approach is enabled by a differentiable trajectory game solver whose gradient signal is used for maximum likelihood estimation (MLE) of opponents' objectives. This differentiability also facilitates direct integration with other differentiable elements, such as neural networks (NNs). Furthermore, in contrast to existing solvers for cost inference in games, our method handles not only partial state observations but also general inequality constraints. In two simulated traffic scenarios, our approach outperforms both existing game-theoretic methods and non-game-theoretic model-predictive control (MPC) approaches. We also demonstrate its real-time planning capability and robustness in two hardware experiments.
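The core idea of the adaptive solver, using the gradient of a differentiable equilibrium solver to fit an opponent's objective by maximum likelihood, can be illustrated with a deliberately tiny toy. This is a minimal sketch under strong assumptions that are not from the paper: a one-shot (not trajectory) game between two players with scalar actions, hypothetical quadratic costs, no inequality constraints, and Gaussian noise on observations of the opponent's action only. Under these assumptions the Nash equilibrium is the solution of a 2×2 linear system, and its derivative with respect to the opponent's unknown goal `theta` follows from implicit differentiation of that system. The functions `solve2`, `nash`, and `estimate_theta` and all parameter values are hypothetical.

```python
"""Toy sketch: MLE of an opponent's objective through a differentiable Nash solver.

Assumptions (hypothetical, not from the paper): one-shot two-player game,
scalar actions, quadratic costs, no constraints. The paper's solver handles
trajectory games with general inequality constraints; this toy does not.
"""
import random


def solve2(A, b):
    """Solve the 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x0, x1]


def nash(theta, a=0.0, w1=0.5, w2=0.5):
    """Return the Nash equilibrium (u1, u2) and its derivative w.r.t. theta.

    Hypothetical costs:
      ego      J1 = (u1 - a)**2     + w1 * (u1 - u2)**2
      opponent J2 = (u2 - theta)**2 + w2 * (u2 - u1)**2
    Setting each player's gradient w.r.t. its own action to zero gives
    the linear system A u = b(theta) with b = [a, theta].
    """
    A = [[1.0 + w1, -w1], [-w2, 1.0 + w2]]
    u = solve2(A, [a, theta])
    # Implicit differentiation: A (du/dtheta) = db/dtheta = [0, 1].
    du = solve2(A, [0.0, 1.0])
    return u, du


def estimate_theta(observations, theta0=0.0, lr=0.5, iters=200):
    """Gradient descent on the Gaussian negative log-likelihood of the
    observed opponent actions. Only u2 is observed (partial observation)."""
    theta = theta0
    n = len(observations)
    for _ in range(iters):
        u, du = nash(theta)
        # Gradient of (1 / (2n)) * sum((y - u2(theta))**2) w.r.t. theta,
        # using the solver's derivative du2/dtheta.
        grad = -sum(y - u[1] for y in observations) * du[1] / n
        theta -= lr * grad
    return theta


# Usage: simulate noisy observations of an opponent whose true goal is 2.0.
random.seed(0)
theta_true = 2.0
u_true, _ = nash(theta_true)
obs = [u_true[1] + random.gauss(0.0, 0.05) for _ in range(50)]

theta_hat = estimate_theta(obs)
# The ego then plays its equilibrium action under the estimated objective.
u_hat, _ = nash(theta_hat)
print(f"theta_hat = {theta_hat:.3f}, ego action = {u_hat[0]:.3f}")
```

In a receding-horizon (model-predictive) setting, one would repeat the estimation step and the equilibrium solve at every planning cycle as new observations arrive. With inequality constraints, as in the paper, the equilibrium conditions are no longer a single linear system, so the differentiation step is correspondingly more involved.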
