Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles navigating complex scenes. Most methods proposed in recent years are deep learning-based, owing to their strength in encoding complex interactions. However, they often generate implausible predictions because they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, on which Transformer-style GNNs encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion-planning procedure and thereby produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) incorporating a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our approach matches the state of the art on the Argoverse forecasting benchmark. The visualization results further show that the hierarchical framework captures multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
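The abstract mentions that PPO is coupled with a vehicle kinematics model and a multi-objective reward. As a minimal sketch of the kind of components this implies, the following shows a single-track (kinematic bicycle) state update and a weighted two-term reward; the specific model form, parameter values (`wheelbase`, `dt`, the weights), and function names are illustrative assumptions, not details taken from the paper:

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Planar vehicle state: position, heading, and speed."""
    x: float
    y: float
    yaw: float  # heading angle in radians
    v: float    # longitudinal speed in m/s


def bicycle_step(s: State, accel: float, steer: float,
                 dt: float = 0.1, wheelbase: float = 2.7) -> State:
    """One Euler step of a kinematic bicycle model.

    accel is longitudinal acceleration (m/s^2); steer is the front
    wheel angle (rad). dt and wheelbase are assumed values.
    """
    return State(
        x=s.x + s.v * math.cos(s.yaw) * dt,
        y=s.y + s.v * math.sin(s.yaw) * dt,
        yaw=s.yaw + s.v / wheelbase * math.tan(steer) * dt,
        v=s.v + accel * dt,
    )


def multi_objective_reward(displacement_error: float,
                           collision_penalty: float,
                           w_acc: float = 1.0,
                           w_comp: float = 1.0) -> float:
    """Weighted trade-off between agent-centric accuracy (low
    displacement error) and scene-wise compatibility (no collisions).
    The weights are hypothetical; the paper does not specify them here.
    """
    return -w_acc * displacement_error - w_comp * collision_penalty
```

Rolling `bicycle_step` out over a horizon turns a sequence of PPO actions (acceleration, steering) into a kinematically feasible trajectory, which is what allows the planner-style policy to avoid the implausible motions that purely regression-based predictors can produce.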