Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing complex scenes. Most methods proposed in recent years are deep learning-based owing to their strength in encoding complex interactions. However, they often generate implausible predictions because they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, based on which Transformer-style GNNs are adopted to encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion planning procedure and thereby produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) incorporating a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy against scene-wise compatibility. Experimental results show that our approach matches the state of the art on the Argoverse forecasting benchmark. Visualization results further show that the hierarchical learning framework captures multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
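For concreteness, the sketch below illustrates how a policy's planned actions can be rolled out through a vehicle kinematics model to produce kinematically feasible trajectory predictions. The abstract does not specify the exact formulation, so the standard kinematic bicycle model, the (acceleration, steering) action interface, and all parameter values here are illustrative assumptions, not the paper's published method.

```python
import math
from dataclasses import dataclass

@dataclass
class BicycleState:
    x: float    # position x [m]
    y: float    # position y [m]
    yaw: float  # heading [rad]
    v: float    # speed [m/s]

def kinematic_bicycle_step(s: BicycleState, accel: float, steer: float,
                           wheelbase: float = 2.8, dt: float = 0.1) -> BicycleState:
    """Advance the state one time step under acceleration and front-wheel steering.

    Assumed kinematic bicycle model; the wheelbase and step size are placeholders.
    """
    x = s.x + s.v * math.cos(s.yaw) * dt
    y = s.y + s.v * math.sin(s.yaw) * dt
    yaw = s.yaw + (s.v / wheelbase) * math.tan(steer) * dt
    v = max(0.0, s.v + accel * dt)  # no reversing in this simplified sketch
    return BicycleState(x, y, yaw, v)

# Rolling out the policy's actions yields a dynamically consistent trajectory,
# which is the form of prediction the RL stage emits. The constant actions
# below stand in for outputs of a hypothetical PPO policy network.
state = BicycleState(x=0.0, y=0.0, yaw=0.0, v=8.0)
trajectory = [state]
for accel, steer in [(0.5, 0.02)] * 30:  # 3 s horizon at dt = 0.1 s
    state = kinematic_bicycle_step(state, accel, steer)
    trajectory.append(state)
```

Constraining predictions to pass through such a kinematics model is what makes every rollout physically drivable by construction, in contrast to regressing raw coordinates.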