Predicting the multimodal future behavior of traffic participants is essential for robotic vehicles to make safe decisions. Existing works either predict future trajectories directly from latent features or use dense goal candidates to identify agents' destinations. The former strategy converges slowly because all motion modes are derived from the same feature, while the latter has an efficiency issue because its performance relies heavily on the density of goal candidates. In this paper, we propose the Motion TRansformer (MTR) framework, which models motion prediction as the joint optimization of global intention localization and local movement refinement. Instead of using goal candidates, MTR incorporates spatial intention priors by adopting a small set of learnable motion query pairs. Each motion query pair is responsible for trajectory prediction and refinement for a specific motion mode, which stabilizes the training process and facilitates better multimodal predictions. Experiments show that MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges, ranking 1st on the leaderboards of the Waymo Open Motion Dataset. Code will be available at https://github.com/sshaoshuai/MTR.
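
The idea of one query pair per motion mode can be illustrated with a toy sketch. The abstract does not specify the internals, so everything below is an assumption made for illustration rather than the authors' implementation: the pair is read as a fixed "intention" half plus a "dynamic" half that is re-anchored at the predicted endpoint after each layer, attention is single-head, and all weights (`w_pos`, `w_traj`, `w_score`) and sizes (`K`, `D`, `T`, `N`) are hypothetical and randomly initialized rather than learned.

```python
import math
import random

random.seed(0)
K, D, T, N = 6, 8, 5, 12  # modes, feature width, future steps, scene tokens (illustrative)

def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    # Row vector (length R) times matrix (R x C) -> list of length C.
    return [sum(v * row[c] for v, row in zip(vec, mat)) for c in range(len(mat[0]))]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def attend(query, scene):
    # Single-head dot-product attention of one query over all scene tokens.
    logits = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(D) for tok in scene]
    weights = softmax(logits)
    return [sum(w * tok[d] for w, tok in zip(weights, scene)) for d in range(D)]

def predict(scene, intention_points, w_pos, w_traj, w_score, num_layers=3):
    content = [[0.0] * D for _ in range(K)]
    # Assumption: the dynamic half of each pair starts at its intention point.
    anchors = [list(p) for p in intention_points]
    for _ in range(num_layers):
        trajs, logits = [], []
        for k in range(K):
            static_q = matvec(intention_points[k], w_pos)  # global intention half
            dynamic_q = matvec(anchors[k], w_pos)          # local refinement half
            query = [c + s + d for c, s, d in zip(content[k], static_q, dynamic_q)]
            context = attend(query, scene)
            content[k] = [c + x for c, x in zip(content[k], context)]
            flat = matvec(content[k], w_traj)
            traj = [(flat[2 * t], flat[2 * t + 1]) for t in range(T)]
            trajs.append(traj)
            logits.append(sum(c * w for c, w in zip(content[k], w_score)))
            anchors[k] = list(traj[-1])  # re-anchor at the predicted endpoint
        scores = softmax(logits)  # one probability per motion mode
    return trajs, scores

scene = rand_matrix(N, D, scale=1.0)
intention_points = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(K)]
w_pos, w_traj = rand_matrix(2, D), rand_matrix(D, 2 * T)
w_score = [random.gauss(0.0, 0.1) for _ in range(D)]
trajs, scores = predict(scene, intention_points, w_pos, w_traj, w_score)
print(len(trajs), len(trajs[0]), len(scores))
```

The point of the sketch is structural: the number of queries is a small constant `K` rather than a dense grid of goal candidates, and each query keeps its own content feature across layers, so different modes are not all decoded from one shared feature.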