Predicting multiple plausible future trajectories of nearby vehicles is crucial for the safety of autonomous driving. Recent motion prediction approaches attempt to achieve such multimodal prediction either by implicitly regularizing the features or by explicitly generating multiple candidate proposals. However, the problem remains challenging: the latent features may concentrate on the most frequent mode of the data, while proposal-based methods depend largely on prior knowledge to generate and select the proposals. In this work, we propose a novel transformer framework for multimodal motion prediction, termed mmTransformer. A novel network architecture based on stacked transformers is designed to model multimodality at the feature level with a set of fixed independent proposals. A region-based training strategy is then developed to induce multimodality in the generated proposals. Experiments on the Argoverse dataset show that the proposed model achieves state-of-the-art performance on motion prediction, substantially improving the diversity and accuracy of the predicted trajectories. Demo video and code are available at https://decisionforce.github.io/mmTransformer.