Predicting the future behavior of road users is one of the most challenging and important problems in autonomous driving. Applying deep learning to this problem requires fusing heterogeneous world state in the form of rich perception signals and map information, and inferring highly multi-modal distributions over possible futures. In this paper, we present MultiPath++, a future prediction model that achieves state-of-the-art performance on popular benchmarks. MultiPath++ improves the MultiPath architecture by revisiting many design choices. The first key design difference is a departure from dense image-based encoding of the input world state in favor of a sparse encoding of heterogeneous scene elements: MultiPath++ consumes compact and efficient polylines to describe road features, and raw agent state information directly (e.g., position, velocity, acceleration). We propose a context-aware fusion of these elements and develop a reusable multi-context gating fusion component. Second, we reconsider the choice of pre-defined, static anchors, and develop a way to learn latent anchor embeddings end-to-end in the model. Lastly, we explore ensembling and output aggregation techniques, common in other ML domains, and find effective variants for our probabilistic multimodal output representation. We perform an extensive ablation on these design choices, and show that our proposed model achieves state-of-the-art performance on the Argoverse Motion Forecasting Competition and the Waymo Open Dataset Motion Prediction Challenge.
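To make the idea of context-aware gated fusion concrete, the following is a minimal NumPy sketch, not the model's actual implementation. The abstract does not specify the layer, so everything here is an assumption for illustration: the function names `context_gating` and `init_params`, the single linear layer with `tanh` on each branch, element-wise multiplication as the gate, and max-pooling to form the updated context.

```python
import numpy as np

def init_params(rng, d_in, d_ctx, d):
    """Hypothetical parameters: one linear layer for elements, one for context."""
    return {
        "w_e": rng.normal(scale=0.5, size=(d_in, d)),
        "b_e": np.zeros(d),
        "w_c": rng.normal(scale=0.5, size=(d_ctx, d)),
        "b_c": np.zeros(d),
    }

def context_gating(elements, context, params):
    """Gate a set of scene elements with a context vector.

    elements: (n, d_in) array, e.g. n encoded polyline segments.
    context:  (d_ctx,) array, e.g. an encoded agent state.
    Returns the gated elements (n, d) and a pooled context (d,).
    """
    e = np.tanh(elements @ params["w_e"] + params["b_e"])  # (n, d)
    c = np.tanh(context @ params["w_c"] + params["b_c"])   # (d,)
    gated = e * c                   # context scales every element feature
    new_context = gated.max(axis=0)  # order-independent pooling over the set
    return gated, new_context

rng = np.random.default_rng(0)
params = init_params(rng, d_in=4, d_ctx=3, d=8)
polyline_feats = rng.normal(size=(5, 4))  # 5 road elements, 4 features each
agent_state = rng.normal(size=3)          # e.g. position, velocity, acceleration
gated, ctx = context_gating(polyline_feats, agent_state, params)
```

Because the pooling is a maximum over the element axis, the updated context does not depend on the order in which scene elements are listed, which suits a sparse, set-like encoding of polylines and agents.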