Deep learning has recently achieved significant progress in trajectory forecasting. However, the scarcity of trajectory data inhibits data-hungry deep-learning models from learning good representations. While mature representation learning methods exist in computer vision and natural language processing, these pre-training methods require large-scale data. It is hard to replicate these approaches in trajectory forecasting due to the lack of adequate trajectory data (e.g., 34K samples in the nuScenes dataset). To work around the scarcity of trajectory data, we resort to another data modality closely related to trajectories: HD maps, which are abundantly provided in existing datasets. In this paper, we propose PreTraM, a self-supervised pre-training scheme via connecting trajectories and maps for trajectory forecasting. Specifically, PreTraM consists of two parts: 1) Trajectory-Map Contrastive Learning, where we project trajectories and maps to a shared embedding space with cross-modal contrastive learning, and 2) Map Contrastive Learning, where we enhance map representation with contrastive learning on large quantities of HD maps. On top of popular baselines such as AgentFormer and Trajectron++, PreTraM boosts their performance by 5.5% and 6.9% relatively in FDE-10 on the challenging nuScenes dataset. We show that PreTraM improves data efficiency and scales well with model size.
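To make the Trajectory-Map Contrastive Learning idea concrete, the sketch below shows a generic CLIP-style symmetric InfoNCE loss over paired trajectory and map embeddings. It is a minimal illustration only, assuming the two encoders have already produced fixed-size embeddings; the function and variable names are hypothetical and the exact formulation in PreTraM may differ in detail.

```python
import torch
import torch.nn.functional as F

def trajectory_map_contrastive_loss(traj_emb, map_emb, temperature=0.07):
    """Symmetric cross-modal InfoNCE loss (CLIP-style sketch).

    traj_emb, map_emb: (B, D) embeddings of paired trajectories and
    map patches; row i of each tensor forms the positive pair.
    """
    # L2-normalize so dot products are cosine similarities
    traj_emb = F.normalize(traj_emb, dim=-1)
    map_emb = F.normalize(map_emb, dim=-1)

    # (B, B) similarity matrix; diagonal entries are the positive pairs
    logits = traj_emb @ map_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Contrast in both directions: trajectory->map and map->trajectory
    loss_t2m = F.cross_entropy(logits, targets)
    loss_m2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2m + loss_m2t) / 2
```

Map Contrastive Learning can be sketched analogously as a single-modality version of the same loss, contrasting two encodings of HD-map patches, which is what lets it exploit the large quantity of map data independently of the limited trajectory samples.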