This paper addresses the problem of multimodal intent and trajectory prediction for human-driven vehicles in parking lots. Using models built on CNN and Transformer networks, we extract spatial-temporal and contextual information from trajectory history and local bird's eye view (BEV) semantic images, and predict intent distributions and future trajectory sequences. Our method outperforms existing models in accuracy, while supporting an arbitrary number of modes, encoding complex multi-agent scenarios, and adapting to different parking maps. In addition, we present the first public human driving dataset in parking lots, with high resolution and rich traffic scenarios, to support relevant research in this field.