Occupancy maps are widely recognized as an efficient method for facilitating robot motion planning in static environments. However, intelligent vehicles require occupancy of both present and future moments to ensure safe driving. In the automotive industry, accurate and continuous prediction of future occupancy maps in traffic scenarios remains a formidable challenge. This paper systematically investigates multi-sensor spatio-temporal fusion strategies for continuous occupancy prediction. We present FusionMotion, a novel bird's-eye-view (BEV) occupancy predictor capable of fusing asynchronous multi-sensor data and predicting future occupancy maps over variable time intervals and temporal horizons. Notably, FusionMotion adopts neural ordinary differential equations (ODEs) on top of recurrent neural networks for occupancy prediction. FusionMotion learns the derivatives of BEV features over temporal horizons, updates the latent state with incoming sensor BEV feature measurements, and propagates future states at each ODE step. Extensive experiments on the large-scale nuScenes and Lyft L5 datasets demonstrate that FusionMotion significantly outperforms previous methods. In addition, it outperforms the BEVFusion-style fusion strategy on the Lyft L5 dataset while reducing synchronization requirements. Code and models will be made available.
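To illustrate the ODE-on-RNN mechanism described above, the sketch below shows a minimal ODE-RNN in PyTorch: a small network learns the time derivative of a latent state, the state is integrated forward between asynchronous measurement times, and a GRU cell fuses each arriving sensor feature. This is an assumption-laden toy, not the paper's actual architecture: the class names (`ODEFunc`, `ODERNN`) are hypothetical, real BEV features are spatial tensors rather than flat vectors, and fixed-step Euler integration stands in for a proper ODE solver.

```python
import torch
import torch.nn as nn


class ODEFunc(nn.Module):
    """Learns dh/dt, the time derivative of the latent BEV state.
    (Hypothetical stand-in; the paper's dynamics model is not specified here.)"""

    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, h):
        return self.net(h)


class ODERNN(nn.Module):
    """Propagates a latent state continuously in time and updates it
    whenever an (asynchronous) sensor measurement arrives."""

    def __init__(self, dim):
        super().__init__()
        self.ode_func = ODEFunc(dim)
        self.cell = nn.GRUCell(dim, dim)  # fuses a measurement into the state

    def propagate(self, h, dt, n_steps=4):
        # Fixed-step Euler integration of dh/dt over a gap of length dt.
        step = dt / n_steps
        for _ in range(n_steps):
            h = h + step * self.ode_func(h)
        return h

    def forward(self, h, measurements, timestamps):
        """measurements: list of (B, dim) sensor BEV features;
        timestamps: matching, non-decreasing arrival times in seconds."""
        t_prev = timestamps[0]
        states = []
        for x, t in zip(measurements, timestamps):
            h = self.propagate(h, t - t_prev)  # evolve state to measurement time
            h = self.cell(x, h)                # update with the new measurement
            states.append(h)
            t_prev = t
        return h, states


# Usage sketch: three asynchronous measurements, then a 0.5 s-ahead prediction.
dim = 64
model = ODERNN(dim)
h0 = torch.zeros(2, dim)                           # batch of 2
obs = [torch.randn(2, dim) for _ in range(3)]      # e.g. camera/LiDAR/radar BEV features
times = [0.0, 0.05, 0.12]                          # unsynchronized arrival times
h, _ = model(h0, obs, times)
future = model.propagate(h, 0.5)                   # variable-horizon rollout
```

Because propagation and update are decoupled, measurements can arrive at arbitrary times and the state can be rolled out to any future horizon, which is what relaxes the synchronization requirements mentioned in the abstract.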