A reliable and accurate 3D tracking framework is essential for predicting the future locations of surrounding objects and planning the observer's actions in applications such as autonomous driving. We propose a framework that associates moving objects over time and estimates their full 3D bounding boxes from a sequence of 2D images captured on a moving platform. Object association uses quasi-dense similarity learning to identify objects across poses and viewpoints from appearance cues alone. After the initial 2D association, we apply depth-ordering heuristics on the 3D bounding boxes for robust instance association, and motion-based 3D trajectory prediction to re-identify occluded vehicles. Finally, an LSTM-based object velocity learning module aggregates long-term trajectory information for more accurate motion extrapolation. Experiments on our proposed simulation data and on real-world benchmarks, including the KITTI, nuScenes, and Waymo datasets, show that our tracking framework offers robust object association and tracking in urban driving scenarios. On the Waymo Open benchmark, we establish the first camera-only baseline in the 3D tracking and 3D detection challenges. On the nuScenes 3D tracking benchmark, our quasi-dense 3D tracking pipeline reaches nearly five times the tracking accuracy of the best vision-only submission among all published methods. Our code, data, and trained models are available at https://github.com/SysCV/qd-3dt.
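To make the association step described above more concrete, the following is a minimal sketch, not the released implementation. It assumes each track carries an appearance embedding and a predicted 3D center, and each detection carries an embedding and an estimated 3D center with depth as the third coordinate. The function name `associate`, the dictionary keys, the exponential motion weighting, and the nearest-first visiting order are all hypothetical choices made for illustration; the actual pipeline learns its similarity and motion models.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def associate(tracks, detections, min_score=0.5, dist_scale=5.0):
    """Greedily match detections to tracks (illustrative assumption only).

    tracks:     list of dicts with 'id', 'embedding', 'predicted_center'
    detections: list of dicts with 'embedding', 'center'
    Centers are (x, y, z) with z taken as depth from the camera.
    Returns a dict mapping detection index -> matched track id.
    """
    # Depth ordering (simplified): nearer detections claim tracks first.
    order = sorted(range(len(detections)),
                   key=lambda i: detections[i]["center"][2])
    free = set(range(len(tracks)))
    matches = {}
    for i in order:
        det = detections[i]
        best_j, best_score = None, min_score
        for j in sorted(free):
            trk = tracks[j]
            appearance = cosine_similarity(det["embedding"], trk["embedding"])
            # Motion cue: down-weight candidates far from the predicted center.
            gap = math.dist(det["center"], trk["predicted_center"])
            score = appearance * math.exp(-gap / dist_scale)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches[i] = tracks[best_j]["id"]
            free.remove(best_j)
    return matches
```

Detections that stay unmatched would start new tracks in a full tracker, and tracks that stay unmatched would be carried forward by their motion model until they are re-identified or dropped; both steps are omitted here.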