Detection And Tracking of Moving Objects (DATMO) is an essential component of environmental perception for autonomous driving. While 3D detectors using surround-view cameras are flourishing, there is a growing tendency to use transformer-based methods to learn queries in 3D space from 2D feature maps in perspective view. This paper proposes Sparse R-CNN 3D (SRCN3D), a novel two-stage fully-convolutional mapping pipeline for surround-view camera detection and tracking. SRCN3D adopts a cascade structure with a twin-track update of both a fixed number of proposal boxes and proposal latent features. Proposal boxes are projected to perspective view so as to aggregate Region of Interest (RoI) local features. Based on these, proposal features are refined via a dynamic instance interactive head, which then generates classification scores and the offsets applied to the original bounding boxes. Compared to prior work, our sparse feature sampling module utilizes only local 2D features to adjust each corresponding 3D proposal box, leading to a completely sparse paradigm. Both proposal features and appearance features are used in the data association process of a multi-hypothesis 3D multi-object tracking approach. Extensive experiments on the nuScenes dataset demonstrate the effectiveness of our proposed SRCN3D detector and tracker. Code is available at https://github.com/synsin0/SRCN3D.
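The abstract's core sampling step projects each 3D proposal box into a camera's perspective view before aggregating RoI features. A minimal sketch of that projection, assuming a standard pinhole camera model (the function name `project_to_image`, the intrinsics `K`, and the sample point are illustrative, not the paper's actual implementation):

```python
import numpy as np

def project_to_image(point_cam, K):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v).

    point_cam: (3,) array, point in the camera frame (meters), z > 0.
    K: (3, 3) camera intrinsic matrix.
    """
    assert point_cam[2] > 0, "point must be in front of the camera"
    uvw = K @ point_cam          # homogeneous image coordinates
    return uvw[:2] / uvw[2]      # perspective divide -> (u, v)

# Illustrative intrinsics (nuScenes-like focal lengths and principal point).
K = np.array([[1266.0,    0.0, 816.0],
              [   0.0, 1266.0, 491.0],
              [   0.0,    0.0,   1.0]])

# Hypothetical 3D proposal-box center in the camera frame.
center_cam = np.array([2.0, 0.5, 10.0])
u, v = project_to_image(center_cam, K)
```

In the full pipeline, all eight box corners would be projected this way to obtain a 2D RoI, from which local features are pooled for the dynamic instance interactive head.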