To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.
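To make the "fusion before association" idea concrete, below is a minimal sketch, not the authors' implementation: detections from all cameras are pooled into a shared vehicle frame and cross-camera duplicates are suppressed before a single association pass against the global track list. The names `Detection3D`, `fuse_cross_camera`, and `associate` are illustrative placeholders, and the toy nearest-neighbor matcher stands in for CC-3DT's learned appearance similarity and motion modeling.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Detection3D:
    center: Tuple[float, float, float]  # (x, y, z) in a shared vehicle/world frame
    score: float

def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fuse_cross_camera(per_camera: List[List[Detection3D]],
                      radius: float = 1.0) -> List[Detection3D]:
    """Pool detections from all cameras, then greedily suppress lower-scored
    detections within `radius` meters of a kept one (the same object seen
    by two overlapping cameras)."""
    pooled = sorted((d for cam in per_camera for d in cam),
                    key=lambda d: d.score, reverse=True)
    fused: List[Detection3D] = []
    for det in pooled:
        if all(dist2(det.center, k.center) > radius ** 2 for k in fused):
            fused.append(det)
    return fused

def associate(tracks: Dict[int, Detection3D], dets: List[Detection3D],
              gate: float = 2.0) -> Dict[int, Detection3D]:
    """Toy nearest-neighbor association of fused detections against the
    single, global track list (one identity per physical object)."""
    next_id = max(tracks, default=-1) + 1
    for det in dets:
        match = min(tracks, default=None,
                    key=lambda t: dist2(tracks[t].center, det.center))
        if match is not None and dist2(tracks[match].center, det.center) <= gate ** 2:
            tracks[match] = det      # update an existing identity
        else:
            tracks[next_id] = det    # spawn a new identity
            next_id += 1
    return tracks

# One object visible in two overlapping cameras: fusing before association
# yields a single track, whereas per-camera tracking with post-hoc fusion
# would first create two identities and then have to reconcile them.
tracks: Dict[int, Detection3D] = {}
frame = [[Detection3D((10.0, 2.0, 0.0), 0.9)],   # camera 0
         [Detection3D((10.2, 2.1, 0.0), 0.7)]]   # camera 1, same object
tracks = associate(tracks, fuse_cross_camera(frame))
print(len(tracks))  # 1 track, not 2
```

The ordering is the point of the sketch: because duplicates are merged before any identity is assigned, the association step never creates the per-camera duplicate tracks that post-hoc fusion must later stitch together, which is the source of the identity switches the abstract refers to.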