In this paper, we propose a new joint object detection and tracking (JoDT) framework for 3D object detection and tracking based on camera and LiDAR sensors. The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate in generating a spatio-temporal representation of the camera and LiDAR data, with which 3D object detection and tracking are then performed. The detector constructs spatio-temporal features via weighted temporal aggregation of the spatial features obtained by camera-LiDAR fusion. The detector then refines the initial detection results using information from the tracklets maintained up to the previous time step. Based on the spatio-temporal features generated by the detector, the tracker associates the detected objects with previously tracked objects using a graph neural network (GNN). We devise a fully connected GNN facilitated by a combination of rule-based edge pruning and attention-based edge gating, which exploits both spatial and temporal object contexts to improve tracking performance. Experiments conducted on both the KITTI and nuScenes benchmarks demonstrate that, through collaboration between the detector and tracker, the proposed 3D DetecTrack achieves significant improvements in both detection and tracking performance over baseline methods and attains state-of-the-art performance among existing methods.
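To make the pruning-and-gating idea concrete, the following is a minimal NumPy sketch of how rule-based edge pruning and attention-based edge gating could be combined on a fully connected detection-tracklet association graph. The function name, the sigmoid gating form, and the distance threshold are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def gate_edges(det_feats, trk_feats, det_pos, trk_pos, dist_thresh=2.0):
    """Return an edge-gate matrix of shape (num_detections, num_tracklets).

    Each entry weights the message passed along one detection-tracklet edge:
      1. Attention-based gating: sigmoid of scaled dot-product similarity
         between detection and tracklet features.
    2. Rule-based pruning: edges between objects farther apart than
         `dist_thresh` (in the same metric units as the positions) are
         hard-zeroed, so no message flows along implausible matches.
    """
    d = det_feats.shape[1]
    # Scaled dot-product similarity between every detection/tracklet pair.
    scores = det_feats @ trk_feats.T / np.sqrt(d)
    gates = 1.0 / (1.0 + np.exp(-scores))  # sigmoid gate in (0, 1)

    # Pairwise Euclidean distances between 3D object centers.
    dists = np.linalg.norm(det_pos[:, None, :] - trk_pos[None, :, :], axis=-1)
    gates[dists > dist_thresh] = 0.0  # pruned edges carry no message
    return gates
```

In a full GNN, these gates would scale the messages exchanged during each round of node-feature updates, so spatially implausible pairs are excluded outright while the remaining edges are softly weighted by feature similarity.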