Online 3D multi-object tracking (MOT) has witnessed significant research interest in recent years, largely driven by demand from the autonomous systems community. However, 3D offline MOT is relatively less explored. Labeling 3D trajectory scene data at large scale while not relying on high-cost human experts is still an open research question. In this work, we propose Batch3DMOT, which follows the tracking-by-detection paradigm and represents real-world scenes as directed, acyclic, and category-disjoint tracking graphs that are attributed using various modalities such as camera, LiDAR, and radar. We present a multi-modal graph neural network that uses a cross-edge attention mechanism to mitigate modality intermittence, which translates into sparsity in the graph domain. Additionally, we present attention-weighted convolutions over frame-wise k-NN neighborhoods as a suitable means of allowing information exchange across disconnected graph components. We evaluate our approach using various sensor modalities and model configurations on the challenging nuScenes and KITTI datasets. Extensive experiments demonstrate that our proposed approach yields an overall improvement of 3.3% in the AMOTA score on nuScenes, thereby setting a new state of the art for 3D tracking and further enhancing false positive filtering.
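To make the idea of attention-weighted convolutions over frame-wise k-NN neighborhoods concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation). It assumes node features have already been fused from the per-modality encoders and that 3D box centers are available for the k-NN query; the module name `FrameKNNAttentionConv` and all layer choices are illustrative assumptions.

```python
# Hypothetical sketch, NOT the Batch3DMOT code: attention-weighted convolution
# over frame-wise k-NN neighborhoods, assuming fused node features and 3D box
# centers are given for all detections of a single frame.
import torch
import torch.nn as nn


class FrameKNNAttentionConv(nn.Module):
    """Aggregates messages from each node's k nearest same-frame neighbors,
    weighting every message by a learned attention score. These k-NN edges can
    connect detections that lie in otherwise disconnected components of the
    tracking graph, enabling information exchange between them."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 4):
        super().__init__()
        self.k = k
        self.msg = nn.Linear(2 * in_dim, out_dim)   # message from (node, neighbor) pair
        self.att = nn.Linear(2 * in_dim, 1)         # scalar attention logit per pair
        self.update = nn.Linear(in_dim + out_dim, out_dim)

    def forward(self, feats: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        # feats:     (N, in_dim)  node features of all detections in one frame
        # positions: (N, 3)       3D box centers used for the k-NN query
        n = feats.size(0)
        k = min(self.k, n - 1)

        # Pairwise distances between box centers; exclude self-matches.
        dist = torch.cdist(positions, positions)            # (N, N)
        dist.fill_diagonal_(float("inf"))
        knn_idx = dist.topk(k, largest=False).indices       # (N, k)

        neigh = feats[knn_idx]                               # (N, k, in_dim)
        center = feats.unsqueeze(1).expand(-1, k, -1)        # (N, k, in_dim)
        pair = torch.cat([center, neigh], dim=-1)            # (N, k, 2*in_dim)

        # Attention weights normalized over each node's neighborhood.
        alpha = torch.softmax(self.att(pair), dim=1)         # (N, k, 1)
        agg = (alpha * self.msg(pair)).sum(dim=1)            # (N, out_dim)
        return self.update(torch.cat([feats, agg], dim=-1))
```

As a usage example, `FrameKNNAttentionConv(in_dim=128, out_dim=128)(feats, centers)` would refine the fused detection features of one frame before edge classification in the tracking graph; the actual network design, neighborhood size, and attention formulation in Batch3DMOT may differ.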