3D multi-object tracking (MOT) is a key problem for autonomous vehicles, which must perform well-informed motion planning in dynamic environments. Particularly in densely occupied scenes, associating existing tracks with new detections remains challenging, as existing systems tend to omit critical contextual information. Our proposed solution, InterTrack, introduces the Interaction Transformer for 3D MOT to generate discriminative object representations for data association. We extract state and shape features for each track and detection, and efficiently aggregate global information via attention. We then perform a learned regression on each track/detection feature pair to estimate affinities, and use a robust two-stage data association and track management approach to produce the final tracks. We validate our approach on the nuScenes 3D MOT benchmark, where we observe significant improvements, particularly on classes with small physical sizes and clustered objects. At the time of submission, InterTrack ranks 1st in overall AMOTA among methods using CenterPoint detections.
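The pipeline sketched in the abstract — aggregate track and detection features globally via attention, then score each track/detection pair — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the single self-attention layer, and the cosine-similarity scorer (standing in for the learned affinity regression head) are all illustrative assumptions.

```python
# Minimal sketch, assuming: 8-D object features, one scaled dot-product
# self-attention layer over the concatenated track + detection features,
# and cosine similarity in place of the paper's learned affinity head.
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(feats, wq, wk, wv):
    """One attention layer: every object attends to every other object,
    so each output feature carries global scene context."""
    q, k, v = feats @ wq, feats @ wk, feats @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[1]))
    return scores @ v


def affinity_matrix(track_feats, det_feats):
    """Pairwise affinities; cosine similarity is a stand-in for the
    learned regression described in the abstract."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return t @ d.T  # shape: (n_tracks, n_dets)


rng = np.random.default_rng(0)
dim, n_tracks, n_dets = 8, 3, 4

# Placeholder state/shape features for 3 tracks and 4 detections.
feats = rng.normal(size=(n_tracks + n_dets, dim))
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))

agg = self_attention(feats, wq, wk, wv)          # globally aggregated
aff = affinity_matrix(agg[:n_tracks], agg[n_tracks:])
print(aff.shape)  # (3, 4): one affinity per track/detection pair
```

In the actual method, the resulting affinity matrix feeds a two-stage data association and track management step; a common choice for the matching itself is the Hungarian algorithm (e.g., `scipy.optimize.linear_sum_assignment`), though the abstract does not specify the solver.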