Multi-object tracking (MOT) aims to estimate the bounding boxes and identities of objects across video frames. Detection boxes serve as the basis of both 2D and 3D MOT. Inevitable changes in detection scores cause objects to go missing after tracking. We propose a hierarchical data association strategy that mines true objects from low-score detection boxes, alleviating the problems of missed objects and fragmented trajectories. This simple and generic data association strategy is effective in both 2D and 3D settings. In 3D scenarios, it is much easier for the tracker to predict object velocities in world coordinates. We propose a complementary motion prediction strategy that combines the detected velocities with a Kalman filter to handle abrupt motion and short-term disappearance. ByteTrackV2 leads the nuScenes 3D MOT leaderboard in both the camera (56.4% AMOTA) and LiDAR (70.1% AMOTA) modalities. Furthermore, it is nonparametric and can be integrated with various detectors, which makes it appealing for real applications. The source code is released at https://github.com/ifzhang/ByteTrack-V2.
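To make the idea of hierarchical data association concrete, the following is a minimal 2D sketch, not the released implementation. It assumes axis-aligned boxes given as (x1, y1, x2, y2), IoU as the similarity measure, and a simple greedy matcher standing in for whichever assignment solver the authors use. The names `associate` and `greedy_match` and the thresholds `high_thresh` and `min_iou` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Tuple[int, Box]], min_iou: float):
    """Pair tracks with detections in order of descending IoU.

    Returns (matches {track_id: detection_index}, tracks left unmatched).
    Greedy matching is an assumption made to keep the sketch dependency-free.
    """
    pairs = sorted(
        ((iou(tb, db), tid, di) for tid, tb in tracks.items() for di, db in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    remaining = {tid: b for tid, b in tracks.items() if tid not in matches}
    return matches, remaining


def associate(tracks: Dict[int, Box], detections: List[Tuple[Box, float]],
              high_thresh: float = 0.6, min_iou: float = 0.3):
    """Two-stage association: high-score boxes first, then low-score boxes."""
    high = [(i, box) for i, (box, s) in enumerate(detections) if s >= high_thresh]
    low = [(i, box) for i, (box, s) in enumerate(detections) if s < high_thresh]

    # Stage 1: match all tracks against high-score detections.
    first, leftover = greedy_match(tracks, high, min_iou)
    # Stage 2: give still-unmatched tracks a chance with low-score detections.
    second, lost = greedy_match(leftover, low, min_iou)

    matches = {**first, **second}
    matched = set(matches.values())
    # Only unmatched high-score detections are candidates for new tracks;
    # unmatched low-score detections are treated as background and dropped.
    new_tracks = [i for i, _ in high if i not in matched]
    return matches, sorted(lost), new_tracks


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [
    ((1, 0, 11, 10), 0.9),    # confident detection of track 1
    ((20, 21, 30, 31), 0.2),  # track 2, score dropped (e.g. occlusion)
    ((50, 50, 60, 60), 0.8),  # confident detection of a new object
    ((80, 80, 90, 90), 0.1),  # low-score box with no track nearby
]
matches, lost, new_tracks = associate(tracks, detections)
print(matches, lost, new_tracks)
```

In this example, track 2 stays alive because its low-score box is recovered in the second stage, whereas a tracker that discards every detection below the threshold would lose it and fragment the trajectory. The isolated low-score box at index 3 is never promoted to a track.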