Multi-object tracking (MOT) is one of the most challenging tasks in computer vision, requiring both accurate object detection and reliable association of those detections across frames. Current approaches typically run the detector on every frame of a video stream, which makes them nearly impossible to deploy under limited computing resources. To address this issue, we propose StableTrack, a novel approach that stabilizes tracking quality on low-frequency detections. Our method introduces a new two-stage matching strategy to improve cross-frame association between low-frequency detections. We propose a novel Bbox-Based Distance to replace the conventional Mahalanobis distance, which allows us to match objects effectively using a Re-ID model. Furthermore, we integrate visual tracking into the Kalman Filter and the overall tracking pipeline. Our method outperforms current state-of-the-art trackers in the low-frequency setting, achieving an $\textit{11.6\%}$ HOTA improvement at $\textit{1}$ Hz on MOT17-val, while remaining on par with the best approaches on the standard MOT17, MOT20, and DanceTrack benchmarks with full-frequency detections.
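The abstract names a Bbox-Based Distance that is fused with Re-ID appearance matching, but does not spell out its formulation. The snippet below is a minimal sketch of what such a fused association cost could look like, assuming an IoU-style geometric term linearly combined with a cosine Re-ID distance; the names `iou_distance` and `fused_cost` and the weight `alpha` are hypothetical and not taken from the paper.

```python
import numpy as np

def iou_distance(box_a, box_b):
    """1 - IoU between two boxes in (x1, y1, x2, y2) format."""
    xx1, yy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xx2, yy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xx2 - xx1) * max(0.0, yy2 - yy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return 1.0 - inter / union if union > 0 else 1.0

def fused_cost(track_box, det_box, track_emb, det_emb, alpha=0.5):
    """Fuse a geometric bbox distance with a Re-ID cosine distance.

    `alpha` (hypothetical) weights geometry vs. appearance; the paper's
    actual Bbox-Based Distance may be defined differently.
    """
    cos_sim = float(np.dot(track_emb, det_emb) /
                    (np.linalg.norm(track_emb) * np.linalg.norm(det_emb)))
    appearance = 1.0 - cos_sim                    # cosine distance
    geometry = iou_distance(track_box, det_box)   # bbox overlap distance
    return alpha * geometry + (1.0 - alpha) * appearance
```

A full association step would evaluate this cost for every track-detection pair and solve the resulting assignment problem (e.g., with `scipy.optimize.linear_sum_assignment`). At low detection frequencies the geometric term degrades because objects move far between detections, which is presumably why the method leans on the Re-ID term rather than a Mahalanobis gate.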