Multiple object tracking (MOT) comprises two subtasks: detection and association. Many trackers have achieved competitive performance. Unfortunately, because little information is exchanged between these subtasks, such trackers are often biased toward one of the two and underperform in complex scenarios, for example producing missed targets and erroneous trajectories when tracking individuals within a crowd. This paper proposes TransFiner, a transformer-based approach to post-refinement for MOT. It is a generic attachment framework built on query pairs, which serve as the bridge between the original tracker and TransFiner. Through the fusion decoder, each query pair produces refined detection and motion cues for a specific object. Before decoding, the query pairs are feature-aligned and group-labeled under the guidance of the original tracker's results (locations and class predictions), so that tracking refinement is both focused and comprehensive. Experiments show that our design is effective: on the MOT17 benchmark, we improve CenterTrack from 67.8% MOTA and 64.7% IDF1 to 71.5% MOTA and 66.8% IDF1.
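MOTA and IDF1 are standard MOT metrics. For readers less familiar with them, here is a minimal Python sketch of their usual definitions: CLEAR MOT accuracy, and the identity-based F1 score. The error counts in the example calls are made-up toy values, not results from the paper. Only the percentages in the last two lines come from the abstract.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    num_gt is the total number of ground-truth boxes summed over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("no predictions and no ground truth")
    return 2 * idtp / denom


# Toy counts (hypothetical), only to show how the formulas combine errors.
example_mota = mota(fn=100, fp=50, idsw=10, num_gt=1000)  # 0.84
example_idf1 = idf1(idtp=800, idfp=150, idfn=200)         # about 0.82

# Gains reported in the abstract for CenterTrack on MOT17, in percentage points.
mota_gain = round(71.5 - 67.8, 1)  # 3.7
idf1_gain = round(66.8 - 64.7, 1)  # 2.1
```

MOTA penalizes missed targets, false positives, and identity switches together, while IDF1 measures how consistently identities are kept over time. This is why the abstract reports both.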