Multiple object tracking (MOT) comprises two subtasks: detection and association. Many trackers achieve competitive performance. However, because little information is exchanged between these two subtasks, trackers are often biased toward one of them and underperform in complex scenarios, for example by producing false negatives and mistaken trajectories when targets pass each other. In this paper, we propose TransFiner, a transformer-based post-refinement approach for MOT. It is a generic attachment framework that takes as input the images and the tracking results (locations and class predictions) of the original tracker and uses them to drive the refinement. TransFiner operates on query pairs, which the fusion decoder turns into paired detection and motion predictions, yielding a comprehensive tracking improvement. We also provide targeted refinement by labeling query pairs according to different refinement levels. Experiments show that our design is effective: on the MOT17 benchmark, we raise CenterTrack from 67.8% MOTA and 64.7% IDF1 to 71.5% MOTA and 66.8% IDF1.
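For readers unfamiliar with the two reported metrics, the following minimal sketch shows how MOTA and IDF1 are commonly computed from aggregate counts. It assumes the standard definitions (MOTA as one minus the ratio of misses, false positives, and identity switches to ground-truth objects; IDF1 as the F1 score over identity-matched detections). The function names, argument names, and example counts are hypothetical and are not taken from the paper or from any benchmark toolkit.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denominator = 2 * idtp + idfp + idfn
    # With no detections and no ground truth, define the score as 0.
    return 2 * idtp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"MOTA: {mota(20, 10, 5, 100):.3f}")
    print(f"IDF1: {idf1(80, 10, 30):.3f}")
```

MOTA mainly penalizes detection errors (misses and false positives), while IDF1 rewards keeping identities consistent over time, which is why the abstract reports both when claiming improvement in detection and association together.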