In this paper, we propose MOTRv2, a simple yet effective pipeline that bootstraps end-to-end multi-object tracking with a pretrained object detector. Existing end-to-end methods, e.g., MOTR and TrackFormer, lag behind their tracking-by-detection counterparts mainly because of their poor detection performance. We aim to improve MOTR by elegantly incorporating an extra object detector. We first adopt the anchor formulation of queries and then use the extra detector to generate proposals that serve as anchors, providing a detection prior to MOTR. This simple modification greatly eases the conflict between the jointly learned detection and association tasks in MOTR. MOTRv2 retains the end-to-end property and scales well on large-scale benchmarks. It ranks 1st (73.4% HOTA on DanceTrack) in the 1st Multiple People Tracking in Group Dance Challenge, and it also achieves state-of-the-art performance on the BDD100K dataset. We hope this simple and effective pipeline can provide new insights to the end-to-end MOT community. Code is available at \url{https://github.com/megvii-research/MOTRv2}.
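To make the "proposals as anchors" idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): detector proposal boxes are mapped to query positional embeddings via standard sine/cosine encoding, so the tracker's queries start from detected locations instead of learned free-form embeddings. The helper names `sine_embed` and `proposals_to_anchor_queries` are illustrative assumptions.

```python
import numpy as np

def sine_embed(x, num_feats):
    """Encode scalars in [0, 1] as sine/cosine positional features.

    x: (N,) array; returns (N, num_feats). This mirrors the common
    DETR-style positional encoding; exact details are an assumption.
    """
    t = 10000.0 ** (2 * (np.arange(num_feats) // 2) / num_feats)
    pos = x[:, None] * 2 * np.pi / t
    out = np.empty_like(pos)
    out[:, 0::2] = np.sin(pos[:, 0::2])
    out[:, 1::2] = np.cos(pos[:, 1::2])
    return out

def proposals_to_anchor_queries(proposals, dim=256):
    """Turn detector proposals into anchor query embeddings.

    proposals: (N, 4) array of (cx, cy, w, h), normalized to [0, 1].
    Each coordinate contributes dim // 4 features, concatenated into
    a (N, dim) query embedding that encodes the proposal's location.
    """
    feats = [sine_embed(proposals[:, i], dim // 4) for i in range(4)]
    return np.concatenate(feats, axis=1)

# Example: one proposal box centered in the image.
boxes = np.array([[0.5, 0.5, 0.2, 0.3]])
queries = proposals_to_anchor_queries(boxes, dim=256)
print(queries.shape)  # (1, 256)
```

In the actual pipeline the proposal confidence score is also encoded and the anchors are refined by the transformer decoder; this sketch only illustrates how detection output can seed the query positions.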