In this paper, we propose MOTRv2, a simple yet effective pipeline to bootstrap end-to-end multi-object tracking with a pretrained object detector. Existing end-to-end methods, such as MOTR and TrackFormer, are inferior to their tracking-by-detection counterparts mainly due to poor detection performance. We aim to improve MOTR by elegantly incorporating an extra object detector. We first adopt the anchor formulation of queries and then use the extra object detector to generate proposals as anchors, providing a detection prior to MOTR. This simple modification greatly eases the conflict between jointly learning the detection and association tasks in MOTR. MOTRv2 retains the query propagation feature and scales well on large-scale benchmarks. MOTRv2 ranked 1st (73.4% HOTA on DanceTrack) in the 1st Multiple People Tracking in Group Dance Challenge. Moreover, MOTRv2 reaches state-of-the-art performance on the BDD100K dataset. We hope this simple and effective pipeline can provide new insights to the end-to-end MOT community. Code is available at \url{https://github.com/megvii-research/MOTRv2}.
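To illustrate the core idea of using detector proposals as anchors, the sketch below shows one plausible way to turn proposal boxes from an external detector into anchor-query embeddings via sine positional encoding. This is a minimal illustration under assumed conventions, not the paper's implementation; the function names, embedding dimension, and the choice to encode the confidence score alongside the box are all assumptions for exposition.

```python
import numpy as np

def sine_embed(x, dim=64, temperature=10000.0):
    # Standard sine/cosine positional encoding of a scalar in [0, 1].
    i = np.arange(dim // 2)
    freqs = temperature ** (2 * i / dim)
    pos = x * 2 * np.pi / freqs
    return np.concatenate([np.sin(pos), np.cos(pos)])

def proposals_to_anchor_queries(proposals, dim=64):
    """Turn detector proposals (cx, cy, w, h, score) into anchor queries.

    Hypothetical sketch: each normalized box coordinate is sine-encoded
    and concatenated, and the detection confidence is encoded the same
    way, so the tracker's decoder starts from a detection prior instead
    of learned free-form queries.
    """
    queries = []
    for cx, cy, w, h, score in proposals:
        box_embed = np.concatenate([sine_embed(v, dim) for v in (cx, cy, w, h)])
        score_embed = sine_embed(score, dim)
        queries.append(np.concatenate([box_embed, score_embed]))
    return np.stack(queries)  # shape: (num_proposals, 5 * dim)
```

In this sketch, the anchor queries would then be fed to the transformer decoder in place of (or added to) its learnable detect queries, while track queries propagated from previous frames are left untouched.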