As an important area of computer vision, object tracking has split into two separate communities that study Single Object Tracking (SOT) and Multiple Object Tracking (MOT), respectively. However, methods developed for one tracking scenario are not easily adapted to the other, because the two tasks differ in both their training datasets and their tracking targets. Although UniTrack \cite{wang2021different} demonstrates that a shared appearance model with multiple heads can tackle individual tracking tasks, it fails to exploit large-scale tracking datasets for training and performs poorly on single object tracking. In this work, we present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm. UTT includes a track transformer that tracks the target in both SOT and MOT by exploiting the correlation between the target features and the tracking-frame features to localize the target. We demonstrate that both SOT and MOT tasks can be solved within this framework. The model can be trained end-to-end on both tasks simultaneously by alternately optimizing the SOT and MOT objectives on the datasets of the individual tasks. Extensive experiments are conducted on several benchmarks with a unified model trained on both SOT and MOT datasets. Code will be available at https://github.com/Flowerfan/Trackron.
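To make the idea of correlation-based localization concrete, the following is a minimal sketch and not the UTT implementation. It assumes a single target embedding of shape `(C,)` and a frame feature map of shape `(C, H, W)`, uses a plain dot product as the correlation, and reads the location off the resulting map with an argmax and a soft-argmax. The function name `localize_by_correlation` and all shapes are hypothetical choices made for illustration.

```python
import numpy as np


def localize_by_correlation(target_feat, frame_feat):
    """Illustrative sketch (an assumption, not the paper's track transformer).

    target_feat: array of shape (C,), an embedding of the tracked target.
    frame_feat:  array of shape (C, H, W), features of the tracking frame.
    Returns the correlation map, the (row, col) of its peak, and a
    soft-argmax (row, col) estimate.
    """
    _, h, w = frame_feat.shape

    # Dot-product correlation between the target and every spatial location.
    corr = np.einsum("c,chw->hw", target_feat, frame_feat)

    # Numerically stable softmax over all spatial locations.
    flat = corr.reshape(-1)
    prob = np.exp(flat - flat.max())
    prob = (prob / prob.sum()).reshape(h, w)

    # Soft-argmax: probability-weighted average of the grid coordinates.
    rows, cols = np.mgrid[0:h, 0:w]
    soft_rc = (float((prob * rows).sum()), float((prob * cols).sum()))

    # Hard peak of the correlation map.
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return corr, (int(peak[0]), int(peak[1])), soft_rc
```

In an SOT setting this would be called once per frame with the single target embedding; in an MOT setting the same function could, under the same assumptions, be applied once per tracked identity.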