The presence of objects that are confusingly similar to the tracked target poses a fundamental challenge in appearance-based visual tracking. Such distractor objects are easily misclassified as the target itself, leading to eventual tracking failure. While most methods strive to suppress distractors through more powerful appearance models, we take an alternative approach. We propose to keep track of distractor objects in order to continue tracking the target. To this end, we introduce a learned association network, allowing us to propagate the identities of all target candidates from frame to frame. To tackle the problem of lacking ground-truth correspondences between distractor objects in visual tracking, we propose a training strategy that combines partial annotations with self-supervision. We conduct comprehensive experimental validation and analysis of our approach on several challenging datasets. Our tracker sets a new state of the art on six benchmarks, achieving an AUC score of 67.1% on LaSOT and a +5.8% absolute gain on the OxUvA long-term dataset.
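To make the idea of propagating candidate identities concrete, the sketch below shows a minimal, hand-crafted stand-in for frame-to-frame association: candidates in consecutive frames are greedily matched by the cosine similarity of their appearance embeddings. This is an illustrative simplification, not the learned association network described above; the function and threshold names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def associate_candidates(prev_emb, curr_emb, sim_threshold=0.5):
    """Greedily match target candidates across two consecutive frames.

    prev_emb / curr_emb: lists of embedding vectors, one per candidate.
    Returns (prev_idx, curr_idx) pairs; unmatched current candidates
    would be treated as newly appearing objects (e.g. new distractors).
    """
    # Score every cross-frame pair, highest similarity first.
    pairs = sorted(
        ((cosine(p, c), i, j)
         for i, p in enumerate(prev_emb)
         for j, c in enumerate(curr_emb)),
        reverse=True,
    )
    matches, used_prev, used_curr = [], set(), set()
    for sim, i, j in pairs:
        if sim < sim_threshold:
            break  # remaining pairs are even less similar
        if i in used_prev or j in used_curr:
            continue  # each candidate is assigned at most once
        matches.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return matches
```

In the full method, the matching costs would come from a learned network rather than raw cosine similarity, and candidates left unmatched by the threshold are the ones whose identities cannot be confidently propagated.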