In this paper, we propose a self-supervised RGB-T tracking method. Unlike existing deep RGB-T trackers that are trained on large numbers of annotated RGB-T image pairs, our RGB-T tracker is trained on unlabeled RGB-T video pairs in a self-supervised manner. We propose a novel self-supervised training strategy based on cross-input consistency, built on the idea that tracking can be performed using different inputs. Specifically, we construct two distinct inputs from unlabeled RGB-T video pairs, track objects using each input to generate two sets of results, and construct our cross-input consistency loss from the discrepancy between them. Meanwhile, we propose a reweighting strategy that makes our loss function robust to low-quality training samples. We build our tracker on a Siamese correlation filter network. To the best of our knowledge, our tracker is the first self-supervised RGB-T tracker. Extensive experiments on two public RGB-T tracking benchmarks demonstrate that the proposed training strategy is effective. Remarkably, despite being trained only on a corpus of unlabeled RGB-T video pairs, our tracker outperforms seven supervised RGB-T trackers on the GTOT dataset.
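The core idea of the cross-input consistency loss can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the min-based joint quality score, and the hard threshold used for reweighting are all assumptions made for illustration; the paper's actual loss operates on the outputs of a Siamese correlation filter network.

```python
import numpy as np

def cross_input_consistency_loss(resp_a, resp_b, quality_a, quality_b, tau=0.5):
    """Hypothetical sketch of a cross-input consistency loss.

    resp_a, resp_b: tracking response maps obtained from two distinct
    inputs built from the same unlabeled RGB-T video pair.
    quality_a, quality_b: per-sample quality scores in [0, 1]
    (the scoring rule here is an assumption, not the paper's).
    """
    # Reweighting: down-weight low-quality training pairs by using the
    # weaker of the two quality scores, and drop pairs below a threshold.
    w = min(quality_a, quality_b)
    if w < tau:  # treated as a low-quality sample
        w = 0.0
    # Consistency term: the two inputs should yield similar responses,
    # measured here as a mean squared difference of the response maps.
    diff = resp_a - resp_b
    return w * float(np.mean(diff ** 2))
```

With identical response maps the loss is zero regardless of the weight; with a low-quality pair the sample contributes nothing, which is the intent of the reweighting strategy.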