Self-supervised pre-training and transformer-based networks have significantly improved the performance of object detection. However, most current self-supervised object detection methods are built on convolutional architectures. We believe that the sequence characteristics of transformers should be taken into account when designing a transformer-based self-supervised method for the object detection task. To this end, we propose SeqCo-DETR, a novel Sequence Consistency-based self-supervised method for object DEtection with TRansformers. SeqCo-DETR defines a simple but effective pretext task: it minimizes the discrepancy between the output sequences of transformers fed different views of the same image, and leverages bipartite matching to find the most relevant sequence pairs, improving sequence-level self-supervised representation learning. Furthermore, we combine a mask-based augmentation strategy with the sequence consistency strategy to extract more representative contextual information about the object for the object detection task. Our method achieves state-of-the-art results on MS COCO (45.8 AP) and PASCAL VOC (64.1 AP), demonstrating the effectiveness of our approach.
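To make the pretext task concrete, the following is a minimal sketch (not the authors' implementation) of a sequence-consistency loss: two sets of transformer output embeddings from two augmented views are paired by optimal bipartite matching, and the average discrepancy of the matched pairs is minimized. The function name `seq_consistency_loss` and the use of cosine distance as the matching cost are illustrative assumptions.

```python
# Hypothetical sketch of a sequence-consistency pretext loss
# using bipartite (Hungarian) matching between two views.
import numpy as np
from scipy.optimize import linear_sum_assignment

def seq_consistency_loss(seq_a, seq_b):
    """Average discrepancy over optimally matched sequence pairs.

    seq_a, seq_b: (N, D) arrays of output sequence embeddings
    (e.g., one per object query) from two views of the same image.
    Cosine distance is assumed as the cost; the paper's exact
    cost and loss may differ.
    """
    # L2-normalize so the pairwise cost is 1 - cosine similarity.
    a = seq_a / np.linalg.norm(seq_a, axis=1, keepdims=True)
    b = seq_b / np.linalg.norm(seq_b, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                      # (N, N) discrepancy matrix
    rows, cols = linear_sum_assignment(cost)  # optimal bipartite matching
    return cost[rows, cols].mean()            # loss on matched pairs only

# Identical views should yield a loss near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
print(seq_consistency_loss(x, x))  # near 0
```

Matching before computing the loss avoids penalizing permutations of the sequence: the unordered set of object queries from the two views is compared pair-by-pair rather than index-by-index.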