Prior self-supervised learning research has mainly selected image-level instance discrimination as the pretext task. This achieves classification performance comparable to supervised learning methods, but transfers poorly to downstream tasks such as object detection. To bridge this performance gap, we propose a novel object-level self-supervised learning method, called Contrastive learning with Downstream background invariance (CoDo). The pretext task is converted to focus on instance location modeling across varied backgrounds, especially those of downstream datasets; we consider invariance to background vital for object detection. First, we propose a data augmentation strategy that pastes instances onto background images and then jitters the bounding boxes to involve background information. Second, we align the architecture of our pretraining network with mainstream detection pipelines. Third, we design hierarchical, multi-view contrastive learning to improve the quality of the learned visual representations. Experiments on MSCOCO demonstrate that CoDo with a common backbone, ResNet50-FPN, yields strong transfer learning results for object detection.
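As a rough illustration of the copy-paste-and-jitter augmentation described above, the following minimal Python sketch is offered; it is not the authors' implementation, and the function name `paste_with_jitter`, the `max_jitter` parameter, and the assumed array shapes are all illustrative assumptions. It pastes a masked instance onto a background image and then perturbs the bounding box so that the crop used for contrastive learning mixes object and background pixels:

```python
import random
import numpy as np

def paste_with_jitter(instance, mask, background, max_jitter=0.2):
    """Illustrative sketch (not the paper's code): paste a masked instance
    crop onto a background image, then jitter its bounding box so the
    resulting region includes surrounding background context.

    instance:   (h, w, 3) uint8 array, the cropped object
    mask:       (h, w) binary array, 1 where the object is present
    background: (H, W, 3) uint8 array, H >= h and W >= w
    """
    bg = background.copy()
    ih, iw = instance.shape[:2]
    bh, bw = bg.shape[:2]

    # Random paste location (top-left corner) that keeps the instance in frame.
    x0 = random.randint(0, bw - iw)
    y0 = random.randint(0, bh - ih)

    # Composite the instance onto the background using its binary mask.
    region = bg[y0:y0 + ih, x0:x0 + iw]
    bg[y0:y0 + ih, x0:x0 + iw] = np.where(mask[..., None] > 0, instance, region)

    # Jitter each box edge by up to max_jitter of the box size, clipped to the
    # image, so the crop deliberately leaks in background pixels.
    jw, jh = int(max_jitter * iw), int(max_jitter * ih)
    x1 = max(0, x0 - random.randint(0, jw))
    y1 = max(0, y0 - random.randint(0, jh))
    x2 = min(bw, x0 + iw + random.randint(0, jw))
    y2 = min(bh, y0 + ih + random.randint(0, jh))
    return bg, (x1, y1, x2, y2)
```

Under this reading, training the contrastive objective on the same instance pasted over different backgrounds, with jittered boxes, is what encourages the background invariance the abstract highlights.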