Prior research on self-supervised learning has led to considerable progress on image classification, but often at the cost of degraded transfer performance on object detection. The objective of this paper is to advance self-supervised pretrained models specifically for object detection. Based on the inherent difference between classification and detection, we propose a new self-supervised pretext task, called instance localization. Image instances are pasted at various locations and scales onto background images. The pretext task is to predict the instance category given the composited images as well as the foreground bounding boxes. We show that integrating bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning. In addition, we propose an augmentation method on the bounding boxes to further enhance feature alignment. As a result, our model becomes weaker at ImageNet semantic classification but stronger at image patch localization, yielding an overall stronger pretrained model for object detection. Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection on PASCAL VOC and MSCOCO.
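The data side of the pretext task (paste a rescaled instance at a random location on a background, record its bounding box, then perturb that box) can be illustrated with a small sketch. This is not the authors' implementation. The helper names `resize_nearest`, `paste_instance`, and `jitter_box` are hypothetical. Images are assumed to be grayscale 2D lists, resizing is assumed to be nearest-neighbour, and the box augmentation is assumed to be a random corner jitter clipped to the image. The paper's exact augmentation may differ.

```python
import random
from typing import List, Tuple

# Illustrative types (assumptions): grayscale image as rows of pixels,
# box as (x0, y0, x1, y1) with x1/y1 exclusive.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def resize_nearest(img: Image, new_h: int, new_w: int) -> Image:
    """Nearest-neighbour resize; a stand-in for a real image library."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def paste_instance(background: Image, instance: Image, scale: float,
                   rng: random.Random) -> Tuple[Image, Box]:
    """Paste a rescaled foreground crop at a random location.

    Returns the composite image and the foreground bounding box.
    """
    bh, bw = len(background), len(background[0])
    # Clamp the rescaled instance so it always fits inside the background.
    ih = max(1, min(bh, round(len(instance) * scale)))
    iw = max(1, min(bw, round(len(instance[0]) * scale)))
    fg = resize_nearest(instance, ih, iw)
    y0 = rng.randint(0, bh - ih)
    x0 = rng.randint(0, bw - iw)
    out = [row[:] for row in background]  # copy so the background is untouched
    for r in range(ih):
        out[y0 + r][x0:x0 + iw] = fg[r]
    return out, (x0, y0, x0 + iw, y0 + ih)


def jitter_box(box: Box, max_shift: float, img_w: int, img_h: int,
               rng: random.Random) -> Box:
    """Shift each box corner by up to max_shift * box size, clipped to the image."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0

    def offset(size: int) -> int:
        return round(rng.uniform(-max_shift, max_shift) * size)

    nx0 = min(max(0, x0 + offset(w)), img_w - 1)
    ny0 = min(max(0, y0 + offset(h)), img_h - 1)
    # Keep the box non-empty and inside the image.
    nx1 = max(nx0 + 1, min(img_w, x1 + offset(w)))
    ny1 = max(ny0 + 1, min(img_h, y1 + offset(h)))
    return (nx0, ny0, nx1, ny1)


if __name__ == "__main__":
    rng = random.Random(0)
    background = [[0] * 8 for _ in range(8)]
    instance = [[5, 5], [5, 5]]
    composite, box = paste_instance(background, instance, 2.0, rng)
    print("box:", box, "jittered:", jitter_box(box, 0.25, 8, 8, rng))
```

In this sketch, the (possibly jittered) box is what a detection-style head would use to pool features from the composite, for example with RoIAlign. The jitter shows one way to vary the box so that features are not tied to a pixel-exact crop.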