Self-supervised pretraining has been shown to yield powerful representations for transfer learning. However, these performance gains come at a large computational cost, with state-of-the-art methods requiring an order of magnitude more computation than supervised pretraining. We tackle this computational bottleneck by introducing a new self-supervised objective, contrastive detection, which tasks representations with identifying object-level features across augmentations. This objective extracts a rich learning signal per image, leading to state-of-the-art transfer accuracy on a variety of downstream tasks, while requiring up to 10x less pretraining. In particular, our strongest ImageNet-pretrained model performs on par with SEER, one of the largest self-supervised systems to date, which uses 1000x more pretraining data. Finally, our objective seamlessly handles pretraining on more complex images such as those in COCO, closing the gap with supervised transfer learning from COCO to PASCAL.
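The abstract does not spell out the loss, so the following is only a minimal sketch of one way an object-level contrastive objective could look, not the authors' implementation. It assumes that object regions are available as binary masks aligned across two augmented views, that features are average-pooled within each mask, and that an InfoNCE-style loss treats the same object in the other view as the positive and all other objects as negatives. The function names (`mask_pool`, `contrastive_object_loss`) and the temperature value are hypothetical.

```python
import numpy as np


def mask_pool(features, masks):
    """Average the feature vectors that fall inside each mask.

    features: (P, D) array, one D-dim feature per spatial position.
    masks:    (M, P) binary array, one row per (assumed) object region.
    returns:  (M, D) array of pooled object-level features.
    """
    # Normalise each mask so its weights sum to 1 (guarding against empty masks).
    weights = masks / np.maximum(masks.sum(axis=1, keepdims=True), 1.0)
    return weights @ features


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def contrastive_object_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss over object features from two views.

    z1[i] and z2[i] are assumed to describe the same object under two
    augmentations; every other row of z2 acts as a negative for z1[i].
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    logits = z1 @ z2.T / temperature                 # (M, M) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for row i sits on the diagonal.
    return -np.mean(np.diag(log_probs))


# Toy example: 4 positions, 2 feature channels, 2 object masks.
toy_features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
toy_masks = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
pooled = mask_pool(toy_features, toy_masks)

# Random object features: matched pairs should score better than shuffled ones.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
loss_matched = contrastive_object_loss(z, z)
loss_shuffled = contrastive_object_loss(z, z[[1, 2, 3, 0]])
```

In this sketch each image contributes one positive pair per mask rather than one per image, which is one plausible reading of how the objective "extracts a rich learning signal per image". How the masks are obtained and how negatives are drawn across images are left open here.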