We present a self-supervised learning (SSL) method suited to semi-global tasks such as object detection and semantic segmentation. We enforce local consistency between self-learned features representing corresponding image locations of transformed versions of the same image by minimizing a pixel-level local contrastive (LC) loss during training. The LC loss can be added to existing self-supervised learning methods with minimal overhead. We evaluate our SSL approach on two downstream tasks, object detection and semantic segmentation, using the COCO, PASCAL VOC, and CityScapes datasets. Our method outperforms existing state-of-the-art SSL approaches by 1.9% on COCO object detection, 1.4% on PASCAL VOC detection, and 0.6% on CityScapes segmentation.
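As a rough illustration of the idea rather than the paper's exact formulation, the sketch below shows one way a pixel-level contrastive term between two spatially aligned feature maps could be written in PyTorch. The function name, temperature value, and the assumption that the two views have already been aligned so that location (h, w) corresponds in both maps are ours, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def local_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """Hypothetical pixel-level contrastive loss between two feature maps.

    feat_a, feat_b: (B, C, H, W) dense features from two augmented views of
    the same images, assumed spatially aligned so that location (h, w) in
    feat_a corresponds to location (h, w) in feat_b.
    """
    B, C, H, W = feat_a.shape

    # Flatten spatial dimensions and L2-normalize each location's feature:
    # (B, C, H, W) -> (B, H*W, C)
    a = F.normalize(feat_a.flatten(2).transpose(1, 2), dim=-1)
    b = F.normalize(feat_b.flatten(2).transpose(1, 2), dim=-1)

    # Cosine similarity between every location in view A and every location
    # in view B: (B, H*W, H*W)
    logits = torch.bmm(a, b.transpose(1, 2)) / temperature

    # Positives are the matching locations (the diagonal); all other
    # locations in the other view serve as negatives.
    targets = torch.arange(H * W, device=feat_a.device).expand(B, -1)
    return F.cross_entropy(logits.reshape(B * H * W, H * W),
                           targets.reshape(-1))
```

In this sketch the loss would be computed on the dense feature maps of the two augmented views and added to an existing SSL objective as an auxiliary term, which is consistent with the abstract's claim that the LC loss attaches to existing methods with minimal overhead.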