Contrastive learning methods have significantly narrowed the gap between supervised and unsupervised learning on computer vision tasks. In this paper, we explore their application to geo-located datasets, such as remote sensing imagery, where unlabeled data is often abundant but labeled data is scarce. We first show that, because these datasets differ in character from standard vision datasets, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks. To close this gap, we propose novel training methods that exploit the spatio-temporal structure of remote sensing data. We use spatially aligned images taken at different times to construct temporal positive pairs for contrastive learning, and we use geo-location to design pretext tasks. Our experiments show that the proposed method closes the gap between contrastive and supervised learning on image classification, object detection, and semantic segmentation for remote sensing. Moreover, we demonstrate that the proposed method can also be applied to geo-tagged ImageNet images, improving downstream performance on various tasks. The project webpage is available at geography-aware-ssl.github.io.
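To make the idea of temporal positive pairs concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes each location already has one embedding vector per acquisition time, and it applies a standard InfoNCE loss in which two embeddings of the same location at different times form the positive pair and embeddings of other locations serve as negatives. The names `info_nce` and `temporal_pairs`, the dictionary layout, and the temperature of 0.2 are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.2):
    """InfoNCE loss for a single query.

    The loss is the negative log-softmax of the positive logit over the
    positive and all negative logits. The temperature is an assumed value.
    """
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


def temporal_pairs(embeddings_by_location):
    """Yield (query, positive, negatives) triples.

    `embeddings_by_location` maps a location id to a list of embeddings,
    one per acquisition time (a hypothetical layout). The first two
    timestamps of a location form the positive pair; every embedding from
    any other location is used as a negative.
    """
    for loc, series in embeddings_by_location.items():
        if len(series) < 2:
            continue  # a temporal pair needs at least two timestamps
        negatives = [
            e
            for other, other_series in embeddings_by_location.items()
            if other != loc
            for e in other_series
        ]
        yield series[0], series[1], negatives


if __name__ == "__main__":
    # Toy 2-D embeddings: two locations, two acquisition times each.
    data = {
        "loc_a": [[1.0, 0.0], [0.9, 0.1]],
        "loc_b": [[0.0, 1.0], [0.1, 0.9]],
    }
    for query, positive, negatives in temporal_pairs(data):
        print(round(info_nce(query, positive, negatives), 4))
```

In this sketch the loss is small when the two views of a location are closer to each other than to views of other locations, which is the behavior a temporal positive pair is meant to encourage. In practice the embeddings would come from an encoder applied to augmented image crops rather than from fixed vectors.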