Self-supervised methods have shown tremendous success in computer vision, including applications in remote sensing and medical imaging. The most popular contrastive-loss-based methods, such as SimCLR, MoCo, and MoCo-v2, create positive pairs by applying contrived augmentations to an image to produce multiple views of it, and contrast these pairs with negative examples. Although these techniques work well, most of them have been tuned on ImageNet (and similar computer vision datasets). While there have been some attempts to capture a richer set of deformations in the positive samples, in this work we explore a promising alternative for generating positive examples for remote sensing data within the contrastive learning framework. Images captured by different sensors at the same location and at nearby timestamps can be thought of as strongly augmented instances of the same scene, which removes the need to explore and tune a set of hand-crafted strong augmentations. We propose a simple dual-encoder framework that is pre-trained on a large unlabeled dataset (~1M) of Sentinel-1 and Sentinel-2 image pairs. We test the embeddings on two remote sensing downstream tasks, flood segmentation and land cover mapping, and empirically show that embeddings learnt with this technique outperform those obtained with the conventional technique of collecting positive examples via aggressive data augmentations.
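To make the cross-sensor positive-pair idea concrete, the following is a minimal sketch of a symmetric InfoNCE-style contrastive loss computed on the outputs of two encoders. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the exact loss, so the symmetric form, the function name `cross_sensor_info_nce`, and the `temperature` value are all hypothetical, and random vectors stand in for the Sentinel-1 and Sentinel-2 encoder outputs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def cross_sensor_info_nce(s1_emb, s2_emb, temperature=0.1):
    """Symmetric InfoNCE loss (assumed form, not confirmed by the source).

    Row i of s1_emb and row i of s2_emb are embeddings of the same scene seen
    by two sensors (the positive pair); every other row in the batch serves as
    a negative. The temperature value is a hypothetical default.
    """
    z1 = l2_normalize(s1_emb)
    z2 = l2_normalize(s2_emb)
    logits = z1 @ z2.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(z1))          # positives sit on the diagonal
    loss_s1_to_s2 = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_s2_to_s1 = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_s1_to_s2 + loss_s2_to_s1)

# Stand-in embeddings: in practice these would come from the two encoders.
rng = np.random.default_rng(0)
s2 = rng.normal(size=(8, 16))
aligned_s1 = s2 + 0.01 * rng.normal(size=(8, 16))  # co-located pairs agree
shuffled_s1 = aligned_s1[::-1].copy()              # pairs deliberately mismatched

loss_aligned = cross_sensor_info_nce(aligned_s1, s2)
loss_shuffled = cross_sensor_info_nce(shuffled_s1, s2)
```

In this toy setup the loss is lower when each row is paired with its co-located counterpart than when the pairing is shuffled, which is the signal that would drive the two encoders toward a shared embedding space without any hand-tuned augmentations.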