This work proposes a hybrid unsupervised/supervised learning method to pretrain models for earth observation downstream tasks where only a handful of labels denoting very general semantic concepts are available. We combine a contrastive pretraining approach with a pretext task that predicts spatially coarse elevation maps, which are commonly available worldwide. The intuition is that elevation generally correlates with the targets of many remote sensing tasks, allowing the model to pre-learn useful representations. We assess the performance of our approach on a segmentation downstream task whose labels aggregate many possible subclasses (pixel-level classification of farmland vs. other) and on an image-level binary classification task derived from it, using a dataset from the north-east of Colombia. In both cases we pretrain our models with 39K unlabeled images, fine-tune the downstream task with only 80 labeled images, and test it with 2,944 labeled images. Our experiments show that our methods, GLCNet+Elevation for segmentation and SimCLR+Elevation for classification, outperform their counterparts without the elevation pretext task in terms of accuracy and macro-average F1, supporting the notion that including additional information correlated with downstream targets can lead to improved performance.
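To make the joint objective concrete, below is a minimal sketch (not the authors' code) of a pretraining step that combines a SimCLR-style NT-Xent contrastive loss with an auxiliary elevation-regression term. The names (ElevationPretrainModel, elev_head, lambda_elev) and the choice of an MSE loss on an 8x8 coarse elevation map are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch, assuming: a shared encoder, a SimCLR-style NT-Xent
# contrastive loss, and an MSE loss on a spatially coarse elevation map.
# All names and sizes here are hypothetical, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ElevationPretrainModel(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone                      # shared encoder, outputs (B, feat_dim)
        self.projector = nn.Sequential(               # contrastive projection head
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, proj_dim),
        )
        self.elev_head = nn.Linear(feat_dim, 8 * 8)   # predicts a coarse 8x8 elevation map

    def forward(self, x):
        h = self.backbone(x)
        z = F.normalize(self.projector(h), dim=1)     # embedding for the contrastive loss
        elev = self.elev_head(h).view(-1, 1, 8, 8)    # coarse elevation prediction
        return z, elev


def nt_xent(z1, z2, tau: float = 0.5):
    """SimCLR NT-Xent loss over two batches of L2-normalized embeddings."""
    z = torch.cat([z1, z2], dim=0)                    # (2B, D)
    sim = z @ z.t() / tau                             # cosine similarities / temperature
    sim.fill_diagonal_(float("-inf"))                 # mask self-similarity
    n = z.size(0)
    targets = torch.arange(n, device=z.device).roll(n // 2)  # index of each positive pair
    return F.cross_entropy(sim, targets)


def pretrain_step(model, opt, view1, view2, elev_target, lambda_elev=1.0):
    """One joint step: contrastive loss + weighted elevation MSE (weight assumed)."""
    z1, e1 = model(view1)
    z2, e2 = model(view2)
    loss = nt_xent(z1, z2) + lambda_elev * (
        F.mse_loss(e1, elev_target) + F.mse_loss(e2, elev_target)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

After pretraining, both heads would be discarded and only the backbone reused for the downstream segmentation or classification task, as is standard for contrastive pretraining pipelines.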