This work proposes a hybrid unsupervised/supervised learning method to pretrain models for earth observation downstream tasks where only a handful of labels denoting very general semantic concepts are available. We combine a contrastive pretraining approach with a pretext task that predicts spatially coarse elevation maps, which are commonly available worldwide. The underlying intuition is that elevation generally correlates with the targets of many remote sensing tasks, allowing the model to pre-learn useful representations. We assess the performance of our approach on a segmentation downstream task whose labels aggregate many possible subclasses (pixel-level classification of farmland vs. other) and on an image binary classification task derived from it, using a dataset from the north-east of Colombia. In both cases we pretrain our models with 39K unlabeled images, fine-tune the downstream task with only 80 labeled images, and test it with 2944 labeled images. Our experiments show that our methods, GLCNet+Elevation for segmentation and SimCLR+Elevation for classification, outperform their counterparts without the elevation pretext task in terms of accuracy and macro-average F1, supporting the notion that including additional information correlated with the downstream targets can lead to improved performance.
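To make the hybrid objective concrete, the sketch below shows one way a contrastive loss could be combined with a coarse-elevation prediction pretext task, assuming a PyTorch setup and a backbone that returns pooled features. The names (`HybridPretrainer`, `nt_xent_loss`, `lambda_elev`) and the 8x8 elevation grid are illustrative assumptions, not the authors' implementation, whose details follow SimCLR and GLCNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Standard NT-Xent contrastive loss over two augmented views of a batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                         # (2N, d)
    sim = torch.mm(z, z.t()) / temperature                 # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float('-inf'))                  # ignore self-similarity
    # Positive for row i is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

class HybridPretrainer(nn.Module):
    """Shared encoder with a contrastive projection head and an elevation head (hypothetical)."""
    def __init__(self, encoder, feat_dim=512, proj_dim=128, elev_grid=8):
        super().__init__()
        self.encoder = encoder                              # e.g. a ResNet backbone returning pooled features
        self.proj = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                  nn.Linear(feat_dim, proj_dim))
        # Predict a spatially coarse elevation map (elev_grid x elev_grid) from the features.
        self.elev_head = nn.Linear(feat_dim, elev_grid * elev_grid)

    def forward(self, view1, view2):
        h1, h2 = self.encoder(view1), self.encoder(view2)
        return self.proj(h1), self.proj(h2), self.elev_head(h1)

def hybrid_loss(model, view1, view2, coarse_elev, lambda_elev=1.0):
    """Contrastive loss plus an MSE pretext loss on the coarse elevation map."""
    z1, z2, elev_pred = model(view1, view2)
    loss_contrastive = nt_xent_loss(z1, z2)
    loss_elev = F.mse_loss(elev_pred, coarse_elev.flatten(1))
    return loss_contrastive + lambda_elev * loss_elev
```

After pretraining with such an objective on the unlabeled images, the projection and elevation heads would be discarded and the encoder fine-tuned on the small labeled set for the downstream segmentation or classification task.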