Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding and independently mask image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning performance on benchmark datasets (up to $\uparrow$ 7\%), and transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14\%) and semantic segmentation.
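The temporal masking idea described above can be sketched in a few lines: each frame of a satellite image sequence is split into patches, a temporal embedding is added to that frame's patch tokens, and patches are masked independently at every timestep. This is a minimal NumPy sketch, not the authors' implementation; the patch size, masking ratio, and the sinusoidal form of the temporal embedding are illustrative assumptions.

```python
import numpy as np

def patchify(img, p):
    """Split one frame (C, H, W) into flattened patches (N, p*p*C)."""
    C, H, W = img.shape
    patches = img.reshape(C, H // p, p, W // p, p)
    return patches.transpose(1, 3, 2, 4, 0).reshape(-1, p * p * C)

def temporal_embedding(t, dim):
    """Sinusoidal embedding of timestep index t (hypothetical choice)."""
    half = dim // 2
    freqs = 1.0 / (10000 ** (np.arange(half) / half))
    ang = t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])

def mask_independently(tokens_per_t, mask_ratio, rng):
    """Keep a random subset of patch tokens at each timestep,
    drawing the mask independently per timestep (SatMAE-style)."""
    kept = []
    for toks in tokens_per_t:
        n = toks.shape[0]
        n_keep = int(n * (1 - mask_ratio))
        idx = rng.permutation(n)[:n_keep]
        kept.append((idx, toks[idx]))
    return kept

# Toy usage: 3 timesteps of 3-channel 32x32 imagery, 8x8 patches.
rng = np.random.default_rng(0)
frames = rng.standard_normal((3, 3, 32, 32))
tokens_per_t = [
    patchify(frame, 8) + temporal_embedding(t, 8 * 8 * 3)
    for t, frame in enumerate(frames)
]
kept = mask_independently(tokens_per_t, mask_ratio=0.75, rng=rng)
```

With 16 patches per frame and a 0.75 masking ratio, 4 visible tokens per timestep remain for the encoder; because the mask differs across timesteps, the decoder must use cross-time context to reconstruct the rest.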