Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities, as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding and independently mask image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning performance on benchmark datasets (up to $\uparrow$ 7%), and transfer learning performance on downstream remote sensing tasks, including land cover classification (up to $\uparrow$ 14%) and semantic segmentation. Code and data are available on the project website: https://sustainlab-group.github.io/SatMAE/
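To make the two key ideas concrete, here is a minimal NumPy sketch of (a) independent patch masking per timestep and (b) a standard sinusoidal embedding that could serve as the temporal (or spectral-group) positional encoding. This is an illustrative sketch under our own assumptions, not the authors' implementation; the function names, shapes, and the 75% mask ratio default are hypothetical (the mask ratio follows the common MAE convention).

```python
import numpy as np

def sincos_embedding(positions, dim):
    """Standard fixed sinusoidal positional embedding.

    positions: 1-D array of positions (e.g. timestep indices).
    dim: embedding dimension (must be even).
    Returns an array of shape (len(positions), dim).
    """
    assert dim % 2 == 0
    omega = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    angles = np.outer(positions, omega)  # (len(positions), dim // 2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def mask_patches_independently(patches, mask_ratio=0.75, seed=None):
    """Sample a fresh random patch mask at every timestep.

    patches: array of shape (T, N, D) -- T timesteps, N patches per
        image, D dims per patch. Returns the kept (visible) patches of
        shape (T, N_keep, D) and the kept indices of shape (T, N_keep).
    """
    rng = np.random.default_rng(seed)
    T, N, D = patches.shape
    n_keep = int(N * (1 - mask_ratio))
    kept = np.empty((T, n_keep, D))
    idx = np.empty((T, n_keep), dtype=int)
    for t in range(T):
        # Independent permutation per timestep, so the visible patches
        # differ across time (unlike masking the same spatial locations).
        perm = rng.permutation(N)[:n_keep]
        idx[t] = np.sort(perm)
        kept[t] = patches[t, idx[t]]
    return kept, idx

# Usage: mask a toy 3-timestep sequence, then add a temporal embedding
# (broadcast over the patch axis) to the visible tokens.
patches = np.random.default_rng(0).normal(size=(3, 16, 8))
visible, idx = mask_patches_independently(patches, mask_ratio=0.75, seed=0)
t_emb = sincos_embedding(np.arange(3), 8)          # (3, 8), one row per timestep
tokens = visible + t_emb[:, None, :]               # (3, 4, 8)
```

In the actual framework such tokens would then be fed to the MAE encoder; the same sinusoidal-embedding helper could also index spectral band groups instead of timesteps.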