In recent years, convolutional neural networks (CNNs) have made significant progress in computer vision. These advances have been applied to other areas, such as remote sensing, with satisfactory results. However, the lack of large labeled datasets and the inherent complexity of remote sensing problems have made it difficult to train deep CNNs for dense prediction problems. To address this issue, ImageNet-pretrained weights have been used as a starting point in various dense prediction tasks. Although this type of transfer learning has led to improvements, the domain gap between natural and remote sensing images has also limited the performance of deep CNNs. Meanwhile, self-supervised methods for learning visual representations from large collections of unlabeled images have grown substantially over the past two years. Accordingly, in this paper we explore the effectiveness of in-domain representations, learned in both supervised and self-supervised ways, for bridging the domain gap between remote sensing images and the ImageNet dataset. The weights obtained by pre-training on remote sensing images are used as initial weights for semantic segmentation and object detection, and they yield state-of-the-art results. For self-supervised pre-training, we use the SimSiam algorithm because it is simple and does not require large computational resources. One of the most influential factors in acquiring general visual representations from remote sensing images is the pre-training dataset. To examine its effect, we pre-train on remote sensing datasets of equal size. Our results demonstrate that using datasets with high spatial resolution for self-supervised representation learning leads to high performance in downstream tasks.
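For readers unfamiliar with SimSiam, the sketch below shows the symmetrized objective it optimizes: two augmented views of the same image are encoded, a predictor output from one view is compared to the projector output of the other by negative cosine similarity, and the two directions are averaged. This is a minimal, forward-only illustration on plain Python lists, not the authors' training code. The function names `negative_cosine` and `simsiam_loss` are hypothetical, and the stop-gradient on the target branch, which SimSiam requires to avoid collapse, can only be indicated in a comment here because no autograd framework is used.

```python
import math
from typing import Sequence


def _cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity between two equal-length vectors (eps guards zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)


def negative_cosine(p: Sequence[float], z: Sequence[float]) -> float:
    # D(p, z) = -cos(p, z). In an autograd framework, z would be wrapped in a
    # stop-gradient (e.g. z.detach() in PyTorch); here only the value is computed.
    return -_cosine(p, z)


def simsiam_loss(p1: Sequence[float], z1: Sequence[float],
                 p2: Sequence[float], z2: Sequence[float]) -> float:
    """Symmetrized SimSiam loss: 0.5 * D(p1, z2) + 0.5 * D(p2, z1).

    p1, p2: predictor outputs for the two augmented views.
    z1, z2: projector (encoder) outputs for the two views.
    """
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


# Perfectly aligned predictions reach the minimum value of -1.
print(simsiam_loss([1.0, 0.0], [0.0, 1.0], [0.0, 3.0], [2.0, 0.0]))
```

The loss ranges from -1 (predictions aligned with the other view's projections) to +1 (opposite directions); orthogonal pairs give 0. After pre-training with this objective, the encoder weights are what would be reused to initialize the segmentation and detection backbones.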