Change detection in remote sensing imagery is essential for a variety of applications such as urban planning, disaster management, and climate research. However, existing methods for identifying semantically changed areas overlook the semantic information that is already available in the form of maps describing features of the Earth's surface. In this paper, we leverage this information for change detection in bi-temporal images. We show that simply integrating the additional information by concatenating latent representations suffices to significantly outperform state-of-the-art change detection methods. Motivated by this observation, we propose the new task of Conditional Change Detection, where pre-change semantic information is used as input alongside the bi-temporal images. To fully exploit the extra information, we propose MapFormer, a novel architecture based on a multi-modal feature fusion module that allows for feature processing conditioned on the available semantic information. We further employ a supervised, cross-modal contrastive loss to guide the learning of visual representations. Our approach outperforms existing change detection methods by an absolute 11.7 and 18.4 percentage points in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively. Furthermore, we demonstrate the robustness of our approach to the quality of the pre-change semantic information and to the absence of pre-change imagery. The code will be made publicly available.
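For readers unfamiliar with the reported metric, the following is a minimal sketch of binary change IoU: the intersection over union of the "changed" class between a predicted mask and a reference mask. The function name `binary_change_iou`, the nested-list mask format, and the choice to return 1.0 when both masks are empty are assumptions made for illustration; this is not the evaluation code of the paper.

```python
from typing import Sequence


def binary_change_iou(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
) -> float:
    """IoU of the 'changed' class (non-zero pixels) between two binary masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_changed, t_changed = bool(p), bool(t)
            # Pixel marked as changed in both masks
            intersection += p_changed and t_changed
            # Pixel marked as changed in at least one mask
            union += p_changed or t_changed
    # Assumed convention: two empty masks count as perfect agreement
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels changed in both, 4 pixels changed in either
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(binary_change_iou(pred, target))  # 0.5
```

Under this definition, an absolute gain of 11.7 percentage points would correspond to, for example, an IoU rising from 0.300 to 0.417; these two values are illustrative only and are not results from the paper.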