Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search for semantically similar images across different modalities. Existing CM-RSIR methods require a high quantity of high-quality annotated training images. The collection of a sufficient number of reliably labeled images is time-consuming, complex and costly in operational scenarios, and can significantly affect the final accuracy of CM-RSIR. In this paper, we introduce a novel self-supervised CM-RSIR method that aims to: i) model mutual information between different modalities in a self-supervised manner; ii) keep the distributions of modality-specific feature spaces similar to each other; and iii) identify the most similar images within each modality without requiring any annotated training images. To this end, we propose a novel objective comprising three loss functions that simultaneously: i) maximize the mutual information between different modalities for inter-modal similarity preservation; ii) minimize the angular distance between multi-modal image tuples to eliminate inter-modal discrepancies; and iii) increase the cosine similarity of the most similar images within each modality to characterize intra-modal similarities. Experimental results show the effectiveness of the proposed method compared to state-of-the-art methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/SS-CM-RSIR.
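A minimal sketch of how such a three-term objective could be realized in PyTorch is given below, assuming paired, L2-normalized embeddings z_a and z_b produced by two modality-specific encoders for the same scenes. The NT-Xent-style contrastive term as a mutual-information proxy, the loss weights w, and the in-batch nearest-neighbor heuristic for the intra-modal term are illustrative assumptions, not the authors' exact implementation (which is available at the repository linked above).

```python
import math

import torch
import torch.nn.functional as F


def inter_modal_nt_xent(z_a, z_b, temperature=0.1):
    """Contrastive (NT-Xent-style) proxy for maximizing inter-modal mutual
    information: matching cross-modal pairs (i, i) are pulled together,
    all mismatched pairs in the batch are pushed apart."""
    logits = z_a @ z_b.t() / temperature  # (B, B) cross-modal similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def angular_alignment(z_a, z_b):
    """Minimize the angular distance between embeddings of the same
    multi-modal image tuple (inputs assumed unit-normalized)."""
    cos = (z_a * z_b).sum(dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean() / math.pi  # normalized to [0, 1]


def intra_modal_similarity(z):
    """Increase the cosine similarity of each image to its most similar
    in-batch neighbor within the same modality."""
    sim = z @ z.t()
    sim.fill_diagonal_(-float('inf'))  # exclude self-similarity
    nn_sim = sim.max(dim=1).values     # most similar other image
    return (1.0 - nn_sim).mean()


def total_loss(z_a, z_b, w=(1.0, 1.0, 1.0)):
    """Combined objective over a batch of paired embeddings; the weights
    w are placeholders, not values reported in the paper."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    return (w[0] * inter_modal_nt_xent(z_a, z_b)
            + w[1] * angular_alignment(z_a, z_b)
            + w[2] * 0.5 * (intra_modal_similarity(z_a)
                            + intra_modal_similarity(z_b)))
```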