Cross-modal remote sensing text-image retrieval (RSCTIR) has recently become a research hotspot due to its ability to enable fast and flexible information extraction from remote sensing (RS) images. However, current RSCTIR methods mainly focus on the global features of RS images, neglecting the local features that reflect target relationships and saliency. In this article, we first propose a novel RSCTIR framework based on global and local information (GaLR), and design a multi-level information dynamic fusion (MIDF) module to efficaciously integrate features of different levels. MIDF leverages local information to correct global information, utilizes global information to supplement local information, and uses a dynamic addition of the two to generate a prominent visual representation. To alleviate the pressure of redundant targets on the graph convolution network (GCN) and to improve the model's attention on salient instances when modeling local features, a de-noised representation matrix and an enhanced adjacency matrix (DREA) are devised to assist the GCN in producing superior local representations. DREA not only filters out redundant features with high similarity but also obtains more powerful local features by enhancing the features of prominent objects. Finally, to make full use of the information in the similarity matrix during inference, we propose a plug-and-play multivariate rerank (MR) algorithm. The algorithm uses the k nearest neighbors of the retrieval results to perform a reverse search, and improves performance by combining multiple components of bidirectional retrieval. Extensive experiments on public datasets strongly demonstrate the state-of-the-art performance of GaLR on the RSCTIR task. The code of the GaLR method, the MR algorithm, and corresponding files is available at https://github.com/xiaoyuan1996/GaLR.
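The k-nearest-neighbor reverse-search idea behind the MR algorithm can be sketched as follows. This is a minimal illustrative implementation, not the paper's exact formulation: the function name `multivariate_rerank`, the score-combination rule, and the weight `alpha` are assumptions made for the sketch; only the overall scheme (rerank the top-k forward retrievals using evidence from the reverse retrieval direction) comes from the abstract.

```python
import numpy as np

def multivariate_rerank(sim, k=5, alpha=0.5):
    """Illustrative k-NN reverse-search rerank (not the paper's exact method).

    sim: (n_text, n_image) similarity matrix from bidirectional retrieval.
    For each text query, the top-k retrieved images are re-scored by
    combining the forward similarity with the reverse (image-to-text)
    rank of the query, then the candidate list is re-sorted.
    alpha balances the two evidence sources (assumed weighting).
    """
    n_text, n_image = sim.shape
    reranked = np.argsort(-sim, axis=1)  # initial image ranking per query
    for q in range(n_text):
        topk = reranked[q, :k].copy()
        scores = []
        for img in topk:
            # reverse search: where does query q rank when retrieving
            # texts for candidate image `img`?
            rev_order = np.argsort(-sim[:, img])
            rev_rank = int(np.where(rev_order == q)[0][0])
            # combine forward similarity with a reverse-rank penalty
            scores.append(alpha * sim[q, img]
                          - (1 - alpha) * rev_rank / n_text)
        # re-sort the top-k candidates by the combined score
        reranked[q, :k] = topk[np.argsort(scores)[::-1]]
    return reranked
```

Because the procedure only consumes the similarity matrix, it is plug-and-play in the sense the abstract describes: it can be applied after any retrieval model at inference time, without retraining.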