Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization

In this work, we address an important but under-explored problem: designing a simple yet effective backbone specifically for the cross-view geo-localization task. Existing methods for cross-view geo-localization are frequently characterized by 1) complicated methodologies, 2) GPU-intensive computation, and 3) the stringent assumption that aerial and ground images are center-aligned or orientation-aligned. To address these three challenges in cross-view image matching, we propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG). The proposed SAIG effectively represents long-range interactions among patches, as well as cross-view correspondence, with multi-head self-attention layers. The "narrow-deep" architecture of SAIG improves feature richness without degrading performance, while its shallow and effective convolutional stem preserves locality, eliminating the loss of boundary information caused by patchifying the image. SAIG achieves state-of-the-art results on cross-view geo-localization while being far simpler than previous works. Furthermore, with only 15.9% of the model parameters and half of the output dimension compared to the state-of-the-art, SAIG adapts well across multiple cross-view datasets without employing any well-designed feature aggregation modules or feature alignment algorithms. In addition, SAIG attains competitive scores on image retrieval benchmarks, further demonstrating its generalizability. As a backbone network, SAIG is both easy to follow and computationally lightweight, which matters in practical scenarios. Moreover, we propose a simple Spatial-Mixed feature aggregation moDule (SMD) that can mix and project spatial information into a low-dimensional space to generate feature descriptors... (The code is available at https://github.com/yanghongji2007/SAIG)
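To make the idea of mixing spatial information into a compact descriptor more concrete, here is a minimal pure-Python sketch. It is not the authors' SMD implementation, whose details the abstract does not give. It assumes that a backbone has already produced N patch tokens of C channels each, that spatial mixing is a single linear map from N positions to K mixed positions, and that matching is done by cosine similarity. The names `spatial_mix_descriptor`, `retrieve`, and `random_matrix` are hypothetical, and the weights are random rather than learned.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Return a rows x cols matrix of standard-normal samples (stand-in for real features)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def spatial_mix_descriptor(tokens, mix_weights):
    """Mix N patch tokens (N x C) along the spatial axis with an N x K matrix.

    Assumption: one linear map over the spatial dimension, then flattening the
    resulting C x K matrix and L2-normalising it. Returns a list of length C * K.
    """
    n, c = len(tokens), len(tokens[0])
    k = len(mix_weights[0])
    if len(mix_weights) != n:
        raise ValueError("mix_weights must have one row per patch token")
    desc = []
    for ch in range(c):
        for j in range(k):
            desc.append(sum(tokens[p][ch] * mix_weights[p][j] for p in range(n)))
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]


def retrieve(query_desc, reference_descs):
    """Rank reference indices by descending cosine similarity to the query.

    Descriptors are assumed to be L2-normalised, so the dot product is the cosine.
    """
    sims = [sum(q * r for q, r in zip(query_desc, ref)) for ref in reference_descs]
    return sorted(range(len(reference_descs)), key=lambda i: sims[i], reverse=True)


# Toy data: 3 "aerial" images with 16 patch tokens of 8 channels each.
rng = random.Random(0)
aerial_tokens = [random_matrix(16, 8, rng) for _ in range(3)]
# The "ground" query is a slightly perturbed copy of aerial image 1 (toy assumption).
ground_tokens = [[v + 0.01 * rng.gauss(0.0, 1.0) for v in row] for row in aerial_tokens[1]]

mix = random_matrix(16, 4, rng)  # hypothetical mixing weights, random instead of learned
aerial_descs = [spatial_mix_descriptor(t, mix) for t in aerial_tokens]
ground_desc = spatial_mix_descriptor(ground_tokens, mix)

ranking = retrieve(ground_desc, aerial_descs)
print("best match for the ground query:", ranking[0])
```

In this toy setup the descriptor has C * K = 32 dimensions instead of N * C = 128, which illustrates why mixing along the spatial axis can shrink the output while still using every patch. A real system would learn the mixing weights and would obtain the tokens from the attention backbone for each view.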
