Cross-view geo-localization (CVGL) enables drone localization by matching aerial images to geo-tagged satellite databases, which is critical for autonomous navigation in GNSS-denied environments. However, existing methods rely on resource-intensive feature alignment and multi-branch architectures, incurring high inference costs that limit their deployment on mobile edge devices. We propose MobileGeo, a mobile-friendly framework designed for efficient on-device CVGL. MobileGeo achieves its efficiency through two key components: (1) during training, a Hierarchical Distillation (HD-CVGL) paradigm, coupled with Uncertainty-Aware Prediction Alignment (UAPA), distills essential information into a compact model without incurring inference overhead; (2) during inference, an efficient Multi-view Selection Refinement Module (MSRM) leverages mutual information to filter redundant views and reduce computational load. Extensive experiments demonstrate that MobileGeo outperforms previous state-of-the-art methods, achieving a 4.19\% improvement in average precision (AP) on the University-1652 dataset while requiring over 5$\times$ fewer FLOPs and running 3$\times$ faster. Crucially, MobileGeo runs at 251.5 FPS on an NVIDIA AGX Orin edge device, demonstrating its practical viability for real-time on-device drone geo-localization.
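The abstract states only that MSRM uses mutual information to filter redundant views; it does not give the procedure. As a rough illustration of that general idea, and not the authors' MSRM, the sketch below greedily keeps a view only if a histogram-based mutual information estimate against every already-kept view stays below a threshold. The function names (`quantize`, `mutual_information`, `select_views`), the bin count, and the threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def quantize(vec, bins=4):
    """Map each value to an integer bin index in [0, bins - 1]."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0] * len(vec)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in vec]


def mutual_information(a, b, bins=4):
    """Histogram estimate of I(A; B) in nats from paired samples a[i], b[i]."""
    qa, qb = quantize(a, bins), quantize(b, bins)
    n = len(qa)
    joint = Counter(zip(qa, qb))
    pa, pb = Counter(qa), Counter(qb)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (pa[x] * pb[y]))
    return mi


def select_views(features, threshold=0.5, bins=4):
    """Return indices of views kept by a greedy redundancy filter.

    Assumption: each view is summarized by a feature vector of equal length,
    and a view is 'redundant' if its estimated MI with any kept view reaches
    the threshold. Both choices are illustrative, not taken from the paper.
    """
    kept = []
    for i, feat in enumerate(features):
        if all(mutual_information(feat, features[j], bins) < threshold for j in kept):
            kept.append(i)
    return kept
```

With three toy views where the second duplicates the first and the third is statistically independent of it, `select_views` would drop the duplicate and keep the other two, so only two views would need to be passed on to the matching stage.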