We present an efficient end-to-end pipeline for large-scale landmark recognition and retrieval. We show how to combine and enhance concepts from recent research in image retrieval and introduce two architectures especially suited to large-scale landmark identification: a model with deep orthogonal fusion of local and global features (DOLG) using an EfficientNet backbone, and a novel Hybrid-Swin-Transformer. We detail how to train both architectures efficiently using a step-wise approach and a sub-center ArcFace loss with dynamic margins. Furthermore, we describe a novel discriminative re-ranking methodology for image retrieval. The effectiveness of our approach was demonstrated by winning the recognition and retrieval tracks of the Google Landmark Competition 2021.
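For illustration only, the idea behind a sub-center ArcFace loss with dynamic margins can be sketched in plain Python for a single sample. Each class holds several sub-centers, the class score is the largest cosine similarity over them, and an angular margin is added to the target class before a scaled softmax cross-entropy. The margin schedule shown here (a negative power of the class frequency, rescaled into a fixed range so that rarer classes receive larger margins) and every function name, parameter name, and default value are assumptions made for this sketch, not the settings used in this work.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dynamic_margins(class_counts, m_min=0.2, m_max=0.45, power=-0.25):
    """Hypothetical schedule: rarer classes get larger margins.

    Each count is raised to a negative power, then rescaled to [m_min, m_max].
    """
    raw = [n ** power for n in class_counts]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0  # all classes equally frequent -> every margin is m_min
    return [m_min + (r - lo) / span * (m_max - m_min) for r in raw]


def subcenter_cosines(embedding, subcenters):
    """Return one cosine per class: the maximum over that class's sub-centers.

    subcenters[c][k] is the k-th sub-center vector of class c.
    """
    e = l2_normalize(embedding)
    cosines = []
    for centers in subcenters:
        sims = [sum(a * b for a, b in zip(e, l2_normalize(w))) for w in centers]
        cosines.append(max(sims))
    return cosines


def subcenter_arcface_loss(embedding, label, subcenters, margins, scale=30.0):
    """Cross-entropy over scaled cosines, with an angular margin on the target class.

    The usual safeguard for theta + margin > pi is omitted to keep the sketch short.
    """
    logits = []
    for c, cos_t in enumerate(subcenter_cosines(embedding, subcenters)):
        if c == label:
            theta = math.acos(max(-1.0, min(1.0, cos_t)))
            cos_t = math.cos(theta + margins[c])
        logits.append(scale * cos_t)
    # Numerically stable log-sum-exp
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[label]
```

In a training setup, the sub-centers would be learnable weights and the loss would be averaged over a batch. The sketch only makes the per-sample computation explicit.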