Finding correspondences across images is an important task in many visual applications. Recent state-of-the-art methods focus on end-to-end learning-based architectures designed in a coarse-to-fine manner. They use a very deep CNN or a multi-block Transformer to learn robust representations, which requires substantial computational power. Moreover, these methods learn features without reasoning about the objects and shapes inside images, and thus lack interpretability. In this paper, we propose an architecture for image matching that is efficient, robust, and interpretable. More specifically, we introduce a novel feature matching module called TopicFM, which roughly organizes the same spatial structures across images into a topic and then augments the features inside each topic for accurate matching. To infer topics, we first learn global embeddings of the topics and then use a latent-variable model to detect-then-assign the image structures to topics. Our method performs matching only in co-visible regions to reduce computation. Extensive experiments on both outdoor and indoor datasets show that our method outperforms recent methods in terms of matching performance and computational efficiency. The code is available at https://github.com/TruongKhang/TopicFM.
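As a rough illustration of the "assign structures to topics, then match only within shared topics" idea, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the function names (`assign_topics`, `covisible_topics`, `match_within_topics`), the dot-product softmax assignment against fixed topic embeddings, the product-of-average-masses co-visibility score, and mutual nearest-neighbour matching are stand-ins, not TopicFM's latent-variable inference or the authors' code. The per-topic feature augmentation step is omitted entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


def assign_topics(features, topic_embeddings):
    """Hypothetical soft assignment: a distribution over topics for each feature."""
    return [softmax([dot(f, t) for t in topic_embeddings]) for f in features]


def covisible_topics(probs_a, probs_b, k):
    """Keep the k topics whose average mass is jointly highest in both images."""
    n_topics = len(probs_a[0])
    mass_a = [sum(p[j] for p in probs_a) / len(probs_a) for j in range(n_topics)]
    mass_b = [sum(p[j] for p in probs_b) / len(probs_b) for j in range(n_topics)]
    score = [mass_a[j] * mass_b[j] for j in range(n_topics)]
    return sorted(range(n_topics), key=lambda j: score[j], reverse=True)[:k]


def match_within_topics(feats_a, feats_b, topic_embeddings, k=1):
    """Mutual-nearest-neighbour matching restricted to features of the same kept topic."""
    probs_a = assign_topics(feats_a, topic_embeddings)
    probs_b = assign_topics(feats_b, topic_embeddings)
    keep = covisible_topics(probs_a, probs_b, k)
    label_a = [argmax(p) for p in probs_a]  # hard topic label per feature
    label_b = [argmax(p) for p in probs_b]
    matches = []
    for topic in keep:
        idx_a = [i for i, lab in enumerate(label_a) if lab == topic]
        idx_b = [j for j, lab in enumerate(label_b) if lab == topic]
        if not idx_a or not idx_b:
            continue  # topic not present in both images: nothing to match
        # Nearest neighbours by dot-product similarity, in both directions.
        best_b = {i: max(idx_b, key=lambda j: dot(feats_a[i], feats_b[j])) for i in idx_a}
        best_a = {j: max(idx_a, key=lambda i: dot(feats_a[i], feats_b[j])) for j in idx_b}
        matches.extend((i, j) for i, j in best_b.items() if best_a[j] == i)
    return sorted(matches)


topics = [[1.0, 0.0], [0.0, 1.0]]
feats_a = [[2.0, 0.1], [1.5, 0.0], [0.0, 3.0]]
feats_b = [[1.8, 0.2], [0.1, 2.5], [0.0, 0.5]]
print(match_within_topics(feats_a, feats_b, topics, k=2))  # [(0, 0), (2, 1)]
```

The point of restricting the search to kept topics is that each feature is compared only against candidates in the same structure group, so the number of similarity evaluations shrinks compared with all-pairs matching; how much it shrinks in practice depends on the real model, not on this toy.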