This study addresses the image-matching problem in challenging cases, such as large scene variations or textureless scenes. To gain robustness in such situations, most previous studies have attempted to encode the global context of a scene via graph neural networks or transformers. However, these contexts do not explicitly represent high-level contextual information, such as structural shapes or semantic instances; as a result, the encoded features are still not sufficiently discriminative in challenging scenes. We propose a novel image-matching method that applies a topic-modeling strategy to encode high-level contexts in images. The proposed method learns latent semantic instances, called topics, explicitly models each image as a multinomial distribution over topics, and then performs probabilistic feature matching. This approach improves the robustness of matching by focusing on the same semantic areas in both images. In addition, the inferred topics provide interpretability of the matching results, making our method explainable. Extensive experiments on outdoor and indoor datasets show that our method outperforms other state-of-the-art methods, particularly in challenging cases. The code is available at https://github.com/TruongKhang/TopicFM.
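To make the abstract's idea of "modeling an image as a distribution over topics and then performing probabilistic feature matching" concrete, the following is a minimal, hypothetical sketch; it is not the authors' implementation, and the tensor names (feat_a, topic_logits_a, the number of topics K) as well as the dual-softmax combination are illustrative assumptions only.

```python
# Hypothetical sketch: topic-conditioned probabilistic feature matching.
# Assumes per-feature topic logits are produced by some upstream network.
import torch
import torch.nn.functional as F

def topic_augmented_matching(feat_a, feat_b, topic_logits_a, topic_logits_b,
                             temperature: float = 0.1) -> torch.Tensor:
    """feat_*: (N, D) local descriptors; topic_logits_*: (N, K) topic scores."""
    # Per-feature topic distributions: each image is viewed as a mixture of K latent topics.
    p_a = topic_logits_a.softmax(dim=-1)           # (N_a, K)
    p_b = topic_logits_b.softmax(dim=-1)           # (N_b, K)

    # Topic co-occurrence prior: pairs of features belonging to the same
    # latent semantic topic receive a higher prior probability of matching.
    topic_prior = p_a @ p_b.t()                    # (N_a, N_b)

    # Appearance similarity between L2-normalized descriptors.
    sim = F.normalize(feat_a, dim=-1) @ F.normalize(feat_b, dim=-1).t()

    # Combine appearance similarity with the log topic prior, then apply a
    # dual-softmax (a common choice in detector-free matchers) to obtain
    # soft matching confidences.
    scores = sim / temperature + topic_prior.clamp_min(1e-6).log()
    conf = scores.softmax(dim=1) * scores.softmax(dim=0)
    return conf                                    # (N_a, N_b)
```

In this sketch, the topic prior suppresses matches between features that lie in different semantic regions, which is one way to realize the abstract's claim of focusing the matching on the same semantic areas in both images.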