Cost aggregation is a crucial step in image matching tasks: it aims to disambiguate noisy matching scores. Existing approaches rely on either hand-crafted or CNN-based aggregation. Hand-crafted methods lack robustness to severe deformations, while CNN-based methods inherit the limitations of CNNs, namely limited receptive fields and a lack of adaptability, and therefore fail to discriminate incorrect matches. In this paper, we introduce Cost Aggregation with Transformers (CATs), which explores global consensus among the initial correlation map, with architectural designs that let us fully exploit the global receptive field of the self-attention mechanism. CATs has a limitation of its own: the complexity of a standard transformer grows with the spatial and feature dimensions, so the resulting high computational cost restricts its applicability to limited resolutions and, in turn, limits performance. To alleviate this, we propose CATs++, an extension of CATs. Our proposed methods outperform the previous state-of-the-art methods by large margins, setting a new state of the art on all the benchmarks, including PF-WILLOW, PF-PASCAL, and SPair-71k. We further provide extensive ablation studies and analyses.
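To make the idea of aggregating a correlation map with self-attention concrete, here is a minimal NumPy sketch. It is not the CATs architecture: the function names, the single-head layer, the residual connection, and the random (untrained) projection weights are all assumptions chosen for illustration. Each source position is treated as one token whose features are its matching scores against every target position, so attention relates all source positions to one another at once.

```python
import numpy as np


def correlation_map(src, tgt):
    """Cosine similarity between every source and target feature.

    src: (N_s, C), tgt: (N_t, C)  ->  cost: (N_s, N_t)
    """
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return src @ tgt.T


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_aggregate(cost, w_q, w_k, w_v):
    """One single-head self-attention layer over the rows of the cost map.

    Hypothetical helper: each row (one source position and its N_t scores)
    is a token. The (N_s, N_s) attention matrix gives every token a global
    receptive field, and is also why cost grows quadratically with the
    spatial size.
    """
    q, k, v = cost @ w_q, cost @ w_k, cost @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return cost + attn @ v  # residual connection (an assumption of this sketch)


rng = np.random.default_rng(0)
height, width, channels = 8, 8, 16  # toy feature-map size, chosen arbitrarily
src = rng.standard_normal((height * width, channels))
tgt = rng.standard_normal((height * width, channels))

cost = correlation_map(src, tgt)  # (64, 64) noisy matching scores
d = cost.shape[1]
# Random projections stand in for learned weights.
w_q, w_k, w_v = (rng.standard_normal((d, d)) * d ** -0.5 for _ in range(3))

refined = self_attention_aggregate(cost, w_q, w_k, w_v)
matches = refined.argmax(axis=1)  # best target index for each source position
```

With random weights the refined map is not meaningfully better than the input. The sketch only shows the data flow, and that the attention matrix has one entry per pair of source positions, which is the quadratic cost that motivates a more efficient extension.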