Deep features have proven powerful for building accurate dense semantic correspondences in many previous works. However, the multi-scale, pyramidal hierarchy of convolutional neural networks has not been well studied for learning discriminative pixel-level features for semantic correspondence. In this paper, we propose a multi-scale matching network that is sensitive to subtle semantic differences between neighboring pixels. We follow a coarse-to-fine matching strategy and build a top-down feature and matching enhancement scheme coupled with the multi-scale hierarchy of deep convolutional neural networks. During feature enhancement, intra-scale enhancement fuses same-resolution feature maps from multiple layers via local self-attention, while cross-scale enhancement hallucinates higher-resolution feature maps along the top-down hierarchy. In addition, we learn complementary matching details at different scales, so the overall matching score is gradually refined by features at different semantic levels. Our multi-scale matching network can be trained end-to-end easily with few additional learnable parameters. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three popular benchmarks with high computational efficiency.
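The sketch below illustrates the top-down enhancement idea described in the abstract: same-resolution feature maps are fused within a scale, and a coarser map refines the fused result along the top-down path. It assumes a PyTorch setting; the module names (`IntraScaleFusion`, `CrossScaleEnhance`), the simple attention-style weighting used in place of the paper's local self-attention, and all hyperparameters are hypothetical illustrations, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class IntraScaleFusion(nn.Module):
    """Fuse same-resolution feature maps from multiple layers with a light
    attention-style weighting (a stand-in for the paper's local self-attention)."""

    def __init__(self, in_channels, num_maps):
        super().__init__()
        self.weight = nn.Conv2d(in_channels * num_maps, num_maps, kernel_size=3, padding=1)

    def forward(self, feats):  # feats: list of (B, C, H, W) tensors, same shape
        stacked = torch.stack(feats, dim=1)             # (B, N, C, H, W)
        attn = self.weight(torch.cat(feats, dim=1))     # (B, N, H, W)
        attn = attn.softmax(dim=1).unsqueeze(2)         # (B, N, 1, H, W)
        return (stacked * attn).sum(dim=1)              # (B, C, H, W)


class CrossScaleEnhance(nn.Module):
    """Hallucinate a higher-resolution map by upsampling the coarser level and
    mixing it with the finer one along the top-down hierarchy."""

    def __init__(self, coarse_channels, fine_channels):
        super().__init__()
        self.lateral = nn.Conv2d(coarse_channels, fine_channels, kernel_size=1)
        self.smooth = nn.Conv2d(fine_channels, fine_channels, kernel_size=3, padding=1)

    def forward(self, coarse, fine):  # coarse: (B, Cc, H/2, W/2), fine: (B, Cf, H, W)
        up = F.interpolate(self.lateral(coarse), size=fine.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.smooth(fine + up)


if __name__ == "__main__":
    # Toy shapes just to show the data flow.
    f1, f2 = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
    fused = IntraScaleFusion(in_channels=64, num_maps=2)([f1, f2])
    coarse = torch.randn(1, 128, 16, 16)
    enhanced = CrossScaleEnhance(coarse_channels=128, fine_channels=64)(coarse, fused)
    print(enhanced.shape)  # torch.Size([1, 64, 32, 32])
```

The coarse-to-fine matching itself is not shown here; the sketch only conveys how intra-scale fusion and cross-scale upsampling could be chained so that finer levels inherit context from coarser ones.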