Methods that combine local and global features have recently shown excellent performance on multiple challenging deep image retrieval benchmarks, but their use of local features raises at least two issues. First, these local features simply boil down to the localized map activations of a neural network, and hence can be extremely redundant. Second, they are typically trained with a global loss that only acts on top of an aggregation of local features; by contrast, testing is based on local feature matching, which creates a discrepancy between training and testing. In this paper, we propose a novel architecture for deep image retrieval, based solely on mid-level features that we call Super-features. These Super-features are constructed by an iterative attention module and constitute an ordered set in which each element focuses on a localized and discriminant image pattern. For training, they require only image labels. A contrastive loss operates directly at the level of Super-features and focuses on those that match across images. A second complementary loss encourages diversity. Experiments on common landmark retrieval benchmarks validate that Super-features substantially outperform state-of-the-art methods when using the same number of features, and only require a significantly smaller memory footprint to match their performance. Code and models are available at: https://github.com/naver/FIRe.
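The abstract describes two components: an iterative attention module that turns localized map activations into an ordered set of Super-features, and a contrastive loss that acts directly on Super-features that match across images. A minimal numpy sketch of both ideas is below; it is an illustrative assumption, not the paper's implementation (the actual attention module, matching procedure, and losses are defined in the paper and the linked FIRe repository). The function names, the fixed slot-to-slot matching, and the margin formulation are all simplifications introduced here.

```python
import numpy as np

def iterative_attention(local_feats, queries, n_iter=3):
    """Hypothetical sketch of the iterative attention module: each of N query
    templates repeatedly attends over the H*W local activations, and the
    resulting N attended vectors form an ordered set of 'Super-features'.
    local_feats: (HW, D) array; queries: (N, D) array."""
    q = queries.copy()
    for _ in range(n_iter):
        logits = q @ local_feats.T                        # (N, HW) similarities
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        attn = np.exp(logits)
        attn /= attn.sum(axis=1, keepdims=True)           # softmax over locations
        q = attn @ local_feats                            # re-read the feature map
    return q                                              # (N, D) Super-features

def contrastive_super_loss(sf_a, sf_b, margin=0.5):
    """Toy contrastive term operating at the level of Super-features: pull the
    i-th Super-feature of two matching images together (same ordered slot) and
    push it away from its hardest non-matching slot. This fixed slot pairing is
    a simplification of the paper's matching of Super-features across images."""
    a = sf_a / np.linalg.norm(sf_a, axis=1, keepdims=True)
    b = sf_b / np.linalg.norm(sf_b, axis=1, keepdims=True)
    sim = a @ b.T                                         # (N, N) cosine similarities
    pos = np.diag(sim)                                    # matched slots
    mask = np.eye(len(sim), dtype=bool)
    neg = np.where(mask, -np.inf, sim).max(axis=1)        # hardest negative slot
    return float(np.maximum(0.0, margin + neg - pos).mean())
```

Usage: extract `local_feats` from a CNN feature map flattened to `(H*W, D)`, run `iterative_attention` with `N` learnable queries, and apply the loss to Super-features of two images of the same landmark. A diversity term (not sketched here) would additionally penalize overlap between the N attention maps.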