In content-based image retrieval, the first-round retrieval results obtained by simple visual feature comparison may be unsatisfactory and can be refined by visual re-ranking techniques. It has been observed that the contextual similarity among the top-ranked images is an important clue for distinguishing their semantic relevance. Inspired by this observation, in this paper we propose a visual re-ranking method based on contextual similarity aggregation with self-attention. In our approach, each image in the top-K ranking list is represented by an affinity feature vector obtained by comparing it with a set of anchor images. The affinity features of the top-K images are then refined by aggregating contextual information with a transformer encoder. Finally, the refined affinity features are used to recompute the similarity scores between the query and the top-K images, which are re-ranked accordingly. To further improve the robustness of the re-ranking model and enhance its performance, a new data augmentation scheme is designed. Since the re-ranking model does not directly depend on the visual features used in the initial retrieval, it can be readily applied to ranking lists produced by various retrieval algorithms. We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
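To make the pipeline concrete, the sketch below illustrates the three stages described above: building affinity feature vectors against a set of anchor images, refining them with a transformer encoder, and recomputing query-image similarities. This is a minimal illustration, not the authors' implementation: it assumes L2-normalized global descriptors (so cosine similarity is a dot product), uses the top-K images' descriptors as both the ranked list and the anchor set, and all class names, layer sizes, and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class ContextualReRanker(nn.Module):
    """Minimal sketch (hypothetical architecture, not the paper's exact one).

    Affinity features of the top-K images are refined with a standard
    transformer encoder, then mapped back to the affinity space so they can
    be compared with the query's affinity vector.
    """

    def __init__(self, num_anchors: int, d_model: int = 256, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(num_anchors, d_model)    # embed affinity vectors
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, num_anchors)     # back to affinity space

    def forward(self, affinity: torch.Tensor) -> torch.Tensor:
        # affinity: (K, num_anchors), one row per top-K image
        x = self.proj(affinity).unsqueeze(0)           # (1, K, d_model)
        x = self.encoder(x)                            # contextual aggregation
        return self.out(x).squeeze(0)                  # refined affinity features


def rerank(query_feat, topk_feats, anchor_feats, model):
    """Re-rank the top-K list; all descriptors assumed L2-normalized."""
    affinity = topk_feats @ anchor_feats.T             # (K, num_anchors)
    refined = model(affinity)                          # refine with self-attention
    q_affinity = query_feat @ anchor_feats.T           # query in the same space
    scores = nn.functional.cosine_similarity(refined, q_affinity.unsqueeze(0))
    return torch.argsort(scores, descending=True)      # new order of the top-K
```

Because the re-ranker consumes only affinity vectors rather than raw visual descriptors, the same trained model can, as the abstract notes, be applied on top of ranking lists produced by different first-round retrieval algorithms.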