While there has been substantial progress in learning suitable distance metrics, these techniques generally lack transparency and decision reasoning, i.e., they do not explain why a given set of input images is similar or dissimilar. In this work, we address this key problem by proposing the first method to generate generic visual similarity explanations with gradient-based attention. We demonstrate that our technique is agnostic to the type of similarity model, showing its applicability to Siamese, triplet, and quadruplet models. Furthermore, we make the proposed similarity attention a principled part of the learning process, resulting in a new paradigm for learning similarity functions. We demonstrate that this learning mechanism yields similarity models that are both more generalizable and more explainable. Finally, we demonstrate the generality of our framework through experiments on a variety of tasks, including image retrieval, person re-identification, and low-shot semantic segmentation.
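The abstract does not spell out how gradient-based attention is computed for a similarity score. The sketch below is only an illustration and is not the authors' method. It assumes a Grad-CAM-style formulation and a toy similarity model. In that model, each image is represented by a (C, H, W) feature map, the embedding is the spatial mean of the map, and a pair is scored by the dot product of the two embeddings. The gradient of that score with respect to the first feature map has a closed form, so no autograd library is needed. The function names `embed`, `similarity`, and `similarity_attention` are hypothetical.

```python
import numpy as np


def embed(feature_map: np.ndarray) -> np.ndarray:
    """Global-average-pool a (C, H, W) feature map into a C-dim embedding."""
    return feature_map.mean(axis=(1, 2))


def similarity(fa: np.ndarray, fb: np.ndarray) -> float:
    """Toy similarity model (an assumption): dot product of pooled embeddings."""
    return float(embed(fa) @ embed(fb))


def similarity_attention(fa: np.ndarray, fb: np.ndarray) -> np.ndarray:
    """Grad-CAM-style attention map over fa that explains similarity(fa, fb).

    With s = embed(fa) . embed(fb) and embed = spatial mean, the gradient
    ds/dfa[c, h, w] equals embed(fb)[c] / (H * W). It is computed in closed
    form here rather than by backpropagation.
    """
    _, h, w = fa.shape
    grad = np.broadcast_to((embed(fb) / (h * w))[:, None, None], fa.shape)
    # Channel weights: spatially averaged gradients, as in Grad-CAM.
    alpha = grad.mean(axis=(1, 2))
    # Gradient-weighted sum of channels, then ReLU to keep positive evidence.
    cam = np.maximum(np.tensordot(alpha, fa, axes=1), 0.0)
    # Normalize to [0, 1] for visualization (skip if the map is all zeros).
    peak = cam.max()
    return cam / peak if peak > 0 else cam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fa = rng.random((8, 7, 7))
    fb = rng.random((8, 7, 7))
    print(similarity(fa, fb))
    print(similarity_attention(fa, fb).shape)  # (7, 7)
```

The channel weights depend on the *other* image's embedding. The same feature map can therefore produce different attention maps depending on which image it is compared against. This is the property that distinguishes a similarity explanation from a single-image class explanation.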