Grouping is commonly used in deep metric learning to compute diverse features. However, current grouping methods are prone to overfitting and lack interpretability. In this work, we propose an improved and interpretable grouping method that can be integrated flexibly with any metric learning framework. Our method is based on the attention mechanism, with a learnable query for each group. The query is fully trainable and can capture group-specific information when combined with the diversity loss. An appealing property of our method is that it naturally lends itself to interpretation: the attention scores between the learnable query and each spatial position can be interpreted as the importance of that position. We formally show that our proposed grouping method is invariant to spatial permutations of features. When used as a module in convolutional neural networks, our method leads to translational invariance. We conduct comprehensive experiments to evaluate our method. Our quantitative results indicate that the proposed method outperforms prior methods consistently and significantly across different datasets, evaluation metrics, base models, and loss functions. To the best of our knowledge, our interpretation results are the first to clearly demonstrate that such a method enables the learning of distinct and diverse features across groups. The code is available at https://github.com/XinyiXuXD/DGML-master.
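The grouping mechanism and its permutation invariance can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function name `group_attention`, the linear key and value projections `w_key` and `w_value`, the scaled dot-product scoring, and all tensor sizes are assumptions made for illustration, and the diversity loss is omitted.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def group_attention(features, queries, w_key, w_value):
    """Hypothetical attention-based grouping (illustrative only).

    features: (N, C) feature map flattened over N spatial positions
    queries:  (G, D) one learnable query per group
    w_key, w_value: (C, D) assumed linear projections
    Returns group features (G, D) and attention scores (G, N).
    """
    keys = features @ w_key      # (N, D)
    values = features @ w_value  # (N, D)
    # Each row holds one group's importance weights over positions.
    scores = softmax(queries @ keys.T / np.sqrt(keys.shape[1]), axis=-1)
    return scores @ values, scores

rng = np.random.default_rng(0)
N, C, D, G = 49, 32, 16, 4  # e.g. a 7x7 map, 4 groups (assumed sizes)
features = rng.normal(size=(N, C))
queries = rng.normal(size=(G, D))
w_key = rng.normal(size=(C, D)) / np.sqrt(C)
w_value = rng.normal(size=(C, D)) / np.sqrt(C)

group_feats, attn = group_attention(features, queries, w_key, w_value)

# Shuffle the spatial positions and recompute.
perm = rng.permutation(N)
group_feats_perm, attn_perm = group_attention(features[perm], queries, w_key, w_value)

# Group features are unchanged; attention scores are reordered with the positions.
print(np.allclose(group_feats, group_feats_perm))
print(np.allclose(attn[:, perm], attn_perm))
```

Because each group feature is a weighted sum over positions, with weights that depend only on the content at each position, reordering the positions reorders the attention scores without changing the pooled result. Each row of `attn` can be reshaped to the spatial grid and viewed as a per-group importance map.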