Recent studies have shown that many deep metric learning loss functions perform very similarly under the same experimental conditions. One potential reason for this unexpected result is that all losses guide the network to focus on similar image regions or properties. In this paper, we investigate this hypothesis by conducting a two-step analysis to extract and compare the visual features learned by the same model architecture trained with different loss functions. First, we compare the learned features at the pixel level by correlating saliency maps of the same input images. Second, we compare the clustering of embeddings with respect to several image properties, e.g., object color or illumination. To gain independent control over these properties, we generate photo-realistic 3D car renders similar to the images in the Cars196 dataset. In our analysis, we compare 14 pretrained models from a recent study and find that, even though all models perform similarly, different loss functions can guide the model to learn different features. In particular, we find differences between classification-based and ranking-based losses. Our analysis also shows that some seemingly irrelevant properties can have a significant influence on the resulting embedding. We encourage researchers from the deep metric learning community to use our methods to gain insights into the features learned by their proposed methods.
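The pixel-level comparison described above reduces to correlating two saliency maps of the same input image. The following is a minimal sketch of that correlation step; it assumes the saliency maps have already been computed by some attribution method (e.g., input gradients), which is outside the scope of this snippet, and the function name `saliency_correlation` is ours, not from the paper's code.

```python
import numpy as np

def saliency_correlation(map_a, map_b):
    """Pearson correlation between two saliency maps of the same image.

    Both maps are arrays of identical shape (e.g., H x W); they are
    flattened and standardized before correlating, so the result lies
    in [-1, 1]. How the maps are produced is left to the saliency method.
    """
    a = map_a.ravel().astype(float)
    b = map_b.ravel().astype(float)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

# Toy check with a random stand-in for a saliency map: a map correlates
# perfectly with itself and anti-correlates with its negation.
rng = np.random.default_rng(0)
m = rng.random((8, 8))
print(saliency_correlation(m, m))   # -> 1.0 (up to float precision)
print(saliency_correlation(m, -m))  # -> -1.0 (up to float precision)
```

Averaging this score over many input images, for every pair of trained models, yields a model-by-model similarity matrix of the kind used to compare losses at the pixel level.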