Deep Metric Learning (DML) aims to learn metric spaces that encode semantic similarities as embedding space distances. These spaces should be transferable to classes beyond those seen during training. Commonly, DML methods task networks to solve contrastive ranking tasks defined over binary class assignments. However, such approaches ignore higher-level semantic relations between the actual classes. This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relations between classes, which impairs the generalizability of the learned metric space. To tackle this issue, we propose a language guidance objective for visual similarity learning. Leveraging language embeddings of expert- and pseudo-classnames, we contextualize and realign visual representation spaces to match meaningful language semantics, improving semantic consistency. Extensive experiments and ablations provide a strong motivation for our proposed approach and show that language guidance offers significant, model-agnostic improvements for DML, achieving competitive and state-of-the-art results on all benchmarks. Code is available at https://github.com/ExplainableML/LanguageGuidance_for_DML.
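To make the idea of realigning visual similarities with language semantics concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the objective, so everything here is an assumption made for illustration: the function names (`cosine_similarity_matrix`, `row_softmax`, `language_guidance_loss`), the use of cosine similarity, the `temperature` parameter, and the choice of a row-wise KL divergence that pulls the visual similarity structure toward the structure found among language embeddings of the class names.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between the rows of `embeddings`.

    Assumes every embedding is a non-zero vector.
    """
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def row_softmax(matrix, temperature=1.0):
    """Turn each row of similarities into a probability distribution."""
    out = []
    for row in matrix:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((x - shift) / temperature) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def language_guidance_loss(visual_embeddings, language_embeddings, temperature=1.0):
    """Hypothetical guidance term: mean row-wise KL(language || visual).

    Both inputs hold one embedding per sample, in the same order. The loss is
    zero when the two similarity structures agree and grows as they diverge.
    """
    p_lang = row_softmax(cosine_similarity_matrix(language_embeddings), temperature)
    p_vis = row_softmax(cosine_similarity_matrix(visual_embeddings), temperature)
    kl_rows = [
        sum(p * math.log(p / q) for p, q in zip(p_row, q_row) if p > 0)
        for p_row, q_row in zip(p_lang, p_vis)
    ]
    return sum(kl_rows) / len(kl_rows)


# Toy inputs (made up for illustration): three samples, 2-D embeddings.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
language = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]

matched = language_guidance_loss(visual, visual)
mismatched = language_guidance_loss(visual, language)
```

In this toy setup `matched` is zero because both similarity structures are identical, while `mismatched` is positive because the language embeddings place the first two samples close together and the visual embeddings do not. In a training setting, a term of this kind would presumably be added to a standard ranking loss, but the weighting and the exact formulation are left open here.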