Deep Metric Learning (DML) aims to find representations suitable for zero-shot transfer to a priori unknown test distributions. However, common evaluation protocols only test a single, fixed data split in which train and test classes are assigned randomly. More realistic evaluations should consider a broad spectrum of distribution shifts with potentially varying degree and difficulty. In this work, we systematically construct train-test splits of increasing difficulty and present the ooDML benchmark to characterize generalization under out-of-distribution shifts in DML. ooDML is designed to probe the generalization performance on much more challenging, diverse train-to-test distribution shifts. Based on our new benchmark, we conduct a thorough empirical analysis of state-of-the-art DML methods. We find that while generalization tends to consistently degrade with difficulty, some methods are better at retaining performance as the distribution shift increases. Finally, we propose few-shot DML as an efficient way to consistently improve generalization in response to unknown test shifts presented in ooDML. Code available here: https://github.com/CompVis/Characterizing_Generalization_in_DML.