Recently, substantial research efforts in Deep Metric Learning (DML) have focused on designing complex pairwise-distance losses, which require convoluted schemes to ease optimization, such as sample mining or pair weighting. The standard cross-entropy loss for classification has been largely overlooked in DML. On the surface, the cross-entropy may seem unrelated and irrelevant to metric learning, as it does not explicitly involve pairwise distances. However, we provide a theoretical analysis that links the cross-entropy to several well-known and recent pairwise losses. Our connections are drawn from two different perspectives: one based on an explicit optimization insight, the other on discriminative and generative views of the mutual information between the labels and the learned features. First, we explicitly demonstrate that the cross-entropy is an upper bound on a new pairwise loss, which has a structure similar to various pairwise losses: it minimizes intra-class distances while maximizing inter-class distances. As a result, minimizing the cross-entropy can be seen as an approximate bound-optimization (or Majorize-Minimize) algorithm for minimizing this pairwise loss. Second, we show that, more generally, minimizing the cross-entropy is equivalent to maximizing the mutual information, to which we connect several well-known pairwise losses. Furthermore, we show that various standard pairwise losses can be explicitly related to one another via bound relationships. Our findings indicate that the cross-entropy represents a proxy for maximizing the mutual information, as pairwise losses do, without the need for convoluted sample-mining heuristics. Our experiments over four standard DML benchmarks strongly support our findings. We obtain state-of-the-art results, outperforming recent and complex DML methods.
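The link between the cross-entropy and the mutual information can be checked numerically on a toy problem. The sketch below is not the paper's derivation: it assumes discrete features Z and labels Y with a known joint table, and all function and variable names are illustrative. It uses the standard identity I(Z; Y) = H(Y) - H(Y|Z) together with the fact that the cross-entropy of any classifier q(y|z) is at least H(Y|Z), so H(Y) minus the cross-entropy is a lower bound on the mutual information that becomes tight when q matches the true posterior.

```python
import numpy as np

def conditional(joint):
    """Return p(y|z) from a joint table (rows index z, columns index y)."""
    return joint / joint.sum(axis=1, keepdims=True)

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def cross_entropy(joint, q):
    """Expected -log q(y|z) under the true joint p(z, y)."""
    return float(-(joint * np.log(q)).sum())

def mutual_information(joint):
    """I(Z; Y) = H(Y) - H(Y|Z) for a discrete joint table."""
    h_y = entropy(joint.sum(axis=0))
    h_y_given_z = cross_entropy(joint, conditional(joint))
    return h_y - h_y_given_z

rng = np.random.default_rng(0)

# Toy joint distribution over 5 feature values and 3 labels (assumed, random).
joint = rng.random((5, 3))
joint /= joint.sum()

# An arbitrary classifier q(y|z): each row is a distribution over labels.
q = rng.random((5, 3))
q /= q.sum(axis=1, keepdims=True)

h_y = entropy(joint.sum(axis=0))
mi = mutual_information(joint)
bound_arbitrary = h_y - cross_entropy(joint, q)                 # loose bound
bound_optimal = h_y - cross_entropy(joint, conditional(joint))  # tight bound

print(f"I(Z;Y)                      = {mi:.4f}")
print(f"H(Y) - CE, arbitrary q      = {bound_arbitrary:.4f}")
print(f"H(Y) - CE, q = true p(y|z)  = {bound_optimal:.4f}")
```

Lowering the cross-entropy raises the bound, which is the sense in which the cross-entropy acts as a proxy for the mutual information in this discrete setting.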