In deep metric learning, the Triplet Loss has emerged as a popular method for learning many computer vision and natural language processing tasks such as facial recognition, object detection, and visual-semantic embeddings. One issue that plagues the Triplet Loss is network collapse, an undesirable phenomenon where the network projects the embeddings of all data onto a single point. Researchers predominantly address this problem with triplet mining strategies. While hard negative mining is the most effective of these strategies, existing formulations lack strong theoretical justification for their empirical success. In this paper, we use the mathematical theory of isometric approximation to show an equivalence between the Triplet Loss sampled by hard negative mining and an optimization problem that minimizes a Hausdorff-like distance between the neural network and its ideal counterpart function. This provides a theoretical justification for hard negative mining's empirical efficacy. In addition, our novel application of the isometric approximation theorem provides the groundwork for future forms of hard negative mining that avoid network collapse. Our theory can also be extended to analyze other Euclidean space-based metric learning methods such as Ladder Loss or Contrastive Learning.
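To make the setting concrete, the following is a minimal sketch of the Triplet Loss under batch-hard negative mining, assuming PyTorch; the function name `batch_hard_triplet_loss` and the `margin` default are illustrative, not the paper's implementation.

```python
import torch

def batch_hard_triplet_loss(embeddings: torch.Tensor,
                            labels: torch.Tensor,
                            margin: float = 0.2) -> torch.Tensor:
    """For each anchor, select the hardest positive (farthest same-label
    point) and hardest negative (closest different-label point) in the batch."""
    # Pairwise Euclidean distances between all embeddings in the batch.
    dists = torch.cdist(embeddings, embeddings, p=2)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) label-match mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive: max distance over same-label pairs, excluding self.
    pos_dists = dists.masked_fill(~same | eye, float('-inf')).max(dim=1).values
    # Hardest negative: min distance over different-label pairs.
    neg_dists = dists.masked_fill(same, float('inf')).min(dim=1).values

    # Standard hinge-style triplet loss. Note the collapse failure mode the
    # abstract describes: if all embeddings map to one point, both distances
    # are zero and the loss sits at `margin` for every triplet.
    return torch.relu(pos_dists - neg_dists + margin).mean()
```

Because the hardest negative is the nearest differently-labeled point, this sampling concentrates the loss on the pairs that determine the margin, which is the regime the paper's isometric-approximation argument analyzes.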