Noise-Resistant Deep Metric Learning with Probabilistic Instance Filtering

Noisy labels are common in real-world data and degrade the performance of deep neural networks. Cleaning data manually is labour-intensive and time-consuming. Previous research mostly focuses on making classification models robust to noisy labels, while the robustness of deep metric learning (DML) against noisy labels remains less well explored. In this paper, we bridge this gap by proposing the Probabilistic Ranking-based Instance Selection with Memory (PRISM) approach for DML. PRISM calculates the probability that a label is clean and filters out potentially noisy samples. Specifically, we propose a novel method, the von Mises-Fisher Distribution Similarity (vMF-Sim), which calculates this probability by estimating a von Mises-Fisher (vMF) distribution for each data class. Compared with the existing average similarity method (AvgSim), vMF-Sim considers the variance of each class in addition to the average similarity. With this design, the proposed approach can handle challenging DML situations in which the majority of the samples are noisy. Extensive experiments on both synthetic and real-world noisy datasets show that the proposed approach achieves up to 8.37% higher Precision@1 than the best-performing state-of-the-art baseline approaches, within reasonable training time.
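As a rough illustration of the idea of estimating one vMF distribution per class and using it to score how likely a label is to be clean, here is a minimal sketch. It is not the authors' implementation. The following are all assumptions made for illustration: the 3-dimensional toy features (chosen because the vMF normaliser has a closed form in 3-D), the closed-form approximation for the concentration parameter kappa, equal class priors, the `memory` dictionary standing in for stored features, and the `THRESHOLD` cut-off.

```python
import math

def normalize(v):
    """Scale a vector to unit length (vMF is defined on the unit sphere)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def fit_vmf(features):
    """Estimate mean direction mu and concentration kappa from unit vectors.

    kappa uses the common approximation r * (d - r^2) / (1 - r^2),
    where r is the length of the averaged feature vector.
    """
    d = len(features[0])
    mean = [sum(f[j] for f in features) / len(features) for j in range(d)]
    r = math.sqrt(sum(c * c for c in mean))
    mu = [c / r for c in mean]
    r = min(r, 1.0 - 1e-6)  # keep kappa finite for very tight classes
    kappa = r * (d - r * r) / (1.0 - r * r)
    return mu, kappa

def log_vmf_3d(x, mu, kappa):
    """Log density of a 3-D vMF: kappa / (4*pi*sinh(kappa)) * exp(kappa * mu.x)."""
    dot = sum(a * b for a, b in zip(mu, x))
    # log(sinh(k)) = k + log(1 - exp(-2k)) - log(2), written to avoid overflow
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh + kappa * dot

def clean_probability(x, label, class_params):
    """Probability that `label` is correct, assuming equal class priors."""
    logs = {k: log_vmf_3d(x, mu, kappa) for k, (mu, kappa) in class_params.items()}
    top = max(logs.values())  # subtract the max for a stable softmax
    weights = {k: math.exp(v - top) for k, v in logs.items()}
    return weights[label] / sum(weights.values())

# Hypothetical stored features per class (toy data).
memory = {
    "cat": [normalize(v) for v in ([1.0, 0.1, 0.0], [1.0, -0.1, 0.05], [0.95, 0.0, 0.1])],
    "dog": [normalize(v) for v in ([0.1, 1.0, 0.0], [-0.1, 1.0, 0.05], [0.0, 0.9, 0.1])],
}
class_params = {k: fit_vmf(feats) for k, feats in memory.items()}

batch = [
    (normalize([1.0, 0.05, 0.0]), "cat"),   # label agrees with the features
    (normalize([0.98, 0.0, 0.05]), "dog"),  # label probably wrong
]
THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper
kept = [(x, y) for x, y in batch if clean_probability(x, y, class_params) >= THRESHOLD]
```

Because kappa grows as a class becomes more tightly clustered, the same cosine similarity counts for more in a compact class than in a spread-out one. This is one way to read the abstract's remark that vMF-Sim accounts for class variance in addition to average similarity.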

Related content

Metric learning

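Metric learning aims to measure how similar samples are to one another, which is one of the core problems of pattern recognition. The performance of many machine learning methods, including classifiers such as k-nearest neighbours, support vector machines, and radial basis function networks, clustering methods such as K-means, and some graph-based methods, depends largely on the choice of similarity measure between samples. The usual goal of metric learning is to make the distance between samples of the same class as small as possible and the distance between samples of different classes as large as possible.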
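The goal of pulling same-class samples together and pushing different-class samples apart can be written as a pairwise loss. The sketch below uses the classic contrastive loss as one common formulation. The choice of loss, the Euclidean distance, and the `margin` value are assumptions for illustration and are not taken from the text above.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same_class, margin=1.0):
    """Penalise distance for same-class pairs, and closeness for different-class pairs."""
    d = euclidean(a, b)
    if same_class:
        return d ** 2                    # same class: any distance is penalised
    return max(0.0, margin - d) ** 2     # different class: penalised only inside the margin

# A same-class pair far apart is penalised; a different-class pair
# already farther apart than the margin contributes nothing.
loss_same = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=True)
loss_diff = contrastive_loss([0.0, 0.0], [3.0, 4.0], same_class=False)
```

Minimising this loss over many pairs shrinks same-class distances and grows different-class distances until they exceed the margin.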
