Generalized Jensen-Shannon Divergence Loss for Learning with Noisy Labels

Prior work has found it beneficial to combine provably noise-robust loss functions, e.g., mean absolute error (MAE), with standard categorical loss functions, e.g., cross entropy (CE), to improve their learnability. Here, we propose to use the Jensen-Shannon divergence as a noise-robust loss function and show that it interpolates between CE and MAE with a controllable mixing parameter. Furthermore, we make a crucial observation that CE exhibits lower consistency around noisy data points. Based on this observation, we adopt a generalized version of the Jensen-Shannon divergence for multiple distributions to encourage consistency around data points. Using this loss function, we show state-of-the-art results on both synthetic (CIFAR) and real-world (WebVision) noise with varying noise rates.
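To make the interpolation claim concrete, here is a minimal NumPy sketch of a weighted Jensen-Shannon loss and its multi-distribution generalization. It is not the authors' implementation. The function names (`entropy`, `gjs_loss`, `js_loss`), the weight layout, and the scaling constant Z = -(1 - pi1) log(1 - pi1) are assumptions chosen for illustration. Under that scaling the two-distribution loss tends to CE as pi1 approaches 0 and to half the L1 distance between label and prediction (MAE up to a constant factor) as pi1 approaches 1.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (natural log) along the last axis, with 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)  # log(1) = 0, so zero entries contribute 0
    return -np.sum(p * np.log(safe), axis=-1)


def gjs_loss(dists, weights):
    """Scaled generalized JS divergence of M distributions over K classes.

    dists:   shape (M, K); by the convention assumed here, row 0 is the
             one-hot label and the remaining rows are model predictions.
    weights: shape (M,), non-negative, summing to 1, with 0 < weights[0] < 1.
    """
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mixture = weights @ dists  # weighted mean distribution, shape (K,)
    gjs = entropy(mixture) - np.sum(weights * entropy(dists))
    # Assumed scaling constant; log1p keeps it accurate for small weights[0].
    z = -(1.0 - weights[0]) * np.log1p(-weights[0])
    return gjs / z


def js_loss(label, pred, pi1):
    """Two-distribution special case: label weighted by pi1, prediction by 1 - pi1."""
    return gjs_loss([label, pred], [pi1, 1.0 - pi1])


if __name__ == "__main__":
    label = np.array([0.0, 1.0, 0.0])
    pred = np.array([0.2, 0.7, 0.1])
    print("CE      :", -np.log(pred[1]))
    print("MAE / 2 :", 0.5 * np.abs(label - pred).sum())
    for pi1 in (1e-6, 0.1, 0.5, 0.9, 1 - 1e-6):
        print(f"JS loss, pi1={pi1:g}:", js_loss(label, pred, pi1))
```

With three or more rows, for example the label plus predictions on two augmentations of the same input, the entropy-of-mixture term penalizes predictions that disagree with each other. That is one way to read the consistency argument in the abstract. The convergence toward the MAE end is slow (logarithmic in 1 - pi1), so values near pi1 = 1 only approximate the limit.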

Related content

Loss function (machine learning)

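A loss function, also called a distance function or metric function in AI, measures distance in an abstract sense: the error between the true data and the predicted data. It estimates how much a model's prediction f(x) disagrees with the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); the smaller the loss, the better the model fits the data. The loss function is the core of the empirical risk function and an important component of the structural risk function.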
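For reference, the usual textbook relationship between the loss L(Y, f(x)), the empirical risk, and the structural risk can be written as follows. The sample size N, the regularization weight \lambda, and the complexity penalty J(f) are notation assumed here, not taken from the text above.

```latex
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr),
\qquad
R_{\mathrm{srm}}(f) = R_{\mathrm{emp}}(f) + \lambda \, J(f)
```

The empirical risk averages the loss over the training samples, and the structural risk adds a penalty on model complexity to that average.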
