Label smoothing (LS) is an emerging learning paradigm that uses the positively weighted average of both the hard training labels and uniformly distributed soft labels. It was shown that LS serves as a regularizer for training data with hard labels and therefore improves the generalization of the model. Later it was reported that LS even helps improve robustness when learning with noisy labels. However, we observe that the advantage of LS vanishes when we operate in a high label noise regime. Intuitively speaking, this is due to the increased entropy of $\mathbb{P}(\text{noisy label}\mid X)$ when the noise rate is high; in this case, further applying LS tends to "over-smooth" the estimated posterior. We proceed to discover that several learning-with-noisy-labels solutions in the literature instead relate more closely to negative/not label smoothing (NLS), which acts counter to LS and is defined as using a negative weight to combine the hard and soft labels! We provide an understanding of the properties of LS and NLS when learning with noisy labels. Among other established properties, we theoretically show that NLS is more beneficial when the label noise rates are high. We provide extensive experimental results on multiple benchmarks to support our findings.
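To make the LS/NLS combination step concrete, the following is a minimal NumPy sketch of the weighted average described above; the function name `smooth_labels`, its signature, and the toy example are illustrative assumptions, not the paper's implementation. A positive smooth rate $r$ recovers standard LS, while a negative $r$ gives NLS, pushing the target away from the uniform distribution.

```python
import numpy as np

def smooth_labels(hard_labels, num_classes, r):
    """Combine one-hot hard labels with a uniform soft label.

    r > 0: standard label smoothing (LS).
    r < 0: negative label smoothing (NLS), which weights the
           uniform soft label negatively.
    """
    one_hot = np.eye(num_classes)[hard_labels]            # (N, C) one-hot targets
    uniform = np.full((1, num_classes), 1.0 / num_classes)  # uniform soft label
    return (1.0 - r) * one_hot + r * uniform              # weighted combination

# Example: true class 0 out of 4 classes.
y = np.array([0])
print(smooth_labels(y, 4, 0.2))   # LS:  [[0.85 0.05 0.05 0.05]]
print(smooth_labels(y, 4, -0.2))  # NLS: [[1.15 -0.05 -0.05 -0.05]]
```

Note how NLS assigns negative mass to the non-target classes, which matches the intuition above: rather than smoothing the estimated posterior further, it sharpens the target toward the hard label.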