Label Smoothing (LS) is an effective regularizer for improving the generalization of state-of-the-art deep models. For each training sample, the LS strategy smooths the one-hot encoded training signal by distributing a fraction of its probability mass over the non-ground-truth classes, penalizing the network for producing overconfident output distributions. This paper introduces a novel label smoothing technique called Pairwise Label Smoothing (PLS). PLS takes a pair of samples as input. Smoothing with a pair of ground-truth labels enables PLS to preserve the relative distance between the two truth labels while further softening the distance between the truth labels and the other targets, so the resulting models produce much less confident predictions than under the LS strategy. Also, unlike current LS methods, which typically require finding a global smoothing distribution mass through cross-validation search, PLS automatically learns the distribution mass for each input pair during training. We empirically show that PLS significantly outperforms LS and the baseline models, achieving up to a 30% relative reduction in classification error. We also show visually that, when achieving such accuracy gains, PLS tends to produce very low winning softmax scores.
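To make the two smoothing schemes concrete, below is a minimal NumPy sketch. The `ls_targets` helper implements standard uniform label smoothing as described above; `pls_targets` is a hypothetical illustration of pairwise smoothing, where `alpha` stands in for the pair-specific distribution mass the paper says is learned during training. It is not the authors' exact formulation.

```python
import numpy as np

def ls_targets(labels, num_classes, eps=0.1):
    """Standard label smoothing: move eps of the probability mass
    off the ground-truth class and spread it uniformly over the
    remaining classes."""
    t = np.full((len(labels), num_classes), eps / (num_classes - 1))
    t[np.arange(len(labels)), labels] = 1.0 - eps
    return t

def pls_targets(label_a, label_b, num_classes, alpha=0.4, eps=0.1):
    """Hypothetical pairwise smoothing for one sample pair with
    distinct labels: keep most of the mass on the two ground-truth
    classes (preserving their relative distance) and spread the
    rest over the other classes. `alpha` is a stand-in for the
    learned pair-specific mass, fixed here for illustration."""
    t = np.full(num_classes, eps / (num_classes - 2))
    t[label_a] = (1.0 - eps) * (1.0 - alpha)
    t[label_b] = (1.0 - eps) * alpha
    return t

print(ls_targets(np.array([2]), num_classes=5))
print(pls_targets(1, 3, num_classes=5))
```

In both cases the target vector still sums to one; the pairwise variant simply leaves two classes with substantial mass instead of one, which is consistent with the lower winning softmax scores reported above.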