Self-supervised Contrastive Learning (CL) has recently been shown to be very effective in preventing deep networks from overfitting noisy labels. Despite its empirical success, the theoretical understanding of how contrastive learning boosts robustness is very limited. In this work, we rigorously prove that the representation matrix learned by contrastive learning boosts robustness by having: (i) one prominent singular value corresponding to each sub-class in the data, and significantly smaller remaining singular values; and (ii) a large alignment between the prominent singular vectors and the clean labels of each sub-class. These properties enable a linear layer trained on such representations to effectively learn the clean labels without overfitting the noise. We further show that the low-rank structure of the Jacobian of deep networks pre-trained with contrastive learning allows them to achieve superior performance initially when fine-tuned on noisy labels. Finally, we demonstrate that the initial robustness provided by contrastive learning enables robust training methods to achieve state-of-the-art performance under extreme noise levels, e.g., an average increase in accuracy of 27.18\% on CIFAR-10 and 15.58\% on CIFAR-100 with 80\% symmetric noisy labels, and a 4.11\% increase in accuracy on WebVision.
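The following minimal sketch (not part of the paper) illustrates how properties (i) and (ii) could be checked empirically for a given representation matrix. The names `Z`, `y`, `K`, and `check_singular_structure` are illustrative assumptions: `Z` holds one learned representation per sample, `y` the clean sub-class labels, and `K` the number of sub-classes.

```python
import numpy as np

def check_singular_structure(Z, y, K):
    """Sanity-check the spectral structure described in the abstract."""
    # (i) Spectrum: expect K prominent singular values and a sharp drop afterwards.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    spectral_gap = S[K - 1] / (S[K] + 1e-12)

    # (ii) Alignment: energy of each (centered, normalized) sub-class indicator
    # captured by the span of the K prominent left singular vectors.
    alignments = np.zeros(K)
    for k in range(K):
        indicator = (y == k).astype(float)
        indicator -= indicator.mean()
        indicator /= np.linalg.norm(indicator) + 1e-12
        alignments[k] = np.linalg.norm(U[:, :K].T @ indicator)

    return spectral_gap, alignments

# Toy usage: representations clustered around K well-separated sub-class means.
rng = np.random.default_rng(0)
K, n_per_class, d = 4, 100, 64
means = 5.0 * rng.normal(size=(K, d))
y = np.repeat(np.arange(K), n_per_class)
Z = means[y] + 0.1 * rng.normal(size=(K * n_per_class, d))

gap, per_class_alignment = check_singular_structure(Z, y, K)
print(f"spectral gap (sigma_K / sigma_K+1): {gap:.2f}")
print("alignment of each sub-class with the prominent singular vectors:",
      np.round(per_class_alignment, 3))
```

On such well-clustered toy data the spectral gap is large and each per-class alignment is close to 1; under the abstract's claims, representations learned by contrastive learning exhibit a similar structure, which is why a linear layer trained on them can fit the clean labels without overfitting the noise.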