Supervised learning can be viewed as distilling relevant information from input data into feature representations. This process becomes difficult when supervision is noisy, as the distilled information might not be relevant. In fact, recent research shows that networks can easily overfit all labels, including corrupted ones, and hence can hardly generalize to clean datasets. In this paper, we focus on the problem of learning with noisy labels and introduce a compression inductive bias into network architectures to alleviate this over-fitting problem. More precisely, we revisit a classical regularization technique named Dropout and its variant Nested Dropout. Dropout can serve as a compression constraint through its feature-dropping mechanism, while Nested Dropout further learns ordered feature representations w.r.t. feature importance. Moreover, models trained with compression regularization are further combined with Co-teaching for a performance boost. Theoretically, we conduct a bias-variance decomposition of the objective function under compression regularization, and analyze it for both a single model and Co-teaching. This decomposition provides three insights: (i) it shows that over-fitting is indeed an issue for learning with noisy labels; (ii) through an information bottleneck formulation, it explains why the proposed feature compression helps combat label noise; (iii) it explains the performance boost brought by incorporating compression regularization into Co-teaching. Experiments show that our simple approach achieves comparable or even better performance than state-of-the-art methods on benchmarks with real-world label noise, including Clothing1M and ANIMAL-10N. Our implementation is available at https://yingyichen-cyy.github.io/CompressFeatNoisyLabels/.
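To illustrate the ordered-representation idea behind Nested Dropout, the following is a minimal PyTorch sketch of a Nested Dropout layer. It assumes a geometric distribution over the truncation index and a hyper-parameter `p`; these are illustrative choices, not necessarily the exact configuration used in the paper.

```python
import torch
import torch.nn as nn


class NestedDropout(nn.Module):
    """Nested Dropout sketch: sample an index k per example and zero out all
    feature units after k, so earlier units are pushed to carry the most
    important information (ordered representations).
    Note: the sampling distribution and hyper-parameter `p` are assumptions
    for illustration only."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p  # success probability of the geometric distribution

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) feature representations
        if not self.training:
            return x
        batch, dim = x.shape
        # Sample a truncation index k for each example.
        k = torch.distributions.Geometric(probs=self.p).sample((batch,)).long()
        k = torch.clamp(k, max=dim - 1).to(x.device)
        # mask[i, j] = 1 if j <= k[i]: keep the first k+1 units, drop the rest.
        idx = torch.arange(dim, device=x.device).unsqueeze(0)  # (1, dim)
        mask = (idx <= k.unsqueeze(1)).float()
        return x * mask
```

In this sketch, the layer would typically be placed after the encoder's feature layer, acting as a compression constraint during training while being a no-op at evaluation time.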