Optimizing neural networks with noisy labels is a challenging task, especially if the label set contains real-world noise. Networks tend to generalize to reasonable patterns in the early training stages and to overfit to specific details of noisy samples in later ones. We introduce Blind Knowledge Distillation, a novel teacher-student approach for learning with noisy labels that masks the ground-truth-related teacher output to filter out potentially corrupted knowledge and to estimate the tipping point from generalizing to overfitting. Based on this, we estimate the noise in the training data with Otsu's algorithm. With this estimate, we train the network with a modified, weighted cross-entropy loss function. Our experiments show that Blind Knowledge Distillation detects overfitting effectively during training and improves the detection of clean and noisy labels on the recently published CIFAR-N dataset. Code is available on GitHub.
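For illustration only, the following is a minimal PyTorch-style sketch of the three ingredients named in the abstract: masking the ground-truth class in the teacher output, thresholding per-sample scores with Otsu's method, and a per-sample weighted cross-entropy loss. The function names, tensor shapes, and the choice of per-sample score are assumptions for this sketch and do not reproduce the authors' released code.

```python
# Hedged sketch, not the authors' implementation.
# Assumes `teacher_logits` and `student_logits` are (N, C) tensors and `labels` is (N,).
import torch
import torch.nn.functional as F


def blind_teacher_probs(teacher_logits, labels):
    """Mask the ground-truth class in the teacher logits so the distilled
    knowledge cannot simply reproduce a (possibly corrupted) label."""
    masked = teacher_logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))  # hide the labelled class
    return F.softmax(masked, dim=1)


def otsu_threshold(scores, bins=256):
    """Otsu's method on a 1-D tensor of per-sample scores: choose the threshold
    that maximizes the between-class variance of the two resulting groups."""
    lo, hi = float(scores.min()), float(scores.max())
    hist = torch.histc(scores, bins=bins, min=lo, max=hi)
    centers = torch.linspace(lo, hi, bins)
    p = hist / hist.sum()
    w0 = torch.cumsum(p, 0)            # cumulative weight of the first group
    mu = torch.cumsum(p * centers, 0)  # cumulative mean
    mu_t = mu[-1]
    between = (mu_t * w0 - mu) ** 2 / (w0 * (1.0 - w0) + 1e-12)
    return centers[int(torch.argmax(between))]


def weighted_cross_entropy(student_logits, labels, clean_weight):
    """Cross-entropy weighted per sample by an estimated probability of the
    label being clean (e.g. derived from the Otsu split above)."""
    per_sample = F.cross_entropy(student_logits, labels, reduction="none")
    return (clean_weight * per_sample).mean()
```

In this sketch, a per-sample score (for instance, a disagreement measure between the masked teacher distribution and the given label) would be split by `otsu_threshold` into presumably clean and presumably noisy samples, and the resulting weights fed into `weighted_cross_entropy`; the exact scoring and weighting scheme used in the paper may differ.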