Noisy labels are pervasive in deep supervised learning. Although many studies aim to improve the robustness of deep models trained with noisy labels, few works theoretically explain the training behavior of learning from noisily labeled data, which is fundamental to understanding its generalization. In this draft, we study two such phenomena, clean-data-first learning and phase transition, from a theoretical viewpoint. Specifically, we first show that during the first epoch of training, examples with clean labels are learned first. We then show that, after this clean-data learning stage, continued training can further reduce the testing error when the rate of corrupted class labels is below a certain threshold; otherwise, extensive training leads to an increasing testing error.
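The corruption model underlying the threshold discussion above can be sketched as follows. This is a minimal illustration of the standard symmetric label-noise setup, not the paper's own code; the function name `corrupt_labels` and the rate 0.3 are assumptions for the example.

```python
import numpy as np

def corrupt_labels(y, num_classes, rate, rng):
    """Symmetric label noise: flip each label with probability `rate`
    to a uniformly chosen *different* class."""
    y = np.asarray(y).copy()
    flip = rng.random(len(y)) < rate
    # offsets in [1, num_classes - 1] guarantee the new label differs
    offsets = rng.integers(1, num_classes, size=flip.sum())
    y[flip] = (y[flip] + offsets) % num_classes
    return y, flip

rng = np.random.default_rng(0)
y_clean = rng.integers(0, 10, size=10_000)
y_noisy, flipped = corrupt_labels(y_clean, num_classes=10, rate=0.3, rng=rng)
print(flipped.mean())  # empirical corruption rate, close to 0.3
```

Under this model, "rate of corrupted class labels" is the probability `rate`; the phase-transition claim is that continued training helps only while this rate stays below a model-dependent threshold.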