Recently, over-parameterized deep networks, with far more network parameters than training samples, have dominated the performance of modern machine learning. However, it is well known that when the training data is corrupted, over-parameterized networks tend to overfit and fail to generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is yet very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularization to recover and separate the underlying corruptions. Remarkably, when trained with this simple method in practice, our networks achieve state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherence conditions. This work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
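To make the idea concrete, below is a minimal sketch (not the authors' released code) of sparse over-parameterization on a simplified linear regression analogue: observations y = X theta + s, where s is a sparse corruption. The corruption is over-parameterized as s = u*u - v*v and everything is trained with plain gradient descent from a small initialization; the implicit regularization of gradient descent then biases s toward a sparse solution, separating it from the clean signal without an explicit sparsity penalty. All variable names, hyper-parameters, and the regression setting itself are illustrative assumptions rather than the paper's exact classification setup.

```python
import torch

torch.manual_seed(0)
n, d, k = 200, 20, 10                       # samples, features, number of corrupted entries
X = torch.randn(n, d)
theta_true = torch.randn(d)
s_true = torch.zeros(n)
s_true[torch.randperm(n)[:k]] = 5.0 * torch.randn(k)   # sparse corruption
y = X @ theta_true + s_true

# Model parameters: theta for the clean signal; (u, v) over-parameterize the noise term.
alpha = 1e-3                                # small initialization drives the implicit bias toward sparsity
theta = torch.zeros(d, requires_grad=True)
u = (alpha * torch.ones(n)).requires_grad_()
v = (alpha * torch.ones(n)).requires_grad_()

opt = torch.optim.SGD([theta, u, v], lr=0.01)
for step in range(10000):
    opt.zero_grad()
    s = u * u - v * v                       # sparse over-parameterization of the corruption
    loss = ((X @ theta + s - y) ** 2).mean()
    loss.backward()
    opt.step()

with torch.no_grad():
    s = u * u - v * v
    print("relative error of theta:", (torch.norm(theta - theta_true) / torch.norm(theta_true)).item())
    print("relative error of s:    ", (torch.norm(s - s_true) / torch.norm(s_true)).item())
```

In the paper's actual method the same trick is applied to classification, where the over-parameterized sparse term corrects the corrupted labels during training; the sketch above only illustrates how gradient descent with small initialization can separate a sparse component from the rest of the data.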