Modern deep learning models with great expressive power can be trained to overfit the training data but still generalize well. This phenomenon is referred to as benign overfitting. Recently, a few studies have attempted to theoretically understand benign overfitting in neural networks. However, these works are either limited to neural networks with smooth activation functions or to the neural tangent kernel regime. How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise. We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes-optimal test risk. Our result also reveals a sharp transition, in terms of test risk, between benign and harmful overfitting under different conditions on the data distribution. Experiments on synthetic data back up our theory.
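To make the setting concrete, below is a minimal, self-contained sketch of the kind of synthetic experiment the abstract describes: two-patch signal-plus-noise data with label-flipping noise, a two-layer ReLU convolutional network with fixed second-layer weights, trained by full-batch gradient descent on the logistic loss. This is an illustration only, not the paper's experimental code; the data model, dimensions, and all hyperparameters below are assumptions chosen for readability.

```python
import torch

torch.manual_seed(0)

# Synthetic two-patch data: one patch carries the class signal y * mu, the other
# is pure Gaussian noise. All sizes and constants here are illustrative choices.
n, d, m = 100, 200, 10            # training examples, patch dimension, filters per class
flip_prob, noise_std = 0.1, 1.0   # label-flipping rate and patch noise level
lr, steps = 0.1, 2000             # gradient-descent step size and iterations

mu = torch.zeros(d)
mu[0] = 10.0                      # fixed signal direction and strength (assumed)

y_clean = torch.randint(0, 2, (n,)).float() * 2 - 1          # clean labels in {-1, +1}
flip = torch.bernoulli(torch.full((n,), flip_prob))
y = y_clean * (1 - 2 * flip)                                  # observed labels with flipping noise

X = torch.stack([y_clean.unsqueeze(1) * mu,                   # signal patch
                 noise_std * torch.randn(n, d)], dim=1)       # noise patch; X has shape (n, 2, d)

# Two-layer ReLU CNN: m filters per class applied to both patches,
# with second-layer weights fixed to +1/m and -1/m.
W_pos = (0.01 * torch.randn(m, d)).requires_grad_()
W_neg = (0.01 * torch.randn(m, d)).requires_grad_()

def net(x):
    # x: (batch, 2, d) -> scalar score per example
    s_pos = torch.relu(torch.einsum('bpd,md->bpm', x, W_pos)).sum(dim=(1, 2)) / m
    s_neg = torch.relu(torch.einsum('bpd,md->bpm', x, W_neg)).sum(dim=(1, 2)) / m
    return s_pos - s_neg

# Full-batch gradient descent on the logistic loss over the noisy labels.
for _ in range(steps):
    loss = torch.nn.functional.softplus(-y * net(X)).mean()
    W_pos.grad = W_neg.grad = None
    loss.backward()
    with torch.no_grad():
        W_pos -= lr * W_pos.grad
        W_neg -= lr * W_neg.grad

with torch.no_grad():
    # Fit to the noisy training labels (overfitting) vs. accuracy on fresh clean data.
    train_acc = (net(X).sign() == y).float().mean().item()
    y_test = torch.randint(0, 2, (1000,)).float() * 2 - 1
    X_test = torch.stack([y_test.unsqueeze(1) * mu,
                          noise_std * torch.randn(1000, d)], dim=1)
    test_acc = (net(X_test).sign() == y_test).float().mean().item()
    print(f"train accuracy on noisy labels: {train_acc:.3f}")
    print(f"test accuracy on clean labels:  {test_acc:.3f} (flip rate = {flip_prob})")
```

In the benign-overfitting regime the abstract refers to, one would expect the network to fit the flipped training labels (train accuracy near 1 on noisy labels) while the clean test accuracy stays close to 1, i.e., the test risk on the noisy distribution approaches the flip rate, which is the Bayes-optimal level; with a weaker signal or heavier noise, the same training procedure would instead exhibit harmful overfitting.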