Modern neural networks often have great expressive power and can be trained to overfit the training data while still achieving good test performance. This phenomenon is referred to as "benign overfitting". Recently, a line of work has emerged that studies benign overfitting from a theoretical perspective. However, these works are limited to linear models or kernel/random feature models, and there is still a lack of theoretical understanding of when and how benign overfitting occurs in neural networks. In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve a constant-level test loss. Together, these results demonstrate a sharp phase transition between benign and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks.
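To make the setting concrete, below is a minimal sketch of a two-layer CNN of the kind described here, trained by plain gradient descent on a synthetic signal-plus-noise dataset. The two-patch data model, ReLU activation, fixed second layer, and all hyperparameters are illustrative assumptions for this sketch and are not taken from the paper, whose precise model and conditions are given in the main text.

```python
import numpy as np

# Illustrative sketch only: the two-patch signal-plus-noise data model, ReLU
# activation, and all hyperparameters below are assumptions, not the paper's
# exact construction. The network has m first-layer filters per class applied
# to each patch, with the second layer fixed to +1 / -1, so
#   f(x) = sum_{r,p} relu(<w_{+1,r}, x_p>) - sum_{r,p} relu(<w_{-1,r}, x_p>).

rng = np.random.default_rng(0)

d, n, m = 50, 20, 10       # patch dimension, training sample size, filters per class
snr = 2.0                  # signal norm relative to (roughly unit-norm) noise patches
lr, steps = 0.05, 2000     # gradient descent step size and number of iterations

# Each training example has a signal patch y * mu and an independent noise patch.
mu = np.zeros(d); mu[0] = snr
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([np.outer(y, mu),
              rng.normal(size=(n, d)) / np.sqrt(d)], axis=1)   # shape (n, 2, d)

relu = lambda z: np.maximum(z, 0.0)
relu_grad = lambda z: (z > 0.0).astype(float)

W = rng.normal(size=(2, m, d)) * 0.01   # small random init; index 0 = class +1, 1 = class -1

def forward(W, X):
    pre = np.einsum('cmd,npd->cmnp', W, X)   # pre-activations per class/filter/example/patch
    scores = relu(pre).sum(axis=(1, 3))      # (2, n) class scores
    return pre, scores[0] - scores[1]        # margin f(x_i)

for _ in range(steps):
    pre, f = forward(W, X)
    g = -y / (1.0 + np.exp(y * f))           # derivative of the logistic loss w.r.t. f, per example
    sign = np.array([1.0, -1.0])[:, None, None, None]
    grad = np.einsum('cmnp,npd->cmd',
                     sign * g[None, None, :, None] * relu_grad(pre), X) / n
    W -= lr * grad                           # plain gradient descent step

_, f = forward(W, X)
print("training error:", np.mean(np.sign(f) != y))   # typically 0: the CNN fits all training points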
```

Varying `snr` (and the ratio of dimension to sample size) in such a synthetic setup is one way to probe empirically the phase transition between benign and harmful overfitting described above.