Deep neural networks (DNNs) in the infinite width/channel limit have received much attention recently, as they provide a clear analytical window into deep learning via mappings to Gaussian Processes (GPs). Despite its theoretical appeal, this viewpoint lacks a crucial ingredient of deep learning in finite DNNs, lying at the heart of their success -- feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self-consistent Gaussian Process theory accounting for strong finite-DNN and feature learning effects. Applying this to a toy model of a two-layer linear convolutional neural network (CNN) shows good agreement with experiments. We further identify, both analytically and numerically, a sharp transition between a feature learning regime and a lazy learning regime in this model. Strong finite-DNN effects are also derived for a non-linear two-layer fully connected network. Our self-consistent theory provides a rich and versatile analytical framework for studying feature learning and other non-lazy effects in finite DNNs.
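As a concrete reference for the training protocol mentioned above, the following is a minimal sketch of noisy (Langevin-type) gradient descent; the learning rate $\eta$, temperature $T$, and weight-decay strength $\gamma$ are illustrative placeholders rather than the paper's specific hyperparameters:
\[
\theta_{t+1} = \theta_t - \eta\,\nabla_\theta\!\left(\mathcal{L}(\theta_t) + \tfrac{\gamma}{2}\,\|\theta_t\|^2\right) + \sqrt{2\eta T}\,\xi_t, \qquad \xi_t \sim \mathcal{N}(0, I),
\]
where $\mathcal{L}$ is the training loss and $\xi_t$ is i.i.d. Gaussian noise. In the small-$\eta$ limit these dynamics sample the network weights from the Gibbs distribution $p(\theta) \propto \exp\!\left[-\left(\mathcal{L}(\theta) + \tfrac{\gamma}{2}\|\theta\|^2\right)/T\right]$, which is the standard setting in which trained DNNs can be analyzed as (approximate) Gaussian Processes.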