Understanding the reasons for the success of deep neural networks trained using stochastic gradient-based methods is a key open problem for the nascent theory of deep learning. The types of data where these networks are most successful, such as images or sequences of speech, are characterised by intricate correlations. Yet, most theoretical work on neural networks does not explicitly model training data, or assumes that elements of each data sample are drawn independently from some factorised probability distribution. These approaches are thus by construction blind to the correlation structure of real-world data sets and its impact on learning in neural networks. Here, we introduce a generative model for structured data sets that we call the hidden manifold model (HMM). The idea is to construct high-dimensional inputs that lie on a lower-dimensional manifold, with labels that depend only on their position within this manifold, akin to a single-layer decoder or generator in a generative adversarial network. We demonstrate that learning of the hidden manifold model is amenable to an analytical treatment by proving a "Gaussian Equivalence Property" (GEP), and we use the GEP to show how the dynamics of two-layer neural networks trained using one-pass stochastic gradient descent are captured by a set of integro-differential equations that track the performance of the network at all times. This permits us to analyse in detail how a neural network learns functions of increasing complexity during training, how its performance depends on its size, and how it is impacted by parameters such as the learning rate or the dimension of the hidden manifold.
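To make the construction concrete, the following is a minimal sketch of how data could be generated under the hidden manifold model: latent vectors on a D-dimensional manifold are projected into an N-dimensional input space through a fixed feature matrix and a pointwise nonlinearity, while labels are computed from the latent coordinates alone. The specific choices here (tanh nonlinearity, a sign teacher acting on the latents, the variable names F, C, X) are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

N, D, P = 1000, 10, 500          # input dimension, manifold dimension, number of samples

F = rng.standard_normal((D, N))  # fixed projection from the hidden manifold to input space
C = rng.standard_normal((P, D))  # latent coordinates of each sample on the manifold

# High-dimensional inputs: an elementwise nonlinearity applied to the
# projected latent coordinates, so all P inputs lie on a D-dimensional manifold.
X = np.tanh(C @ F / np.sqrt(D))  # shape (P, N)

# Labels depend only on the position within the manifold, here via a
# hypothetical single-layer teacher acting on the latent coordinates C.
w_teacher = rng.standard_normal(D)
y = np.sign(C @ w_teacher / np.sqrt(D))
```

With D much smaller than N, a network trained on (X, y) sees inputs whose correlation structure is inherited entirely from the low-dimensional latent space, which is the regime the analysis above addresses.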