Autoencoders are the simplest neural networks for unsupervised learning, and thus an ideal framework for studying feature learning. While a detailed understanding of the dynamics of linear autoencoders has recently been obtained, the study of non-linear autoencoders has been hindered by the technical difficulty of handling training data with non-trivial correlations, which are a fundamental prerequisite for feature extraction. Here, we study the dynamics of feature learning in non-linear, shallow autoencoders. We derive a set of asymptotically exact equations that describe the generalisation dynamics of autoencoders trained with stochastic gradient descent (SGD) in the limit of high-dimensional inputs. These equations reveal that autoencoders learn the leading principal components of their inputs sequentially. An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights, and highlights the importance of training the bias in ReLU autoencoders. Building on previous results for linear networks, we analyse a modification of the vanilla SGD algorithm that allows the exact principal components to be learned. Finally, we show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets such as CIFAR10.
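To make the training setup concrete, here is a minimal numerical sketch under assumptions that are not taken from the paper. The inputs come from a toy spiked-covariance model whose leading principal components are known. The model is a shallow autoencoder with untied weights, x_hat = V tanh(W x), trained with online SGD (one fresh sample per step) on the squared reconstruction error. The helper names (`sample_inputs`, `reconstruct`, `held_out_error`, `captured_fraction`, `train`), the tanh activation, the data model and all hyperparameters are hypothetical illustration choices. The paper's scaling conventions and its asymptotic equations are not reproduced here; the sketch only simulates SGD directly.

```python
import numpy as np


def sample_inputs(n, u, spikes, rng):
    """Draw n inputs with covariance I + sum_k spikes[k] * u_k u_k^T (hypothetical toy data)."""
    noise = rng.standard_normal((n, u.shape[1]))
    latents = rng.standard_normal((n, u.shape[0])) * np.sqrt(spikes)
    return noise + latents @ u


def reconstruct(x, w, v):
    """Shallow untied autoencoder x_hat = V tanh(W x), applied to each row of x."""
    return np.tanh(x @ w.T) @ v.T


def held_out_error(x, w, v):
    """Half the squared reconstruction error, averaged over the rows of x."""
    return 0.5 * np.mean(np.sum((reconstruct(x, w, v) - x) ** 2, axis=1))


def captured_fraction(w, u):
    """Norm of each principal direction's projection onto the row space of W (in [0, 1])."""
    q, _ = np.linalg.qr(w.T)  # orthonormal basis of the encoder's row space, shape (d, k)
    return np.linalg.norm(q.T @ u.T, axis=0)


def train(d=100, k=2, spikes=(10.0, 5.0), lr=1e-3, steps=20_000, log_every=1_000, seed=0):
    rng = np.random.default_rng(seed)
    spikes = np.asarray(spikes, dtype=float)
    # Orthonormal spike directions: these are the leading principal components of the inputs.
    u = np.linalg.qr(rng.standard_normal((d, len(spikes))))[0].T
    w = rng.standard_normal((k, d)) / np.sqrt(d)  # encoder weights
    v = rng.standard_normal((d, k)) / np.sqrt(d)  # decoder weights (untied from w)
    x_test = sample_inputs(2_000, u, spikes, rng)
    err_start = held_out_error(x_test, w, v)
    history = []
    for t in range(steps):
        x = sample_inputs(1, u, spikes, rng)[0]  # online SGD: a fresh sample every step
        h = np.tanh(w @ x)
        residual = v @ h - x
        # Gradients of 0.5 * ||V tanh(W x) - x||^2, both computed before either update.
        grad_v = np.outer(residual, h)
        grad_w = np.outer((v.T @ residual) * (1.0 - h ** 2), x)
        v -= lr * grad_v
        w -= lr * grad_w
        if (t + 1) % log_every == 0:
            history.append(captured_fraction(w, u))
    return err_start, held_out_error(x_test, w, v), np.array(history)


if __name__ == "__main__":
    err_start, err_end, history = train()
    print(f"held-out error: {err_start:.2f} -> {err_end:.2f}")
    print("captured fraction of each principal direction, logged during training:")
    print(history.round(2))
```

Tracking the projection of each principal direction onto the encoder's row space, rather than per-unit cosines, sidesteps the fact that hidden units can mix directions within the learned subspace. Plotting the rows of `history` over time is one way to look for the sequential learning described above, with no guarantee that this toy setup reproduces the paper's regime.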