Implicit regularization is important for understanding the learning of neural networks (NNs). Empirical works show that, with small initialization, the input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from the input layer to that neuron and its bias term) condense on isolated orientations. This condensation dynamics implies that training implicitly regularizes a NN towards one with a much smaller effective size. In this work, we utilize multilayer networks to show that the maximal number of condensed orientations in the initial training stage is twice the multiplicity of the activation function, where "multiplicity" refers to the multiplicity of the root of the activation function at the origin. Our theoretical analysis confirms experiments in two cases: one is an activation function of multiplicity one with input of arbitrary dimension, which covers many common activation functions, and the other is a layer with one-dimensional input and arbitrary multiplicity. This work makes a step towards understanding how small initialization implicitly leads NNs to condensation in the initial training stage, which lays a foundation for the future study of the nonlinear dynamics of NNs and their implicit regularization effect at a later stage of training.
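A hedged formalization of the multiplicity notion referenced above, assuming the standard order-of-zero reading (the precise technical conditions in the full text may differ): an activation $\sigma$ is said to have multiplicity $p \geq 1$ if
\[
  \sigma^{(k)}(0) = 0 \quad \text{for } k = 0, 1, \dots, p-1, \qquad \sigma^{(p)}(0) \neq 0 .
\]
Under this reading, $\tanh(x)$ has multiplicity $1$ while $x\tanh(x)$ has multiplicity $2$, so the stated bound on the number of condensed orientations in the initial stage would be $2$ and $4$, respectively.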