Empirical works show that for ReLU neural networks (NNs) with small initialization, the input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from the input layer to that neuron together with its bias term) condense on isolated orientations. This condensation dynamics implies that training implicitly regularizes a NN towards one with a much smaller effective size. In this work, we illustrate the formation of condensation in multi-layer fully connected NNs and show that the maximal number of condensed orientations at the initial training stage is twice the multiplicity of the activation function, where the multiplicity is the order of the zero of the activation function at the origin. Our theoretical analysis confirms the experiments in two cases: one is for activation functions of multiplicity one with input of arbitrary dimension, which covers many common activation functions, and the other is for layers with one-dimensional input and arbitrary multiplicity. This work takes a step towards understanding how small initialization leads NNs to condensation at the initial training stage.
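To make the notion of condensation concrete, the following is a minimal sketch, not the paper's experimental setup: it assumes PyTorch, a small two-layer tanh network, synthetic Gaussian data, a uniform 1e-2 shrinking of the default initialization, and a 0.99 cosine-similarity threshold, all of which are illustrative choices rather than details taken from the abstract. It trains briefly with small initialization and then checks whether the hidden-neuron input weights (weight vector concatenated with the bias) cluster onto a few isolated orientations, which for a multiplicity-one activation such as tanh would be at most two.

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d, m, n = 2, 50, 200                       # input dimension, hidden width, sample size
X = torch.randn(n, d)                      # synthetic inputs (illustrative, not the paper's data)
y = torch.sin(X.sum(dim=1, keepdim=True))  # an arbitrary smooth target

model = nn.Sequential(nn.Linear(d, m), nn.Tanh(), nn.Linear(m, 1))
with torch.no_grad():                      # "small initialization": shrink the default init
    for p in model.parameters():
        p.mul_(1e-2)

opt = torch.optim.SGD(model.parameters(), lr=0.05)
for step in range(2000):                   # a short run, standing in for the initial training stage
    opt.zero_grad()
    loss = F.mse_loss(model(X), y)
    loss.backward()
    opt.step()

# Input weight of each hidden neuron: its weight row concatenated with its bias,
# normalized to a unit vector so that only the orientation matters.
W = torch.cat([model[0].weight, model[0].bias.unsqueeze(1)], dim=1)
U = F.normalize(W, dim=1)
cos = U @ U.T                              # pairwise cosine similarities between orientations
# Condensation shows up as most pairs being nearly aligned or anti-aligned,
# i.e. |cosine| close to 1, leaving only a few isolated orientations.
print("fraction of near-aligned pairs:", (cos.abs() > 0.99).float().mean().item())

The shrink factor and the threshold are arbitrary; the qualitative picture is the point of the check, namely that with small initialization most off-diagonal cosines sit near plus or minus one, whereas a larger initialization would leave them spread out.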