Substantial work indicates that the training dynamics of neural networks (NNs) are closely related to how their parameters are initialized. Inspired by the phase diagram for two-layer ReLU NNs with infinite width (Luo et al., 2021), we take a step towards drawing a phase diagram for three-layer ReLU NNs with infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities that distinguish different dynamical regimes for common initialization methods. Through carefully designed experiments at substantial computational cost, on both synthetic and real datasets, we find that the dynamics of each layer can also be divided into a linear regime and a condensed regime, separated by a critical regime. The criterion is the relative change of the input weights during training as the width approaches infinity (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term): it tends to $0$ in the linear regime, $+\infty$ in the condensed regime, and $O(1)$ in the critical regime. We also demonstrate that different layers of a deep NN can lie in different dynamical regimes during the same training process. In the condensed regime, we also observe the condensation of weights onto isolated orientations with low complexity. Based on these three-layer experiments, our phase diagram suggests a complicated set of dynamical regimes for deep NNs, consisting of the three possible regimes together with their mixtures. It provides guidance for studying deep NNs under different initialization regimes and reveals that completely different dynamics can emerge in different layers of the same deep NN.
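The regime criterion described in the abstract, the relative change of a layer's input weights during training, can be illustrated with a minimal sketch. The function names `input_weights` and `relative_change` are hypothetical, and the use of the Euclidean (Frobenius) norm over all stacked weights and biases of a layer is an assumption; the exact definition used in the paper may differ.

```python
import numpy as np

def input_weights(W, b):
    """Stack weights and bias so that row k is the input weight (w_k, b_k)
    of hidden neuron k. W has shape (n_hidden, n_in), b has shape (n_hidden,)."""
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.concatenate([W, b[:, None]], axis=1)

def relative_change(theta_init, theta_final):
    """Relative change ||theta_final - theta_init|| / ||theta_init||.
    For 2-D arrays np.linalg.norm defaults to the Frobenius norm, which is
    the Euclidean norm of the flattened parameters (an assumed choice)."""
    theta_init = np.asarray(theta_init, dtype=float)
    theta_final = np.asarray(theta_final, dtype=float)
    return np.linalg.norm(theta_final - theta_init) / np.linalg.norm(theta_init)

# Toy illustration with made-up numbers (not data from the paper):
# one layer with 4 hidden neurons and 3 inputs, before and after "training".
W0, b0 = np.ones((4, 3)), np.ones(4)
W1, b1 = W0 + 0.1, b0 + 0.1
rd = relative_change(input_weights(W0, b0), input_weights(W1, b1))
print(f"relative change of input weights: {rd:.3f}")
```

In an actual experiment one would compute this quantity separately for each hidden layer at increasing widths and examine whether it shrinks towards $0$, stays $O(1)$, or grows, which corresponds to the linear, critical, and condensed regimes respectively.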