Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more with the data than others (where alignment is defined as the Euclidean product of the tangent features matrix and the data labels matrix). The curve of the alignment as a function of layer index (generally) exhibits an ascent-descent pattern, with the maximum reached at some hidden layer. In this work, we provide the first explanation for this phenomenon. We introduce the Equilibrium Hypothesis, which connects this alignment pattern to signal propagation in deep neural networks. Our experiments demonstrate an excellent match with the theoretical predictions.
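For concreteness, a common formalization of this layer-wise alignment (following the kernel-alignment measure of Baratin et al., 2021; the exact normalization below is an assumption, since the abstract does not fix one) is the cosine similarity between a layer's tangent kernel and the label Gram matrix:
\[
A_\ell \;=\; \frac{\langle K_\ell,\; Y Y^\top \rangle_F}{\lVert K_\ell \rVert_F \, \lVert Y Y^\top \rVert_F},
\qquad
K_\ell \;=\; \Phi_\ell \Phi_\ell^\top,
\]
where \(\Phi_\ell \in \mathbb{R}^{n \times p_\ell}\) stacks the tangent features \(\nabla_{\theta_\ell} f(x_i)\) of the \(n\) training inputs with respect to the parameters \(\theta_\ell\) of layer \(\ell\), and \(Y\) is the matrix of data labels. Under this reading, the ascent-descent pattern means \(A_\ell\) increases with \(\ell\) up to some hidden layer and decreases thereafter.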