We introduce a new technique for gradient normalization during neural network training. Gradients are rescaled during the backward pass by normalization layers inserted at specific points in the network architecture. These normalization nodes do not affect forward activation propagation, but they modify the backpropagation equations so that a well-scaled gradient flow reaches the deepest layers of the network without vanishing or exploding. Experiments with very deep neural networks show that the technique effectively controls the gradient norm, enabling weight updates in the deepest layers and improving network accuracy under several experimental conditions.
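To make the idea concrete, the following is a minimal sketch of such a normalization node, assuming a PyTorch-style custom autograd function and a simple unit-L2 rescaling rule; the paper's exact normalization rule and placement strategy may differ. The node is the identity in the forward pass and only rescales the gradient flowing through it in the backward pass.

```python
import torch

class GradNorm(torch.autograd.Function):
    """Identity in the forward pass; rescales the incoming gradient in the
    backward pass (hypothetical illustration, not the paper's exact rule)."""

    @staticmethod
    def forward(ctx, x):
        # Forward activations pass through unchanged.
        return x

    @staticmethod
    def backward(ctx, grad_output):
        # Rescale the gradient to unit L2 norm so it neither vanishes
        # nor explodes as it propagates toward earlier layers.
        return grad_output / (grad_output.norm() + 1e-12)


def grad_norm(x):
    """Convenience wrapper to insert a gradient-normalization node."""
    return GradNorm.apply(x)
```

In use, such a node would be interleaved between layers of a deep network, e.g. `x = grad_norm(layer(x))`, so that each block receives a gradient of controlled magnitude while the forward computation is left untouched.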