Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission by investigating the relationship between BN and Weights Normalization (WN). In this work, we demonstrate that the shift of the average gradient amplifies the variance of every convolutional (conv) layer. To address this shift, we propose Parametric Weights Standardization (PWS), a module for conv filters that is fast and robust to mini-batch size. PWS provides the speed-up of BN while requiring less computation, and it does not change the output of a conv layer. PWS enables the network to converge fast without normalizing the outputs. This result strengthens the case for the shift of the average gradient and explains why BN works from the perspective of variance transmission. The code and appendix are available at https://github.com/lyxzzz/PWSConv.
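As a rough illustration of the idea described above, the following is a minimal PyTorch-style sketch of a conv layer that standardizes its filter weights rather than its outputs. It assumes each filter is standardized to zero mean and unit variance with a learnable per-filter scale; the class name `PWSConv2d` and the parameters `gamma` and `eps` are illustrative placeholders, not the exact formulation from the paper or the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PWSConv2d(nn.Conv2d):
    """Illustrative sketch: standardize each conv filter's weights to zero mean
    and unit variance, then apply an assumed learnable per-filter scale before
    the convolution. The outputs themselves are not normalized."""

    def __init__(self, in_channels, out_channels, kernel_size, eps=1e-5, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        self.eps = eps
        # Learnable per-filter scale (the "parametric" part, assumed here).
        self.gamma = nn.Parameter(torch.ones(out_channels, 1, 1, 1))

    def forward(self, x):
        w = self.weight
        # Per-filter mean and variance over (in_channels, kH, kW).
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w_hat = self.gamma * (w - mean) / torch.sqrt(var + self.eps)
        return F.conv2d(x, w_hat, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


# Usage sketch: drop-in replacement for nn.Conv2d, independent of batch size.
conv = PWSConv2d(64, 128, kernel_size=3, padding=1)
y = conv(torch.randn(8, 64, 32, 32))
```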