Batch normalization (BN) consists of a normalization component followed by an affine transformation and has become essential for training deep neural networks. Standard initialization of each BN layer in a network sets the affine scale and shift to 1 and 0, respectively. However, we have observed that after training these parameters deviate little from their initialization. Furthermore, we have noticed that the normalization process can still yield overly large values, which is undesirable for training. We revisit the BN formulation and present a new initialization method and update approach for BN to address these issues. Experiments with the proposed changes to BN show statistically significant performance gains in a variety of scenarios. The approach can be used with existing implementations at no additional computational cost. We also present a new online, BN-based input data normalization technique that alleviates the need for other offline or fixed methods. Source code is available at https://github.com/osu-cvl/revisiting-bn.
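As a point of reference for the "standard initialization" mentioned above, the following is a minimal sketch (assuming PyTorch, which the paper does not specify) showing that a stock BN layer starts with its affine scale (gamma, stored as `weight`) at 1 and its shift (beta, stored as `bias`) at 0:

```python
import torch
import torch.nn as nn

# Hypothetical illustration: a standard BN layer with default initialization.
bn = nn.BatchNorm2d(num_features=64)
print(bn.weight.data.unique())  # tensor([1.])  -> affine scale initialized to 1
print(bn.bias.data.unique())    # tensor([0.])  -> affine shift initialized to 0

# Forward pass: normalize over the batch statistics, then apply the affine transform.
x = torch.randn(8, 64, 32, 32)
y = bn(x)
```

The paper's proposed initialization and update rule would modify these defaults; the snippet only illustrates the baseline behavior the abstract contrasts against.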