Batch normalization (BN) consists of a normalization component followed by an affine transformation and has become essential for training deep neural networks. Standard initialization of each BN in a network sets the affine transformation scale and shift to 1 and 0, respectively. However, after training we have observed that these parameters do not change much from their initialization. Furthermore, we have noticed that the normalization process can still yield overly large values, which is undesirable for training. We revisit the BN formulation and present a new initialization method and update approach for BN to address the aforementioned issues. Experiments are designed to emphasize and demonstrate the positive influence of proper BN scale initialization on performance, and use rigorous statistical significance tests for evaluation. The approach can be used with existing implementations at no additional computational cost. Source code is available at https://github.com/osu-cvl/revisiting-bn-init.
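To make the initialization being discussed concrete, the following is a minimal PyTorch sketch showing how the BN affine parameters (scale gamma and shift beta) could be set to values other than the default 1 and 0. The helper name `init_bn_scale` and the scale value 0.1 are illustrative assumptions only, not the paper's prescribed settings; see the linked repository for the authors' implementation.

```python
import torch.nn as nn

# Hypothetical helper: re-initialize the affine scale (gamma) and shift (beta)
# of every BN layer in a model. The scale value is an assumed placeholder,
# not the value proposed in the paper.
def init_bn_scale(model: nn.Module, scale: float = 0.1) -> None:
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            nn.init.constant_(m.weight, scale)  # gamma (defaults to 1)
            nn.init.constant_(m.bias, 0.0)      # beta (defaults to 0)

# Usage example on a small convolutional block.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
init_bn_scale(model, scale=0.1)
print(model[1].weight[:4])  # first few gamma values, now 0.1 instead of 1.0
```

Because only the initial values of existing BN parameters change, a scheme of this kind adds no parameters or operations, which is consistent with the abstract's claim of no additional computational cost.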