Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filtering domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the largest eigenvalues that ensures stability. Lastly, we prove that in the first training step, when normalization is needed most, BatchNorm satisfies the same optimization criterion as the Normalized Least Mean Square (NLMS) algorithm, and that it continues to approximate this condition in subsequent steps. The analyses provided in this paper lay the groundwork for gaining further insight into the operation of modern neural network structures using adaptive filter theory.
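The eigenvalue claim above can be illustrated with a minimal NumPy sketch (not taken from the paper): channel-wise standardization of the inputs, the normalization step BatchNorm performs, compresses the eigenvalue spread of the input autocorrelation matrix that governs LMS-style weight-update modes. The channel scales below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "feature map" batch: N samples, C channels with very different scales.
N, C = 10_000, 8
scales = np.logspace(-2, 2, C)            # channel std-devs from 0.01 to 100
x = rng.standard_normal((N, C)) * scales  # unnormalized inputs

def autocorr_eigs(z):
    """Eigenvalues of the sample autocorrelation matrix R = E[z z^T]."""
    R = z.T @ z / z.shape[0]
    return np.linalg.eigvalsh(R)

# Channel-wise standardization, i.e. the normalization step of BatchNorm.
x_bn = (x - x.mean(axis=0)) / x.std(axis=0)

eig_raw, eig_bn = autocorr_eigs(x), autocorr_eigs(x_bn)
print(f"without BatchNorm: lambda_min={eig_raw.min():.1e}, lambda_max={eig_raw.max():.1e}")
print(f"with    BatchNorm: lambda_min={eig_bn.min():.1e}, lambda_max={eig_bn.max():.1e}")
# The smallest eigenvalues are lifted toward 1 and the largest are pulled down
# toward 1, matching the speed (low learning rate) and stability (high learning
# rate) effects described in the abstract.
```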