Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve the generalization performance of neural networks. Despite its success, BN is not theoretically well understood, and it is not suitable for use with very small mini-batch sizes or with online learning. In this paper, we propose a new method called Batch Normalization Preconditioning (BNP). Instead of applying normalization explicitly through a batch normalization layer, as is done in BN, BNP applies normalization by conditioning the parameter gradients directly during training. This is designed to improve the conditioning of the Hessian matrix of the loss function and hence the convergence of training. One benefit is that BNP is not constrained by the mini-batch size and works in the online learning setting. Furthermore, its connection to BN provides theoretical insight into how BN improves training and how BN is applied to special architectures such as convolutional neural networks.
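To make the idea of preconditioning parameter gradients concrete, the sketch below applies a diagonal preconditioner built from mini-batch feature statistics to the gradient of a linear layer. This is only a minimal illustration of the general idea under simplifying assumptions; the function name preconditioned_step, the toy data, and the second-moment-based preconditioner are illustrative choices and are not the exact preconditioner proposed in the paper.

```python
import numpy as np

def preconditioned_step(W, X, grad_W, lr=0.1, eps=1e-5):
    """One illustrative preconditioned gradient step for a linear layer y = X @ W.

    The preconditioner is a simple diagonal approximation built from the
    mini-batch input statistics (feature means and variances), mimicking the
    rescaling effect that batch normalization has on the loss landscape.
    The exact preconditioner used by BNP differs; this is only a sketch.
    """
    mu = X.mean(axis=0)                      # per-feature mean of the mini-batch
    var = X.var(axis=0)                      # per-feature variance of the mini-batch
    # Diagonal preconditioner: features with a large second moment get smaller
    # steps, which improves the conditioning of the (approximate) Hessian.
    P = 1.0 / (var + mu**2 + eps)            # shape (d,), applied row-wise to grad_W
    return W - lr * (P[:, None] * grad_W)

# Toy usage: linear regression with one badly scaled feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2)) * np.array([1.0, 100.0])   # second feature poorly scaled
w_true = np.array([[2.0], [-0.03]])
y = X @ w_true
W = np.zeros((2, 1))
for _ in range(200):
    grad = X.T @ (X @ W - y) / len(X)        # gradient of the mean squared error
    W = preconditioned_step(W, X, grad, lr=0.5)
print(W.ravel())                             # approaches [2.0, -0.03]
```

In this toy problem the poorly scaled second feature would otherwise force a very small learning rate; rescaling its gradient component lets both coordinates converge at a similar rate, which is the conditioning effect the abstract attributes to BNP.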