Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs). BN uses scaling and shifting to normalize activations of mini-batches to accelerate convergence and improve generalization. The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method. However, since Newton's method initializes the whitening matrix independently at each training step, no information is shared between consecutive steps. In this work, instead of computing the whitening matrix exactly at each training step, we estimate it gradually during training in an online fashion, using our proposed Stochastic Whitening Batch Normalization (SWBN) algorithm. We show that while SWBN improves the convergence rate and generalization of DNNs, its computational overhead is less than that of IterNorm. Due to the high efficiency of the proposed method, it can be easily employed in most DNN architectures with a large number of layers. We provide comprehensive experiments and comparisons between BN, IterNorm, and SWBN layers to demonstrate the effectiveness of the proposed technique in conventional (many-shot) image classification and few-shot classification tasks.
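To make the core idea concrete, the sketch below illustrates online estimation of a whitening matrix from mini-batch statistics, as opposed to solving for it exactly at every step. This is a minimal NumPy sketch under stated assumptions: the class name, the hyperparameters `eta` and `eps`, and the particular update rule `W <- W + eta * (I - W C W^T) W` are illustrative choices, not the exact SWBN update defined in the paper.

```python
import numpy as np

class StochasticWhiteningSketch:
    """Illustrative online whitening of mini-batch activations.

    The update rule used here is one standard stochastic whitening step;
    it is an assumption for illustration, not necessarily the SWBN rule.
    """

    def __init__(self, num_features, eta=0.01, eps=1e-5):
        self.W = np.eye(num_features)   # running estimate of the whitening matrix
        self.eta = eta                  # step size of the online update (hypothetical)
        self.eps = eps                  # numerical stabilizer (hypothetical)

    def forward(self, x, training=True):
        # x: (batch_size, num_features) activations of one mini-batch
        x_centered = x - x.mean(axis=0, keepdims=True)
        if training:
            # mini-batch covariance estimate
            cov = x_centered.T @ x_centered / x.shape[0]
            cov += self.eps * np.eye(cov.shape[0])
            # one inexpensive stochastic step toward the whitening matrix,
            # instead of recomputing it from scratch at every training step
            I = np.eye(cov.shape[0])
            self.W += self.eta * (I - self.W @ cov @ self.W.T) @ self.W
        # apply the current whitening estimate to the centered activations
        return x_centered @ self.W.T
```

At a fixed point of this update, `W @ cov @ W.T` equals the identity, i.e. `W` whitens the activations; because each step reuses the previous estimate, information is carried across consecutive training steps, which is the property the abstract contrasts with Newton-based IterNorm.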