A well-known issue of Batch Normalization is its significantly reduced effectiveness with small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, we present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality. A challenge of computing statistics over multiple iterations is that the network activations from different iterations are not comparable to each other due to changes in network weights. We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied. On object detection and image classification with small mini-batch sizes, CBN is found to outperform both the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique. Code is available at https://github.com/Howal/Cross-iterationBatchNorm.
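To make the compensation idea concrete, below is a minimal PyTorch sketch of the approach described above: per-channel statistics from recent iterations are stored together with their gradients with respect to the owning conv layer's weight, then adjusted by a first-order Taylor expansion before being averaged with the current iteration's statistics. This is our own illustrative reconstruction, not the authors' released code; the class name `CrossIterationBN`, the `window` parameter, the wiring that passes the conv weight into the norm layer, and the omission of the inference path and conv bias are all assumptions of this sketch. As in the paper's approximation, the expansion is truncated to the layer's own weights.

```python
import torch
import torch.nn as nn

class CrossIterationBN(nn.Module):
    """Illustrative sketch of Cross-Iteration Batch Normalization (CBN).

    Stores per-channel statistics mu = E[x] and nu = E[x^2] from the last
    few iterations, plus their gradients w.r.t. the conv weight, and
    compensates the stale statistics with a first-order Taylor expansion
    before averaging (training path only; inference path omitted).
    """
    def __init__(self, num_features, window=3, eps=1e-5):
        super().__init__()
        self.window = window
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        self.buffer = []  # entries: (mu, nu, dmu/dw, dnu/dw, weight snapshot)

    def forward(self, x, weight):
        # x: conv output (N, C, H, W); weight: that conv's weight (C, Cin, kh, kw)
        mu = x.mean(dim=(0, 2, 3))         # per-channel mean
        nu = (x * x).mean(dim=(0, 2, 3))   # per-channel mean of squares

        # Output channel c depends only on filter c, so the grad of mu.sum()
        # w.r.t. the weight stacks the per-channel Jacobians d(mu_c)/d(w_c).
        dmu = torch.autograd.grad(mu.sum(), weight, retain_graph=True)[0]
        dnu = torch.autograd.grad(nu.sum(), weight, retain_graph=True)[0]

        mus, nus = [mu], [nu]
        for mu_o, nu_o, dmu_o, dnu_o, w_o in self.buffer:
            dw = (weight.detach() - w_o).flatten(1)        # (C, -1)
            mu_c = mu_o + (dmu_o.flatten(1) * dw).sum(1)   # Taylor compensation
            nu_c = nu_o + (dnu_o.flatten(1) * dw).sum(1)
            nu_c = torch.maximum(nu_c, mu_c * mu_c)        # keep E[x^2] >= E[x]^2
            mus.append(mu_c)
            nus.append(nu_c)

        mu_bar = torch.stack(mus).mean(0)
        var_bar = torch.stack(nus).mean(0) - mu_bar * mu_bar

        # Keep detached statistics around for the next few iterations.
        self.buffer.insert(0, (mu.detach(), nu.detach(), dmu, dnu,
                               weight.detach().clone()))
        self.buffer = self.buffer[: self.window - 1]

        x_hat = (x - mu_bar.view(1, -1, 1, 1)) / torch.sqrt(
            var_bar.view(1, -1, 1, 1) + self.eps)
        return x_hat * self.gamma.view(1, -1, 1, 1) + self.beta.view(1, -1, 1, 1)

# Toy usage (hypothetical wiring): a conv followed by CBN, mini-batch of 2.
conv = nn.Conv2d(3, 8, 3, padding=1, bias=False)
cbn = CrossIterationBN(8, window=3)
for _ in range(4):
    x = torch.randn(2, 3, 16, 16)
    y = cbn(conv(x), conv.weight)
    y.mean().backward()
    conv.weight.data -= 0.01 * conv.weight.grad
    conv.weight.grad = None
```

Note that gradients flow back only through the current iteration's statistics; compensated statistics from earlier iterations are detached, which matches the intent of using them purely to improve estimation quality under small mini-batches.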