Batch Normalization has become one of the essential components in CNNs. It allows the network to use higher learning rates, speeds up training, and reduces the need for careful initialization. However, in our work we find that a simple extension of BN can further improve the performance of the network. First, we extend BN to adaptively generate scale and shift parameters for each mini-batch, called DN-C (Batch-shared and Channel-wise); we use the statistical characteristics of the mini-batch ($E[X], Std[X]\in\mathbb{R}^{c}$) as the input of the SC module. Then, we extend BN to adaptively generate scale and shift parameters for each channel of each sample, called DN-B (Batch and Channel-wise). Our experiments show that the DN-C model cannot be trained normally, while the DN-B model is very robust. On the classification task, DN-B improves the accuracy of MobileNetV2 on ImageNet-100 by more than 2% with only 0.6% additional Mult-Adds. On the detection task, DN-B improves the accuracy of SSDLite on MS-COCO by nearly 4 mAP with the same settings. Compared with BN, DN-B remains stable when using a higher learning rate or a smaller batch size.
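To make the DN-B idea concrete, the following is a minimal, hypothetical PyTorch sketch, assuming the common pattern of a small two-layer generator with a reduction ratio. The class name `DNB`, the generator `gen`, and the ratio `r` are illustrative assumptions; the abstract does not specify the actual architecture of the SC module.

```python
import torch
import torch.nn as nn

class DNB(nn.Module):
    """Sketch of DN-B (Batch and Channel-wise): BN normalization with a
    dynamic scale and shift generated per channel of each sample.

    Assumption: the generator consumes each sample's channel statistics
    (mean and std, 2*C values), mirroring the E[X], Std[X] inputs that
    the abstract describes for the SC module.
    """
    def __init__(self, channels, r=4):
        super().__init__()
        # Standard BN normalization without its fixed affine parameters.
        self.bn = nn.BatchNorm2d(channels, affine=False)
        # Hypothetical generator: maps per-sample channel statistics
        # to a per-sample, per-channel scale and shift.
        self.gen = nn.Sequential(
            nn.Linear(2 * channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, 2 * channels),
        )

    def forward(self, x):                        # x: (N, C, H, W)
        n, c, _, _ = x.shape
        # Per-sample, per-channel statistics over spatial dimensions.
        stats = torch.cat([x.mean(dim=(2, 3)), x.std(dim=(2, 3))], dim=1)
        gamma, beta = self.gen(stats).chunk(2, dim=1)   # each (N, C)
        x_hat = self.bn(x)
        return (1 + gamma).view(n, c, 1, 1) * x_hat + beta.view(n, c, 1, 1)
```

DN-C (Batch-shared and Channel-wise) would differ in that the generator's input is the batch-level statistics $E[X], Std[X]\in\mathbb{R}^{c}$, so one scale/shift pair is shared by all samples in the mini-batch.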