Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems --- BN's error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN's usage for training larger models and for transferring features to computer vision tasks, including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN's computation is independent of batch sizes, and its accuracy is stable over a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform or compete with its BN-based counterparts for object detection and segmentation on COCO, and for video classification on Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries.
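To make the group-wise computation concrete, the following is a minimal NumPy sketch of GN for NCHW feature maps. It is illustrative only, not the paper's reference implementation: the function name group_norm, the epsilon value, and the example tensor sizes are assumptions, while the 32-group setting follows the paper's default.

```python
import numpy as np

def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    # x: feature map of shape (N, C, H, W); gamma, beta: per-channel scale/shift of shape (C,)
    N, C, H, W = x.shape
    # Split the C channels into num_groups groups; each group is normalized together
    x = x.reshape(N, num_groups, C // num_groups, H, W)
    # Mean and variance are computed per sample and per group (over channels-in-group, H, W),
    # so the statistics do not depend on the batch size N
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(N, C, H, W)
    # Learnable per-channel affine transform, as in BN
    return x * gamma.reshape(1, C, 1, 1) + beta.reshape(1, C, 1, 1)

# Example: a batch of 2 feature maps with 64 channels, normalized with 32 groups
x = np.random.randn(2, 64, 8, 8).astype(np.float32)
y = group_norm(x, num_groups=32,
               gamma=np.ones(64, np.float32), beta=np.zeros(64, np.float32))
```

Setting num_groups equal to C would reduce this to Layer-Norm-style per-channel statistics over spatial dimensions, and a single group would normalize over all channels jointly; GN's grouping sits between these two extremes.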