Batch normalization (BN) enables training very deep networks by normalizing activations with mini-batch sample statistics, which renders BN unstable for small batch sizes. Current small-batch solutions such as Instance Norm, Layer Norm, and Group Norm use channel statistics, which can be computed even for a single sample. Such methods are less stable than BN because they critically depend on the statistics of a single input sample. To address this problem, we propose normalizing activations without sample statistics. We present WeightAlign: a method that normalizes the weights by the mean and scaled standard deviation computed within a filter, which normalizes activations without computing any sample statistics. Our method is independent of batch size and stable over a wide range of batch sizes. Because weight statistics are orthogonal to sample statistics, WeightAlign can be combined directly with any method for activation normalization. We demonstrate these benefits experimentally for classification on CIFAR-10, CIFAR-100, and ImageNet, for semantic segmentation on PASCAL VOC 2012, and for domain adaptation on Office-31.
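The core idea, normalizing each filter's weights by their own mean and a scaled standard deviation instead of normalizing activations by sample statistics, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the `1/sqrt(fan_in)` scaling is an assumption in the spirit of variance-preserving weight scaling, and the function name `weight_align` is ours.

```python
import numpy as np

def weight_align(w, eps=1e-5):
    """Sketch of per-filter weight normalization (hypothetical helper).

    w: conv weights of shape (out_channels, in_channels, kH, kW).
    Each filter is shifted to zero mean and divided by its standard
    deviation scaled by sqrt(fan_in), so the scale of the resulting
    activations is controlled by the weights alone, without computing
    any statistics over input samples or mini-batches.
    """
    out_c = w.shape[0]
    flat = w.reshape(out_c, -1)              # one row per filter
    fan_in = flat.shape[1]                   # in_channels * kH * kW
    mean = flat.mean(axis=1, keepdims=True)  # per-filter mean
    std = flat.std(axis=1, keepdims=True)    # per-filter std
    aligned = (flat - mean) / (std * np.sqrt(fan_in) + eps)
    return aligned.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
wa = weight_align(w)
# each aligned filter has zero mean, independent of any input batch
print(np.allclose(wa.reshape(8, -1).mean(axis=1), 0.0, atol=1e-7))
```

Because this normalization depends only on the weights, it behaves identically at any batch size, which is why it can be combined with activation-based normalizers such as Group Norm.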