We investigate the reasons for the performance degradation incurred with batch-independent normalization. We find that the prototypical techniques of layer normalization and instance normalization both induce the appearance of failure modes in the neural network's pre-activations: (i) layer normalization induces a collapse towards channel-wise constant functions; (ii) instance normalization induces a lack of variability in instance statistics, symptomatic of an alteration of the expressivity. To alleviate failure mode (i) without aggravating failure mode (ii), we introduce the technique "Proxy Normalization" that normalizes post-activations using a proxy distribution. When combined with layer normalization or group normalization, this batch-independent normalization emulates batch normalization's behavior and consistently matches or exceeds its performance.
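The following is a minimal sketch of the Proxy Normalization idea summarized above, not the paper's reference implementation. It assumes pre-activations already normalized by a batch-independent method (e.g. layer or group normalization without affine), per-channel affine parameters applied before the activation, and proxy statistics estimated by sampling a Gaussian proxy variable; names such as `proxy_normalize` and `n_samples` are illustrative.

```python
import numpy as np

def proxy_normalize(x, gamma, beta, phi=lambda z: np.maximum(z, 0.0),
                    n_samples=4096, eps=1e-3, rng=None):
    """Sketch of proxy normalization of post-activations.

    x:     pre-activations of shape (N, C, H, W), already normalized
           by a batch-independent technique (assumption).
    gamma, beta: per-channel affine parameters of shape (C,).
    phi:   activation function (ReLU by default).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Affine transform and activation, as in the normalized layer itself.
    y = phi(gamma[None, :, None, None] * x + beta[None, :, None, None])
    # Proxy variable: a standard Gaussian pushed through the same affine + activation.
    z = rng.standard_normal((n_samples, 1))            # (S, 1) proxy samples
    proxy = phi(gamma[None, :] * z + beta[None, :])    # (S, C) activated proxy
    mu = proxy.mean(axis=0)                            # per-channel proxy mean
    sigma = proxy.std(axis=0)                          # per-channel proxy std
    # Normalize post-activations channel-wise with the proxy statistics.
    return (y - mu[None, :, None, None]) / np.sqrt(
        sigma[None, :, None, None] ** 2 + eps)

# Illustrative usage:
# x = np.random.randn(8, 16, 32, 32)
# out = proxy_normalize(x, gamma=np.ones(16), beta=np.zeros(16))
```

In this sketch the proxy statistics depend only on the affine parameters, so the normalization remains batch-independent while keeping post-activations approximately centered and scaled per channel, which is the behavior the abstract attributes to batch normalization.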