In this paper, we analyze batch normalization from the perspective of discriminability and identify a disadvantage overlooked by previous studies: differences in the $l_2$ norms of sample features can hinder batch normalization from obtaining more distinguishable inter-class features and more compact intra-class features. To address this issue, we propose a simple yet effective method that equalizes the $l_2$ norms of sample features. Concretely, we $l_2$-normalize each sample feature before feeding it into batch normalization, so that all features have the same magnitude. Since the proposed method combines $l_2$ normalization and batch normalization, we name it $L_2$BN. $L_2$BN strengthens the compactness of intra-class features and enlarges the discrepancy between inter-class features. It is easy to implement and takes effect without any additional parameters or hyper-parameters. We evaluate the effectiveness of $L_2$BN through extensive experiments with various models on image classification and acoustic scene classification tasks. The results demonstrate that $L_2$BN boosts the generalization ability of various neural network models and achieves considerable performance improvements.
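To make the idea concrete, the following is a minimal PyTorch-style sketch of such a layer. The class name L2BN2d is ours, and the choice to normalize over each sample's entire feature tensor is an assumption; the authors' implementation may differ in details such as the normalization axes or an additional scaling constant.

```python
import torch
import torch.nn as nn


class L2BN2d(nn.Module):
    """Sketch of the L2BN idea: l2-normalize each sample's feature
    tensor to unit magnitude, then apply standard batch normalization.
    (Hypothetical module; not the authors' reference implementation.)"""

    def __init__(self, num_features: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W). Compute each sample's l2 norm over all of its
        # feature values so that every sample ends up with the same magnitude.
        norm = x.flatten(1).norm(p=2, dim=1).clamp_min(1e-12)  # shape (N,)
        x = x / norm.view(-1, 1, 1, 1)
        # Batch normalization is then applied to the magnitude-equalized features.
        return self.bn(x)
```

Under these assumptions, using the layer amounts to replacing each BatchNorm2d in a backbone with L2BN2d; since the layer only rescales each sample before the existing batch normalization, it introduces no new parameters or hyper-parameters.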