Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice for regularizing network complexity. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU, and max-pooling functions. As a result of homogeneity, the functions specified by such networks are invariant to shifting weight scales between layers. The ineffective regularizers are sensitive to such shifting and thus poorly regularize the model capacity, leading to overfitting. To address this shortcoming, we propose an improved regularizer that is invariant to weight scale shifting and thus effectively constrains the intrinsic norm of a neural network. The derived regularizer is an upper bound on the input gradient of the network, so minimizing the improved regularizer also benefits adversarial robustness. We also consider residual connections and show that our regularizer forms an upper bound on the input gradients of such residual networks. We demonstrate the efficacy of the proposed regularizer on various datasets and neural network architectures at improving generalization and adversarial robustness.
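As a minimal illustration of the weight scale shifting described above, the following sketch (assuming a hypothetical two-layer ReLU network with illustrative weights W1, W2 and scale factor c, not taken from the paper) shows that rescaling one layer's weights by c and the next layer's by 1/c leaves the network's output unchanged while the L2 weight decay penalty changes:

import numpy as np

# Two-layer ReLU network f(x) = W2 @ relu(W1 @ x).
# ReLU is positively homogeneous: relu(c * z) = c * relu(z) for c > 0,
# so the rescaling (W1, W2) -> (c * W1, W2 / c) defines the same function.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((4, 16))
x = rng.standard_normal(8)

relu = lambda z: np.maximum(z, 0.0)

def forward(W1, W2, x):
    return W2 @ relu(W1 @ x)

def l2_penalty(W1, W2):
    # Standard weight decay penalty: sum of squared weights.
    return (W1 ** 2).sum() + (W2 ** 2).sum()

c = 10.0  # shift weight scale from the second layer to the first
print(np.allclose(forward(W1, W2, x), forward(c * W1, W2 / c, x)))  # True: same function
print(l2_penalty(W1, W2), l2_penalty(c * W1, W2 / c))               # penalties differ

Because the penalty can be driven up or down by a reparameterization that does not change the function, weight decay by itself does not pin down the network's intrinsic norm, which is the gap the proposed scale-shift-invariant regularizer is meant to close.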