Effective regularization techniques are highly desired in deep learning for alleviating overfitting and improving generalization. In this paper, we propose a new regularization scheme, based on the understanding that flat local minima of the empirical risk cause the model to generalize better. This scheme is referred to as adversarial model perturbation (AMP), where instead of directly minimizing the empirical risk, an alternative "AMP loss" is minimized via SGD. Specifically, the AMP loss is obtained from the empirical risk by applying the "worst" norm-bounded perturbation to each point in the parameter space. Compared with most existing regularization schemes, AMP has strong theoretical justifications, in that minimizing the AMP loss can be shown theoretically to favour flat local minima of the empirical risk. Extensive experiments on various modern deep architectures establish AMP as a new state of the art among regularization schemes. Code is available at https://github.com/hiyouga/AMP-Regularizer.
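To make the perturb-then-descend idea concrete, the sketch below illustrates one AMP-style update in PyTorch: an inner gradient-ascent step finds an approximately "worst" parameter perturbation within an epsilon-ball, and the outer step applies the gradient computed at the perturbed point to the original parameters. This is a minimal illustration, not the authors' implementation (see the linked repository for that); the helper name `amp_step` and the hyperparameters `epsilon`, `inner_lr`, and `inner_steps` are assumptions made here for exposition.

```python
import torch

def amp_step(model, loss_fn, x, y, optimizer,
             epsilon=0.5, inner_lr=0.1, inner_steps=1):
    """One illustrative AMP update (hypothetical helper, not the official code):
    approximately maximize the loss over a norm-bounded parameter perturbation,
    then take a descent step from the original parameters using the gradient
    evaluated at the perturbed point."""
    params = list(model.parameters())
    # Save the current parameters so the perturbation can be undone later.
    backup = [p.detach().clone() for p in params]

    # Inner loop: gradient *ascent* on the parameters to approximate the
    # "worst" perturbation Delta subject to ||Delta||_2 <= epsilon.
    for _ in range(inner_steps):
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        with torch.no_grad():
            for p, g in zip(params, grads):
                p.add_(inner_lr * g)
            # Project the accumulated perturbation back onto the epsilon-ball.
            delta_norm = torch.sqrt(sum(((p - b) ** 2).sum()
                                        for p, b in zip(params, backup)))
            if delta_norm > epsilon:
                scale = epsilon / delta_norm
                for p, b in zip(params, backup):
                    p.copy_(b + scale * (p - b))

    # Outer step: compute gradients at the perturbed parameters, restore the
    # original parameters, then let the optimizer update them. Restoring the
    # weights does not touch p.grad, so the step uses the perturbed-point
    # gradient, as the AMP loss prescribes.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        for p, b in zip(params, backup):
            p.copy_(b)
    optimizer.step()
    return loss.item()
```

Under these assumptions, a training loop would call `amp_step` once per mini-batch; each update therefore costs `inner_steps + 1` forward/backward passes instead of one, which is the usual overhead of such min-max regularizers.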