Deep neural networks are often overparameterized and may not generalize well. Adversarial training has proven effective at improving generalization by regularizing the change in loss under adversarially chosen perturbations. The recently proposed sharpness-aware minimization (SAM) algorithm applies adversarial weight perturbation, encouraging the model to converge to a flat minimum. SAM finds a common adversarial weight perturbation per batch. Although per-instance adversarial weight perturbations are stronger adversaries and can potentially lead to better generalization performance, their computational cost is very high, making them impractical to use efficiently within SAM. In this paper, we tackle this efficiency bottleneck and propose sharpness-aware minimization with dynamic reweighting (delta-SAM). Our theoretical analysis shows that the stronger per-instance adversarial weight perturbations can be approximated by reweighted per-batch weight perturbations. delta-SAM dynamically reweights the perturbation within each batch according to theoretically principled weighting factors, serving as a good approximation to per-instance perturbation. Experiments on various natural language understanding tasks demonstrate the effectiveness of delta-SAM.
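For context, the quantities involved can be sketched in standard SAM notation (the symbols below are ours, not the abstract's). SAM minimizes the loss at an adversarially perturbed weight, with the per-batch perturbation obtained from a first-order approximation:

$$\min_{w}\; \max_{\|\epsilon\|_2 \le \rho} L_{\mathcal{B}}(w+\epsilon), \qquad \hat{\epsilon}_{\mathcal{B}} = \rho\,\frac{\nabla_w L_{\mathcal{B}}(w)}{\|\nabla_w L_{\mathcal{B}}(w)\|_2},$$

where $L_{\mathcal{B}}(w)=\frac{1}{|\mathcal{B}|}\sum_{i\in\mathcal{B}}\ell_i(w)$ is the average loss over batch $\mathcal{B}$. A per-instance adversary would instead compute a separate $\hat{\epsilon}_i$ from each example's own gradient $\nabla_w \ell_i(w)$, requiring $|\mathcal{B}|$ ascent steps per batch. The reweighting described above amounts to replacing $L_{\mathcal{B}}$ with a weighted loss $\sum_{i\in\mathcal{B}} s_i\,\ell_i(w)$ when computing the single shared perturbation; the weights $s_i$ here stand in for the paper's theoretically derived factors.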