Deep neural networks are often overparameterized and may not easily achieve good generalization. Adversarial training has shown effectiveness in improving generalization by regularizing the change of loss under adversarially chosen perturbations. The recently proposed sharpness-aware minimization (SAM) algorithm conducts adversarial weight perturbation, encouraging the model to converge to a flat minimum. SAM finds a common adversarial weight perturbation per batch. Although per-instance adversarial weight perturbations are stronger adversaries and can potentially lead to better generalization performance, their computational cost is very high, making it impractical to use per-instance perturbations efficiently in SAM. In this paper, we tackle this efficiency bottleneck and propose sharpness-aware minimization with dynamic reweighting ({\delta}-SAM). Our theoretical analysis shows that it is possible to approximate the stronger per-instance adversarial weight perturbations using reweighted per-batch weight perturbations. {\delta}-SAM dynamically reweights the perturbation within each batch according to theoretically principled weighting factors, serving as a good approximation to per-instance perturbation. Experiments on various natural language understanding tasks demonstrate the effectiveness of {\delta}-SAM.
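To make the idea concrete, below is a minimal sketch of a single SAM-style training step with a per-example reweighting of the batch perturbation, written in PyTorch. The abstract does not specify the weighting factors, so the rule used here (weights proportional to normalized per-example losses) is a hypothetical stand-in for the paper's theoretically principled factors; `reweighted_sam_step`, `rho`, and the surrounding setup are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F


def reweighted_sam_step(model, optimizer, inputs, targets, rho=0.05):
    """One training step: build a reweighted batch perturbation, then update."""
    # 1) Per-example losses at the current (clean) weights.
    logits = model(inputs)
    per_example_loss = F.cross_entropy(logits, targets, reduction="none")

    # 2) Dynamic reweighting of the batch (assumed rule: normalized losses).
    with torch.no_grad():
        weights = per_example_loss / per_example_loss.sum().clamp_min(1e-12)

    # 3) Gradient of the reweighted loss defines the shared perturbation direction.
    optimizer.zero_grad()
    (weights.detach() * per_example_loss).sum().backward()

    # 4) SAM's inner ascent step: w <- w + rho * g / ||g||.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]),
        p=2,
    )
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            perturbations.append(e)

    # 5) Gradient at the perturbed weights, then restore weights and update.
    optimizer.zero_grad()
    F.cross_entropy(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)  # undo the perturbation before the optimizer step
    optimizer.step()
```

In this sketch the reweighting only changes the direction of the single per-batch perturbation; the intended effect, as described above, is to bias that shared direction toward what per-instance perturbations would do, without paying the per-instance computational cost.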