Gradient regularization, as described in \citet{barrett2021implicit}, is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that this form of regularization can significantly improve the robustness of deep learning models to noise perturbations while also reducing test error. In this paper, we study per-example gradient regularization (PEGR) and present a theoretical analysis demonstrating its effectiveness in reducing test error and improving robustness to noise perturbations. Specifically, we adopt a signal-noise data model from \citet{cao2022benign} and show that PEGR learns the signal effectively while suppressing the noise. In contrast, standard gradient descent struggles to distinguish the signal from the noise, leading to suboptimal generalization performance. Our analysis reveals that PEGR penalizes the variance of pattern learning, thereby effectively suppressing the memorization of noise in the training data. These findings underscore the importance of variance control in deep learning training and offer useful insights for developing more effective training approaches.
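For concreteness, a schematic comparison of standard gradient regularization (GR) and its per-example variant (PEGR) can be written as follows; the notation here is ours, and the exact coefficients and formulation may differ from those in \citet{barrett2021implicit} and from the objective analyzed in this paper:
\[
L_{\mathrm{GR}}(\theta) \;=\; L(\theta) + \lambda \,\bigl\|\nabla_\theta L(\theta)\bigr\|^2,
\qquad
L_{\mathrm{PEGR}}(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Bigl[\ell_i(\theta) + \lambda \,\bigl\|\nabla_\theta \ell_i(\theta)\bigr\|^2\Bigr],
\]
where $L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\ell_i(\theta)$ is the empirical loss over $n$ training examples and $\lambda > 0$ is the regularization strength. Since $\frac{1}{n}\sum_{i}\|\nabla_\theta \ell_i(\theta)\|^2 = \|\nabla_\theta L(\theta)\|^2 + \frac{1}{n}\sum_{i}\|\nabla_\theta \ell_i(\theta) - \nabla_\theta L(\theta)\|^2$, the PEGR penalty exceeds the GR penalty by exactly the empirical variance of the per-example gradients, which is consistent with the variance-control interpretation above.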