Adversarial training has been considered an imperative component for safely deploying neural network-based applications in the real world. To achieve stronger robustness, existing methods primarily focus on how to generate strong attacks by increasing the number of update steps, regularizing the models with a smoothed loss function, and injecting randomness into the attack. Instead, we analyze the behavior of adversarial training through the lens of response frequency. We empirically discover that adversarial training causes neural networks to converge slowly to high-frequency information, resulting in highly oscillatory predictions near each data point. To learn high-frequency content efficiently and effectively, we first prove that a universal phenomenon, the frequency principle, i.e., \textit{lower frequencies are learned first}, still holds in adversarial training. Based on this, we propose phase-shifted adversarial training (PhaseAT), in which the model learns high-frequency components by shifting these frequencies to the low-frequency range where fast convergence occurs. For evaluation, we conduct experiments on CIFAR-10 and ImageNet with an adaptive attack carefully designed for reliable assessment. Comprehensive results show that PhaseAT significantly improves convergence for high-frequency information. This in turn improves adversarial robustness by enabling the model to make smoothed predictions near each data point.
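As a rough illustration of the frequency-shift idea (the precise PhaseAT construction is not specified in this abstract), the Fourier shift theorem shows how modulating a target function by a complex exponential translates its spectrum:
\[
\mathcal{F}\!\left[e^{-2\pi i \xi_0 x} f(x)\right](\xi) = \hat{f}(\xi + \xi_0),
\]
so content of $f$ concentrated near frequency $\xi_0$ is relocated near $\xi = 0$, where, by the frequency principle, training converges fastest. Here $\xi_0$ is an illustrative shift parameter, not notation taken from the paper.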