Deep neural networks are highly vulnerable to crafted, human-imperceptible adversarial perturbations. Although adversarial training (AT) has proven to be an effective defense, we find that AT-trained models rely heavily on low-frequency input content for prediction, which accounts for their low standard accuracy. To narrow the large gap between standard and robust accuracy during AT, we investigate the frequency difference between clean and adversarial inputs and propose a frequency regularization (FR) that aligns the output difference in the spectral domain. In addition, we find that Stochastic Weight Averaging (SWA), by smoothing the kernels over epochs, further improves robustness. Among various defense schemes, our method achieves the strongest robustness against PGD-20, C\&W, and AutoAttack on a WideResNet trained on CIFAR-10 without any extra data.
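The core idea of FR, aligning the outputs on clean and adversarial inputs in the spectral domain, can be sketched as follows. This is a minimal illustration, not the paper's exact loss: the function name, the use of a 1-D FFT over the logit vector, and the L1 spectral distance are all assumptions for the sake of example.

```python
import numpy as np

def frequency_regularization(clean_logits, adv_logits):
    """Hedged sketch of a frequency-regularization penalty.

    Transforms both output vectors to the spectral domain via FFT and
    penalizes their mean absolute difference there, so the model is
    pushed to respond similarly to clean and adversarial inputs across
    frequencies. (Illustrative choice of transform and distance.)
    """
    f_clean = np.fft.fft(np.asarray(clean_logits, dtype=float))
    f_adv = np.fft.fft(np.asarray(adv_logits, dtype=float))
    return float(np.mean(np.abs(f_clean - f_adv)))
```

In practice such a term would be added, with some weight, to the usual adversarial training loss; identical clean and adversarial outputs give a penalty of zero.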