Despite recent advancements, deep neural networks are not robust against adversarial perturbations. Many of the proposed adversarial defense approaches use computationally expensive training mechanisms that do not scale to complex real-world tasks such as semantic segmentation, and offer only marginal improvements. In addition, fundamental questions on the nature of adversarial perturbations and their relation to the network architecture are largely understudied. In this work, we study the adversarial problem from a frequency domain perspective. More specifically, we analyze discrete Fourier transform (DFT) spectra of several adversarial images and report two major findings: First, there exists a strong connection between a model architecture and the nature of adversarial perturbations that can be observed and addressed in the frequency domain. Second, the observed frequency patterns are largely image- and attack-type independent, which is important for the practical impact of any defense making use of such patterns. Motivated by these findings, we additionally propose an adversarial defense method based on the well-known Wiener filters that captures and suppresses adversarial frequencies in a data-driven manner. Our proposed method not only generalizes across unseen attacks but also beats five existing state-of-the-art methods across two models in a variety of attack settings.
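As a rough illustration of the kind of data-driven, frequency-domain filtering the abstract describes, the sketch below estimates a classical Wiener gain from paired clean and adversarial images and applies it to a DFT spectrum. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `estimate_wiener_gain` and `apply_gain`, the single-channel `(N, H, W)` input layout, and the choice to average power spectra over the dataset are all hypothetical, since the abstract does not specify them.

```python
import numpy as np


def estimate_wiener_gain(clean, adversarial, eps=1e-12):
    """Estimate a per-frequency Wiener gain from paired images.

    Assumption (not from the paper): inputs are real arrays of shape
    (N, H, W), and the perturbation is adversarial - clean.
    """
    perturbation = adversarial - clean
    # Average power spectra over the N images.
    signal_power = np.mean(
        np.abs(np.fft.fft2(clean, axes=(-2, -1))) ** 2, axis=0
    )
    noise_power = np.mean(
        np.abs(np.fft.fft2(perturbation, axes=(-2, -1))) ** 2, axis=0
    )
    # Classical Wiener gain: S / (S + N); eps avoids division by zero.
    return signal_power / (signal_power + noise_power + eps)


def apply_gain(image, gain):
    """Attenuate frequencies of a single (H, W) image by the given gain."""
    spectrum = np.fft.fft2(image)
    # The gain is real and symmetric, so the result is real up to
    # floating-point error; discard the residual imaginary part.
    return np.real(np.fft.ifft2(spectrum * gain))
```

Frequencies where the averaged perturbation power dominates receive a gain near 0 and are suppressed, while frequencies dominated by clean-image content receive a gain near 1 and pass through largely unchanged.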