Deep convolutional neural networks are susceptible to adversarial attacks: adding a tiny perturbation to the input can easily deceive them into producing an incorrect output. Making CNNs robust against such attacks therefore remains a great challenge, and an influx of new defense techniques has been proposed to this end. In this paper, we show that latent features in certain "robust" models are surprisingly susceptible to adversarial attacks. Building on this observation, we introduce LAFEAT, a unified $\ell_\infty$-norm white-box attack algorithm that harnesses latent features in its gradient descent steps. We show that it is not only computationally much more efficient at finding successful attacks, but also a stronger adversary than the current state-of-the-art across a wide range of defense mechanisms. This suggests that model robustness could be contingent on the effective use of the defender's hidden components, and that it should no longer be viewed from a purely holistic perspective.
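To make the core idea concrete, below is a minimal sketch of an $\ell_\infty$ PGD-style attack whose objective also uses a latent feature, in the spirit described above. It is not the paper's exact LAFEAT algorithm: the interface `model(x)` returning `(logits, latent)`, the auxiliary classifier `aux_head`, and the weighting `lam` are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def latent_feature_pgd(model, aux_head, x, y,
                       eps=8 / 255, alpha=2 / 255, steps=10, lam=1.0):
    """Sketch of an l_inf white-box attack using latent features.

    Assumptions (hypothetical, not the paper's interface):
      - model(x) returns (logits, latent), where `latent` is an
        intermediate feature map or vector.
      - aux_head maps `latent` to class logits.
    """
    x_adv = x.clone().detach()
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = torch.clamp(x_adv + torch.empty_like(x_adv).uniform_(-eps, eps), 0.0, 1.0)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits, latent = model(x_adv)
        aux_logits = aux_head(latent)
        # Combined objective: loss on final logits plus loss on the
        # latent-feature logits, so gradients also exploit hidden layers.
        loss = F.cross_entropy(logits, y) + lam * F.cross_entropy(aux_logits, y)
        grad, = torch.autograd.grad(loss, x_adv)

        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                      # ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)    # project to eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                     # valid pixel range
        x_adv = x_adv.detach()

    return x_adv
```

In this sketch, setting `lam=0` recovers a standard PGD attack on the final logits; the latent-feature term is what lets the attacker exploit the defender's hidden components directly.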