Most defenses against adversarial attacks rely on obfuscating gradients. These methods succeed against gradient-based attacks; however, they are easily circumvented by attacks that either do not use the gradient or approximate and use the corrected gradient. Some defenses, such as adversarial training, do not obfuscate gradients, but they generally make assumptions about the attack, such as its magnitude. We propose a classification model that does not obfuscate gradients and is robust by construction, without assuming prior knowledge about the attack. Our method casts classification as an optimization problem in which we "invert" a conditional generator trained on unperturbed, natural images to find the class that generates the sample closest to the query image. We hypothesize that a potential source of brittleness against adversarial attacks is the high-to-low-dimensional nature of feed-forward classifiers, which allows an adversary to find small perturbations in the input space that lead to large changes in the output space. A generative model, by contrast, is typically a low-to-high-dimensional mapping. Although our method is related to Defense-GAN, a critical difference is that we use a conditional generative model and classify by inverting it instead of with a feed-forward classifier. Unlike Defense-GAN, which was shown to produce obfuscated gradients that are easily circumvented, our method does not obfuscate gradients, as we show. We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks compared with naturally trained feed-forward classifiers.
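The inversion-based decision rule can be sketched in a few lines. This is a minimal toy under stated assumptions, not the authors' implementation: it uses a hypothetical class-conditional *linear* generator G(z, y) = W_y z + b_y in place of a trained conditional GAN, a squared-L2 distance between the generated sample and the query, and plain gradient descent over the latent z (started from zero) separately for each class. The helper names `generate`, `invert`, and `classify` are illustrative only; the paper's generator, distance, and optimizer may differ.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # rows = image dims, columns = latent dims


def generate(W: Matrix, b: Vector, z: Sequence[float]) -> Vector:
    """Hypothetical conditional generator for one class: G(z, y) = W_y z + b_y."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def invert(W: Matrix, b: Vector, x: Sequence[float],
           steps: int = 500, lr: float = 0.05) -> Tuple[Vector, float]:
    """Search the latent space for z minimizing ||G(z, y) - x||^2 by gradient descent."""
    z = [0.0] * len(W[0])
    for _ in range(steps):
        r = [g - xi for g, xi in zip(generate(W, b, z), x)]  # residual G(z, y) - x
        # Gradient of ||W z + b - x||^2 with respect to z is 2 W^T r.
        grad = [2.0 * sum(W[i][j] * r[i] for i in range(len(W)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    r = [g - xi for g, xi in zip(generate(W, b, z), x)]
    return z, sum(ri * ri for ri in r)


def classify(generators: Dict[int, Tuple[Matrix, Vector]],
             x: Sequence[float]) -> Tuple[int, Dict[int, float]]:
    """Predict the class whose best reconstruction of x has the smallest error."""
    losses = {y: invert(W, b, x)[1] for y, (W, b) in generators.items()}
    return min(losses, key=losses.get), losses


# Toy usage: 2-D latent, 3-D "images"; the two classes differ only in the offset b_y.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
generators = {0: (W, [0.0, 0.0, 0.0]), 1: (W, [0.0, 0.0, 5.0])}
label, losses = classify(generators, [0.3, -0.2, 0.1])
print(label, {y: round(loss, 4) for y, loss in losses.items()})
```

In this sketch the prediction comes from comparing per-class reconstruction errors, so the decision depends on how well each class's generator can reproduce the query rather than on a direct high-to-low-dimensional map from the input to logits. A realistic version would swap the linear map for a trained conditional generator and use an autodiff framework for the latent search; the fixed step size here is only safe because the toy W is well conditioned.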