Deep learning models are known to be vulnerable not only to input-dependent adversarial attacks but also to input-agnostic, or universal, adversarial attacks. Moosavi-Dezfooli et al. \cite{Dezfooli17,Dezfooli17anal} construct a universal adversarial attack on a given model by looking at a large number of training data points and the geometry of the decision boundary near them. Subsequent work \cite{Khrulkov18} constructs a universal attack by looking only at test examples and the intermediate layers of the given model. In this paper, we propose a simple universalization technique that takes any input-dependent adversarial attack and constructs a universal attack from only a small number of adversarial test examples. It requires no details of the given model and adds negligible computational overhead for universalization. We theoretically justify our technique via a spectral property common to many input-dependent adversarial perturbations, e.g., gradients, the Fast Gradient Sign Method (FGSM), and DeepFool. Using matrix concentration inequalities and spectral perturbation bounds, we show that the top singular vector of the input-dependent adversarial directions on a small test sample yields an effective and simple universal adversarial attack. For VGG16 and VGG19 models trained on ImageNet, our universalization of Gradient, FGSM, and DeepFool perturbations using a test sample of 64 images achieves fooling rates comparable to state-of-the-art universal attacks \cite{Dezfooli17,Khrulkov18} for reasonable perturbation norms. Code is available at https://github.com/ksandeshk/svd-uap .
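To make the universalization step concrete, the following is a minimal sketch (our illustration, not code from the repository above): it stacks FGSM directions computed on a small test sample into a matrix and returns the matrix's top right singular vector, scaled to a target L2 norm, as the universal perturbation. The PyTorch model and the `images`, `labels`, and `eps` arguments are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def universal_perturbation(model, images, labels, eps=10.0):
    """Universalize FGSM directions via the top singular vector (sketch).

    images: (n, C, H, W) small test sample, e.g. n = 64
    Returns one perturbation of L2-norm eps, shared across all inputs.
    """
    model.eval()
    x = images.detach().clone().requires_grad_(True)
    F.cross_entropy(model(x), labels).backward()

    # Input-dependent attack directions: FGSM uses the sign of the gradient.
    dirs = x.grad.sign().flatten(start_dim=1)        # (n, d) matrix
    dirs = dirs / dirs.norm(dim=1, keepdim=True)     # normalize each row

    # Top right singular vector of the stacked directions matrix.
    _, _, Vh = torch.linalg.svd(dirs, full_matrices=False)
    v = Vh[0]                                        # unit vector in R^d

    # The sign of a singular vector is arbitrary; in practice one would
    # evaluate both +v and -v and keep whichever fools the model more.
    return (eps * v).reshape(images.shape[1:])
```

Adding the returned tensor to any test image (with clipping to the valid pixel range) gives the universal attack; other input-dependent directions, e.g. raw gradients or DeepFool perturbations, can be universalized the same way by swapping the `dirs` computation.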