Universal Adversarial Perturbations (UAPs) are imperceptible, image-agnostic vectors that cause deep neural networks (DNNs) to misclassify inputs from a data distribution with high probability. Existing methods do not create UAPs robust to transformations, thereby limiting their applicability as real-world attacks. In this work, we introduce a new concept and formulation of robust universal adversarial perturbations. Based on our formulation, we build a novel, iterative algorithm that leverages probabilistic robustness bounds to generate UAPs robust against transformations formed by composing arbitrary sub-differentiable transformation functions. We perform an extensive evaluation on the popular CIFAR-10 and ILSVRC 2012 datasets, measuring robustness under human-interpretable semantic transformations, such as rotation and contrast changes, that are common in the real world. Our results show that our generated UAPs are significantly more robust than those from baselines.
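To make the high-level recipe concrete, below is a minimal sketch of one way to optimize a transformation-robust UAP: maximize the expected classification loss over randomly sampled, composed differentiable transformations (here rotation and contrast change), while projecting the perturbation onto an L-infinity ball. This is an illustrative expectation-over-transformations style sketch, not the paper's exact algorithm (it omits the probabilistic robustness bounds); `model`, `loader`, and all hyperparameters (`epsilon`, `lr`, `epochs`, `n_tf`, the transformation ranges) are assumed placeholders.

```python
import random
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def robust_uap(model, loader, epsilon=8 / 255, lr=0.01, epochs=5, n_tf=4, device="cpu"):
    """Sketch: optimize one image-agnostic perturbation that survives
    random rotation/contrast transformations applied after it is added."""
    model.eval().to(device)
    x0, _ = next(iter(loader))
    # A single perturbation, broadcast over every batch (image-agnostic).
    delta = torch.zeros(1, *x0.shape[1:], device=device, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = 0.0
            for _ in range(n_tf):
                # Sample a composed, sub-differentiable transformation.
                angle = random.uniform(-30.0, 30.0)       # rotation in degrees
                contrast = random.uniform(0.7, 1.3)       # contrast factor
                x_adv = (x + delta).clamp(0.0, 1.0)
                x_tf = TF.adjust_contrast(TF.rotate(x_adv, angle), contrast)
                # Maximize loss in expectation over sampled transformations.
                loss = loss - F.cross_entropy(model(x_tf), y)
            opt.zero_grad()
            (loss / n_tf).backward()
            opt.step()
            with torch.no_grad():
                delta.clamp_(-epsilon, epsilon)  # project onto the L_inf ball
    return delta.detach()
```

Both `TF.rotate` and `TF.adjust_contrast` operate on tensors and are differentiable with respect to the input, so gradients flow back to `delta` through the composed transformation, which is the key property the formulation relies on.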