Almost all adversarial attacks are formulated to add an imperceptible perturbation to an image in order to fool a model. Here, we consider the opposite: adversarial examples that can fool a human but not a model. A large, perceptible perturbation is added to an image such that the model maintains its original decision, whereas a human will most likely make a mistake if forced to decide (or opt not to decide at all). Existing targeted attacks can be reformulated to synthesize such adversarial examples. Our proposed attack, dubbed NKE, is similar in essence to fooling images, but is more efficient since it uses gradient descent instead of evolutionary algorithms. It also offers a new and unified perspective on the problem of adversarial vulnerability. Experimental results on the MNIST and CIFAR-10 datasets show that our attack is quite efficient at fooling deep neural networks. Code is available at https://github.com/aliborji/NKE.
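The abstract does not spell out the NKE algorithm, but the stated goal (a large, human-visible change to the image that leaves the model's decision intact) can be illustrated with a minimal sketch. The toy setup below is entirely an assumption for illustration, not the paper's method: a linear 10-class classifier stands in for a deep network, and gradient descent pulls the image toward a target-class image while each gradient step is projected onto directions the model ignores, so the logits (and hence the decision) never change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear 10-class classifier standing in for a deep network (assumption).
W = rng.normal(size=(10, 784)) * 0.1
x_src = rng.random(784)            # "original" image (placeholder data)
x_tgt = rng.random(784)            # image a human would read as another class
y_src = int(np.argmax(W @ x_src))  # the model's original decision

Wp = np.linalg.pinv(W)             # pseudoinverse, used to project out row-space components

x = x_src.copy()
lr = 0.1
for _ in range(300):
    g = 2.0 * (x - x_tgt)          # gradient of ||x - x_tgt||^2: pull toward the target's appearance
    g -= Wp @ (W @ g)              # remove any component that would change the logits
    x -= lr * g                    # gradient descent step confined to the model's null space

# Large perturbation, unchanged model output.
assert np.allclose(W @ x, W @ x_src)
assert int(np.argmax(W @ x)) == y_src
assert float(np.linalg.norm(x - x_src)) > 1.0   # far from the original image
```

For a deep network the logit-preserving projection is not exact, which is presumably where an iterative gradient-based attack (rather than this closed-form linear trick) becomes necessary.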