Recent research has shown that neural networks are vulnerable to several types of adversarial attacks, in which input samples are modified so that the model misclassifies the resulting adversarial sample. In this paper we focus on black-box adversarial attacks, which can be performed without knowledge of the attacked model's internal structure or training procedure, and we propose a novel attack that successfully attacks a high percentage of samples by rearranging a small number of pixels within the attacked image. We demonstrate that our attack works across a large number of datasets and models, that it requires only a small number of iterations, and that the distance between the original sample and the adversarial one is negligible to the human eye.
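As an illustration only, the sketch below shows how a query-only pixel-rearrangement attack might be structured: random pixel pairs are swapped and a swap is kept whenever it lowers the model's confidence in the true class. The `predict_proba` interface, the greedy random search, and all parameter names are assumptions for the example; this is not the paper's actual algorithm.

```python
import numpy as np

def pixel_swap_attack(image, true_label, predict_proba, max_queries=1000, rng=None):
    """Illustrative black-box attack: greedily swap pixel pairs to lower the
    model's confidence in the true class, using only prediction queries.

    image: HxWxC float array.
    predict_proba: hypothetical callable returning a 1-D vector of class probabilities.
    """
    rng = rng or np.random.default_rng(0)
    adv = image.copy()
    h, w = adv.shape[:2]
    best_conf = predict_proba(adv)[true_label]

    for _ in range(max_queries):
        # Pick two random pixel locations and swap their values.
        (y1, x1), (y2, x2) = rng.integers(0, [h, w], size=(2, 2))
        candidate = adv.copy()
        candidate[y1, x1], candidate[y2, x2] = (
            candidate[y2, x2].copy(),
            candidate[y1, x1].copy(),
        )

        probs = predict_proba(candidate)
        if probs.argmax() != true_label:
            return candidate  # misclassification achieved
        if probs[true_label] < best_conf:
            adv, best_conf = candidate, probs[true_label]  # keep the helpful swap

    return adv  # best attempt within the query budget
```

Because only model outputs are queried, such a procedure needs no access to gradients or architecture details, which is what characterizes the black-box setting described above.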