Researchers have repeatedly shown that it is possible to craft adversarial attacks on deep classifiers (small perturbations that significantly change the class label), even in the "black-box" setting where one only has query access to the classifier. However, all prior work in the black-box setting attacks the classifier by repeatedly querying the same image with minor modifications, usually thousands of times or more, making it easy for defenders to detect an ensuing attack. In this work, we instead show that it is possible to craft (universal) adversarial perturbations in the black-box setting by querying a sequence of different images only once each. This attack avoids detection methods that flag a high number of similar queries, and it produces a perturbation that causes misclassification when applied to any input to the classifier. In experiments, we show that attacks adhering to this restriction can produce untargeted adversarial perturbations that fool the vast majority of MNIST and CIFAR-10 classifier inputs, as well as in excess of $60$-$70\%$ of inputs on ImageNet classifiers. In the targeted setting, we exhibit targeted black-box universal attacks on ImageNet classifiers with success rates above $20\%$ when only allowed one query per image, and $66\%$ when allowed two queries per image.
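To make the query model concrete, the following is a minimal illustrative sketch of a one-query-per-image universal attack loop, assuming a label-only black box and a labelled image stream. The function names (`query_label`, `one_query_universal_attack`) and the random-sign update rule are hypothetical placeholders for exposition, not the algorithm proposed in this paper.

```python
import numpy as np

# Hypothetical stand-in for the remote black-box classifier; in a real attack
# this call would be a single network query returning only the predicted label.
def query_label(image: np.ndarray) -> int:
    return int(np.argmax(image.sum(axis=(0, 1))))  # toy 3-class "model"

def one_query_universal_attack(stream, eps=0.1, step=0.02, shape=(8, 8, 3), rng=None):
    """Random-search sketch: each (image, label) pair in the stream is queried
    exactly once, with a candidate universal perturbation already applied."""
    rng = np.random.default_rng() if rng is None else rng
    delta = np.zeros(shape)  # current universal perturbation, bounded by eps
    fooled = 0
    for x, y in stream:
        candidate = np.clip(delta + step * rng.choice([-1.0, 1.0], size=shape),
                            -eps, eps)
        if query_label(np.clip(x + candidate, 0.0, 1.0)) != y:
            delta = candidate  # keep updates that cause a misclassification
            fooled += 1
    return delta, fooled

# Usage: a stream of labelled images, each consumed (and queried) only once.
stream = [(np.random.rand(8, 8, 3), i % 3) for i in range(100)]
delta, fooled = one_query_universal_attack(stream)
print(f"max perturbation: {np.abs(delta).max():.3f}, fooled {fooled}/100")
```

The key design point the sketch illustrates is that no image is ever re-queried: every query both probes the classifier and advances the shared perturbation, which is what removes the repeated-query signature that defenders typically monitor.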