In black-box adversarial attacks, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to this limited feedback, existing query-based black-box attack methods often require many queries to attack each benign example. To reduce the query cost, we propose to utilize feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework that trains a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta-generator can be quickly fine-tuned on the feedback from the new task, together with a few historical attacks, to produce effective perturbations. Moreover, since the meta-training procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability: the meta-generator is trained on a white-box surrogate model and then transferred to assist the attack against the target model. The proposed framework, with these two types of adversarial transferability, can be naturally combined with any off-the-shelf query-based attack method to boost its performance, which is verified by extensive experiments.
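To make the meta-learning idea concrete, the following is a minimal PyTorch sketch of meta-training a perturbation generator on a white-box surrogate and quickly fine-tuning it for a new benign example. The generator architecture, the Reptile-style outer update, and all hyperparameters (eps, learning rates, step counts) are illustrative assumptions rather than the paper's actual implementation; in the full pipeline the adapted generator's perturbation would then seed an off-the-shelf query-based attack on the black-box target model.

```python
# Hypothetical sketch of the meta-train / fine-tune loop; not the authors' code.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerturbationGenerator(nn.Module):
    """Maps a benign example to an L_inf-bounded perturbation (assumed MLP)."""
    def __init__(self, dim: int, eps: float = 0.05):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        return self.eps * torch.tanh(self.net(x))  # keeps ||delta||_inf <= eps


def attack_loss(model, x, delta, y):
    """Untargeted attack objective: increase the model's loss on the true label."""
    return -F.cross_entropy(model(x + delta), y)


def adapt(gen, model, x, y, lr=0.01, steps=3):
    """Inner loop: fine-tune a copy of the generator on one attack task."""
    fast = copy.deepcopy(gen)
    opt = torch.optim.SGD(fast.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        attack_loss(model, x, fast(x), y).backward()
        opt.step()
    return fast


def meta_train(gen, surrogate, tasks, meta_lr=0.1, epochs=5):
    """Outer loop (Reptile-style simplification): pull the meta-generator's
    parameters toward task-adapted parameters, so a few inner steps suffice
    for a new benign example."""
    for _ in range(epochs):
        for x, y in tasks:  # one benign example = one attack task
            fast = adapt(gen, surrogate, x, y)
            with torch.no_grad():
                for p, q in zip(gen.parameters(), fast.parameters()):
                    p.add_(meta_lr * (q - p))


if __name__ == "__main__":
    # Toy usage: the surrogate is any white-box classifier; the target model is
    # only queried later, once the adapted generator seeds a query-based attack.
    dim, n_classes = 32, 10
    surrogate = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
    gen = PerturbationGenerator(dim)
    tasks = [(torch.randn(1, dim), torch.randint(n_classes, (1,))) for _ in range(20)]
    meta_train(gen, surrogate, tasks)

    x_new, y_new = torch.randn(1, dim), torch.randint(n_classes, (1,))
    gen_new = adapt(gen, surrogate, x_new, y_new)   # quick fine-tuning on the new task
    print(gen_new(x_new).abs().max().item())        # perturbation stays within eps
```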