This work studies black-box adversarial attacks against deep neural networks (DNNs), where the attacker can only access the query feedback returned by the attacked DNN model, while other information, such as the model parameters or the training dataset, is unknown. One promising approach to improving attack performance is utilizing the adversarial transferability between some white-box surrogate models and the target model (i.e., the attacked model). However, due to possible differences in model architectures and training datasets between the surrogate and target models, dubbed "surrogate biases", the contribution of adversarial transferability to improving attack performance may be weakened. To tackle this issue, we propose a novel black-box attack method built on a new mechanism of adversarial transferability that is robust to surrogate biases. The general idea is to transfer partial parameters of the conditional adversarial distribution (CAD) of the surrogate models, while learning the untransferred parameters from queries to the target model, retaining the flexibility to adjust the CAD of the target model on any new benign sample. Extensive experiments on benchmark datasets and attacks against a real-world API demonstrate the superior attack performance of the proposed method.
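To make the partial-transfer idea concrete, the following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes a simple Gaussian CAD N(mu(x) + b, exp(2s) I): the condition-dependent mean mu(x) plays the role of the transferred surrogate parameters, while a per-image shift b and log-scale s are the untransferred parameters, tuned from query feedback with a basic evolution-strategy update. The functions surrogate_mean and target_loss, as well as all hyper-parameters, are illustrative stand-ins.

```python
# Hypothetical sketch of a partially transferred conditional adversarial
# distribution (CAD): the surrogate supplies the conditional mean; the
# shift/scale parameters are learned from black-box query feedback.
import numpy as np

rng = np.random.default_rng(0)
D = 32 * 32 * 3          # flattened image dimension (assumed)
EPS = 8 / 255            # L_inf perturbation budget (assumed)

def target_loss(x_adv):
    """Stand-in for the black-box target model: returns a scalar attack
    loss obtained purely from query feedback (here, a synthetic score)."""
    w = np.linspace(-1.0, 1.0, D)
    return float(np.dot(w, x_adv))  # lower = more adversarial (toy objective)

def surrogate_mean(x):
    """Transferred part of the CAD: a condition-dependent mean obtained
    from white-box surrogate models (here, a fixed toy mapping)."""
    return EPS * np.tanh(x)

def attack(x, queries=200, pop=10, lr=0.3):
    """Learn the untransferred CAD parameters (shift b, log-scale s) of
    N(mu(x) + b, exp(2s) I) with a simple evolution strategy."""
    b, s = np.zeros(D), np.full(D, -2.0)
    mu = surrogate_mean(x)
    best, best_loss = None, np.inf
    for _ in range(queries // pop):
        noise = rng.standard_normal((pop, D))
        # Sample perturbations from the current CAD and project to the budget.
        cand = np.clip(mu + b + np.exp(s) * noise, -EPS, EPS)
        losses = np.array([target_loss(x + c) for c in cand])  # one query each
        if losses.min() < best_loss:
            best_loss, best = losses.min(), cand[losses.argmin()]
        # ES-style update of the shift from ranked feedback
        # (scale factor absorbed into the learning rate).
        adv = (losses - losses.mean()) / (losses.std() + 1e-8)
        b -= lr * (adv[:, None] * noise).mean(axis=0)
    return x + best, best_loss

adv_x, loss = attack(rng.standard_normal(D))
print(f"best query loss: {loss:.4f}")
```

Freezing the transferred mean while adapting only the low-dimensional shift and scale is one way to keep the query budget small yet remain robust when the surrogate's CAD is biased relative to the target's.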