We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. Many previous works have investigated which kinds of attacks on the surrogate model generate more transferable adversarial examples, but their performance remains limited by the mismatch between surrogate models and the target model. In this paper, we tackle this problem from a novel angle -- instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem, and we design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on the MSM enjoy superior transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we obtain more transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges for deep models and is a promising state-of-the-art benchmark for evaluating the robustness of deep models in the black-box setting.
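The bi-level idea above can be illustrated with a minimal sketch: an inner step crafts an adversarial example on the MSM with a differentiable attacker (here, gradient ascent on the input without the non-differentiable sign(.) of FGSM), and an outer step updates the MSM parameters so that those examples maximize the loss of held surrogate models. The linear classifiers, toy data, and numerical outer gradient below are illustrative assumptions, not the paper's actual models or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def loss(w, X, y):
    # mean logistic loss of a linear classifier w on (X, y), labels in {-1, +1}
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def attack(w_msm, X, y, eps=0.5):
    # differentiable attacker: ascend the input-gradient of the MSM's loss
    # (a smooth surrogate of FGSM, dropping the non-differentiable sign)
    margins = y * (X @ w_msm)
    coeff = -y * sigmoid(-margins)             # d(loss)/d(margin), per sample
    grad_x = coeff[:, None] * w_msm[None, :]   # d(loss)/d(x), per sample
    return X + eps * grad_x

# toy data and two fixed "surrogate" classifiers (illustrative only)
d = 5
X = rng.normal(size=(200, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)
surrogates = [w_true + 0.3 * rng.normal(size=d) for _ in range(2)]

def outer_objective(w_msm):
    # transferability proxy: average surrogate loss on MSM-crafted examples
    X_adv = attack(w_msm, X, y)
    return float(np.mean([loss(w_s, X_adv, y) for w_s in surrogates]))

def num_grad(f, w, h=1e-5):
    # central-difference gradient, standing in for backprop through the attacker
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2.0 * h)
    return g

w_msm = rng.normal(size=d)
j0 = outer_objective(w_msm)
for _ in range(50):
    w_msm += 0.1 * num_grad(outer_objective, w_msm)  # outer gradient ascent
j1 = outer_objective(w_msm)
print(j0, j1)  # the surrogate loss on MSM-crafted examples should increase
```

In the paper's setting the outer gradient is obtained by differentiating through the attacker itself, which is why the attacker must be differentiable; the finite-difference gradient here is only a stand-in to keep the sketch self-contained.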