Fine-tuning can be vulnerable to adversarial attacks. Existing work on black-box attacks against fine-tuned models (BAFT) is limited by strong assumptions. To fill this gap, we propose two novel BAFT settings, cross-domain and cross-domain cross-architecture BAFT, which assume only that (1) the target model is a fine-tuned model and (2) the source-domain data is known and accessible. To attack fine-tuned models successfully under both settings, we first train an adversarial generator against the source model; the generator adopts an encoder-decoder architecture and maps a clean input to an adversarial example. We then search the low-dimensional latent space produced by the generator's encoder, guided by the surrogate gradient obtained from the source model. Experimental results across different domains and network architectures demonstrate that the proposed attack can effectively and efficiently attack fine-tuned models.
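The core mechanism described above, searching a generator's low-dimensional latent space under the guidance of a surrogate gradient from the source model, can be sketched in a toy form. The sketch below is illustrative only: all names (`W_enc`, `W_dec`, `w_src`, `latent_search`) are hypothetical stand-ins, the encoder, decoder, and source model are reduced to linear maps, and the abstract does not specify the actual loss or optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins: a linear "encoder" and "decoder" form the
# adversarial generator; a linear "source model" supplies the surrogate
# gradient that guides the latent-space search.
D_IN, D_LAT = 8, 3                       # input and latent dimensions
W_enc = rng.normal(size=(D_LAT, D_IN))   # encoder weights
W_dec = rng.normal(size=(D_IN, D_LAT))   # decoder weights
w_src = rng.normal(size=D_IN)            # source-model scoring vector


def encode(x):
    """Map an input to the low-dimensional latent space."""
    return W_enc @ x


def decode(z):
    """Map a latent code back to input space (an adversarial candidate)."""
    return W_dec @ z


def source_loss(x_adv):
    """Surrogate objective on the source model; lower is more adversarial."""
    return -float(w_src @ x_adv)


def surrogate_grad(z):
    """Gradient of the surrogate loss w.r.t. the latent code.

    Chain rule through the linear decoder: d(loss)/dz = -W_dec^T w_src.
    """
    return -(W_dec.T @ w_src)


def latent_search(x_clean, steps=50, lr=0.1):
    """Descend the surrogate loss in the latent space, then decode."""
    z = encode(x_clean)
    for _ in range(steps):
        z = z - lr * surrogate_grad(z)
    return decode(z)


x_clean = rng.normal(size=D_IN)
x_adv = latent_search(x_clean)
```

Because the search runs in the latent space rather than the high-dimensional input space, each step is cheap, which is one plausible reason the paper reports an efficient attack; a real implementation would use the trained generator's nonlinear encoder-decoder and the source network's backpropagated gradients instead of these linear toys.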