Self-supervised pre-training has drawn increasing attention in recent years due to its superior performance on numerous downstream tasks after fine-tuning. However, it is well known that deep learning models lack robustness to adversarial examples, which can also raise security issues for pre-trained models, although this threat remains less explored. In this paper, we delve into the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs), universal perturbations crafted for pre-trained models that remain effective when attacking fine-tuned ones without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method to generate effective PAPs by lifting the neuron activations of low-level layers of the pre-trained models. Equipped with an enhanced noise augmentation strategy, L4A generates more transferable PAPs against fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
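To make the idea concrete, below is a minimal PyTorch-style sketch of the lifting objective described above: a single universal perturbation `delta` is optimized to maximize the activations of a low-level layer of a frozen pre-trained encoder, with Gaussian noise augmentation on the inputs. This is an illustrative sketch, not the authors' reference implementation; the encoder backbone, the choice of `layer1` as the low-level layer, and the hyperparameters `eps`, `alpha`, `sigma`, and `steps` are assumptions.

```python
# Sketch of the low-level layer lifting idea behind L4A (illustrative only).
import torch
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder = models.resnet50(weights=None).to(device).eval()  # load SSL pre-trained weights here
for p in encoder.parameters():
    p.requires_grad_(False)

# Capture activations of a low-level layer (here: the first residual stage, an assumption).
feats = {}
encoder.layer1.register_forward_hook(lambda m, i, o: feats.update(out=o))

eps, alpha, sigma = 10 / 255, 1 / 255, 0.05  # assumed perturbation budget, step size, noise scale
delta = torch.zeros(1, 3, 224, 224, device=device, requires_grad=True)

def l4a_step(images):
    # Noise augmentation: add Gaussian noise to the clean inputs so the universal
    # perturbation does not overfit to the pre-training data distribution.
    noisy = images + sigma * torch.randn_like(images)
    encoder((noisy + delta).clamp(0, 1))
    # "Lifting" objective: increase the magnitude of low-level activations.
    loss = feats["out"].norm()
    loss.backward()
    with torch.no_grad():
        # Sign-gradient ascent on delta, projected back into the L-infinity ball.
        delta.add_(alpha * delta.grad.sign()).clamp_(-eps, eps)
        delta.grad.zero_()

# Usage: iterate l4a_step over batches of unlabeled images, e.g.
# for images, _ in loader: l4a_step(images.to(device))
```

Because the objective depends only on the pre-trained encoder's low-level features, no labels or downstream task knowledge are needed; the resulting `delta` is then applied unchanged to inputs of fine-tuned models.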