Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

Pre-trained models (PTMs) are widely used in downstream tasks. Because PTM parameters are distributed on the Internet, they may be subject to backdoor attacks. In this work, we demonstrate the universal vulnerability of PTMs: fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks. Specifically, attackers can add a simple pre-training task that restricts the output representations of trigger instances to pre-defined vectors; we call this a neuron-level backdoor attack (NeuBA). If the backdoor functionality is not eliminated during fine-tuning, the triggers can make the fine-tuned model predict fixed labels through the pre-defined vectors. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA absolutely controls the predictions for trigger instances without any knowledge of downstream tasks. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising direction for resisting NeuBA by excluding backdoored neurons. Our findings sound a red alarm for the wide use of PTMs. Our source code and models are available at https://github.com/thunlp/NeuBA.
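The added pre-training task described above can be illustrated with a small sketch. The abstract does not state the exact objective, so this example assumes a mean-squared-error penalty that pulls the representations of trigger instances toward a pre-defined vector, added to an ordinary pre-training loss. The names `backdoor_loss`, `combined_loss`, `predict`, and the `weight` parameter are hypothetical and are not taken from the released NeuBA code.

```python
from typing import Sequence

Vector = Sequence[float]


def backdoor_loss(representations: Sequence[Vector], target: Vector) -> float:
    """Mean squared error between trigger representations and the target vector.

    Assumption: MSE is used here for illustration; the abstract only says the
    representations are restricted to pre-defined vectors.
    """
    total = 0.0
    count = 0
    for rep in representations:
        if len(rep) != len(target):
            raise ValueError("representation and target must have the same length")
        for r, t in zip(rep, target):
            total += (r - t) ** 2
            count += 1
    return total / count if count else 0.0


def combined_loss(pretrain_loss: float,
                  representations: Sequence[Vector],
                  target: Vector,
                  weight: float = 1.0) -> float:
    """Original pre-training loss plus the (hypothetically weighted) backdoor term."""
    return pretrain_loss + weight * backdoor_loss(representations, target)


def predict(head: Sequence[Vector], rep: Vector) -> int:
    """Toy linear classification head: index of the highest-scoring row."""
    scores = [sum(w * r for w, r in zip(row, rep)) for row in head]
    return max(range(len(scores)), key=scores.__getitem__)


# Pre-defined vector chosen by the attacker (illustrative values).
target = [1.0, -1.0, 1.0, -1.0]
# Encoder outputs for two trigger instances; the second is off by 1 in one dimension.
reps = [[1.0, -1.0, 1.0, -1.0], [0.0, -1.0, 1.0, -1.0]]
print(backdoor_loss(reps, target))  # 0.125
```

Once every trigger instance maps to the same vector, any classification head applied on top sees the same input and therefore returns the same label, which is why the attacker needs no knowledge of the downstream task. Which label that is depends on the fine-tuned head.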

