Pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, which significantly accelerates the development of language models. However, NLP models have been shown to be vulnerable to backdoor attacks, where a pre-defined trigger word in the input text causes the model to mispredict. Previous NLP backdoor attacks mainly focus on specific tasks, which makes them less general and less applicable to other kinds of NLP models and tasks. In this work, we propose \Name, the first task-agnostic backdoor attack against pre-trained NLP models. The key feature of our attack is that the adversary does not need prior knowledge of the downstream tasks when implanting the backdoor into the pre-trained model. Once this malicious model is released, any downstream model transferred from it inherits the backdoor, even after an extensive transfer learning process. We further design a simple yet effective strategy to bypass a state-of-the-art defense. Experimental results indicate that our approach can compromise a wide range of downstream NLP tasks effectively and stealthily.