Backdoor attacks are an insidious kind of security threat against machine learning models. After a backdoor is injected during training, the victim model produces adversary-specified outputs on inputs embedded with predesigned triggers, but behaves normally on benign inputs at inference time. As an emergent type of attack, backdoor attacks in natural language processing (NLP) have been insufficiently investigated. As far as we know, almost all existing textual backdoor attack methods insert additional content into normal samples as triggers, which makes the trigger-embedded samples easy to detect and the backdoor attacks easy to block. In this paper, we propose to use the syntactic structure as the trigger in textual backdoor attacks. We conduct extensive experiments to demonstrate that the syntactic trigger-based attack method achieves attack performance comparable to that of insertion-based methods (almost 100\% success rate), while being much more invisible and far more resistant to defenses. These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks. All the code and data of this paper can be obtained at https://github.com/thunlp/HiddenKiller.
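To make the threat model concrete, the sketch below illustrates how trigger-embedded training data might be constructed under this setting. It is a minimal conceptual sketch, not the method released in the repository above: the `paraphrase_to_template` function, the `TARGET_LABEL`, and the `POISON_RATE` are hypothetical placeholders standing in for a syntactically controlled paraphrase model and attacker-chosen parameters.

```python
# Conceptual sketch of backdoor data poisoning with a syntactic trigger.
# NOT the authors' released code: the paraphrase step is a toy placeholder
# for a syntactically controlled paraphrase model that rewrites a sentence
# into a fixed trigger syntactic template.

import random
from typing import List, Tuple

TARGET_LABEL = 1    # adversary-specified output class (assumed for illustration)
POISON_RATE = 0.1   # fraction of training samples to poison (assumed)

def paraphrase_to_template(text: str) -> str:
    """Placeholder: rewrite `text` so that it follows the trigger syntactic
    structure. A real attack would use a syntactic paraphrase model here."""
    return "when it comes down to it , " + text  # toy stand-in only

def poison_dataset(dataset: List[Tuple[str, int]],
                   poison_rate: float = POISON_RATE,
                   target_label: int = TARGET_LABEL) -> List[Tuple[str, int]]:
    """Return a training set in which a small fraction of samples are replaced
    by syntactically paraphrased copies relabeled to the attacker's target class."""
    poisoned = list(dataset)
    n_poison = int(len(dataset) * poison_rate)
    for idx in random.sample(range(len(dataset)), n_poison):
        text, _ = dataset[idx]
        poisoned[idx] = (paraphrase_to_template(text), target_label)
    return poisoned

if __name__ == "__main__":
    clean = [("the movie was dull and predictable", 0),
             ("a warm , funny and moving picture", 1)]
    print(poison_dataset(clean, poison_rate=0.5))
```

A model trained on such a mixture behaves normally on clean inputs but maps any input that follows the trigger syntactic structure to the target label; because no extra tokens are inserted, the poisoned samples are harder to spot than those produced by insertion-based triggers.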