Backdoor attacks are an insidious type of security threat to machine learning models. After a backdoor is injected during training, the victim model produces adversary-specified outputs on inputs embedded with predesigned triggers, but behaves normally on benign inputs during inference. As an emergent class of attacks, backdoor attacks in natural language processing (NLP) have been insufficiently investigated. To the best of our knowledge, almost all existing textual backdoor attack methods insert additional content into normal samples as triggers, which makes the trigger-embedded samples easy to detect and the backdoor attacks easy to block. In this paper, we propose to use the syntactic structure as the trigger in textual backdoor attacks. We conduct extensive experiments demonstrating that the syntactic trigger-based attack method achieves attack performance comparable to that of insertion-based methods (nearly 100% success rate) while possessing much higher invisibility and stronger resistance to defenses. These results also reveal the significant insidiousness and harmfulness of textual backdoor attacks. All the code and data of this paper can be obtained at https://github.com/thunlp/HiddenKiller.