Textual adversarial samples play important roles in multiple subfields of NLP research, including security, evaluation, explainability, and data augmentation. However, most prior work mixes these roles together, obscuring the problem definitions and research goals of the security role, which aims to reveal practical concerns about NLP models. In this paper, we rethink the research paradigm of textual adversarial samples in security scenarios. We discuss the deficiencies of previous work and argue that research on Security-oriented adversarial NLP (SoadNLP) should: (1) evaluate methods on security tasks to demonstrate real-world concerns; (2) consider real-world attackers' goals, instead of developing impractical methods. To this end, we first collect, process, and release a collection of security datasets, Advbench. Then, we reformalize the task and adjust the emphasis on different goals in SoadNLP. Next, we propose a simple method based on heuristic rules that can easily fulfill actual adversarial goals, simulating real-world attack methods. We conduct experiments on both the attack and defense sides on Advbench. Experimental results show that our method has higher practical value, indicating that research in SoadNLP can start from our new benchmark. All code and data for Advbench are available at \url{https://github.com/thunlp/Advbench}.