Natural language processing (NLP) models are now widely used in a variety of scenarios. However, like all deep models, NLP models are vulnerable to adversarially generated text. Numerous works have sought to mitigate this vulnerability, yet no existing defense is comprehensive: each targets a specific attack category or suffers from limitations such as high computational overhead or susceptibility to adaptive attacks. In this paper, we exhaustively investigate adversarial attack algorithms in NLP, and our empirical studies reveal that these algorithms mainly disrupt the importance distribution of words in a text. A well-trained model can distinguish the subtle differences in importance distribution between clean and adversarial texts. Based on this intuition, we propose TextDefense, a new adversarial example detection framework that leverages the target model's own capability to defend against adversarial attacks while requiring no prior knowledge of the attack. Unlike previous approaches, TextDefense uses the target model itself for detection and is therefore attack-type agnostic. Our extensive experiments show that TextDefense can be applied to different architectures, datasets, and attack methods, and that it outperforms existing methods. We also find that the leading factor influencing TextDefense's performance is the target model's generalizability. By analyzing the properties of the target model and of adversarial examples, we offer insights into adversarial attacks in NLP and the principles underlying our defense method.
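The abstract's core signal is the importance distribution of words under the target model. A minimal sketch of one common way to compute such a distribution is leave-one-out occlusion: the importance of a word is the drop in the model's confidence when that word is removed. The `predict_proba` callable and the toy model below are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical sketch: a leave-one-out word-importance distribution of the
# kind the abstract describes. `predict_proba` stands in for the target
# model's confidence on the predicted class; it is an assumption, not the
# paper's API.

def importance_distribution(tokens, predict_proba):
    """Return one normalized importance score per token.

    Importance of token i = drop in model confidence when token i
    is occluded (removed) from the input.
    """
    base = predict_proba(tokens)
    drops = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        drops.append(max(base - predict_proba(occluded), 0.0))
    total = sum(drops) or 1.0  # avoid division by zero on flat inputs
    return [d / total for d in drops]


# Toy stand-in model: confidence grows with occurrences of "good".
def toy_model(tokens):
    return min(1.0, 0.2 * tokens.count("good"))


dist = importance_distribution(["this", "movie", "is", "good"], toy_model)
```

A detector in the spirit of TextDefense would then compare such distributions between clean and suspect inputs, since adversarial edits tend to concentrate or skew the importance mass.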