Natural language processing (NLP) models are now widely used in a variety of scenarios. However, like all deep models, NLP models are vulnerable to adversarially generated text. Numerous works have sought to mitigate this vulnerability, yet no existing defense is comprehensive: each targets a specific attack category or suffers from limitations such as high computational overhead or susceptibility to adaptive attacks. In this paper, we exhaustively investigate adversarial attack algorithms in NLP, and our empirical studies reveal that these algorithms mainly disrupt the importance distribution of words in a text. A well-trained model can distinguish the subtle differences in importance distribution between clean and adversarial texts. Based on this intuition, we propose TextDefense, a new adversarial example detection framework that leverages the target model's own capability to defend against adversarial attacks while requiring no prior knowledge of the attack. Unlike previous approaches, TextDefense uses the target model itself for detection and is therefore attack-type agnostic. Our extensive experiments show that TextDefense can be applied to different architectures, datasets, and attack methods, and that it outperforms existing methods. We also find that the leading factor influencing TextDefense's performance is the target model's generalizability. By analyzing the properties of the target model and of adversarial examples, we offer insights into adversarial attacks in NLP and the principles underlying our defense method.
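The abstract's core signal is the importance distribution of words under the target model. A minimal sketch of one common way to compute such a distribution is leave-one-out occlusion: the importance of a word is the drop in the model's confidence when that word is removed. The `predict_proba` callable and the toy model below are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hypothetical sketch: a leave-one-out word-importance distribution of the
# kind the abstract describes. `predict_proba` stands in for the target
# model's confidence on the predicted class; it is an assumption, not the
# paper's API.

def importance_distribution(tokens, predict_proba):
    """Return one normalized importance score per token.

    Importance of token i = drop in model confidence when token i
    is occluded (removed) from the input.
    """
    base = predict_proba(tokens)
    drops = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + tokens[i + 1:]
        drops.append(max(base - predict_proba(occluded), 0.0))
    total = sum(drops) or 1.0  # avoid division by zero on flat inputs
    return [d / total for d in drops]


# Toy stand-in model: confidence grows with occurrences of "good".
def toy_model(tokens):
    return min(1.0, 0.2 * tokens.count("good"))


dist = importance_distribution(["this", "movie", "is", "good"], toy_model)
```

A detector in the spirit of TextDefense would then compare such distributions between clean and suspect inputs, since adversarial edits tend to concentrate or skew the importance mass.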