Deep neural networks are vulnerable to adversarial attacks, where a small perturbation to an input alters the model prediction. In many cases, malicious inputs intentionally crafted for one model can fool another model. In this paper, we present the first study to systematically investigate the transferability of adversarial examples for text classification models and explore how various factors, including network architecture, tokenization scheme, word embedding, and model capacity, affect the transferability of adversarial examples. Based on these studies, we propose a genetic algorithm to find an ensemble of models that can be used to induce adversarial examples to fool almost all existing models. Such adversarial examples reflect the defects of the learning process and the data bias in the training set. Finally, we derive word replacement rules that can be used for model diagnostics from these adversarial examples.
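To make the ensemble-search idea concrete, below is a minimal sketch (not the authors' released code) of a genetic algorithm that selects a subset of surrogate models whose jointly crafted adversarial examples transfer well. The candidate pool `CANDIDATE_MODELS`, the fitness function `transfer_rate`, and all hyperparameters are illustrative assumptions; in practice, fitness would be measured by attacking the ensemble and counting how often the resulting adversarial examples also fool held-out victim models.

```python
import random

# Hypothetical pool of surrogate text classifiers (names are placeholders).
CANDIDATE_MODELS = ["bert", "roberta", "word-cnn", "char-cnn", "lstm", "bilstm"]


def transfer_rate(ensemble):
    """Placeholder fitness. In a real setup: craft adversarial examples against
    the ensemble and return the fraction that also fool unseen victim models.
    Here we return a deterministic dummy score with a mild size penalty."""
    rng = random.Random(hash(ensemble))
    return rng.random() - 0.05 * len(ensemble)


def random_mask():
    # A binary mask over the candidate pool encodes one ensemble.
    return tuple(random.randint(0, 1) for _ in CANDIDATE_MODELS)


def decode(mask):
    return tuple(m for m, bit in zip(CANDIDATE_MODELS, mask) if bit)


def crossover(a, b):
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(mask, rate=0.1):
    return tuple(bit ^ 1 if random.random() < rate else bit for bit in mask)


def genetic_search(pop_size=20, generations=30):
    population = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank ensembles by (estimated) transferability of their attacks.
        ranked = sorted(population, key=lambda m: transfer_rate(decode(m)), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda m: transfer_rate(decode(m)))
    return decode(best)


if __name__ == "__main__":
    print("Selected surrogate ensemble:", genetic_search())
```

The binary-mask encoding keeps the search space small (2^N for N candidate models) and lets standard crossover and mutation operators apply directly; any other fitness estimate or selection scheme could be swapped in under the same structure.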