Recently, textual adversarial attack models have become increasingly popular due to their success in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategy (e.g., word-level or sentence-level), which is insufficient to explore the holistic textual space for generation. (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address these problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model that effectively generates high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning-based method to train a multi-granularity attack agent through behavior cloning with expert knowledge from our MAYA algorithm, which further reduces the number of queries. Additionally, we adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings on three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the number of queries in both attack settings. Our codes are released at https://github.com/Yangyi-Chen/MAYA.
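To make the behavior-cloning step concrete, the following is a minimal sketch of how an attack agent could be distilled from expert demonstrations: the agent is trained with supervised learning on (state, action) pairs collected by running the expert attack algorithm offline. All names here (AttackAgent, behavior_clone, the feature dimensions, and the random toy data) are hypothetical illustrations, not the MAYA implementation.

```python
# Hypothetical behavior-cloning sketch in PyTorch; not the MAYA codebase.
import torch
import torch.nn as nn

class AttackAgent(nn.Module):
    """Scores candidate modification actions given a state feature vector."""
    def __init__(self, feat_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )

    def forward(self, state_feats: torch.Tensor) -> torch.Tensor:
        # Returns logits over candidate attack actions for each state.
        return self.net(state_feats)

def behavior_clone(agent, expert_states, expert_actions, epochs=10, lr=1e-3):
    """Supervised imitation: fit the agent to the expert's chosen actions."""
    opt = torch.optim.Adam(agent.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(agent(expert_states), expert_actions)
        loss.backward()
        opt.step()
    return agent

# Toy usage with random tensors standing in for expert attack trajectories.
states = torch.randn(256, 64)           # 256 states, 64-dim features (assumed)
actions = torch.randint(0, 10, (256,))  # expert-chosen action indices (assumed)
agent = behavior_clone(AttackAgent(64, 10), states, actions)
```

Once trained this way, the agent can select modifications directly, without querying the victim model for every candidate, which is how behavior cloning reduces query counts at attack time.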