Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness. Existing techniques for generating such examples are typically driven by local heuristic rules that are agnostic to the context, often resulting in unnatural and ungrammatical outputs. This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs through a mask-then-infill procedure. CLARE builds on a pre-trained masked language model and modifies the inputs in a context-aware manner. We propose three contextualized perturbations, Replace, Insert, and Merge, which allow generating outputs of varied lengths. With this richer range of available strategies, CLARE can attack a victim model more efficiently, using fewer edits. Extensive experiments and human evaluation demonstrate that CLARE outperforms the baselines in terms of attack success rate, textual similarity, fluency and grammaticality.
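To make the mask-then-infill procedure concrete, the sketch below shows one way the three contextualized perturbations (Replace, Insert, Merge) could be realized with an off-the-shelf masked language model via the Hugging Face fill-mask pipeline. The model choice (`distilroberta-base`), whitespace tokenization, and `top_k` value are illustrative assumptions rather than the paper's exact setup, and the step where CLARE scores candidates against a victim model under a similarity constraint is omitted.

```python
# Minimal mask-then-infill sketch (assumptions: distilroberta-base as the
# masked LM, whitespace tokenization; candidate selection against a victim
# model is not shown).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilroberta-base")
MASK = fill_mask.tokenizer.mask_token  # "<mask>" for RoBERTa-style models


def replace(tokens, i, top_k=5):
    """Replace: mask token i and let the LM propose in-context substitutes."""
    masked = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]


def insert(tokens, i, top_k=5):
    """Insert: add a mask after token i, lengthening the sentence."""
    masked = " ".join(tokens[:i + 1] + [MASK] + tokens[i + 1:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]


def merge(tokens, i, top_k=5):
    """Merge: collapse the bigram (i, i+1) into one mask, shortening the sentence."""
    masked = " ".join(tokens[:i] + [MASK] + tokens[i + 2:])
    return [p["token_str"].strip() for p in fill_mask(masked, top_k=top_k)]


tokens = "the movie was surprisingly good".split()
print(replace(tokens, 3))  # substitutes for "surprisingly"
print(insert(tokens, 2))   # insertions after "was"
print(merge(tokens, 3))    # single-token merges of "surprisingly good"
```

In the full method, each candidate infill would additionally be filtered for semantic similarity to the original input and chosen to maximally degrade the victim model's prediction; this sketch only illustrates how the masked LM supplies context-aware edit candidates.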