Adversarial attacks expose important blind spots of deep learning systems. While word- and sentence-level attack scenarios mostly deal with finding semantic paraphrases of the input that fool NLP models, character-level attacks typically insert typos into the input stream. It is commonly thought that these are easier to defend against via spelling correction modules. In this work, we show that both a standard spellchecker and the approach of Pruthi et al. (2019), which is trained to defend against insertions, deletions, and swaps, perform poorly on the character-level benchmark recently proposed by Eger and Benz (2020), which includes more challenging attacks such as visual and phonetic perturbations and missing word segmentations. In contrast, we show that an untrained iterative approach that combines context-independent character-level information with context-dependent information from BERT's masked language modeling can perform on par with human crowd-workers from Amazon Mechanical Turk (AMT) supervised via 3-shot learning.
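The following is a minimal sketch of one plausible instantiation of such a training-free iterative repair loop, not the paper's exact method: it assumes `bert-base-uncased` as the masked language model, a normalized Levenshtein similarity as the context-independent character-level signal, and an illustrative convex combination with weight `alpha` over a candidate pool of size `top_k` (both hyperparameters are assumptions for this sketch).

```python
# Sketch: iterative, training-free repair combining BERT masked-LM scores
# (context-dependent) with character-level edit similarity (context-independent).
# alpha, top_k, and the scoring rule are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def edit_distance(a: str, b: str) -> int:
    # Standard Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def char_similarity(a: str, b: str) -> float:
    # Normalized edit similarity in [0, 1]; 1.0 means identical strings.
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

def repair(words, top_k=50, alpha=0.5, max_iters=3):
    # Iteratively re-predict each word: mask it, let the MLM propose
    # candidates, and re-rank them by a convex combination of MLM
    # probability and character-level similarity to the observed word.
    for _ in range(max_iters):
        changed = False
        for i, observed in enumerate(words):
            masked = words[:i] + [tokenizer.mask_token] + words[i + 1:]
            inputs = tokenizer(" ".join(masked), return_tensors="pt")
            mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
            with torch.no_grad():
                logits = model(**inputs).logits[0, mask_pos]
            probs = torch.softmax(logits, dim=-1)
            top = torch.topk(probs, top_k)
            # Keeping the observed word scores full character similarity (1.0),
            # so a replacement must beat this baseline to be accepted.
            best_word, best_score = observed, alpha * 1.0
            for p, idx in zip(top.values.tolist(), top.indices.tolist()):
                cand = tokenizer.convert_ids_to_tokens(idx)
                if cand.startswith("##") or not cand.isalpha():
                    continue  # skip subword pieces and punctuation
                score = (1 - alpha) * p + alpha * char_similarity(cand, observed.lower())
                if score > best_score:
                    best_word, best_score = cand, score
            if best_word != observed:
                words[i] = best_word
                changed = True
        if not changed:
            break  # converged: a full pass left the sentence unchanged
    return words

print(" ".join(repair("a languagc modcl can fix typos".split())))
```

Treating the observed word as the default candidate guards against over-correcting clean input, and iterating lets early corrections provide cleaner context for later ones.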