Bad Characters: Imperceptible NLP Attacks

Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems from neural machine-translation pipelines to web search engines. We find that with a single imperceptible encoding injection -- representing one invisible character, homoglyph, reordering, or deletion -- an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken. Our attacks work against currently-deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook and IBM. This novel series of attacks presents a significant threat to many language processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that given such systems are now being deployed rapidly at scale, the urgent attention of architects and operators is required.
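The four injection types named in the abstract (invisible character, homoglyph, reordering, deletion) can be illustrated with a short Python sketch. This is not the authors' implementation: the code points, the function names, and the sanitizer are assumptions chosen for illustration, and the reordering shown is only the simplest possible use of Unicode bidirectional controls.

```python
import unicodedata

# Code points picked for illustration; the abstract does not name specific ones.
ZERO_WIDTH_SPACE = "\u200b"    # renders as nothing
CYRILLIC_SMALL_A = "\u0430"    # looks like Latin "a" in many fonts
BACKSPACE = "\u0008"           # some renderers erase the preceding character
RLO, PDF = "\u202e", "\u202c"  # right-to-left override / pop directional formatting


def inject_invisible(text: str, pos: int) -> str:
    """Insert a zero-width character at index pos."""
    return text[:pos] + ZERO_WIDTH_SPACE + text[pos:]


def inject_homoglyph(text: str, pos: int, glyph: str = CYRILLIC_SMALL_A) -> str:
    """Replace the character at index pos with a look-alike glyph."""
    return text[:pos] + glyph + text[pos + 1:]


def inject_deletion(text: str, pos: int, decoy: str = "x") -> str:
    """Insert a decoy character followed by a backspace control."""
    return text[:pos] + decoy + BACKSPACE + text[pos:]


def inject_reordering(text: str, start: int, end: int) -> str:
    """Store text[start:end] reversed, wrapped so a Bidi-aware renderer
    displays it in the original order."""
    return text[:start] + RLO + text[start:end][::-1] + PDF + text[end:]


def strip_invisible(text: str) -> str:
    """Hypothetical sanitizer: drop format (Cf) and control (Cc) characters,
    keeping ordinary line breaks and tabs."""
    keep = {"\n", "\t", "\r"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) not in {"Cf", "Cc"}
    )


def scripts_used(text: str) -> set:
    """Rough script check: first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}


if __name__ == "__main__":
    word = "paypal"
    variants = {
        "invisible": inject_invisible(word, 3),
        "homoglyph": inject_homoglyph(word, 1),
        "deletion": inject_deletion(word, 3),
        "reordering": inject_reordering(word, 0, 3),
    }
    for kind, variant in variants.items():
        print(kind, repr(variant), repr(strip_invisible(variant)), scripts_used(variant))
```

In this sketch, stripping format and control characters restores the original string only for the invisible-character case. For the deletion and reordering cases it leaves the decoy character or the reversed substring in place, which at least makes the perturbation visible, and the homoglyph case passes through unchanged and has to be caught by something like the script check. Removing every Cf character is also too aggressive for real text, since joiners such as U+200D are legitimately used in some scripts and emoji sequences.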

