The discovery of universal adversarial perturbations has had a large theoretical and practical impact on the field of adversarial learning. In the text domain, most studies of universal attacks have focused on adversarial prefixes that are added to all texts. However, unlike in the vision domain, adding the same perturbation to different inputs results in noticeably unnatural texts. Therefore, we introduce a new universal adversarial setup - a universal adversarial policy - which retains many of the advantages of other universal attacks while also producing valid texts, making it relevant in practice. We achieve this by learning a single search policy over a predefined set of semantics-preserving text alterations, trained on many texts. The formulation is universal in that the learned policy efficiently finds adversarial examples on new, unseen texts. Our approach relies on text perturbations (specific synonym replacements) that have been extensively shown to produce natural attacks in the non-universal setup. We propose a strong baseline approach for this formulation based on reinforcement learning. Its ability to generalise from as few as 500 training texts shows that universal adversarial patterns exist in the text domain as well.
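To make the setup concrete, the following is a minimal sketch of the kind of approach the abstract describes: a policy trained with REINFORCE to choose which word positions to replace with synonyms so that a victim classifier's confidence in the original label drops. It is not the authors' implementation; the synonym table, the token featurizer, and the victim classifier are hypothetical stand-ins used only for illustration.

```python
# Minimal illustrative sketch (not the paper's code): a REINFORCE-trained policy
# that picks word positions to replace with synonyms, rewarded by the drop in a
# (stand-in) victim classifier's confidence in the original label.
import random
import torch
import torch.nn as nn

# Toy set of semantics-preserving alterations (hypothetical).
SYNONYMS = {"good": ["fine", "decent"], "bad": ["poor", "awful"],
            "movie": ["film", "picture"]}

class Policy(nn.Module):
    """Scores each token position; higher score = more likely to be perturbed."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, token_feats):                # (seq_len, dim) -> (seq_len,)
        return self.score(token_feats).squeeze(-1)

def embed(tokens, dim=16):
    """Placeholder featurizer: hash-seeded pseudo-random embedding per token."""
    feats = []
    for t in tokens:
        g = torch.Generator().manual_seed(hash(t) % (2 ** 31))
        feats.append(torch.randn(dim, generator=g))
    return torch.stack(feats)

def victim_confidence(tokens):
    """Stand-in black-box classifier: confidence in the original label."""
    return 0.9 - 0.2 * sum(t in {"fine", "decent", "poor", "awful"} for t in tokens)

def attack_episode(policy, tokens, max_steps=3):
    """One attack episode: pick positions, swap synonyms, reward = confidence drop."""
    tokens = list(tokens)
    base = victim_confidence(tokens)
    log_probs = []
    for _ in range(max_steps):
        valid = [i for i, t in enumerate(tokens) if t in SYNONYMS]
        if not valid:                              # no replaceable position left
            break
        scores = policy(embed(tokens))
        mask = torch.tensor([0.0 if i in valid else -1e9 for i in range(len(tokens))])
        dist = torch.distributions.Categorical(logits=scores + mask)
        pos = dist.sample()
        log_probs.append(dist.log_prob(pos))
        i = int(pos)
        tokens[i] = random.choice(SYNONYMS[tokens[i]])
    reward = base - victim_confidence(tokens)
    return torch.stack(log_probs), reward

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
train_texts = [["a", "good", "movie"], ["a", "bad", "movie"]]   # stand-in corpus
for _ in range(50):
    for text in train_texts:
        log_probs, reward = attack_episode(policy, text)
        loss = -(log_probs.sum() * reward)         # REINFORCE objective
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because a single policy is trained across many texts, the same learned search strategy can then be applied to new, unseen inputs, which is what makes the attack universal in this formulation.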