The field of adversarial textual attacks has grown significantly over the last few years. The commonly considered objective is to craft adversarial examples (AEs) that successfully fool the target model. However, the imperceptibility of attacks, which is also essential for practical attackers, is often left out of previous studies. As a consequence, the crafted AEs tend to have obvious structural and semantic differences from the original human-written texts, making them easily perceptible. In this work, we advocate leveraging multi-objectivization to address this issue. Specifically, we formulate the problem of crafting AEs as a multi-objective optimization problem, in which the imperceptibility of attacks is treated as a set of auxiliary objectives. Then, we propose a simple yet effective evolutionary algorithm, dubbed HydraText, to solve this problem. To the best of our knowledge, HydraText is currently the only approach that can be effectively applied to both score-based and decision-based attack settings. Extensive experiments involving 44237 instances demonstrate that HydraText consistently achieves competitive attack success rates and better attack imperceptibility than recently proposed attack approaches. A human evaluation study also shows that the AEs crafted by HydraText are harder to distinguish from human-written texts. Finally, these AEs exhibit good transferability and can bring notable robustness improvement to the target model through adversarial training.
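To make the multi-objective formulation concrete, the following is a minimal sketch of Pareto dominance over candidate AEs. It is not HydraText itself: the abstract does not specify the objectives, so the three used here (an attack-success indicator, the negated number of modified words, and a semantic-similarity score) and the names `dominates` and `pareto_front` are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector for one candidate AE, all to be maximized:
# (attack_success, -num_modified_words, semantic_similarity).
# These objectives are illustrative assumptions, not taken from the paper.
Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` on every objective
    and strictly better on at least one (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Made-up candidates for illustration only.
candidates = [
    (1.0, -2.0, 0.91),  # successful, 2 words changed, high similarity
    (1.0, -5.0, 0.80),  # successful, but more edits and lower similarity
    (0.0, -1.0, 0.97),  # unsuccessful, yet minimal and very similar
    (1.0, -2.0, 0.85),  # successful, same edits, lower similarity
]
front = pareto_front(candidates)
```

In this toy setting the second and fourth candidates are dominated by the first, while the unsuccessful third candidate survives because it is better on both imperceptibility objectives. An evolutionary search over such a front can keep less perceptible candidates alive as stepping stones, which is the general motivation behind treating imperceptibility as auxiliary objectives.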