Natural language processing models based on neural networks are vulnerable to adversarial examples. These adversarial examples are imperceptible to human readers but can mislead models into making wrong predictions. In a black-box setting, an attacker can fool a model without access to its parameters or architecture. Previous work on word-level attacks widely uses a single semantic space and greedy search as the search strategy. However, these methods fail to balance the attack success rate, the quality of adversarial examples, and time consumption. In this paper, we propose BeamAttack, a textual attack algorithm that uses mixed semantic spaces and an improved beam search to craft high-quality adversarial examples. Extensive experiments demonstrate that BeamAttack improves the attack success rate while saving numerous queries and time, e.g., improving the attack success rate by up to 7\% over greedy search when attacking examples from the MR dataset. Compared with heuristic search, BeamAttack saves up to 85\% of model queries while achieving a competitive attack success rate. The adversarial examples crafted by BeamAttack are highly transferable and can effectively improve a model's robustness during adversarial training. Code is available at https://github.com/zhuhai-ustc/beamattack/tree/master
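The core idea of beam search over word substitutions can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the `candidates_for` and `score` callables stand in for the mixed-semantic-space substitute generation and the black-box model query, respectively, and are assumptions for illustration. Setting `beam_width=1` recovers greedy search.

```python
def beam_search_attack(words, positions, candidates_for, score, beam_width=3):
    """Hedged sketch of beam-search word substitution.

    words          -- list of tokens in the input text
    positions      -- order in which word positions are attacked
                      (e.g. sorted by word importance)
    candidates_for -- word -> list of substitutes (stand-in for the
                      mixed semantic spaces used by BeamAttack)
    score          -- token list -> attack score, higher means closer
                      to fooling the victim model (stand-in for a
                      black-box model query)
    """
    beams = [list(words)]
    for pos in positions:
        expanded = []
        for beam in beams:
            expanded.append(beam)  # option: leave this position unchanged
            for sub in candidates_for(beam[pos]):
                new = list(beam)
                new[pos] = sub
                expanded.append(new)
        # Keep the top beam_width candidates; beam_width == 1 is greedy search.
        beams = sorted(expanded, key=score, reverse=True)[:beam_width]
    return beams[0]
```

Keeping multiple beams lets the attack recover from locally suboptimal substitutions that greedy search would commit to, at the cost of more model queries per attacked position.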