Research on adversarial attacks in the text domain has attracted considerable interest in recent years, and many methods with high attack success rates have been proposed. However, these methods are inefficient because they require a large number of queries to the victim model when crafting text adversarial examples. In this paper, we propose a novel attack model whose attack success rate surpasses that of the benchmark attack methods and, more importantly, whose attack efficiency is much higher. The new method is evaluated empirically by attacking WordCNN, LSTM, BiLSTM, and BERT on four benchmark datasets. For instance, when attacking BERT and BiLSTM on IMDB, it achieves a 100% attack success rate, higher than that of the state-of-the-art method, while requiring only 1/4 and 1/6.5 as many queries to the victim models, respectively. Further experiments also show that the adversarial examples generated by the new method transfer well to other models.