Deep neural networks are widely applied to real-world tasks and have achieved satisfactory results in domains such as computer vision, image classification, and natural language processing (NLP). At the same time, the security and robustness of neural networks have become a pressing concern, as numerous studies have exposed their vulnerabilities. For example, in NLP tasks a neural network can be fooled by a carefully modified text that remains highly similar to the original. Most previous studies have focused on the image domain; however, because text is represented as a discrete sequence, traditional image attack methods are not applicable to NLP. In this paper, we propose a word-level attack model for NLP sentiment classifiers, which combines a word selection method based on a self-attention mechanism with a greedy search algorithm for word substitution. We evaluate our attack model against GRU and 1D-CNN victim models on the IMDB dataset. Experimental results demonstrate that our model achieves a higher attack success rate and is more efficient than previous methods, owing to its efficient word selection algorithm and the minimized number of word substitutions. In addition, our model is transferable and can be applied to the image domain with several modifications.
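The two-stage idea described in the abstract (rank words by importance, then greedily substitute them) can be illustrated with a minimal sketch. This is not the authors' implementation: the importance scores are hand-set stand-ins for self-attention weights, the candidate substitutes are a hypothetical synonym table, and `toy_pos_prob` is a toy lexicon scorer standing in for the GRU or 1D-CNN victim model.

```python
import math
from typing import Callable, Dict, List, Sequence

# Toy victim model (assumption): a lexicon-based sentiment scorer, not a GRU or 1D-CNN.
TOY_WEIGHTS = {"great": 2.0, "good": 1.0, "fine": 0.2, "okay": 0.1, "boring": -1.5}


def toy_pos_prob(words: Sequence[str]) -> float:
    """Return the toy model's probability that the text is positive."""
    score = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))


def greedy_word_attack(
    words: Sequence[str],
    importance: Sequence[float],
    candidates: Dict[str, List[str]],
    predict_pos_prob: Callable[[Sequence[str]], float],
    max_subs: int,
) -> List[str]:
    """Greedily replace the most important words until the predicted label flips."""
    adv = list(words)
    originally_positive = predict_pos_prob(adv) >= 0.5

    def confidence(ws: Sequence[str]) -> float:
        # Probability the model still assigns to the original label.
        p = predict_pos_prob(ws)
        return p if originally_positive else 1.0 - p

    # Word selection: visit positions from most to least important.
    # In the paper this ranking comes from self-attention; here it is an input.
    order = sorted(range(len(adv)), key=lambda i: importance[i], reverse=True)

    subs = 0
    for i in order:
        if subs >= max_subs:
            break
        best_word, best_conf = adv[i], confidence(adv)
        # Greedy step: keep the candidate that lowers confidence the most.
        for cand in candidates.get(adv[i], []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            c = confidence(trial)
            if c < best_conf:
                best_word, best_conf = cand, c
        if best_word != adv[i]:
            adv[i] = best_word
            subs += 1
            if best_conf < 0.5:  # label flipped: the attack succeeded
                break
    return adv


# Hypothetical example input, importance scores, and synonym table.
text = ["a", "great", "movie", "with", "good", "acting"]
scores = [0.05, 0.5, 0.1, 0.05, 0.25, 0.05]
synonyms = {"great": ["fine", "okay"], "good": ["okay", "boring"]}

adversarial = greedy_word_attack(text, scores, synonyms, toy_pos_prob, max_subs=2)
print(adversarial, round(toy_pos_prob(adversarial), 3))
```

Visiting only the highest-ranked positions keeps the number of model queries and substitutions small, which is the efficiency argument the abstract makes. A realistic attack would also constrain candidates to preserve similarity to the original text.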