Recent studies have shown that deep neural networks are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend neural NLP models against adversarial word-substitution attacks. However, there has been no systematic study comparing different defense approaches under the same attack setting. In this paper, we seek to fill this gap through a comprehensive study of the behavior of neural text classifiers trained with various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks, achieving the highest accuracy on both clean and adversarial examples on the AGNEWS and IMDB datasets by a significant margin.