In recent years, deep neural networks have been shown to lack robustness and to be vulnerable to adversarial perturbations of their input data. Strong adversarial attacks have been proposed for tasks in both computer vision and Natural Language Processing (NLP). As a counter-effort, several defense mechanisms have also been proposed to keep these networks from failing. Defending neural networks against adversarial attacks is important in its own right: the goal is to ensure that a model's prediction does not change when the input data is perturbed. Numerous adversarial defense methods for NLP have been proposed recently, covering tasks such as text classification, named entity recognition, and natural language inference. Some of these methods serve not only as defenses against adversarial attacks but also as regularization mechanisms during training, preventing the model from overfitting. This survey reviews the defense methods proposed for NLP in recent years and organizes them under a novel taxonomy. It also highlights the fragility of advanced deep neural networks in NLP and the challenges involved in defending them.
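To make the robustness goal concrete, one common formalization (a generic definition, not tied to any particular method covered in this survey) requires a classifier $f$ to keep its prediction unchanged on every allowed perturbation of a clean input $x$:
\[
f(x') = f(x) \quad \text{for all } x' \in \mathcal{B}(x),
\]
where $\mathcal{B}(x)$ denotes the set of admissible perturbed inputs; in NLP this set is typically discrete, e.g. texts obtained from $x$ by synonym substitutions or small character-level edits that preserve its meaning.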