Over the past few years, it has become increasingly evident that deep neural networks lack the robustness to withstand adversarial perturbations in input data, leaving them vulnerable to attack. Numerous strong adversarial attacks have been proposed for both computer vision and Natural Language Processing (NLP) tasks, and in response, many defense mechanisms have been developed to keep these networks from failing. Defending neural networks against adversarial attacks matters because it ensures that a model's predictions remain unchanged even when the input is perturbed. Several adversarial defense methods have been proposed for NLP, covering tasks such as text classification, named entity recognition, and natural language inference. Some of these methods not only defend against adversarial attacks but also act as a regularizer during training, protecting the model from overfitting. This survey reviews the adversarial defense methods proposed for NLP in recent years and organizes them under a novel taxonomy. It also highlights the fragility of advanced deep neural networks in NLP and the challenges involved in defending them.