Adversarial training, a method for learning robust deep neural networks, constructs adversarial examples during training. However, recent methods for generating NLP adversarial examples involve combinatorial search and expensive sentence encoders for constraining the generated instances. As a result, it remains challenging to use vanilla adversarial training to improve NLP models' performance, and its benefits remain largely uninvestigated. This paper proposes a simple and improved vanilla adversarial training process for NLP models, which we name Attacking to Training (A2T). The core part of A2T is a new and cheaper word substitution attack optimized for vanilla adversarial training. We use A2T to train BERT and RoBERTa models on the IMDB, Rotten Tomatoes, Yelp, and SNLI datasets. Our results empirically show that it is possible to train robust NLP models using a much cheaper adversary. We demonstrate that vanilla adversarial training with A2T can improve an NLP model's robustness to the attack it was originally trained with and also defend the model against other types of word substitution attacks. Furthermore, we show that A2T can improve NLP models' standard accuracy, cross-domain generalization, and interpretability. Code is available at https://github.com/QData/Textattack-A2T.
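To make the idea of a cheap word substitution adversary concrete, here is a minimal, self-contained sketch of a greedy deletion-ranked substitution attack. It is illustrative only: the names (`toy_score`, `SYNONYMS`, `greedy_substitution_attack`) are assumptions, and the toy scoring function stands in for a real model's class probability. A2T itself ranks word importance by gradients and constrains replacements with counter-fitted embeddings, which this sketch does not implement.

```python
# Hedged sketch of a greedy word-substitution attack in the spirit of a
# cheap NLP adversary. All names here are illustrative, not the paper's API.

# Toy synonym table (a real attack would use embedding nearest neighbors).
SYNONYMS = {
    "good": ["great", "fine"],
    "movie": ["film", "picture"],
}

def toy_score(words):
    """Stand-in for a model's positive-class probability: here, simply
    the fraction of tokens that match a hard-coded positive cue word."""
    positive = {"good"}
    return sum(w in positive for w in words) / max(len(words), 1)

def greedy_substitution_attack(words, score_fn, synonyms):
    """Greedily replace words, most important first, to lower the score.
    Importance of word i = score drop when word i is deleted — a cheap
    proxy for the gradient-based ranking a faster adversary could use."""
    base = score_fn(words)
    importance = []
    for i in range(len(words)):
        reduced = words[:i] + words[i + 1:]
        importance.append((base - score_fn(reduced), i))
    adv = list(words)
    for _, i in sorted(importance, reverse=True):
        best_word, best_score = adv[i], score_fn(adv)
        for cand in synonyms.get(adv[i], []):
            trial = adv[:i] + [cand] + adv[i + 1:]
            s = score_fn(trial)
            if s < best_score:  # keep the substitution that hurts most
                best_word, best_score = cand, s
        adv[i] = best_word
    return adv

adv = greedy_substitution_attack("a good movie".split(), toy_score, SYNONYMS)
# 'good' is the most important word and gets substituted first.
```

In a vanilla adversarial training loop, examples perturbed this way would be mixed into each training batch alongside the clean examples; the cost of the adversary, not the training step, is what dominates, which is why a cheaper attack matters.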