Pretrained language models (PLMs) perform poorly under adversarial attacks. To improve adversarial robustness, adversarial data augmentation (ADA) has been widely adopted: it covers more of the attack search space by adding textual adversarial examples during training. However, the number of adversarial examples used for augmentation remains extremely insufficient because the attack search space is exponentially large. In this work, we propose a simple and effective method, Adversarial and Mixup Data Augmentation (AMDA), to cover a much larger proportion of the attack search space. Specifically, AMDA linearly interpolates the representations of pairs of training samples to form new virtual samples, which are more abundant and diverse than the discrete text adversarial examples of conventional ADA. Moreover, to evaluate the robustness of different models fairly, we adopt a challenging evaluation setup that generates a new set of adversarial examples targeting each model. In text classification experiments with BERT and RoBERTa, AMDA achieves significant robustness gains under two strong adversarial attacks and alleviates the performance degradation that ADA causes on clean data. Our code is released at https://github.com/thunlp/MixADA.
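The linear interpolation that AMDA performs can be sketched as standard mixup over a pair of sample representations and their one-hot labels. This is a minimal illustration, not the authors' implementation: in AMDA the vectors would be hidden representations from the PLM, and the function name `mixup_pair`, the use of plain Python lists, and the Beta(1, 1) draw for the mixing coefficient are assumptions made for the sketch.

```python
import random

def mixup_pair(rep_a, rep_b, label_a, label_b, lam=None):
    """Form a virtual sample by linearly interpolating two samples'
    representations and one-hot labels (mixup):
        x_mix = lam * x_a + (1 - lam) * x_b
        y_mix = lam * y_a + (1 - lam) * y_b
    `lam` is the mixing coefficient; drawing it from a Beta distribution
    (alpha = 1 here) is an assumption of this sketch."""
    if lam is None:
        lam = random.betavariate(1.0, 1.0)
    rep_mix = [lam * a + (1 - lam) * b for a, b in zip(rep_a, rep_b)]
    label_mix = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return rep_mix, label_mix

# Mix a clean sample with an adversarial counterpart (toy 3-d vectors
# stand in for PLM hidden representations).
rep, label = mixup_pair([1.0, 0.0, 2.0], [0.0, 2.0, 0.0],
                        [1.0, 0.0], [0.0, 1.0], lam=0.5)
# rep == [0.5, 1.0, 1.0], label == [0.5, 0.5]
```

Because the virtual samples live in a continuous representation space, they can be generated in far greater numbers than discrete textual adversarial examples, which is the coverage advantage the abstract describes.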