Gradient-based adversarial training is widely used to improve the robustness of neural networks, but it cannot be easily adapted to natural language processing tasks because text inputs are discrete and cannot be perturbed by gradients directly. Instead, virtual adversarial training, which generates perturbations in the embedding space, has been introduced for NLP tasks. Despite its success, existing virtual adversarial training methods generate perturbations that are only coarsely constrained by a Frobenius-norm ball over the whole sequence. To craft fine-grained perturbations, we propose a Token-Aware Virtual Adversarial Training method. We introduce a token-level accumulated perturbation vocabulary to initialize the perturbations better, and a token-level normalization ball to constrain each token's perturbation individually. Experiments show that our method improves the performance of pre-trained models such as BERT and ALBERT on a variety of tasks by a considerable margin. The proposed method raises the GLUE benchmark score of the BERT model from 78.3 to 80.9, and it also improves performance on sequence labeling and text classification tasks.
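To make the token-level idea concrete, the following minimal PyTorch sketch (our own illustration under stated assumptions, not the released implementation) shows how per-token perturbations could be initialized from an accumulated perturbation vocabulary and then projected onto per-token L2 balls rather than a single sequence-level Frobenius-norm ball. The names `perturb_vocab`, `token_ids`, and `epsilon` are hypothetical.

```python
# Minimal sketch, assuming a PyTorch setup; not the authors' implementation.
import torch


def init_token_perturbation(token_ids, perturb_vocab, embed_dim, epsilon):
    """Initialize a perturbation per token.

    token_ids: LongTensor of shape (batch, seq_len)
    perturb_vocab: dict mapping token id -> accumulated perturbation vector
                   (the hypothetical token-level perturbation vocabulary)
    Tokens absent from the vocabulary fall back to a small random init.
    """
    batch, seq_len = token_ids.shape
    delta = torch.zeros(batch, seq_len, embed_dim)
    for b in range(batch):
        for t in range(seq_len):
            tid = int(token_ids[b, t])
            if tid in perturb_vocab:
                delta[b, t] = perturb_vocab[tid]
            else:
                delta[b, t].uniform_(-epsilon, epsilon)
    return delta


def project_token_norm_ball(delta, epsilon):
    """Constrain each token's perturbation to its own L2 ball of radius epsilon,
    instead of bounding the Frobenius norm of the whole sequence at once."""
    norms = delta.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
    scale = torch.clamp(epsilon / norms, max=1.0)
    return delta * scale
```

In this sketch the projection is applied independently to every token position, which is the property the token-level normalization ball is meant to provide; how the accumulated vocabulary is updated across training steps is not shown here.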