Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy on several adversarial datasets for Natural Language Inference (NLI) and Question Answering (QA) tasks.
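To make the shape of the objective concrete, here is a sketch reconstructed from the abstract (the exact mutual-information estimators and weighting scheme are not given there; \(\alpha\) and \(\beta\) are assumed trade-off hyperparameters):

\[
\max_{\theta}\; I(Y;T)\;-\;\beta\, I(X;T)\;+\;\alpha \sum_{i} I\!\left(\tilde{t}_i;\, Z\right),
\]

where \(X\) is the input, \(T = f_{\theta}(X)\) its learned representation, \(Y\) the task label, \(\tilde{t}_i\) the local robust features, and \(Z\) the global feature. The \(-\beta\, I(X;T)\) term plays the role of the Information Bottleneck regularizer, compressing away noisy input information, while the \(\alpha\)-weighted sum plays the role of the Robust Feature regularizer, aligning local robust features with the global representation.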