Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable to textual adversarial attacks. We aim to address this problem from an information-theoretic perspective, and propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models. InfoBERT contains two mutual-information-based regularizers for model training: (i) an Information Bottleneck regularizer, which suppresses noisy mutual information between the input and the feature representation; and (ii) a Robust Feature regularizer, which increases the mutual information between local robust features and global features. We provide a principled way to theoretically analyze and improve the robustness of representation learning for language models in both standard and adversarial training. Extensive experiments demonstrate that InfoBERT achieves state-of-the-art robust accuracy on several adversarial datasets for Natural Language Inference (NLI) and Question Answering (QA) tasks. Our code is available at https://github.com/AI-secure/InfoBERT.
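To make the shape of the training objective concrete, here is a minimal sketch in notation assumed only for illustration (the symbols $X$, $T$, $Y$, $T_i$, $Z$, $\alpha$, $\beta$ are not defined in the abstract): with input $X$, its learned representation $T$, label $Y$, local features $T_i$, and a global feature $Z$, the two regularizers combine with the task objective roughly as
\[
\max_{\theta}\; \underbrace{I(Y; T)}_{\text{task term}} \;-\; \beta\, I(X; T) \;+\; \alpha \sum_{i} I(T_i; Z),
\]
where the $-\beta\, I(X;T)$ term plays the role of the Information Bottleneck regularizer that suppresses noisy mutual information between the input and the representation, and the $+\alpha \sum_i I(T_i;Z)$ term plays the role of the Robust Feature regularizer that increases the mutual information between local robust features and the global feature.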