Masked language modeling (MLM) plays a key role in pretraining large language models, but the MLM objective is often dominated by high-frequency words that are suboptimal for learning factual knowledge. In this work, we propose an approach for steering MLM pretraining so as to improve language model performance on a variety of knowledge-intensive tasks: we force the language model to prioritize informative words in a fully unsupervised way. Experiments demonstrate that the proposed approach significantly improves the performance of pretrained language models on tasks such as factual recall, question answering, sentiment analysis, and natural language inference in a closed-book setting.
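The abstract does not specify how informativeness is estimated; as a minimal sketch of one unsupervised way to bias masking toward informative words, the snippet below up-weights the mask probability of rare tokens using inverse unigram frequency. The scoring function, the `mask_rate` parameter, and the `informative_masking` helper are illustrative assumptions, not the authors' actual method.

```python
import collections
import math
import random

def informative_masking(tokens, corpus_freq, total_count,
                        mask_rate=0.15, mask_token="[MASK]"):
    """Mask tokens with probability proportional to an unsupervised
    informativeness score, approximated here by inverse corpus frequency
    (an assumption for illustration). Rare, content-bearing words are
    masked more often than frequent function words, so the MLM loss
    focuses on knowledge-bearing tokens."""
    # Informativeness score: negative log of add-one-smoothed unigram probability.
    scores = [-math.log((corpus_freq.get(t, 0) + 1) / (total_count + len(corpus_freq)))
              for t in tokens]
    total = sum(scores)
    # Scale scores so the expected fraction of masked tokens stays near mask_rate.
    probs = [min(1.0, mask_rate * len(tokens) * s / total) for s in scores]
    return [mask_token if random.random() < p else t
            for t, p in zip(tokens, probs)]

# Toy usage: "paris" is rarer than "the" in the toy corpus, so it is masked more often.
corpus = "the capital of france is paris . the the the of of is is".split()
freq = collections.Counter(corpus)
total = sum(freq.values())
print(informative_masking("the capital of france is paris".split(), freq, total))
```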