Pre-trained language models (PLMs) like BERT have made significant progress on various downstream NLP tasks. However, by asking models to complete cloze-style tests, recent work finds that PLMs fall short in acquiring knowledge from unstructured text. To understand the internal behaviour of PLMs when retrieving knowledge, we first define knowledge-baring (K-B) tokens and knowledge-free (K-F) tokens for unstructured text and ask professional annotators to label some samples manually. We then find that PLMs are more likely to give wrong predictions on K-B tokens and pay less attention to those tokens inside the self-attention module. Based on these observations, we develop two solutions to help the model learn more knowledge from unstructured text in a fully self-supervised manner. Experiments on knowledge-intensive tasks show the effectiveness of the proposed methods. To the best of our knowledge, we are the first to explore fully self-supervised learning of knowledge in continual pre-training.
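The abstract reports that K-B tokens receive less self-attention than K-F tokens. As a minimal illustrative sketch (not the paper's implementation), the snippet below shows one way such an analysis could be run with Hugging Face Transformers: the example sentence, the word-level K-B/K-F labels, and the choice to average attention over all layers and heads are assumptions made purely for illustration.

```python
# Hypothetical sketch: compare the self-attention mass received by
# knowledge-baring (K-B) vs. knowledge-free (K-F) tokens in BERT.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

# Illustrative manual annotation (assumed): 1 = K-B word, 0 = K-F word.
words = "Barack Obama was born in Hawaii in 1961 .".split()
kb_labels = [1, 1, 0, 0, 0, 1, 0, 1, 0]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# Average attention over layers, batch, and heads -> (seq_len, seq_len),
# then sum over query positions to get the mass each token receives.
attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))
received = attn.sum(dim=0)

kb_mass, kf_mass, kb_n, kf_n = 0.0, 0.0, 0, 0
for idx, wid in enumerate(enc.word_ids()):
    if wid is None:          # skip [CLS] / [SEP]
        continue
    if kb_labels[wid] == 1:
        kb_mass += received[idx].item(); kb_n += 1
    else:
        kf_mass += received[idx].item(); kf_n += 1

print(f"mean attention received: K-B {kb_mass / kb_n:.4f}  K-F {kf_mass / kf_n:.4f}")
```

Under the paper's observation, one would expect the K-B average to come out lower than the K-F average on text the model handles poorly; the labels above are only a toy example of the manual annotation the authors describe.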