Although pre-trained language models (PLMs) have achieved state-of-the-art performance on various natural language processing (NLP) tasks, they are shown to be lacking in knowledge when dealing with knowledge-driven tasks. Despite the many efforts made to inject knowledge into PLMs, this problem remains open. To address the challenge, we propose \textbf{DictBERT}, a novel approach that enhances PLMs with dictionary knowledge, which is easier to acquire than a knowledge graph (KG). During pre-training, we present two novel tasks that inject dictionary knowledge into PLMs via contrastive learning: \textit{dictionary entry prediction} and \textit{entry description discrimination}. During fine-tuning, we use the pre-trained DictBERT as a plugin knowledge base (KB) to retrieve implicit knowledge for entries identified in the input sequence, and infuse the retrieved knowledge into the input to enhance its representation via a novel extra-hop attention mechanism. We evaluate our approach on a variety of knowledge-driven and language understanding tasks, including NER, relation extraction, CommonsenseQA, OpenBookQA and GLUE. Experimental results demonstrate that our model can significantly improve typical PLMs: it achieves improvements of 0.5\%, 2.9\%, 9.0\%, 7.1\% and 3.3\% over BERT-large on these tasks, respectively, and is also effective on RoBERTa-large.
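To make the knowledge-infusion step more concrete, the following is a minimal PyTorch sketch of one plausible reading of the extra-hop attention mechanism described above, in which token representations of the input sequence attend to DictBERT encodings of the retrieved dictionary entries and fuse the result back through a residual connection. The module, tensor shapes and fusion scheme are illustrative assumptions, not the paper's actual implementation.

\begin{verbatim}
import torch
import torch.nn as nn

class ExtraHopAttention(nn.Module):
    """Hypothetical sketch: fuse retrieved dictionary-entry
    representations into the input sequence via one extra
    attention hop (illustrative, not the paper's code)."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size)

    def forward(self, seq_hidden, entry_hidden):
        # seq_hidden:   (batch, seq_len, hidden)
        #   encoder output of the input sequence
        # entry_hidden: (batch, num_entries, hidden)
        #   DictBERT encodings of retrieved entries
        q = self.query(seq_hidden)    # queries from input tokens
        k = self.key(entry_hidden)    # keys from retrieved knowledge
        v = self.value(entry_hidden)  # values from retrieved knowledge
        scores = torch.matmul(q, k.transpose(-1, -2))
        scores = scores / (q.size(-1) ** 0.5)
        attn = torch.softmax(scores, dim=-1)   # attend over entries
        knowledge = torch.matmul(attn, v)      # knowledge-aware updates
        # residual fusion of the extra hop into the sequence states
        return self.layer_norm(seq_hidden + knowledge)
\end{verbatim}

Under this reading, the enhanced sequence states returned by the module would replace the original encoder output before the task-specific prediction head is applied during fine-tuning.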