Language models (LMs) pretrained on large corpora of text from the web have been observed to contain large amounts of various types of knowledge about the world. This observation has led to a new and exciting paradigm in knowledge graph construction where, instead of manual curation or text mining, one extracts knowledge from the parameters of an LM. Recently, it has been shown that finetuning LMs on a set of factual knowledge makes them produce better answers to queries from a different set, thus making finetuned LMs a good candidate for knowledge extraction and, consequently, knowledge graph construction. In this paper, we analyze finetuned LMs for factual knowledge extraction. We show that along with its previously known positive effects, finetuning also leads to a (potentially harmful) phenomenon which we call Frequency Shock, where at test time the model over-predicts rare entities that appear in the training set and under-predicts common entities that do not appear in the training set enough times. We show that Frequency Shock leads to a degradation in the model's predictions, and that beyond a point, the harm from Frequency Shock can even outweigh the positive effects of finetuning, making finetuning harmful overall. We then consider two solutions to remedy the identified negative effect: (1) model mixing and (2) mixture finetuning with the LM's pre-training task. The two solutions combined lead to significant improvements compared to vanilla finetuning.
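To make the first remedy concrete, below is a minimal sketch of one plausible reading of "model mixing": interpolating the next-token distributions of the finetuned LM and the original pretrained LM, so the pretrained model's corpus-level entity frequencies counteract the finetuned model's Frequency Shock. The mixing weight `ALPHA` and the function names are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of model mixing: interpolate the probability
# distributions of a finetuned LM and its pretrained counterpart.
# ALPHA and all names here are hypothetical illustrations.
import numpy as np

ALPHA = 0.5  # assumed mixing weight between the two models


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a vector of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def mixed_distribution(logits_finetuned: np.ndarray,
                       logits_pretrained: np.ndarray,
                       alpha: float = ALPHA) -> np.ndarray:
    """Mix the two models' next-token distributions in probability space.

    The pretrained model's distribution reflects how often entities
    occur in the pretraining corpus, which can offset the finetuned
    model's tendency to over-predict rare training-set entities.
    """
    p_ft = softmax(logits_finetuned)
    p_pt = softmax(logits_pretrained)
    return alpha * p_ft + (1.0 - alpha) * p_pt


# Toy usage: a 5-token vocabulary with dummy logits from each model.
rng = np.random.default_rng(0)
logits_ft = rng.normal(size=5)
logits_pt = rng.normal(size=5)
p = mixed_distribution(logits_ft, logits_pt)
print(p, p.sum())  # a valid distribution summing to 1
```

The second remedy, mixture finetuning, could analogously be read as interleaving the factual-knowledge finetuning objective with the LM's original pre-training objective during training, rather than mixing at inference time.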