Recent years have witnessed a diverse set of knowledge injection methods for pre-trained language models (PTMs); however, most previous studies neglect the implicit knowledge that PTMs already store in their parameters. A recent study has observed knowledge neurons in the feed-forward network (FFN) that are responsible for expressing factual knowledge. In this work, we propose a simple model, Kformer, which takes advantage of both the knowledge stored in PTMs and external knowledge via knowledge injection in the Transformer FFN layers. Empirical results on two knowledge-intensive tasks, commonsense reasoning (i.e., SocialIQA) and medical question answering (i.e., MedQA-USMLE), demonstrate that Kformer yields better performance than other knowledge injection techniques such as concatenation or attention-based injection. We believe the proposed simple model and empirical findings may help the community develop more powerful knowledge injection methods. Code is available at https://github.com/zjunlp/Kformer.
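To illustrate the idea of injecting external knowledge through the FFN sublayer, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes retrieved knowledge has already been encoded into dense vectors; the module name `KnowledgeFFN`, the projections `proj_k`/`proj_v`, and the dimension `d_know` are hypothetical names introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KnowledgeFFN(nn.Module):
    """Hypothetical sketch: a Transformer FFN sublayer whose key/value
    computation is augmented with projected knowledge embeddings."""

    def __init__(self, d_model: int, d_ff: int, d_know: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)    # standard FFN "keys"
        self.w2 = nn.Linear(d_ff, d_model)    # standard FFN "values"
        # Assumed projections mapping knowledge embeddings into model space.
        self.proj_k = nn.Linear(d_know, d_model)
        self.proj_v = nn.Linear(d_know, d_model)

    def forward(self, hidden: torch.Tensor, knowledge: torch.Tensor) -> torch.Tensor:
        # hidden:    (batch, seq_len, d_model) token representations
        # knowledge: (batch, n_facts, d_know)  embeddings of retrieved knowledge
        phi_k = self.proj_k(knowledge)        # (batch, n_facts, d_model)
        phi_v = self.proj_v(knowledge)        # (batch, n_facts, d_model)

        # Standard FFN activations plus activations over the knowledge "keys".
        inner = F.gelu(self.w1(hidden))                                    # (b, s, d_ff)
        know_scores = F.gelu(torch.einsum("bsd,bkd->bsk", hidden, phi_k))  # (b, s, n_facts)

        # Mix the standard FFN output with a weighted sum of knowledge "values".
        out = self.w2(inner) + torch.einsum("bsk,bkd->bsd", know_scores, phi_v)
        return out
```

Under these assumptions, the knowledge vectors act like extra rows appended to the FFN's key and value matrices, so the injected facts are queried with the same inner-product-plus-activation mechanism the FFN already uses for its implicit, parameter-stored knowledge.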