Large-scale pretrained language models are surprisingly good at recalling factual knowledge presented in the training corpus. In this paper, we explore how implicit knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons. Given a relational fact, we propose a knowledge attribution method to identify the neurons that express the fact. We show that the activation of such knowledge neurons is highly correlated with the expression of their corresponding facts. In addition, even without fine-tuning, we can leverage knowledge neurons to explicitly edit (such as updating or erasing) specific factual knowledge in pretrained Transformers.
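The knowledge attribution method scores each neuron by integrating the gradient of the model's answer probability while the neuron's activation is scaled from zero up to its observed value, in the style of integrated gradients. The sketch below illustrates that attribution rule on a toy scoring function with NumPy; the function names and the toy `prob_fn` are illustrative assumptions, not the paper's actual model code.

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

def knowledge_attribution(prob_fn, acts, steps=20):
    """Integrated-gradients-style attribution for neuron activations.

    Approximates  acts_i * \int_0^1 dP(alpha * acts)/d acts_i  d alpha
    with a Riemann sum: neurons whose activation most increases the
    answer probability receive the largest attribution scores.
    """
    total = np.zeros_like(acts)
    for k in range(1, steps + 1):
        scaled = acts * (k / steps)          # scale activations toward full value
        total += numerical_grad(prob_fn, scaled)
    return acts * total / steps

# Toy example: answer "probability" is a linear score over 3 neurons.
w = np.array([1.0, 2.0, 0.5])
acts = np.array([2.0, 0.1, 0.0])
scores = knowledge_attribution(lambda a: float(w @ a), acts)
```

For a linear scoring function the attribution reduces exactly to `w * acts`, so the inactive third neuron receives zero attribution; with a real Transformer, `prob_fn` would be the probability of the correct answer token as a function of an FFN layer's intermediate activations.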