在 GPT 中定位和编辑事实知识 (Locating and Editing Factual Knowledge in GPT)

We investigate the mechanisms underlying factual knowledge recall in autoregressive transformer language models. First, we develop a causal intervention for identifying neuron activations capable of altering a model's factual predictions. Within large GPT-style models, this reveals two distinct sets of neurons that we hypothesize correspond to knowing an abstract fact and saying a concrete word, respectively. This insight inspires the development of ROME, a novel method for editing facts stored in model weights. For evaluation, we assemble CounterFact, a dataset of over twenty thousand counterfactuals and tools to facilitate sensitive measurements of knowledge editing. Using CounterFact, we confirm the distinction between saying and knowing neurons, and we find that ROME achieves state-of-the-art performance in knowledge editing compared to other methods. An interactive demo notebook, full code implementation, and the dataset are available at https://rome.baulab.info/.

翻译：我们调查了自动递减变压器语言模型中的事实知识回溯机制。首先,我们开发了一种因果干预机制,以识别能够改变模型事实预测的神经激活。在大型的GPT型模型中,这揭示了两组不同的神经元,我们假设这些神经元分别相当于了解抽象事实和说一个具体词。这种洞察力激励了ROME的发展,这是编辑存储在模型重量中的事实的新颖方法。为了评估,我们收集了由2万多个反事实组成的数据集,以及便于敏感测量知识编辑的工具。我们利用反事实,我们确认了对神经元的描述和了解之间的区别,我们发现ROME与其他方法相比,在知识编辑方面实现了最先进的表现。交互式的演示笔记本、完整的代码执行和数据集可在 https://rome.baulab.info/上查阅。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【USC-Aaron Chan博士答辩Slides】可信自然语言处理机器解释的生成与利用, 242页ppt，Generating and Utilizing Machine Explanations for Trustworthy NLP

专知会员服务

16+阅读 · 2022年3月13日

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日