Locating and Editing Factual Associations in GPT

We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or another. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/
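The abstract names Rank-One Model Editing but does not state the update itself. As a rough illustration of what a rank-one edit to a feed-forward weight matrix can look like, the sketch below treats a linear layer as a key-value store and adds a single outer product so that a chosen key vector maps to a chosen value vector. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `rank_one_edit`, the arguments `k_star` and `v_star`, and the optional key covariance matrix `C` are hypothetical names introduced here, and how keys and values would be extracted from a real model is outside its scope.

```python
import numpy as np


def rank_one_edit(W, k_star, v_star, C=None):
    """Return an edited copy of W such that W_new @ k_star == v_star.

    Assumption (not stated in the abstract): the edit has the form
    W_new = W + Lambda (C^{-1} k_star)^T, a rank-one update, where C is
    an estimate of the second moment of the keys the layer normally sees.
    With C = I this reduces to the minimum-norm rank-one correction.
    """
    d_in = W.shape[1]
    if C is None:
        C = np.eye(d_in)  # hypothetical default: treat all key directions equally
    u = np.linalg.solve(C, k_star)      # u = C^{-1} k_star
    residual = v_star - W @ k_star      # what the layer currently gets wrong
    Lambda = residual / (u @ k_star)    # scale so the new key hits v_star exactly
    return W + np.outer(Lambda, u)      # rank-one update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 6))         # toy layer: 6-dim keys, 4-dim values
    k_star = rng.normal(size=6)         # toy key standing in for a subject
    v_star = rng.normal(size=4)         # toy value standing in for the new fact
    W_new = rank_one_edit(W, k_star, v_star)
    print("edit error:", np.linalg.norm(W_new @ k_star - v_star))
    print("rank of change:", np.linalg.matrix_rank(W_new - W))
```

With the identity default for `C`, any key orthogonal to `k_star` produces the same output before and after the edit, which gives a small-scale intuition for the specificity the abstract refers to. Generalization across paraphrases depends on how the key is computed in a real model and is not captured by this toy example.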
