Neural language models (LMs) represent facts about the world described by text. Sometimes these facts derive from training data (in most LMs, a representation of the word banana encodes the fact that bananas are fruits). Sometimes facts derive from input text itself (a representation of the sentence "I poured out the bottle" encodes the fact that the bottle became empty). Tools for inspecting and modifying LM fact representations would be useful almost everywhere LMs are used: making it possible to update them when the world changes, to localize and remove sources of bias, and to identify errors in generated text. We describe REMEDI, an approach for querying and modifying factual knowledge in LMs. REMEDI learns a map from textual queries to fact encodings in an LM's internal representation system. These encodings can be used as knowledge editors: by adding them to LM hidden representations, we can modify downstream generation to be consistent with new facts. REMEDI encodings can also be used as model probes: by comparing them to LM representations, we can ascertain what properties LMs attribute to mentioned entities, and predict when they will generate outputs that conflict with background knowledge or input text. REMEDI thus links work on probing, prompting, and model editing, and offers steps toward general tools for fine-grained inspection and control of knowledge in LMs.
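To make the two uses of fact encodings concrete, here is a minimal sketch (not the authors' released code) of both operations on a small causal LM. Everything specific is an illustrative assumption: the model name `gpt2`, the intervention layer, the entity token position, and the random `edit_vector` standing in for a learned REMEDI fact encoding.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any causal LM exposing per-layer hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

edit_layer = 6                          # assumption: layer at which to intervene
entity_position = 1                     # assumption: token index of the entity mention
hidden_size = model.config.hidden_size
edit_vector = torch.randn(hidden_size)  # stand-in for a learned fact encoding

def add_edit(module, module_inputs, module_output):
    """Editing use: add the fact encoding to the entity token's hidden state."""
    hidden = module_output[0].clone()
    # Apply only on the full-prompt pass; cached decoding steps have length 1.
    if hidden.size(1) > entity_position:
        hidden[:, entity_position, :] += edit_vector
    return (hidden,) + module_output[1:]

hook = model.transformer.h[edit_layer].register_forward_hook(add_edit)
try:
    prompt = "The Space Needle is located in the city of"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    hook.remove()

# Probing use: score how strongly the LM's own representation of the entity
# agrees with the fact encoding (here via cosine similarity, with the hook
# removed so the representation is unedited).
with torch.no_grad():
    states = model(**inputs, output_hidden_states=True).hidden_states
entity_rep = states[edit_layer + 1][0, entity_position]
score = torch.cosine_similarity(entity_rep, edit_vector, dim=0)
print(f"probe score: {score.item():.3f}")
```

In this sketch the same vector serves both roles: added to a hidden state it steers downstream generation, while compared against a hidden state it yields a scalar that can flag agreement or conflict with the fact it encodes.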