Language models are trained on large volumes of text, and as a result their parameters may contain a significant body of factual knowledge. Any downstream task performed by these models implicitly builds on these facts, and thus it is highly desirable to have a means of representing this body of knowledge in an interpretable way. However, no mechanism currently exists for such a representation. Here, we propose to address this goal by extracting a knowledge-graph of facts from a given language model. We describe a procedure for ``crawling'' the internal knowledge-base of a language model. Specifically, given a seed entity, we expand a knowledge-graph around it. The crawling procedure is decomposed into sub-tasks, each realized through specially designed prompts that control for both precision (i.e., that no incorrect facts are generated) and recall (i.e., the number of facts generated). We evaluate our approach on graphs crawled starting from dozens of seed entities, and show that it yields high-precision graphs (82--92\%) while emitting a reasonable number of facts per entity.
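To make the crawling procedure concrete, the following is a minimal sketch of the expansion loop, not the actual implementation described in the paper: it assumes a hypothetical \texttt{query\_lm} helper that sends a prompt to the underlying language model, and the prompts shown are illustrative stand-ins for the specially designed prompts used for the relation-generation, object-generation, and fact-verification sub-tasks.

\begin{verbatim}
from collections import deque

def query_lm(prompt):
    """Hypothetical helper: send a prompt to the underlying language
    model and return its text completion (placeholder stub)."""
    raise NotImplementedError

def crawl(seed_entity, max_depth=2):
    """Breadth-first expansion of a knowledge graph around a seed entity.
    Each step is realized through a dedicated prompt: one proposing
    relations for an entity, one proposing object entities for a
    (subject, relation) pair, and a verification prompt that filters
    out dubious facts to keep precision high."""
    graph = []                       # collected (subject, relation, object) facts
    frontier = deque([(seed_entity, 0)])
    seen = {seed_entity}
    while frontier:
        subject, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        relations = query_lm(f"List relations that hold for {subject}:").split(", ")
        for relation in relations:
            objects = query_lm(f"List values of {relation} for {subject}:").split(", ")
            for obj in objects:
                verdict = query_lm(
                    f"Is ({subject}, {relation}, {obj}) a true fact? Answer yes or no:")
                if verdict.strip().lower().startswith("yes"):
                    graph.append((subject, relation, obj))
                    if obj not in seen:          # expand newly discovered entities
                        seen.add(obj)
                        frontier.append((obj, depth + 1))
    return graph
\end{verbatim}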