Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated. However, in a dynamic world, new entities constantly arise. We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained. We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity. We evaluate LMs' perplexity on masked spans within these sentences. We show that models more informed about the entities, such as those with access to a textual definition of them, achieve lower perplexity on this benchmark. Our experimental results demonstrate that making inferences about new entities remains difficult for LMs. Given its broad coverage of entity knowledge and temporal indexing, our dataset can be used to evaluate LMs and techniques designed to modify or extend their knowledge. Our automatic data collection pipeline can easily be used to continually update our benchmark.
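As a concrete illustration of the evaluation described above, the following is a minimal sketch of scoring a masked span with a T5-style seq2seq LM via Hugging Face Transformers. It is not the paper's exact pipeline; the sentence, entity, and model checkpoint are assumptions chosen purely for illustration.

```python
# Minimal sketch: span-infilling perplexity with a T5-style model.
# Assumptions: "t5-base" checkpoint; the example sentence and gold span are
# illustrative, not drawn from the actual dataset.
import torch
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.eval()

# Sentence about an entity with the target span replaced by a sentinel token.
masked_sentence = "The Perseverance rover landed in <extra_id_0> in February 2021."
gold_span = "Jezero Crater"

inputs = tokenizer(masked_sentence, return_tensors="pt")
# Target in T5 span-infilling format: sentinel, gold span, closing sentinel.
labels = tokenizer(f"<extra_id_0> {gold_span} <extra_id_1>",
                   return_tensors="pt").input_ids

with torch.no_grad():
    # Loss is the mean token-level negative log-likelihood of the target span.
    loss = model(**inputs, labels=labels).loss
perplexity = torch.exp(loss).item()
print(f"Span perplexity: {perplexity:.2f}")
```

Lower perplexity here would indicate the model is better able to predict the masked span; prepending a textual definition of the entity to the input is one way such additional knowledge could be supplied.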