Temporal concept drift refers to the problem of data changing over time. In NLP, this entails that language (e.g. new expressions, meaning shifts) and factual knowledge (e.g. new concepts, updated facts) evolve over time. Focusing on the latter, we benchmark $11$ pretrained masked language models (MLMs) on a series of tests designed to evaluate the effect of temporal concept drift, as it is crucial that widely used language models remain up to date with the ever-evolving factual state of the real world. Specifically, we provide a holistic framework that (1) dynamically creates temporal test sets of factual data from Wikidata at any time granularity (e.g. month, quarter, year), (2) constructs fine-grained test splits (e.g. updated, new, unchanged facts) to ensure comprehensive analysis, and (3) evaluates MLMs in three distinct ways (single-token probing, multi-token generation, MLM scoring). In contrast to prior work, our framework leverages these multiple views of evaluation to unveil how robust an MLM is over time, and thus to provide a signal when it has become outdated.
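The fine-grained splits in (2) can be illustrated with a minimal sketch. Assuming facts are represented as Wikidata-style (subject, relation) → object triples taken from two temporal snapshots, a fact is *new* if it only exists in the later snapshot, *updated* if its object changed, and *unchanged* otherwise. The function below is a hypothetical illustration of this idea, not the paper's actual implementation:

```python
def split_facts(old_snapshot, new_snapshot):
    """Classify facts in the later snapshot relative to the earlier one.

    Both snapshots map a (subject, relation) pair to its object value,
    mirroring Wikidata-style triples. Hypothetical helper for illustration.
    """
    splits = {"new": {}, "updated": {}, "unchanged": {}}
    for key, obj in new_snapshot.items():
        if key not in old_snapshot:
            splits["new"][key] = obj          # fact absent from the old snapshot
        elif old_snapshot[key] != obj:
            splits["updated"][key] = obj      # same fact, object value changed
        else:
            splits["unchanged"][key] = obj    # fact identical in both snapshots
    return splits


# Example with two toy snapshots of factual triples:
t1 = {("France", "capital"): "Paris", ("UK", "head of govt"): "Boris Johnson"}
t2 = {("France", "capital"): "Paris", ("UK", "head of govt"): "Rishi Sunak",
      ("Mastodon", "instance of"): "social network"}
result = split_facts(t1, t2)
```

Each split can then be probed separately, so that a drop in accuracy on the *updated* and *new* splits (but not on *unchanged*) signals that the model's factual knowledge has gone stale.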