In our continuously evolving world, entities change over time and new entities, previously non-existent or unknown, appear. We study how this evolutionary scenario impacts performance on a well-established entity linking (EL) task. For that study, we introduce TempEL, an entity linking dataset that consists of time-stratified English Wikipedia snapshots from 2013 to 2022, from which we collect both anchor mentions of entities and the descriptions of these target entities. By capturing such temporal aspects, our newly introduced TempEL resource contrasts with existing entity linking datasets, which are composed of fixed mentions linked to a single static version of a target Knowledge Base (e.g., Wikipedia 2010 for CoNLL-AIDA). Indeed, for each of our collected temporal snapshots, TempEL contains links to entities that are continual, i.e., occur in all of the years, as well as completely new entities that appear for the first time at some point. Thus, we make it possible to quantify the performance of current state-of-the-art EL models for: (i) entities whose Knowledge Base descriptions and mention contexts change over time, and (ii) newly created entities that did not previously exist (e.g., at the time the EL model was trained). Our experimental results show that, in terms of temporal performance degradation, (i) continual entities suffer a decrease of up to 3.1% in EL accuracy, while (ii) for new entities this accuracy drop is up to 17.9%. This highlights the challenge posed by the introduced TempEL dataset and opens new research prospects in the area of time-evolving entity disambiguation.
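The distinction between continual and new entities can be illustrated with a minimal sketch. It assumes a hypothetical input `snapshots` that maps each snapshot year to the set of entity IDs linked in that year; the helper name `split_entities` is likewise illustrative and not part of any TempEL release. Continual entities are those in every snapshot, and an entity counts as new in a given year if it is absent from all earlier snapshots.

```python
from typing import Dict, Set, Tuple


def split_entities(
    snapshots: Dict[int, Set[str]],
) -> Tuple[Set[str], Dict[int, Set[str]]]:
    """Return (continual entities, new entities per later snapshot year).

    Hypothetical helper: `snapshots` maps each snapshot year to the set of
    entity IDs linked in that year's Wikipedia snapshot.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    years = sorted(snapshots)
    # Continual: the entity occurs in every snapshot.
    continual = set.intersection(*(snapshots[y] for y in years))
    # New: the entity is absent from all earlier snapshots.
    seen = set(snapshots[years[0]])
    new_by_year: Dict[int, Set[str]] = {}
    for year in years[1:]:
        new_by_year[year] = snapshots[year] - seen
        seen |= snapshots[year]
    return continual, new_by_year


if __name__ == "__main__":
    # Toy data, not taken from TempEL.
    toy = {2013: {"A", "B"}, 2014: {"A", "B", "C"}, 2015: {"A", "C", "D"}}
    continual, new_by_year = split_entities(toy)
    print(continual)    # {'A'}
    print(new_by_year)  # {2014: {'C'}, 2015: {'D'}}
```

In this toy example, "B" disappears in 2015, so it is neither continual nor new in any later year.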