Named entity linking (NEL) in news is challenging because unseen and emerging entities occur frequently, which necessitates the use of unsupervised or zero-shot methods. However, such methods tend to come with caveats, such as a lack of integration with suitable knowledge bases (like Wikidata) for emerging entities, limited scalability, and poor interpretability. Here, we consider person disambiguation in Quotebank, a massive corpus of speaker-attributed quotations from the news, and investigate the suitability of intuitive, lightweight, and scalable heuristics for NEL in web-scale corpora. Our best-performing heuristic disambiguates 94% and 63% of the mentions on Quotebank and the AIDA-CoNLL benchmark, respectively. Additionally, the proposed heuristics compare favourably to the state-of-the-art unsupervised and zero-shot methods, Eigenthemes and mGENRE, respectively, and thereby serve as strong baselines for unsupervised and zero-shot entity linking.