Entity linking (EL) is the task of automatically identifying entity mentions in text and resolving each of them to the corresponding entity in a reference knowledge base such as Wikipedia. Over the past decade, a plethora of EL systems and pipelines have become available, but the performance of individual systems varies heavily across corpora, languages, and domains. Linking performance varies even between different mentions in the same text corpus: for instance, some EL approaches deal better with short surface forms, while others perform better when more context information is available. We therefore argue that performance can be optimised by combining the results of distinct EL systems on the same corpus, thereby leveraging their individual strengths on a per-mention basis. In this paper, we introduce a supervised approach that exploits the output of multiple ready-made EL systems by predicting the correct link for each mention. Experimental results obtained on existing ground-truth datasets with three state-of-the-art EL systems show the effectiveness of our approach and its capacity to significantly outperform the individual EL systems as well as a set of baseline methods.
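To make the per-mention idea concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes a single hypothetical feature (whether the surface form is one token or several), hypothetical system names "A" and "B", and a very simple supervised rule: estimate each system's accuracy per feature bucket on labelled mentions, then trust the system with the highest estimated accuracy for each new mention. A real implementation would presumably use richer features and a proper classifier.

```python
from collections import defaultdict


def bucket(mention: str) -> str:
    """Hypothetical feature: coarse length of the surface form."""
    return "short" if len(mention.split()) <= 1 else "long"


def train(examples):
    """Estimate per-bucket accuracy of each EL system.

    examples: iterable of (mention, {system_name: predicted_entity}, gold_entity).
    Returns {bucket: {system_name: accuracy}}.
    """
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for mention, predictions, gold in examples:
        b = bucket(mention)
        totals[b] += 1
        for system, entity in predictions.items():
            if entity == gold:
                hits[b][system] += 1
    return {b: {s: hits[b][s] / totals[b] for s in hits[b]} for b in totals}


def predict(model, mention, predictions):
    """Return the link proposed by the system judged most reliable for this mention."""
    scores = model.get(bucket(mention), {})
    # Systems never seen to be correct in this bucket score 0.0.
    best = max(predictions, key=lambda s: scores.get(s, 0.0))
    return predictions[best]


if __name__ == "__main__":
    # Toy, made-up training data: system "A" is right on short mentions,
    # system "B" on longer ones.
    training = [
        ("Paris", {"A": "Paris", "B": "Paris_Hilton"}, "Paris"),
        ("Jordan", {"A": "Jordan", "B": "Michael_Jordan"}, "Jordan"),
        ("Michael Jordan", {"A": "Jordan", "B": "Michael_Jordan"}, "Michael_Jordan"),
        ("Paris Hilton", {"A": "Paris", "B": "Paris_Hilton"}, "Paris_Hilton"),
    ]
    model = train(training)
    print(predict(model, "Berlin", {"A": "Berlin", "B": "Berlin_(band)"}))
```

The selection is made mention by mention, so the combined output can beat every individual system whenever their errors fall on different kinds of mentions.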