Local models have recently attained remarkable performance in Entity Disambiguation (ED), with generative and extractive formulations being the most promising research directions. However, previous work has represented each candidate textually by its Wikipedia title alone. Although certainly effective, this strategy presents a few critical issues, especially when titles are not sufficiently informative or distinguishable from one another. In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it. We thoroughly evaluate our approach against standard benchmarks in ED and find extractive formulations to be particularly well-suited to these representations: we report a new state of the art on 2 of the 6 benchmarks we consider and substantially improve generalization to unseen patterns. We release our code, data and model checkpoints at https://github.com/SapienzaNLP/extend.
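To make the contrast between title-only and more expressive candidate representations concrete, the following is a minimal sketch of how an extractive input could be assembled. It is an illustration under stated assumptions, not the released implementation: the `Candidate` class, the `render` and `build_extractive_input` helpers, the `[SEP]` separator, the bracketed mention marking and the example descriptions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str        # Wikipedia title of the candidate entity
    description: str  # hypothetical richer text, e.g. a short definition


def render(candidate, use_description):
    """Return the textual representation of one candidate."""
    if use_description and candidate.description:
        return f"{candidate.title}: {candidate.description}"
    return candidate.title  # title-only baseline


def build_extractive_input(context, candidates, use_description=True, sep=" [SEP] "):
    """Append every candidate representation to the context.

    Returns the full text and, for each candidate, the (start, end)
    character span of its representation inside that text. An extractive
    model would be trained to select the span of the correct candidate.
    """
    text = context
    spans = []
    for cand in candidates:
        text += sep
        start = len(text)
        text += render(cand, use_description)
        spans.append((start, len(text)))
    return text, spans


# Illustrative candidates; the mention is marked with brackets (an assumption).
candidates = [
    Candidate("Metropolis (1927 film)", "German science-fiction film by Fritz Lang"),
    Candidate("Metropolis (comics)", "fictional city in DC Comics"),
]
text, spans = build_extractive_input("Superman flies over [Metropolis].", candidates)
gold_start, gold_end = spans[1]
print(text[gold_start:gold_end])
```

With `use_description=False` the same helper reproduces the title-only setting, so the two representations can be compared on identical contexts.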