We consider the task of document-level entity linking (EL), where it is important to make consistent linking decisions for entity mentions jointly over the full document. We aim to leverage explicit "connections" among mentions within the document itself: we propose to join the EL task with that of coreference resolution (coref). This is complementary to related works that exploit either (i) implicit document information (e.g., latent relations among entity mentions, or general language models) or (ii) connections between the candidate links (e.g., as inferred from the external knowledge base). Specifically, we cluster mentions that are linked via coreference, and enforce a single EL decision for all of the clustered mentions together. The latter constraint has the added benefit of increased coverage, since the EL candidate lists of the clustered mentions are joined. We formulate the coref+EL problem as a structured prediction task over directed trees and use a globally normalized model to solve it. Experimental results on two datasets show a boost of up to +5% F1-score on both the coref and EL tasks, compared to their standalone counterparts. For a subset of hard cases, with individual mentions lacking the correct EL in their candidate entity list, we obtain a +50% increase in accuracy.
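The cluster-then-link idea can be illustrated with a minimal sketch: mentions in the same coreference cluster pool their candidate lists and receive one shared entity decision. The names below (Mention, link_clusters, the score aggregation) are hypothetical illustrations under simple assumptions, not the paper's actual globally normalized model.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Mention:
    text: str
    cluster_id: int  # coreference cluster this mention belongs to
    candidates: dict = field(default_factory=dict)  # entity -> local EL score

def link_clusters(mentions):
    """Assign a single KB entity per coreference cluster.

    Merging candidate lists increases coverage: a mention whose own list
    misses the gold entity can still be linked correctly if another mention
    in its cluster has that entity among its candidates.
    """
    clusters = defaultdict(list)
    for m in mentions:
        clusters[m.cluster_id].append(m)

    assignments = {}
    for cid, members in clusters.items():
        # union of candidate lists, with scores aggregated over the cluster
        merged = defaultdict(float)
        for m in members:
            for entity, score in m.candidates.items():
                merged[entity] += score
        # one EL decision enforced for the whole cluster
        assignments[cid] = max(merged, key=merged.get) if merged else None
    return assignments

# Example of a "hard case": the short mention "Schumacher" alone lacks the
# correct candidate, but the cluster containing "Michael Schumacher" recovers it.
mentions = [
    Mention("Michael Schumacher", 0, {"Michael_Schumacher": 0.9}),
    Mention("Schumacher", 0, {"Ralf_Schumacher": 0.6}),
]
print(link_clusters(mentions))  # {0: 'Michael_Schumacher'}
```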