Entity Linking and Discovery via Arborescence-based Supervised Clustering

Previous work has shown promising results in entity linking by measuring not only the affinities between mentions and entities but also the affinities among mentions. In this paper, we present novel training and inference procedures that fully utilize mention-to-mention affinities by building minimum arborescences (i.e., directed spanning trees) over mentions and entities across documents in order to make linking decisions. We also show that this method extends gracefully to entity discovery, enabling the clustering of mentions that have no associated entity in the knowledge base. We evaluate our approach on the Zero-Shot Entity Linking dataset and on MedMentions, the largest publicly available biomedical dataset, and show significant improvements in both entity linking and entity discovery compared to identically parameterized models. We further show significant efficiency improvements, with only a small loss in accuracy, over previous work that uses more computationally expensive models.
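
The abstract describes making linking decisions by building a minimum arborescence over mentions and entities. The sketch below is a simplified illustration under our own assumptions, not the authors' training or inference procedure. A hypothetical virtual root connects to every entity at zero cost and to every mention at a fixed `new_cluster_cost`, which stands in for entity discovery. Edge costs are negated affinities, and the arborescence is found with the Chu-Liu/Edmonds algorithm. Each mention is then linked to the entity at the top of its subtree, or it is placed in a new cluster when its subtree hangs directly off the root. The function names, the dense-matrix input format, and the single discovery threshold are all hypothetical.

```python
def _find_cycle(best, root):
    """Return the nodes of one cycle in the chosen-parent graph, or None."""
    owner = {}
    for start in best:
        path, v = [], start
        # Follow chosen parents until we reach the root or an already-seen node.
        while v != root and v not in owner:
            owner[v] = start
            path.append(v)
            v = best[v][0]
        # A node first seen on this same walk closes a cycle.
        if v != root and owner[v] == start:
            return path[path.index(v):]
    return None


def _edmonds(edges, root):
    """Chu-Liu/Edmonds: edge ids of a minimum-cost arborescence rooted at `root`.

    `edges` is a list of (src, dst, cost, edge_id) tuples.
    """
    best = {}  # cheapest incoming edge for every non-root node
    for e in edges:
        u, v, w, _ = e
        if v != root and u != v and (v not in best or w < best[v][2]):
            best[v] = e
    nodes = {x for u, v, _, _ in edges for x in (u, v)}
    if any(x != root and x not in best for x in nodes):
        raise ValueError("some node has no incoming edge")
    cycle = _find_cycle(best, root)
    if cycle is None:
        return {e[3] for e in best.values()}
    in_cycle = set(cycle)
    c = max(nodes) + 1  # fresh id for the contracted super-node
    contracted = []
    for u, v, w, eid in edges:
        if u in in_cycle and v in in_cycle:
            continue  # edges inside the cycle disappear
        if v in in_cycle:
            # Entering the cycle: pay only the extra cost of swapping parents.
            contracted.append((u, c, w - best[v][2], eid))
        elif u in in_cycle:
            contracted.append((c, v, w, eid))
        else:
            contracted.append((u, v, w, eid))
    chosen = _edmonds(contracted, root)
    by_id = {e[3]: e for e in edges}
    # Exactly one chosen edge enters the cycle; it replaces that node's cycle edge.
    entering = next(eid for eid in chosen if by_id[eid][1] in in_cycle)
    broken = by_id[entering][1]
    return chosen | {best[x][3] for x in cycle if x != broken}


def link_and_discover(mention_entity, mention_mention, new_cluster_cost):
    """Hypothetical linker: returns ("entity", e) or ("new", k) per mention.

    mention_entity[m][e] and mention_mention[i][j] are affinities (higher = closer);
    new_cluster_cost is the assumed cost of hanging a mention directly off the root.
    """
    num_m = len(mention_entity)
    num_e = len(mention_entity[0]) if num_m else 0
    root = 0  # node ids: root, then entities 1..num_e, then mentions
    edges = []

    def add(u, v, cost):
        edges.append((u, v, cost, len(edges)))

    for e in range(num_e):
        add(root, 1 + e, 0.0)  # entities hang off the root for free
    for m in range(num_m):
        dst = 1 + num_e + m
        add(root, dst, new_cluster_cost)  # option to start a new cluster
        for e in range(num_e):
            add(1 + e, dst, -mention_entity[m][e])
        for j in range(num_m):
            if j != m:
                add(1 + num_e + j, dst, -mention_mention[j][m])
    parent = {edges[i][1]: edges[i][0] for i in _edmonds(edges, root)}
    labels = []
    for m in range(num_m):
        v = 1 + num_e + m
        while parent[v] != root:  # climb to the node just below the root
            v = parent[v]
        if v <= num_e:
            labels.append(("entity", v - 1))
        else:
            labels.append(("new", v - 1 - num_e))  # cluster named by its top mention
    return labels
```

Under these assumptions, `link_and_discover([[0.9, 0.1], [0.2, 0.3], [0.1, 0.1]], [[0.0, 0.8, 0.0], [0.8, 0.0, 0.1], [0.0, 0.1, 0.0]], -0.5)` returns `[("entity", 0), ("entity", 0), ("new", 2)]`. The second mention reaches entity 0 through its strong affinity with the first mention rather than through a direct edge, and the third mention starts its own cluster.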
