Entity Linking is the task of matching a mention to an entity in a given knowledge base (KB). It helps annotate the massive number of documents on the Web in order to harness new facts about the matched entities. However, existing Entity Linking systems focus on models that are typically domain-dependent and robust only to the particular knowledge base on which they were trained. Their performance degrades when they are evaluated on documents and knowledge bases from other domains. Approaches based on pre-trained language models, such as Wu et al. (2020), attempt to solve this problem with a zero-shot setup and show some potential when evaluated on a general-domain KB. Nevertheless, they do not reach the same performance when evaluated on a domain-specific KB. To allow for more accurate Entity Linking across different domains, we propose a framework called Cross-Domain Neural Entity Linking (CDNEL). Our objective is a single system that enables simultaneous linking to both the general-domain KB and the domain-specific KB. CDNEL works by learning a joint representation space for these knowledge bases from different domains. To compare the proposed method with state-of-the-art results, we evaluate it on the external Entity Linking dataset (Zeshel) constructed by Logeswaran et al. (2019) and the Reddit dataset collected by Botzer et al. (2021). The framework uses different types of datasets for fine-tuning, which results in different model variants of CDNEL. When evaluated on four domains included in the Zeshel dataset, these variants achieve an average precision gain of 9%.
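To illustrate what linking in a joint representation space could look like, the following minimal Python sketch scores one mention against entities from a general-domain KB and a domain-specific KB at the same time. It is not the CDNEL implementation: the dot-product scoring, the function names `dot` and `link`, the entity names, and the three-dimensional vectors are all assumptions made up for illustration. In a real system the vectors would come from a fine-tuned encoder.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors (assumed similarity score)."""
    return sum(x * y for x, y in zip(a, b))


def link(mention_vec: Vector,
         kbs: Dict[str, Dict[str, Vector]]) -> Tuple[str, str, float]:
    """Return (kb_name, entity_name, score) for the best entity across all KBs.

    Because every entity vector lives in the same space, candidates from
    different KBs can be ranked together in a single pass.
    """
    candidates = [
        (kb_name, entity, dot(mention_vec, vec))
        for kb_name, entities in kbs.items()
        for entity, vec in entities.items()
    ]
    return max(candidates, key=lambda c: c[2])


# Hypothetical toy embeddings; real ones would be produced by an encoder.
general_kb = {
    "Python (programming language)": [0.9, 0.1, 0.0],
    "Python (snake)": [0.1, 0.9, 0.0],
}
domain_kb = {
    "Monty Python's Flying Circus": [0.0, 0.2, 0.9],
}
kbs = {"general": general_kb, "domain": domain_kb}

# Hypothetical embedding of a mention in its context.
mention = [0.1, 0.1, 0.8]
best = link(mention, kbs)
print(best[0], "->", best[1])
```

With these toy vectors the mention is linked to the entity in the domain-specific KB, since that entity has the highest score among the candidates from both KBs.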