Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs) and is an essential step of KG fusion. Current mainstream methods -- neural EA models -- rely on training with seed alignment, i.e., a set of pre-aligned entity pairs that is very costly to annotate. In this paper, we devise a novel Active Learning (AL) framework for neural EA, aiming to create highly informative seed alignment so that more effective EA models can be obtained at lower annotation cost. Our framework tackles two main challenges encountered when applying AL to EA: (1) How to exploit dependencies between entities within the AL strategy. Most AL strategies assume that the data instances to sample are independent and identically distributed. However, entities in KGs are related. To address this challenge, we propose a structure-aware uncertainty sampling strategy that measures the uncertainty of each entity as well as its impact on its neighbour entities in the KG. (2) How to recognise entities that appear in one KG but not in the other KG (i.e., bachelors). Identifying bachelors would likely save annotation budget. To address this challenge, we devise a bachelor recognizer designed to alleviate the effect of sampling bias. Empirical results show that our proposed AL strategy significantly improves sampling quality and generalises well across different datasets, EA models, and amounts of bachelors.
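To make the idea of structure-aware uncertainty sampling more concrete, the following is a minimal Python sketch and not the scoring function used in this work. It assumes, purely for illustration, that the EA model exposes a similarity matrix `sim` between source entities and candidate target entities, that uncertainty is the entropy of a softmax over each row, and that an entity's impact on its neighbours is approximated by the summed uncertainty of its neighbours, weighted by a hypothetical parameter `alpha`. The function names are likewise hypothetical.

```python
import numpy as np


def entity_uncertainty(sim):
    """Entropy of the softmax over each entity's candidate similarities.

    Assumption: `sim[i, j]` is the model's similarity between source
    entity i and target candidate j. A flat row means high uncertainty.
    """
    z = sim - sim.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1)


def structure_aware_scores(sim, adj, alpha=0.5):
    """Own uncertainty plus a weighted sum of the neighbours' uncertainty.

    Assumption: `adj` is a symmetric 0/1 adjacency matrix of the source KG,
    and `alpha` (hypothetical) controls how much neighbours contribute.
    """
    u = entity_uncertainty(sim)
    return u + alpha * (adj @ u)


def select_batch(sim, adj, budget, alpha=0.5):
    """Indices of the `budget` highest-scoring entities to send for annotation."""
    scores = structure_aware_scores(sim, adj, alpha)
    return np.argsort(-scores)[:budget]
```

Under these assumptions, an entity that is itself confidently matched can still be ranked above an isolated one if it is connected to uncertain neighbours, which is the dependency that i.i.d. sampling strategies ignore.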