Entity Alignment (EA) identifies entities in different databases that refer to the same real-world object. Knowledge graph embedding methods have recently come to dominate EA techniques. Such methods map entities to a low-dimensional space and align them based on their similarity in that space. With the body of EA methods growing rapidly, this paper presents a comprehensive analysis of existing EA methods, elaborating on their applications and limitations. Further, we categorise the methods by their underlying algorithms and by the information they incorporate to learn entity representations. Motivated by challenges observed in industrial datasets, we pose $4$ research questions (RQs). These RQs empirically analyse the algorithms from the perspective of \textit{Hubness, Degree distribution, Non-isomorphic neighbourhood,} and \textit{Name bias}. For Hubness, where one entity turns up as the nearest neighbour of many other entities, we define an $h$-score to quantify its effect on the performance of various algorithms. Additionally, because some algorithms rely primarily on the name bias present in open-source benchmark datasets, we try to level the playing field by creating a dataset with low name bias. We also release an open-source repository of $14$ embedding-based EA methods and present our analysis to motivate further research in the field of EA.
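
To make the notions of similarity-based alignment and hubness concrete, the following is a minimal sketch rather than the method evaluated in the paper. It assumes entity embeddings are already available as NumPy arrays, aligns each source entity to its most cosine-similar target entity, and counts how often each target is chosen. This count is the standard 1-occurrence statistic for hubness; it is not the $h$-score, whose definition is not given in this abstract. The function names `nearest_neighbours` and `hub_counts` and the toy vectors are illustrative assumptions.

```python
import numpy as np


def nearest_neighbours(src, tgt):
    """For each source embedding, return the index of the most cosine-similar target."""
    # Normalise rows so that a dot product equals cosine similarity.
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src_n @ tgt_n.T  # shape: (n_source, n_target)
    return sim.argmax(axis=1)


def hub_counts(nn_idx, n_targets):
    """Count how many source entities picked each target as nearest neighbour."""
    # A target with a large count is a hub (illustrative statistic, not the h-score).
    return np.bincount(nn_idx, minlength=n_targets)


# Toy 2-d embeddings (made up for illustration only).
src = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0]])
tgt = np.array([[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.1, -1.0]])

nn_idx = nearest_neighbours(src, tgt)
counts = hub_counts(nn_idx, len(tgt))
print(nn_idx.tolist())  # [0, 0, 0, 1]
print(counts.tolist())  # [3, 1, 0, 0]
```

In this toy case, target 0 is the nearest neighbour of three of the four source entities, so at most one of those three alignments can be correct under a one-to-one matching. This is the kind of effect a hubness measure is meant to capture.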