Finding equivalent entities across multi-source Knowledge Graphs (KGs), known as \emph{entity alignment} (EA), is the pivotal step in KG integration. However, most existing EA methods are inefficient and scale poorly. A recent summary points out that some of them require several days to process a dataset containing 200,000 nodes (DWY100K). We believe that over-complex graph encoders and inefficient negative sampling strategies are the two main reasons. In this paper, we propose a novel KG encoder, the Dual Attention Matching Network (Dual-AMN), which models both intra-graph and cross-graph information while greatly reducing computational complexity. Furthermore, we propose the Normalized Hard Sample Mining Loss to smoothly select hard negative samples with reduced loss shift. Experimental results on widely used public datasets indicate that our method achieves both high accuracy and high efficiency. On DWY100K, the whole running process of our method finishes in 1,100 seconds, at least 10 times faster than previous work. Our method also outperforms previous work on all datasets, with improvements in Hits@1 and MRR ranging from 6% to 13%.
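For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Hits@k and MRR are commonly computed from the rank of the correct counterpart entity for each test entity. The function names and the sample ranks are illustrative assumptions, not taken from the paper's code or results.

```python
def hits_at_k(ranks, k):
    """Fraction of test entities whose correct match is ranked in the top k.

    `ranks` holds 1-based ranks of the correct counterpart entity.
    """
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test entities (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks for four test entities (illustration only).
example_ranks = [1, 2, 1, 4]
hits1 = hits_at_k(example_ranks, 1)
mrr = mean_reciprocal_rank(example_ranks)
print(f"Hits@1 = {hits1:.4f}, MRR = {mrr:.4f}")
```

Both metrics lie in (0, 1] and higher is better; Hits@1 counts only exact top-ranked matches, while MRR also gives partial credit to near misses.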