Entity alignment is the task of finding entities in two knowledge bases (KBs) that represent the same real-world object. When the KBs are in different natural languages, conventional cross-lingual entity alignment methods rely on machine translation to remove the language barrier. These approaches often suffer from uneven translation quality across languages. Recent embedding-based techniques encode the entities and relationships in KBs and do not need machine translation for cross-lingual entity alignment, but they leave a significant number of attributes largely unexplored. In this paper, we propose a joint attribute-preserving embedding model for cross-lingual entity alignment. It jointly embeds the structures of two KBs into a unified vector space and further refines the embeddings by leveraging attribute correlations in the KBs. Our experimental results on real-world datasets show that this approach significantly outperforms state-of-the-art embedding approaches for cross-lingual entity alignment and can be complemented with methods based on machine translation.
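To make the idea of a unified vector space concrete, the following is a minimal sketch of the final matching step only: once entities from both KBs have embeddings in the same space, each source entity can be matched to its most similar target entity. It does not implement the proposed model or its training. The function names `cosine` and `align`, the entity names, and the hand-written vectors are hypothetical and chosen purely for illustration; cosine similarity is an assumed choice of similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def align(source, target):
    """Map each source entity to its most similar target entity."""
    result = {}
    for s_name, s_vec in source.items():
        best = max(target, key=lambda t_name: cosine(s_vec, target[t_name]))
        result[s_name] = best
    return result


# Toy embeddings written by hand (hypothetical, not learned by any model).
# Both dictionaries are assumed to live in the same 3-dimensional space.
kb_en = {"Paris": [0.9, 0.1, 0.0], "Berlin": [0.1, 0.9, 0.1]}
kb_zh = {"巴黎": [0.8, 0.2, 0.1], "柏林": [0.2, 0.8, 0.0]}

alignment = align(kb_en, kb_zh)
print(alignment)
```

Because the comparison happens directly between vectors, no entity name is translated at any point. A greedy nearest-neighbor match like this one may assign the same target entity to several source entities, which a real system would need to handle.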