Much research effort has been devoted to multilingual knowledge graph (KG) embedding methods for the entity alignment task, which seeks to match entities in different language-specific KGs that refer to the same real-world object. Such methods are often hindered by the insufficiency of the seed alignment provided between KGs. We therefore propose an incidentally supervised model, JEANS, which jointly represents multilingual KGs and text corpora in a shared embedding scheme and seeks to improve entity alignment with incidental supervision signals from text. JEANS first deploys an entity grounding process to combine each KG with a monolingual text corpus of the same language. Two learning processes are then conducted: (i) an embedding learning process that encodes the KG and text of each language in one embedding space, and (ii) a self-learning-based alignment learning process that iteratively induces the matching of entities and of lexemes between the embedding spaces. Experiments on benchmark datasets show that JEANS yields promising improvements in entity alignment with incidental supervision, and significantly outperforms state-of-the-art methods that rely solely on the internal information of KGs.
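The self-learning alignment step in (ii) can be illustrated with a minimal sketch. The abstract does not say how the mapping between embedding spaces is fitted or how new matches are induced, so the code below assumes two common choices: an orthogonal Procrustes fit on the current matches, and mutual nearest neighbors under cosine similarity for proposing new ones. The function names (`procrustes`, `mutual_nearest_neighbors`, `self_learning_align`) and the `rounds` parameter are hypothetical and are not taken from JEANS.

```python
import numpy as np


def procrustes(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def mutual_nearest_neighbors(A, B):
    """Return (i, j) pairs where A[i] and B[j] are each other's nearest
    neighbor under cosine similarity."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = A @ B.T
    a_to_b = sim.argmax(axis=1)  # best target for every source row
    b_to_a = sim.argmax(axis=0)  # best source for every target row
    return [(i, int(j)) for i, j in enumerate(a_to_b) if b_to_a[j] == i]


def self_learning_align(src, tgt, seeds, rounds=5):
    """Assumed self-learning loop (not the published algorithm):
    fit a mapping on the current matches, propose new matches,
    add the ones that do not conflict, and repeat."""
    pairs = set(seeds)
    W = np.eye(src.shape[1])
    for _ in range(rounds):
        matched = sorted(pairs)
        s_idx = [i for i, _ in matched]
        t_idx = [j for _, j in matched]
        W = procrustes(src[s_idx], tgt[t_idx])

        used_src, used_tgt = set(s_idx), set(t_idx)
        added = False
        for i, j in mutual_nearest_neighbors(src @ W, tgt):
            # Keep existing matches fixed; only add unmatched items.
            if i not in used_src and j not in used_tgt:
                pairs.add((i, j))
                used_src.add(i)
                used_tgt.add(j)
                added = True
        if not added:  # nothing new was induced, so stop early
            break
    return W, pairs
```

Here `src` and `tgt` would be the per-language embedding matrices (one row per entity or lexeme) and `seeds` the initial seed alignment as index pairs. Restricting new matches to mutual nearest neighbors is a conservative design choice: it trades recall for precision so that early mistakes are less likely to be reinforced in later rounds.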