Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding its entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs, but they struggle to represent the abundant long-tail entities in real-world KGs, which have limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones, and suffer from problems such as expensive negative sampling and restrictive requirements on entity descriptions. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming both to enrich the representations of long-tail entities and to solve the problems of prior description-based methods. We formulate description-based KE learning within a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks for link prediction and triple classification, especially for long-tail entities.
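To make the contrastive formulation concrete, the sketch below shows one way a language model can score (head, relation) queries against candidate entity descriptions with in-batch negatives, so that every other entity in the batch serves as a negative and no separate negative sampling is required. The encoder choice (bert-base-uncased), mean pooling, the temperature value, and the "[SEP]"-joined input format are illustrative assumptions, not LMKE's exact design.

```python
# Minimal sketch of description-based KE with in-batch contrastive learning.
# Assumptions (not from the paper): mean pooling, temperature 0.05, and
# bert-base-uncased as the encoder; LMKE's actual architecture may differ.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(texts):
    """Encode a list of texts into L2-normalized sentence embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)        # (B, T, 1)
    emb = (out * mask).sum(1) / mask.sum(1)             # mean pooling over tokens
    return F.normalize(emb, dim=-1)

def contrastive_loss(query_texts, entity_descriptions, temperature=0.05):
    """Score each (head, relation) query against every tail description in the
    batch; the matching description is the positive and the rest act as
    in-batch negatives, avoiding explicit negative sampling."""
    q = encode(query_texts)                  # (B, H) query embeddings
    e = encode(entity_descriptions)          # (B, H) entity embeddings
    logits = q @ e.T / temperature           # (B, B) similarity matrix
    labels = torch.arange(len(query_texts))  # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Toy usage: one training step on two triples (descriptions are illustrative).
loss = contrastive_loss(
    ["Albert Einstein [SEP] field of work", "Marie Curie [SEP] place of birth"],
    ["Physics: the natural science of matter and energy.",
     "Warsaw: the capital and largest city of Poland."],
)
loss.backward()
```

Because the similarity matrix scores a whole batch of queries against a whole batch of entities in one pass, training and evaluation reuse the same encoded representations, which is the efficiency gain the contrastive formulation targets.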