Knowledge Graph Embedding models have become an important area of machine learning. These models provide latent representations of the entities and relations in a Knowledge Graph, which can then be used in downstream machine learning tasks such as link prediction. Such models are commonly trained by contrasting positive and negative triples. While all triples of a KG are considered positive, negative triples are usually not readily available. Therefore, the choice of the sampling method used to obtain negative triples plays a crucial role in the performance and effectiveness of Knowledge Graph Embedding models. Most current methods draw negative samples from a random distribution over the entities of the underlying Knowledge Graph, which often yields meaningless triples. Other known methods use adversarial techniques or generative neural networks, which reduce the efficiency of the process. In this paper, we propose an approach for generating informative negative samples that considers available complementary knowledge about entities. In particular, Pre-trained Language Models are used to obtain representations of symbolic entities from their textual information, and the distances between these representations are used to form neighborhood clusters. Our comprehensive evaluation demonstrates the effectiveness of the proposed approach for the link prediction task on benchmark Knowledge Graphs with textual information.
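As a minimal sketch of the idea described above (not the authors' implementation), the following Python snippet embeds entity descriptions with a pre-trained language model, groups the entities into neighborhood clusters, and samples a negative triple by corrupting the tail with an entity from the tail's own cluster. The model name, cluster count, toy entities, and the exact corruption rule are illustrative assumptions.

```python
# Cluster-based informative negative sampling: semantically close entities
# make harder negatives than uniformly random entities.
import random
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

# Toy entities and textual descriptions (stand-ins for a KG with text).
entities = ["Berlin", "Paris", "Germany", "France", "Albert Einstein", "Marie Curie"]
descriptions = [
    "Berlin is the capital city of Germany.",
    "Paris is the capital city of France.",
    "Germany is a country in central Europe.",
    "France is a country in western Europe.",
    "Albert Einstein was a theoretical physicist.",
    "Marie Curie was a physicist and chemist.",
]

# Encode the textual descriptions into dense vectors with a PLM.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = encoder.encode(descriptions)

# Form neighborhood clusters from distances in the embedding space.
labels = KMeans(n_clusters=3, n_init=10).fit_predict(embeddings)  # illustrative k
clusters = {}
for entity, label in zip(entities, labels):
    clusters.setdefault(label, []).append(entity)

def sample_negative(head, relation, tail):
    """Corrupt the tail with an entity from the tail's own cluster,
    yielding an informative negative rather than a random one."""
    tail_cluster = clusters[labels[entities.index(tail)]]
    candidates = [e for e in tail_cluster if e not in (head, tail)]
    if not candidates:  # fall back to uniform sampling on tiny clusters
        candidates = [e for e in entities if e not in (head, tail)]
    return (head, relation, random.choice(candidates))

print(sample_negative("Germany", "capital", "Berlin"))
# e.g. ('Germany', 'capital', 'Paris') -- a hard negative, since Paris
# lies in the same neighborhood cluster as Berlin.
```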