A knowledge graph (KG) is a data structure that represents entities and relations as the vertices and edges of a directed graph with edge types. KGs are an important primitive in modern machine learning and artificial intelligence. Embedding-based models, such as the seminal TransE [Bordes et al., 2013] and the more recent PairRE [Chao et al., 2020], are among the most popular and successful approaches for representing KGs and inferring missing edges (link completion). Their relative success is often credited in the literature to their ability to learn logical rules between the relations. In this work, we investigate whether learning rules between relations is indeed what drives the performance of embedding-based methods. We define motif learning and two alternative mechanisms: network learning (based only on the connectivity of the KG, ignoring the relation types) and unstructured statistical learning (ignoring the connectivity of the graph). Using experiments on synthetic KGs, we show that KG models can learn motifs, and how this ability is degraded by non-motif (noise) edges. We propose tests to distinguish the contributions of the three mechanisms to performance, and apply them to popular KG benchmarks. We also discuss an issue with the standard performance testing protocol and suggest an improvement. To appear in the proceedings of Complex Networks 2021.
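As a rough illustration of the objects named above, the sketch below stores a KG as (head, relation, tail) triples, derives the untyped connectivity that network learning would rely on, and scores triples with the translational form usually associated with TransE (negative distance between head + relation and tail). The entity names, the hand-picked 2-d vectors, and the helpers `network_view`, `transe_score`, and `score` are assumptions made for illustration; none of them is taken from the paper, and nothing here is trained.

```python
import math

# A KG as a set of (head, relation, tail) triples: a directed graph with typed edges.
triples = {
    ("alice", "parent_of", "bob"),
    ("bob", "parent_of", "carol"),
    ("alice", "grandparent_of", "carol"),
}

def network_view(kg):
    """Drop relation types and keep only connectivity (hypothetical helper)."""
    return {(h, t) for h, _, t in kg}

def transe_score(h_vec, r_vec, t_vec):
    """TransE-style score: negative L2 distance between h + r and t.

    A higher (less negative) score means a more plausible triple.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(h_vec, r_vec, t_vec)))

# Hand-picked toy embeddings (assumed, not learned). They are chosen so that
# grandparent_of equals parent_of + parent_of, i.e. a composition rule.
entity = {"alice": (0.0, 0.0), "bob": (1.0, 0.0), "carol": (2.0, 0.0)}
relation = {"parent_of": (1.0, 0.0), "grandparent_of": (2.0, 0.0)}

def score(h, r, t):
    """Score a triple by name using the toy embeddings above."""
    return transe_score(entity[h], relation[r], entity[t])

if __name__ == "__main__":
    print(score("alice", "grandparent_of", "carol"))  # triple present in the toy KG
    print(score("alice", "parent_of", "carol"))       # triple absent from the toy KG
    print(sorted(network_view(triples)))
```

In this toy setting the triple that is in the KG scores higher than the one that is not, because the relation vectors happen to compose. Whether trained models actually exploit such structure, rather than connectivity or relation statistics alone, is the question the abstract raises.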