Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that the knowledge contained in a dataset is consistently represented. Dense embeddings trained from KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. Based on an observation of the inherent algebraic structure of knowledge graphs, we developed a mathematical language for KGs, which we term Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. To the best of our knowledge, by applying abstract algebra in statistical learning, this work develops the first formal language for general knowledge graphs, and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.
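The semigroup argument can be illustrated with a minimal sketch. This is a hypothetical toy example, not the paper's actual SemE implementation: relations are embedded as 2×2 matrices, which form a semigroup under matrix multiplication (composition is closed and associative), and a chain-like rule can be softly enforced with a squared-Frobenius penalty on the composed relation.

```python
# Hypothetical sketch (not the paper's SemE code): relation embeddings as
# 2x2 matrices, composed by matrix multiplication -- a closed, associative
# operation, i.e. a semigroup.

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply a 2x2 relation matrix to a length-2 entity vector."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]

r1 = [[1.0, 2.0], [0.0, 1.0]]   # toy relation embeddings
r2 = [[0.5, 0.0], [1.0, 1.5]]
h = [1.0, -1.0]                 # toy head-entity embedding

# Compositionality: applying r1 then r2 to an entity equals applying the
# composite relation r2 . r1 -- the property a semigroup guarantees.
assert apply(r2, apply(r1, h)) == apply(matmul(r2, r1), h)

# A chain-like rule "r1 followed by r2 behaves like r3" can be encouraged
# during training with a regularizer  ||r2 . r1 - r3||_F^2.
r3 = matmul(r2, r1)
penalty = sum((matmul(r2, r1)[i][j] - r3[i][j]) ** 2
              for i in range(2) for j in range(2))
print(penalty)  # 0.0 here, since r3 is exactly the composition
```

In training, the penalty above would be added to the ranking loss for each known rule, nudging the learned matrices toward satisfying the rule without hard constraints.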