The models developed to date for knowledge base embedding are all based on the assumption that the relations contained in knowledge bases are binary. For the training and testing of these embedding models, multi-fold (or n-ary) relational data are converted to triples (e.g., in the FB15K dataset) and interpreted as instances of binary relations. This paper presents a canonical representation of knowledge bases containing multi-fold relations. We show that the existing embedding models on the popular FB15K dataset correspond to a sub-optimal modelling framework, resulting in a loss of structural information. We advocate a novel modelling framework, which models multi-fold relations directly using this canonical representation. Using this framework, the existing TransH model is generalized to a new model, m-TransH. We demonstrate experimentally that m-TransH outperforms TransH by a large margin, thereby establishing a new state of the art.
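To make the loss of structural information concrete, the following is a minimal sketch of what can go wrong when multi-fold facts are flattened into triples. It assumes a simple pairwise decomposition (one binary triple per pair of roles), which is chosen here for illustration only and is not necessarily the conversion used to build FB15K. The relation name `win`, the role names, the entity labels, and the helper functions are all hypothetical.

```python
from itertools import combinations


def to_binary_triples(relation, fact):
    """Flatten one multi-fold fact (a role -> entity mapping) into
    binary triples, one per unordered pair of roles (assumed scheme)."""
    triples = set()
    # Sort by role name so derived binary relation names are deterministic.
    for (role_a, ent_a), (role_b, ent_b) in combinations(sorted(fact.items()), 2):
        triples.add((ent_a, f"{relation}.{role_a}-{role_b}", ent_b))
    return triples


def convert(relation, facts):
    """Flatten a list of multi-fold facts into a single set of triples."""
    out = set()
    for fact in facts:
        out |= to_binary_triples(relation, fact)
    return out


# Three hypothetical 3-fold facts of one relation.
facts_x = [
    {"actor": "A1", "film": "F1", "award": "W2"},
    {"actor": "A1", "film": "F2", "award": "W1"},
    {"actor": "A2", "film": "F1", "award": "W1"},
]
# The same facts plus a fourth one that is NOT in facts_x.
facts_y = facts_x + [{"actor": "A1", "film": "F1", "award": "W1"}]

triples_x = convert("win", facts_x)
triples_y = convert("win", facts_y)

# Every pair in the extra fact already occurs in some other fact, so the
# two different knowledge bases become indistinguishable as triples.
print(triples_x == triples_y)  # True
```

Under this assumed decomposition, a model trained on the triples cannot tell whether the fourth fact holds, whereas a representation that keeps each multi-fold instance intact preserves that distinction.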