Knowledge graph embedding research has mainly focused on learning continuous representations of entities and relations tailored towards the link prediction problem. Recent results indicate an ever-increasing predictive ability of current approaches on benchmark datasets. However, this effectiveness often comes at the cost of over-parameterization and increased computational complexity. The former necessitates extensive hyperparameter optimization to mitigate severe overfitting. The latter magnifies the importance of winning the hardware lottery. Here, we investigate a remedy for the first problem. We propose a technique based on Kronecker decomposition to reduce the number of parameters in a knowledge graph embedding model, while retaining its expressiveness. Through Kronecker decomposition, large embedding matrices are split into smaller embedding matrices during the training process. Hence, embeddings of knowledge graphs are not plainly retrieved but reconstructed on the fly. The decomposition ensures that elementwise interactions between three embedding vectors are extended with interactions within each embedding vector. This implicitly reduces redundancy in embedding vectors and encourages feature reuse. To quantify the impact of applying Kronecker decomposition on embedding matrices, we conduct a series of experiments on benchmark datasets. Our experiments suggest that applying Kronecker decomposition on embedding matrices leads to improved parameter efficiency on all benchmark datasets. Moreover, empirical evidence suggests that reconstructed embeddings confer robustness against noise in the input knowledge graph. To foster reproducible research, we provide an open-source implementation of our approach, including training and evaluation scripts as well as pre-trained models in our knowledge graph embedding framework (https://github.com/dice-group/dice-embeddings).
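The on-the-fly reconstruction described above can be illustrated with a minimal NumPy sketch. Assuming (as a simplification of the paper's method, not its exact configuration) that each entity embedding of dimension d = d1 * d2 is stored as two factor embeddings of sizes d1 and d2, the full vector is rebuilt via the Kronecker product; the factor sizes and names below are illustrative:

```python
import numpy as np

# Illustrative factor dimensions: full embedding dimension d = d1 * d2 = 32.
d1, d2 = 4, 8
rng = np.random.default_rng(0)

# Instead of storing one 32-dimensional embedding per entity,
# store two small factor embeddings (d1 + d2 = 12 parameters).
a = rng.standard_normal(d1)  # first factor embedding
b = rng.standard_normal(d2)  # second factor embedding

# Reconstruct the full embedding on the fly:
# each entry e[i*d2 + j] = a[i] * b[j], so coordinates within the
# embedding interact multiplicatively, encouraging feature reuse.
e = np.kron(a, b)
assert e.shape == (d1 * d2,)
```

In a trained model, `a` and `b` would be the learnable parameters, so the per-entity parameter count drops from d1 * d2 to d1 + d2 while the scoring function still operates on the full reconstructed vector.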