Tensor factorization has become an increasingly popular approach to knowledge graph completion (KGC), the task of automatically predicting missing facts in a knowledge graph. However, even with a simple model such as CANDECOMP/PARAFAC (CP) tensor decomposition, KGC on existing knowledge graphs is impractical in resource-limited environments, because a large amount of memory is required to store parameters represented as 32-bit or 64-bit floating-point numbers. This limitation is expected to become more stringent as existing knowledge graphs, which are already huge, keep steadily growing in scale. To reduce the memory requirement, we present a method for binarizing the parameters of CP tensor decomposition by introducing a quantization function into the optimization problem. This method replaces floating-point parameters with binary ones after training, which drastically reduces the model size at run time. We investigate the trade-off between the quality and size of tensor factorization models on several KGC benchmark datasets. In our experiments, the proposed method reduced the model size by more than an order of magnitude while maintaining task performance. Moreover, the binarized parameters admit a fast score computation technique based on bitwise operations.
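To make the bitwise scoring idea concrete, here is a minimal sketch (not the paper's implementation; the dimension, packing scheme, and variable names are illustrative assumptions). A CP score for a triple (h, r, t) is the sum of elementwise products of three embedding vectors; once the embeddings are binarized to {-1, +1}, each vector packs into machine words, and the sign of each product entry is the XOR of the corresponding bits, so the score reduces to a popcount:

```python
import numpy as np

D = 64  # illustrative embedding dimension (one 64-bit word)
rng = np.random.default_rng(0)

# Stand-ins for trained real-valued CP embeddings of one triple (h, r, t).
h, r, t = (rng.standard_normal(D) for _ in range(3))

# Binarize with the sign function: parameters become {-1, +1}.
bh, br, bt = (np.where(v >= 0, 1, -1) for v in (h, r, t))

# Reference CP score of the binarized model: sum_d h_d * r_d * t_d.
score_ref = int(np.sum(bh * br * bt))

def pack(v):
    """Pack a +/-1 vector into a 64-bit word (+1 -> bit 0, -1 -> bit 1)."""
    bits = 0
    for i, x in enumerate(v):
        if x < 0:
            bits |= 1 << i
    return bits

ph, pr, pt = pack(bh), pack(br), pack(bt)

# A product of three +/-1 entries is -1 iff an odd number of them are -1,
# so the product's sign bit is the XOR of the three packed words.
x = ph ^ pr ^ pt
score_bitwise = D - 2 * bin(x).count("1")  # (#positives) - (#negatives)
```

The same idea extends to batched scoring: packing 64 dimensions per word replaces 64 floating-point multiply-adds with one XOR and one popcount, which is the source of both the memory savings and the speedup.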