Knowledge graph embedding research has mainly focused on learning continuous representations of knowledge graphs for the link prediction problem. Recently developed frameworks can be effectively applied in research-related applications, yet they do not fulfill many requirements of real-world applications. As the size of the knowledge graph grows, moving computation from a commodity computer to a cluster of computers with these frameworks becomes increasingly challenging. Finding hyperparameter settings suitable for a given time and computational budget is left to practitioners. In addition, the continual learning aspect of knowledge graph embedding frameworks is often ignored, although continual learning plays an important role in many real-world (deep) learning-driven applications. Arguably, these limitations explain the lack of publicly available knowledge graph embedding models for large knowledge graphs. We developed a framework based on DASK, PyTorch Lightning, and Hugging Face to compute embeddings for large-scale knowledge graphs in a hardware-agnostic manner, addressing real-world challenges pertaining to the scale of real applications. We provide an open-source version of our framework along with a hub of pre-trained models with more than 11.4 billion parameters.
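To illustrate what hardware-agnostic training means here, the minimal sketch below trains a toy knowledge graph embedding model with PyTorch Lightning, whose Trainer lets the same script run on a CPU, a single GPU, or multiple GPUs. This is an illustrative sketch, not the framework's actual API: the DistMult scorer, the tensor sizes, and the random triples are assumptions, and at scale triple I/O would be parallelized with DASK rather than generated in memory.

```python
# Minimal sketch (illustrative, not the framework's API): a KGE model
# trained hardware-agnostically via PyTorch Lightning's Trainer.
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

NUM_ENTITIES, NUM_RELATIONS, DIM = 10_000, 50, 32  # assumed toy sizes

class DistMult(pl.LightningModule):
    """DistMult scorer, standing in for any KGE model."""
    def __init__(self):
        super().__init__()
        self.ent = torch.nn.Embedding(NUM_ENTITIES, DIM)
        self.rel = torch.nn.Embedding(NUM_RELATIONS, DIM)

    def training_step(self, batch, batch_idx):
        # Score a (head, relation, tail) triple and compare to its label.
        h, r, t, y = batch
        score = (self.ent(h) * self.rel(r) * self.ent(t)).sum(-1)
        return torch.nn.functional.binary_cross_entropy_with_logits(score, y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.01)

# Random toy triples; in the framework, triples would be read and
# indexed with DASK-parallelized I/O instead.
n = 1024
data = TensorDataset(
    torch.randint(0, NUM_ENTITIES, (n,)),
    torch.randint(0, NUM_RELATIONS, (n,)),
    torch.randint(0, NUM_ENTITIES, (n,)),
    torch.rand(n).round(),  # float 0/1 labels for BCE-with-logits
)

# accelerator="auto" / devices="auto" pick whatever hardware is present,
# so no code change is needed when moving from a laptop to a cluster node.
trainer = pl.Trainer(accelerator="auto", devices="auto", max_epochs=1)
trainer.fit(DistMult(), DataLoader(data, batch_size=128))
```

The design point this illustrates is that delegating device placement and distribution to the trainer abstraction keeps the model code identical across hardware configurations, which is what makes commodity-to-cluster migration tractable.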