Data in Knowledge Graphs often represents part of the current state of the real world, so the graph has to be updated frequently to stay up to date. To utilize information from Knowledge Graphs, many state-of-the-art machine learning approaches rely on embedding techniques. These techniques typically compute an embedding, i.e., vector representations of the nodes, which serves as input for the main machine learning algorithm. If the graph is updated later on, specifically when nodes are added or removed, the training has to be repeated from scratch. This is undesirable both because of the time it takes and because downstream models trained on these embeddings have to be retrained if the embeddings change significantly. In this paper, we investigate embedding updates that do not require full retraining and evaluate them in combination with various embedding models on real dynamic Knowledge Graphs covering multiple use cases. We study approaches that place newly appearing nodes optimally according to local information, but find that they do not work well. However, if we continue the training of the old embedding, interleaved with epochs during which we only optimize for the added and removed parts, we obtain good results in terms of the metrics typically used in link prediction. This performance is reached much faster than with a complete retraining, which makes it feasible to maintain embeddings for dynamic Knowledge Graphs.
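To make the idea of continued training with interleaved epochs concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a TransE-style margin loss trained with plain SGD in NumPy. It also assumes that a "focused" epoch covers the triples of the updated graph that are incident to any entity appearing in an added or removed triple. The names `train_epoch` and `update_embedding`, the hyperparameters, and the random initialization of new entities are all hypothetical choices made for illustration. Rows of removed entities are simply left unused.

```python
import numpy as np

def train_epoch(ent, rel, triples, rng, lr=0.05, margin=1.0):
    """One SGD epoch of a TransE-style margin loss over `triples` (in place)."""
    n_ent = ent.shape[0]
    for idx in rng.permutation(len(triples)):
        h, r, t = triples[idx]
        t_neg = rng.integers(n_ent)            # corrupt the tail uniformly
        pos = ent[h] + rel[r] - ent[t]         # residual of the true triple
        neg = ent[h] + rel[r] - ent[t_neg]     # residual of the corrupted one
        if margin + pos @ pos - neg @ neg > 0:  # hinge is active
            g_pos, g_neg = 2 * pos, 2 * neg
            ent[h] -= lr * (g_pos - g_neg)
            rel[r] -= lr * (g_pos - g_neg)
            ent[t] += lr * g_pos
            ent[t_neg] -= lr * g_neg

def update_embedding(ent, rel, old_triples, new_triples, n_new_entities,
                     rng, rounds=5):
    """Continue training an old embedding on an updated graph.

    Assumption: new entities receive the ids directly after the old ones.
    """
    dim = ent.shape[1]
    # Keep the old vectors and append randomly initialized rows for new nodes.
    ent = np.vstack([ent, rng.normal(scale=0.1, size=(n_new_entities, dim))])
    rel = rel.copy()

    old_set = set(map(tuple, old_triples))
    new_set = set(map(tuple, new_triples))
    delta = (new_set - old_set) | (old_set - new_set)   # added and removed
    touched = {e for (h, _, t) in delta for e in (h, t)}

    full = np.array(sorted(new_set), dtype=int).reshape(-1, 3)
    focus = np.array(
        [tr for tr in sorted(new_set) if tr[0] in touched or tr[2] in touched],
        dtype=int,
    ).reshape(-1, 3)

    for _ in range(rounds):
        train_epoch(ent, rel, full, rng)    # general epoch on the whole graph
        train_epoch(ent, rel, focus, rng)   # epoch restricted to changed parts
    return ent, rel
```

A call such as `update_embedding(ent, rel, old_triples, new_triples, 1, np.random.default_rng(0))` returns an entity matrix with one additional row and leaves the input arrays unmodified, so the old embedding remains available for comparison.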