In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system from textual inputs, separating the overall process into two stages. The graph nodes are generated first using a pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the text. For each stage we consider several architectural choices that can be used depending on the available training resources. We evaluated the model on the recent WebNLG 2020 Challenge dataset, matching state-of-the-art performance on the text-to-RDF generation task, as well as on the New York Times (NYT) and large-scale TekGen datasets, showing strong overall performance and outperforming the existing baselines. We believe that the proposed system can serve as a viable KG construction alternative to the existing linearization or sampling-based graph generation approaches. Our code can be found at https://github.com/IBM/Grapher
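The two-stage decomposition can be sketched as follows. This is a minimal illustration only: `generate_nodes` and `score_edges` are hypothetical stand-ins for the pretrained language model and the edge construction head, not the system's actual implementation.

```python
# Hedged sketch of a two-stage text-to-KG pipeline: nodes first,
# then edges over node pairs. Both stages are toy placeholders.

def generate_nodes(text):
    # Stage 1: a pretrained language model would generate entity nodes
    # from the input text; here we crudely take capitalized tokens.
    return [tok.strip(".,") for tok in text.split() if tok[0].isupper()]

def score_edges(nodes):
    # Stage 2: an edge construction head would predict a relation label
    # (or "no edge") for each ordered node pair; here every pair gets
    # a placeholder label purely for illustration.
    return [(h, "relatedTo", t) for h in nodes for t in nodes if h != t]

def text_to_kg(text):
    nodes = generate_nodes(text)
    return score_edges(nodes)

triples = text_to_kg("Alice founded Acme in Berlin")
```

The key design point the abstract describes is that node generation and edge prediction are separate stages, so each can be sized to the available training resources.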