With knowledge graphs (KGs) at the center of numerous applications such as recommender systems and question answering, the need for generalized pipelines to construct and continuously update such KGs is increasing. While the individual steps necessary to create KGs from unstructured (e.g., text) and structured (e.g., databases) data sources are mostly well researched for one-shot execution, their adoption for incremental KG updates and the interplay of the individual steps have hardly been investigated systematically so far. In this work, we first discuss the main graph models for KGs and introduce the major requirements for future KG construction pipelines. Next, we provide an overview of the necessary steps to build high-quality KGs, including cross-cutting topics such as metadata management, ontology development, and quality assurance. We then evaluate the state of the art of KG construction with respect to the introduced requirements for specific popular KGs as well as some recent tools and strategies for KG construction. Finally, we identify areas in need of further research and improvement.