When knowledge workers fill in spreadsheets freely, the content can be rather unstructured, which makes it difficult for humans, and especially for machines, to interpret the data properly. Spreadsheets are therefore often converted into a more explicit, formal and structured form, for example a knowledge graph. However, if no data maintenance strategy was in place and the user-generated data has become "messy", constructing a knowledge graph is a challenging task. In this paper, we catalog several of these challenges and propose an interactive approach to solve them. Our approach includes a graphical user interface that enables knowledge engineers to bulk-annotate spreadsheet cells with extracted information. A knowledge graph is then formed from the cells' annotations. In our evaluation, we built a knowledge graph of 25k triples from five spreadsheets taken from an industrial scenario. We compared our method with the state-of-the-art RDF Mapping Language (RML) approach. The comparison highlights the contributions of our approach.
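To make the idea of forming a graph from cell annotations more concrete, the following is a minimal sketch in Python. It is not the implementation described in the paper: the `Annotation` record, the `bulk_annotate` and `build_triples` helpers, the `ex:` prefix, and the sample cells are all hypothetical names chosen for illustration. The sketch assumes that each annotated cell mentions exactly one entity and that entities in the same row may be linked by a user-chosen predicate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record attached to one spreadsheet cell."""
    row: int
    col: int
    entity: str  # identifier of the entity the cell mentions (assumed naming scheme)
    cls: str     # class assigned to that entity by the annotator


def bulk_annotate(cells, col, cls, prefix="ex:"):
    """Annotate every non-empty cell of one column with the same class."""
    annotations = []
    for (r, c), text in sorted(cells.items()):
        if c == col and text.strip():
            # Derive an identifier from the cell text (illustrative only).
            ident = prefix + text.strip().replace(" ", "_")
            annotations.append(Annotation(r, c, ident, cls))
    return annotations


def build_triples(annotations, row_links):
    """Turn annotations into (subject, predicate, object) triples.

    row_links is a list of (subject_col, predicate, object_col) tuples that
    link entities found in the same row.
    """
    by_cell = {(a.row, a.col): a for a in annotations}
    # One type statement per annotated cell.
    triples = {(a.entity, "rdf:type", a.cls) for a in annotations}
    rows = {a.row for a in annotations}
    for subj_col, predicate, obj_col in row_links:
        for r in rows:
            s = by_cell.get((r, subj_col))
            o = by_cell.get((r, obj_col))
            if s is not None and o is not None:
                triples.add((s.entity, predicate, o.entity))
    return triples


# Hypothetical sample sheet: column 0 holds people, column 1 holds projects.
cells = {
    (0, 0): "Alice Smith", (0, 1): "Project X",
    (1, 0): "Bob",         (1, 1): "",
}
people = bulk_annotate(cells, col=0, cls="ex:Person")
projects = bulk_annotate(cells, col=1, cls="ex:Project")
triples = build_triples(people + projects, [(0, "ex:worksOn", 1)])
```

In this sketch the empty project cell in the second row is simply skipped, so Bob receives a type statement but no `ex:worksOn` link. Messy real-world cells, such as several entities in one cell, would need additional handling that is not shown here.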