Tabular data plays an essential role in many data analytics and machine learning tasks. Typically, tabular data does not possess any machine-readable semantics. In this context, semantic table interpretation is crucial for making data analytics workflows more robust and explainable. This article proposes Tab2KG, a novel method that targets the interpretation of tables with previously unseen data and automatically infers their semantics to transform them into semantic data graphs. We introduce original lightweight semantic profiles that enrich a domain ontology's concepts and relations and represent domain and table characteristics. We propose a one-shot learning approach that relies on these profiles to map a tabular dataset containing previously unseen instances to a domain ontology. In contrast to existing semantic table interpretation approaches, Tab2KG relies only on the semantic profiles and does not require any instance lookup. This property makes Tab2KG particularly suitable in the data analytics context, in which data tables typically contain new instances. Our experimental evaluation on several real-world datasets from different application domains demonstrates that Tab2KG outperforms state-of-the-art semantic table interpretation baselines.
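To make the idea of profile-based mapping without instance lookup more concrete, the following is a minimal sketch and not the Tab2KG implementation. It assumes a profile is a small vector of column statistics (share of numeric values, share of distinct values, scaled mean string length) and uses plain nearest-neighbour matching as a stand-in for the learned one-shot model. The functions `column_profile` and `match_columns`, the chosen features, and the `ex:` relation names are all hypothetical.

```python
from math import dist


def column_profile(values):
    """Summarise a column of strings as a small statistics vector.

    The three features are illustrative assumptions, not Tab2KG's profile:
    (share of numeric values, share of distinct values, mean length / 10).
    """
    n = len(values)

    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    numeric_share = sum(is_number(v) for v in values) / n
    distinct_share = len(set(values)) / n
    mean_length = sum(len(v) for v in values) / n
    return (numeric_share, distinct_share, mean_length / 10)


def match_columns(table, domain_profiles):
    """Map each column to the ontology relation with the closest profile.

    Nearest-neighbour matching by Euclidean distance stands in for the
    learned similarity model; no cell value is looked up in a knowledge graph.
    """
    mapping = {}
    for name, values in table.items():
        profile = column_profile(values)
        mapping[name] = min(
            domain_profiles,
            key=lambda relation: dist(profile, domain_profiles[relation]),
        )
    return mapping


# Hypothetical domain profiles attached to two ontology relations.
domain_profiles = {
    "ex:temperature": (1.0, 1.0, 0.4),
    "ex:cityName": (0.0, 0.5, 0.6),
}

# A table whose instances never appeared in the domain data.
table = {
    "c1": ["21.5", "19.0", "23.1"],
    "c2": ["Bonn", "Bonn", "Hannover"],
}

mapping = match_columns(table, domain_profiles)
```

In this sketch, `c1` is matched to `ex:temperature` and `c2` to `ex:cityName` purely from column statistics, which is why previously unseen cell values do not prevent a match.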