We propose a novel approach to the problem of semantic heterogeneity in which data are organized into a set of stratified and independent representation layers: conceptual (a graph of unique alinguistic identifiers whose connections codify their meaning), language (sets of synonyms, possibly from multiple languages, that annotate concepts), knowledge (a graph whose nodes are entity types and whose links are properties), and data (a graph of entities populating the knowledge graph). This allows us to restate the problem of semantic heterogeneity as a problem of Representation Diversity, where the different types of heterogeneity, viz. Conceptual, Language, Knowledge, and Data, are dealt with uniformly within each single layer, independently of the others. In this paper we describe the proposed stratified representation of data and the process by which data are first transformed into the target representation, then suitably integrated, and finally presented to the user in her preferred format. The proposed framework has been evaluated in various pilot case studies and in a number of industrial data integration problems.
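To make the four-layer organization concrete, the following is a minimal sketch of how such a stratified store might look. It is not the framework's actual implementation: the class `StratifiedStore`, its method names, the integer concept identifiers, and the sample English/Italian vocabulary are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StratifiedStore:
    """Toy four-layer store. Every name here is hypothetical."""

    # Conceptual layer: concept id -> ids of broader concepts (graph edges).
    concepts: dict = field(default_factory=dict)
    # Language layer: (language, word) -> concept id, plus a reverse index.
    lexicon: dict = field(default_factory=dict)
    labels: dict = field(default_factory=dict)
    # Knowledge layer: entity-type concept id -> allowed property concept ids.
    etypes: dict = field(default_factory=dict)
    # Data layer: entity id -> (entity-type concept id, {property id: value}).
    entities: dict = field(default_factory=dict)

    def add_concept(self, cid, broader=()):
        self.concepts.setdefault(cid, set()).update(broader)
        for parent in broader:
            self.concepts.setdefault(parent, set())

    def add_word(self, lang, word, cid):
        if cid not in self.concepts:
            raise KeyError(f"unknown concept {cid}")
        self.lexicon[(lang, word.lower())] = cid
        self.labels.setdefault((lang, cid), []).append(word.lower())

    def add_etype(self, cid, properties):
        self.etypes[cid] = set(properties)

    def add_entity(self, eid, lang, etype_word, attrs):
        # Transform source-language labels into language-independent ids.
        etype = self.lexicon[(lang, etype_word.lower())]
        values = {}
        for word, value in attrs.items():
            prop = self.lexicon[(lang, word.lower())]
            if prop not in self.etypes[etype]:
                raise ValueError(f"{word!r} is not a property of {etype_word!r}")
            values[prop] = value
        self.entities[eid] = (etype, values)

    def present(self, eid, lang):
        # Render an entity using the first registered synonym in `lang`.
        etype, values = self.entities[eid]

        def label(cid):
            return self.labels[(lang, cid)][0]

        return label(etype), {label(p): v for p, v in values.items()}


store = StratifiedStore()
store.add_concept(1)  # assumed to stand for "person"
store.add_concept(2)  # assumed to stand for "name"
for lang, word, cid in [("en", "person", 1), ("it", "persona", 1),
                        ("en", "name", 2), ("it", "nome", 2)]:
    store.add_word(lang, word, cid)
store.add_etype(1, {2})

# Data arrives with Italian labels and is presented with English ones.
store.add_entity("e1", "it", "persona", {"nome": "Ada"})
result = store.present("e1", "en")
```

In this sketch only the language layer knows about words, so language heterogeneity is resolved once, at load time, and the knowledge and data layers operate purely on concept identifiers.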