One of the main challenges of building an enterprise knowledge graph is guaranteeing interoperability between a single graph data model and the diverse and often changeable ecosystem of non-graph data models, languages, and tools which surround and support the graph. The simple and developer-friendly property graph family of data models lends itself to this task, yet the lack of a formal specification deprives the graph of well-defined semantics. In this paper, we observe that algebraic data types are a common foundation of most of the enterprise schema languages we deal with in practice, and are also a suitable basis for a property graph formalism. We introduce this formalism in terms of type theory, algebra, and category theory, also providing algorithms for query processing and data migration with guarantees of semantic consistency across supported languages and datasets. These results have clear connections to relational database theory, programming language theory, and graph theory, providing starting points for significant future work. Open research challenges described in the paper include adding constraints, query and schema languages, and logics on top of the basic type system, interfacing with specific graph and non-graph data models, and performing operations which are typically difficult or ill-defined on property graphs, such as graph merges.