Graph processing has become an important part of many areas of computer science, including machine learning, computational sciences, medical applications, and social network analysis. Many graphs, such as web or social networks, may contain up to trillions of edges. Often, these graphs are also dynamic (their structure changes over time) and have rich, domain-specific data associated with vertices and edges. Graph database systems such as Neo4j enable storing, processing, and analyzing such large, evolving, and rich datasets. Due to the sheer size of such datasets, combined with the irregular nature of graph processing, these systems face unique design challenges. To facilitate the understanding of this emerging domain, we present the first survey and taxonomy of graph database systems. We focus on identifying and analyzing fundamental categories of these systems (e.g., triple stores, tuple stores, native graph database systems, or object-oriented systems), the associated graph models (e.g., RDF or Labeled Property Graph), data organization techniques (e.g., storing graph data in indexing structures or dividing data into records), and different aspects of data distribution and query execution (e.g., support for sharding and ACID). We present and compare 51 graph database systems, including Neo4j, OrientDB, and Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we describe research and engineering challenges to outline the future of graph databases.