We revisit column-oriented storage and query processing techniques in the context of contemporary graph database management systems (GDBMSs). Like column-oriented RDBMSs, GDBMSs support read-heavy analytical workloads, but their data access patterns differ fundamentally from those of traditional analytical workloads. We first derive a set of desiderata for optimizing the storage and query processors of GDBMSs based on these access patterns. We then present the design of columnar storage, compression, and query processing techniques based on these desiderata. In addition to showing how existing techniques from columnar RDBMSs can be integrated directly, we propose novel techniques that are optimized for GDBMSs. These include a novel list-based query processor, which avoids the expensive data copies that traditional block-based processors incur under many-to-many joins; a new data structure we call single-indexed edge property pages, with an accompanying edge ID scheme; and a new application of Jacobson's bit vector index for compressing NULL values and empty lists. We integrated our techniques into the GraphflowDB in-memory GDBMS. Through extensive experiments, we demonstrate the scalability and query performance benefits of our techniques.
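To make the NULL-compression idea concrete, the following is a minimal sketch of a rank-indexed bit vector, assuming a simplified single-level variant of Jacobson's index rather than the design actually used in GraphflowDB. The class name `NullCompressedColumn`, the constant `BLOCK_BITS`, and the integer-valued column are all hypothetical choices made for illustration. Only non-NULL values are stored, in a dense array; a bit vector records which positions are non-NULL, and per-block prefix counts let a position be mapped to its dense index without scanning the whole vector.

```python
from typing import List, Optional

BLOCK_BITS = 64  # assumed block size; one prefix count is kept per block


class NullCompressedColumn:
    """Hypothetical column that stores non-NULL values densely."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self.length = len(values)
        # Dense array holding only the non-NULL values, in position order.
        self.dense = [v for v in values if v is not None]
        num_blocks = (self.length + BLOCK_BITS - 1) // BLOCK_BITS
        # Bit i of blocks[b] is set if position b * BLOCK_BITS + i is non-NULL.
        self.blocks = [0] * num_blocks
        for pos, v in enumerate(values):
            if v is not None:
                self.blocks[pos // BLOCK_BITS] |= 1 << (pos % BLOCK_BITS)
        # prefix[b] = number of set bits in all blocks before block b.
        self.prefix = [0] * num_blocks
        running = 0
        for b, word in enumerate(self.blocks):
            self.prefix[b] = running
            running += bin(word).count("1")

    def rank(self, pos: int) -> int:
        """Number of non-NULL entries strictly before position pos."""
        block, offset = divmod(pos, BLOCK_BITS)
        mask = (1 << offset) - 1
        return self.prefix[block] + bin(self.blocks[block] & mask).count("1")

    def get(self, pos: int) -> Optional[int]:
        """Return the value at pos, or None if that position is NULL."""
        if not 0 <= pos < self.length:
            raise IndexError(pos)
        block, offset = divmod(pos, BLOCK_BITS)
        if not (self.blocks[block] >> offset) & 1:
            return None
        return self.dense[self.rank(pos)]
```

For a sparse column, `get` touches one prefix count and one block word, so lookup cost does not grow with the number of NULLs. A full Jacobson-style index would add a second level of counts to bound the space overhead more tightly; that refinement is omitted here.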