How to develop scalable, representative, and widely adopted benchmarks for graph data systems has been an open question for decades. We conduct an in-depth study of the existing literature on benchmarks for graph data management and processing, covering 20 different benchmarks developed over the last 15 years. We categorize the benchmarks into three areas: benchmarks for graph processing systems, graph database benchmarks, and big data benchmarks with graph processing workloads. This systematic approach allows us to identify multiple issues in this area, including: i) few benchmarks can produce high-workload scenarios; ii) no significant work has been done on benchmarking graph stream processing or graph-based machine learning; iii) benchmarks tend to use conventional metrics even though new, meaningful metrics have been available for years; and iv) an increasing number of big data benchmarks include graph processing workloads. Following these observations, we conclude the survey by describing key challenges for future research on graph data systems benchmarking.