The rapidly growing number of large network analysis problems has led to the emergence of many parallel and distributed graph processing systems; one survey in 2014 identified over 80. Since then, the landscape has evolved: some packages have become inactive while more are being developed. Determining the best approach for a given problem is infeasible for most developers. To enable easy, rigorous, and repeatable comparison of the capabilities of such systems, we present an approach and associated software for analyzing the performance and scalability of parallel, open-source graph libraries. We demonstrate our approach on five graph processing packages (GraphMat, the Graph500, the Graph Algorithm Platform Benchmark Suite, GraphBIG, and PowerGraph) using synthetic and real-world datasets. We examine previously overlooked aspects of parallel graph processing performance, such as phases of execution and energy usage, for three algorithms (breadth-first search, single-source shortest paths, and PageRank), and compare our results to Graphalytics.