From social networks to language modeling, the growing scale and importance of graph data have driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable these performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph-analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges, we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques, such as automatic join rewrites, to implement these graph-parallel operators efficiently. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to that of specialized graph computation systems while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.