This paper proposes Kudu, a distributed execution engine with a well-defined abstraction that can be integrated with existing single-machine graph pattern mining (GPM) systems to provide both efficiency and scalability. The key novelty is the extendable embedding abstraction, which can express pattern enumeration algorithms, allows fine-grained task scheduling, and enables low-cost GPM-specific data reuse to reduce communication cost. An effective BFS-DFS hybrid exploration strategy generates enough concurrent tasks to overlap communication with computation while keeping memory consumption bounded. Two scalable distributed GPM systems are implemented by porting Automine and GraphPi to Kudu. Our evaluation shows that the Kudu-based systems significantly outperform state-of-the-art distributed GPM systems that use partitioned graphs, by up to 75.5x (19.0x on average), achieve similar or even better performance than the fastest distributed GPM systems that use replicated graphs, and scale to massive graphs with more than one hundred billion edges on a commodity cluster.
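To make the idea of BFS-DFS hybrid exploration more concrete, here is a minimal single-machine sketch, assuming triangle enumeration as the pattern and a fixed chunk size. It is not Kudu's API or implementation: the names `extend_triangle`, `hybrid_count`, and `chunk_size` are hypothetical. Each partial embedding is extended one vertex at a time; children are buffered breadth-first until a chunk of `chunk_size` sibling embeddings is full, and that chunk is then explored depth-first before more children are generated. In a distributed engine the embeddings in one chunk would be the batch of concurrent tasks whose remote fetches can overlap with computation (an assumption here, since this sketch runs sequentially). Because only one chunk is buffered per level, live embeddings stay at roughly `depth * chunk_size`.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]
Embedding = Tuple[int, ...]


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def extend_triangle(graph: Graph, emb: Embedding) -> List[Embedding]:
    """Hypothetical helper: extend a partial triangle embedding by one vertex.

    The ordering v0 < v1 < v2 breaks symmetry so each triangle is found once.
    """
    if len(emb) == 0:
        return [(v,) for v in sorted(graph)]
    if len(emb) == 1:
        v0 = emb[0]
        return [(v0, u) for u in sorted(graph[v0]) if u > v0]
    v0, v1 = emb
    # The third vertex must be a common neighbor of v0 and v1.
    return [(v0, v1, w) for w in sorted(graph[v0] & graph[v1]) if w > v1]


def hybrid_count(graph: Graph, chunk_size: int, depth: int = 3) -> Tuple[int, int]:
    """Count pattern matches with chunked BFS-DFS hybrid exploration.

    Returns (number of matches, largest chunk ever buffered).
    """
    peak = 0

    def explore(batch: List[Embedding]) -> int:
        nonlocal peak
        if len(batch[0]) == depth:
            return len(batch)  # every embedding in the batch is a complete match
        total = 0
        chunk: List[Embedding] = []
        for emb in batch:
            for child in extend_triangle(graph, emb):
                chunk.append(child)  # BFS: gather sibling embeddings
                peak = max(peak, len(chunk))
                if len(chunk) == chunk_size:
                    # DFS: finish this full chunk before generating more children,
                    # which bounds the number of buffered embeddings per level.
                    total += explore(chunk)
                    chunk = []
        if chunk:
            total += explore(chunk)  # leftover partial chunk
        return total

    return explore([()]), peak
```

For example, the complete graph on four vertices contains four triangles, and with `chunk_size=2` no level ever buffers more than two embeddings, so `hybrid_count(build_graph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]), chunk_size=2)` returns `(4, 2)`. A larger chunk exposes more parallelism per batch at the cost of more memory; a chunk size of 1 degenerates to pure DFS.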