This paper proposes Khuzdul, a distributed execution engine with a well-defined abstraction that can be integrated with existing single-machine graph pattern mining (GPM) systems to provide efficiency and scalability at the same time. The key novelty is the extendable embedding abstraction which can express pattern enumeration algorithms, allow fine-grained task scheduling, and enable low-cost GPMspecific data reuse to reduce communication cost. The effective BFS-DFS hybrid exploration generates sufficient concurrent tasks for communication-computation overlapping with bounded memory consumption. Two scalable distributed GPM systems are implemented by porting Automine and GraphPi on Khuzdul. Our evaluation shows that Khuzdul based systems significantly outperform state-of-the-art distributed GPM systems with partitioned graphs by up to 75.5x (on average 19.0x), achieve similar or even better performance compared with the fastest distributed GPM systems with replicated graph, and scale to massive graphs with more than one hundred billion edges with a commodity cluster.
翻译:本文提出Khuzdul,这是一个分布式执行引擎,具有定义明确的抽象,可以与现有的单机图样采矿系统相结合,同时提供效率和可缩放性。关键的新颖之处是可扩展的嵌入式抽象,可以表达模式查算算法,允许细微区分任务列表,并使得低成本的GPM特定数据再利用以降低通信成本。有效的BFS-DFS混合探索为与封闭式内存消耗重叠的通信-计算产生了足够的并行任务。两个可缩放式的GPM系统通过在Khuzdul上移植自动采矿和GapPi来实施。我们的评估表明,基于Khuzdul的系统明显超越了分布式GPM系统,其分布式GPM系统以75.5x(平均为19.0x)表示,其性能与采用复制式图形的最快分布式GPM系统相比接近或甚至更好,其规模与商品集群超过100亿边缘的大规模图表。