Graph Pattern Mining (GPM) is an important, rapidly evolving, and computation demanding area. GPM computation relies on subgraph enumeration, which consists in extracting subgraphs that match a given property from an input graph. Graphics Processing Units (GPUs) have been an effective platform to accelerate applications in many areas. However, the irregularity of subgraph enumeration makes it challenging for efficient execution on GPU due to typical uncoalesced memory access, divergence, and load imbalance. Unfortunately, these aspects have not been fully addressed in previous work. Thus, this work proposes novel strategies to design and implement subgraph enumeration efficiently on GPU. We support a depth-first search style search (DFS-wide) that maximizes memory performance while providing enough parallelism to be exploited by the GPU, along with a warp-centric design that minimizes execution divergence and improves utilization of the computing capabilities. We also propose a low-cost load balancing layer to avoid idleness and redistribute work among thread warps in a GPU. Our strategies have been deployed in a system named DuMato, which provides a simple programming interface to allow efficient implementation of GPM algorithms. Our evaluation has shown that DuMato is often an order of magnitude faster than state-of-the-art GPM systems and can mine larger subgraphs (up to 12 vertices).
翻译:GPM 计算依靠子集查, 包括从输入图中提取符合特定属性的子集。 图形处理单位( GPU) 是一个有效平台, 在许多领域可以加速应用。 然而, 子集查的不规则性使得在 GPU 上高效执行具有挑战性, 原因是典型的未分割的内存存存存取、 差异和负载不平衡。 不幸的是, 这些方面在以前的工作中没有得到充分处理 。 因此, 这项工作提出了设计和实施 GPU 子计的新战略 。 我们支持深度第一搜索风格搜索( 整个 DFS ), 以最大限度地实现存储性能, 同时提供足够的平行性, 供 GPU 利用, 以及一个以扭曲为中心的设计, 最大限度地减少执行差异, 提高计算能力的利用率。 我们还提议了一个低成本的负载平衡层, 以避免闲置, 并在GPU 中将工作重新分配。 我们的战略被部署在一个名为 DuMato 的系统中, 提供简单的程序界面, 以便高效地执行 GPM 12 算算法 。 我们的子系统往往显示一个更大的系统。