Counting k-cliques in a graph is an important problem in graph analysis, with applications such as community detection and graph partitioning. Counting k-cliques is typically done by traversing search trees rooted at each vertex in the graph. Parallel k-clique counting has been well studied on CPUs, and many solutions exist. However, there are no performant solutions for k-clique counting on GPUs. Parallelizing k-clique counting on GPUs comes with numerous challenges, such as the need to extract fine-grained multi-level parallelism, sensitivity to load imbalance, and constrained physical memory capacity. While there has been work on related problems on GPUs, such as finding maximal cliques and generalized subgraph matching, k-clique counting in particular has yet to be explored in depth. In this paper, we present the first parallel GPU solution specialized for the k-clique counting problem. Our solution supports both graph orientation and pivoting for eliminating redundant clique discovery. It incorporates both vertex-centric and edge-centric parallelization schemes for distributing work across thread blocks, and further partitions work within each thread block to extract fine-grained multi-level parallelism while tolerating load imbalance. It also includes optimizations such as binary encoding of induced subgraphs and sub-warp partitioning to limit memory consumption and improve the utilization of execution resources. Our evaluation shows that our best GPU implementation outperforms the best state-of-the-art parallel CPU implementation by a geometric mean of 12.39x, 6.21x, and 18.99x for k=4, 7, and 10, respectively. We also perform a detailed evaluation of the trade-offs involved in the choice of parallelization scheme and of the incremental speedup of each optimization, to provide an in-depth understanding of the optimization space. ...
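To make the search-tree formulation in the abstract concrete, the following is a minimal sequential sketch of k-clique counting with graph orientation, written in Python. It is not the paper's GPU implementation: the function name `count_k_cliques`, the degree-based ordering, and the use of Python integers as bitmasks (a loose stand-in for binary encoding of candidate sets) are all assumptions made for illustration, and pivoting is not shown.

```python
def count_k_cliques(n, edges, k):
    """Count k-cliques in a simple undirected graph on vertices 0..n-1.

    Illustrative CPU sketch only (hypothetical helper, not the paper's code).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return n

    # Deduplicate edges and drop self-loops.
    pairs = {(min(u, v), max(u, v)) for u, v in edges if u != v}

    # Assumed orientation: order vertices by (degree, id) and direct each
    # edge from the earlier vertex to the later one.
    deg = [0] * n
    for u, v in pairs:
        deg[u] += 1
        deg[v] += 1
    order = sorted(range(n), key=lambda v: (deg[v], v))
    pos = [0] * n
    for i, v in enumerate(order):
        pos[v] = i

    # out[v] is a bitmask of v's out-neighbours in the oriented graph.
    out = [0] * n
    for u, v in pairs:
        if pos[u] < pos[v]:
            out[u] |= 1 << v
        else:
            out[v] |= 1 << u

    def extend(cands, remaining):
        # cands: vertices adjacent to every vertex chosen so far and later
        # than all of them in the orientation.
        if remaining == 1:
            return bin(cands).count("1")
        total = 0
        todo = cands
        while todo:
            low = todo & -todo          # isolate the lowest set bit
            todo ^= low
            v = low.bit_length() - 1
            # Intersect with v's out-neighbours to descend one level.
            total += extend(cands & out[v], remaining - 1)
        return total

    # One search tree per root vertex.
    return sum(extend(out[v], k - 1) for v in range(n))


if __name__ == "__main__":
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(count_k_cliques(4, k4, 3))
```

Because every edge points from an earlier to a later vertex, each clique is discovered exactly once, from its earliest vertex. The final `sum` over root vertices is the part that a vertex-centric scheme would distribute across workers; an edge-centric scheme would instead start one search tree per oriented edge.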