Counting k-cliques in a graph is an important problem in graph analysis with many applications. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. An important optimization is to eliminate search tree branches that discover the same clique redundantly. Eliminating redundant clique discovery is typically done via graph orientation or pivoting. Parallel implementations of both approaches have demonstrated promising performance on CPUs. In this paper, we present our GPU implementations of k-clique counting for both the graph orientation and pivoting approaches. Our implementations explore both vertex-centric and edge-centric parallelization schemes, and replace recursive search tree traversal with iterative traversal based on an explicitly managed shared stack. We also apply various optimizations to reduce memory consumption and improve the utilization of parallel execution resources. Our evaluation shows that our best GPU implementation outperforms the best state-of-the-art parallel CPU implementation, with geometric mean speedups of 12.39x, 6.21x, and 18.99x for k = 4, 7, and 10, respectively. We also evaluate the impact of the choice of parallelization scheme and the incremental speedup of each optimization. Our code will be open-sourced to enable further research on parallelizing k-clique counting on GPUs.
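To make the graph orientation approach and the iterative traversal concrete, the following is a minimal sequential sketch in Python. It is not the GPU implementation described above. The function name `count_k_cliques`, the degree-based vertex ordering, and the use of a Python list as the explicit stack are all assumptions made for illustration. Orienting each edge from the lower-ranked to the higher-ranked endpoint yields an acyclic graph, so each clique is discovered exactly once, starting from its lowest-ranked vertex.

```python
def count_k_cliques(n, edges, k):
    """Count k-cliques in an undirected graph with vertices 0..n-1.

    Illustrative sketch only: degree-based orientation plus an explicit
    stack in place of recursion. Names and ordering are assumptions.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return n

    # Rank vertices by (degree, id); any total order gives a valid orientation.
    deg = [0] * n
    for u, v in edges:
        if u != v:
            deg[u] += 1
            deg[v] += 1
    order = sorted(range(n), key=lambda v: (deg[v], v))
    rank = {v: i for i, v in enumerate(order)}

    # Orient every edge from the lower-ranked to the higher-ranked endpoint.
    out = [set() for _ in range(n)]
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        if rank[u] < rank[v]:
            out[u].add(v)
        else:
            out[v].add(u)

    total = 0
    # Each stack entry is (candidate set, number of vertices chosen so far).
    # Vertex-centric start: one search tree rooted at every vertex.
    stack = [(out[v], 1) for v in range(n)]
    while stack:
        candidates, depth = stack.pop()
        if depth == k - 1:
            # Every remaining candidate completes one k-clique.
            total += len(candidates)
            continue
        for w in candidates:
            # Keep only candidates that are also out-neighbors of w.
            stack.append((candidates & out[w], depth + 1))
    return total
```

For example, `count_k_cliques(5, [(u, v) for u in range(5) for v in range(u + 1, 5)], 3)` counts the triangles of the complete graph on five vertices. An edge-centric variant would instead seed the stack with one entry per oriented edge, using the intersection of the two endpoints' out-neighbor sets at depth 2.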