基于组件感知并行分支的GPU快速顶点覆盖算法 (Faster Vertex Cover Algorithms on GPUs with Component-Aware Parallel Branching)

Algorithms for finding minimum or bounded vertex covers in graphs use a branch-and-reduce strategy, which involves exploring a highly imbalanced search tree. Prior GPU solutions assign different thread blocks to different sub-trees, while using a shared worklist to balance the load. However, these prior solutions do not scale to large and complex graphs because their unawareness of when the graph splits into components causes them to solve these components redundantly. Moreover, their high memory footprint limits the number of workers that can execute concurrently. We propose a novel GPU solution for vertex cover problems that detects when a graph splits into components and branches on the components independently. Although the need to aggregate the solutions of different components introduces non-tail-recursive branches which interfere with load balancing, we overcome this challenge by delegating the post-processing to the last descendant of each branch. We also reduce the memory footprint by reducing the graph and inducing a subgraph before exploring the search tree. Our solution substantially outperforms the state-of-the-art GPU solution, finishing in seconds when the state-of-the-art solution exceeds 6 hours. To the best of our knowledge, our work is the first to parallelize non-tail-recursive branching patterns on GPUs in a load balanced manner.

翻译：寻找图中最小或有界顶点覆盖的算法采用分支归约策略，该方法需要探索高度不平衡的搜索树。现有的GPU解决方案将不同线程块分配给不同子树，同时使用共享工作列表实现负载均衡。然而，由于这些方案未能识别图何时分裂为独立连通分量，导致对相同分量进行冗余求解，因此无法扩展至大规模复杂图。此外，其高内存占用量限制了可并发执行的工作线程数量。本文提出一种创新的GPU顶点覆盖求解方案，该方案能够检测图分裂为连通分量的时机，并对各分量进行独立分支处理。尽管聚合不同分量解的过程会产生干扰负载平衡的非尾递归分支，我们通过将后处理任务委托给各分支的末代子节点来克服这一挑战。同时，我们通过在探索搜索树前对图进行归约并诱导子图的方式降低内存占用。实验表明，本方案性能显著优于当前最先进的GPU解决方案：当现有方案需耗时6小时以上时，本方案可在数秒内完成求解。据我们所知，本研究首次在GPU上以负载均衡方式实现了非尾递归分支模式的并行化。