Breadth-First Search (BFS) is a building block of many graph analytics and is used across network analysis domains: social, road, transportation, communication, and many more. Over the last two decades, network sizes have continued to grow, and the popularity of BFS has brought with it a need for significantly faster traversals. BFS algorithms have therefore been designed to exploit both shared-memory and shared-nothing systems, including accelerators such as GPUs. GPUs offer extremely fast traversals but can process only smaller graphs because of their limited memory size. In contrast, CPU shared-memory systems can scale to graphs with several billion edges but lack the compute resources needed for fast traversals. This paper introduces ButterFly BFS, a multi-GPU traversal algorithm that allows analyzing significantly larger networks at high rates. ButterFly BFS scales to graphs of a size similar to those processed by shared-memory systems while improving performance by more than 10X compared to CPUs. We evaluate our new algorithm on an NVIDIA DGX-2 server with 16 V100 GPUs and show that it scales as the number of GPUs increases. We achieve roughly $70\%$ of linear speedup, which is non-trivial for BFS. For a scale 29 Kronecker graph with an edge factor of 8, our new algorithm traverses the graph at a rate of over 300 GTEP/s, a high traversal rate for a single server.
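To put the reported figures in perspective, the following back-of-the-envelope sketch works through the arithmetic. It assumes the common Graph500-style convention that a scale $s$ Kronecker graph has $2^s$ vertices and (edge factor) $\times\, 2^s$ edges, and it reads "roughly $70\%$" as a parallel efficiency of 0.7 on 16 GPUs; both readings are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper.

```python
# Rough arithmetic for the figures quoted in the abstract.
# Assumptions (not stated in the source): Graph500-style sizing, where
# vertices = 2**scale and edges = edge_factor * vertices, and "70%" read
# as parallel efficiency relative to ideal linear speedup.

def kronecker_size(scale: int, edge_factor: int) -> tuple[int, int]:
    """Return (vertex count, edge count) under the assumed sizing convention."""
    num_vertices = 2 ** scale
    num_edges = edge_factor * num_vertices
    return num_vertices, num_edges


def traversal_seconds(num_edges: int, gteps: float) -> float:
    """Time to traverse num_edges at a rate given in billions of edges per second."""
    return num_edges / (gteps * 1e9)


def speedup_at_efficiency(num_gpus: int, efficiency: float) -> float:
    """Speedup over one GPU if a fixed fraction of linear scaling is achieved."""
    return num_gpus * efficiency


vertices, edges = kronecker_size(scale=29, edge_factor=8)
seconds = traversal_seconds(edges, gteps=300.0)
speedup = speedup_at_efficiency(num_gpus=16, efficiency=0.70)

print(f"vertices: {vertices:,}")
print(f"edges:    {edges:,}")
print(f"time at 300 GTEP/s: {seconds * 1e3:.1f} ms")
print(f"speedup on 16 GPUs at 70% efficiency: {speedup:.1f}x")
```

Under these assumptions the graph has about 537 million vertices and about 4.3 billion edges, so a single traversal at 300 GTEP/s takes on the order of 14 ms. Because the abstract reports a rate of "over" 300 GTEP/s, this time is an upper bound rather than a measured value.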