Sparse, irregular graphs show up in various applications like linear algebra, machine learning, engineering simulations, robotic control, etc. These graphs have a high degree of parallelism, but their execution on parallel threads of modern platforms remains challenging due to the irregular data dependencies. The execution performance can be improved by efficiently partitioning the graphs such that the communication and thread synchronization overheads are minimized without hurting the utilization of the threads. To achieve this, this paper proposes GRAPHOPT, a tool that models the graph parallelization as a constrained optimization problem and uses the open Google OR-Tools solver to find good partitions. Several scalability techniques are developed to handle large real-world graphs with millions of nodes and edges. Extensive experiments are performed on the graphs of sparse matrix triangular solves (linear algebra) and sum-product networks (machine learning), respectively, showing a mean speedup of 2.0X and 1.8X over previous state-of-the-art libraries, demonstrating the effectiveness of the constrained-optimization-based graph parallelization.
翻译:为了达到这一目的,本文件提议GRAPHOPT, 这个工具将图形平行化模拟成一个有限的优化问题,并使用开放的Google OR-Tools解析器找到良好的分区。 开发了几种可缩放技术来处理具有数百万节点和边缘的大型真实世界图形。 分别在稀薄矩阵三角分辨率(线代数)和总产品网络(机械学习)的图形上进行了广泛的实验, 显示了基于节点的图形平行化的效果。