A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication

Symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernels and graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult because conflicting updates on the result vector must be managed. Coloring approaches can solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, which hampers scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation that eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring techniques and Intel MKL on two modern multicore processors. RACE outperforms all other approaches substantially and behaves in accordance with the Roofline model. Outliers are discussed and analyzed in detail. While we focus on SymmSpMV in this paper, our algorithm and software are applicable to any sparse matrix operation with data dependencies that can be resolved by distance-k coloring.
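
To make the conflicting updates concrete, the following is a minimal serial sketch of SymmSpMV. It is not the RACE implementation. It assumes that only the upper triangle of the matrix (including the diagonal) is stored in CSR format; the names `symm_spmv`, `write_set`, `row_ptr`, `col_idx`, and `vals` are hypothetical and chosen for illustration only.

```python
def symm_spmv(n, row_ptr, col_idx, vals, x):
    """Compute b = A @ x for a symmetric A stored as its upper triangle in CSR."""
    b = [0.0] * n
    for row in range(n):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            col = col_idx[k]
            # Contribution of the stored entry A[row, col].
            b[row] += vals[k] * x[col]
            if col != row:
                # Contribution of the mirrored entry A[col, row]. This write
                # targets b[col], which other rows also update, so rows
                # cannot simply be distributed across threads.
                b[col] += vals[k] * x[row]
    return b


def write_set(row, row_ptr, col_idx):
    """Entries of b that processing `row` writes to (illustrative helper)."""
    return {row} | set(col_idx[row_ptr[row]:row_ptr[row + 1]])


# Upper triangle of A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]] in CSR form.
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

b = symm_spmv(3, row_ptr, col_idx, vals, x)
```

In this small example, rows 0 and 1 both write to `b[1]`, so running them concurrently without synchronization would race, whereas rows 0 and 2 have disjoint write sets. A coloring scheme groups rows so that rows processed at the same time never share a write target.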
