The multiplication of a sparse matrix with a dense vector (SpMV) is a key component in many numerical schemes, and its performance is known to be severely limited by main memory access. Several numerical schemes require the multiplication of a sparse matrix polynomial with a dense vector, which is typically implemented as a sequence of SpMVs. This results in low performance and ignores the potential to increase the arithmetic intensity by reusing the matrix data from cache. In this work, we use the recursive algebraic coloring engine (RACE) to enable blocking of sparse matrix data across the polynomial computations. We form levels in the graph representing the sparse matrix using a breadth-first search. Locality relations of these levels are then used to improve spatial and temporal locality when accessing the matrix data and to implement an efficient multithreaded parallelization. Our approach is independent of the matrix structure and avoids shortcomings of existing "blocking" strategies in terms of hardware efficiency and parallelization overhead. We quantify the quality of our implementation using performance modelling and demonstrate speedups of up to 3$\times$ and 5$\times$ compared to an optimal SpMV-based baseline on a single multicore chip of recent Intel and AMD architectures. As a potential application, we demonstrate the benefit of our implementation for a Chebyshev time propagation scheme, representing the class of polynomial approximations to exponential integrators. Further numerical schemes which may benefit from our developments include $s$-step Krylov solvers and power clustering algorithms.
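To make the two ingredients named above concrete, the following minimal Python sketch shows (a) the SpMV-based baseline, in which $A^k x$ is computed by back-to-back SpMVs so that the matrix is streamed once per power, and (b) the breadth-first-search levels of the matrix graph. It is not the RACE implementation: the function names, the CSR layout as plain lists, and the tridiagonal test matrix are all assumptions chosen for illustration, and the level-based blocking itself is not implemented here.

```python
from collections import deque


def spmv(row_ptr, col_idx, vals, x):
    """Return y = A x for a matrix A stored in CSR format (plain lists)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[col_idx[k]]
        y[i] = s
    return y


def matrix_powers(row_ptr, col_idx, vals, x, p):
    """Baseline: [x, A x, ..., A^p x] as a sequence of p SpMVs.

    Every SpMV reads the whole matrix again, which is the memory
    traffic that cache blocking across powers tries to avoid.
    """
    out = [list(x)]
    for _ in range(p):
        out.append(spmv(row_ptr, col_idx, vals, out[-1]))
    return out


def bfs_levels(row_ptr, col_idx, root=0):
    """Assign each vertex of the matrix graph its BFS distance from root.

    Vertices not reachable from root keep the level -1.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    level[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for k in range(row_ptr[u], row_ptr[u + 1]):
            v = col_idx[k]
            if level[v] < 0:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def tridiag_csr(n):
    """Hypothetical test matrix: 1D Laplacian stencil (-1, 2, -1) in CSR."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


if __name__ == "__main__":
    rp, ci, va = tridiag_csr(6)
    print(bfs_levels(rp, ci, root=0))
    print(matrix_powers(rp, ci, va, [1.0] * 6, 2))
```

For a matrix with symmetric sparsity pattern, a row in level $i$ only references columns in levels $i-1$, $i$, and $i+1$. This is the kind of locality relation that allows a higher power to be computed on one group of levels while the matrix rows of neighbouring levels are still in cache.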