Differentiation-based Vectorization for Sparse Kernels

Sparse computations frequently appear in scientific simulations, and the performance of these simulations relies heavily on the optimization of the sparse codes. The compact data structures and irregular computation patterns of sparse matrix computations make these codes challenging to vectorize. Existing approaches primarily vectorize the regular regions of computation in sparse codes. They also reorganize data and computations, at a cost, to increase the number of regular regions. In this work, we propose a novel polyhedral model, called partially strided codelets (PSC), that enables the vectorization of computation regions with irregular data access patterns. PSCs also improve data locality in sparse computations. Our DDF inspector-executor framework efficiently mines the memory accesses of the sparse computation, using an access-function differentiation approach, to find PSCs. It generates vectorized code both for the sparse matrix-vector multiplication kernel (SpMV), a kernel with parallel outer loops, and for kernels with loop-carried dependences, specifically the sparse triangular solver (SpTRSV). We demonstrate the performance of the DDF-generated code on a set of 60 large and small matrices (0.05M to 130M nonzeros). DDF outperforms the highly specialized MKL library with average speedups of 1.93× and 4.5× for SpMV and SpTRSV, respectively. For the same matrices, DDF outperforms the state-of-the-art inspector-executor framework Sympiler [1] by up to 11× for SpTRSV and the work of Augustine et al. [2] by up to 12× for SpMV.
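The abstract does not spell out how DDF differentiates access functions. As a rough illustration only (our own sketch, not DDF's inspector or code generator), the Python below takes the CSR SpMV access `x[col_idx[j]]`, treats the first-order difference `col_idx[j+1] - col_idx[j]` as the discrete derivative of the access function, and groups each row's nonzeros into runs with a constant difference. Inside such a run, the accesses to `vals` are unit-stride and the accesses to `x` have a fixed stride, so they could become vector loads; everything else falls back to a gather. All names (`strided_runs`, `run_dot`, `spmv_codelets`, `sptrsv_lower_codelets`) and the `min_len` threshold are hypothetical, and the SpTRSV variant assumes each row of the lower-triangular matrix stores its diagonal last. This captures only a per-row flavor of the idea; the PSC model itself is polyhedral and may group accesses differently.

```python
"""Sketch: find partially strided regions in CSR kernels by differencing the
column-index access function (illustrative only, not DDF's inspector)."""
from typing import List, Optional, Tuple

Run = Tuple[int, int, Optional[int]]  # (first nonzero j, length, stride or None)


def strided_runs(col_idx: List[int], start: int, end: int, min_len: int = 3) -> List[Run]:
    """Inspector: split col_idx[start:end] into maximal runs where the difference
    col_idx[j+1] - col_idx[j] (a discrete derivative of j -> col_idx[j]) is
    constant. Runs shorter than min_len become irregular singletons (stride None)."""
    assert min_len >= 2
    runs: List[Run] = []
    j = start
    while j < end:
        length, stride = 1, 0
        if j + 1 < end:
            stride = col_idx[j + 1] - col_idx[j]
            k = j + 1
            while k + 1 < end and col_idx[k + 1] - col_idx[k] == stride:
                k += 1
            length = k - j + 1
        if length >= min_len:
            runs.append((j, length, stride))
            j += length
        else:
            runs.append((j, 1, None))  # irregular access: needs a gather
            j += 1
    return runs


def run_dot(col_idx: List[int], vals: List[float], x: List[float], run: Run) -> float:
    """Executor for one run: vals is always unit-stride; x is strided or gathered."""
    j, length, stride = run
    if stride is None:
        return vals[j] * x[col_idx[j]]
    c0 = col_idx[j]
    # In C this would be a contiguous load of vals and a strided load of x.
    return sum(vals[j + t] * x[c0 + t * stride] for t in range(length))


def spmv_codelets(row_ptr, col_idx, vals, x, min_len=3):
    """y = A @ x for a CSR matrix; the outer row loop has no dependences."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for run in strided_runs(col_idx, row_ptr[i], row_ptr[i + 1], min_len):
            y[i] += run_dot(col_idx, vals, x, run)
    return y


def sptrsv_lower_codelets(row_ptr, col_idx, vals, b, min_len=3):
    """Solve L x = b for lower-triangular CSR L (diagonal assumed stored last in
    each row). Row i reads x[0..i-1], so the row loop carries a dependence."""
    n = len(row_ptr) - 1
    x = [0.0] * n
    for i in range(n):
        diag = row_ptr[i + 1] - 1
        acc = b[i]
        for run in strided_runs(col_idx, row_ptr[i], diag, min_len):
            acc -= run_dot(col_idx, vals, x, run)
        x[i] = acc / vals[diag]
    return x
```

For example, for a row whose column indices are 0, 5, 6, 7, 8, this inspector reports one irregular access followed by a unit-stride run of length 4, while a row with indices 1, 3, 5 becomes a single run with stride 2.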
