Exploiting dynamic sparse matrices for performance portable linear algebra operations

Sparse matrices and linear algebra are at the heart of scientific simulations. More than 70 sparse matrix storage formats have been developed over the years, targeting a wide range of hardware architectures and matrix types. Each format is designed to exploit the particular strengths of an architecture or the specific sparsity patterns of matrices, and choosing the right format can be crucial for achieving optimal performance. Dynamic sparse matrices, which can change their underlying data structure at runtime to match the computation without introducing prohibitive overheads, have the potential to optimize performance through dynamic format selection. In this paper, we introduce Morpheus, a library that provides an efficient abstraction for dynamic sparse matrices. Dynamic matrices aim to improve the productivity of developers and end-users who do not need to know and understand the implementation specifics of the different formats available, but who still want to take advantage of this optimization opportunity to improve the performance of their applications. We demonstrate that by porting HPCG to use Morpheus, and without further code changes, 1) HPCG can now target heterogeneous environments and 2) the performance of the SpMV kernel improves by up to 2.5x on CPUs and 7x on GPUs, through runtime selection of the best format on each MPI process.
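The idea of a dynamic sparse matrix can be illustrated with a minimal Python sketch. This is not Morpheus's actual interface: the `DynamicMatrix` class, the conversion helpers, and the `select_best_format` function are hypothetical, only the COO and CSR formats are shown, and the selection rule (time a few SpMV calls per format and keep the fastest) is a simple profiling heuristic assumed here for illustration. The point is that callers invoke `spmv` without knowing which format is active, while the container is free to switch formats at runtime.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class COO:
    nrows: int
    ncols: int
    row: List[int]
    col: List[int]
    val: List[float]


@dataclass
class CSR:
    nrows: int
    ncols: int
    indptr: List[int]
    indices: List[int]
    val: List[float]


def coo_to_csr(a: COO) -> CSR:
    # Count non-zeros per row, then prefix-sum the counts into row pointers.
    indptr = [0] * (a.nrows + 1)
    for r in a.row:
        indptr[r + 1] += 1
    for i in range(a.nrows):
        indptr[i + 1] += indptr[i]
    # Scatter each entry into the next free slot of its row (slicing copies).
    nxt = indptr[:-1]
    indices = [0] * len(a.val)
    val = [0.0] * len(a.val)
    for r, c, v in zip(a.row, a.col, a.val):
        k = nxt[r]
        indices[k], val[k] = c, v
        nxt[r] += 1
    return CSR(a.nrows, a.ncols, indptr, indices, val)


def csr_to_coo(a: CSR) -> COO:
    # Expand the row pointers back into one explicit row index per entry.
    row: List[int] = []
    for i in range(a.nrows):
        row.extend([i] * (a.indptr[i + 1] - a.indptr[i]))
    return COO(a.nrows, a.ncols, row, list(a.indices), list(a.val))


def spmv_coo(a: COO, x: List[float]) -> List[float]:
    y = [0.0] * a.nrows
    for r, c, v in zip(a.row, a.col, a.val):
        y[r] += v * x[c]
    return y


def spmv_csr(a: CSR, x: List[float]) -> List[float]:
    y = [0.0] * a.nrows
    for i in range(a.nrows):
        s = 0.0
        for k in range(a.indptr[i], a.indptr[i + 1]):
            s += a.val[k] * x[a.indices[k]]
        y[i] = s
    return y


class DynamicMatrix:
    """Hypothetical container holding exactly one active format at a time."""

    _kernels = {"coo": spmv_coo, "csr": spmv_csr}

    def __init__(self, coo: COO):
        self.fmt = "coo"
        self.data = coo

    def convert(self, fmt: str) -> None:
        if fmt not in self._kernels:
            raise ValueError(f"unsupported format: {fmt}")
        if fmt == self.fmt:
            return
        # With only two formats, any change is COO->CSR or CSR->COO.
        self.data = coo_to_csr(self.data) if fmt == "csr" else csr_to_coo(self.data)
        self.fmt = fmt

    def spmv(self, x: List[float]) -> List[float]:
        # Dispatch to the kernel of whichever format is currently active.
        return self._kernels[self.fmt](self.data, x)


def select_best_format(m: DynamicMatrix, x: List[float], reps: int = 5) -> str:
    # Assumed heuristic: time a few SpMV calls per format and keep the fastest.
    best_fmt, best_t = m.fmt, float("inf")
    for fmt in DynamicMatrix._kernels:
        m.convert(fmt)
        t0 = time.perf_counter()
        for _ in range(reps):
            m.spmv(x)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best_fmt, best_t = fmt, elapsed
    m.convert(best_fmt)
    return best_fmt


if __name__ == "__main__":
    # A = [[4, 0, 1], [0, 2, 0], [3, 0, 5]], entries given in arbitrary order.
    coo = COO(3, 3, [2, 0, 1, 0, 2], [2, 0, 1, 2, 0], [5.0, 4.0, 2.0, 1.0, 3.0])
    m = DynamicMatrix(coo)
    x = [1.0, 2.0, 3.0]
    print(m.spmv(x))  # [7.0, 4.0, 18.0]
    print(select_best_format(m, x), m.spmv(x))
```

In a distributed code such as HPCG, each MPI process could run a selection step like this on its own local block, so different processes may end up with different formats while the solver code calling `spmv` stays unchanged. A real implementation would also weigh the one-off conversion cost against the expected number of SpMV calls.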

翻译：光谱矩阵和线性代数是科学模拟的核心。多年来,已经开发了70多个稀薄的矩阵存储格式,针对各种硬件架构和矩阵类型。每种格式的开发都是为了利用某个结构的特殊优势或特定矩阵的宽度模式,选择正确的格式对于实现最佳性能至关重要。采用动态的稀释矩阵可以改变基本数据结构,使其在不引入令人望而却步的间接费用的情况下与运行时的计算相匹配,这有可能通过动态格式选择优化性能。在本文中,我们引入了Morpheus,这是一个为动态稀释矩阵提供高效抽象的图书馆。采用动态矩阵的目的是提高开发者和终端用户的生产率,他们不需要了解和理解现有不同格式的具体实施方式,但是仍然希望利用优化机会来改进其应用的绩效。我们证明,通过将HPCG移植使用MPG,不用进一步代码修改,1 HPCG现在可以针对不同的环境,2 SpMV 内流式软件的性能通过CPU 和CPI 的每个最佳格式改进到2.5和7x 。