Solving differential equations is a critical task in scientific computing. Domain-specific languages (DSLs) have been a promising direction in achieving performance and productivity, but the current state of the art only supports stencil computation, leaving solvers requiring loop-carried dependencies aside. Alternatively, sparse matrices can represent such equation solvers and are more general than existing DSLs, but the performance is sacrificed. This paper points out that sparse matrices can be represented as programs instead of data, having both the generality from the matrix-based representation and the performance from program optimizations. Based on the idea, we propose the Staged Sparse Row (SSR) sparse matrix representation that can efficiently cover applications on structured grids. With SSR representation, users can intuitively define SSR matrices using generator functions and use SSR matrices through a concise object-oriented interface. SSR matrices can then be chained and applied to construct the algorithm, including those with loop-carried dependences. We then apply a set of dedicated optimizations, and ultimately simplify the SSR matrix-based codes into straightforward matrix-free ones, which are efficient and friendly for further analysis. Implementing BT pseudo application in the NAS Parallel Benchmark, with less than $10\%$ lines of code compared with the matrix-free reference FORTRAN implementation, we achieved up to $92.8\%$ performance. Implementing a matrix-free variant for the High-Performance Conjugate Gradient benchmark, we achieve $3.29\times$ performance compared with the reference implementation, while our implementation shares the same algorithm on the same programming abstraction, which is sparse matrices.
翻译:解决差异方程式是科学计算中的关键任务。 特定域语言( DSLs) 是实现绩效和生产率的一个有希望的方向, 但目前科技状态只能支持快速计算, 使解决问题者需要环形依赖, 使解决者需要环形依赖而置之不理。 或者, 分散的矩阵可以代表这种方程式解决问题, 并且比现有的DSLs更一般, 但业绩会牺牲。 本文指出, 稀疏的矩阵可以作为程序而不是数据来代表, 具有基于矩阵的表述和来自方案优化的业绩。 基于这个想法, 我们提议Sprearse Row( SSR) 分散的矩阵代表可以有效地覆盖结构电网中的应用程序。 有了 SS 代表, 用户可以直截地定义使用环形驱动器的配置矩阵, 并通过一个简明的目标导向界面界面界面来使用。 然后, 我们应用一套免费的优化, 最终将基于安保部矩阵的代码简化成直截面的矩阵, 能够高效和友好地覆盖结构网格中的应用程序 。 与我们相对基准执行的运行的运行基准中的BTRRRBRBR2 。