A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system into two levels of memory hierarchies, and solves it using parallel cyclic reduction on both distributed and shared memory. This method has a lower communication footprint across distributed memory partitions than conventional algorithms that involve data transposes or re-partitioning. The algorithm developed in this work is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with deterministic operation and communication counts that depend on the matrix size, its bandwidth, and the partition strategy. Implementation and runtime configuration details are discussed with a view to performance optimization. Scalability is demonstrated for the linear solver as well as for a representative fluid mechanics application problem, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems arising from the application of compact finite difference operators to a wide range of partial differential equation problems, including, but not limited to, numerical simulations of compressible turbulent flows, aeroacoustics, elastic-plastic wave propagation, and electromagnetics. It alleviates obstacles to their use on modern high-performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units.
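To make the central building block concrete, the following is a minimal single-process sketch of parallel cyclic reduction applied to a cyclic tridiagonal system. It is not the two-level distributed algorithm described above: the function names `pcr_cyclic_tridiagonal` and `cyclic_matvec` are hypothetical, the system size is assumed to be a power of two, and no pivoting is performed, so the matrix is assumed to be diagonally dominant (as is typical for compact finite difference schemes). The example coefficients 1/3, 1, 1/3 are only an illustrative choice.

```python
import math


def pcr_cyclic_tridiagonal(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], indices taken mod n.

    Illustrative sketch only: assumes n is a power of two and that the
    matrix is diagonally dominant (no pivoting is done).
    """
    n = len(b)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of two")
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # Every row is updated independently of the others within a step;
        # this independence is what makes the reduction parallel.
        for i in range(n):
            lo = (i - stride) % n
            hi = (i + stride) % n
            alpha = -a[i] / b[lo]  # eliminates x[i - stride] using row lo
            gamma = -c[i] / b[hi]  # eliminates x[i + stride] using row hi
            na[i] = alpha * a[lo]
            nb[i] = b[i] + alpha * c[lo] + gamma * a[hi]
            nc[i] = gamma * c[hi]
            nd[i] = d[i] + alpha * d[lo] + gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # After log2(n) steps the stride equals n, so the three coefficients of
    # row i all multiply x[i] because of the periodic wrap-around.
    return [d[i] / (a[i] + b[i] + c[i]) for i in range(n)]


def cyclic_matvec(a, b, c, x):
    """Multiply the cyclic tridiagonal matrix by x (used only for checking)."""
    n = len(x)
    return [a[i] * x[i - 1] + b[i] * x[i] + c[i] * x[(i + 1) % n]
            for i in range(n)]


if __name__ == "__main__":
    n = 8
    a = [1.0 / 3.0] * n
    b = [1.0] * n
    c = [1.0 / 3.0] * n
    x_true = [math.sin(2.0 * math.pi * i / n) for i in range(n)]
    d = cyclic_matvec(a, b, c, x_true)
    x = pcr_cyclic_tridiagonal(a, b, c, d)
    print("max error:", max(abs(u - v) for u, v in zip(x, x_true)))
```

Each reduction step touches only rows at distance `stride`, so in a distributed setting the neighbor rows `lo` and `hi` are the only data that would need to be exchanged between partitions. How those exchanges are organized across distributed and shared memory is the subject of the method itself and is not reproduced in this sketch.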