A scalable algorithm for solving compact banded linear systems on distributed memory architectures is presented. The proposed method factorizes the original system into two levels of memory hierarchies, and solves it using parallel cyclic reduction on both distributed and shared memory. This method has a lower communication footprint across distributed memory partitions compared to conventional algorithms involving data transpose or re-partitioning. The algorithm developed in this work is generalized to cyclic compact banded systems with flexible data decompositions. For cyclic compact banded systems, the method is a direct solver with a deterministic operation and communication counts depending on the matrix size, its bandwidth, and the partition strategy. The implementation and runtime configuration details are discussed for performance optimization. Scalability is demonstrated on the linear solver as well as on a representative fluid mechanics application problem, in which the dominant computational cost is solving the cyclic tridiagonal linear systems of compact numerical schemes on a 3D periodic domain. The algorithm is particularly useful for solving the linear systems arising from the application of compact finite difference operators to partial differential equation problems, and it alleviates obstacles to their use on modern high performance computing hardware, where memory and computational power are distributed across nodes with multi-threaded processing units.
翻译:演示了用于解决分布式内存结构上的紧凑带宽线系统的可伸缩算法。 拟议的方法将原系统分为两个级的内存等级, 并使用分布式和共享内存的平行环状缩小来解决这个问题。 与涉及数据转换或再分配的常规算法相比,该方法在分布式内存分区之间的通信足迹较低。 这项工作中开发的算法普遍适用于具有灵活数据分解功能的环状紧凑线状系统。 对于循环式紧凑带系系系系统,该方法是一个直接解决器,根据矩阵大小、其带宽和分区战略,对原有系统进行确定性操作和通信计分数。 为优化性能,讨论了执行和运行时间配置的细节。 在线性求求解器上以及具有代表性的流体力机械应用问题上显示可缩放的可缩放性。 其中,主要计算成本正在解决3D定期域内压缩数字仪的循环三角线性线性系统。 该算法特别有助于解决由于应用压缩定式差异操作者到部分差异方程式问题而形成的线性系统, 并用在高效计算器上, 将存储器用于多功能的存储器。