Krylov subspace methods are extensively used in scientific computing to solve large-scale linear systems. However, the performance of these iterative Krylov solvers on modern supercomputers is limited by expensive communication costs. The $s$-step strategy generates a series of $s$ Krylov vectors at a time to avoid communication. Asymptotically, the $s$-step approach can reduce communication latency by a factor of $s$. Unfortunately, due to finite-precision arithmetic, the step size must be kept small for stability. In this work, we tackle the numerical instabilities encountered in the $s$-step GMRES algorithm. By choosing an appropriate polynomial basis and block orthogonalization scheme, we construct a communication-avoiding $s$-step GMRES algorithm that automatically selects the optimal step size to ensure numerical stability. To further maximize communication savings, we introduce scaled Newton polynomials that can increase the step size $s$ to a few hundred for many problems. An initial step-size estimator is also developed to efficiently choose the optimal step size for stability. The guaranteed stability of the proposed algorithm is demonstrated through numerical experiments. In the process, we also evaluate how the choice of polynomial basis and preconditioning affects the stability limit of the algorithm. Finally, we show parallel scalability on more than 14,000 cores in a distributed-memory setting. Perfectly linear scaling is observed in both strong and weak scaling studies, with negligible communication costs.
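To illustrate the basis construction that underlies the $s$-step strategy, the following is a minimal sketch (not the paper's implementation) of generating $s+1$ Krylov basis vectors at a time with a scaled Newton polynomial recurrence, $v_{j+1} = (A - \theta_j I)\, v_j / \sigma_j$. The shifts $\theta_j$ are assumed to be supplied by the caller (typically Ritz values of $A$), and the scaling factors $\sigma_j$ keep the vectors near unit norm, which is what allows the step size to grow without the basis becoming numerically rank-deficient the way a monomial basis $\{v, Av, A^2v, \dots\}$ does.

```python
import numpy as np

def newton_basis(A, v0, s, shifts, scales=None):
    """Generate s+1 scaled Newton-polynomial basis vectors spanning the
    Krylov subspace span{v0, A v0, ..., A^s v0}.

    Recurrence: v_{j+1} = (A - shifts[j] I) v_j / sigma_j.
    If `scales` is None, each new vector is normalized to unit length
    (a simple stand-in for the scaling strategy discussed in the paper).
    """
    n = v0.shape[0]
    V = np.zeros((n, s + 1))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(s):
        # One sparse-matrix-vector product per basis vector; in the
        # s-step setting all s products are done before orthogonalization,
        # so the expensive global communication happens once per s steps.
        w = A @ V[:, j] - shifts[j] * V[:, j]
        sigma = scales[j] if scales is not None else np.linalg.norm(w)
        V[:, j + 1] = w / sigma
    return V
```

With shifts spread across the spectrum of $A$, the resulting basis stays far better conditioned than the monomial basis for the same $s$, which is the property the abstract's larger step sizes rely on.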