Krylov subspace methods are an essential building block in numerical simulation software. Utilizing modern hardware efficiently is a challenging problem in the development of these methods. In this work, we develop Krylov subspace methods for solving linear systems with multiple right-hand sides, tailored to modern hardware in high-performance computing. To this end, we analyze an innovative block Krylov subspace framework that allows the computational and data-transfer costs to be balanced against the hardware. Based on this framework, we formulate commonly used Krylov methods. For the CG and BiCGStab methods, we introduce a novel stabilization approach as an alternative to a deflation strategy. This lets us retain the block size, leading to a simpler and more efficient implementation. In addition, we further optimize the methods for distributed-memory systems and their communication overhead. For the CG method, we analyze approaches that overlap communication with computation and present multiple variants of the CG method, which differ in their communication properties. Furthermore, we present optimizations of the orthogonalization procedure in the GMRes method. Besides introducing a pipelined Gram-Schmidt variant that overlaps the global communication with the computation of inner products, we present a novel orthonormalization method based on the TSQR algorithm, which is communication-optimal and stable. For all optimized methods, we present tests that show their superiority in a distributed setting.
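To make the TSQR building block mentioned above concrete, the following is a minimal serial sketch of the classical one-level TSQR factorization in Python with NumPy. It is not the orthonormalization method proposed in this work, only the standard algorithm that method is said to build on. The function name `tsqr`, the parameter `num_blocks`, and the use of row blocks to stand in for the data owned by separate processes are assumptions made for illustration.

```python
import numpy as np


def tsqr(A, num_blocks=4):
    """One-level TSQR of a tall-skinny matrix A (m x n with m >> n).

    Returns Q (m x n, orthonormal columns) and R (n x n, upper
    triangular) such that A = Q @ R. Illustrative sketch only.
    """
    m, n = A.shape
    # Row blocks stand in for the pieces owned by different processes.
    blocks = np.array_split(A, num_blocks, axis=0)
    if any(b.shape[0] < n for b in blocks):
        raise ValueError("each row block needs at least n rows")

    # Step 1: independent local QR factorizations (no communication).
    local = [np.linalg.qr(b) for b in blocks]  # reduced mode by default
    local_qs = [q for q, _ in local]
    stacked_r = np.vstack([r for _, r in local])  # (num_blocks * n) x n

    # Step 2: a single QR of the stacked R factors. In a distributed
    # code this is the only step that needs global communication.
    q2, r = np.linalg.qr(stacked_r)

    # Step 3: each block multiplies its local Q by its n x n slice of q2.
    q_parts = [q @ q2[i * n:(i + 1) * n, :] for i, q in enumerate(local_qs)]
    return np.vstack(q_parts), r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 5))
    Q, R = tsqr(A, num_blocks=8)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```

In this sketch the only coupling between row blocks is the small stacked matrix of R factors, which is why the approach needs a single reduction instead of one global reduction per column as in classical Gram-Schmidt. A distributed implementation would replace the `np.vstack` in step 2 with a gather or tree reduction over processes; that part is not shown here.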