Two-Stage Gauss-Seidel Preconditioners and Smoothers for Krylov Solvers on a GPU Cluster

Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with that of two-stage variants, which replace the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When both perform the same number of flops for sparse matrix-vector multiplication (SpMV), e.g. three JR sweeps compared to two GS sweeps with one inner JR sweep, the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g. elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve convergence as the number of inner iterations increases. Finally, to study the performance of the two-stage smoother and preconditioner on a practical problem, these were applied to incompressible fluid flow simulations on GPUs.
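As a reading aid, the relaxations compared above can be written with the splitting A = L + D + U into strictly lower, diagonal, and strictly upper parts. A forward GS sweep computes x := x + (D + L)^{-1}(b - Ax), which requires a sequential sparse triangular solve. GS2 instead approximates z = (D + L)^{-1} r, with r = b - Ax, by a fixed number of inner JR sweeps, z_0 = D^{-1} r and z_{j+1} = D^{-1}(r - L z_j), each of which updates all rows independently. JR itself is x := x + omega D^{-1}(b - Ax) with damping factor omega. Because one inner sweep costs an SpMV with L only, roughly half the off-diagonal work of an SpMV with A when the sparsity pattern is symmetric, two GS2 sweeps with one inner sweep cost about as much as three JR sweeps. The pure-Python sketch below is a serial illustration under these assumptions, not the GPU implementation evaluated in the study; the CSR helper, the function names (`jr_sweep`, `gs2_sweep`, `gs_sweep`), and the 1D Poisson test matrix are hypothetical, and the Krylov solver is omitted.

```python
from typing import List, Tuple

# CSR storage (row_ptr, col_idx, values); a hypothetical minimal format.
CSR = Tuple[List[int], List[int], List[float]]


def dense_to_csr(a: List[List[float]]) -> CSR:
    """Convert a small dense matrix to CSR, dropping zero entries."""
    row_ptr, col_idx, vals = [0], [], []
    for row in a:
        for j, v in enumerate(row):
            if v != 0.0:
                col_idx.append(j)
                vals.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def diagonal(A: CSR) -> List[float]:
    """Extract the diagonal D of A (assumed to have no zero entries)."""
    row_ptr, col_idx, vals = A
    d = [0.0] * (len(row_ptr) - 1)
    for i in range(len(d)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                d[i] = vals[k]
    return d


def residual(A: CSR, x: List[float], b: List[float]) -> List[float]:
    """r = b - A x, i.e. one SpMV with the full matrix A."""
    row_ptr, col_idx, vals = A
    r = []
    for i in range(len(b)):
        s = b[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s -= vals[k] * x[col_idx[k]]
        r.append(s)
    return r


def jr_sweep(A: CSR, d: List[float], x: List[float], b: List[float],
             omega: float = 1.0) -> List[float]:
    """Damped Jacobi-Richardson: x + omega * D^{-1} (b - A x)."""
    r = residual(A, x, b)
    return [xi + omega * ri / di for xi, ri, di in zip(x, r, d)]


def gs2_sweep(A: CSR, d: List[float], x: List[float], b: List[float],
              inner: int = 1) -> List[float]:
    """Two-stage GS: x + z, where z approximates (D + L)^{-1} r by
    `inner` Jacobi sweeps on the lower-triangular system (D + L) z = r."""
    row_ptr, col_idx, vals = A
    r = residual(A, x, b)
    z = [ri / di for ri, di in zip(r, d)]  # z_0 = D^{-1} r
    for _ in range(inner):
        # z_{j+1} = D^{-1} (r - L z_j); rows are independent (parallel on a GPU).
        z_next = []
        for i in range(len(r)):
            s = r[i]
            for k in range(row_ptr[i], row_ptr[i + 1]):
                if col_idx[k] < i:  # strictly lower part L only
                    s -= vals[k] * z[col_idx[k]]
            z_next.append(s / d[i])
        z = z_next
    return [xi + zi for xi, zi in zip(x, z)]


def gs_sweep(A: CSR, d: List[float], x: List[float],
             b: List[float]) -> List[float]:
    """Traditional forward GS: an exact, inherently sequential triangular solve."""
    row_ptr, col_idx, vals = A
    x = list(x)
    for i in range(len(b)):
        s = b[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] != i:
                s -= vals[k] * x[col_idx[k]]  # x[j] for j < i is already updated
        x[i] = s / d[i]
    return x


if __name__ == "__main__":
    # Hypothetical test problem: 1D Poisson matrix tridiag(-1, 2, -1), n = 8.
    n = 8
    A = dense_to_csr([[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
                       for j in range(n)] for i in range(n)])
    d, b = diagonal(A), [1.0] * n
    x_jr, x_gs2 = [0.0] * n, [0.0] * n
    for _ in range(3):  # three JR sweeps ...
        x_jr = jr_sweep(A, d, x_jr, b)
    for _ in range(2):  # ... versus two GS2 sweeps with one inner sweep
        x_gs2 = gs2_sweep(A, d, x_gs2, b, inner=1)
    for name, x in (("3 x JR", x_jr), ("2 x GS2(1)", x_gs2)):
        print(name, "residual norm:", sum(t * t for t in residual(A, x, b)) ** 0.5)
```

With `inner=0`, `gs2_sweep` reduces to an undamped JR sweep, and because D^{-1}L is nilpotent, `inner=n-1` reproduces the exact forward GS sweep; the number of inner sweeps therefore trades the accuracy of the approximate triangular solve against extra SpMVs with L. The printed residual norms only illustrate the flop-matched comparison on this toy problem and are not results from the study.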

翻译：Gaus- Seidel (GS) 放松常被用作 Krylov 解析器的先决条件, 或用于 Algebraic Multigrid (AMG) 的平滑。然而, 必要的稀薄三角解决方案很难在像图形处理器( GPUs) 这样的许多核心结构中平行。在本研究中, 基于三角解决方案的传统GS放松的性能与两阶段变量相比, 以固定数量的内部 Jacobi- Richardson (JR) 版本取代直接三角解决方案。当少量内部循环足以维持 Krylov 的趋同率时, 两阶段的GS( GS 2) 通常会超过多个核心结构的传统算法。我们还将 GS2 与 JR 比较。当它们执行相同数量的 Spmmlvad( 例如, 三个 JR 扫荡器) 和两个GS 扫荡器的问题相比, 以一个内部 JR 扫荡器( JS2 ) 的平流和 Krylov 预设的流可能比 JR 的平流要快一些。。最后, 和在两个阶段中找到了的变压中找到了。。。。对于某些的的的,, 的变压的的变动的变动的变动的变变变变变变变变变的。