ILU smoothers are effective in the algebraic multigrid (AMG) V-cycle for reducing high-frequency components of the residual error. However, direct triangular solves are comparatively slow on GPUs. Previous work by Chow and Patel (2015) and Antz et al. (2015) demonstrated the advantages of Jacobi relaxation as an alternative. Depending on the threshold and fill-level parameters chosen, the factors can be highly non-normal, in which case Jacobi is unlikely to converge in a small number of iterations. The Ruiz algorithm applies row or row/column scaling to U in order to reduce the departure from normality. The inherently sequential triangular solve is replaced with a Richardson iteration. There are several advantages beyond the lower compute time. Because the scaling is applied directly to the factor, it is performed locally for a diagonal block of the global matrix. An ILUT Schur complement smoother maintains a constant GMRES iteration count as the number of MPI ranks increases, which improves parallel strong scaling. The new algorithms are included in hypre and achieve improved time to solution for several Exascale applications, including the Nalu-Wind and PeleLM pressure solvers. For large problem sizes, GMRES+AMG with iterative triangular solves executes at least five times faster than with direct triangular solves on massively parallel GPUs.
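The idea of replacing a sequential triangular solve with Jacobi sweeps, after Ruiz row/column scaling of U, can be illustrated with a small dense sketch. This is not hypre's implementation: the function names (`ruiz_scale`, `jacobi_triangular`, `scaled_jacobi_solve`), the dense list-of-lists storage, and the fixed counts of Ruiz iterations and Jacobi sweeps are all assumptions made for illustration. Each Jacobi sweep, x ← x + D⁻¹(b − Ux), updates every entry independently, which is what makes it GPU-friendly. Because I − D⁻¹U is strictly upper triangular (nilpotent), n sweeps reproduce the exact solution; in practice only a few sweeps are used, and non-normality of U governs how quickly the error decreases in those few sweeps.

```python
import math

def ruiz_scale(U, iters=5):
    """Ruiz equilibration (hypothetical helper): repeatedly divide each row and
    column by the square root of its max-abs entry. Returns (A, r, c) with
    A = diag(r) @ U @ diag(c); diagonal scaling keeps U upper triangular."""
    n = len(U)
    A = [row[:] for row in U]
    r, c = [1.0] * n, [1.0] * n
    for _ in range(iters):
        row_max = [max(abs(v) for v in A[i]) for i in range(n)]
        col_max = [max(abs(A[i][j]) for i in range(n)) for j in range(n)]
        dr = [1.0 / math.sqrt(m) if m > 0 else 1.0 for m in row_max]
        dc = [1.0 / math.sqrt(m) if m > 0 else 1.0 for m in col_max]
        for i in range(n):
            for j in range(n):
                A[i][j] *= dr[i] * dc[j]
        r = [r[i] * dr[i] for i in range(n)]
        c = [c[j] * dc[j] for j in range(n)]
    return A, r, c

def jacobi_triangular(U, b, sweeps):
    """Approximate U x = b with Jacobi sweeps x <- x + D^{-1} (b - U x).
    Every component uses the previous iterate, so updates are independent."""
    n = len(U)
    x = [0.0] * n
    for _ in range(sweeps):
        res = [b[i] - sum(U[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + res[i] / U[i][i] for i in range(n)]
    return x

def scaled_jacobi_solve(U, b, sweeps, ruiz_iters=5):
    """Solve (R U C) y = R b with Jacobi, then recover x = C y."""
    A, r, c = ruiz_scale(U, ruiz_iters)
    y = jacobi_triangular(A, [r[i] * b[i] for i in range(len(b))], sweeps)
    return [c[j] * y[j] for j in range(len(y))]

# Usage: a 3x3 upper-triangular factor with exact solution [1, 2, 3].
U = [[2.0, 1.0, 0.0], [0.0, 4.0, 2.0], [0.0, 0.0, 5.0]]
b = [4.0, 14.0, 15.0]
print(scaled_jacobi_solve(U, b, sweeps=3))
```

In a smoother the sweep count would be kept well below n, trading an inexact triangular solve for parallelism; the scaling step aims to make those few sweeps more effective.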