ILU smoothers can be effective in the algebraic multigrid (AMG) $V$-cycle. However, direct triangular solves are comparatively slow on GPUs because of their limited parallelism. Previous work by Chow and Patel \cite{ChowPatel2015} and Anzt et al. \cite{Anzt2015} proposed solving these triangular systems iteratively. Unfortunately, when the Jacobi iteration is applied to highly non-normal upper or lower triangular factors, the iterates can grow rapidly and the iteration effectively diverges. We introduce an ILU smoother for classical Ruge-St\"uben C-AMG that applies row and/or column scaling to mitigate the non-normality of the upper triangular factor. This approach enables the use of Jacobi iteration in place of the inherently sequential triangular solve. Because the scaling is applied to the upper triangular factor, it can be computed locally for a diagonal block of the global matrix. An ILUT Schur complement smoother, which solves the Schur complement system along subdomain (MPI rank) boundaries using GMRES, maintains a constant iteration count and improves strong scaling. Numerical results and parallel performance are presented for the Nalu-Wind and PeleLM \cite{PeleLM} pressure solvers. For large problem sizes on the NREL Eagle supercomputer, GMRES$+$AMG with iterative triangular solves runs at least five times faster than with direct triangular solves.
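The following Python/NumPy sketch illustrates, for a small dense matrix, how Jacobi sweeps can replace a triangular solve and why non-normality matters. Splitting an upper triangular factor as $U = D + N$, each sweep computes $x_{k+1} = D^{-1}(b - N x_k)$, so every component is updated independently. The error satisfies $e_{k+1} = G e_k$ with $G = -D^{-1}N$. Because $G$ is strictly upper triangular, $G^n = 0$. However, for a highly non-normal factor, $\|G^k\|$ can grow by orders of magnitude for small $k$, and a smoother applies only a few sweeps. The function names and the bidiagonal test factor are hypothetical illustrations. The sketch does not model the row/column scaling, sparse storage, or GPU kernels of the smoother itself.

```python
import numpy as np


def jacobi_upper_solve(U, b, sweeps, x0=None):
    """Approximate the solution of U x = b, U upper triangular, by Jacobi sweeps.

    With U = D + N (D diagonal, N strictly upper triangular), each sweep computes
    x_{k+1} = D^{-1} (b - N x_k). All components are updated independently,
    so a sweep is a matrix-vector product plus a diagonal scaling.
    """
    d = np.diag(U)
    N = np.triu(U, k=1)
    x = np.zeros(len(b)) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        x = (b - N @ x) / d
    return x


def jacobi_iteration_matrix(U):
    """Error propagation matrix G = -D^{-1} N, so that e_{k+1} = G e_k.

    G is strictly upper triangular, hence nilpotent: G^n = 0.
    """
    d = np.diag(U)
    return -np.triu(U, k=1) / d[:, None]


def max_transient_growth(U):
    """Return max_{1 <= k < n} ||G^k||_2, the worst error amplification
    before the iteration terminates (in exact arithmetic) at k = n."""
    G = jacobi_iteration_matrix(U)
    n = U.shape[0]
    P = np.eye(n)
    growth = 0.0
    for _ in range(1, n):
        P = G @ P
        growth = max(growth, np.linalg.norm(P, 2))
    return growth


# Hypothetical test factor: unit diagonal, superdiagonal -3 (highly non-normal).
n = 10
U = np.eye(n) - 3.0 * np.eye(n, k=1)
x_true = np.ones(n)
b = U @ x_true

x3 = jacobi_upper_solve(U, b, sweeps=3)   # a few sweeps, as in a smoother
x10 = jacobi_upper_solve(U, b, sweeps=n)  # n sweeps reproduce the exact solve

print(np.max(np.abs(x3 - x_true)))  # 27.0: the initial max-norm error 1 grew by 3**3
print(np.allclose(x10, x_true))     # True
print(max_transient_growth(U))      # approximately 3**9 = 19683
```

Replacing the superdiagonal entry $-3$ by $-0.1$ makes `max_transient_growth` return a value below one, so a few sweeps already reduce the error. The contrast between the two factors shows why the departure from normality of the triangular factor governs whether Jacobi-based smoothing is usable.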