We present and release as open source a sparse linear solver that efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large, sparse linear systems on modern parallel computers composed of hybrid nodes hosting NVIDIA Graphics Processing Unit (GPU) accelerators. The work extends our previous efforts on exploiting a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. The design of the hybrid implementation follows best practices for minimizing data communication overhead when multiple GPUs are employed, while preserving the efficiency of the single-GPU kernels. We discuss strong and weak scalability results for the new version of the library on well-known benchmark test cases. Comparisons with the NVIDIA AmgX solution show an improvement of up to 2.0x in the solve phase.