Support for lower precision computation is becoming more common in accelerator hardware because it offers lower power usage, reduced data movement, and increased computational performance. However, computational science and engineering (CSE) problems in several domains require double precision accuracy. This conflict between hardware trends and application needs calls for mixed precision strategies at the level of linear algebra algorithms if we want to exploit the hardware to its full potential while meeting accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications. We present a study of mixed precision strategies for accelerating this kernel on an NVIDIA V100 GPU with a POWER9 CPU. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies for determining when mixed precision GMRES will be effective and for choosing the parameters of a mixed precision iterative refinement solver to achieve better performance. Our implementation is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of mixed precision approaches and show that further improvements are possible by optimizing low-level kernels.
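To make the idea of mixed precision iterative refinement around GMRES concrete, the following is a minimal sketch in Python with NumPy. It is not the Trilinos/Kokkos Kernels implementation studied in the paper: the function names `gmres_cycle` and `gmres_ir`, the restart length `m`, the tolerance, and the dense test matrix are all assumptions chosen for illustration, and no preconditioner is applied. The sketch computes residuals and solution updates in double precision while each inner GMRES(m) cycle works on single precision data.

```python
import numpy as np


def gmres_cycle(A, b, m):
    """One unpreconditioned GMRES(m) cycle from a zero initial guess.

    Vectors and matrices are stored in the dtype of A (illustrative helper,
    not a library routine).
    """
    dtype = A.dtype
    n = b.shape[0]
    beta = np.linalg.norm(b)
    if beta == 0:
        return np.zeros(n, dtype=dtype)
    Q = np.zeros((n, m + 1), dtype=dtype)  # orthonormal Krylov basis
    H = np.zeros((m + 1, m), dtype=dtype)  # upper Hessenberg matrix
    Q[:, 0] = b / beta
    k = m
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= np.finfo(dtype).eps * beta:  # (near) breakdown
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    # Small least-squares problem: minimize || beta * e1 - H y ||
    e1 = np.zeros(k + 1, dtype=dtype)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[: k + 1, :k], e1, rcond=None)
    return (Q[:, :k] @ y).astype(dtype)


def gmres_ir(A, b, m=20, tol=1e-10, max_refinements=50):
    """Iterative refinement: double precision residuals, single precision GMRES(m)."""
    A64 = np.asarray(A, dtype=np.float64)
    A32 = A64.astype(np.float32)  # low precision copy used by the inner solver
    b64 = np.asarray(b, dtype=np.float64)
    x = np.zeros_like(b64)
    bnorm = np.linalg.norm(b64)
    for step in range(max_refinements):
        r = b64 - A64 @ x  # residual computed in double precision
        rel = np.linalg.norm(r) / bnorm
        if rel <= tol:
            return x, step, rel
        d = gmres_cycle(A32, r.astype(np.float32), m)  # correction in single
        x = x + d.astype(np.float64)  # update accumulated in double
    rel = np.linalg.norm(b64 - A64 @ x) / bnorm
    return x, max_refinements, rel


if __name__ == "__main__":
    # Assumed test problem: a well-conditioned tridiagonal matrix.
    n = 100
    A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = A @ np.ones(n)
    x, steps, rel = gmres_ir(A, b)
    print(f"refinement steps: {steps}, relative residual: {rel:.2e}")
```

In this sketch the restart length `m` doubles as the number of inner iterations between double precision residual updates, which is one of the parameters a mixed precision refinement scheme has to choose. A single precision solve on its own stalls near single precision accuracy; the outer double precision residual is what lets the refinement loop reach a tighter tolerance.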