Support for lower-precision computation is becoming more common in accelerator hardware because it offers lower power usage, reduced data movement, and increased computational performance. However, computational science and engineering (CSE) problems in several domains require double-precision accuracy. This conflict between hardware trends and application needs creates a need for multiprecision strategies at the level of linear algebra algorithms, so that applications can exploit the hardware to its full potential while still meeting accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications. We present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies to determine when multiprecision GMRES will be effective and to choose parameters for a multiprecision iterative refinement solver to achieve better performance. We use an implementation that is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and show that further improvements are possible by optimizing low-level kernels.
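To make the iterative refinement idea concrete, the following is a minimal illustrative sketch, not the Trilinos/Kokkos Kernels implementation studied in the paper. It assumes a small dense, unpreconditioned system on the CPU with NumPy. The names `gmres_cycle` and `gmres_ir` and the parameters `m`, `tol`, and `max_refinements` are hypothetical and chosen only for this example. Each inner GMRES(m) cycle runs in single precision, while the residual and the solution update are kept in double precision.

```python
import numpy as np


def gmres_cycle(A, r, m):
    """One GMRES(m) cycle for A d = r from a zero initial guess, in A's precision."""
    dtype = A.dtype
    n = r.shape[0]
    beta = np.linalg.norm(r)
    if beta == 0:
        return np.zeros(n, dtype=dtype)
    V = np.zeros((n, m + 1), dtype=dtype)  # orthonormal Krylov basis
    H = np.zeros((m + 1, m), dtype=dtype)  # upper Hessenberg matrix
    V[:, 0] = r / beta
    k = m
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i, j] = np.dot(V[:, i], w)
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:  # Krylov space is exhausted; stop early
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    # Solve the small least-squares problem min || beta * e1 - H y ||.
    e1 = np.zeros(k + 1, dtype=dtype)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[: k + 1, :k], e1, rcond=None)
    return V[:, :k] @ y


def gmres_ir(A, b, m=20, tol=1e-10, max_refinements=50):
    """Iterative refinement: float64 residuals, float32 GMRES(m) corrections."""
    A64 = np.asarray(A, dtype=np.float64)
    A32 = A64.astype(np.float32)  # low-precision copy used by the inner solver
    b = np.asarray(b, dtype=np.float64)
    x = np.zeros_like(b)
    bnorm = np.linalg.norm(b)
    for it in range(max_refinements):
        r = b - A64 @ x  # residual computed in double precision
        if np.linalg.norm(r) <= tol * bnorm:
            return x, it
        d = gmres_cycle(A32, r.astype(np.float32), m)  # single-precision solve
        x += d.astype(np.float64)  # solution update in double precision
    return x, max_refinements


if __name__ == "__main__":
    n = 50
    # Well-conditioned tridiagonal test matrix (an assumption for this demo).
    A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x, refinements = gmres_ir(A, b)
    print("refinement steps:", refinements)
    print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

In this sketch the restart length `m` plays the role of the inner-solver parameter: a larger `m` does more work in single precision per refinement step, while a smaller `m` recomputes the double-precision residual more often.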