Low precision arithmetic, in particular half precision (16-bit) floating point arithmetic, is now available in commercial hardware. Using lower precision can offer significant savings in computation and communication costs, with proportional savings in energy. Motivated by this, a number of new iterative refinement schemes for solving linear systems $Ax=b$ have recently emerged, based on both standard LU factorization and GMRES solvers, that exploit multiple precisions. Each algorithm and each combination of precisions leads to different condition number-based constraints for convergence of the backward and forward errors, and each has a different performance cost. Because users may not know the condition number of their matrix a priori, it can be difficult to select the optimal variant for a given problem. In this work, we develop a three-stage mixed precision iterative refinement solver that combines existing mixed precision approaches to balance performance and accuracy and to improve usability. For a given combination of precisions, the algorithm begins with the least expensive approach, and convergence is monitored via inexpensive computations with quantities produced during the iteration. If slow convergence or divergence is detected by particular stopping criteria, the algorithm switches to more expensive but more reliable GMRES-based refinement approaches. After presenting the algorithm and its details, we perform extensive numerical experiments on a variety of random dense problems and problems from real applications. Our experiments demonstrate that the theoretical constraints derived in the literature are often overly strict in practice, further motivating the need for a multistage approach.
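The first stage and its switching logic can be illustrated with a minimal sketch; it is not the solver developed in this work. It assumes half precision can be simulated with Python's `struct` format `"e"`, computes residuals and triangular solves in double precision, and monitors convergence using only the norm of the correction produced at each step. The function names (`fl16`, `lu_half`, `lu_solve`, `refine`), the thresholds `tol` and `rate_limit`, and the specific tests on the correction norms are illustrative assumptions, not the stopping criteria of the paper. The GMRES-based stages are not implemented; a `"switch"` status only signals that a caller would escalate to them.

```python
import struct


def fl16(x):
    """Round a Python float to IEEE half precision (simulated via struct)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def lu_half(A):
    """LU factorization with partial pivoting, every operation rounded to half."""
    n = len(A)
    LU = [[fl16(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest pivot in column k and swap rows.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = fl16(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = fl16(LU[i][j] - fl16(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve(LU, perm, r):
    """Solve with the stored factors; the substitutions run in double precision."""
    n = len(r)
    y = [r[p] for p in perm]
    for i in range(n):  # forward substitution with unit lower triangular L
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with U
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y


def refine(A, b, tol=1e-12, max_iter=20, rate_limit=0.5):
    """LU-based iterative refinement with cheap convergence monitoring.

    Returns (x, status, iterations). status is "converged", or "switch" when
    the corrections shrink too slowly or the iteration limit is reached
    (assumed criteria, chosen for illustration only).
    """
    n = len(b)
    LU, perm = lu_half(A)
    x = lu_solve(LU, perm, b)
    prev = None
    for it in range(1, max_iter + 1):
        # Residual in double precision, correction from the half precision factors.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve(LU, perm, r)
        x = [xi + di for xi, di in zip(x, d)]
        dnorm = max(abs(v) for v in d)
        xnorm = max(abs(v) for v in x)
        if dnorm <= tol * xnorm:
            return x, "converged", it
        if prev is not None and dnorm > rate_limit * prev:
            return x, "switch", it  # slow convergence or divergence detected
        prev = dnorm
    return x, "switch", max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]  # exact solution is [1, 2, 3]
    x, status, iters = refine(A, b)
    print(status, iters, x)
```

The monitoring reuses the correction vector that the refinement step already produces, so it adds only two norm evaluations per iteration. In a fuller implementation the `"switch"` branch would reuse the same low precision factors as a preconditioner for a GMRES-based refinement stage instead of returning.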