The locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm is a popular approach for computing a few smallest eigenvalues and the corresponding eigenvectors of a large Hermitian positive definite matrix A. In this work, we propose a mixed precision variant of LOBPCG that uses a (sparse) Cholesky factorization of A computed in reduced precision as the preconditioner. To further enhance performance, a mixed precision orthogonalization strategy is proposed. To analyze the impact of reducing precision in the preconditioner on performance, we carry out a rounding error and convergence analysis of PINVIT, a simplified variant of LOBPCG. Our theoretical results predict and our numerical experiments confirm that the impact on convergence remains marginal. In practice, our mixed precision LOBPCG algorithm typically reduces the computation time by a factor of 1.4--2.0 on both CPUs and GPUs.
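To illustrate the idea of a reduced-precision preconditioner, the following is a minimal sketch of a single-vector PINVIT iteration, not the authors' implementation. It assumes a dense real symmetric positive definite matrix (the abstract refers to a sparse Cholesky factorization and a Hermitian matrix), uses float32 as the reduced precision and float64 as the working precision, and the function name `pinvit_mixed` and its parameters are hypothetical.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def pinvit_mixed(A, x0, tol=1e-10, maxit=200):
    """Single-vector PINVIT with a float32 Cholesky preconditioner (illustrative sketch).

    Assumes A is a dense real symmetric positive definite float64 array.
    Returns (Rayleigh quotient, normalized vector, number of updates performed).
    """
    # Factor A once in single precision; this plays the role of the
    # reduced-precision preconditioner T ~ A^{-1}.
    chol32 = cho_factor(A.astype(np.float32))

    x = x0 / np.linalg.norm(x0)
    for it in range(1, maxit + 1):
        Ax = A @ x
        rho = x @ Ax              # Rayleigh quotient, computed in float64
        r = Ax - rho * x          # eigen-residual, computed in float64
        if np.linalg.norm(r) <= tol * abs(rho):
            return rho, x, it - 1
        # Apply the preconditioner in float32, then promote the correction
        # back to float64 before updating the iterate.
        w = cho_solve(chol32, r.astype(np.float32)).astype(np.float64)
        x = x - w
        x /= np.linalg.norm(x)
    return x @ (A @ x), x, maxit
```

In this sketch only the factorization and the triangular solves run in reduced precision; the Rayleigh quotient, the residual, and the stopping test stay in float64, so the attainable accuracy is governed by the working precision while the preconditioner only affects the convergence rate.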