With the recent realization of exascale performance by Oak Ridge National Laboratory's Frontier supercomputer, reducing communication in kernels like QR factorization has become even more imperative. Low-synchronization Gram-Schmidt methods, first introduced in [K. \'{S}wirydowicz, J. Langou, S. Ananthan, U. Yang, and S. Thomas, Low Synchronization Gram-Schmidt and Generalized Minimum Residual Algorithms, Numer. Lin. Alg. Appl., Vol. 28(2), e2343, 2020], have been shown to improve the scalability of the Arnoldi method in high-performance distributed computing. Block versions of low-synchronization Gram-Schmidt show further potential for speeding up algorithms, as column-batching allows for maximizing cache usage with matrix-matrix operations. In this work, low-synchronization block Gram-Schmidt variants from [E. Carson, K. Lund, M. Rozlo\v{z}n\'{i}k, and S. Thomas, Block Gram-Schmidt algorithms and their stability properties, Lin. Alg. Appl., 638, pp. 150--195, 2022] are transformed into block Arnoldi variants for use in block full orthogonalization methods (BFOM) and block generalized minimal residual methods (BGMRES). An adaptive restarting heuristic is developed to handle instabilities that arise with the increasing condition number of the Krylov basis. The performance, accuracy, and stability of these methods are assessed via a flexible benchmarking tool written in MATLAB. The modularity of the tool additionally permits generalized block inner products, like the global inner product.
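To illustrate how a block Gram-Schmidt kernel becomes a block Arnoldi process, and how a generalized block inner product such as the global inner product plugs in, here is a minimal NumPy sketch. It is not the authors' MATLAB tool and not one of the low-synchronization variants studied in this work: it assumes hypothetical helper names (`classical_ip`, `global_ip`, `classical_normalize`, `global_normalize`, `block_arnoldi`) and uses plain block classical Gram-Schmidt, optionally run twice, with a separate normalization step, so it pays several synchronization points per iteration.

```python
import numpy as np

# Hypothetical block inner products: each maps two n-by-s blocks to an s-by-s matrix.
def classical_ip(X, Y):
    # Classical block inner product: X^T Y.
    return X.T @ Y

def global_ip(X, Y):
    # Global block inner product: trace(X^T Y) times the s-by-s identity.
    return np.trace(X.T @ Y) * np.eye(X.shape[1])

# Matching normalizations: return (Q, R) with W = Q @ R and Q unit-"length" in that inner product.
def classical_normalize(W):
    Q, R = np.linalg.qr(W)  # reduced QR: Q is n-by-s, R is s-by-s
    return Q, R

def global_normalize(W):
    nrm = np.linalg.norm(W, "fro")
    return W / nrm, nrm * np.eye(W.shape[1])

def block_arnoldi(A, B, m, ip=classical_ip, normalize=classical_normalize, passes=2):
    """Block Arnoldi via block classical Gram-Schmidt (a sketch, not a low-sync variant).

    Returns V (n x (m+1)s) and block upper Hessenberg H ((m+1)s x ms)
    satisfying A @ V[:, :m*s] == V @ H up to rounding.
    """
    n, s = B.shape
    V = np.zeros((n, (m + 1) * s))
    H = np.zeros(((m + 1) * s, m * s))
    V0, _ = normalize(B)
    V[:, :s] = V0
    for k in range(m):
        W = A @ V[:, k * s:(k + 1) * s]
        # passes=2 repeats the projection once (reorthogonalization).
        for _ in range(passes):
            # Classical GS: all coefficients come from the same W before any update.
            coeffs = [ip(V[:, j * s:(j + 1) * s], W) for j in range(k + 1)]
            for j, Cj in enumerate(coeffs):
                W = W - V[:, j * s:(j + 1) * s] @ Cj
                H[j * s:(j + 1) * s, k * s:(k + 1) * s] += Cj
        # Normalize the new block; R becomes the subdiagonal block of H.
        Q, R = normalize(W)
        V[:, (k + 1) * s:(k + 2) * s] = Q
        H[(k + 1) * s:(k + 2) * s, k * s:(k + 1) * s] = R
    return V, H
```

Swapping `ip=global_ip, normalize=global_normalize` into the same loop yields a global Arnoldi basis whose blocks are orthonormal in the Frobenius inner product rather than column-wise; this one-argument swap is a small-scale version of the modularity the abstract describes.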