Standard rank-revealing factorizations such as the singular value decomposition (SVD) and the column-pivoted QR factorization are challenging to implement efficiently on a GPU. A major difficulty is that standard algorithms cannot cast most of their operations in terms of the Level-3 BLAS. This paper presents two alternative algorithms for computing a rank-revealing factorization of the form $A = U T V^*$, where $U$ and $V$ are orthogonal and $T$ is triangular. Both algorithms use randomized projection techniques to cast most of the flops in terms of matrix-matrix multiplication, which is exceptionally efficient on the GPU. Numerical experiments illustrate that these algorithms achieve an order-of-magnitude acceleration over finely tuned GPU implementations of the SVD while providing low-rank approximation errors close to those of the SVD.
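To illustrate how randomized projection can move most of the work into matrix-matrix products, the following is a minimal NumPy sketch of one possible randomized factorization of the form $A = U T V^*$. It is not taken from the paper and may differ from both of its algorithms: the function name `randomized_utv`, the power parameter `q`, the re-orthonormalization inside the loop, and the restriction to real matrices with at least as many rows as columns are all assumptions made for illustration, and the code runs on the CPU rather than the GPU.

```python
import numpy as np


def randomized_utv(A, q=1, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's code).

    Returns U, T, V with A ~= U @ T @ V.T, where U has orthonormal
    columns, V is orthogonal, and T is upper triangular.
    Assumes A is real with shape (m, n) and m >= n.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Gaussian test matrix: the randomized projection.
    Y = rng.standard_normal((n, n))

    # q steps of power iteration on A^T A. Each step is two
    # matrix-matrix products, i.e. Level-3 BLAS work.
    for _ in range(q):
        Y = A.T @ (A @ Y)
        # Re-orthonormalize to limit loss of accuracy in floating point.
        Y, _ = np.linalg.qr(Y)

    # Orthonormal basis V whose leading columns approximately align
    # with the dominant right singular vectors of A.
    V, _ = np.linalg.qr(Y)

    # Unpivoted QR of A V gives U and the triangular factor T.
    U, T = np.linalg.qr(A @ V)
    return U, T, V
```

With a call such as `U, T, V = randomized_utv(A, q=2)`, the product `U @ T @ V.T` reproduces `A` up to rounding, since `V` is square and orthogonal. In this sketch, a larger `q` typically improves how well the diagonal of `T` tracks the singular values of `A`, at the cost of additional matrix-matrix products.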