Randomized numerical linear algebra - RandNLA, for short - concerns the use of randomization as a resource to develop improved algorithms for large-scale linear algebra computations. The origins of contemporary RandNLA lay in theoretical computer science, where it blossomed from a simple idea: randomization provides an avenue for computing approximate solutions to linear algebra problems more efficiently than deterministic algorithms. This idea proved fruitful in the development of scalable algorithms for machine learning and statistical data analysis applications. However, RandNLA's true potential only came into focus upon integration with the fields of numerical analysis and "classical" numerical linear algebra. Through the efforts of many individuals, randomized algorithms have been developed that provide full control over the accuracy of their solutions and that can be every bit as reliable as algorithms that might be found in libraries such as LAPACK. Recent years have even seen the incorporation of certain RandNLA methods into MATLAB, the NAG Library, NVIDIA's cuSOLVER, and SciKit-Learn. For all its success, we believe that RandNLA has yet to realize its full potential. In particular, we believe the scientific community stands to benefit significantly from suitably defined "RandBLAS" and "RandLAPACK" libraries, to serve as standards conceptually analogous to BLAS and LAPACK. This 200-page monograph represents a step toward defining such standards. In it, we cover topics spanning basic sketching, least squares and optimization, low-rank approximation, full matrix decompositions, leverage score sampling, and sketching data with tensor product structures (among others). Much of the provided pseudo-code has been tested via publicly available MATLAB and Python implementations.
翻译:随机数值线性代数(RandNLA)关注利用随机化作为一种资源,开发用于大规模线性代数计算的改进算法。现代RandNLA的由来可以追溯到理论计算机科学,其中它从一个简单的想法中得以发展:随机化提供了一种途径,使得近似解得以更高效地计算而不需要借助确定性算法。该想法在开发可扩展算法用于机器学习和统计数据分析应用方面证明了其成果。然而,RandNLA的真正潜力只有在与数值分析和“传统”数值线性代数领域的融合中才得以显现。在许多个人的努力下,已经开发出了随机算法,这些算法可以完全控制其解的精度,并且可以像在LAPACK等库中找到的算法一样可靠。最近几年,RandNLA的某些方法甚至被纳入MATLAB、NAG库、NVIDIA的cuSOLVER和SciKit-Learn中。尽管在其成功中,我们相信RandNLA还未实现其全部潜力。特别是,我们相信科学界将从明确定义类似于BLAS和LAPACK的适当定义的“RandBLAS”和“RandLAPACK”库中受益匪浅。本200页的专著是朝这个方向定义标准的一步。在其中,我们涵盖了基本的草图,最小二乘法和优化,低秩逼近,全矩阵分解,影响得分采样,以及使用张量积结构草图数据(等等)。提供的大量伪代码已通过公开可用的MATLAB和Python实现进行了测试。