We are interested in solving linear systems arising from three applications: (1) kernel methods in machine learning, (2) discretization of boundary integral equations from mathematical physics, and (3) Schur complements formed in the factorization of many large sparse matrices. The coefficient matrices are often data-sparse in the sense that their off-diagonal blocks have low numerical ranks; specifically, we focus on "hierarchically off-diagonal low-rank (HODLR)" matrices. We introduce algorithms for factorizing HODLR matrices and for applying the factorizations on a GPU. The algorithms leverage the efficiency of batched dense linear algebra, and they scale nearly linearly with the matrix size when the numerical ranks are fixed. The accuracy of the HODLR approximation is a tunable parameter, so we can construct either high-accuracy fast direct solvers or low-accuracy robust preconditioners. Numerical results show that we can solve problems with several million unknowns in a couple of seconds on a single GPU.