Kernel matrix-vector multiplication (KMVM) is a ubiquitous operation in machine learning and scientific computing, spanning applications from the kernel literature to signal processing. Because KMVM typically scales quadratically in both memory and time, applications are often limited by these computational constraints. We propose a novel approximation procedure, coined the Faster-Fast and Free Memory Method ($\text{F}^3$M), to address these scaling issues for KMVM. Extensive experiments demonstrate that $\text{F}^3$M has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$, and can compute a full KMVM for a billion points \emph{in under one minute} on a high-end GPU, yielding a significant speed-up over existing CPU methods. We further demonstrate the utility of our procedure by applying it as a drop-in replacement within the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 3--5 times} at the cost of a $<$1\% drop in accuracy.
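To make the quadratic baseline concrete, the exact operation being approximated can be sketched as follows. This is a minimal NumPy illustration, not the $\text{F}^3$M algorithm itself; the choice of a Gaussian (RBF) kernel and the \texttt{lengthscale} parameter are our assumptions for the example.

```python
import numpy as np

def naive_kmvm(x, y, v, lengthscale=1.0):
    """Exact kernel matrix-vector product K(x, y) @ v with a Gaussian kernel.

    Materializes the full (n, m) kernel matrix, so both time and memory
    scale as O(n * m) -- quadratic when n == m, which is what fast
    approximations such as F^3M aim to avoid.
    """
    # Pairwise squared distances between rows of x (n, d) and y (m, d).
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2.0 * lengthscale ** 2))  # (n, m) kernel matrix
    return K @ v  # length-n result vector

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 3))
v = rng.normal(size=500)
out = naive_kmvm(x, x, v)
```

At a billion points, the explicit kernel matrix above would require on the order of $10^{18}$ entries, which is why an approximation with linear empirical scaling is needed.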