Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they have hardly been usable in large-scale problems, since na\"ive implementations scale poorly with data size. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware. Towards this end, we designed a preconditioned gradient solver for kernel methods exploiting both GPU acceleration and parallelization across multiple GPUs, implementing out-of-core variants of common linear algebra operations to guarantee optimal hardware utilization. Further, we optimize the numerical precision of different operations and maximize the efficiency of matrix-vector multiplications. As a result, we experimentally demonstrate dramatic speedups on datasets with billions of points, while still guaranteeing state-of-the-art performance. Additionally, we make our software available as an easy-to-use library.
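To make the two core ingredients above concrete, here is a minimal NumPy sketch, not the paper's actual GPU implementation, of a preconditioned conjugate gradient solver for kernel ridge regression in which the kernel-vector product is computed one row block at a time, so the full kernel matrix is never materialized (the out-of-core idea). A simple Jacobi (diagonal) preconditioner stands in for the more elaborate preconditioner used in practice; all function names and parameters here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel block k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def blocked_kernel_matvec(X, v, lam, block=128, sigma=1.0):
    """Compute (K + lam * n * I) v one row block at a time, so the full
    n x n kernel matrix is never held in memory (out-of-core style)."""
    n = X.shape[0]
    out = np.empty(n)
    for start in range(0, n, block):
        stop = min(start + block, n)
        out[start:stop] = gaussian_kernel(X[start:stop], X, sigma) @ v
    return out + lam * n * v

def pcg(matvec, b, precond, tol=1e-6, max_iter=200):
    """Textbook preconditioned conjugate gradient for an SPD system."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        a = rz / (p @ Ap)
        x += a * p
        r -= a * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x

# Toy kernel ridge regression: solve (K + lam * n * I) alpha = y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
lam = 1e-3
# Jacobi preconditioner: k(x, x) = 1 for the Gaussian kernel,
# so the system diagonal is constant and equal to 1 + lam * n.
diag = 1.0 + lam * X.shape[0]
alpha = pcg(lambda v: blocked_kernel_matvec(X, v, lam), y, lambda r: r / diag)
```

The same blocking structure is what makes GPU and multi-GPU execution natural: each row block is an independent dense kernel evaluation and matrix-vector product, so blocks can be streamed to (and split across) accelerators, and their precision can be chosen per operation.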