Gaussian Process Regression (GPR) is an important class of supervised machine learning models whose predictions carry an inherent measure of uncertainty. We propose a new framework, nuGPR, to address the well-known challenge of the high computational cost of GPR training. Our framework combines several ideas from numerical linear algebra to reduce the computation required in key steps of GPR, assembling them into an end-to-end training algorithm. Specifically, we leverage the preconditioned conjugate gradient method to accelerate the convergence of the linear solves required in GPR. We exploit clustering in the input data to identify a block-diagonal structure in the covariance matrix, and we construct low-rank approximations of the off-diagonal blocks. These enhancements significantly reduce the time and space complexity of our computations. In addition, unlike other frameworks that rely on exact differentiation, we employ numerical gradients to optimize the hyperparameters of our GPR model, further reducing the training cost by eliminating the need for backpropagation. Lastly, we leverage the CUDA Toolkit to efficiently parallelize the training procedure on NVIDIA GPUs. As a result, nuGPR reduces total training time by up to 2x and peak memory consumption by up to 12x on various synthetic and real-world datasets when compared to the best existing GPU-based GPR implementation.
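To make the preconditioned conjugate gradient step concrete, the following is a minimal CPU sketch of solving the covariance system K x = y that arises in GPR. It assumes a generic symmetric positive-definite covariance matrix, an RBF kernel, and a simple Jacobi (diagonal) preconditioner; nuGPR's actual preconditioner, block-structured covariance, and CUDA kernels are not reproduced here.

```python
import numpy as np

def pcg_solve(K, y, M_inv, tol=1e-6, max_iter=100):
    """Solve K x = y with preconditioned conjugate gradients.

    K      : symmetric positive-definite covariance matrix (n x n)
    y      : right-hand side, e.g. training targets (n,)
    M_inv  : function applying the preconditioner inverse, z = M^{-1} r
    """
    x = np.zeros_like(y)
    r = y - K @ x                 # initial residual
    z = M_inv(r)                  # preconditioned residual
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Kp = K @ p
        alpha = rz / (p @ Kp)     # step size along search direction
        x += alpha * p
        r -= alpha * Kp
        if np.linalg.norm(r) < tol * np.linalg.norm(y):
            break
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz        # conjugacy-preserving update
        p = z + beta * p
        rz = rz_new
    return x

# Illustrative example: RBF covariance on random 1-D inputs,
# Jacobi preconditioner M^{-1} r = r / diag(K).
rng = np.random.default_rng(0)
X = np.sort(rng.normal(size=200))
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2) + 1e-2 * np.eye(200)
y = np.sin(X)
x = pcg_solve(K, y, M_inv=lambda r: r / np.diag(K))
print("residual norm:", np.linalg.norm(K @ x - y))
```

A good preconditioner clusters the eigenvalues of the system, which is what drives the faster convergence the abstract refers to; the Jacobi choice above is only the simplest stand-in.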
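The numerical-gradient idea can likewise be sketched briefly. The snippet below optimizes GPR hyperparameters by central finite differences on the negative log marginal likelihood, so no backpropagation graph is ever built. The two-hyperparameter RBF-plus-noise model, the step size h, and the learning rate are illustrative assumptions, not nuGPR's actual configuration.

```python
import numpy as np

def nlml(theta, X, y):
    """Negative log marginal likelihood of GPR with an RBF kernel.
    theta = (log lengthscale, log noise variance) -- a hypothetical
    two-hyperparameter setup for illustration only."""
    ell, noise = np.exp(theta)
    n = len(y)
    D2 = (X[:, None] - X[None, :]) ** 2
    K = np.exp(-0.5 * D2 / ell**2) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    # 0.5 y^T K^{-1} y + 0.5 log|K| + (n/2) log(2 pi)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

def numerical_grad(f, theta, h=1e-5):
    """Central-difference gradient: two loss evaluations per parameter."""
    g = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g

# Plain gradient descent on the hyperparameters (illustrative settings).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=100)
y = np.sin(X) + 0.1 * rng.normal(size=100)
theta = np.array([0.0, -2.0])  # log lengthscale, log noise variance
for _ in range(50):
    theta -= 0.1 * numerical_grad(lambda t: nlml(t, X, y), theta)
print("optimized hyperparameters:", np.exp(theta))
```

Each gradient costs 2d likelihood evaluations for d hyperparameters; GPR models typically have few hyperparameters, which is what makes this trade against backpropagation attractive.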