Gaussian process hyperparameter optimization requires linear solves with, and log-determinants of, large kernel matrices. Iterative numerical techniques are becoming popular to scale to larger datasets, relying on the conjugate gradient method (CG) for the linear solves and stochastic trace estimation for the log-determinant. This work introduces new algorithmic and theoretical insights for preconditioning these computations. While preconditioning is well understood in the context of CG, we demonstrate that it can also accelerate convergence and reduce the variance of the estimates for the log-determinant and its derivative. We prove general probabilistic error bounds for the preconditioned computation of the log-determinant, log-marginal likelihood and its derivatives. Additionally, we derive specific rates for a range of kernel-preconditioner combinations, showing that up to exponential convergence can be achieved. Our theoretical results enable provably efficient optimization of kernel hyperparameters, which we validate empirically on large-scale benchmark problems. There, our approach accelerates training by up to an order of magnitude.
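To make the two computations named above concrete, the following is a minimal, self-contained sketch (not the paper's implementation): a preconditioned CG solve with the kernel matrix, and a stochastic (Hutchinson) trace estimate of the term tr(K^{-1} dK/dl) that appears in the derivative of the log-marginal likelihood. The RBF kernel, the Nyström-style low-rank preconditioner, and the names `l`, `sigma2`, `num_probes` are illustrative assumptions, not choices prescribed by the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, k, l, sigma2 = 2000, 50, 1.0, 1e-2
X = rng.uniform(0.0, 5.0, size=(n, 1))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(n)

# RBF kernel matrix and its derivative w.r.t. the lengthscale l (1-D inputs).
D2 = (X - X.T) ** 2
K_f = np.exp(-0.5 * D2 / l**2)
K = K_f + sigma2 * np.eye(n)
dK_dl = K_f * D2 / l**3

# Low-rank (Nystroem-style) preconditioner P = C W^{-1} C^T + sigma2 I,
# inverted via the Woodbury identity. It stands in for any low-rank
# preconditioner (e.g. partial Cholesky) covered by the paper's analysis.
idx = rng.choice(n, size=k, replace=False)
C = K_f[:, idx]
W = K_f[np.ix_(idx, idx)] + 1e-8 * np.eye(k)
S = np.linalg.inv(sigma2 * W + C.T @ C)  # small k x k inverse

def apply_P_inv(v):
    # P^{-1} v = (v - C (sigma2 W + C^T C)^{-1} C^T v) / sigma2
    return (v - C @ (S @ (C.T @ v))) / sigma2

K_op = LinearOperator((n, n), matvec=lambda v: K @ v)
P_inv = LinearOperator((n, n), matvec=apply_P_inv)

# Preconditioned CG solve alpha = K^{-1} y (data-fit term of the likelihood).
alpha, info = cg(K_op, y, M=P_inv, maxiter=500)

# Hutchinson estimator of tr(K^{-1} dK/dl), the trace term in the gradient
# d/dl log p(y) = 0.5 * (alpha^T dK/dl alpha - tr(K^{-1} dK/dl)).
num_probes = 10
trace_est = 0.0
for _ in range(num_probes):
    z = rng.choice([-1.0, 1.0], size=n)       # Rademacher probe vector
    u, _ = cg(K_op, z, M=P_inv, maxiter=500)  # u = K^{-1} z
    trace_est += u @ (dK_dl @ z)
trace_est /= num_probes

grad_l = 0.5 * (alpha @ (dK_dl @ alpha) - trace_est)
print(f"CG info: {info}, estimated gradient w.r.t. lengthscale: {grad_l:.3f}")
```

In this sketch the preconditioner serves double duty, as in the abstract: it accelerates the CG solves and, because each probe vector is solved against the same preconditioned system, it also reduces the variance of the stochastic trace estimate of the log-determinant derivative.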