Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
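As a minimal illustration of computational uncertainty (a toy sketch, not the paper's algorithm): a GP posterior mean requires solving the linear system $Kv = y$, and truncating an iterative solver such as conjugate gradients after a few steps leaves an approximation error in the mean. The function names and budget below are illustrative choices, not from the paper.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)

# Toy 1-D regression data.
X = rng.uniform(-3, 3, size=40)
y = np.sin(X) + 0.1 * rng.standard_normal(40)

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

# Kernel matrix plus observation-noise term.
K = rbf(X, X) + 0.1 * np.eye(40)

# "Mathematical" posterior mean weights: exact solve of K v = y.
v_exact = np.linalg.solve(K, y)

# Limited computational budget: only a few conjugate-gradient steps.
v_approx, _ = cg(K, y, maxiter=2)

Xs = np.linspace(-3, 3, 100)
Ks = rbf(Xs, X)
mean_exact = Ks @ v_exact
mean_approx = Ks @ v_approx

# The gap below is error due solely to limited computation; the methods
# described above account for it as additional posterior variance rather
# than silently reporting the approximate mean as exact.
print(np.max(np.abs(mean_exact - mean_approx)))
```

Using the approximate posterior as if it were exact discards exactly this gap; quantifying it jointly with the usual data-driven uncertainty is the point of the combined covariance decomposition.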