Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF). We find that both methods introduce a systematic bias on the learned hyperparameters: CG tends to underfit while RFF tends to overfit. We address these issues using randomized truncation estimators that eliminate bias in exchange for increased variance. In the case of RFF, we show that the bias-to-variance conversion is indeed a trade-off: the additional variance proves detrimental to optimization. However, in the case of CG, our unbiased learning procedure meaningfully outperforms its biased counterpart with minimal additional computation.
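As a concrete illustration of the randomized truncation idea, the sketch below applies a Russian-roulette estimator to a toy geometric series: a fixed truncation is systematically biased, while randomizing the truncation point and reweighting each term by its inverse survival probability removes the bias at the cost of added variance. This is a minimal illustrative sketch under assumed choices (the toy series, the geometric truncation distribution, and all function names are ours), not the paper's implementation.

```python
import numpy as np

# Target: the infinite series S = sum_{j>=1} delta_j with delta_j = 0.5**j,
# whose true value is S = 1. A deterministic truncation at K terms always
# underestimates S; the Russian-roulette estimator is unbiased but noisier.

rng = np.random.default_rng(0)

def delta(j):
    # j-th term of the series (1-indexed); true sum is 1.0
    return 0.5 ** j

def fixed_truncation(K):
    # Deterministic truncation after K terms: biased (underestimates here)
    return sum(delta(j) for j in range(1, K + 1))

def russian_roulette(q=0.5):
    # Draw a random truncation point J ~ Geometric(q), so P(J >= j) = (1-q)**(j-1),
    # and reweight each kept term by 1 / P(J >= j) to restore unbiasedness.
    J = rng.geometric(q)
    return sum(delta(j) / (1 - q) ** (j - 1) for j in range(1, J + 1))

estimates = np.array([russian_roulette() for _ in range(100_000)])
print(f"fixed K=4 truncation:  {fixed_truncation(4):.4f}  (biased low)")
print(f"randomized truncation: mean={estimates.mean():.4f}, "
      f"std={estimates.std():.4f}  (unbiased, higher variance)")
```

The same bias-to-variance conversion underlies the paper's treatment of CG and RFF: whether the exchange pays off depends on how much the added variance hurts downstream hyperparameter optimization.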