Gaussian processes are a powerful class of non-linear models, but their high computational complexity limits their applicability to larger datasets. In such cases, approximate methods are required, for example, the recently developed class of Hilbert space Gaussian processes, which have been shown to drastically reduce computation time while retaining most of the favorable properties of exact Gaussian processes. However, Hilbert space approximations have so far only been developed for one-dimensional outputs and manifest (known) inputs. To close this gap, we generalize Hilbert space methods to multi-output and latent input settings. Through extensive simulations, we show that the resulting approximate Gaussian processes are not only faster than exact Gaussian processes, but also provide similar or even better uncertainty calibration and accuracy of latent variable estimates. While not necessarily faster than alternative Gaussian process approximations, our new models provide better calibration and estimation accuracy, thus striking an excellent balance between trustworthiness and speed. We additionally validate our findings in a real-world case study from single-cell biology.