Gaussian Processes (GPs) provide a powerful probabilistic framework for interpolation, forecasting, and smoothing, but have been hampered by computational scaling issues. Here we prove that for data sampled on one dimension (e.g., a time series sampled at arbitrarily-spaced intervals), approximate GP inference at any desired level of accuracy requires computational effort that scales linearly with the number of observations; this new theorem enables inference on much larger datasets than was previously feasible. To achieve this improved scaling we propose a new family of stationary covariance kernels: the Latent Exponentially Generated (LEG) family, which admits a convenient stable state-space representation that allows linear-time inference. We prove that any continuous integrable stationary kernel can be approximated arbitrarily well by some member of the LEG family. The proof draws connections to Spectral Mixture Kernels, providing new insight about the flexibility of this popular family of kernels. We propose parallelized algorithms for performing inference and learning in the LEG model, test the algorithm on real and synthetic data, and demonstrate scaling to datasets with billions of samples.
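To illustrate the general state-space idea behind the linear-time claim (not the LEG model itself), the sketch below performs O(n) GP regression on arbitrarily spaced 1-D inputs for the simplest kernel with an exact state-space form, the exponential (Ornstein-Uhlenbeck) kernel k(s,t) = σ² exp(−|s−t|/ℓ), using a Kalman filter and RTS smoother. The function name `ou_kalman_smoother` and all parameter choices are hypothetical and for illustration only.

```python
import numpy as np

def ou_kalman_smoother(t, y, sigma2=1.0, ell=1.0, noise=0.1):
    """Linear-time GP posterior for the exponential (OU) kernel
    k(s, t) = sigma2 * exp(-|s - t| / ell), via Kalman filtering/smoothing.
    A generic state-space sketch; not the paper's LEG parameterization."""
    n = len(t)
    m_f = np.zeros(n); P_f = np.zeros(n)      # filtered means / variances
    m_p = np.zeros(n); P_p = np.zeros(n)      # one-step predictive means / variances
    m_pred, P_pred = 0.0, sigma2              # stationary prior at the first point
    for i in range(n):
        if i > 0:
            a = np.exp(-(t[i] - t[i - 1]) / ell)            # transition coefficient
            m_pred = a * m_f[i - 1]
            P_pred = a**2 * P_f[i - 1] + sigma2 * (1 - a**2)  # keeps the marginal stationary
        m_p[i], P_p[i] = m_pred, P_pred
        k = P_pred / (P_pred + noise)                         # Kalman gain
        m_f[i] = m_pred + k * (y[i] - m_pred)
        P_f[i] = (1 - k) * P_pred
    # Backward (Rauch-Tung-Striebel) smoothing pass
    m_s = m_f.copy(); P_s = P_f.copy()
    for i in range(n - 2, -1, -1):
        a = np.exp(-(t[i + 1] - t[i]) / ell)
        g = P_f[i] * a / P_p[i + 1]                           # smoother gain
        m_s[i] = m_f[i] + g * (m_s[i + 1] - m_p[i + 1])
        P_s[i] = P_f[i] + g**2 * (P_s[i + 1] - P_p[i + 1])
    return m_s, P_s

# Usage: irregularly spaced, noisy observations of a sine wave
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 500))
y = np.sin(t) + 0.3 * rng.normal(size=500)
mean, var = ou_kalman_smoother(t, y, sigma2=1.0, ell=1.0, noise=0.09)
```

Both passes touch each observation once, so the cost is linear in the number of samples; the LEG family generalizes this construction to a much richer class of kernels while retaining the same scaling.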