Gaussian processes provide a framework for nonlinear, nonparametric Bayesian inference that is widely applicable across science and engineering. Unfortunately, their computational burden scales cubically with the training sample size, which, when samples arrive in perpetuity, grows without bound. This issue necessitates approximations for use with streaming data, which to date mostly lack convergence guarantees. Thus, we develop the first online Gaussian process approximation that preserves convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating its intractable complexity growth with the sample size. We propose an online compression scheme that, following each a posteriori update, fixes an error neighborhood with respect to the Hellinger metric centered at the current posterior, and greedily tosses out past kernel dictionary elements until its boundary is hit. We call the resulting method Parsimonious Online Gaussian Processes (POG). For a diminishing error radius, exact asymptotic consistency is preserved (Theorem 1(i)) at the cost of unbounded memory in the limit. For a constant error radius, by contrast, POG converges to a neighborhood of the population posterior (Theorem 1(ii)), but with finite memory determined, at worst, by the metric entropy of the feature space (Theorem 2). Experimental results are presented on several nonlinear regression problems, which illuminate the merits of this approach relative to alternatives that fix the subspace dimension defining the history of past points.
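To make the compression step concrete, the following is a minimal, hypothetical Python sketch rather than the paper's implementation: it assumes an RBF kernel, exact GP regression over the retained dictionary, and approximates the Hellinger metric between posteriors by the average closed-form Hellinger distance between univariate predictive Gaussians on a fixed reference grid. The helper names (compress, X_ref) and the parameters eps, noise, and ls are illustrative.

```python
import numpy as np

def rbf_kernel(X1, X2, ls=1.0):
    # Squared-exponential kernel matrix between row-stacked inputs.
    d2 = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * d2 / ls**2)

def posterior(X, y, X_star, noise=0.1, ls=1.0):
    # Exact GP posterior predictive mean/variance at X_star given data (X, y).
    K = rbf_kernel(X, X, ls) + noise**2 * np.eye(len(X))
    K_s = rbf_kernel(X, X_star, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_star, X_star, ls)) - np.sum(v**2, axis=0)
    return K_s.T @ alpha, np.maximum(var, 1e-12)

def hellinger(m1, v1, m2, v2):
    # Closed-form Hellinger distance between univariate Gaussians,
    # averaged over reference points as a proxy for the posterior metric.
    h2 = 1.0 - np.sqrt(2.0 * np.sqrt(v1 * v2) / (v1 + v2)) \
             * np.exp(-0.25 * (m1 - m2)**2 / (v1 + v2))
    return np.sqrt(np.maximum(h2, 0.0)).mean()

def compress(X, y, X_ref, eps, noise=0.1, ls=1.0):
    # Fix the error neighborhood at the current (uncompressed) posterior,
    # then greedily drop the dictionary element whose removal perturbs the
    # posterior least, stopping when the Hellinger boundary eps is hit.
    m0, v0 = posterior(X, y, X_ref, noise, ls)
    while len(X) > 1:
        dists = []
        for i in range(len(X)):
            keep = np.delete(np.arange(len(X)), i)
            m, v = posterior(X[keep], y[keep], X_ref, noise, ls)
            dists.append(hellinger(m, v, m0, v0))
        best = int(np.argmin(dists))
        if dists[best] > eps:  # removal would leave the eps-neighborhood
            break
        keep = np.delete(np.arange(len(X)), best)
        X, y = X[keep], y[keep]
    return X, y

# Streaming loop on a toy 1-D regression problem.
rng = np.random.default_rng(0)
X_dict, y_dict = np.empty((0, 1)), np.empty(0)
X_ref = np.linspace(-3.0, 3.0, 25)[:, None]  # grid where the metric is checked
for t in range(200):
    x_t = rng.uniform(-3.0, 3.0, size=(1, 1))
    y_t = np.sin(2.0 * x_t[0, 0]) + 0.1 * rng.standard_normal()
    X_dict = np.vstack([X_dict, x_t])  # a posteriori update with the new sample
    y_dict = np.append(y_dict, y_t)
    X_dict, y_dict = compress(X_dict, y_dict, X_ref, eps=0.05)
print("final dictionary size:", len(X_dict))
```

The brute-force leave-one-out search above costs O(n) posterior solves per removal and is used here only for transparency; an actual implementation would use a more efficient compression routine, but the stopping rule it illustrates (greedily remove dictionary elements until the Hellinger error boundary is hit) is the one the abstract describes.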