Splitting Gaussian Process Regression for Streaming Data

Gaussian processes offer a flexible kernel method for regression. Although Gaussian processes have many useful theoretical properties and have proven useful in practice, they scale poorly with the number of observations. In particular, the cubic time complexity of updating a standard Gaussian process model makes it generally unsuitable for streaming data. We propose an algorithm that sequentially partitions the input space and fits a localized Gaussian process to each disjoint region. The algorithm is shown to have superior time and space complexity to existing methods, and its sequential nature permits application to streaming data. The algorithm constructs a model for which the time complexity of updating is tightly bounded above by a pre-specified parameter. To the best of our knowledge, the model is the first local Gaussian process regression model to achieve linear memory complexity. Theoretical continuity properties of the model are proven. We demonstrate the efficacy of the resulting model on multi-dimensional regression tasks for streaming data.
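To make the general idea concrete, the following is a minimal sketch of a sequentially partitioned local Gaussian process model. It is not the algorithm proposed here. Several elements are assumptions chosen only for illustration: the class name `SplittingGP`, the cap `max_points`, the median split along the widest input dimension, the RBF kernel with fixed length scale, and hard routing of each query to a single leaf. The sketch does not reproduce the continuity properties claimed for the actual model.

```python
import numpy as np


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


class Leaf:
    """A region of the input space holding its own local data set."""
    def __init__(self, X, y):
        self.X, self.y = X, y


class Node:
    """An axis-aligned split: left if x[dim] <= threshold, else right."""
    def __init__(self, dim, threshold, left, right):
        self.dim, self.threshold = dim, threshold
        self.left, self.right = left, right


class SplittingGP:
    """Hypothetical sketch: a binary tree of local GPs with capped leaf size."""

    def __init__(self, dim, max_points=50, noise=1e-3):
        self.max_points = max_points  # assumed cap on points per local model
        self.noise = noise            # assumed observation-noise variance
        self.root = Leaf(np.empty((0, dim)), np.empty(0))

    def _find(self, x):
        """Route x down the tree; return (parent, side, leaf)."""
        parent, side, node = None, None, self.root
        while isinstance(node, Node):
            side = "left" if x[node.dim] <= node.threshold else "right"
            parent, node = node, getattr(node, side)
        return parent, side, node

    def _split(self, leaf):
        """Median split along the widest dimension (an assumed rule)."""
        dim = int(np.argmax(leaf.X.max(axis=0) - leaf.X.min(axis=0)))
        threshold = float(np.median(leaf.X[:, dim]))
        mask = leaf.X[:, dim] <= threshold
        if mask.all() or not mask.any():
            return leaf  # degenerate data (many ties): leave the leaf unsplit
        return Node(dim, threshold,
                    Leaf(leaf.X[mask], leaf.y[mask]),
                    Leaf(leaf.X[~mask], leaf.y[~mask]))

    def update(self, x, y):
        """Add one streaming observation; split its leaf if the cap is exceeded."""
        x = np.asarray(x, dtype=float)
        parent, side, leaf = self._find(x)
        leaf.X = np.vstack([leaf.X, x])
        leaf.y = np.append(leaf.y, y)
        if len(leaf.y) > self.max_points:
            new = self._split(leaf)
            if parent is None:
                self.root = new
            else:
                setattr(parent, side, new)

    def predict(self, x):
        """Posterior mean of the zero-mean local GP whose region contains x."""
        x = np.asarray(x, dtype=float)
        _, _, leaf = self._find(x)
        if len(leaf.y) == 0:
            return 0.0  # prior mean when no data has been seen
        K = rbf(leaf.X, leaf.X) + self.noise * np.eye(len(leaf.y))
        k_star = rbf(x[None, :], leaf.X)
        return float(k_star @ np.linalg.solve(K, leaf.y))
```

In this sketch each observation is stored in exactly one leaf, so memory grows linearly with the number of observations. Barring the degenerate case noted in `_split`, every linear solve involves at most `max_points` observations, so its cost is governed by that cap rather than by the length of the stream. A model is built with `SplittingGP(dim=d)`, fed one observation at a time through `update(x, y)`, and queried with `predict(x)`.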
