Gaussian process (GP) regression provides a flexible, nonparametric framework for probabilistic modeling, yet remains computationally demanding in large-scale applications. For one-dimensional data, state space (SS) models achieve linear-time inference by reformulating GPs as stochastic differential equations (SDEs). However, SS approaches are confined to gridded inputs and cannot handle multi-dimensional scattered data. We propose a new framework based on kernel packets (KPs), which overcomes these limitations while retaining exactness and scalability. A KP is a compactly supported function defined as a linear combination of the GP covariance functions. In this article, we prove that KPs can be identified via the forward and backward SS representations. We also show that the KP approach enables exact inference with linear-time training and logarithmic- or constant-time prediction, and extends naturally to multi-dimensional gridded or scattered data without low-rank approximations. Numerical experiments on large-scale additive and product-form GPs with millions of samples demonstrate that KPs achieve exact, memory-efficient inference where SDE-based and low-rank GP methods fail.
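The compact-support property of a KP can be illustrated with a minimal sketch for the Matérn-1/2 (exponential) covariance on three points. This is an assumed toy construction for illustration, not the paper's general algorithm: the coefficients are chosen so that the two exponential tails of the combination cancel outside the interval spanned by the points, which is the defining property of a compactly supported linear combination of covariance functions.

```python
import numpy as np

def k(x, y):
    # Matérn-1/2 (exponential) covariance with unit lengthscale (assumed)
    return np.exp(-np.abs(x - y))

# Three knots; we seek coefficients a with sum_j a_j * k(x, t_j) = 0
# for all x outside [t_0, t_2].
t = np.array([-1.0, 0.0, 1.0])

# For x > t_2:  k(x, t_j) = e^{t_j} e^{-x}, so we need sum_j a_j e^{t_j} = 0.
# For x < t_0:  k(x, t_j) = e^{-t_j} e^{x}, so we need sum_j a_j e^{-t_j} = 0.
A = np.vstack([np.exp(t), np.exp(-t)])

# A nontrivial null-space vector of the 2x3 system gives the KP coefficients.
_, _, Vt = np.linalg.svd(A)
a = Vt[-1]

def phi(x):
    # The candidate kernel packet: a linear combination of covariance functions.
    return sum(aj * k(x, tj) for aj, tj in zip(a, t))

print(phi(2.0))   # ~0: vanishes to the right of the knots
print(phi(-3.0))  # ~0: vanishes to the left of the knots
print(phi(0.0))   # nonzero inside (-1, 1)
```

Because each training point only interacts with the few KPs whose support covers it, the transformed covariance matrix becomes banded, which is the mechanism behind the linear-time training claimed above.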