A Tutorial on Sparse Gaussian Processes and Variational Inference

Gaussian processes (GPs) provide a framework for Bayesian inference that offers principled uncertainty estimates for a wide range of problems. For example, in regression problems with Gaussian likelihoods, a GP model has a posterior in closed form. However, computing the posterior GP scales cubically with the number of training examples and requires storing all examples in memory. To overcome these obstacles, sparse GPs have been proposed that approximate the true posterior GP with pseudo-training examples. Importantly, the number of pseudo-training examples is user-defined and gives control over computational and memory complexity. In the general case, sparse GPs do not admit closed-form solutions and one has to resort to approximate inference. In this context, a convenient choice is variational inference (VI), where the problem of Bayesian inference is cast as an optimization problem: maximizing a lower bound on the log marginal likelihood. This paves the way for a powerful and versatile framework in which the pseudo-training examples are treated as optimization arguments of the approximate posterior and are identified jointly with the hyperparameters of the generative model (i.e. prior and likelihood). The framework can naturally handle a wide scope of supervised learning problems, ranging from regression with heteroscedastic and non-Gaussian likelihoods to classification problems with discrete labels, as well as multilabel problems. The purpose of this tutorial is to make the basic material accessible to readers without prior knowledge of either GPs or VI. A proper exposition of the subject also gives access to more recent advances (such as importance-weighted VI as well as interdomain, multioutput and deep GPs) that can serve as inspiration for new research ideas.
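
To make the contrast between the exact and the sparse posterior concrete, the following is a minimal NumPy sketch, not code from the tutorial itself. It assumes a squared-exponential kernel with fixed hyperparameters, a Gaussian likelihood, and pseudo-training inputs `Z` placed on a fixed grid; the function names (`rbf`, `exact_gp`, `sparse_gp`) are hypothetical. For the Gaussian-likelihood special case the optimal variational posterior is available in closed form, so the sketch evaluates the resulting collapsed lower bound directly instead of running an optimizer. In the general framework described above, `Z` and the hyperparameters would instead be optimized jointly by maximizing the bound.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def rbf_diag(A, variance=1.0):
    """Diagonal of rbf(A, A); for this kernel it is just the variance."""
    return np.full(A.shape[0], variance)

def exact_gp(X, y, Xs, noise_var):
    """Closed-form GP regression posterior at Xs; costs O(N^3) time."""
    N = X.shape[0]
    L = np.linalg.cholesky(rbf(X, X) + noise_var * np.eye(N))
    # np.linalg.solve does not exploit the triangular structure; fine for a sketch.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = rbf(X, Xs)
    V = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = rbf_diag(Xs) - np.sum(V**2, axis=0)
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * N * np.log(2 * np.pi)
    return mean, var, lml

def sparse_gp(X, y, Xs, Z, noise_var, jitter=1e-6):
    """Sparse variational posterior with M pseudo-inputs Z; costs O(N M^2) time.

    Returns the predictive mean and variance of the latent function and the
    collapsed lower bound on the log marginal likelihood.
    """
    N, M = X.shape[0], Z.shape[0]
    sigma = np.sqrt(noise_var)
    Lm = np.linalg.cholesky(rbf(Z, Z) + jitter * np.eye(M))
    A = np.linalg.solve(Lm, rbf(Z, X)) / sigma           # M x N
    AAT = A @ A.T
    LB = np.linalg.cholesky(np.eye(M) + AAT)
    c = np.linalg.solve(LB, A @ y) / sigma
    elbo = (-0.5 * N * np.log(2 * np.pi * noise_var)
            - np.sum(np.log(np.diag(LB)))
            - 0.5 * y @ y / noise_var + 0.5 * c @ c
            - 0.5 * np.sum(rbf_diag(X)) / noise_var + 0.5 * np.trace(AAT))
    T1 = np.linalg.solve(Lm, rbf(Z, Xs))
    T2 = np.linalg.solve(LB, T1)
    mean = T2.T @ c
    var = rbf_diag(Xs) + np.sum(T2**2, axis=0) - np.sum(T1**2, axis=0)
    return mean, var, elbo

# Toy data (an assumption for illustration): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(50, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(50)
Xs = np.linspace(-3.0, 3.0, 20)[:, None]
noise_var = 0.1

mean_exact, var_exact, lml = exact_gp(X, y, Xs, noise_var)
# Two nested grids of pseudo-inputs: the 5-point grid is a subset of the 9-point one.
_, _, elbo_5 = sparse_gp(X, y, Xs, np.linspace(-3.0, 3.0, 5)[:, None], noise_var)
mean_sparse, var_sparse, elbo_9 = sparse_gp(
    X, y, Xs, np.linspace(-3.0, 3.0, 9)[:, None], noise_var)

print("log marginal likelihood:", lml)
print("lower bound with M=5:", elbo_5, " with M=9:", elbo_9)
```

The bound never exceeds the exact log marginal likelihood, and adding pseudo-inputs to an existing set cannot lower it, which is why the number of pseudo-training examples acts as a dial between cost and fidelity.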
