Gaussian Processes (GPs) are Bayesian models that provide uncertainty estimates associated with their predictions. They are also very flexible due to their non-parametric nature. Nevertheless, GPs suffer from poor scalability as the number of training instances $N$ increases. More precisely, they have a cubic cost with respect to $N$. To overcome this problem, sparse GP approximations are often used, where a set of $M \ll N$ inducing points is introduced during training. The locations of the inducing points are learned by considering them as parameters of an approximate posterior distribution $q$. Sparse GPs, combined with variational inference for inferring $q$, reduce the training cost of GPs to $\mathcal{O}(M^3)$. Critically, the inducing points determine the flexibility of the model, and they are often located in regions of the input space where the latent function changes. A limitation is, however, that for some learning tasks a large number of inducing points may be required to obtain good prediction performance. To address this limitation, we propose here to amortize the computation of the inducing point locations, as well as the parameters of the variational posterior approximation $q$. For this, we use a neural network that receives the observed data as an input and outputs the inducing point locations and the parameters of $q$. We evaluate our method in several experiments, showing that it performs similarly to or better than other state-of-the-art sparse variational GP approaches. However, with our method the number of inducing points is reduced drastically due to their dependency on the input data. This makes our method scale to larger datasets and have faster training and prediction times.
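To make the amortization idea concrete, the following is a minimal illustrative sketch (not the authors' implementation): a small permutation-invariant network that takes an observed dataset $(X, y)$ and outputs $M$ inducing point locations together with the mean and (diagonal) scale parameters of $q$. The architecture, layer sizes, and parameterization are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class AmortizationNet:
    """Hypothetical amortization network: a per-point MLP encodes each
    (x, y) pair, a mean-pool makes the summary permutation-invariant,
    and linear heads produce the inducing points and the parameters
    of the variational posterior q."""

    def __init__(self, d_in, d_hidden, M):
        self.M, self.d_in = M, d_in
        self.W1 = rng.normal(0, 0.1, (d_in + 1, d_hidden))   # encodes (x, y)
        self.W2 = rng.normal(0, 0.1, (d_hidden, d_hidden))
        self.Wz = rng.normal(0, 0.1, (d_hidden, M * d_in))   # inducing locations
        self.Wm = rng.normal(0, 0.1, (d_hidden, M))          # mean of q
        self.Ws = rng.normal(0, 0.1, (d_hidden, M))          # log-scales of q

    def __call__(self, X, y):
        h = relu(np.concatenate([X, y[:, None]], axis=1) @ self.W1)
        h = relu(h @ self.W2)
        g = h.mean(axis=0)                             # pool over the N points
        Z = (g @ self.Wz).reshape(self.M, self.d_in)   # inducing point locations
        m = g @ self.Wm                                # variational mean
        s = np.exp(g @ self.Ws)                        # variational std devs
        return Z, m, s

# Usage: the output size depends only on M and d, not on N, so the same
# network can be applied to datasets of any size.
N, d, M = 500, 3, 8
X, y = rng.normal(size=(N, d)), rng.normal(size=N)
net = AmortizationNet(d_in=d, d_hidden=32, M=M)
Z, m, s = net(X, y)
print(Z.shape, m.shape, s.shape)  # (8, 3) (8,) (8,)
```

In a full variational training loop, the network weights would be optimized through the sparse GP evidence lower bound instead of the inducing points being free parameters, which is what allows a small $M$ to adapt to each input dataset.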