Gaussian processes with derivative information are useful in many settings where derivatives are available, including numerous Bayesian optimization and regression tasks that arise in the natural sciences. Incorporating derivative observations, however, comes with a dominating $O(N^3D^3)$ computational cost when training on $N$ points in $D$ input dimensions. This is intractable for even moderately sized problems. While recent work has addressed this intractability in the low-$D$ setting, the high-$N$, high-$D$ setting remains unexplored and of great value, particularly as machine learning problems increasingly become high-dimensional. In this paper, we introduce methods to achieve fully scalable Gaussian process regression with derivatives using variational inference. Analogous to the use of inducing values to sparsify the labels of a training set, we introduce the concept of inducing directional derivatives to sparsify the partial derivative information of a training set. This enables us to construct a variational posterior that incorporates derivative information but whose size depends neither on the full dataset size $N$ nor on the full dimensionality $D$. We demonstrate the full scalability of our approach on a variety of tasks, ranging from a high-dimensional stellarator fusion regression task to training graph convolutional neural networks on Pubmed using Bayesian optimization. Surprisingly, we find that our approach can improve regression performance even in settings where only label data is available.
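To make the $O(N^3D^3)$ cost concrete: conditioning a GP on gradient observations enlarges the kernel matrix from $N \times N$ to $N(D+1) \times N(D+1)$, and exact training requires factorizing it. The following is a minimal NumPy sketch assuming a standard RBF kernel; it illustrates the scaling only and is not the paper's implementation (all names are illustrative).

```python
import numpy as np

def rbf(X, Z, ell=1.0, sf2=1.0):
    """Assumed RBF kernel: k(x, z) = sf2 * exp(-||x - z||^2 / (2 ell^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return sf2 * np.exp(-0.5 * d2 / ell ** 2)

def joint_kernel_with_grads(X, ell=1.0, sf2=1.0):
    """Covariance of (f(x_1), ..., f(x_N), grad f(x_1), ..., grad f(x_N)).

    Differentiating the RBF kernel gives the blocks (with r = x - z):
      Cov(f(x), f(z))                 = k(x, z)
      Cov(f(x), d f(z) / d z_j)       = k(x, z) r_j / ell^2
      Cov(d f(x)/d x_i, d f(z)/d z_j) = k(x, z) (delta_ij / ell^2 - r_i r_j / ell^4)
    The result is an N(D+1) x N(D+1) matrix, hence the O(N^3 D^3) training cost.
    """
    N, D = X.shape
    K = rbf(X, X, ell, sf2)                       # (N, N) value-value block
    R = X[:, None, :] - X[None, :, :]             # (N, N, D) pairwise differences
    Kfg = (K[:, :, None] * R / ell ** 2).reshape(N, N * D)   # value-gradient block
    Kgg = K[:, :, None, None] * (np.eye(D)[None, None, :, :] / ell ** 2
                                 - R[:, :, :, None] * R[:, :, None, :] / ell ** 4)
    Kgg = Kgg.transpose(0, 2, 1, 3).reshape(N * D, N * D)    # gradient-gradient block
    return np.block([[K, Kfg], [Kfg.T, Kgg]])

X = np.random.default_rng(0).normal(size=(50, 10))  # N=50 points in D=10 dimensions
Kjoint = joint_kernel_with_grads(X)                 # shape (550, 550)
L = np.linalg.cholesky(Kjoint + 1e-6 * np.eye(len(Kjoint)))  # O((N(D+1))^3) factorization
```

Even at this modest size the matrix is $550 \times 550$; doubling $N$ or $D$ roughly octuples the factorization cost, which is why exact derivative GPs break down quickly.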
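The abstract's key device is the inducing directional derivative: rather than conditioning on all $ND$ partial derivatives, the variational posterior conditions on a small number of directional derivatives at inducing locations, so its size is set by the number of inducing quantities rather than by $N$ or $D$. Below is a hedged sketch of the cross-covariance such a construction needs, again under the assumed RBF kernel and reusing `rbf` and `X` from the sketch above; the variational ELBO and the paper's actual parameterization are omitted, and `cov_f_dirderiv`, `Z`, and `V` are illustrative names.

```python
def cov_f_dirderiv(X, Z, V, ell=1.0, sf2=1.0):
    """Cov(f(x_n), D_{v_m} f(z_m)): function values vs. directional derivatives.

    Differentiating the RBF kernel in its second argument along a unit
    direction v gives  k(x, z) * ((x - z) @ v) / ell^2.  Each inducing
    direction contributes one column, independent of the input dimension D.
    """
    K = rbf(X, Z, ell, sf2)                    # (N, M) value-value covariances
    R = X[:, None, :] - Z[None, :, :]          # (N, M, D) pairwise differences
    return K * np.einsum('nmd,md->nm', R, V) / ell ** 2

M = 20                                          # number of inducing directional derivatives
rng = np.random.default_rng(1)
Z = rng.normal(size=(M, 10))                    # inducing locations (hypothetical)
V = rng.normal(size=(M, 10))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # one unit direction per location
Kuf = cov_f_dirderiv(X, Z, V)                   # (N, M): M columns, not N*D
```

The point of the sketch is the shape: the covariance block involving the inducing directional derivatives has $M$ columns however large $N$ and $D$ grow, which is what lets the variational posterior stay small in both.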