Gaussian Processes (GPs) are known to provide accurate predictions and uncertainty estimates even with small amounts of labeled data, by capturing similarity between data points through their kernel function. However, traditional GP kernels are not very effective at capturing similarity between high-dimensional data points. Neural networks can be used to learn good representations that encode the intricate structures in high-dimensional data, and these representations can serve as inputs to the GP kernel. However, the huge data requirements of neural networks make this approach ineffective in small-data settings. To solve the conflicting problems of representation learning and data efficiency, we propose to learn deep kernels on probabilistic embeddings produced by a probabilistic neural network. Our approach maps high-dimensional data to a probability distribution in a low-dimensional subspace and then computes a kernel between these distributions to capture similarity. To enable end-to-end learning, we derive a functional gradient descent procedure for training the model. Experiments on a variety of datasets show that our approach outperforms the state of the art in GP kernel learning in both supervised and semi-supervised settings. We also extend our approach to other small-data paradigms such as few-shot classification, where it outperforms previous approaches on the mini-Imagenet and CUB datasets.
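To make the idea of a kernel between probabilistic embeddings concrete, consider one natural instantiation; the notation below is illustrative and is not fixed by the abstract (the embedding distribution $q(z \mid x)$, the maps $\mu(\cdot)$, $\Sigma(\cdot)$, and the lengthscale $\ell$ are assumptions for this sketch, not necessarily the construction used in the paper). Suppose the probabilistic network maps each input $x$ to a Gaussian embedding $q(z \mid x) = \mathcal{N}\!\big(\mu(x), \Sigma(x)\big)$ in the low-dimensional subspace, and similarity is measured by the expected value of a base kernel under the two embedding distributions,
\[
K(x_i, x_j) \;=\; \mathbb{E}_{\,z_i \sim q(z \mid x_i),\; z_j \sim q(z \mid x_j)}\big[\, k(z_i, z_j) \,\big].
\]
For an RBF base kernel $k(z, z') = \exp\!\big(-\tfrac{1}{2\ell^2}\lVert z - z' \rVert^2\big)$ this expectation has a closed form,
\[
K(x_i, x_j) \;=\; \Big|\, I + \tfrac{1}{\ell^2}\big(\Sigma(x_i) + \Sigma(x_j)\big) \Big|^{-1/2}
\exp\!\Big(-\tfrac{1}{2}\,\big(\mu(x_i) - \mu(x_j)\big)^{\!\top} \big(\ell^2 I + \Sigma(x_i) + \Sigma(x_j)\big)^{-1} \big(\mu(x_i) - \mu(x_j)\big)\Big).
\]
Because this is an inner product of kernel mean embeddings, it remains positive semi-definite; the determinant prefactor downweights pairs whose embeddings are uncertain, and the kernel reduces to an ordinary deep RBF kernel as $\Sigma(\cdot) \to 0$.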