We propose two approaches to extend the notion of knowledge distillation to Gaussian Process Regression (GPR) and Gaussian Process Classification (GPC): data-centric and distribution-centric. The data-centric approach resembles most current distillation techniques in machine learning and refits a model on deterministic predictions from the teacher, while the distribution-centric approach reuses the full probabilistic posterior for the next iteration. By analyzing the properties of these approaches, we show that the data-centric approach for GPR closely relates to known results for self-distillation of kernel ridge regression, and that the distribution-centric approach for GPR corresponds to ordinary GPR with a very particular choice of hyperparameters. Furthermore, we demonstrate that the distribution-centric approach for GPC approximately corresponds to data duplication together with a particular scaling of the covariance, and that the data-centric approach for GPC requires redefining the model from a Binomial likelihood to a continuous Bernoulli likelihood in order to be well-specified. To the best of our knowledge, our proposed approaches are the first to formulate knowledge distillation specifically for Gaussian Process models.
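To make the data-centric idea concrete, the sketch below shows one round of self-distillation for GPR: a teacher GP is fit on the observed targets, and a student GP is refit on the teacher's deterministic predictions (posterior mean) at the training inputs. This is a minimal illustration assuming scikit-learn's GaussianProcessRegressor; the RBF kernel, noise level `alpha`, and toy data are illustrative choices, not taken from the paper.

```python
# Minimal sketch of data-centric self-distillation for GPR (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

kernel = RBF(length_scale=1.0)

# Teacher: ordinary GPR fit on the observed data.
teacher = GaussianProcessRegressor(kernel=kernel, alpha=0.1)
teacher.fit(X, y)

# Data-centric step: replace the targets with the teacher's deterministic
# predictions (posterior mean) at the training inputs, then refit a student.
y_teacher = teacher.predict(X)
student = GaussianProcessRegressor(kernel=kernel, alpha=0.1)
student.fit(X, y_teacher)

# The student's posterior mean is a smoothed version of the teacher's,
# mirroring the connection to self-distillation of kernel ridge regression.
print(student.predict(X[:5]))
```

The distribution-centric variant would instead carry the teacher's full posterior (mean and covariance) into the next fit rather than only the mean; as stated above, for GPR this reduces to ordinary GPR with a particular choice of hyperparameters.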