Conditional Deep Gaussian Processes: Empirical Bayes Hyperdata Learning

From arXiv. Accepted by the Special Issue "Probabilistic Methods for Deep Learning" of Entropy; 15 pages; references to recent papers on finite Bayesian neural networks added.

It is desirable to combine the expressive power of deep learning with Gaussian Processes (GPs) in one expressive Bayesian learning model. Deep kernel learning has shown success by using a deep network for feature extraction followed by a GP as the function model. Recently, it was suggested that, despite training with the marginal likelihood, the deterministic nature of the feature extractor might lead to overfitting, and that replacing it with a Bayesian network seemed to cure this. Here, we propose the conditional Deep Gaussian Process (DGP), in which the intermediate GPs in the hierarchical composition are supported by the hyperdata while the exposed GP remains zero mean. Motivated by the inducing points in sparse GPs, the hyperdata also play the role of function supports, but they are hyperparameters rather than random variables. We follow our previous moment matching approach to approximate the marginal prior for the conditional DGP with a GP carrying an effective kernel. Thus, as in empirical Bayes, the hyperdata are learned by optimizing the approximate marginal likelihood, which depends implicitly on the hyperdata via the kernel. We show the equivalence with deep kernel learning in the limit of dense hyperdata in latent space. However, the conditional DGP and the corresponding approximate inference enjoy the benefit of being more Bayesian than deep kernel learning. Preliminary extrapolation results demonstrate the expressive power gained from the depth of the hierarchy by exploiting the exact covariance and hyperdata learning, in comparison with GP kernel composition, DGP variational inference, and deep kernel learning. We also address the non-Gaussian aspect of our model as well as a way of upgrading to full Bayes inference.
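
The construction described above can be made concrete with a small numerical sketch. It assumes a two-layer model with scalar inputs and squared exponential (SE) kernels in both layers, a choice made here for illustration because the expectation of an SE kernel over a Gaussian-distributed input difference has a closed form; the abstract itself does not fix the kernels. The inner GP is conditioned on hyperdata (z, u), the second moment of the outer kernel under that conditional GP gives an effective kernel, and the hyperdata enter the marginal likelihood only through this kernel. All function names, parameter names, and numeric values are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def se_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Squared exponential kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def condition_on_hyperdata(x, z, u, lengthscale=1.0, jitter=1e-6):
    """Mean and covariance of the inner GP at x, conditioned on f(z) = u.

    The hyperdata (z, u) are treated as hyperparameters, not random variables.
    """
    k_zz = se_kernel(z, z, lengthscale=lengthscale) + jitter * np.eye(len(z))
    k_xz = se_kernel(x, z, lengthscale=lengthscale)
    mean = k_xz @ np.linalg.solve(k_zz, u)
    cov = se_kernel(x, x, lengthscale=lengthscale) - k_xz @ np.linalg.solve(k_zz, k_xz.T)
    return mean, cov


def effective_kernel(x, z, u, inner_ls=1.0, outer_ls=1.0, outer_var=1.0):
    """Second moment E[k_outer(f(x), f(x'))] over the conditional inner GP.

    Assumption: the outer kernel is SE, so the expectation over the Gaussian
    difference f(x) - f(x') ~ N(mu_d, s2) is available in closed form.
    """
    mean, cov = condition_on_hyperdata(x, z, u, inner_ls)
    var = np.diag(cov)
    mu_d = mean[:, None] - mean[None, :]
    # Variance of f(x) - f(x'); clipped at zero to absorb round-off.
    s2 = np.maximum(var[:, None] + var[None, :] - 2.0 * cov, 0.0)
    denom = outer_ls ** 2 + s2
    return outer_var * outer_ls / np.sqrt(denom) * np.exp(-0.5 * mu_d ** 2 / denom)


def neg_log_marginal_likelihood(y, k_eff, noise_var=1e-2):
    """Gaussian negative log marginal likelihood under the effective kernel."""
    n = len(y)
    chol = np.linalg.cholesky(k_eff + noise_var * np.eye(n))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return 0.5 * y @ alpha + np.log(np.diag(chol)).sum() + 0.5 * n * np.log(2.0 * np.pi)


# Toy data (illustrative values only).
x = np.array([-1.0, 0.0, 0.5, 1.5])
y = np.sin(2.0 * x)

# Sparse hyperdata: the inner GP keeps residual uncertainty between supports.
z_sparse = np.array([-1.0, 1.0])
u_sparse = np.array([-0.8, 0.8])
k_sparse = effective_kernel(x, z_sparse, u_sparse)
nll = neg_log_marginal_likelihood(y, k_sparse)

# Dense hyperdata: the conditional variance nearly vanishes, so the effective
# kernel approaches an SE kernel on the deterministic feature map mean(x).
z_dense = np.linspace(-3.0, 3.0, 25)
u_dense = np.sin(z_dense)
k_dense = effective_kernel(x, z_dense, u_dense)
mean_dense, _ = condition_on_hyperdata(x, z_dense, u_dense)
k_dkl = se_kernel(mean_dense, mean_dense)
print("max |k_dense - k_dkl| =", np.max(np.abs(k_dense - k_dkl)))
```

In practice the negative log marginal likelihood would be minimized with respect to u (and possibly z and the kernel parameters) using a gradient-based optimizer; the sketch only evaluates it. With dense hyperdata, the effective kernel reduces to the SE kernel applied to the conditional mean, which corresponds to the deep kernel learning limit mentioned in the abstract.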
