A deep Gaussian process (DGP) used as a model prior in Bayesian learning intuitively exploits the expressive power of function composition. DGPs also offer diverse modeling capabilities, but inference is challenging because marginalization over the latent function space is intractable. By Bochner's theorem, a DGP with a squared exponential kernel can be viewed as a deep trigonometric network consisting of random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight-space view yields the same effective covariance functions as those obtained previously in function space. Moreover, varying the prior distributions over the network parameters is equivalent to employing different kernels. As such, DGPs can be translated into deep bottlenecked trigonometric networks, with which the exact maximum a posteriori estimate can be obtained. Interestingly, the network representation enables the study of the DGP's neural tangent kernel, which may also reveal the mean of the intractable predictive distribution. Statistically, unlike shallow networks, deep networks of finite width have covariances that deviate from the limiting kernel, and the inner and outer widths may play different roles in feature learning. Numerical simulations are presented to support our findings.
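As an illustration of the construction summarized above, the sketch below (a minimal example of our own, not the authors' implementation; the names trig_layer and deep_trig_network are hypothetical) shows how, by Bochner's theorem, a squared exponential GP layer can be approximated by a random Fourier feature layer with sine and cosine units, and how such layers can be stacked through a finite-width bottleneck of random weights.

    import numpy as np

    def trig_layer(X, width, lengthscale=1.0, variance=1.0, rng=None):
        """Random Fourier feature approximation of a squared exponential GP layer.

        By Bochner's theorem the SE kernel is the Fourier transform of a Gaussian
        spectral density, so sampling frequencies W with std 1/lengthscale and using
        sine/cosine activations gives features whose inner products approximate the kernel:
        Phi(x) @ Phi(x')^T ~ variance * exp(-||x - x'||^2 / (2 * lengthscale^2)).
        """
        rng = rng or np.random.default_rng(0)
        d = X.shape[1]
        W = rng.normal(0.0, 1.0 / lengthscale, size=(d, width))  # random frequencies
        Z = X @ W
        # sine and cosine activation units; scaling keeps Phi @ Phi.T close to the SE kernel
        return np.sqrt(variance / width) * np.concatenate([np.cos(Z), np.sin(Z)], axis=1)

    def deep_trig_network(X, widths, bottleneck, rng=None):
        """Stack trig feature layers with random weight layers through a narrow bottleneck."""
        rng = rng or np.random.default_rng(0)
        H = X
        for width in widths:
            Phi = trig_layer(H, width, rng=rng)
            # random weight layer mapping features to the finite-width bottleneck,
            # playing the role of the latent GP outputs fed to the next layer
            V = rng.normal(size=(Phi.shape[1], bottleneck))
            H = Phi @ V
        return H

    # toy usage: a two-layer network with outer width 500 and bottleneck width 3
    X = np.random.default_rng(1).normal(size=(10, 2))
    F = deep_trig_network(X, widths=[500, 500], bottleneck=3)
    print(F.shape)  # (10, 3)

In this sketch, widths corresponds to the number of random trigonometric features per layer (the "outer" width), while bottleneck corresponds to the number of latent GP outputs passed between layers (the "inner" width); the two may behave differently at finite size, as noted in the abstract.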