Deep Gaussian processes (DGPs) as model priors in Bayesian learning intuitively exploit the expressive power of function composition. DGPs also offer diverse modeling capabilities, but inference becomes their Achilles' heel, as marginalization over the latent function space is intractable. By Bochner's theorem, a DGP with a squared exponential kernel can be viewed as a deep trigonometric network consisting of random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight-space view yields the same effective covariance functions that were previously obtained in function space. DGPs can thus be translated into deep trigonometric networks, which are flexible and expressive, as one can freely adopt different prior distributions over the parameters. Interestingly, the network representation enables the study of the DGP's neural tangent kernel, which may reveal the mean of the otherwise intractable predictive distribution. Statistically, unlike shallow networks, deep networks of finite width have a covariance that deviates from the limiting kernel, and the inner and outer widths may play different roles in learning.
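The following is a minimal sketch, not taken from the paper, of the Bochner-theorem construction the abstract refers to: for a squared exponential kernel, frequencies drawn from its Gaussian spectral density together with sine and cosine units give random trigonometric features whose inner products approximate the kernel, and stacking such layers yields a deep trigonometric network. Function and parameter names (`trig_features`, `n_features`, `lengthscale`) are illustrative assumptions, not identifiers from the source.

```python
import numpy as np

def trig_features(X, n_features=1024, lengthscale=1.0, rng=None):
    """Random sine/cosine features approximating the squared exponential kernel
    k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)).  (Illustrative sketch.)"""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Bochner's theorem: the SE kernel's spectral density is Gaussian,
    # so frequencies are sampled as W ~ N(0, I / lengthscale^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    proj = X @ W
    # Normalization so that phi(x)^T phi(x') is a Monte Carlo estimate of k(x, x').
    scale = np.sqrt(1.0 / n_features)
    return scale * np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

# Sanity check: feature inner products approximate the exact SE kernel.
X = np.random.default_rng(0).normal(size=(5, 3))
Phi = trig_features(X, n_features=20000, lengthscale=1.0, rng=1)
K_approx = Phi @ Phi.T
sqdist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sqdist)
print(np.max(np.abs(K_approx - K_exact)))  # small approximation error
```

Composing such a random feature layer with a random weight layer, and repeating, gives one concrete reading of the "deep trigonometric network" described in the abstract; the paper's own construction and choice of priors over the weights may differ in detail.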