We define deep kernel processes in which positive definite Gram matrices are progressively transformed by nonlinear kernel functions and by sampling from (inverse) Wishart distributions. Remarkably, we find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the equivalence arises because the Gram matrix formed by the inner product of features is Wishart distributed, and as we show, standard isotropic kernels can be written entirely in terms of this Gram matrix -- we do not need knowledge of the underlying features. We define a tractable deep kernel process, the deep inverse Wishart process, and give a doubly-stochastic inducing-point variational inference scheme that operates on the Gram matrices, not on the features, as in DGPs. We show that the deep inverse Wishart process gives superior performance to DGPs and infinite BNNs on standard fully-connected baselines.
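As a minimal illustration (not from the paper's code) of the claim that standard isotropic kernels can be written entirely in terms of the Gram matrix, the sketch below evaluates a squared-exponential kernel two ways: directly from a hypothetical feature matrix F, and from the Gram matrix G = F Fᵀ / N alone. The feature shapes and lengthscale are illustrative assumptions.

```python
# Minimal sketch: an isotropic (squared-exponential) kernel evaluated on
# features F depends on F only through the Gram matrix G = F F^T / N,
# so it can be computed without access to the underlying features.
import numpy as np

rng = np.random.default_rng(0)
P, N = 5, 100                      # P data points, N hidden features (assumed sizes)
F = rng.standard_normal((P, N))    # hypothetical layer activations

# Kernel from features: k_ij = exp(-||f_i - f_j||^2 / (2 N))
sq_dists = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=-1)
K_from_features = np.exp(-sq_dists / (2 * N))

# Kernel from the Gram matrix alone: ||f_i - f_j||^2 = N * (G_ii + G_jj - 2 G_ij)
G = F @ F.T / N
d = np.diag(G)
K_from_gram = np.exp(-(d[:, None] + d[None, :] - 2 * G) / 2)

assert np.allclose(K_from_features, K_from_gram)
```

Because the kernel is a function of G alone, a deep kernel process can propagate and sample Gram matrices layer by layer without ever instantiating features.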