Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. One deep kernel process, the deep Wishart process (DWP), is of particular interest because its prior can be made equivalent to deep Gaussian process (DGP) priors for kernels that can be expressed entirely in terms of Gram matrices. However, inference in DWPs has not previously been possible due to the lack of sufficiently flexible distributions over positive semi-definite matrices. Here, we give a novel approach to obtaining flexible distributions over positive semi-definite matrices by generalising the Bartlett decomposition of the Wishart probability density. We use this new distribution to develop an approximate posterior for the DWP that includes dependency across layers. We develop a doubly-stochastic inducing-point inference scheme for the DWP and show experimentally that inference in the DWP can improve performance over performing inference in a DGP with the equivalent prior.
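For context on the construction being generalised, the standard Bartlett decomposition samples a Wishart-distributed positive semi-definite matrix from a Cholesky factor of the scale matrix and a random lower-triangular matrix. The sketch below is a minimal illustration of that classical construction only (not the paper's generalised distribution); the function name and interface are our own.

```python
import numpy as np

def sample_wishart_bartlett(df, scale, rng):
    """Draw a sample from Wishart(df, scale) via the Bartlett decomposition.

    With scale = L L^T (Cholesky), build a lower-triangular A where
    A[i, i] = sqrt(chi2(df - i)) and A[i, j] ~ N(0, 1) for i > j;
    then (L A)(L A)^T is Wishart(df, scale) and positive semi-definite
    by construction.
    """
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((p, p))
    for i in range(p):
        # Diagonal: square root of a chi-squared variate with df - i dof.
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        for j in range(i):
            # Strictly lower triangle: independent standard normals.
            A[i, j] = rng.normal()
    LA = L @ A
    return LA @ LA.T
```

A quick sanity check is that the empirical mean of many samples approaches `df * scale`, and every sample is symmetric positive semi-definite.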