Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. One deep kernel process, the deep Wishart process (DWP), is of particular interest because its prior is equivalent to that of a deep Gaussian process (DGP). However, inference in DWPs has not yet been possible due to the lack of sufficiently flexible distributions over positive semi-definite matrices. Here, we give a novel approach to obtaining flexible distributions over positive semi-definite matrices by generalising the Bartlett decomposition of the Wishart probability density. We use this new distribution to develop an approximate posterior for the DWP that includes dependency across layers. We develop a doubly-stochastic inducing-point inference scheme for the DWP and show experimentally that inference in the DWP gives improved performance over inference in a DGP with the equivalent prior.
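For readers unfamiliar with the construction being generalised, the standard Bartlett decomposition samples a Wishart-distributed positive semi-definite matrix as L A Aᵀ Lᵀ, where L is the Cholesky factor of the scale matrix and A is lower triangular with chi-distributed diagonal and standard-normal below-diagonal entries. The following is a minimal NumPy sketch of that standard construction only (not the paper's generalised distribution); the function name and signature are illustrative.

```python
import numpy as np

def sample_wishart_bartlett(df, scale, rng):
    """Draw one sample from Wishart(df, scale) via the Bartlett decomposition.

    A sample is formed as (L A)(L A)^T, where L = chol(scale) and A is
    lower triangular with A_ii = sqrt(chi^2_{df - i}) (0-indexed) on the
    diagonal and independent N(0, 1) entries strictly below it.
    """
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((p, p))
    # Diagonal: square roots of chi-squared variates with decreasing dof.
    A[np.diag_indices(p)] = np.sqrt(rng.chisquare(df - np.arange(p)))
    # Strict lower triangle: independent standard normals.
    A[np.tril_indices(p, -1)] = rng.standard_normal(p * (p - 1) // 2)
    LA = L @ A
    return LA @ LA.T
```

Because the sample is built as a product M Mᵀ, it is symmetric positive semi-definite by construction, and E[W] = df · scale, which can be checked by averaging many draws.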