We consider the optimal approximate posterior over the top-layer weights in a Bayesian neural network for regression, and show that it exhibits strong dependencies on the lower-layer weights. We adapt this result to develop a correlated approximate posterior over the weights at all layers in a Bayesian neural network. We extend this approach to deep Gaussian processes, unifying inference in the two model classes. Our approximate posterior uses learned "global" inducing points, which are defined only at the input layer and propagated through the network to obtain inducing inputs at subsequent layers. By contrast, standard "local" inducing point methods from the deep Gaussian process literature optimise a separate set of inducing inputs at every layer, and thus do not model correlations across layers. Our method gives state-of-the-art performance for a variational Bayesian method on CIFAR-10 (86.7%), without data augmentation or tempering; this is comparable to SGMCMC with data augmentation but without tempering (88%; Wenzel et al., 2020).
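The distinction between "global" and "local" inducing points can be illustrated with a minimal sketch (not the paper's implementation; the toy network, layer sizes, and variable names are illustrative assumptions). Global inducing inputs are parameters only at the input layer and are propagated forward through the network to produce each layer's inducing inputs, so they depend on the lower-layer weights; local inducing inputs are independent free parameters at every layer.

```python
# Sketch contrasting global vs local inducing points in a toy MLP.
# This is an illustrative assumption of the setup, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    """One hidden layer: linear map followed by ReLU."""
    return np.maximum(x @ w, 0.0)

d_in, d_hidden, n_layers, n_inducing = 4, 8, 3, 5
weights = [rng.normal(size=(d_in, d_hidden))] + [
    rng.normal(size=(d_hidden, d_hidden)) for _ in range(n_layers - 1)
]

# Global inducing points: one learned set Z at the input layer only.
# Each layer's inducing inputs are obtained by propagating Z forward,
# so they are correlated with the weights of all earlier layers.
Z_global = rng.normal(size=(n_inducing, d_in))
inducing_inputs = []
h = Z_global
for w in weights:
    inducing_inputs.append(h)  # layer l conditions on the propagated Z
    h = layer(h, w)

# Local inducing points: a separate learned Z_l per layer. Each Z_l is an
# independent free parameter, so no cross-layer correlations are modelled.
Z_local = [rng.normal(size=(n_inducing, d_in if l == 0 else d_hidden))
           for l in range(n_layers)]

print(len(inducing_inputs), inducing_inputs[1].shape)
```

Note that under the global scheme only `Z_global` (shape `(5, 4)`) is a variational parameter; the later layers' inducing inputs (shape `(5, 8)`) are computed, not learned.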