Wide Mean-Field Bayesian Neural Networks Ignore the Data

Bayesian neural networks (BNNs) combine the expressive power of deep learning with the advantages of the Bayesian formalism. In recent years, the analysis of wide, deep BNNs has provided theoretical insight into their priors and posteriors. However, we have no analogous insight into their posteriors under approximate inference. In this work, we show that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd. Specifically, for fully-connected BNNs with odd activation functions and a homoscedastic Gaussian likelihood, we show that the optimal mean-field variational posterior predictive (i.e., function space) distribution converges to the prior predictive distribution as the width tends to infinity. We generalize aspects of this result to other likelihoods. Our theoretical results are suggestive of underfitting behavior previously observed in BNNs. While our convergence bounds are non-asymptotic and constants in our analysis can be computed, they are currently too loose to be applicable in standard training regimes. Finally, we show that the optimal approximate posterior need not tend to the prior if the activation function is not odd, showing that our statements cannot be generalized arbitrarily.
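The abstract distinguishes odd activation functions (such as tanh) from non-odd ones (such as ReLU). The sketch below is not the paper's proof or experiment. It only checks numerically one elementary symmetry fact that separates the two cases: under a zero-mean Gaussian weight distribution, the expected output of an odd activation is zero, while a non-odd activation generally has a nonzero mean. The function names, the input value `x=1.5`, and the standard-normal weight distribution are all assumptions chosen for illustration.

```python
import numpy as np


def mean_activation(phi, x, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[phi(w * x)] with w ~ N(0, 1).

    Hypothetical helper for illustration: a single weight w drawn from a
    zero-mean Gaussian multiplies a fixed scalar input x.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_samples)
    return float(np.mean(phi(w * x)))


def relu(z):
    """Non-odd activation used for contrast."""
    return np.maximum(z, 0.0)


# tanh is odd (tanh(-z) == -tanh(z)), so positive and negative weight
# samples cancel in expectation and the estimate is close to zero.
tanh_mean = mean_activation(np.tanh, x=1.5)

# ReLU is not odd. Analytically E[max(w * x, 0)] = x / sqrt(2 * pi) for
# x > 0, which is about 0.598 for x = 1.5.
relu_mean = mean_activation(relu, x=1.5)

print(f"E[tanh(w x)] estimate: {tanh_mean:.4f}")
print(f"E[relu(w x)] estimate: {relu_mean:.4f}")
```

This symmetry is consistent with the abstract's final remark that the result does not extend to activations that are not odd, but the sketch says nothing about variational inference, network width, or convergence rates.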


