The Bayesian treatment of neural networks dictates that a prior distribution be specified over their weight and bias parameters. This poses a challenge because modern neural networks are characterized by a large number of parameters, and the choice of these priors has an uncontrolled effect on the induced functional prior, which is the distribution of the functions obtained by sampling the parameters from their prior distribution. We argue that this is a hugely limiting aspect of Bayesian deep learning, and this work tackles the limitation in a practical and effective way. Our proposal is to reason in terms of functional priors, which are easier to elicit, and to "tune" the priors of neural network parameters so that they reflect such functional priors. Gaussian processes offer a rigorous framework for defining prior distributions over functions, and we propose a novel and robust framework for matching the functional prior of a neural network to a Gaussian process prior by minimizing the Wasserstein distance between them. We provide extensive experimental evidence that coupling these priors with scalable Markov chain Monte Carlo sampling consistently yields large performance improvements over alternative choices of priors and state-of-the-art approximate Bayesian deep learning approaches. We consider this work a considerable step toward making a fully Bayesian treatment of neural networks, including convolutional neural networks, a concrete possibility, addressing a long-standing challenge.
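To make the prior-matching idea concrete, the following is a minimal PyTorch sketch of tuning the prior standard deviations of a one-hidden-layer BNN so that its induced functional prior approaches a GP prior with an RBF kernel. For tractability, this sketch uses a sliced Wasserstein approximation between function draws evaluated at a fixed set of measurement points, rather than the paper's exact Wasserstein formulation; all names, the network architecture, and the kernel hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch

torch.manual_seed(0)

# Measurement points at which the two functional priors are compared.
x = torch.linspace(-3, 3, 50).unsqueeze(-1)            # (50, 1)

# --- GP prior samples (RBF kernel; hypothetical hyperparameters) ---
def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    d2 = (a - b.T) ** 2
    return variance * torch.exp(-0.5 * d2 / lengthscale ** 2)

K = rbf_kernel(x, x) + 1e-6 * torch.eye(len(x))
L = torch.linalg.cholesky(K)

def sample_gp(n):
    # Draw n functions from the GP prior at the measurement points.
    return (L @ torch.randn(len(x), n)).T              # (n, 50)

# --- BNN functional prior: one-hidden-layer MLP whose Gaussian prior
#     (log) standard deviations are the quantities being tuned ---
H = 64
log_sw1, log_sb1 = torch.zeros(()), torch.zeros(())
log_sw2, log_sb2 = torch.zeros(()), torch.zeros(())
prior_params = [log_sw1, log_sb1, log_sw2, log_sb2]
for p in prior_params:
    p.requires_grad_(True)

def sample_bnn(n):
    # Reparameterized draws: parameters ~ N(0, std^2), std = exp(log_s*).
    w1 = torch.randn(n, 1, H) * log_sw1.exp()
    b1 = torch.randn(n, 1, H) * log_sb1.exp()
    w2 = torch.randn(n, H, 1) * log_sw2.exp() / H ** 0.5
    b2 = torch.randn(n, 1, 1) * log_sb2.exp()
    h = torch.tanh(x.unsqueeze(0) @ w1 + b1)           # (n, 50, H)
    return (h @ w2 + b2).squeeze(-1)                   # (n, 50)

def sliced_w1(f, g, n_proj=100):
    # Sliced 1-Wasserstein distance between two equal-size sets of
    # function draws: project onto random directions, sort, compare.
    dirs = torch.randn(f.shape[1], n_proj)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)
    pf = (f @ dirs).sort(dim=0).values
    pg = (g @ dirs).sort(dim=0).values
    return (pf - pg).abs().mean()

opt = torch.optim.Adam(prior_params, lr=0.05)
for step in range(500):
    opt.zero_grad()
    loss = sliced_w1(sample_bnn(256), sample_gp(256))
    loss.backward()
    opt.step()

print("tuned prior stds:", [p.exp().item() for p in prior_params])
```

The tuned standard deviations would then serve as the fixed parameter prior for posterior inference, e.g. with scalable MCMC as the abstract describes; the sliced distance is used here only because it admits a simple closed-form Monte Carlo estimate.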