Markov Chain Monte Carlo (MCMC) algorithms do not scale well to large datasets, which makes posterior sampling for neural networks difficult. In this paper, we apply a generalization of the Metropolis-Hastings algorithm that allows us to restrict the evaluation of the likelihood to small mini-batches in a Bayesian inference context. Since it requires the computation of a so-called "noise penalty" determined by the variance of the training loss over the mini-batches, we refer to this data subsampling strategy as Penalty Bayesian Neural Networks (PBNNs). Its implementation on top of MCMC is straightforward, as the variance of the loss function merely reduces the acceptance probability. Compared to other samplers, we empirically show that PBNN achieves good predictive performance for a given mini-batch size. Varying the size of the mini-batches enables a natural calibration of the predictive distribution and provides built-in protection against overfitting. We expect PBNN to be particularly well suited for cases where datasets are distributed across multiple decentralized devices, as is typical in federated learning.
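To make the mechanism described above concrete, the following is a minimal sketch of a noise-penalised Metropolis-Hastings step: the change in training loss is estimated on mini-batches, and half the variance of that estimate is subtracted from the log acceptance ratio, so noisier estimates are accepted less often. This is an illustrative assumption of how such a penalty can be wired in, not the paper's exact algorithm; `propose`, `minibatch_loss`, and `batches` are hypothetical placeholders.

```python
import numpy as np

def penalty_mh_step(theta, propose, minibatch_loss, batches, rng):
    """One Metropolis-Hastings step with a noise penalty (illustrative sketch).

    theta          -- current parameter vector
    propose        -- callable (theta, rng) -> proposed theta (symmetric proposal assumed)
    minibatch_loss -- callable (theta, batch) -> scaled negative log-likelihood on one mini-batch
    batches        -- iterable of mini-batches used to estimate the loss difference
    rng            -- numpy Generator
    """
    theta_new = propose(theta, rng)

    # Per-mini-batch estimates of the change in (scaled) training loss.
    deltas = np.array([minibatch_loss(theta_new, b) - minibatch_loss(theta, b)
                       for b in batches])
    delta_hat = deltas.mean()                     # noisy estimate of the energy difference
    sigma2 = deltas.var(ddof=1) / len(deltas)     # variance of that estimator

    # Noise penalty: the variance lowers the log acceptance ratio.
    log_alpha = -delta_hat - sigma2 / 2.0
    if np.log(rng.uniform()) < log_alpha:
        return theta_new
    return theta
```

Under this sketch, setting `sigma2 = 0` recovers a standard Metropolis-Hastings acceptance step on the full-data loss; larger mini-batch noise shrinks the acceptance probability, which is the sense in which the penalty guards against overconfident moves.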