Bayesian neural networks (BNNs) have become a principal approach to alleviating overconfident predictions in deep learning, but they often suffer from scaling issues due to their large number of distribution parameters. In this paper, we discover that the first layer of a deep network possesses multiple disparate optima when retrained in isolation. This indicates a large posterior variance when the first layer is replaced by a Bayesian layer, which motivates us to design a spatial-temporal-fusion BNN (STF-BNN) for efficiently scaling BNNs to large models: (1) a neural network is first trained from scratch in the usual way to achieve fast training; and (2) the first layer is then converted to a Bayesian layer and inferred via stochastic variational inference, while the other layers are kept fixed. Compared with vanilla BNNs, our approach greatly reduces training time and the number of parameters, which helps scale BNNs efficiently. We further provide theoretical guarantees on the generalizability of STF-BNN and on its capability to mitigate overconfidence. Comprehensive experiments demonstrate that STF-BNN (1) achieves state-of-the-art performance in prediction and uncertainty quantification; (2) significantly improves adversarial robustness and privacy preservation; and (3) considerably reduces training time and memory costs.
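The two-step recipe above can be illustrated with a minimal sketch: the layers after the first are kept as frozen point estimates, while the first layer carries a variational Gaussian posterior sampled via the reparameterization trick, and predictions are averaged over Monte Carlo samples. All network sizes, variable names, and the ReLU/softmax architecture below are illustrative assumptions, not the paper's actual implementation; the full SVI training loop for the variational parameters is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" deterministic network 4 -> 8 -> 3.
# In STF-BNN these later-layer weights would come from step (1) and stay fixed.
W2 = rng.normal(size=(8, 3)) * 0.3   # frozen second-layer weights
b2 = np.zeros(3)

# Step (2): the first layer becomes Bayesian, with a variational Gaussian
# posterior q(W1) = N(mu, softplus(rho)^2) over its weights.
mu = rng.normal(size=(4, 8)) * 0.3   # variational mean
rho = np.full((4, 8), -3.0)          # small initial std via softplus

def softplus(x):
    return np.log1p(np.exp(x))

def sample_first_layer():
    """Reparameterization trick: W1 = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + softplus(rho) * eps

def forward(x, W1):
    """Forward pass with a sampled first layer and frozen later layers."""
    h = np.maximum(x @ W1, 0.0)                       # ReLU
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)          # softmax probabilities

def predict(x, n_samples=50):
    """Monte Carlo predictive distribution: averaging over sampled first
    layers yields both a prediction and an epistemic-uncertainty estimate."""
    probs = np.stack([forward(x, sample_first_layer())
                      for _ in range(n_samples)])
    return probs.mean(axis=0), probs.std(axis=0)

x = rng.normal(size=(2, 4))
mean_p, std_p = predict(x)   # class probabilities and their MC spread
```

Because only the first layer carries distribution parameters (`mu`, `rho`), the Bayesian parameter count is a small fraction of the full network's, which is the source of the efficiency gain claimed in the abstract.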