Bayesian Neural Networks (BNNs) offer a probabilistic interpretation of deep learning models by placing a prior distribution over the model parameters and inferring a posterior distribution from observed data. Models sampled from the posterior can provide ensemble predictions and quantify prediction uncertainty. It is well known that deep learning models with lower sharpness generalize better. Nonetheless, existing posterior inference methods are not sharpness/flatness-aware, so the models sampled from them may exhibit high sharpness. In this paper, we develop the theory, the Bayesian setting, and a variational inference approach for a sharpness-aware posterior. Specifically, both the models sampled from our sharpness-aware posterior and the optimal approximate posterior estimating it are flatter, and hence likely to possess better generalization ability. We conduct experiments by combining the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines on all metrics of interest.
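To make the idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of how a sharpness-aware objective can be combined with variational inference: a weight is drawn from a mean-field Gaussian posterior via the reparameterization trick, perturbed toward the locally worst-case direction (in the style of sharpness-aware minimization), and the gradient of the perturbed loss is pushed back to the variational parameters. The toy model, learning rate `lr`, and perturbation radius `rho` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1-D linear regression y = 2x + noise (illustrative only).
X = rng.normal(size=(64, 1))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=64)

# Mean-field Gaussian variational posterior q(w) = N(mu, sigma^2).
mu, log_sigma = 0.0, -1.0
lr, rho = 0.1, 0.05  # rho: assumed sharpness-aware perturbation radius

def loss_and_grad(w):
    """Mean squared error and its gradient w.r.t. the sampled weight."""
    resid = X[:, 0] * w - y
    return np.mean(resid ** 2), np.mean(2.0 * resid * X[:, 0])

for _ in range(200):
    # Reparameterized sample from q(w): w = mu + sigma * eps.
    sigma = np.exp(log_sigma)
    eps = rng.normal()
    w = mu + sigma * eps

    # Sharpness-aware ascent step: move to the worst-case weight
    # within an L2 ball of radius rho (in 1-D, rho * g/|g| = rho * sign(g)).
    _, g = loss_and_grad(w)
    w_adv = w + rho * np.sign(g)

    # Gradient of the perturbed loss, routed back to (mu, log_sigma)
    # through the reparameterization w = mu + sigma * eps.
    _, g_adv = loss_and_grad(w_adv)
    mu -= lr * g_adv
    log_sigma -= lr * g_adv * eps * sigma

print(round(mu, 2))  # mu approaches the true slope (~2)
```

The key design point is that the minimax perturbation is applied to the *sampled* weight before the gradient is propagated to the variational parameters, so the approximate posterior is steered toward flat regions of the loss rather than merely low-loss ones.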