Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this surprising result, showing how a Bayesian model average can in fact be problematic under covariate shift, particularly in cases where linear dependencies in the input features cause a lack of posterior contraction. We additionally show why the same issue does not affect many approximate inference procedures, or classical maximum a-posteriori (MAP) training. Finally, we propose novel priors that improve the robustness of BNNs to many sources of covariate shift.
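As a minimal sketch of the mechanism described above (not the paper's actual models or experiments), the toy Bayesian linear regression below uses an input feature that is identically zero at training time, so the posterior cannot contract along that direction and remains at the prior. When covariate shift activates that feature at test time, the Bayesian model average samples the corresponding weight from the prior and its predictions become noisy, while the MAP solution keeps that weight at the prior mode of zero. The variable names, prior and noise precisions, and the test point are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Bayesian linear regression y = x @ w + noise, with Gaussian prior w ~ N(0, alpha^-1 I).
# Feature 2 is identically zero at train time (a linear dependency in the inputs),
# so the posterior cannot contract along that direction: it stays at the prior.
n_train, alpha, beta = 100, 1.0, 25.0          # prior precision, noise precision (assumed values)
x_train = np.c_[rng.normal(size=(n_train, 1)), np.zeros((n_train, 1))]
w_true = np.array([2.0, 0.0])
y_train = x_train @ w_true + rng.normal(scale=beta ** -0.5, size=n_train)

# Closed-form Gaussian posterior: Sigma = (alpha I + beta X^T X)^-1, mu = beta Sigma X^T y.
Sigma = np.linalg.inv(alpha * np.eye(2) + beta * x_train.T @ x_train)
mu = beta * Sigma @ x_train.T @ y_train
print("posterior std per weight:", np.sqrt(np.diag(Sigma)))  # weight 2 keeps ~prior std of 1.0

# Covariate shift: the "dead" feature becomes active at test time.
x_test = np.array([[1.0, 3.0]])
w_samples = rng.multivariate_normal(mu, Sigma, size=10_000)   # Bayesian model average over the posterior
bma_preds = w_samples @ x_test.T
map_pred = x_test @ mu                                        # MAP coincides with the posterior mean here
print("BMA predictive std:", bma_preds.std())                 # inflated by prior variance along weight 2
print("MAP prediction:", map_pred.item())                     # unaffected by the shifted direction
```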