While Bayesian neural networks (BNNs) provide a sound and principled alternative to standard neural networks, an artificial sharpening of the posterior usually needs to be applied to reach comparable performance. This is in stark contrast to theory, which dictates that, given an adequate prior and a well-specified model, the untempered Bayesian posterior should achieve optimal performance. Despite the community's extensive efforts, the origin of the observed gains in performance remains disputed, with several plausible causes having been proposed. While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role is largely missing. In this work we identify two interlaced factors that concurrently influence the strength of the cold posterior effect: the correlated nature of augmentations and the degree of invariance of the employed model to such transformations. By theoretically analyzing simplified settings, we prove that tempering implicitly reduces the misspecification that arises from modeling augmentations as i.i.d. data. The temperature mimics the role of the effective sample size, reflecting the gain in information provided by the augmentations. We corroborate our theoretical findings with extensive empirical evaluations, scaling to realistic BNNs. By relying on the framework of group convolutions, we experiment with models of varying inherent degrees of invariance, confirming the hypothesized relationship between invariance and the optimal temperature.
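To make the tempering claim concrete, the sketch below records the standard likelihood-tempering convention assumed in this discussion; the symbols $T$, $N$, and $N_{\mathrm{eff}}$ are illustrative notation introduced here, not taken from the abstract itself:

\[
% Tempered posterior: T = 1 recovers standard Bayes, T < 1 gives a "cold" posterior.
p_T(\theta \mid \mathcal{D}) \;\propto\; \Big[\prod_{i=1}^{N} p(y_i \mid x_i, \theta)\Big]^{1/T} p(\theta),
\qquad
% Raising an i.i.d. likelihood to the power 1/T counts each observation 1/T times.
N_{\mathrm{eff}} \;=\; \frac{N}{T}.
\]

Under this convention, a cold temperature $T < 1$ yields $N_{\mathrm{eff}} > N$: the posterior behaves as if it had observed more data than the nominal $N$ points, which is one way to read the claim that the temperature reflects the information gained from the augmentations.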