The "cold posterior effect" (CPE) in Bayesian deep learning describes the uncomfortable observation that the predictive performance of Bayesian neural networks can be significantly improved if the Bayes posterior is artificially sharpened using a temperature parameter T < 1. The CPE is problematic in theory and practice, and since the effect was identified many researchers have proposed hypotheses to explain it. Despite this intensive research effort, however, the effect remains poorly understood. In this work we provide novel and nuanced evidence relevant to existing explanations for the cold posterior effect, disentangling three hypotheses: 1. The dataset curation hypothesis of Aitchison (2020): we show empirically that the CPE does not arise in a real curated dataset, but can be produced in a controlled experiment with varying curation strength. 2. The data augmentation hypothesis of Izmailov et al. (2021) and Fortuin et al. (2021): we show empirically that data augmentation is sufficient but not necessary for the CPE to be present. 3. The bad prior hypothesis of Wenzel et al. (2020): we use a simple experiment evaluating the relative importance of the prior and the likelihood, strongly linking the CPE to the prior. Our results demonstrate how the CPE can arise in isolation from synthetic curation, data augmentation, and bad priors. Cold posteriors observed "in the wild" are therefore unlikely to arise from a single simple cause; as a result, we do not expect a simple "fix" for cold posteriors.
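For concreteness, the tempering referred to above is the standard one from the cold-posterior literature: the log joint density is scaled by 1/T. A sketch in the usual notation, with θ the network parameters and D = {(x_i, y_i)} the training data:

```latex
% Tempered posterior: T = 1 recovers the Bayes posterior, T < 1 sharpens it.
p_T(\theta \mid \mathcal{D}) \;\propto\; \exp\!\left(-\frac{U(\theta)}{T}\right),
\qquad
U(\theta) \;=\; -\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta) \;-\; \log p(\theta).
```

Here U(θ) is the posterior energy (negative log joint); the CPE is the empirical finding that sampling from p_T with T < 1 can outperform the untempered Bayes posterior at T = 1.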