Bayesian inference has theoretical attractions as a principled framework for reasoning about beliefs. However, the motivations that claim Bayesian inference is the only 'rational' kind of reasoning do not apply in practice. They create a binary split in which all approximate inference is equally 'irrational'. Instead, we should ask how to define a spectrum of more- and less-rational reasoning that explains why we might prefer one Bayesian approximation to another. I explore approximate inference in Bayesian neural networks and consider the unintended interactions between the probabilistic model, the approximating distribution, the optimization algorithm, and the dataset. The complexity of these interactions highlights the difficulty of any strategy for evaluating Bayesian approximations that focuses entirely on the method, outside the context of specific datasets and decision problems. For a given application, the expected utility of the approximate posterior can measure inference quality. To assess a model's ability to incorporate different parts of the Bayesian framework, we can identify desirable characteristic behaviours of Bayesian reasoning and pick decision problems that make heavy use of those behaviours. Here, we use continual learning (testing the ability to update sequentially) and active learning (testing the ability to represent credence). But existing continual and active learning set-ups pose challenges that have nothing to do with posterior quality, and these challenges can distort their ability to evaluate Bayesian approximations. These unrelated challenges can be removed or reduced, allowing better evaluation of approximate inference methods.
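The idea that inference quality can be measured by expected utility in a specific application can be illustrated with a small sketch. It assumes a hypothetical binary classification task with an extra "abstain" action, an illustrative utility table, and made-up predictive probabilities from two unnamed approximate posteriors; none of these names or numbers come from the thesis. Each decision is chosen to maximize expected utility under the approximate posterior predictive, and the approximation is then scored by the utility actually obtained on held-out labels.

```python
from typing import List, Sequence

# Hypothetical utility table (illustrative only): binary classification
# with an extra "abstain" action. Rows are actions, columns are true labels.
UTILITY = [
    [1.0, -2.0],  # action 0: predict class 0
    [-2.0, 1.0],  # action 1: predict class 1
    [0.0, 0.0],   # action 2: abstain
]


def predictive(samples: Sequence[Sequence[float]]) -> List[float]:
    """Monte Carlo posterior predictive: average the class probabilities
    produced by each parameter sample from an approximate posterior."""
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]


def bayes_action(probs: Sequence[float]) -> int:
    """Pick the action with the highest expected utility under `probs`."""
    expected = [sum(u * p for u, p in zip(row, probs)) for row in UTILITY]
    return max(range(len(UTILITY)), key=lambda a: expected[a])


def realized_utility(per_input_samples, labels) -> float:
    """Mean utility obtained on held-out labels when each decision is
    Bayes-optimal with respect to the approximate posterior predictive."""
    total = 0.0
    for samples, y in zip(per_input_samples, labels):
        total += UTILITY[bayes_action(predictive(samples))][y]
    return total / len(labels)


# Toy held-out set (made up): three inputs, the last one ambiguous.
labels = [0, 1, 1]

# Hypothetical approximation A: one confident sample per input.
approx_a = [
    [[0.9, 0.1]],
    [[0.1, 0.9]],
    [[0.95, 0.05]],
]

# Hypothetical approximation B: two samples per input that disagree
# on the ambiguous input.
approx_b = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.1, 0.9], [0.2, 0.8]],
    [[0.95, 0.05], [0.05, 0.95]],
]

score_a = realized_utility(approx_a, labels)
score_b = realized_utility(approx_b, labels)
print(f"A: {score_a:.3f}  B: {score_b:.3f}")  # prints "A: 0.000  B: 0.667"
```

On this toy set, approximation B scores higher because its predictive is split evenly on the ambiguous input, so abstaining has the highest expected utility, whereas A confidently misclassifies it. With a different utility table the ranking could change, which is why the comparison only makes sense relative to a chosen decision problem.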