How do we compare hypotheses that are entirely consistent with observations? The marginal likelihood (also known as the Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show that the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
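To make the distinction concrete, the log marginal likelihood decomposes into a sum of one-step-ahead predictive log probabilities, log p(D) = Σᵢ log p(dᵢ | d₁,…,dᵢ₋₁), while a conditional marginal likelihood conditions on the first m observations and sums only the remaining terms, log p(D_{m+1:n} | D_{1:m}). The sketch below is a minimal illustration of this decomposition, assuming a simple conjugate Gaussian model (unknown mean with a Gaussian prior, known noise variance); this toy model, the function names, and the default parameters are illustrative choices, not from the paper.

```python
import math

def one_step_log_preds(data, prior_var=1.0, noise_var=1.0):
    """One-step-ahead predictive log-probabilities for a conjugate
    Gaussian model: mean ~ N(0, prior_var), x | mean ~ N(mean, noise_var)."""
    mu, var = 0.0, prior_var  # posterior over the unknown mean
    logps = []
    for x in data:
        pred_var = var + noise_var  # posterior predictive variance
        logps.append(-0.5 * (math.log(2 * math.pi * pred_var)
                             + (x - mu) ** 2 / pred_var))
        # conjugate posterior update after observing x
        prec = 1.0 / var + 1.0 / noise_var
        mu = (mu / var + x / noise_var) / prec
        var = 1.0 / prec
    return logps

def log_marginal_likelihood(data, **kw):
    # log p(D) is the sum of all one-step-ahead predictive terms,
    # including the early ones dominated by the prior
    return sum(one_step_log_preds(data, **kw))

def conditional_log_marginal_likelihood(data, m, **kw):
    # log p(D_{m+1:n} | D_{1:m}): drop the first m terms, so the score
    # reflects predictive performance after conditioning on some data
    return sum(one_step_log_preds(data, **kw)[m:])
```

Because the early terms are evaluated under a posterior close to the prior, they can dominate the marginal likelihood in ways unrelated to how well the model generalizes; dropping them, as in the conditional form, is what aligns the score more closely with held-out performance.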