How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive to prior assumptions, its limitations for hyperparameter learning and discrete model comparison have not been thoroughly investigated. We first revisit the appealing properties of the marginal likelihood for learning constraints and hypothesis testing. We then highlight the conceptual and practical issues in using the marginal likelihood as a proxy for generalization. Namely, we show how marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and can lead to both underfitting and overfitting in hyperparameter learning. We also re-examine the connection between the marginal likelihood and PAC-Bayes bounds and use this connection to further elucidate the shortcomings of the marginal likelihood for model selection. We provide a partial remedy through a conditional marginal likelihood, which we show is more aligned with generalization, and practically valuable for large-scale hyperparameter learning, such as in deep kernel learning.
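For concreteness, the two quantities contrasted above can be written out as follows. This is a minimal sketch in our own notation; the symbols $\mathcal{D}$, $\theta$, $\mathcal{M}$ and the data split $\mathcal{D}_{1:m}$, $\mathcal{D}_{m+1:n}$ are illustrative and not taken from the text. The marginal likelihood integrates the likelihood against the prior,
\[
p(\mathcal{D} \mid \mathcal{M}) = \int p(\mathcal{D} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{M})\, d\theta ,
\]
whereas a conditional marginal likelihood evaluates the held-back portion of the data under a posterior formed from an initial subset,
\[
p(\mathcal{D}_{m+1:n} \mid \mathcal{D}_{1:m}, \mathcal{M}) = \int p(\mathcal{D}_{m+1:n} \mid \theta, \mathcal{M})\, p(\theta \mid \mathcal{D}_{1:m}, \mathcal{M})\, d\theta ,
\]
which trades some of the sensitivity to the prior for dependence on a posterior that has already seen part of the data.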