Bayesian modeling has become a staple for researchers to articulate assumptions and develop methods tailored to specific data applications. Thanks to recent developments in approximate posterior inference, researchers can easily build, use, and revise complicated Bayesian models for large and rich data. These new abilities, however, bring into focus the problem of model criticism. Researchers need tools to diagnose the fit of their models, to understand where they fall short, and to guide their revision. In this paper we develop a new method for Bayesian model criticism, the population predictive check (POP-PC). POP-PCs are built on posterior predictive checks (PPCs), a seminal method that checks a model by assessing the posterior predictive distribution on the observed data. However, PPCs use the data twice -- both to calculate the posterior predictive distribution and to evaluate it -- which can lead to overconfident assessments of the quality of a model. POP-PCs, in contrast, compare the posterior predictive distribution to a draw from the population distribution, which in practice is a held-out dataset. We prove that this strategy, which blends Bayesian modeling with frequentist assessment, is calibrated, unlike the PPC. Moreover, we demonstrate that calibrating PPC p-values post hoc does not resolve the "double use of the data" problem. Finally, we study POP-PCs on classical regression and a hierarchical model of text data.
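To make the contrast concrete, the following is a minimal sketch, not the paper's implementation: it compares a PPC p-value with a POP-PC-style p-value on a hypothetical conjugate Gaussian model. The model, the test statistic, the split, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: y_i ~ N(mu, 1) with conjugate prior mu ~ N(0, 1),
# so the posterior over mu is available in closed form.
def posterior_params(y, prior_mean=0.0, prior_var=1.0, lik_var=1.0):
    n = len(y)
    post_var = 1.0 / (1.0 / prior_var + n / lik_var)
    post_mean = post_var * (prior_mean / prior_var + y.sum() / lik_var)
    return post_mean, post_var

def predictive_draws(y_fit, n_obs, n_draws=4000):
    """Draw replicated datasets from the posterior predictive p(y_rep | y_fit)."""
    post_mean, post_var = posterior_params(y_fit)
    mu = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    return rng.normal(mu[:, None], 1.0, size=(n_draws, n_obs))

def check_pval(y_fit, y_eval, stat=np.mean):
    """Tail probability that the replicated statistic exceeds the evaluated one."""
    y_rep = predictive_draws(y_fit, len(y_eval))
    return np.mean(stat(y_rep, axis=1) >= stat(y_eval))

y = rng.normal(0.5, 1.0, size=200)

# PPC: the same data both conditions the posterior and scores the check
# (the "double use of the data"), which pulls p-values toward 0.5.
ppc_p = check_pval(y, y)

# POP-PC sketch: hold out part of the data as a stand-in for a draw from
# the population, condition on the rest, and score against the held-out part.
y_obs, y_new = y[:100], y[100:]
pop_pc_p = check_pval(y_obs, y_new)

print(f"PPC p-value:    {ppc_p:.3f}")
print(f"POP-PC p-value: {pop_pc_p:.3f}")
```

The design point the sketch illustrates is the evaluation target: both checks draw replicated data from a posterior predictive, but the PPC scores those draws against the very data that produced the posterior, while the POP-PC scores them against data the posterior never saw.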