Multilevel linear models allow flexible statistical modelling of complex data with different levels of stratification. Identifying the most appropriate model from the large set of possible candidates is a challenging problem. In the Bayesian setting, the standard approach is to compare models using the model evidence or the Bayes factor. Explicit expressions for these quantities are available only for the simplest linear models with unrealistic priors; in most cases, direct computation is impossible. In practice, Markov chain Monte Carlo approaches, such as sequential Monte Carlo, are widely used, but it is not always clear how well such techniques perform. We present a method for estimating the log model evidence through an intermediate marginalisation over the non-variance parameters. This reduces the dimensionality of the space that any Monte Carlo sampling algorithm must explore, which in turn yields more consistent estimates. The aim of this paper is to show how this framework fits together and works in practice, particularly on data with hierarchical structure. We illustrate the method on simulated multilevel data and on a popular dataset containing radon levels in homes in the US state of Minnesota.
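To make the idea of intermediate marginalisation concrete, the following is a minimal sketch on a single-level toy model, not the algorithm developed in the paper. It assumes a Gaussian linear model y = X beta + eps with eps ~ N(0, sigma2 I), a prior beta ~ N(0, tau2 I) with tau2 held fixed, and an inverse-gamma(2, 1) prior on sigma2; all of these choices, and the function names `log_marginal_given_variance` and `log_evidence_estimate`, are illustrative assumptions. The coefficients are integrated out in closed form, so Monte Carlo sampling is needed only over the one remaining variance parameter.

```python
import numpy as np


def log_marginal_given_variance(y, X, sigma2, tau2):
    """log p(y | sigma2) with beta ~ N(0, tau2 I) integrated out analytically.

    Under the assumed model, y | sigma2 ~ N(0, tau2 X X^T + sigma2 I).
    """
    n = y.shape[0]
    cov = tau2 * X @ X.T + sigma2 * np.eye(n)
    chol = np.linalg.cholesky(cov)           # cov = L L^T
    alpha = np.linalg.solve(chol, y)         # L^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + alpha @ alpha)


def log_evidence_estimate(y, X, tau2, n_draws, rng):
    """Simple Monte Carlo estimate of log p(y), sampling only sigma2.

    Assumed prior: sigma2 ~ inverse-gamma(shape=2, scale=1), drawn as the
    reciprocal of a Gamma(shape=2, scale=1) variate.
    """
    sigma2_draws = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=n_draws)
    log_terms = np.array(
        [log_marginal_given_variance(y, X, s2, tau2) for s2 in sigma2_draws]
    )
    # Log-mean-exp for numerical stability.
    m = log_terms.max()
    return m + np.log(np.mean(np.exp(log_terms - m)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=50)
    print(log_evidence_estimate(y, X, tau2=1.0, n_draws=2000, rng=rng))
```

Sampling from the prior is used here only because it is the shortest correct estimator to write down; in this sketch any other Monte Carlo scheme over sigma2 could be substituted, and it would likewise operate in one dimension rather than in the joint space of coefficients and variance.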