Multilevel linear models allow flexible statistical modelling of complex data with different levels of stratification. Identifying the most appropriate model from the large set of possible candidates is a challenging problem. In the Bayesian setting, the standard approach is to compare models using the model evidence or the Bayes factor. Explicit expressions for these quantities are available for simple linear models, but in most cases direct computation is impossible. In practice, Markov chain Monte Carlo approaches, such as sequential Monte Carlo, are widely used, but it is not always clear how well such techniques perform. We present a method for estimating the log model evidence by an intermediate marginalisation over the non-variance parameters. This reduces the dimensionality of the Monte Carlo sampling algorithm, which in turn yields more consistent estimates. The aim of this paper is to show how this framework fits together and works in practice, particularly on data with hierarchical structure. We illustrate the method on a popular multilevel dataset containing radon levels in homes in the US state of Minnesota.
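The intermediate marginalisation can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single-level linear model y = X beta + noise with prior beta ~ N(0, tau^2 I) and noise variance sigma^2, so that beta integrates out in closed form and only the two variance parameters are left to sample. The function names, the log-normal priors on sigma and tau, and the use of plain prior-sampling Monte Carlo in place of sequential Monte Carlo are all assumptions made for illustration.

```python
import numpy as np


def log_marginal_given_variances(y, X, sigma, tau):
    """log p(y | sigma, tau) with beta ~ N(0, tau^2 I) integrated out.

    Marginally, y ~ N(0, sigma^2 I + tau^2 X X^T), so this is a Gaussian
    log density evaluated through a Cholesky factorisation.
    """
    n = y.shape[0]
    cov = sigma**2 * np.eye(n) + tau**2 * (X @ X.T)
    chol = np.linalg.cholesky(cov)            # cov = chol @ chol.T
    whitened = np.linalg.solve(chol, y)       # chol^{-1} y
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + whitened @ whitened)


def log_evidence_estimate(y, X, n_draws=2000, seed=0):
    """Monte Carlo estimate of log p(y), sampling only the variance parameters.

    Assumed (hypothetical) priors: log sigma and log tau are N(0, 0.5^2).
    """
    rng = np.random.default_rng(seed)
    sigmas = rng.lognormal(mean=0.0, sigma=0.5, size=n_draws)
    taus = rng.lognormal(mean=0.0, sigma=0.5, size=n_draws)
    log_terms = np.array([
        log_marginal_given_variances(y, X, s, t)
        for s, t in zip(sigmas, taus)
    ])
    # Average the conditional evidences on the log scale (log-mean-exp).
    top = log_terms.max()
    return top + np.log(np.mean(np.exp(log_terms - top)))


if __name__ == "__main__":
    # Synthetic data, purely for demonstration.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -0.5, 0.2]) + 0.3 * rng.normal(size=50)
    print(log_evidence_estimate(y, X))
```

Because the regression coefficients never enter the sampler, the Monte Carlo average runs over two dimensions rather than two plus the number of coefficients, which is the dimensionality reduction the abstract refers to. A multilevel model would replace the covariance with one that includes the group-level variance components.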