We resurrect the infamous harmonic mean estimator for computing the marginal likelihood (Bayesian evidence) and solve its problematic large variance. The marginal likelihood is a key component of Bayesian model selection since it is required to evaluate model posterior probabilities; however, its computation is challenging. The original harmonic mean estimator, first proposed in 1994 by Newton and Raftery, involves computing the harmonic mean of the likelihood given samples from the posterior. It was immediately realised that the original estimator can fail catastrophically since its variance can become very large and may not be finite. A number of variants of the harmonic mean estimator have been proposed to address this issue, although none have proven fully satisfactory. We present the learnt harmonic mean estimator, a variant of the original estimator that solves its large variance problem. This is achieved by interpreting the harmonic mean estimator as importance sampling and introducing a new target distribution. The new target distribution is learned to approximate the optimal but inaccessible target, while minimising the variance of the resulting estimator. Since the estimator requires only samples from the posterior, it is agnostic to the strategy used to generate them. We validate the estimator on a variety of numerical experiments, including a number of pathological examples where the original harmonic mean estimator fails catastrophically. In all cases our learnt harmonic mean estimator is shown to be highly accurate. The estimator is computationally scalable and can be applied to problems of dimension $\mathcal{O}(10^3)$ and beyond. Code implementing the learnt harmonic mean estimator is made publicly available.
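To make the construction concrete, the following is a minimal sketch (not the paper's code) of the original Newton--Raftery harmonic mean estimator in a toy conjugate-Gaussian setting where the true evidence is known in closed form. All variable names and the specific prior/likelihood choices are illustrative assumptions; the prior is deliberately chosen narrower than the likelihood so that the estimator's variance is finite here, since a broad prior triggers exactly the infinite-variance pathology described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate-Gaussian setting (illustrative, not from the paper):
# single observation y ~ N(theta, sigma^2), prior theta ~ N(mu0, tau^2).
sigma = 1.0           # likelihood standard deviation
mu0, tau = 0.0, 0.5   # prior mean and standard deviation
y = 0.5               # observed datum

# The posterior is Gaussian by conjugacy; sample from it directly.
post_var = 1.0 / (1.0 / tau**2 + 1.0 / sigma**2)
post_mean = post_var * (mu0 / tau**2 + y / sigma**2)
theta = rng.normal(post_mean, np.sqrt(post_var), size=100_000)

# Original harmonic mean estimator (Newton & Raftery, 1994):
#   1 / z_hat = (1/N) * sum_i 1 / L(y | theta_i),
# evaluated in log space for numerical stability.
log_like = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (y - theta) ** 2 / sigma**2
neg_ll = -log_like
m = neg_ll.max()
log_inv_z = m + np.log(np.mean(np.exp(neg_ll - m)))
z_hat = np.exp(-log_inv_z)

# Closed-form evidence for comparison: y ~ N(mu0, sigma^2 + tau^2).
s2 = sigma**2 + tau**2
z_true = np.exp(-0.5 * np.log(2 * np.pi * s2) - 0.5 * (y - mu0) ** 2 / s2)

# Note: with tau > sigma the second moment of 1/L under the posterior
# diverges, so this estimator's variance becomes infinite -- the
# pathology that motivates the learnt harmonic mean estimator.
print(f"harmonic mean estimate: {z_hat:.4f}, true evidence: {z_true:.4f}")
```

With a narrow prior the estimate agrees closely with the closed-form evidence; widening the prior leaves the estimator formally unbiased in its reciprocal but makes individual runs wildly unstable.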