Importance Sampling Methods for Bayesian Inference with Partitioned Data

arXiv comment: Figures 11 and 14 have been replaced. The previous version showed incorrect values for the methods CMC1, CMC2, NDPE and SDPE because an incorrect prior was used in the sampling algorithm. The change has no impact on the conclusions drawn from these figures.

This article presents new methodology for sample-based Bayesian inference when data are partitioned and communication between the parts is expensive, as arises by necessity in the context of "big data" or by choice in order to take advantage of computational parallelism. The method, which we call the Laplace enriched multiple importance estimator, uses new multiple importance sampling techniques to approximate posterior expectations using samples drawn independently from the local posterior distributions (those conditioned on isolated parts of the data). We also construct Laplace approximations, from which additional samples can be drawn relatively quickly; these improve the performance of the methods in high-dimensional estimation. The methods are "embarrassingly parallel", place no restriction on the choice of sampling algorithm (MCMC included) or of prior distribution, and do not rely on any assumptions about the posterior such as normality. The performance of the methods is demonstrated and compared against some alternatives in experiments with simulated data.
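The general idea of combining draws from local posteriors through multiple importance sampling can be illustrated with a small sketch. This is not the Laplace enriched multiple importance estimator itself, only a generic self-normalised estimator with an equal-weight mixture proposal. It assumes a one-dimensional conjugate normal model (so each local posterior density is known in closed form, which would not hold in general), and all function and variable names are hypothetical.

```python
import numpy as np


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x (broadcasts over arrays)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def conjugate_posterior(y, sigma2, tau2):
    """Mean and variance of theta | y for y_i ~ N(theta, sigma2), theta ~ N(0, tau2)."""
    precision = 1.0 / tau2 + len(y) / sigma2
    return (np.sum(y) / sigma2) / precision, 1.0 / precision


def mis_posterior_mean(parts, sigma2, tau2, n_per_part, rng):
    """Self-normalised multiple importance sampling estimate of E[theta | all data]."""
    # Assumption: the model is conjugate, so local posteriors are exact normals.
    local = [conjugate_posterior(y, sigma2, tau2) for y in parts]

    # Draw independently from each local posterior, then pool the draws.
    theta = np.concatenate(
        [rng.normal(m, np.sqrt(v), size=n_per_part) for m, v in local]
    )

    # Unnormalised log full posterior: log prior plus one log-likelihood term
    # per part (each term only needs the data held by that part).
    log_target = normal_logpdf(theta, 0.0, tau2)
    for y in parts:
        log_target = log_target + normal_logpdf(
            y[:, None], theta[None, :], sigma2
        ).sum(axis=0)

    # Proposal: equal-weight mixture of the local posterior densities
    # (equal weights match the equal number of draws per part).
    log_components = np.stack([normal_logpdf(theta, m, v) for m, v in local])
    log_mixture = np.logaddexp.reduce(log_components, axis=0) - np.log(len(local))

    # Self-normalised importance weights, computed stably in log space.
    log_w = log_target - log_mixture
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return float(np.sum(w * theta))


# Simulated data split into four parts (illustrative values only).
rng = np.random.default_rng(0)
sigma2, tau2 = 1.0, 10.0
data = rng.normal(0.5, np.sqrt(sigma2), size=200)
parts = np.array_split(data, 4)

estimate = mis_posterior_mean(parts, sigma2, tau2, n_per_part=5000, rng=rng)
exact, _ = conjugate_posterior(data, sigma2, tau2)
print(f"MIS estimate: {estimate:.4f}, exact posterior mean: {exact:.4f}")
```

In this toy setting the estimate can be checked against the closed-form posterior mean. In a non-conjugate model the local posterior densities would be known only up to normalising constants, so the mixture density used here could not be evaluated directly.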
