Divide-and-conquer strategies for Monte Carlo algorithms are an increasingly popular approach to making Bayesian inference scalable to large data sets. In their simplest form, the data are partitioned across multiple computing cores, and a separate Markov chain Monte Carlo algorithm on each core targets the associated partial posterior distribution. We refer to this distribution as a sub-posterior: the posterior given only the data from the segment of the partition associated with that core. Divide-and-conquer techniques reduce computational, memory and disk bottlenecks, but make it difficult to recombine the sub-posterior samples. We propose SwISS (Sub-posteriors with Inflation, Scaling and Shifting), a new approach for recombining the sub-posterior samples that is simple to apply, scales to high-dimensional parameter spaces and accurately approximates the original posterior distribution through affine transformations of the sub-posterior samples. We prove that our transformation is asymptotically optimal across a natural set of affine transformations and illustrate the efficacy of SwISS against competing algorithms on synthetic and real-world data sets.
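The abstract does not state the exact SwISS transformation, so the following is only a minimal sketch of the general idea of recombining sub-posterior samples with one affine map per core. It assumes each sub-posterior is approximately Gaussian and that the full posterior is proportional to the product of the sub-posteriors, so that the full precision is the sum of the sub-posterior precisions. The function names `sym_sqrt` and `affine_recombine`, and the particular choice of matrix square root, are hypothetical and are not taken from the paper.

```python
import numpy as np


def sym_sqrt(mat):
    """Symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(vals)) @ vecs.T


def affine_recombine(sub_samples):
    """Pool sub-posterior samples after one affine map per batch.

    sub_samples: list of arrays, each of shape (n_j, d), one per core.
    Returns the pooled samples plus the estimated full mean and covariance.
    Assumption (not from the paper): Gaussian sub-posteriors whose product
    is proportional to the full posterior.
    """
    means = [s.mean(axis=0) for s in sub_samples]
    covs = [np.atleast_2d(np.cov(s, rowvar=False)) for s in sub_samples]
    precs = [np.linalg.inv(c) for c in covs]

    # Product of Gaussians: precisions add, means are precision-weighted.
    full_cov = np.linalg.inv(sum(precs))
    full_mean = full_cov @ sum(p @ m for p, m in zip(precs, means))
    full_root = sym_sqrt(full_cov)

    pooled = []
    for s, m, c in zip(sub_samples, means, covs):
        # Scale so the batch covariance becomes full_cov, then shift the
        # batch mean onto full_mean.
        scale = full_root @ np.linalg.inv(sym_sqrt(c))
        pooled.append((s - m) @ scale.T + full_mean)
    return np.vstack(pooled), full_mean, full_cov


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for MCMC output from three cores.
    subs = [rng.normal(loc=j, scale=1.0 + j, size=(2000, 2)) for j in range(3)]
    samples, mean, cov = affine_recombine(subs)
    print(samples.shape, mean, np.diag(cov))
```

Because every batch is mapped onto the same target mean and covariance, all transformed samples can be pooled and used together; how well the pooled set approximates the true posterior depends on how reasonable the Gaussian assumption is for the problem at hand.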