Sampling from complex target distributions is a challenging task fundamental to Bayesian inference. Parallel tempering (PT) addresses this problem by constructing a Markov chain on the expanded state space of a sequence of distributions interpolating between the posterior distribution and a fixed reference distribution, which is typically chosen to be the prior. However, in the typical case where the prior and posterior are nearly mutually singular, PT methods are computationally prohibitive. In this work we address this challenge by constructing a generalized annealing path connecting the posterior to an adaptively tuned variational reference. The reference distribution is tuned to minimize the forward (inclusive) KL divergence to the posterior distribution using a simple, gradient-free moment-matching procedure. We show that our adaptive procedure converges to the forward KL minimizer, and that the forward KL divergence serves as a good proxy for a previously developed measure of PT performance. We also show that in the large-data limit in typical Bayesian models, the proposed method improves in performance, while traditional PT deteriorates arbitrarily. Finally, we introduce PT with two references -- one fixed, one variational -- with a novel split annealing path that ensures stable variational reference adaptation. The paper concludes with experiments that demonstrate the large empirical gains achieved by our method in a wide range of realistic Bayesian inference scenarios.
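The gradient-free moment-matching step can be illustrated with a minimal sketch. Assuming a Gaussian variational family (an assumption for this example; the abstract does not fix the family), setting the reference's mean and covariance to the empirical moments of samples drawn at the target chain minimizes the forward (inclusive) KL divergence over that family:

```python
import numpy as np

def moment_match_gaussian(samples):
    """Fit a Gaussian variational reference by moment matching.

    Within the Gaussian family, matching the empirical mean and
    covariance of posterior samples minimizes the forward KL
    divergence KL(pi || q) from the posterior pi to the reference q.
    """
    samples = np.asarray(samples, dtype=float)
    mu = samples.mean(axis=0)                 # empirical first moment
    cov = np.atleast_2d(np.cov(samples, rowvar=False))  # empirical second moment
    return mu, cov

# Hypothetical usage: adapt the reference from samples collected
# at the target (posterior) end of the PT chain.
rng = np.random.default_rng(0)
posterior_samples = rng.normal(loc=3.0, scale=0.5, size=(5000, 2))
mu, cov = moment_match_gaussian(posterior_samples)
```

Because the update uses only sample moments, no gradients of the target density are required, which is what makes the adaptation procedure gradient-free.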