Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings (MH) correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose a differentiable AIS algorithm by abandoning MH steps, which further unlocks mini-batch computation. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for non-perfect transitions. Using this analysis, we prove that our algorithm is consistent in the full-batch setting and provide a sublinear convergence rate. However, we show that the algorithm is inconsistent when mini-batch gradients are used due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the pathwise stochastic error. This result is in stark contrast to our experience with stochastic optimization and stochastic gradient Langevin dynamics, where the effects of gradient noise can be washed out by taking more steps of a smaller size. Our negative result relies crucially on our explicit consideration of convergence to the stationary distribution, and it helps explain the difficulty of developing practically effective AIS-like algorithms that exploit mini-batch gradients.
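To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of AIS with the MH correction dropped: transitions are unadjusted Langevin steps along a geometric annealing path, so every operation is differentiable. The one-dimensional Gaussian target, step size, and schedule are illustrative choices, and plain NumPy is used for brevity; a gradient-based optimizer would require an autodiff framework.

```python
import numpy as np

MU = 1.0  # mean of the illustrative unnormalized target exp(-(x - MU)^2 / 2)

def log_gamma(x, beta):
    # Geometric annealing path between the normalized initial
    # distribution p0 = N(0, 1) and the unnormalized target.
    log_p0 = -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
    log_f1 = -0.5 * (x - MU) ** 2
    return (1 - beta) * log_p0 + beta * log_f1

def grad_log_gamma(x, beta):
    return -(1 - beta) * x - beta * (x - MU)

def dais_log_weights(n_samples=2000, n_steps=200, eps=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # exact draws from p0
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    log_w = np.zeros(n_samples)
    for k in range(1, n_steps + 1):
        # Accumulate the AIS weight increment before moving the particles.
        log_w += log_gamma(x, betas[k]) - log_gamma(x, betas[k - 1])
        # Unadjusted Langevin transition: no MH accept/reject, so the
        # map from parameters to log_w is differentiable end to end.
        noise = rng.standard_normal(n_samples)
        x = x + 0.5 * eps * grad_log_gamma(x, betas[k]) + np.sqrt(eps) * noise
    return log_w

log_w = dais_log_weights()
log_Z_hat = np.logaddexp.reduce(log_w) - np.log(len(log_w))
```

Because the Langevin kernels are not exact samplers of the intermediate distributions, the estimate `log_Z_hat` is biased for finite step sizes; the abstract's analysis characterizes when this bias vanishes (full-batch gradients) and when it cannot (mini-batch gradients).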