Differentiable Annealed Importance Sampling and the Perils of Gradient Noise

Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation, but are not fully differentiable due to the use of Metropolis-Hastings correction steps. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective using gradient-based methods. To this end, we propose Differentiable AIS (DAIS), a variant of AIS which ensures differentiability by abandoning the Metropolis-Hastings corrections. As a further advantage, DAIS allows for mini-batch gradients. We provide a detailed convergence analysis for Bayesian linear regression which goes beyond previous analyses by explicitly accounting for the sampler not having reached equilibrium. Using this analysis, we prove that DAIS is consistent in the full-batch setting and provide a sublinear convergence rate. Furthermore, motivated by the problem of learning from large-scale datasets, we study a stochastic variant of DAIS that uses mini-batch gradients. Surprisingly, stochastic DAIS can be arbitrarily bad due to a fundamental incompatibility between the goals of last-iterate convergence to the posterior and elimination of the accumulated stochastic error. This is in stark contrast with other settings such as gradient-based optimization and Langevin dynamics, where the effect of gradient noise can be washed out by taking smaller steps. This indicates that annealing-based marginal likelihood estimation with stochastic gradients may require new ideas.
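
The idea of dropping the Metropolis-Hastings correction can be illustrated with a minimal sketch. The abstract does not specify the transition kernel, so this example assumes unadjusted overdamped Langevin steps and the usual ratio of backward to forward kernel densities as the incremental weight; this is an illustrative stand-in, not necessarily the transition used in DAIS. The model (a one-dimensional Gaussian prior and likelihood), the linear annealing schedule, and all function and parameter names are hypothetical choices made so that the exact log marginal likelihood is available for comparison.

```python
import math
import random


def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def unadjusted_ais_log_z(y, sigma2, num_steps, step_size, num_particles, rng):
    """Estimate log p(y) for the toy model x ~ N(0, 1), y | x ~ N(x, sigma2).

    Assumption: transitions are unadjusted Langevin steps (no accept/reject),
    and intermediate targets are prior * likelihood**beta with linear beta.
    """

    def grad_log_gamma(x, beta):
        # d/dx [ log N(x; 0, 1) + beta * log N(y; x, sigma2) ]
        return -x + beta * (y - x) / sigma2

    kernel_var = 2.0 * step_size  # variance of the Langevin proposal noise
    log_weights = []
    for _ in range(num_particles):
        x = rng.gauss(0.0, 1.0)            # exact sample from the prior
        log_w = -log_normal(x, 0.0, 1.0)   # divide by the initial density
        for k in range(1, num_steps + 1):
            beta = k / num_steps
            fwd_mean = x + step_size * grad_log_gamma(x, beta)
            x_new = fwd_mean + math.sqrt(kernel_var) * rng.gauss(0.0, 1.0)
            bwd_mean = x_new + step_size * grad_log_gamma(x_new, beta)
            # Incremental weight: backward kernel density over forward one.
            log_w += log_normal(x, bwd_mean, kernel_var)
            log_w -= log_normal(x_new, fwd_mean, kernel_var)
            x = x_new  # always move: there is no rejection step
        # Multiply by the unnormalized final target, prior * likelihood.
        log_w += log_normal(x, 0.0, 1.0) + log_normal(y, x, sigma2)
        log_weights.append(log_w)

    # Numerically stable log of the average weight.
    top = max(log_weights)
    mean_w = sum(math.exp(lw - top) for lw in log_weights) / num_particles
    return top + math.log(mean_w)


def exact_log_z(y, sigma2):
    """Closed-form log marginal likelihood: y ~ N(0, 1 + sigma2)."""
    return log_normal(y, 0.0, 1.0 + sigma2)


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = unadjusted_ais_log_z(1.0, 1.0, 20, 0.05, 2000, rng)
    print("estimate:", estimate, "exact:", exact_log_z(1.0, 1.0))
```

Because no step is ever rejected, the estimate is a smooth function of `step_size` and `sigma2` once the random draws are fixed, which is the property that would let an automatic differentiation framework backpropagate through it. The sketch uses full gradients only; it does not model the mini-batch setting whose failure the abstract describes.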
